Francis Tyers

Phonological rules

Rule interactions

You might be wondering at this point how we can do complex transformations if we can only work with changing a single symbol at once and have no concept of rule ordering. Let’s take a look at the Guaraní accentuation rules to get an idea of how rules can interact.

There are two types of suffixes in Guaraní: tonal (meaning that they bear stress) and atonal (that do not bear stress). If a suffix is atonal the stress should be moved to the previous morpheme.

For example locative marker -pe is atonal, therefore adding -pe to apyka “silla, chair” will result in apykápe and in the case of ava “gente, people”, it will result in avápe.

Let’s define two new sets of tonal and atonal vowels in twol file:

VowsTon = Á É Í Ó Ú Ý
          á é í ó ú ý ;

VowsAton = A E I O U Y
           a e i o u y ;

And define a new rule which changes unaccented vowel to accented before the atonal suffix.

"Change vowel to tonal before the atonal suffix"
Va:Vt <=> _ %>: %{m%}: e: ;
          where Va in VowsAton 
                Vt in VowsTon 
          matched;

The rule states: restrict the realisation of any atonal vowel in VowAton to appear on the surface as the tonal equivalent if the suffix -{m}e (-pe, -me) follows it.

Now check the correct forms for apyka and ava:

$ echo "apyka<n><loc>" | hfst-lookup -qp grn.gen.hfst
apyka<n><loc>	apykápe	0,000000

$ echo "ava<n><loc>" | hfst-lookup -qp grn.gen.hfst 
ava<n><loc>	avápe	0,000000

At the same if the last vowel of the stem is nasal as in irũ “amigo, friend” nothing changes as the tilde ~ symbol in Guaraní signifies stress as well as an acento agudo.

$ echo "irũ<n><loc>" | hfst-lookup -qp grn.gen.hfst
irũ<n><loc>	irũme	0,000000

¡Perfecto!

Now let us introduce a comparative suffix -icha. It is also atonal, thus we can add it to the lexc lexicon and simply define a new context in the previous twol rule.

Va:Vt <=> _ %>: [ %{m%}: e: | i: c: h: a: ] ;
          where Va in VowsAton 
                Vt in VowsTon 
          matched;

But what if at this point we have a noun akãvai “locura” that we want to add to our nominal lexicon? We will get something like that in the output:

akãvai<n><comp>	akãvaíicha	0,000000

Guaraní does not allow two vowels which are the same, therefore, we need to delete the first i that comes from the stem.

Let’s define a new rule that deletes -i before the comparative -icha:

"Delete ending -[i] before comparative -icha"
Vx:0 <=> _ %>: i: c: h: a: ;
            where Vx in ( ĩ i í ) ;

Oh no…. What happened? We have a rule conflict now!

Reading input from grn.twol.
Writing output to grn.twol.hfst.
Reading alphabet.
Reading sets.
Reading rules and compiling their contexts and centers.
There is a <=-rule conflict between "Change vowel to tonal before atonal suffix SUBCASE: Va=i Vt=í" and "Delete ending -[i] before comparative -icha SUBCASE: Vx=i".
E.g. in context __HFST_TWOLC_.#.:__HFST_TWOLC_.#. _ >: i:í c:c h:h a:á __HFST_TWOLC_.#.:__HFST_TWOLC_.#. 

Compiling rules.
Storing rules.
hfst-compose-intersect -1 grn.lexc.hfst -2 grn.twol.hfst -o grn.gen.hfst

This happened because we have contradictory rules: (1) tries to make the final i in the stem tonal and (2) tries to delete it in the same context of -icha suffix. Let’s take a look at the string pairs:

(1)
    a k ã v a i > i c h a
    a k ã v a í 0 i c h a

(2)
    a k ã v a i > i c h a
    a k ã v a 0 0 i c h a

We can confirm this by running the command hfst-pair-test. First let’s get the output from the morphotactic side of the transducer for the word we are interested in.

$ hfst-fst2strings grn.lexc.hfst | grep 'akãvai<n><comp>'
akãvai<n><comp>:akãvai>icha

We then write out the pairs that we expect to be valid,

$ echo "a k ã v a i:0 >:0 i c h a" | hfst-pair-test grn.twol.hfst 
Rule "Change vowel to tonal before the atonal suffix  fails:
#:0 a k ã v a HERE ---> i:0 >:0 i c h a #:0 

FAIL: a k ã v a i:0 >:0 i c h a REJECTED

Test failed.

The command shows that the rule "Change vowel to tonal before the atonal suffix" rejects the pair i:0 in the context a k ã v a i:0 >:0 i c h a, as it expects the pair to be i:í.

How to handle rule conflicts so that they do not conflict with each other? In this case we can reduce the set of VowsAton and VowsTon and delete i from set. But it could happen that later you will need full sets of tonal and atonal vowels, then we can define inline sets in the rule as in -i deletion rule:

Va:Vt <=> _ %>: [%{m%}: e: | i: c: h: a:] ;
            where Va in ( a e o u y )
                  Vt in ( á é ó ú ý ) matched ;

And introduce an additional accentuation rule for i in contexts other than -icha (suffix -pe in our case). At this point our three accentuation rules should look like this:

"Change vowel to tonal before atonal suffix"

Va:Vt <=> _ %>: [%{m%}: e: | i: c: h: a:] ;
          where Va in ( a e o u y )
                Vt in ( á é ó ú ý ) matched ;

"Delete ending -[i] before comparative -icha"
Vx:0 <=> _ %>: i: c: h: a: ;
            where Vx in ( ĩ i í ) ;

"Change i to tonal before atonal suffixes"
i:í <=>  _ %>: %{m%}: e: ;

And lets check that all the words analyse correctly:

$ echo "akãvai<n><comp>" | hfst-lookup -qp grn.gen.hfst
akãvai<n><comp>	akãvaicha	0,000000


$ echo "ava<n><comp>" | hfst-lookup -qp grn.gen.hfst
ava<n><comp>	aváicha	0,000000

Digraphs

Nasalisation is another phonological process that happens in Guaraní.

Compare apykakúera and irũnguéra where -kuéra is a plural suffix. Nasalisation always spreads from the stem to adjacent suffixes and prefixes, thus initial -k- of the suffix changes to digraph -ng-.

As far as -ng- is a digraph we cannot simply change k to ng in one step — recall that twol describes constraints over pairs of symbols —, thus we need to define two archiphonemes %{g%} which will appear as suface k or g and %{n%} which will appear as 0 or n on the surface.

Remember that these archiphonemes should appear in the Alphabet section of the grn.twol file.

Add new morpheme to the lexicon PL and don’t forget to define new archiphonemes in Multichar_Symbols as well as in the twol alphabet.

Your new lexicon will look like this:

LEXICON PL

%<pl%>:%>%{n%}%{g%}uéra CASE ;

This should be in your grn.lexc file.

Now let’s work with the grn.twol file and introduce a new set of nasals Nas to the Sets:

Nas = m n ñ ã ẽ ĩ õ ỹ ũ
      M N Ñ Ã Ẽ Ĩ Õ Ỹ Ũ ;

and add phonological rules into the Rules section:

"Surface [n] in plural nouns"
%{n%}:n <=> Nas: %>: _ ;

"Surface [g] in pl nouns after nasals"
%{g%}:g <=> %{n%}:n _ ;

The first rule says that %{n%} is n in the context of any nasal to the left, and the second rule says that %{g%} must be g if the preceding %{n%} is realised as n.

Thus we get:

$ echo "irũ<n><pl><nom>" | hfst-lookup grn.gen.hfst
> irũ<n><pl><nom>	irũnguéra	0,000000

$ echo "apyka<n><pl><nom>" | hfst-lookup grn.gen.hfst
> apyka<n><pl><nom>	apykakuéra	0,000000

Sets and operations on sets

All good… BUT we have a new problem now! Try to analyse the word óga “house”, you will get this:

echo "óga<n><pl><nom>" | hfst-lookup grn.gen.hfst

> óga<n><pl><nom>	ógakuéra	0,000000

The problem is a Guaraní word cannot bear two stress accents. As -kuéra is a tonal suffix we should remove acento from the previous accented vowel i.e. ó in order to receive ogakuéra.

Define a new rule that changes tonal vowels to atonal. The rule is a bit more complicated than the previous rules that we had:

"Change tonal vowel to atonal if tonal in affix"
Vt:Va <=>  _ [ ? - VowsTon]+  VowsTon: ;
          where Vt in ( á é ó ú ý ) 
                Va in ( a e o u y ) 
          matched;

Symbol definitions:

?: matches any symbol pair in the alphabet;
+: matches one or more occurences.
-: subtraction on sets
- For example, if we have the set A = {a, b, c, d} and the set B = {a, b} then A - B = {c, d}

Thus the expression [ ? - VowsTon]+ we have just written means: Match one or more occurences of any symbol pair in the alphabet except the symbol pairs in the set VowsTon. If any vowel from VowsTon set follows this expression Vt will be changed to Va.

Now test it!

echo "óga<n><pl><nom>" | hfst-lookup grn.gen.hfst

> óga<n><pl><nom>	ogakuéra	0,000000

WIN!