You might be wondering at this point how we can do complex transformations if we can only work with changing a single symbol at once and have no concept of rule ordering. Let’s take a look at the Guaraní accentuation rules to get an idea of how rules can interact.
There are two types of suffixes in Guaraní: tonal (meaning that they bear stress) and atonal (that do not bear stress). If a suffix is atonal the stress should be moved to the previous morpheme.
For example locative marker -pe is atonal, therefore adding -pe to apyka “silla, chair” will result in apykápe and in the case of ava “gente, people”, it will result in avápe.
Let’s define two new sets of tonal and atonal vowels in twol
file:
VowsTon = Á É Í Ó Ú Ý
á é í ó ú ý ;
VowsAton = A E I O U Y
a e i o u y ;
And define a new rule which changes unaccented vowel to accented before the atonal suffix.
"Change vowel to tonal before the atonal suffix"
Va:Vt <=> _ %>: %{m%}: e: ;
where Va in VowsAton
Vt in VowsTon
matched;
The rule states: restrict the realisation of any atonal vowel in VowAton
to appear on the surface as the tonal equivalent
if the suffix -{m}e
(-pe, -me) follows it.
Now check the correct forms for apyka and ava:
$ echo "apyka<n><loc>" | hfst-lookup -qp grn.gen.hfst
apyka<n><loc> apykápe 0,000000
$ echo "ava<n><loc>" | hfst-lookup -qp grn.gen.hfst
ava<n><loc> avápe 0,000000
At the same if the last vowel of the stem is nasal as in irũ “amigo, friend” nothing changes as the tilde ~
symbol in
Guaraní signifies stress as well as an acento agudo.
$ echo "irũ<n><loc>" | hfst-lookup -qp grn.gen.hfst
irũ<n><loc> irũme 0,000000
¡Perfecto!
Now let us introduce a comparative suffix -icha. It is also atonal, thus we can add it to the lexc
lexicon and
simply define a new context in the previous twol
rule.
Va:Vt <=> _ %>: [ %{m%}: e: | i: c: h: a: ] ;
where Va in VowsAton
Vt in VowsTon
matched;
But what if at this point we have a noun akãvai “locura” that we want to add to our nominal lexicon? We will get something like that in the output:
akãvai<n><comp> akãvaíicha 0,000000
Guaraní does not allow two vowels which are the same, therefore, we need to delete the first i that comes from the stem.
Let’s define a new rule that deletes -i before the comparative -icha:
"Delete ending -[i] before comparative -icha"
Vx:0 <=> _ %>: i: c: h: a: ;
where Vx in ( ĩ i í ) ;
Oh no…. What happened? We have a rule conflict now!
Reading input from grn.twol.
Writing output to grn.twol.hfst.
Reading alphabet.
Reading sets.
Reading rules and compiling their contexts and centers.
There is a <=-rule conflict between "Change vowel to tonal before atonal suffix SUBCASE: Va=i Vt=í" and "Delete ending -[i] before comparative -icha SUBCASE: Vx=i".
E.g. in context __HFST_TWOLC_.#.:__HFST_TWOLC_.#. _ >: i:í c:c h:h a:á __HFST_TWOLC_.#.:__HFST_TWOLC_.#.
Compiling rules.
Storing rules.
hfst-compose-intersect -1 grn.lexc.hfst -2 grn.twol.hfst -o grn.gen.hfst
This happened because we have contradictory rules: (1) tries to make the final i in the stem tonal and (2) tries to delete it in the same context of -icha suffix. Let’s take a look at the string pairs:
(1)
a k ã v a i > i c h a
a k ã v a í 0 i c h a
(2)
a k ã v a i > i c h a
a k ã v a 0 0 i c h a
We can confirm this by running the command hfst-pair-test
. First let’s get the output from the morphotactic side of the transducer for
the word we are interested in.
$ hfst-fst2strings grn.lexc.hfst | grep 'akãvai<n><comp>'
akãvai<n><comp>:akãvai>icha
We then write out the pairs that we expect to be valid,
$ echo "a k ã v a i:0 >:0 i c h a" | hfst-pair-test grn.twol.hfst
Rule "Change vowel to tonal before the atonal suffix fails:
#:0 a k ã v a HERE ---> i:0 >:0 i c h a #:0
FAIL: a k ã v a i:0 >:0 i c h a REJECTED
Test failed.
The command shows that the rule "Change vowel to tonal before the atonal suffix"
rejects the pair i:0
in the
context a k ã v a i:0 >:0 i c h a
, as it expects the pair to be i:í
.
How to handle rule conflicts so that they do not conflict with each other? In this case we can reduce the set of VowsAton
and VowsTon
and delete i from set. But it could happen that later you will need full sets of tonal and atonal vowels, then we can define inline sets in the rule as in -i deletion rule:
Va:Vt <=> _ %>: [%{m%}: e: | i: c: h: a:] ;
where Va in ( a e o u y )
Vt in ( á é ó ú ý ) matched ;
And introduce an additional accentuation rule for i in contexts other than -icha (suffix -pe in our case). At this point our three accentuation rules should look like this:
"Change vowel to tonal before atonal suffix"
Va:Vt <=> _ %>: [%{m%}: e: | i: c: h: a:] ;
where Va in ( a e o u y )
Vt in ( á é ó ú ý ) matched ;
"Delete ending -[i] before comparative -icha"
Vx:0 <=> _ %>: i: c: h: a: ;
where Vx in ( ĩ i í ) ;
"Change i to tonal before atonal suffixes"
i:í <=> _ %>: %{m%}: e: ;
And lets check that all the words analyse correctly:
$ echo "akãvai<n><comp>" | hfst-lookup -qp grn.gen.hfst
akãvai<n><comp> akãvaicha 0,000000
$ echo "ava<n><comp>" | hfst-lookup -qp grn.gen.hfst
ava<n><comp> aváicha 0,000000
Nasalisation is another phonological process that happens in Guaraní.
Compare apykakúera and irũnguéra where -kuéra is a plural suffix. Nasalisation always spreads from the stem to adjacent suffixes and prefixes, thus initial -k- of the suffix changes to digraph -ng-.
As far as -ng- is a digraph we cannot simply change k to ng in one step — recall that twol
describes constraints
over pairs of symbols —, thus we need to define two archiphonemes %{g%}
which will appear as suface k or g
and %{n%}
which will appear as 0
or n on the surface.
Remember that these archiphonemes should appear in the Alphabet
section of the grn.twol
file.
Add new morpheme to the lexicon PL
and don’t forget to define new archiphonemes in Multichar_Symbols
as well as
in the twol
alphabet.
Your new lexicon will look like this:
LEXICON PL
%<pl%>:%>%{n%}%{g%}uéra CASE ;
This should be in your grn.lexc
file.
Now let’s work with the grn.twol
file and introduce a new set of nasals Nas
to the Sets
:
Nas = m n ñ ã ẽ ĩ õ ỹ ũ
M N Ñ Ã Ẽ Ĩ Õ Ỹ Ũ ;
and add phonological rules into the Rules
section:
"Surface [n] in plural nouns"
%{n%}:n <=> Nas: %>: _ ;
"Surface [g] in pl nouns after nasals"
%{g%}:g <=> %{n%}:n _ ;
The first rule says that %{n%}
is n
in the context of any nasal to the left, and the second rule says that %{g%}
must be g
if the preceding %{n%}
is realised as n.
Thus we get:
$ echo "irũ<n><pl><nom>" | hfst-lookup grn.gen.hfst
> irũ<n><pl><nom> irũnguéra 0,000000
$ echo "apyka<n><pl><nom>" | hfst-lookup grn.gen.hfst
> apyka<n><pl><nom> apykakuéra 0,000000
All good… BUT we have a new problem now! Try to analyse the word óga “house”, you will get this:
echo "óga<n><pl><nom>" | hfst-lookup grn.gen.hfst
> óga<n><pl><nom> ógakuéra 0,000000
The problem is a Guaraní word cannot bear two stress accents. As -kuéra is a tonal suffix we should remove acento from the previous accented vowel i.e. ó in order to receive ogakuéra.
Define a new rule that changes tonal vowels to atonal. The rule is a bit more complicated than the previous rules that we had:
"Change tonal vowel to atonal if tonal in affix"
Vt:Va <=> _ [ ? - VowsTon]+ VowsTon: ;
where Vt in ( á é ó ú ý )
Va in ( a e o u y )
matched;
Symbol definitions:
?
: matches any symbol pair in the alphabet;+
: matches one or more occurences.-
: subtraction on sets
A = {a, b, c, d}
and the set B = {a, b}
then A - B = {c, d}
Thus the expression [ ? - VowsTon]+
we have just written means: Match one or more occurences of any symbol pair in the alphabet
except the symbol pairs in the set VowsTon
. If any vowel from VowsTon
set follows this expression Vt
will be changed to Va
.
Now test it!
echo "óga<n><pl><nom>" | hfst-lookup grn.gen.hfst
> óga<n><pl><nom> ogakuéra 0,000000
WIN!