[Egyptian] Stacking and syntax

Wed Jul 20 11:46:04 BST 2016

Dear all, 

More considerations about stacking and syntax (apologies to
Michel and Michael for overlap with another message I sent last week): 

(1) My understanding is that we need to work under the assumption
that the control characters do nothing complicated by themselves,
but substitution rules are used to map sequences of signs plus
control characters to precomposed groups, which are stored
as separate glyphs in the font. Is this correct?
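
To make the assumption in (1) concrete, here is a minimal sketch in plain Python of what such a substitution amounts to: a lookup from a sequence of signs and control characters to the name of one precomposed glyph, trying the longest sequence first. The sign names, the control name VERT, and the glyph names are all hypothetical, and this only imitates the idea of a ligature-style substitution; it is not how any particular font or shaping engine implements it.

```python
# Hypothetical rules: sequences of signs and control characters mapped to
# the name of a precomposed glyph. None of these names come from a real font.
RULES = {
    ("A", "VERT", "B"): "A_over_B",
    ("A", "VERT", "B", "VERT", "C"): "A_over_B_over_C",
}


def substitute(seq, rules):
    """Replace each longest matching run in seq by its precomposed glyph."""
    max_len = max((len(key) for key in rules), default=0)
    out = []
    i = 0
    while i < len(seq):
        # Try the longest candidate first, down to sequences of length 2.
        for n in range(min(max_len, len(seq) - i), 1, -1):
            key = tuple(seq[i:i + n])
            if key in rules:
                out.append(rules[key])
                i += n
                break
        else:
            # No rule applies here: the sign is passed through unchanged.
            out.append(seq[i])
            i += 1
    return out
```

With these rules, the control characters themselves carry no layout logic: a sequence either matches a stored group or is left as it is.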

If the above is correct, then there are some follow-up questions.

(2) Should we not worry about the 64K limit on the number of glyphs in OpenType?
It would be interesting to know how many different groups (not just ligatures) there 
are in Ramses.

(3) I wonder whether in the discussion about stacking (superimposed signs,
monograms) we were comparing apples and oranges. I proposed the stacking
operation could be done dynamically, which is consistent with experiments I've
done with OpenType, and which would require linearly many anchor points to be
stored in the font, if we restrict our attention to pairs of signs being
stacked (not three or more signs). In the easiest case each anchor point could
be the center point of the bounding box, and determining these could even be
automated, say by a Python script in FontForge. Michael says that this is not how
font designers like to do things, and they would still like to store
precomposed glyphs. Okay, let's take this as given. But then what is the
objection against stacking? That there would be too many precomposed glyph
combinations? 
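
The "easiest case" mentioned above, with one anchor per sign at the center of its bounding box, can be sketched as follows. This is plain Python on bounding boxes given as (xmin, ymin, xmax, ymax) tuples; in a FontForge script such a tuple would presumably be read from each glyph, but that step is left out here, and the function names are made up for illustration.

```python
def bbox_center(bbox):
    """Center point of a bounding box given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)


def stacking_offset(base_bbox, overlay_bbox):
    """Shift to apply to the overlay sign so that its center anchor
    coincides with the center anchor of the base sign."""
    base_x, base_y = bbox_center(base_bbox)
    over_x, over_y = bbox_center(overlay_bbox)
    return (base_x - over_x, base_y - over_y)
```

Only one anchor per sign is stored, so the number of anchors grows linearly with the number of signs, while any pair of signs can be stacked.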

If we assume that all stacked combinations are stored as glyphs in the font,
we would have N * N such glyphs. But how about normal groups with pure
horizontal and vertical grouping? If you similarly want to store these as
precomposed glyphs, and if we assume that such groups can have up to 4 glyphs,
you already need N * N * N * N combinations, which dwarfs the costs of
implementing stacking.
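
The arithmetic can be checked with a few lines of Python. The sketch assumes that the 64K limit means at most 65535 glyphs in one font, and the function names are only for illustration.

```python
GLYPH_LIMIT = 65535  # assumed reading of the "64K limit" on glyphs per font


def stacked_pairs(n):
    """Precomposed glyphs needed if every pair of n signs can be stacked."""
    return n * n


def groups_of_four(n):
    """Precomposed glyphs needed for every group of exactly four signs."""
    return n ** 4


def max_signs_within_limit(k, limit=GLYPH_LIMIT):
    """Largest n such that n**k still fits within the glyph limit."""
    n = 0
    while (n + 1) ** k <= limit:
        n += 1
    return n
```

For an arbitrary illustrative value of N = 1000, stacked_pairs gives 1,000,000 and groups_of_four gives 10^12. Under the assumed limit, exhaustive stacked pairs fit only for N up to 255, and exhaustive groups of four only for N up to 15.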

If we do _not_ assume all horizontal/vertical groups are precomposed, but only
the ones we have found in some corpus (the 'fallback assumption'),
then obviously we have far fewer glyphs to store than N * N * N * N.
But then why is it not acceptable to precompose only those stacked pairs that
are known from a corpus?

So if we compare apples and apples, so to speak, both normal horizontal/vertical
groups and stackings require excessive storage space. If we compare oranges
and oranges, both normal horizontal/vertical groups and stackings are feasible,
and stackings more so than horizontal/vertical groups. Do I see this wrong?

(4) Coming back to syntax in more detail than in my previous message. The
Richmond & Glass proposal had three characters, with & (ligatures) having the tightest
binding, then * (horizontal grouping), then : (vertical grouping). But we need (very
limited) recursion of horizontal/vertical grouping, say up to three levels;
two is surely too few to handle perfectly ordinary Middle Egyptian
horizontal texts. And we need finer control characters, such as the INSERT.
This implies we quickly need many levels of operator precedence.

For example: 
"A INSERT_TOP_RIGHT B VERTICALGROUPING C" 
could mean:
" ( A INSERT_TOP_RIGHT B ) VERTICALGROUPING C " 
or
" A INSERT_TOP_RIGHT ( B VERTICALGROUPING C ) "

Both interpretations correspond to attested groups. If we try to disambiguate by
operator precedence, then either we need several copies of each control
character that differ in tightness of binding, leading to horrible complexity,
or we need brackets. In order to avoid the problems with operator precedence,
I chose an entirely different syntax for the draft proposal, under the motto:
if we need brackets, we might as well use them consistently in combination
with prefix operators, and get rid of infix operators altogether. A formal
grammar is in the draft proposal in the appendix, and you can see it is very 
simple and uniform.
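
To illustrate why brackets combined with prefix operators remove the ambiguity, here is a minimal sketch of a parser for a made-up notation of the form "OP ( arg arg ... )" over whitespace-separated tokens. This is NOT the grammar of the draft proposal, only a hypothetical stand-in with the same flavour; the operator names are taken from the example above.

```python
def parse(tokens):
    """Parse a prefix expression: either a sign, or OP ( expr expr ... )."""
    tree, rest = _parse_expr(list(tokens))
    if rest:
        raise ValueError(f"unexpected trailing tokens: {rest}")
    return tree


def _parse_expr(tokens):
    if not tokens or tokens[0] in ("(", ")"):
        raise ValueError("expected a sign or an operator")
    head, rest = tokens[0], tokens[1:]
    if not rest or rest[0] != "(":
        return head, rest              # a plain sign
    rest = rest[1:]                    # consume "("
    args = []
    while rest and rest[0] != ")":
        arg, rest = _parse_expr(rest)
        args.append(arg)
    if not rest:
        raise ValueError("missing closing bracket")
    return (head, *args), rest[1:]     # consume ")"
```

In this notation the two readings of the example are written differently from the start, so no precedence table is needed: "VERTICALGROUPING ( INSERT_TOP_RIGHT ( A B ) C )" parses to a vertical group whose first member is the insertion, while "INSERT_TOP_RIGHT ( A VERTICALGROUPING ( B C ) )" parses to an insertion whose second member is the vertical group.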

Now, I do understand that the closer we stay to traditions of "ordinary"
writing systems, the better it is for adoption of our encoding. If including
infix operators is the only way to make the encoding palatable to font
designers, so be it. But can I just ask: If it is the case that
sequences of hieroglyphs and control characters are replaced by precomposed
single glyphs (question (1) above), then would the choice between
prefix or infix or postfix (reverse Polish notation) make any real
difference for the actual realization in terms of font feature files?

Regards,
 Mark-Jan