[Egyptian] Brackets in the TLA encoding

Sat Jul 23 17:10:00 BST 2016

Hi Mark-Jan

A mathematician or physicist will also tell you not to use Einstein's equations when Newton's suffice, especially while cycling down a hill at speed, however well versed you are in tensor calculus.

On a more serious note, have you organized the data from Stéphane to map against your model with 4 corners yet? There is no point in us both doing the same thing.

Do you have a list of examples that you think need large levels of nesting in your model?

Have you any comment on group joiners?

You mention below "Using infix operators is really only justifiable if notation is meant for human consumption." Quite. Human and machine consumption is exactly what we are designing for. Text is about people not parsers.

Try opening my font description document in Word and inserting or deleting characters in hieroglyphic strings, and watch the control codes come and go. Think about what end controls in your model imply.

Bob

-----Original Message-----
From: Egyptian [mailto:egyptian-bounces at evertype.com] On Behalf Of Mark-Jan Nederhof
Sent: 23 July 2016 13:47
To: egyptian at evertype.com
Subject: Re: [Egyptian] Brackets in the TLA encoding

Hi Simon, Hi Stéphane, Hi All,

This is very helpful.

Physicists tell us that if you want to gather and use data, you need hypotheses first, or else you don't know what to look for. For me the relevant hypotheses are:
(1) The primitives we have in our current document allow description of most of the groups in an accurate enough way. Both 'most of' and 'accurate enough' are subjective of course. There is no escaping that.
(2) It would be quite difficult to reduce the expressive power before we would lose coverage. There is an implicit parameter, which is a limit on the depth of nesting, which I assume is 3. As Simon also confirmed once more, 2 is not enough, even for the most basic, run-of-the-mill classical (horizontal) inscriptions.
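
As a rough illustration of the nesting depth mentioned in (2), the sketch below parses bracketed infix encodings of the kind quoted further down and counts levels of alternating vertical and horizontal groups. Several things here are assumptions rather than part of the proposal: that "*" binds more tightly than ":" (consistent with Hiero1:(Hiero2*Hiero3) being equivalent to Hiero1:Hiero2*Hiero3), that anything other than ":", "*" and brackets is a single sign (so "Aa1&D58" is treated as one unit), and that "depth" is counted as defined in depth() below. The function names are hypothetical.

```python
import re

# Operators and brackets are single-character tokens; anything else
# (including "Aa1&D58" or a quoted "?") is treated as one sign.
# This is a simplification, not the full encoding syntax.
TOKEN = re.compile(r'[:*()]|[^:*()\s]+')

def group(kind, parts):
    """Build a group node, merging children that have the same kind."""
    if len(parts) == 1:
        return parts[0]
    flat = []
    for part in parts:
        if isinstance(part, tuple) and part[0] == kind:
            flat.extend(part[1])
        else:
            flat.append(part)
    return (kind, flat)

def parse(encoding):
    """Parse an infix encoding into nested ('v' | 'h', [children]) tuples."""
    tokens = TOKEN.findall(encoding)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of encoding")
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok == '(':
            node = vertical()
            if take() != ')':
                raise ValueError("expected ')'")
            return node
        if tok in (':', '*', ')'):
            raise ValueError("unexpected " + tok)
        return tok

    def horizontal():
        # Assumed precedence: '*' binds more tightly than ':'.
        parts = [atom()]
        while peek() == '*':
            take()
            parts.append(atom())
        return group('h', parts)

    def vertical():
        parts = [horizontal()]
        while peek() == ':':
            take()
            parts.append(horizontal())
        return group('v', parts)

    tree = vertical()
    if peek() is not None:
        raise ValueError("unexpected " + peek())
    return tree

def depth(node):
    """Number of alternating group levels; a bare sign has depth 0."""
    if isinstance(node, str):
        return 0
    return 1 + max(depth(part) for part in node[1])

print(depth(parse('D21:V28*(X1:B1)')))   # 3
print(depth(parse('M17*(Q3:N35)')))      # 2
```

Under these assumptions, a vertical group inside a horizontal group inside a vertical group comes out at depth 3, which is why a limit of 2 would not cover such a group.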

As to (1), we have moved away quite considerably from descriptive power that is machine-interpretable. This was motivated by people finding the original encoding too complicated, and arguing that fonts would do a lot of fine-tuning anyway for particular choices of signs. Also, we don't really care about a sign being printed 0.5 mm too much to the left or to the right, as long as the user gets a rough idea of what the text looks like.

These arguments all sound reasonable, but realise two things:
* If even stupid machines don't know how to render an encoding roughly as it was intended, perhaps there is not enough information present for humans to know what was meant either.
* As stressed once more by Stéphane, the kinds of groups we are talking about are productive. We don't want to be manually fine-tuning the appearance of an unbounded number of groups, so some approximately correct automatic rendering would be quite useful.

I think we are still okay with the present version of the proposal, but we have moved a long way from existing routines that do the rendering in a deterministic, predictable manner, to needing lots more refinements to program code and the result being not quite well-defined.

As to both (1) and (2), the provided examples include quite a few cases of insertions and stacking, insertions into stacked groups, and even groups with insertions that are themselves inserted. So far I don't see either hypothesis refuted. 

I had to struggle quite a bit to get rid of prefix operators. As anyone with the slightest knowledge of formal languages knows, prefix or suffix operators are ideal for automatic processing, because the problem of ambiguity simply does not exist. By contrast, endless volumes of textbooks since the late 1950s have been written about the ambiguities caused by infix operators and how to solve them using principled or not so principled methods involving operator precedence and low-level hacks in shift-reduce parsers. Using infix operators is really only justifiable if notation is meant for human consumption. That is why I was very surprised to hear objections with the argument that font technology is too primitive to handle prefix operators. If anything, I would have imagined that primitive tools would have a lot of difficulty with parsing in the presence of operator precedence and such.
I implemented OpenType substitution rules that analysed bracket structures and prefix operators myself, and that works fine. It would be a nightmare for me to have to implement OpenType substitution rules in the presence of operator precedence. There may be something in the arguments people use that I don't understand. 
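
The difference between the two styles can be sketched in a few lines. The prefix notation used here (a binary operator followed by its two operands) is a hypothetical stand-in, not the notation of the original encoding, and the function names and precedence tables are likewise only for illustration. The point is that the prefix parser needs no precedence information at all, whereas the same bracket-free infix string yields different trees depending on which precedence table is assumed.

```python
OPERATORS = (':', '*')

def parse_prefix(tokens):
    """Parse hypothetical prefix notation: an operator, then two operands.

    No precedence table is needed; every token sequence has at most
    one reading.
    """
    it = iter(tokens)

    def expr():
        tok = next(it, None)
        if tok is None:
            raise ValueError("missing operand")
        if tok in OPERATORS:
            left = expr()
            right = expr()
            return (tok, left, right)
        return tok

    tree = expr()
    if next(it, None) is not None:
        raise ValueError("trailing tokens")
    return tree

def parse_infix(tokens, precedence):
    """Parse bracket-free infix notation by precedence climbing.

    The result depends on the precedence table passed in.
    """
    pos = 0

    def expr(min_prec):
        nonlocal pos
        left = tokens[pos]
        pos += 1
        while pos < len(tokens) and precedence[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            right = expr(precedence[op] + 1)
            left = (op, left, right)
        return left

    return expr(0)

infix = 'A : B * C'.split()
print(parse_infix(infix, {':': 1, '*': 2}))  # (':', 'A', ('*', 'B', 'C'))
print(parse_infix(infix, {':': 2, '*': 1}))  # ('*', (':', 'A', 'B'), 'C')

# In prefix form the two readings are simply two different strings.
print(parse_prefix(': A * B C'.split()))     # (':', 'A', ('*', 'B', 'C'))
print(parse_prefix('* : A B C'.split()))     # ('*', (':', 'A', 'B'), 'C')
```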

Anyway, one thing to look out for (I say this in particular to Simon, Stéphane and Serge, with whom this was discussed in detail in Cambridge) is that, in the process of getting rid of prefix operators and avoiding ambiguity, the following coverage was lost: it is no longer possible to insert A into the top-left corner of B and then insert the resulting group into the bottom-left corner of C. The same holds for the right corners. I have yet to see a group where this matters. It is still possible, however, to insert A into the bottom-left corner of B and to insert the resulting group into the bottom-left corner of C. The same holds for two upper-left corners and the corresponding right corners. There are groups of these forms among the provided examples.

The problem of course is that inability to find certain structures in the corpora we happen to have at this very moment does not prove their non-existence. At best it means our encoding won't be too much lacking in terms of coverage.

Best regards,
Mark-Jan

On Thursday 21 Jul 2016 15:24:51 Simon Schweitzer wrote:
> Hi all,
> 
> @Stéphane: thank you for your .gly-files! In this mail, I want to add 
> some remarks concerning the subgroup topic.
> 
> As in Ramses, there are many encodings with "(" and ")" in the TLA. I 
> collected these encodings and want to present my evaluation:
> * In some cases, the encoding is invalid, e.g. (F12-S29):D21, which 
> should be understood as F12*S29:D21.
> * Sometimes the brackets in the encoding are superfluous. There are
> many cases of Hiero1:(Hiero2*Hiero3). The brackets are not necessary;
> use Hiero1:Hiero2*Hiero3 instead!
> * But in many cases, the parsing without the brackets would be misleading:
> 1) There are many vertical groups in horizontal groups in vertical 
> groups. I list only 10 examples:
> N35:"⸮"*"?"*(W22:Z2) (Rezepte Papyrus Zagreb E-597-3, l. 1;
> ID:ABLN5PNQ2BBENE7LWO72KDRPPU)
> Aa1&D58:(X1:N35)*N25 (Stele Louvre C 284 ("Bentresch-Stele"), l. 22;
> ID:4VLZLA44UVGJZN22WIWP774LOQ)
> Aa13:S40*(X1:O49) (Stele Louvre C 284 ("Bentresch-Stele"), l. 21;
> ID:4VLZLA44UVGJZN22WIWP774LOQ)
> Aa15:W19*(X1:X1) (Harfnerlieder Text C, l. 2; 
> ID:H6Z5TORPQFFZXOU6CJODODZHYQ)
> D21:V28*(X1:B1) (〈Stele des Montuhotep (Cambridge E.9.1922)〉, l. C.3;
> ID:LQ7QIJTK7NFWTIFBS7R6AIYWGY)
> D21:V7*(W24:X1) (〈Stele des Montuhotep (Kairo CG 20539)〉, l. I.b.18;
> ID:ZOLMMIAB2NHV7PSOSOOLAHN64U)
> D35A:(X1:Z4A)*G37 (〈Stele des Antef (Louvre C 167 = E 3111)〉, l. C.1;
> ID:DWHZIO5ZCFBURLZ6G4T26YIP7U)
> D36:D21:N29*(X1:"⸮"*Z4*"?") (〈Stele des Antef (Glasgow D1922.13)〉, l. 
> A.3; ID:OIYODBZ74RHM7OPTR72OLCMJ3A)
> I10&I9:X2*(X4:Z2) (〈Stele des Antef (Glasgow D1922.13)〉, l. A.7;
> ID:OIYODBZ74RHM7OPTR72OLCMJ3A)
> K4:G1*(Z7:X1) (Sinuhe AOS, vs. 18; ID:RP2F6BGNDBAARDBNDHGFSFPIEM)
> As you can see, this kind of grouping occurs in hieroglyphic and in
> hieratic texts, and this feature is also attested in the "classical"
> period from the Middle Kingdom (the examples from the stela of
> Montuhotep and Antef).
> 2) Horizontal grouping of vertical groups in columns. If the text is
> written vertically, there are cases of horizontal groups of vertical
> groups, e.g. in the Buch von der Himmelskuh
> (ID:WHEOIX5P5ZAVVFU4BGO6OWASV4): M17*(Q3:N35), (X1:Z1)*I12,
> M17*S29*(A2:Z2) and so on.
> 
> Best regards,
> 
> Simon
> 
> _______________________________________________
> Egyptian mailing list
> Egyptian at evertype.com
> http://evertype.com/mailman/listinfo/egyptian_evertype.com
> 
