[Egyptian] List of considerations

Mark-Jan Nederhof mn31 at st-andrews.ac.uk
Wed Jul 20 11:21:34 BST 2016


Dear All,

I'm afraid the momentum will be lost if we wait any longer to resume the discussion.
So let me make an inventory of how I see things. This assumes familiarity with
the issues in the last version of the proposal:

https://mjn.host.cs.st-andrews.ac.uk/tmp/unicode.pdf

as well as with the discussions in Cambridge, inside and outside the Fitzwilliam.

One thought is that we should make the encoding as simple as possible, but not
simpler. Another thought is that we need a systematic design, not a bunch of
individual control characters thrown together. This pertains both to
functionality (what can be expressed) and to syntax (how is it expressed). As
for functionality, the primitives should cover a natural range of what
Egyptologists want the encoding to express at a very minimum, recognizing
the need to sacrifice precision for simplicity. As to syntax, we should not lose 
sight of the bigger picture, or we might tie ourselves into a knot with operator 
precedence.

More on the functionality: our team (TLA/Ramses/St Andrews) had relatively few
qualms about simplifying the semantics of the insertions, to allow an inserted group
to spill over outside the bounding box of the 'big' sign. This makes it easier
to use, in particular removing the need for the EMPTY to artificially increase the
size of the bounding box. (It makes the mapping to richer and more precise
encodings outside Unicode more difficult, but that aside.) With the simplification,
we could abandon the four insertions at the four sides, because then these are
basically horizontal and vertical grouping in combination with the "JOIN" from
our original proposal.  However, if we then drop the JOIN, it would be quite
odd to be able to have a primitive for "insert into a corner" but not for the
functionality of "insert into a side". If you look at typical uses of the & in
the past, then you see many can only be expressed as "insert into a side", or
alternatively horizontal/vertical grouping with JOIN.  So JOIN plus the four
insertions into the four corners forms a logical whole. 

In more detail, when we insert G into a corner of S: if G is small, it might fit into
the bounding box of S; if G is big, it may extend outside the bounding box;
the extension would be to the left or right for signs with unit height (as for
most birds), and to the top or bottom for signs that are less high (as for the tongue).

In the case of birds, insertion into the lower-left corner would no longer mean 
strictly in the corner, but normally just above the feet of the bird. Here we 
sacrifice precision for simplicity of use. Note that PLOTTEXT distinguished
insertion-into-lower-left-corner and insertion-above-the-bird's-feet.

An example of insert into a side is Hm-kA, with the Hm club half inside the
pair of raised arms. It would now be encoded as a vertical group of Hm 
and kA, with a JOIN in between.

There is only one case I can think of where insert-at-the-bottom is not very
well expressed in terms of vertical grouping with JOIN, and that is with
two X1 next to one another at the bottom within the bounding box of S22.
One could probably live with an approximation.

More on syntax: Suppose we have * and : and INSERT and perhaps STACK, 
all represented as infix operators, then the question is what
A * B INSERT C : D STACK E 
means. No one has yet provided a reasonable syntax with infix operators but
without brackets that disambiguates in a satisfactory manner. From my
understanding of the well-established academic discipline of the design of 
programming languages, I would assume such a syntax does not exist.
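
To make the size of the problem concrete, here is a minimal sketch in Python
(an illustration only, not part of the proposal). It assumes all four operators
are binary and that no precedence or associativity convention has been agreed,
and it lists every fully bracketed reading of the expression above. The function
name bracketings is hypothetical.

```python
def bracketings(operands, operators):
    """Return every fully bracketed reading of an infix chain.

    Assumption: every operator is binary and there is no precedence
    or associativity rule, so any operator may be the outermost one.
    """
    if len(operands) == 1:
        return [operands[0]]
    readings = []
    for i, op in enumerate(operators):
        # Take operator i as the outermost one and recurse on both sides.
        lefts = bracketings(operands[:i + 1], operators[:i])
        rights = bracketings(operands[i + 1:], operators[i + 1:])
        for left in lefts:
            for right in rights:
                readings.append(f"({left} {op} {right})")
    return readings


operands = ["A", "B", "C", "D", "E"]
operators = ["*", "INSERT", ":", "STACK"]
readings = bracketings(operands, operators)

print(len(readings))  # 14 distinct readings
for reading in readings:
    print(reading)
```

Any precedence scheme has to pick one of these 14 readings for every such
chain, and the other readings can then only be expressed with brackets.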

Personally I welcome the prospect of putting some separating characters between
signs within a horizontal/vertical group, because that simplifies OpenType
substitution rules; it would also bring the syntax of the JOIN in line with the
'normal' way of horizontal/vertical grouping. So, some part of the notation
could well be infix, but not having brackets leads us into the abyss of structural 
ambiguity. 

Mixed systems of operator precedence plus brackets to override operator precedence 
where needed are a bit old-fashioned, and are only helpful if we assume that the
control characters would actually be typed one by one by the user, instead of
having a specialized graphical editor for hieroglyphic text that relieves the user
of having to worry about syntax.

To recapitulate what I wrote earlier, my proposal for representing
horizontal grouping would be exemplified by:
OPEN_HOR arg1 NORMAL_SEP arg2 JOIN_SEP arg3 CLOSE
where I combine a normal separator with the joining (fitting) one.
Here arg1, arg2 and arg3 could be single signs or vertical groups or insertions, etc.
An example of vertical grouping could be:
OPEN_VERT arg1 NORMAL_SEP arg2 CLOSE
Insertion could be:
OPEN_INSERT arg1 TOP_LEFT big_sign TOP_RIGHT arg2 BOTTOM_RIGHT arg3 CLOSE
which would mean insert arg1 into the top left corner of the big sign, insert
arg2 into the top right corner and insert arg3 into the bottom right corner.
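
The bracketed notation above nests without any precedence rules, which a small
recursive parser can show. The following Python sketch is an illustration under
stated assumptions, not a definitive grammar: control characters and signs are
modelled as plain strings, BOTTOM_LEFT is assumed by analogy with the three
corner markers named above, and left-corner markers are assumed to follow the
inserted group while right-corner markers precede it, as in the OPEN_INSERT
example. All function names are hypothetical.

```python
GROUP_OPENERS = {"OPEN_HOR": "hor", "OPEN_VERT": "vert"}
SEPARATORS = {"NORMAL_SEP", "JOIN_SEP"}
LEFT_CORNERS = {"TOP_LEFT", "BOTTOM_LEFT"}    # BOTTOM_LEFT is an assumed name
RIGHT_CORNERS = {"TOP_RIGHT", "BOTTOM_RIGHT"}
CONTROLS = SEPARATORS | LEFT_CORNERS | RIGHT_CORNERS | {"CLOSE"}


def parse(tokens):
    """Parse a complete token list into one nested structure."""
    tree, pos = parse_element(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree


def parse_element(tokens, pos):
    """Parse a single sign or one bracketed construct starting at pos."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    if tok in GROUP_OPENERS:
        return parse_group(tokens, pos + 1, GROUP_OPENERS[tok])
    if tok == "OPEN_INSERT":
        return parse_insert(tokens, pos + 1)
    if tok in CONTROLS:
        raise ValueError(f"unexpected {tok} at position {pos}")
    return tok, pos + 1  # anything else is treated as a sign


def expect_close(tokens, pos):
    """Check for CLOSE at pos and return the position after it."""
    if pos >= len(tokens) or tokens[pos] != "CLOSE":
        raise ValueError(f"expected CLOSE at position {pos}")
    return pos + 1


def parse_group(tokens, pos, kind):
    """Parse: arg (SEP arg)* CLOSE, keeping each separator in the result."""
    first, pos = parse_element(tokens, pos)
    parts = [first]
    while pos < len(tokens) and tokens[pos] in SEPARATORS:
        sep = tokens[pos]
        arg, pos = parse_element(tokens, pos + 1)
        parts += [sep, arg]
    return (kind, *parts), expect_close(tokens, pos)


def parse_insert(tokens, pos):
    """Parse: (arg LEFT_CORNER)* big_sign (RIGHT_CORNER arg)* CLOSE."""
    inserted = {}
    item, pos = parse_element(tokens, pos)
    while pos < len(tokens) and tokens[pos] in LEFT_CORNERS:
        inserted[tokens[pos]] = item  # item goes into this left corner
        item, pos = parse_element(tokens, pos + 1)
    big_sign = item
    while pos < len(tokens) and tokens[pos] in RIGHT_CORNERS:
        corner = tokens[pos]
        inserted[corner], pos = parse_element(tokens, pos + 1)
    return ("insert", big_sign, inserted), expect_close(tokens, pos)


nested = parse(
    "OPEN_HOR A NORMAL_SEP OPEN_VERT B NORMAL_SEP C CLOSE JOIN_SEP D CLOSE".split()
)
insertion = parse(
    "OPEN_INSERT G1 TOP_LEFT S TOP_RIGHT G2 BOTTOM_RIGHT G3 CLOSE".split()
)
print(nested)
print(insertion)
```

Because every construct has an explicit opener and a CLOSE, each token sequence
has at most one reading, and the parser never needs to compare operator
precedences.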

Note: if we insert G into S, then S is usually a single sign, but not always. Consider
for example a superimposition (stacking) of P6 and D36, with N5 inserted into
the lower-left corner and Z1 inserted into the lower-right corner. So 'big_sign' above
could be a group as well. If for now we want to assume it is always a single sign,
that is fine, and we can drop the restriction some time in the future, when 
font technology has evolved. This should be a guiding principle in general:
We can put restrictions anywhere we want, motivated by limitations on today's
font technology, as long as it doesn't cause major problems 10 or 20 years 
down the line. Future generations will be grateful if we dare think a little
ahead.

Other issues:

* Richard has written quite a few interesting things about horizontal
'writing-mode' vs vertical 'writing-mode' in other scripts. I don't think the
matter has been exhaustively discussed for hieroglyphs. More about this later.

* Cartouches (enclosures/boxes): we need to have a suggestion that fits into the
design.  It is fine to postpone a formal proposal for cartouches, but again, we
need design, not a bunch of loose characters thrown together. My proposal for
syntax would be exemplified by:
OPEN_CARTOUCHE first_group NORMAL_SEP second_group JOIN_SEP third_group CLOSE 
which would fit in with the proposed syntax of the other elements. 
The problem with reinterpreting the enclosure characters among
the existing 1071 Unicode signs is that the isolated hieratic open-cartouche
and close-cartouche are then missing. This would be a big problem. So, if we
reinterpret pairs of the existing characters to produce full-form enclosures, 
we need to at least add two more characters for transcription of hieratic.

* The EMPTY glyph: With the new semantics of the four insert-into-a-corner
primitives, the EMPTY is less urgent, but it is still very useful. Can we
take an existing EMPTY character from Unicode? It would be nice to pick a
specific one. This means one fewer character needs to be proposed, but it would
be good to mention in the proposal that using an EMPTY in place of
a hieroglyph is legitimate.

* Stacking. Almost all the Egyptologists wanted to have a stacking primitive,
while some UTC members were objecting on technical grounds, stacking being
difficult to implement dynamically. I think the discussion at the meeting
stalled because it was comparing apples and oranges (dynamic versus
precomposed), and opposition to stacking was based on the wrong
arguments. More about this later.

* Insert-into. This requires further investigation. Some observations:
- One (very) convincing example is wabt, with the leg-and-jug-of-flowing-water with
the feminine ending inside. 
- If D031 didn't already exist, how would one encode it? I think D032 with the Hm sign 
in a vertical group with JOIN is possible.
- N018A and N018B I would be tempted to see as atomic signs, unless
there are many similar combinations of N018 (or X004B) with other flat signs.
- To me O010C seems definitely D002 inserted into O018.
- This raises the question whether the notation of a box/enclosure for the
Hwt sign with something inside is appropriate, or whether 'insert-into' is
preferable. For a cartouche/serekh/castle-walls, the text inside can be quite long. 
Does that apply to Hwt as well? If the length of the text inside Hwt is 
always quite limited, it doesn't seem to be of the same nature as a cartouche.

I'm revising the proposal to match the above. Feedback sooner rather than later
would be helpful.

Best regards,
 Mark-Jan




