[Egyptian] On "A system of control characters for Ancient Egyptian hieroglyphic text" (2016-07-23)

Tue Jul 26 12:02:54 BST 2016

Hello everyone,

If I may jump in with a small consideration about the evidence. Bear in
mind that, as I said, I didn't know any of you before the meeting in
Cambridge, so you may well have discussed this already in other contexts.
If so, just ignore my comment; but in that case it would be useful to make
the material available online somewhere (a website? a shared drive?) or to
share a link if the data are already online.

I think (but again, it is just my feeling, offered as a possible way to
resolve the dispute, not as "truth") that part of the problem could be
that the kind of "evidence" you IT people and you Egyptologists are talking
about is conceptually slightly different. In particular, it seems to me that
Egyptologists here tend to think in a qualitative way, while IT people tend
to think in a quantitative way, or at least need quantitative data.

For instance, the IT people want evidence for the need for five levels of
recursion in Egyptian groups (just a random example).
I am sure that Egyptologists will be able to find an example of an actual
text with an actual group with five levels of recursion. And I have the
feeling that for them this will be enough as "evidence".

From an IT point of view (and a Unicode point of view), however, this is
not enough. The point is not only whether some feature is attested, but
also how frequent it is.

Five-level recursion is attested in Egyptian writing, good. But how often
does it occur in the whole corpus of Egyptian texts? Is it something that
happens every three words? Is it something that happens only in specific
texts of specific periods (e.g. 5% of the total), where it is very
frequent? Or is it something attested only a few times among all the
Egyptian texts (from the Old Kingdom to the Roman period)?

Because if something is very frequent, then it makes sense to consider
encoding it in Unicode. If, however, something is quite rare, then other
solutions (e.g. a higher-level protocol, HLP) could be more efficient ways
to deal with it.

Giving evidence that something is attested, without giving quantitative
data about its frequency/importance/relevance is probably not enough.
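
One way to make "how frequent" concrete is a simple tally. The sketch
below is only an illustration: it assumes a toy notation in which nesting
is written with parentheses (this is not MdC, and not any of the proposals
under discussion), the function names are made up, and the sample list is
invented, not corpus data.

```python
from collections import Counter

def nesting_depth(group: str) -> int:
    """Maximum parenthesis depth of one group in the toy notation."""
    depth = max_depth = 0
    for ch in group:
        if ch == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
    return max_depth

def depth_distribution(groups):
    """Share of groups at each nesting depth, keyed by depth."""
    counts = Counter(nesting_depth(g) for g in groups)
    total = sum(counts.values())
    return {d: counts[d] / total for d in sorted(counts)}

# Invented sample, for illustration only (not real corpus data).
sample = ["A1", "X1:R1", "(A:B)*C", "((A:B)*C):D", "A1"]

for depth, share in depth_distribution(sample).items():
    print(f"depth {depth}: {share:.0%} of groups")
```

With real data, the input would be every group in a tagged corpus, ideally
split by period and text type, so that "5% of the texts" and "every three
words" become numbers that can be compared.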

And note that this is not a problem only for the encoding of Egyptian: it
is a very common problem that also exists for the encoding of major living
languages and scripts used by hundreds of millions of people.
Just a very simple example (it is about ligatures, but it would be the
same with control characters): the Devanagari script, the writing system
of Hindi, Sanskrit and other Indian languages, is essentially syllabic
(with a consonant(s) + vowel structure, so only open syllables), and uses
ligatures to represent syllables with consonant clusters.
In other words: in Devanagari, "ma", "ta", "ka" are coded as single
characters, while syllables like "tma" or "kta" are rendered with
ligatures that look like single characters. Now, Indian languages have
hundreds of possible consonant clusters, with 2, 3, 4 (and potentially
more) consonants clustered together.
In addition, Devanagari was not only used for centuries for Sanskrit,
Prakrit and other ancient languages; it is still used today for modern,
living languages. These may need, for instance, to transcribe words from
other languages with even more complex consonant clusters that are not
attested in Indian languages: if you want to transcribe the Icelandic word
"islansklukkur" in Devanagari, nsklu does not sound like a cluster that
could be "native" in Indian languages. Or consider that in Nepal
Devanagari is used to write dozens of different languages, many of which
are not even Indo-European and therefore have very specific phonologies
and clusters.
The number of possible ligatures that are attested, and for which one
could indeed supply some form of evidence, would therefore be huge.
Encoding all of these combinations just because they are attested (or
could be needed) would not be very practical, and in fact it is not what
happens in "Unicode reality".
What happens is that only the most common and most frequent ligatures are
specifically dealt with, typically as dedicated glyphs in fonts. For all
the other, less common consonant clusters, other strategies are adopted.
For instance, if you want to display a complex/rare ligature that is not
part of the standard set, you just use a diacritic (the virama) which
indicates that the sign bearing it has to be read and understood as
ligated with the following one, although in practice the two are not
displayed as such.
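
A minimal sketch of that fallback in Python, using only the standard
unicodedata module. In Unicode text a Devanagari cluster is stored as
consonant letters joined by the virama (U+094D), and it is left to the
font whether a ligature or a visible virama is drawn. The helper name
cluster is invented for this example.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

def cluster(consonants):
    """Join consonant letters with the virama (hypothetical helper)."""
    return VIRAMA.join(consonants)

KA, TA, MA = "\u0915", "\u0924", "\u092E"

kta = cluster([KA, TA])  # stored as three code points: KA, VIRAMA, TA
tma = cluster([TA, MA])  # stored as three code points: TA, VIRAMA, MA

# List the code points that make up the "kta" cluster.
for ch in kta:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The same three-code-point sequence is used whether or not a given font
has a ligature for the cluster, which is why rare clusters need no
encoding work of their own.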

And again, we are talking about Devanagari, which is a purely phonetic
writing system used by hundreds of millions of people.

So again, to go back to Egyptian: showing that a given graphic
phenomenon/combination/etc. is attested is not enough to claim that a given
feature is needed and needs to be encoded (now, or perhaps in the future).
Its frequency and relative importance also have to be shown.

Now again, I repeat: I don't know if this has already been done. If so,
ignore my comments above, but at the same time it would be interesting and
useful to have access to such data.
So far, however, what I have seen are just "samples" of specific cases
showing that a given feature is attested, but no data about how common and
frequent such specific cases are.
So for instance (and believe me, Nigel :-p, it is not to reopen the
discussion), take the case of tw/wt. OK, it is a good example of an
ambiguity that can be interpreted as deriving at least in part from the
spatial distribution of signs. But this is one case. How many other cases
of similar ambiguities involving other signs are there, and how frequent
and common are they in the whole corpus of surviving Egyptian texts?
And it is a serious question, because I haven't worked for 15 years on
Egyptian texts, but still I have been dealing with them for more than 10
years now -geez..- and right now I can't think of any other really common
and really frequent similar case (in my period of competence at least). I
can think of exceptional examples perhaps, but nothing common and
systematic.

And this is important, because if this is a frequent feature recurring
often and with many different characters, then it would make sense to deal
with it at the Unicode level, with control characters or whatever.

If instead it is something that essentially happens only with the signs t
and w, and perhaps in a handful of other random cases across the whole of
Egyptian literary history, then there could be more
practical/reasonable/efficient (choose the word you prefer) solutions, from
an IT/Unicode perspective, to deal with them (such as, for instance,
introducing a t&w character, as they did in the basic Unicode set).

So again, evidence that a given feature is attested is good, but it would
be better to also have information about its frequency, relevance, etc.
And if these data are already available somewhere, then it could be useful
to share them (again?) here.

Best :-)

Marwan

On Tue, Jul 26, 2016 at 11:51 AM, Stéphane polis <s.polis at ulg.ac.be> wrote:

>
> Sorry Stéphane, you get the brunt; price you pay for being the first :-)
>
>
> Hi Bob,
>
> There is nothing to be sorry about: a discussion is a discussion and you
> can have your opinion, based on your experience and I respect it (same with
> Marwan: as he said, there is nothing personal here, and I definitely
> agree).
>
> Now, if one wants to reach a consensus at some point, as Nigel was
> pleading for, it would be great not to distort what I said in order to
> perpetuate this ‘unbalanced relationship’ that leads to nothing (if I am to
> trust what you say as IT and font specialist, why would you systematically
> discard the evidence provided and the argument developed as regards the
> basic capabilities that would be important for us, i.e. the egyptologists
> working on electronic corpora and developing the textual resources for the
> field, to have. Respecting you as a specialist of your field, when the
> reverse does not seem to hold, is not always an easy task I confess and
> requires some sangfroid.)
>
> Congratulations on being the first to (wholeheartedly) express public
> support for a representation of MdC X1:R1 that uses *3 control characters*.
>
> Certainly Unicode could handle it but I’m a little sceptical most
> Egyptologists could. I hope we’ll have this topic cleared up in the next
> few days otherwise I think we are going to have to draw more Egyptologists
> into the discussion.
>
>
> Why do you feel the need to distort my words (if not to antagonize the
> positions)? I said that the syntax (number of control characters, etc.) had
> no importance for me whatsoever, leaving this up to specialists; what
> matters are the principles, and I quote ‘what you can effectively achieve’
> with these control characters. At the moment, these basic needs — and
> significant compromises and simplifications have been made, don’t you
> agree? — are only covered by Mark-Jan’s proposal. That’s it, nothing less,
> nothing more. If you have a better ‘technical’ solution for these
> capabilities, please go for it and submit them to the group, the way it’s
> done doesn’t matter to me at all!
>
> Incidentally have you ever attempted to edit text written in a Complex
> Script using a word processor? If you haven’t I suggest you do then perhaps
> think again about what you are putting your name to.
>
>
> No (depending on your definition of complex scripts obviously), and that’s
> why I trust people like you, So, Marwan and others to find an acceptable
> solution for editing. I fully accept that you’re the specialists and
> that it is not a simple issue. But we’re in the 21st century and you’re
> imaginative IT guys: I’m sure that the basic arrangement of signs that
> Egyptians could handle is not beyond your reach, even with Unicode! (Note,
> as I said several times: ancient Egyptian is not English, 5,000,000 words
> max., including Demotic, and very soon most of the texts will be available
> freely in MdC, so the encoding speed/efficiency that you repeatedly mention
> is almost of no concern for long texts that could be converted; and for
> short quotations, words, etc., I’m pretty confident that between 30 sec. a
> sentence with limited capabilities and 2 min. a sentence for a more
> adequate encoding, most Egyptologists would choose the second [again, I’m
> not representative of anyone but myself, that’s a general and subjective
> feeling; you can ask for a consultation using the IAE mailing list I
> suppose, but be ready for some actual craziness ;)].)
>
> Now, let me return the argument: have you spent the last 15 years reading,
> teaching, publishing, encoding and annotating all sorts of hieratic and
> hieroglyphic texts from OK inscriptions down to Late Period rituals? If you
> haven’t I suggest you do and perhaps think again about what kind of
> encoding scheme you’re developing. That’s completely caricatural and
> ridiculous, isn’t it? You certainly know what John Locke said about this
> kind of argument from authority, and I would never use them as you do in
> any sort of debate. Up until now, egyptological evidence was systematically
> provided by us for any kind of principle advocated for. There might be
> other ways to deal with these cases than with control characters in
> Unicode (ligatures in fonts, HLP, compound characters in Unicode, you name it),
> fine, but please no ‘further evidence is needed’ for the basic capabilities
> we’re talking about.
>
> Reading what you’ve said here I suspect the underlying issue is you feel
> that if we don’t do everything now all is lost.
>
>
> No, my sole concern is that we do not go in one direction that will be
> problematic in the future.
> As such I will be really happy to see how your proposal can be expanded,
> e.g., for dealing with multiple levels of embedding: complex groups
> inserted in corners, several levels of embedding, etc. (you have all the
> data needed, right?). I’m simply not ready to buy a pig in a poke: we
> provide you with egyptological evidence, if you can do the same for the
> encoding scheme, perfect! My only concern is about long term evolution: I
> agree that having the equivalent of ‘:’ and ‘*’ in MdC would already be a
> big plus, I wrote it several times, but it should fit into the broader
> picture of a coherent scheme and not lead to some ‘bricolage’ on top of a
> (too) simple scheme.
>
> Furthermore, I get perfectly that Unicode can be expanded and that it’s
> not a one time thing, but my understanding (correct me if I’m wrong) is
> that withdrawing and/or revising substantially a scheme is not likely to
> happen at any point, right? (just because resources will be quickly
> created, and that you do not want to be incompatible, right?)
>
> You say you leave it to the specialists but then seem to want to ignore
> what we recommend! I can’t think of anyone I’ve met with solid software
> experience who would see the syntax of “A System of …” as a practical step
> forward – if you find one, put them in touch with me; I enjoy new experiences!
>
>
> Nope, again: I’m not ignoring what you recommend. I’m listening as
> carefully as I can (sorry for not being Bill Gates though) and, as I said,
> I’m open to any syntax guaranteeing coherent and adequate long-term
> extensions: simply show us how you’d proceed; evidence is needed both ways
> in well-balanced relationships. And, I hate to return the argument, but « I
> can’t think of anyone I’ve met with solid egyptological experience who
> would see the actual limitations of your scheme as a practical step forward
> for Egyptology in the long run. » [and please take this as a joke, the
> argument from authority story.]
>
> In a nutshell: I’m happily supporting any proposal that implements the
> basic capabilities described in Mark-Jan’s document (because, really, no
> further evidence is needed), or even part of it, *as long as one can show
> how it can be developed satisfactorily in the future* (I’m not an IT guy
> and I’m eager to learn and also enjoy new experiences). You asked for
> egyptological evidence, you have it; the opposite should hold, no? Again,
> I’d rather not buy a pig in a poke, so just tell me how your scheme can be
> expanded, at which level, etc. for supporting the capabilities of
> Mark-Jan’s proposal, with a comprehensive description that even people like
> me could understand, and I’m quite confident that we should be able to
> produce some consensus document!
>
> Take care,
>
> Stéphane
>
>
> Regards,
> Bob
>
> *From:* Egyptian [mailto:egyptian-bounces at evertype.com
> <egyptian-bounces at evertype.com>] *On Behalf Of *Stéphane polis
> *Sent:* 25 July 2016 11:25
> *To:* Egyptian Hieroglyphs in the UCS <egyptian at evertype.com>
> *Subject:* Re: [Egyptian] On "A system of control characters for Ancient
> Egyptian hieroglyphic text" (2016-07-23)
>
> Hi all,
>
> If I may jump in one last time.
>
> As I said several times, I leave entirely up to the specialists all the
> issues concerning the syntax of the operators.
>
> What matters for me is what you can effectively achieve, and Mark-Jan’s
> proposal covers precisely what we minimally would like to have (which is
> why I support it wholeheartedly).
> Now, if it really ends up being too much for Unicode and there is no
> way to make this happen there, but you are confident that it can be
> handled at the level of HLPs, then can I ask a very naive question: what
> need is there for any control character in Unicode?
>
> After all, Marwan’s font with the ligatures seems to work quite well for
> basic purposes, so it can be considered as a good solution for some users
> esp. when combined with So’s input system.
> For other uses, we will need several types of grouping, groups inserted in
> groups, etc.: why would some bits end up in Unicode, while others would be
> up to HLP? Shouldn’t we try to have a coherent scheme and not something
> made of bits and pieces?
>
> A real (even if maybe naive) concern.
> Take care,
>
> Stéphane
>
>
>
> On 24 Jul 2016, at 20:41, Bob Richmond <bobqq at live.co.uk> wrote:
>
> Hi Mark-Jan
>
> We’ve been talking about plain text on and off for over 18 months so I was
> interested to read your L2/16-177 “new system” earlier this month and your
> latest update yesterday (
> https://mjn.host.cs.st-andrews.ac.uk/tmp/unicode2.pdf). Good to see your
> number of control characters now reduced, more consideration has been given
> to OpenType and some discussion in Cambridge has been factored in. I
> understand what you are trying to do and there are points I’d like to
> discuss when we have time. There are items you note that indeed need to be
> progressed and agreed (EMPTY, STACK CARTOUCHE …)
>
> However I am disappointed you have not taken on board the fundamental
> failing with the scheme you’ve been developing that makes it unsuitable for
> Unicode plain text consideration, however useful it may be for other
> purposes. I thought this was clear at Cambridge but apparently not.
>
> Unicode plain text hieroglyphic is exposed to a huge audience and the very
> first consideration is *don’t make simple things complicated*. Use of
> BEGIN/END for every group breaks that basic rule. Your scheme is
> unnecessarily complicated so fails at the outset. It’s a continuing
> distraction for the less technically minded in this group to continue to
> hold it up as a viable alternative to the current UTC recommendation. It is
> not.
>
> MdC X1:R1 in your scheme uses *3 control characters*. Quite honestly I
> find it hard to understand why anyone thinks this is possibly a good idea.
> *If anyone reading this does support 3 over 1, please let me know your
> reasoning; I’m actually quite curious why I’ve had to spend time on this.*
>
> What you’ve tried to do is build a theoretical model which describes
> cluster layout given a set of constraints and that’s all good as an
> academic exercise. I’d be interested to see it tested against texts and
> data. All of its features could be added in some way to the three-control
> system as HLP or extra controls.
>
> So, there’s no need to throw your work away. Some parts apply to plain
> text implementation e.g. your description on making an MdC-like font.
> Plenty more you could re-purpose should you wish to continue to be involved
> in Unicode developments.
>
> What I suggest you do is consider how you might use elements of your
> scheme in a simple higher level protocol on top of plain text as it is at
> present (I’ll be circulating my considered view on adding to the 3 control
> set on Tuesday). Brackets in an HLP could be ok for rare cases. May be
> useful for TLA and Ramses.
>
> Meanwhile if you have any ideas on how you would like to see e.g. 4
> corners added to the existing proposal I’d be pleased to hear from you or
> anyone else on the topic.
>
> Regards,
> Bob
>
>
> _______________________________________________
> Egyptian mailing list
> Egyptian at evertype.com
> http://evertype.com/mailman/listinfo/egyptian_evertype.com