From bobqq at live.co.uk Tue Jul 25 11:11:14 2017
From: bobqq at live.co.uk (Bob Richmond)
Date: Tue, 25 Jul 2017 10:11:14 +0000
Subject: [Egyptian] UMdC - A Unicode Coding Manual for Egyptian?
Message-ID:

Hi All

1071 hieroglyphs have been available in Unicode since version 5.2 (2009). Six formatting characters are now in the pipeline (since May). Eventually there will be more hieroglyphs and likely control characters too.

The idea of defining a data file format "UMdC" acknowledging Unicode was discussed at I&E 2006 and afterwards, but the lack of Unicode availability in the standard and issues of application and system support made this seem a little premature. It seems to me the time is now ripe to revisit the topic.

The basics of UMdC (as I see it) are as follows:

1. A well-defined file type "umdc" containing plain text and markup (capable of being edited in simple text editors such as Windows Notepad and HTML textarea blocks).
2. Guidance on subset usage in database records.
3. Basic plain text including the 1071 + 6 for Egyptian characters (plus e.g. transliteration formats).
4. Markup to deal with elements missing from Unicode such as hieroglyphs not in the 1071 set.
5. Optional markup to help with preparing data for use with other formats such as HTML/CSS and Office applications.
6. Optional markup to help with interoperability with MdC88 based data formats (including extensions such as JSesh).
7. Specification of font requirements needed for representation of UMdC data.

So long as the markup system is sufficiently flexible (e.g. use of XML-like tags), version 1 of UMdC need not be overly featured, and additions can then be made as need is proven. It should be possible to create a version 1 specification supported with basic tools in months, not years.

I expect I'm not the only person who has already done related work. Has anyone any points to make of what they would like to see in UMdC? Would anyone like to get involved in defining the markup scheme?
Thanks

Bob Richmond

From daniel.werning at topoi.org Tue Jul 25 11:55:13 2017
From: daniel.werning at topoi.org (Daniel Werning)
Date: Tue, 25 Jul 2017 12:55:13 +0200
Subject: [Egyptian] UMdC - A Unicode Coding Manual for Egyptian?
In-Reply-To:
References:
Message-ID: <4f43eca9-5cc0-2159-3e66-6e91f042dd46@topoi.org>

Dear Bob,

I am generally interested in joining the discussion. I have tried out some encoding in TEI XML based on the current state of Unicode encoding. See: http://rosettastone.hieroglyphic-texts.net/tei-xml/.

The encoding of the arrangement will be easier with the six control characters in the pipeline for Unicode. However, there remain cases of non-Unicode signs and non-simple arrangements, which can -- nicely, I believe -- be encoded in TEI XML.

Anyhow, I can imagine that I can contribute based on my experience.

All the best,
Daniel (Werning)

--
Dr. Daniel A. Werning
daniel.werning at topoi.org
http://www.topoi.org/person/werning-daniel-a/
Exzellenzcluster Topoi
Humboldt-Universität zu Berlin

_______________________________________________
Egyptian mailing list
Egyptian at evertype.com
http://evertype.com/mailman/listinfo/egyptian_evertype.com

From rosmord at gmail.com Wed Jul 26 06:47:39 2017
From: rosmord at gmail.com (Serge Rosmorduc)
Date: Wed, 26 Jul 2017 07:47:39 +0200
Subject: [Egyptian] UMdC - A Unicode Coding Manual for Egyptian?
In-Reply-To:
References: <4f43eca9-5cc0-2159-3e66-6e91f042dd46@topoi.org>
Message-ID:

dear all,

My two cents on the subject:

a) If XML is used (and XML is needed for some uses), I believe we should use it for all structural information. We don't want to need to parse XML and some light MDC in the same file.
I also believe that regularity and ease of processing are more important than convenience when manually writing XML. In particular, I would use an XML element for each sign (it's needed in some cases to attach properties to the sign, so let's use it in all cases).

b) For simple texts, the MDC has the advantage of being somewhat readable - and will be even more so with Unicode. Both uses might be kept alongside each other.

best regards
Serge

From bobqq at live.co.uk Wed Jul 26 14:43:34 2017
From: bobqq at live.co.uk (Bob Richmond)
Date: Wed, 26 Jul 2017 13:43:34 +0000
Subject: [Egyptian] UMdC - A Unicode Coding Manual for Egyptian?
In-Reply-To:
References: <4f43eca9-5cc0-2159-3e66-6e91f042dd46@topoi.org>
Message-ID:

All

My error: I had meant to say 8 format controls in total. 6 are new; the original 2 (VJ and HJ) were proposed over two years ago, though they are still waiting to be released in a future version of the Unicode standard.

One thing to be clear about: there are many data formats that can use Unicode hieroglyphic, so UMdC does not have to do everything. For instance, Daniel mentions TEI-XML, an established system, and there are many more.

Daniel: The Rosetta data makes for an excellent example. Let's discuss this off list and get a TEI with new Unicode version together - replacing items like ?? with ?:? aids readability considerably, and it's interesting to see TEI and putative UMdC side by side. Incidentally, I'm writing some guidelines for hieroglyphic in Markdown for publishing docs on GitHub etc.

Serge:

1. I envisage UMdC as a simple plain text markup system focussed on end user usability rather than an XML file format, so more akin to Markdown than TEI-XML (but less complex fundamentally than either). However, I agree that (for instance) a tag convention for signs missing in Unicode can be useful. The possible approaches can be discussed in detail.
2. I agree.

Regards,
Bob

From bobqq at live.co.uk Thu Jul 27 11:31:47 2017
From: bobqq at live.co.uk (Bob Richmond)
Date: Thu, 27 Jul 2017 10:31:47 +0000
Subject: [Egyptian] UMdC - representation of hieroglyphs not available in Unicode
Message-ID:

A tag system I've been using for some time to represent hieroglyphs not currently available in Unicode is illustrated in the following examples:

? represents Hieroglyphica B1A as a variation of ? (Unicode B001).
? represents Hieroglyphica H10 but no obvious variation in Unicode so far.

This scheme can be extended further to identify variations where identification of a sign is uncertain and so forth.
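A variation scheme of this kind could be read by a very small parser. The sketch below is illustrative only: the tag syntax `<v id="...">...</v>`, the `Sign` class and the function names are assumptions invented for this example, since the tag names actually used in the message are not preserved here. The one fixed fact it relies on is that the 1071 hieroglyphs encoded since Unicode 5.2 occupy U+13000 to U+1342E.

```python
import re
from dataclasses import dataclass
from typing import List

# Egyptian Hieroglyphs block as encoded since Unicode 5.2 (1071 signs).
HIERO_FIRST, HIERO_LAST = 0x13000, 0x1342E

# Hypothetical tag syntax for this sketch: <v id="CATALOGUE-ID">BASE</v>
# BASE is the encoded sign being varied, or empty if there is none.
VARIANT_TAG = re.compile(r'<v id="([A-Za-z0-9]+)">(.?)</v>', re.DOTALL)


@dataclass
class Sign:
    text: str          # encoded character, or "" when no encoded base exists
    catalogue_id: str  # e.g. "B1A"; "" for a plain encoded sign


def is_encoded_hieroglyph(ch: str) -> bool:
    """True if ch lies in the Egyptian Hieroglyphs block."""
    return HIERO_FIRST <= ord(ch) <= HIERO_LAST


def _plain_signs(chunk: str) -> List[Sign]:
    # Keep only encoded hieroglyphs; other characters are ignored here.
    return [Sign(ch, "") for ch in chunk if is_encoded_hieroglyph(ch)]


def parse_signs(source: str) -> List[Sign]:
    """Split a string into plain encoded signs and tagged variants."""
    signs: List[Sign] = []
    pos = 0
    for match in VARIANT_TAG.finditer(source):
        signs.extend(_plain_signs(source[pos:match.start()]))
        signs.append(Sign(match.group(2), match.group(1)))
        pos = match.end()
    signs.extend(_plain_signs(source[pos:]))
    return signs


if __name__ == "__main__":
    sample = '\U00013000<v id="B1A">\U00013050</v><v id="H10"></v>'
    for sign in parse_signs(sample):
        print(repr(sign.text), sign.catalogue_id or "(encoded)")
```

A renderer without a glyph for the catalogue entry could fall back to `Sign.text` when it is non-empty, which is the practical benefit of describing an unencoded sign as a variation of an encoded one.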
Not all entries in the Hieroglyphic Catalogue are necessarily best represented as these simple variations, for example geometrical arrangements of two or more hieroglyphs. I can come back to this.

What systems (if any) are others using, e.g. in the corpus projects TLA and Ramses?

At the moment for UMdC all is open, so suggestions of alternatives are welcome. Alternatives need not necessarily involve tags.

Once I have replies here this can be written up in more detail. I'll take silence as approval!

Thanks,
Bob

PS Daniel: Your TEI example uses notation like in this situation but this introduces the problem that there is no Unicode character for the item in the document - can TEI notation be changed to use something like the variation approach?

From bobqq at live.co.uk Mon Jul 31 15:33:24 2017
From: bobqq at live.co.uk (Bob Richmond)
Date: Mon, 31 Jul 2017 14:33:24 +0000
Subject: [Egyptian] UMdC - representation of hieroglyphs not available in Unicode
In-Reply-To: <6d734cda-fcbb-c33e-229b-6b65f3c01633@bbaw.de>
References: <6d734cda-fcbb-c33e-229b-6b65f3c01633@bbaw.de>
Message-ID:

Thanks Simon. So far so good.

I mentioned geometrical arrangements and what I've been using here is a distinct tag , for instance

?? represents Hieroglyphica D59 (actually exists in Unicode as D059 - so not needed)
?? represents Hieroglyphica O70

This also seems compatible with your (and TEI?) element? if revised to ....

Clearly my use of distinct tags for variants and geometrical arrangements could be folded into a single tag ... if attributes are used to distinguish semantics - I chose distinct tags for readability and clarity of meaning. Perhaps I should use for glyph variant?

Has anyone any opinions on use of tags for unencoded signs?
Bob
https://hieroglyphseverywhere.blogspot.co.uk/

From: Simon Schweitzer
Sent: 27 July 2017 12:51
To: Egyptian Hieroglyphs in the UCS; Bob Richmond
Subject: Re: [Egyptian] UMdC - representation of hieroglyphs not available in Unicode

Dear Bob,

On 27.07.2017 at 12:31, Bob Richmond wrote:
> A tag system I've been using for some time to represent hieroglyphs not currently available in Unicode is illustrated in the following examples:
>
> ? represents Hieroglyphica B1A as a variation of ? (Unicode B001).
> ? represents Hieroglyphica H10 but no obvious variation in Unicode so far.
>
> This scheme can be extended further to identify variations where identification of a sign is uncertain and so forth.
>
> Not all entries in the Hieroglyphic Catalogue are necessarily best represented as these simple variations, for example geometrical arrangements of two or more hieroglyphs. I can come back to this.

A nice approach! Thanks!

> What systems (if any) are others using, e.g. in the corpus projects TLA and Ramses?

Like Daniel, we use the element, e.g. . We can transform our element -> ?

All the best,
Simon
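The kind of transformation Simon mentions, from a TEI-style element to a variation-style tag, might look roughly like the sketch below. Both notations here are assumptions: the input is taken to be a TEI glyph reference of the form `<g ref="#ID"/>`, and the output tag `<v id="ID"></v>` is a hypothetical stand-in, because the element and tag names used in the messages are not preserved. The function name `tei_to_variant_tags` is likewise invented for illustration.

```python
import re

# Assumed input form: <g ref="#B1A"/> or <g ref="#B1A"></g> (TEI-style glyph reference).
TEI_GLYPH = re.compile(r'<g\s+ref="#([A-Za-z0-9]+)"\s*(?:/>|>\s*</g>)')


def tei_to_variant_tags(text: str) -> str:
    """Rewrite each assumed TEI glyph reference as a hypothetical <v id="..."></v> tag."""
    return TEI_GLYPH.sub(r'<v id="\1"></v>', text)


if __name__ == "__main__":
    example = 'A <g ref="#H10"/> B <g ref="#B1A"></g>'
    print(tei_to_variant_tags(example))
```

A regular expression is enough for flat, attribute-only elements like these; a project with nested or namespaced TEI would more plausibly use an XML parser or XSLT for the same mapping.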