Nov 19

The other week I worked on a project to “rehabilitate” two already-encoded letters that are badly specified and that cause problems for people using Cyrillic in the UCS: problems not just for end users, but for implementers as well. Counting both cases, the characters in question are U+0478 CYRILLIC CAPITAL LETTER UK, U+0479 CYRILLIC SMALL LETTER UK, U+047C CYRILLIC CAPITAL LETTER OMEGA WITH TITLO, and U+047D CYRILLIC SMALL LETTER OMEGA WITH TITLO. The exciting story is found in this document.

My idea was to come up with practical solutions that avoid ambiguity. Theoretical perfection, on the other hand, is a luxury we don’t have. We are doing damage control on bad choices made more than a decade ago! I am sure we would not have made those mistakes were we encoding Cyrillic for the first time today.

Today, I think we would have encoded a BROAD OMEGA and used diacritics for the beautiful omega or other things, and we would have encoded MONOGRAPH UK and left digraph UK to be represented as a string of two characters, Cyrillic о and у. Solutions 2b and 3b in my document were attempts to achieve that situation, which would have been ideal, in my view.

The UTC was conservative on the side of stability, and more or less chose solutions 2a and 3a. (It’s not done till it’s published, of course.) My concern was that if they chose 2a, it would be possible to represent beautiful omega both as 047D and as BROAD OMEGA with two diacritics, and those two representations would not be equivalent, which would cause ambiguity in text representation. (Of course, we have this now with OMEGA WITH TITLO, so the situation would not be worse than it is today.)

I thought that the case against 3a was a good deal stronger. A number of vendors are happy shipping monograph glyphs for 0479, and this poses no security issues. Looking at the Cyrillic fonts shipping with Windows XP, however, I found that all but one of them avoid encoding this character at all. My guess is that this is a question of security. So… we still have a problem here, since digraph UK can be represented by two letters, or (in principle) by this UK. I am thinking that the best solution for security’s sake is to recommend that the reference glyphs for 0479 be drawn with half-width letters, to distinguish the character and make it unappealing to use at all. This is tantamount to deprecation: if everyone does this in their fonts, it would be a real solution.
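To make the two-representations problem concrete, here is a minimal Python sketch using the standard `unicodedata` module. It compares U+0479 with the two-letter sequence о + у under each normalization form. The names `UK_SINGLE`, `UK_SEQUENCE`, and `equivalent_under` are mine, chosen only for illustration.

```python
import unicodedata

# The single encoded character and the two-letter spelling of digraph UK.
UK_SINGLE = "\u0479"          # CYRILLIC SMALL LETTER UK
UK_SEQUENCE = "\u043E\u0443"  # CYRILLIC SMALL LETTER O + CYRILLIC SMALL LETTER U


def equivalent_under(form: str, a: str, b: str) -> bool:
    """Return True if a and b become identical after normalizing to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Check all four standard normalization forms.
results = {
    form: equivalent_under(form, UK_SINGLE, UK_SEQUENCE)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

for form, same in results.items():
    print(form, "equivalent" if same else "not equivalent")

# An empty string here means the character has no decomposition mapping.
print(repr(unicodedata.decomposition(UK_SINGLE)))
```

Because U+0479 has no decomposition mapping, no normalization form folds it into the two-letter sequence, so the two spellings can look the same on screen while comparing as different strings. That is the ambiguity (and the spoofing risk) described above.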

5 Responses to “Rehabilitating two Cyrillic characters”

  1. Artemio Keidan says:

    Hi,
    a question I have always wanted to ask you is this: how did they manage to forget to encode the Cyrillic ligature IA ia in the Unicode Standard? This is a real letter, not a variant, and it is quite difficult to write Old Slavonic and Old Russian texts without it.
    If you want, I could try to provide some documentation about this letter. For now, look at this table (the tenth letter from the bottom; Cyrillic only, with no Glagolitic analogue).

    Best wishes,
    A.K.

  2. Michael Everson says:

    See N3194

  3. Anonymous says:

    Thank you very much for your answer, and for all this work you are doing for the rest of us!
    best wishes,
    Artemio Keidan

  4. Anonymous says:

    Dear Michael, I know that the Unicode Consortium is now working on a new Cyrillic project. I have found some Cyrillic signs that are not in the project. One is the ligature dzh in the “Sirjanisch” (Zyrian, Komi, Perm) alphabet given by Carl Faulmann (his book can be seen in the book list of the Glagolitic standard), and this ligature, as well as Dz and some other consonants, with a grave.
    I also have a sample (in a Ukrainian book) of a “double grave” used as Cyrillic Yot (i kratkoye) in the Ukrainian Peresopnitsa Gospel of the 16th century.
    I want to mention this and show it to someone before the standard is ready.
    I tried to contact unicode.org, but my English is too poor to understand what they want before one contacts them :(
    Maybe I can send these samples to you, or to any e-mail address you name?

    I’m a multilingual font designer and ATypI country delegate for Ukraine – you can see my CV and some fonts at http://www.myfonts.com

    best regs

    Viktor Kharyk kharyk@gmx.de

  5. Michael Everson says:

    Viktor… can we find the DZH in anything other than Faulmann?
