[Eng] [DejaVu-fonts] New glyph for U+014A

Sjur Moshagen sjurnm at mac.com
Tue Nov 19 18:29:14 GMT 2013


Dear all,

On 8 Nov 2013, at 10:53, Michael Everson <everson at evertype.com> wrote:

> I agree that the N-form, not the n-form capital eng should be the default. It is historically the older form. 
> 
> The problem is that we ought to have disunified the two engs long ago, by adding a new AFRICAN ENG since it is some African languages which prefer the n-form capital. But our colleagues in SIL did not want to do this, and our colleagues in Samiland maintain that they have too much data using the current code point to change.

To the two points above:

«Historically the older form»:

Since this discussion concerns only the encoding of the letter ENG in its various forms, the question is which intended use was encoded in Unicode first, and which character sets that first encoding of ENG was built upon. This is what Trond and I have found:

ENG is defined in the Unicode block Latin Extended-A, which contains European Latin letters, that is, letters from the eight-bit character sets Latin-2 to Latin-4. Here we find letters such as Sámi Đđ, but not Icelandic Ðð (which is part of Latin-1 and as such defined in the block Latin-1 Supplement).

If we look at the specification for Latin-4 (see, among others, http://std.dkuug.dk/jtc1/sc2/wg3/docs/n413.pdf), it clearly says that its intended coverage includes Sámi: «It is informally referred to as Latin-4 or North European. It was designed to cover Estonian, Latvian, Lithuanian, Greenlandic, and Sami.»

Unicode has another block, Latin Extended-B, covering Latin-based letters for non-European languages. Here we find e.g. the character Ɖ LATIN CAPITAL LETTER AFRICAN D (with the lower-case variant ɖ).

That is, in the case of D-with-stroke there is a clear pattern:
* the Icelandic letter is in the Latin-1 Supplement block
* the Sámi letter is in the Latin-all-but-1 block named Latin Extended-A
* the African letter is in the block named Latin Extended-B
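
The pattern can be checked with a short Python sketch. The block ranges below are copied by hand from the Unicode code charts, and the helper names block_of and describe are made up for this illustration; Python's unicodedata module supplies only the character names, not the blocks.

```python
import unicodedata

# Block ranges copied by hand from the Unicode code charts
# (only the three blocks discussed here; this table is an assumption
# of the sketch, not something the unicodedata module provides).
BLOCKS = [
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
]


def block_of(ch):
    """Return the name of the block containing ch, or 'other'."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "other"


def describe(ch):
    """Format code point, official character name and block for ch."""
    return "U+%04X %s (%s)" % (ord(ch), unicodedata.name(ch), block_of(ch))


# Icelandic eth, Sámi D with stroke, African D, and capital ENG.
for ch in "\u00D0\u0110\u0189\u014A":
    print(describe(ch))
```

The three D-like capitals fall in three different blocks, while capital ENG falls in the same block as the Sámi D with stroke.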

It is also clear that the Unicode editors did a proper job regarding the D-with-stroke character.

What does the Unicode specification say about capital ENG? «glyph may also have appearance of large form of the small letter» (http://www.unicode.org/charts/PDF/U0100.pdf). As most of us know by now, this was a big mistake.

What does this all tell us?

1. the original intention for the Unicode code point U+014A Ŋ LATIN CAPITAL LETTER ENG was the Sámi letter ENG
2. the Unicode editors did a proper job for the D-with-stroke letter
3. the Unicode editors did NOT do their job regarding ENG, even though it is exactly parallel to D-with-stroke

Conclusion: as Michael has already concluded, the only workable solution is a split. Based on the history of Unicode and of the character sets that preceded it, the present ENG should be restricted to the N-shaped capital (the Sámi ENG), and a new capital and lower-case pair AFRICAN ENG should be added.

The sooner this can be done the better.

Best regards,
Sjur Moshagen & Trond Trosterud
University of Tromsø




