ISO/IEC JTC1/SC2/WG3 COMMITTEE DRAFT
Date: 1996-06-19

Title: Proposal for a new part of ISO/IEC 8859: Latin alphabet No. 8 (Celtic)

Source: Michael Everson, Everson Gunn Teoranta (IE)
Status: Expert Contribution
Action: For consideration by JTC1/SC2/WG3


Latin alphabet No. 8 (Celtic)

COMMITTEE DRAFT INTERNATIONAL STANDARD
ISO/IEC 8859-14 (E)
Draft revision 1997-05-05

Information technology -
8-bit single-byte coded graphic character sets

Part 14:
Latin alphabet No. 8 (Celtic)


Contents

0 Introduction
1 Scope
2 Conformance
3 Normative references
4 Definitions
5 Notation, code table and character names
6 Specification of the coded character set
7 Identification of the character set

0 Introduction

ISO/IEC 8859 consists of several parts. Each part specifies a set of up to 191 graphic characters and the coded representation of these characters by means of a single 8-bit byte. Each set is intended for use with a particular group of languages.

1 Scope

This part of ISO/IEC 8859 specifies a set of 191 coded graphic characters identified as Latin alphabet No. 8 (Celtic).

This set of coded graphic characters is intended for use in data and text processing applications and may also be used for information interchange.

The set contains graphic characters used for general purpose applications in typical office environments in at least the following languages: Albanian, Basque, Breton, Catalan, Cornish, Danish, English, French (with a few restrictions, see Annex A), Galician, German, Greenlandic, Irish Gaelic, Italian, Latin, Luxemburgish, Manx Gaelic, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swedish, and Welsh.

This set of coded graphic characters may be regarded as a version of an 8-bit code according to ISO/IEC 2022 or ISO/IEC 4873 at level 1.

This part of ISO/IEC 8859 may not be used in conjunction with any other parts of ISO/IEC 8859. If coded characters from more than one part are to be used together, by means of code extension techniques, the equivalent coded character sets from ISO/IEC 10367 should be used instead within a version of ISO/IEC 4873 at level 2 or level 3.

The coded characters in this set may be used in conjunction with coded control functions selected from ISO/IEC 6429. However, control functions are not used to create composite graphic symbols from two or more graphic characters (see clause 6).

NOTE: ISO/IEC 8859 is not intended for use with Telematic services defined by ITU-T. If information coded according to ISO/IEC 8859 is to be transferred to such services, it will have to conform to the requirements of those services at the access-point.

2 Conformance

2.1 Conformance of information interchange

A coded-character-data-element (CC-data-element) within coded information for interchange is in conformance with this part of this International Standard if all the coded representations of graphic characters within that CC-data-element conform to the requirements of clause 6.

2.2 Conformance of devices

A device is in conformance with this International Standard if it conforms to the requirements of 2.2.1, and either or both of 2.2.2 and 2.2.3. A claim of conformance shall identify the document which contains the description specified in 2.2.1.

2.2.1 Device description

A device that conforms to this International Standard shall be the subject of a description that identifies the means by which the user may supply characters to the device, or may recognize them when they are made available to him, as specified respectively in 2.2.2 and 2.2.3.

2.2.2 Originating devices

An originating device shall allow its user to supply any sequence of characters from those specified in clause 6, and shall be capable of transmitting their coded representations within a CC-data-element.

2.2.3 Receiving devices

A receiving device shall be capable of receiving and interpreting any coded representations of characters that are within a CC-data-element, and that conform to clause 6, and shall make the corresponding characters available to its user in such a way that the user can identify them from among those specified there, and can distinguish them from each other.

3 Normative references

The following standards contain provisions which, through reference in this text, constitute provisions of this International Standard. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this International Standard are encouraged to investigate the possibility of applying the most recent editions of the standards listed below. Members of IEC and ISO maintain registers of currently valid International Standards.

ISO/IEC 2022:1994, Information technology - Character code structure and extension techniques.

ISO/IEC 4873:1991, Information technology - ISO 8-bit code for information interchange - Structure and rules for implementation.

ISO/IEC 8824:1995, Information technology - Open systems interconnection - Abstract Syntax Notation One (ASN.1).


4 Definitions

4.1 bit combination: An ordered set of bits used for the representation of characters.

4.2 byte: A bit string that is operated upon as a unit.

4.3 character: A member of a set of elements used for the organization, control, or representation of data.

4.4 code table: A table showing the characters allocated to each bit combination in a code.

4.5 coded-character-data-element (CC-data-element): An element of interchanged information that is specified to consist of a sequence of coded representations of characters, in accordance with one or more identified standards for coded character sets.

4.6 coded character set; code: A set of unambiguous rules that establishes a character set and the one-to-one relationship between the characters of the set and their bit combinations.

4.7 graphic character: A character, other than a control function, that has a visual representation normally handwritten, printed or displayed, and that has a coded representation consisting of one or more bit combinations.

NOTE -- in ISO/IEC 8859 a single bit combination is used to represent each character.

4.8 graphic symbol: A visual representation of a graphic character or of a control function.

4.9 position: That part of a code table identified by its column and row coordinates.

5 Notation, code table and character names

5.1 Notation

The bits of the bit combinations of the 8-bit code are identified by b8, b7, b6, b5, b4, b3, b2, and b1, where b8 is the highest-order, or most-significant bit and b1 is the lowest-order, or least-significant bit.

The bit combinations may be interpreted to represent numbers in binary notation by attributing the following weights to the individual bits:

Bit      b8   b7   b6   b5   b4   b3   b2   b1
Weight   128  64   32   16   8    4    2    1

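Using these weights, the bit combinations are identified by notations of the form xx/yy, where xx and yy are numbers in the range 00 to 15. The correspondence between the notations of the form xx/yy and the bit combinations consisting of the bits b8 to b1 is as follows:

xx is the number represented by b8, b7, b6 and b5, where these bits are given the weights 8, 4, 2 and 1 respectively.

yy is the number represented by b4, b3, b2 and b1, where these bits are given the weights 8, 4, 2 and 1 respectively.
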
The bit combinations are also identified by notations of the form hk, where h and k are numbers in the range 0 to F in hexadecimal notation. The number h is the same as the number xx described above, and the number k is the same as the number yy described above.
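
The notation of 5.1 can be illustrated with a short Python sketch. It assumes, as the hk notation implies, that xx is formed from b8 to b5 and yy from b4 to b1; the helper names bits, xx_yy and hk are illustrative only and are not defined by this part of ISO/IEC 8859.

```python
# Weights of the bits b8 (most significant) to b1 (least significant), per 5.1.
WEIGHTS = {f"b{n}": 2 ** (n - 1) for n in range(8, 0, -1)}


def bits(byte: int) -> dict:
    """Return the value (0 or 1) of each bit b8..b1 of an 8-bit combination."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit combination")
    return {name: (byte // weight) % 2 for name, weight in WEIGHTS.items()}


def xx_yy(byte: int) -> str:
    """xx/yy notation: xx from b8..b5 and yy from b4..b1, weighted 8, 4, 2, 1."""
    b = bits(byte)
    xx = 8 * b["b8"] + 4 * b["b7"] + 2 * b["b6"] + b["b5"]
    yy = 8 * b["b4"] + 4 * b["b3"] + 2 * b["b2"] + b["b1"]
    return f"{xx:02d}/{yy:02d}"


def hk(byte: int) -> str:
    """hk notation: the same two numbers written as hexadecimal digits."""
    bits(byte)  # raises ValueError outside 0..255
    return f"{byte >> 4:X}{byte & 0x0F:X}"


print(xx_yy(0xA1), hk(0xA1))  # prints: 10/01 A1
```

The byte A1 used here is the position of LATIN CAPITAL LETTER B WITH DOT ABOVE in Table 1B.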

5.2 Layout of the code table

An 8-bit code table consists of 256 positions arranged in 16 columns and 16 rows. The columns and the rows are numbered 00 to 15.

The code table positions are identified by notations of the form xx/yy, where xx is the column number and yy is the row number. The column and row numbers are shown at the top and left edges of the table respectively. The code table positions are also identified by notations of the form hk, where h is the column number and k is the row number in hexadecimal notation. The column and row numbers are shown at the bottom and right edges of the table respectively.

The positions of the code table are in one-to-one correspondence with the bit combinations of the code. The notation of a code table position, of the form xx/yy or of the form hk, is the same as that of the corresponding bit combination.
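
A minimal Python sketch of the layout in 5.2 follows. It renders a range of columns with the column numbers across the top and the row numbers down the left edge, so that position xx/yy holds the bit combination xx × 16 + yy. It relies on the fact, visible in Table 1A, that for positions 02/00 to 07/14 the ISO/IEC 10646 value equals the bit combination; render_columns is a hypothetical helper, and every other position is simply left blank.

```python
def render_columns(first: int, last: int) -> str:
    """Lay out code table columns first..last: columns across, rows down."""
    cols = range(first, last + 1)
    lines = ["   " + "".join(f" {col:02d}" for col in cols)]
    for row in range(16):
        cells = ""
        for col in cols:
            byte = col * 16 + row  # position xx/yy is bit combination xx*16 + yy
            # Only 02/00..07/14 are filled in; chr() is valid there because the
            # ISO/IEC 10646 value equals the bit combination (Table 1A).
            ch = chr(byte) if 0x20 <= byte <= 0x7E else " "
            cells += f"  {ch}"
        lines.append(f"{row:02d} " + cells)
    return "\n".join(lines)


print(render_columns(2, 7))
```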

5.3 Names and meanings

This part of ISO/IEC 8859 assigns a unique name to each graphic character. These names have been taken from ISO/IEC 10646-1 (E). This part of ISO/IEC 8859 also specifies an acronym for each of the characters SPACE, NO-BREAK SPACE and SOFT HYPHEN. For acronyms only Latin capital letters A to Z are used. It is intended that the acronyms be retained in all translations of the text.
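
Because the names are taken from ISO/IEC 10646-1, and Unicode uses the same character names for these characters, Python's standard unicodedata module can serve as a quick cross-check of the tables in clause 6. The sketch below copies a handful of rows from Tables 1A and 1B; SAMPLES and mismatched_names are illustrative names only.

```python
import unicodedata

# A few rows copied from Tables 1A and 1B: (hex, ISO/IEC 10646 value, name).
SAMPLES = [
    (0x5F, 0x005F, "LOW LINE"),
    (0xA0, 0x00A0, "NO-BREAK SPACE"),
    (0xA1, 0x1E02, "LATIN CAPITAL LETTER B WITH DOT ABOVE"),
    (0xD0, 0x0174, "LATIN CAPITAL LETTER W WITH CIRCUMFLEX"),
    (0xF7, 0x1E6B, "LATIN SMALL LETTER T WITH DOT ABOVE"),
]


def mismatched_names(rows):
    """Return the rows whose name differs from the one unicodedata reports."""
    return [row for row in rows if unicodedata.name(chr(row[1]), "") != row[2]]


print(mismatched_names(SAMPLES))  # prints: []
```

A check against every row would need to allow for annotations such as "(German)" in 13/15, which are not part of the name that unicodedata returns.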

The names chosen to denote graphic characters are intended to reflect their customary meaning. However, except for SPACE (SP), NO-BREAK SPACE (NBSP) and SOFT HYPHEN (SHY), this part of ISO/IEC 8859 does not define and does not restrict the meanings of graphic characters. Neither does it specify a particular style or font design for imaging graphic characters.

This part of ISO/IEC 8859 specifies a graphic symbol for each graphic character. This symbol is shown in the corresponding position of the code table. However, this part, or any other part, of ISO/IEC 8859 does not specify a particular style or font design for imaging graphic characters. Annex B of ISO/IEC 10367 gives further information on this subject.

5.3.1 SPACE (SP)

A graphic character the visual representation of which consists of the absence of a graphic symbol.

5.3.2 NO-BREAK SPACE (NBSP)

A graphic character the visual representation of which consists of the absence of a graphic symbol, for use when a line break is to be prevented in the text as presented.

5.3.3 SOFT HYPHEN (SHY)

A graphic character that is imaged by a graphic symbol identical with, or similar to, that representing HYPHEN-MINUS, for use when a line break has been established within a word.
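
The practical difference between SP, NBSP and SHY can be sketched in Python. This is a hypothetical illustration, not behaviour required by this part of ISO/IEC 8859: it assumes that a line break may be placed at SPACE but never at NO-BREAK SPACE, and, following 5.3.3, that a SOFT HYPHEN at the end of a line marks a break established within a word, so that reflowing the text removes the SHY and rejoins the word.

```python
SP, NBSP, SHY = "\u0020", "\u00a0", "\u00ad"


def break_opportunities(line: str) -> list:
    """Indices of SPACE characters, where a line break may be placed.
    NO-BREAK SPACE is deliberately excluded: it prevents a break."""
    return [i for i, ch in enumerate(line) if ch == SP]


def rejoin(lines: list) -> str:
    """Reflow previously broken lines into a single line.

    A line ending in SOFT HYPHEN was broken inside a word, so the SHY is
    dropped and the word is joined without a space; other lines are
    joined with a SPACE.
    """
    text = ""
    for line in lines:
        if text.endswith(SHY):
            text = text[:-1] + line
        elif text:
            text += SP + line
        else:
            text = line
    return text


print(break_opportunities("10\u00a0kg of flour"))  # prints: [5, 8]
```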

6 Specification of the coded character set

This part of ISO/IEC 8859 specifies 191 characters allocated to the bit combinations of the code table (table 2).

Control functions, such as BACKSPACE or CARRIAGE RETURN, shall not be used to create composite graphic symbols, which are graphic symbols made up from the graphic representations of two or more characters.

6.1 Characters of the set and their coded representation

Table 1A - Name and coded representation of the characters in Columns 02 to 07

Bit combination  Hex  ISO 10646  Name
02/00            20   0020       SPACE
02/01            21   0021       EXCLAMATION MARK
02/02            22   0022       QUOTATION MARK
02/03            23   0023       NUMBER SIGN
02/04            24   0024       DOLLAR SIGN
02/05            25   0025       PERCENT SIGN
02/06            26   0026       AMPERSAND
02/07            27   0027       APOSTROPHE
02/08            28   0028       LEFT PARENTHESIS
02/09            29   0029       RIGHT PARENTHESIS
02/10            2A   002A       ASTERISK
02/11            2B   002B       PLUS SIGN
02/12            2C   002C       COMMA
02/13            2D   002D       HYPHEN-MINUS
02/14            2E   002E       FULL STOP
02/15            2F   002F       SOLIDUS
03/00            30   0030       DIGIT ZERO
03/01            31   0031       DIGIT ONE
03/02            32   0032       DIGIT TWO
03/03            33   0033       DIGIT THREE
03/04            34   0034       DIGIT FOUR
03/05            35   0035       DIGIT FIVE
03/06            36   0036       DIGIT SIX
03/07            37   0037       DIGIT SEVEN
03/08            38   0038       DIGIT EIGHT
03/09            39   0039       DIGIT NINE
03/10            3A   003A       COLON
03/11            3B   003B       SEMICOLON
03/12            3C   003C       LESS-THAN SIGN
03/13            3D   003D       EQUALS SIGN
03/14            3E   003E       GREATER-THAN SIGN
03/15            3F   003F       QUESTION MARK
04/00            40   0040       COMMERCIAL AT
04/01            41   0041       LATIN CAPITAL LETTER A
04/02            42   0042       LATIN CAPITAL LETTER B
04/03            43   0043       LATIN CAPITAL LETTER C
04/04            44   0044       LATIN CAPITAL LETTER D
04/05            45   0045       LATIN CAPITAL LETTER E
04/06            46   0046       LATIN CAPITAL LETTER F
04/07            47   0047       LATIN CAPITAL LETTER G
04/08            48   0048       LATIN CAPITAL LETTER H
04/09            49   0049       LATIN CAPITAL LETTER I
04/10            4A   004A       LATIN CAPITAL LETTER J
04/11            4B   004B       LATIN CAPITAL LETTER K
04/12            4C   004C       LATIN CAPITAL LETTER L
04/13            4D   004D       LATIN CAPITAL LETTER M
04/14            4E   004E       LATIN CAPITAL LETTER N
04/15            4F   004F       LATIN CAPITAL LETTER O
05/00            50   0050       LATIN CAPITAL LETTER P
05/01            51   0051       LATIN CAPITAL LETTER Q
05/02            52   0052       LATIN CAPITAL LETTER R
05/03            53   0053       LATIN CAPITAL LETTER S
05/04            54   0054       LATIN CAPITAL LETTER T
05/05            55   0055       LATIN CAPITAL LETTER U
05/06            56   0056       LATIN CAPITAL LETTER V
05/07            57   0057       LATIN CAPITAL LETTER W
05/08            58   0058       LATIN CAPITAL LETTER X
05/09            59   0059       LATIN CAPITAL LETTER Y
05/10            5A   005A       LATIN CAPITAL LETTER Z
05/11            5B   005B       LEFT SQUARE BRACKET
05/12            5C   005C       REVERSE SOLIDUS
05/13            5D   005D       RIGHT SQUARE BRACKET
05/14            5E   005E       CIRCUMFLEX ACCENT
05/15            5F   005F       LOW LINE
06/00            60   0060       GRAVE ACCENT
06/01            61   0061       LATIN SMALL LETTER A
06/02            62   0062       LATIN SMALL LETTER B
06/03            63   0063       LATIN SMALL LETTER C
06/04            64   0064       LATIN SMALL LETTER D
06/05            65   0065       LATIN SMALL LETTER E
06/06            66   0066       LATIN SMALL LETTER F
06/07            67   0067       LATIN SMALL LETTER G
06/08            68   0068       LATIN SMALL LETTER H
06/09            69   0069       LATIN SMALL LETTER I
06/10            6A   006A       LATIN SMALL LETTER J
06/11            6B   006B       LATIN SMALL LETTER K
06/12            6C   006C       LATIN SMALL LETTER L
06/13            6D   006D       LATIN SMALL LETTER M
06/14            6E   006E       LATIN SMALL LETTER N
06/15            6F   006F       LATIN SMALL LETTER O
07/00            70   0070       LATIN SMALL LETTER P
07/01            71   0071       LATIN SMALL LETTER Q
07/02            72   0072       LATIN SMALL LETTER R
07/03            73   0073       LATIN SMALL LETTER S
07/04            74   0074       LATIN SMALL LETTER T
07/05            75   0075       LATIN SMALL LETTER U
07/06            76   0076       LATIN SMALL LETTER V
07/07            77   0077       LATIN SMALL LETTER W
07/08            78   0078       LATIN SMALL LETTER X
07/09            79   0079       LATIN SMALL LETTER Y
07/10            7A   007A       LATIN SMALL LETTER Z
07/11            7B   007B       LEFT CURLY BRACKET
07/12            7C   007C       VERTICAL LINE
07/13            7D   007D       RIGHT CURLY BRACKET
07/14            7E   007E       TILDE
07/15            7F   007F       (This position shall not be used)

Table 1B - Name and coded representation of the characters in Columns 10 to 15

(All positions from A0 to FF are used for graphic characters.)

Bit combination  Hex  ISO 10646  Name
10/00            A0   00A0       NO-BREAK SPACE
10/01            A1   1E02       LATIN CAPITAL LETTER B WITH DOT ABOVE
10/02            A2   1E03       LATIN SMALL LETTER B WITH DOT ABOVE
10/03            A3   00A3       POUND SIGN
10/04            A4   010A       LATIN CAPITAL LETTER C WITH DOT ABOVE
10/05            A5   010B       LATIN SMALL LETTER C WITH DOT ABOVE
10/06            A6   1E0A       LATIN CAPITAL LETTER D WITH DOT ABOVE
10/07            A7   00A7       SECTION SIGN
10/08            A8   1E80       LATIN CAPITAL LETTER W WITH GRAVE
10/09            A9   00A9       COPYRIGHT SIGN
10/10            AA   1E82       LATIN CAPITAL LETTER W WITH ACUTE
10/11            AB   1E0B       LATIN SMALL LETTER D WITH DOT ABOVE
10/12            AC   1EF2       LATIN CAPITAL LETTER Y WITH GRAVE
10/13            AD   00AD       SOFT HYPHEN
10/14            AE   0131       LATIN SMALL LETTER DOTLESS I
10/15            AF   0178       LATIN CAPITAL LETTER Y WITH DIAERESIS
11/00            B0   1E1E       LATIN CAPITAL LETTER F WITH DOT ABOVE
11/01            B1   1E1F       LATIN SMALL LETTER F WITH DOT ABOVE
11/02            B2   0120       LATIN CAPITAL LETTER G WITH DOT ABOVE
11/03            B3   0121       LATIN SMALL LETTER G WITH DOT ABOVE
11/04            B4   1E40       LATIN CAPITAL LETTER M WITH DOT ABOVE
11/05            B5   1E41       LATIN SMALL LETTER M WITH DOT ABOVE
11/06            B6   00B6       PILCROW SIGN
11/07            B7   1E56       LATIN CAPITAL LETTER P WITH DOT ABOVE
11/08            B8   1E81       LATIN SMALL LETTER W WITH GRAVE
11/09            B9   1E57       LATIN SMALL LETTER P WITH DOT ABOVE
11/10            BA   1E83       LATIN SMALL LETTER W WITH ACUTE
11/11            BB   1E60       LATIN CAPITAL LETTER S WITH DOT ABOVE
11/12            BC   1EF3       LATIN SMALL LETTER Y WITH GRAVE
11/13            BD   1E84       LATIN CAPITAL LETTER W WITH DIAERESIS
11/14            BE   1E85       LATIN SMALL LETTER W WITH DIAERESIS
11/15            BF   1E61       LATIN SMALL LETTER S WITH DOT ABOVE
12/00            C0   00C0       LATIN CAPITAL LETTER A WITH GRAVE
12/01            C1   00C1       LATIN CAPITAL LETTER A WITH ACUTE
12/02            C2   00C2       LATIN CAPITAL LETTER A WITH CIRCUMFLEX
12/03            C3   00C3       LATIN CAPITAL LETTER A WITH TILDE
12/04            C4   00C4       LATIN CAPITAL LETTER A WITH DIAERESIS
12/05            C5   00C5       LATIN CAPITAL LETTER A WITH RING ABOVE
12/06            C6   00C6       LATIN CAPITAL LETTER AE
12/07            C7   00C7       LATIN CAPITAL LETTER C WITH CEDILLA
12/08            C8   00C8       LATIN CAPITAL LETTER E WITH GRAVE
12/09            C9   00C9       LATIN CAPITAL LETTER E WITH ACUTE
12/10            CA   00CA       LATIN CAPITAL LETTER E WITH CIRCUMFLEX
12/11            CB   00CB       LATIN CAPITAL LETTER E WITH DIAERESIS
12/12            CC   00CC       LATIN CAPITAL LETTER I WITH GRAVE
12/13            CD   00CD       LATIN CAPITAL LETTER I WITH ACUTE
12/14            CE   00CE       LATIN CAPITAL LETTER I WITH CIRCUMFLEX
12/15            CF   00CF       LATIN CAPITAL LETTER I WITH DIAERESIS
13/00            D0   0174       LATIN CAPITAL LETTER W WITH CIRCUMFLEX
13/01            D1   00D1       LATIN CAPITAL LETTER N WITH TILDE
13/02            D2   00D2       LATIN CAPITAL LETTER O WITH GRAVE
13/03            D3   00D3       LATIN CAPITAL LETTER O WITH ACUTE
13/04            D4   00D4       LATIN CAPITAL LETTER O WITH CIRCUMFLEX
13/05            D5   00D5       LATIN CAPITAL LETTER O WITH TILDE
13/06            D6   00D6       LATIN CAPITAL LETTER O WITH DIAERESIS
13/07            D7   1E6A       LATIN CAPITAL LETTER T WITH DOT ABOVE
13/08            D8   00D8       LATIN CAPITAL LETTER O WITH STROKE
13/09            D9   00D9       LATIN CAPITAL LETTER U WITH GRAVE
13/10            DA   00DA       LATIN CAPITAL LETTER U WITH ACUTE
13/11            DB   00DB       LATIN CAPITAL LETTER U WITH CIRCUMFLEX
13/12            DC   00DC       LATIN CAPITAL LETTER U WITH DIAERESIS
13/13            DD   00DD       LATIN CAPITAL LETTER Y WITH ACUTE
13/14            DE   0176       LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
13/15            DF   00DF       LATIN SMALL LETTER SHARP S (German)
14/00            E0   00E0       LATIN SMALL LETTER A WITH GRAVE
14/01            E1   00E1       LATIN SMALL LETTER A WITH ACUTE
14/02            E2   00E2       LATIN SMALL LETTER A WITH CIRCUMFLEX
14/03            E3   00E3       LATIN SMALL LETTER A WITH TILDE
14/04            E4   00E4       LATIN SMALL LETTER A WITH DIAERESIS
14/05            E5   00E5       LATIN SMALL LETTER A WITH RING ABOVE
14/06            E6   00E6       LATIN SMALL LETTER AE
14/07            E7   00E7       LATIN SMALL LETTER C WITH CEDILLA
14/08            E8   00E8       LATIN SMALL LETTER E WITH GRAVE
14/09            E9   00E9       LATIN SMALL LETTER E WITH ACUTE
14/10            EA   00EA       LATIN SMALL LETTER E WITH CIRCUMFLEX
14/11            EB   00EB       LATIN SMALL LETTER E WITH DIAERESIS
14/12            EC   00EC       LATIN SMALL LETTER I WITH GRAVE
14/13            ED   00ED       LATIN SMALL LETTER I WITH ACUTE
14/14            EE   00EE       LATIN SMALL LETTER I WITH CIRCUMFLEX
14/15            EF   00EF       LATIN SMALL LETTER I WITH DIAERESIS
15/00            F0   0175       LATIN SMALL LETTER W WITH CIRCUMFLEX
15/01            F1   00F1       LATIN SMALL LETTER N WITH TILDE
15/02            F2   00F2       LATIN SMALL LETTER O WITH GRAVE
15/03            F3   00F3       LATIN SMALL LETTER O WITH ACUTE
15/04            F4   00F4       LATIN SMALL LETTER O WITH CIRCUMFLEX
15/05            F5   00F5       LATIN SMALL LETTER O WITH TILDE
15/06            F6   00F6       LATIN SMALL LETTER O WITH DIAERESIS
15/07            F7   1E6B       LATIN SMALL LETTER T WITH DOT ABOVE
15/08            F8   00F8       LATIN SMALL LETTER O WITH STROKE
15/09            F9   00F9       LATIN SMALL LETTER U WITH GRAVE
15/10            FA   00FA       LATIN SMALL LETTER U WITH ACUTE
15/11            FB   00FB       LATIN SMALL LETTER U WITH CIRCUMFLEX
15/12            FC   00FC       LATIN SMALL LETTER U WITH DIAERESIS
15/13            FD   00FD       LATIN SMALL LETTER Y WITH ACUTE
15/14            FE   0177       LATIN SMALL LETTER Y WITH CIRCUMFLEX
15/15            FF   00FF       LATIN SMALL LETTER Y WITH DIAERESIS
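
Tables 1A and 1B together define a table-driven conversion between this character set and ISO/IEC 10646. The Python sketch below builds that mapping directly from the two tables as printed in this draft. It deliberately does not use a library codec such as Python's iso8859_14, which follows the published standard and may not match this draft in every position. Rejecting positions that carry no graphic character (00/00 to 01/15, 07/15 and 08/00 to 09/15) is an assumption suited to data containing graphic characters only; real data may use those positions for control functions from ISO/IEC 6429. UPPER_OVERRIDES, decode and encode are illustrative names.

```python
# Positions in columns 10 to 15 whose ISO/IEC 10646 value differs from the
# bit combination itself (Table 1B). Every other position A0-FF maps to
# U+00A0-U+00FF unchanged.
UPPER_OVERRIDES = {
    0xA1: 0x1E02, 0xA2: 0x1E03, 0xA4: 0x010A, 0xA5: 0x010B,
    0xA6: 0x1E0A, 0xA8: 0x1E80, 0xAA: 0x1E82, 0xAB: 0x1E0B,
    0xAC: 0x1EF2, 0xAE: 0x0131, 0xAF: 0x0178, 0xB0: 0x1E1E,
    0xB1: 0x1E1F, 0xB2: 0x0120, 0xB3: 0x0121, 0xB4: 0x1E40,
    0xB5: 0x1E41, 0xB7: 0x1E56, 0xB8: 0x1E81, 0xB9: 0x1E57,
    0xBA: 0x1E83, 0xBB: 0x1E60, 0xBC: 0x1EF3, 0xBD: 0x1E84,
    0xBE: 0x1E85, 0xBF: 0x1E61, 0xD0: 0x0174, 0xD7: 0x1E6A,
    0xDE: 0x0176, 0xF0: 0x0175, 0xF7: 0x1E6B, 0xFE: 0x0177,
}

# Table 1A: 02/00 to 07/14 map to the same value (07/15 is not used).
DECODE = {b: b for b in range(0x20, 0x7F)}
# Table 1B: 10/00 to 15/15, identity except for the overrides above.
DECODE.update({b: UPPER_OVERRIDES.get(b, b) for b in range(0xA0, 0x100)})
ENCODE = {ucs: b for b, ucs in DECODE.items()}


def decode(data: bytes) -> str:
    """Convert bytes of this set to a string of ISO/IEC 10646 characters."""
    try:
        return "".join(chr(DECODE[b]) for b in data)
    except KeyError as exc:
        raise ValueError(f"byte {exc.args[0]:#04x} is not a graphic character of this set") from None


def encode(text: str) -> bytes:
    """Convert a string back to bytes; characters outside the set are rejected."""
    try:
        return bytes(ENCODE[ord(ch)] for ch in text)
    except KeyError as exc:
        raise ValueError(f"U+{exc.args[0]:04X} is not in this set") from None


print(encode("d\u0175r"))  # prints: b'd\xf0r'
```

Because the 191 positions map to 191 distinct ISO/IEC 10646 values, ENCODE is simply the inverse of DECODE.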

6.2 Code table

For each character in the set the code table (table 2) shows a graphic symbol at the position in the code table corresponding to the bit combination specified in table 1A or 1B.

The shaded positions in the code table correspond to bit combinations that do not represent graphic characters. Their use is outside the scope of ISO/IEC 8859; it is specified in other International Standards, for example ISO/IEC 6429.

Table 2 - Code table of Latin alphabet No. 8 (Celtic)




7 Identification of the character set

7.1 Identification according to ISO/IEC 2022 and ISO/IEC 4873

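The graphic characters of this part of ISO/IEC 8859 constitute a single coded character set. However, in accordance with ISO/IEC 2022 and ISO/IEC 4873, the code table of this part of ISO/IEC 8859 may be considered to consist of the following components:

a) the character SPACE in position 02/00;
b) a 94-character graphic character set in positions 02/01 to 07/14 (G0);
c) a 96-character graphic character set in positions 10/00 to 15/15 (G1).
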
When the identification methods of ISO/IEC 2022 or ISO/IEC 4873 are used, this part of ISO/IEC 8859 shall be identified by the following pair of designation functions:

NOTE: The corresponding escape sequences are shown in parentheses.
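
Under ISO/IEC 2022, a 94-character set is designated as G0 by the escape sequence ESC 02/08 F, and a 96-character set as G1 by ESC 02/13 F, where F is the final byte identifying the registered set. The final bytes for this part are not given in this text, so the Python sketch below takes them as parameters; the value 04/01 used in the example is an arbitrary placeholder, not the registered value. designation and notation are hypothetical helpers.

```python
ESC = 0x1B
I_G0_94 = 0x28  # 02/08: designate a 94-character set as G0
I_G1_96 = 0x2D  # 02/13: designate a 96-character set as G1


def designation(intermediate: int, final: int) -> bytes:
    """Build the escape sequence ESC I F for a designation function."""
    if not 0x30 <= final <= 0x7E:
        raise ValueError("final byte must lie in 03/00 to 07/14")
    return bytes([ESC, intermediate, final])


def notation(seq: bytes) -> str:
    """Write an escape sequence in the xx/yy notation of 5.1."""
    return " ".join("ESC" if b == ESC else f"{b >> 4:02d}/{b & 0x0F:02d}" for b in seq)


print(notation(designation(I_G1_96, 0x41)))  # prints: ESC 02/13 04/01
```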

7.2 Identification according to ISO/IEC 8824 (ASN.1)

In the terminology of ISO/IEC 8824 the character set of this part of ISO/IEC 8859 and the corresponding coded representations are distinct, and are known as the "character abstract syntax" and the "character transfer syntax" respectively.

When the identification methods of ISO/IEC 8824 are used this part of ISO/IEC 8859 shall be identified by the following object identifiers:

The corresponding object descriptions shall be:


7.3 Identification using the ISO International register of coded character sets to be used with escape sequences

According to 7.1 above, the character set of this part of ISO/IEC 8859 may be considered to consist of the character SPACE, a 94-character G0 graphic character set, and a 96-character G1 graphic character set. The G0 and G1 graphic character sets may be identified by the use of the Registration Numbers from the ISO International register of coded character sets to be used with escape sequences.
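
The split into SPACE, a 94-character G0 set and a 96-character G1 set also accounts for the 191 characters stated in clause 1 (1 + 94 + 96 = 191). The short Python sketch below classifies every bit combination accordingly and counts the results; component is an illustrative name, and "none" stands for positions whose use is outside the scope of ISO/IEC 8859.

```python
from collections import Counter


def component(byte: int) -> str:
    """Classify an 8-bit combination by the components of the code table."""
    if byte == 0x20:
        return "SPACE"  # 02/00
    if 0x21 <= byte <= 0x7E:
        return "G0"     # 94-character set, 02/01 to 07/14
    if 0xA0 <= byte <= 0xFF:
        return "G1"     # 96-character set, 10/00 to 15/15
    return "none"       # control or unused position


counts = Counter(component(b) for b in range(256))
print(counts["SPACE"] + counts["G0"] + counts["G1"])  # prints: 191
```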

When these registration numbers are used, this part of ISO/IEC 8859 shall be identified by the following pair of registration numbers:


Annex A (informative)

Coverage of languages by parts 1 to 10 of ISO/IEC 8859

A.1 Languages of European origin written in Latin script

(To be supplied)

A.2 Languages written in non-Latin scripts

(To be supplied)

Annex B (informative)

Main differences between the First edition and this Second edition of this part of ISO/IEC 8859

As ISO/IEC 8859-14:1997 is the first edition of this part of ISO/IEC 8859, this annex has intentionally been left without content.

Annex C (informative)

Bibliography


ISO/IEC 6429:1992, Information processing - Control functions for 7-bit and 8-bit coded character sets.

ISO/IEC 10367:1991, Information technology - Standardized coded graphic character sets for use in 8-bit codes.

ISO/IEC 10646-1, Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Multilingual Plane.

ISO International register of coded character sets to be used with escape sequences.

Annex D (informative)

Character identifiers according to ISO/IEC 10646-1

(To be supplied, but see Tables 1A and 1B above.)
HTML Michael Everson, everson@indigo.ie, Dublin, 1997-06-13