This page uses content from Wikipedia and is licensed under CC BY-SA.

T.51/ISO/IEC 6937

T.51
Latin based coded character sets for telematic services
Status: In force
Year started: 1984
Latest version: (09/92), September 1992
Organization: ITU-T
Committee: Study Group VIII
Related standards: T.61
Domain: encoding
License: Freely available
Website: www.itu.int

T.51 / ISO/IEC 6937:2001, Information technology - Coded graphic character set for text communication - Latin alphabet, is a multibyte extension of ASCII, or more precisely of ISO/IEC 646-IRV.[1] It was developed jointly with ITU-T (then CCITT) for telematic services under the name T.51, and first became an ISO standard in 1983. Certain byte values serve as lead bytes for letters with diacritics (accents). The lead byte usually indicates which diacritic the letter carries, and the follow byte holds the ASCII value of the base letter. Only certain combinations of lead byte and follow byte are allowed, and for some follow bytes the lead byte is interpreted differently. ISO/IEC 6937 encodes no combining characters at all. Some free-standing diacritics can nevertheless be represented, often by using the code for ASCII space as the follow byte.[2]

ISO/IEC 6937's architects were Hugh McGregor Ross, Peter Fenwick, Bernard Marti and Loek Zeckendorf.

ISO 6937/2 defines 327 characters found in modern European languages that use the Latin alphabet. Non-Latin European scripts, such as Cyrillic and Greek, are not included in the standard. Some diacritics used with the Latin alphabet are also missing: the Romanian comma below, for example, is represented by the cedilla, because no distinction between cedilla and comma below was made at the time.

IANA has registered the charset names ISO_6937-2-25 and ISO_6937-2-add for two older versions of this standard (plus control codes). In practice, however, this character encoding is unused on the Internet.

The ISO/IEC 2022 escape sequence to specify the right-hand side of the ISO/IEC 6937 character set is ESC - R (hex 1B 2D 52).[3]
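As a small illustration, the three bytes of that escape sequence can be written out and checked against the hex values quoted above. The constant name `DESIGNATE_6937_RIGHT` is made up for this sketch and does not come from any standard or library.

```python
ESC = 0x1B  # the ESCAPE control character

# ESC - R, written byte by byte (hypothetical constant name).
DESIGNATE_6937_RIGHT = bytes([ESC, 0x2D, 0x52])

# 0x2D is "-" and 0x52 is "R" in ASCII, so both spellings agree.
assert DESIGNATE_6937_RIGHT == b"\x1b-R"
print(DESIGNATE_6937_RIGHT.hex(" "))  # 1b 2d 52
```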

Single byte characters

The primary set of ISO 6937/2 is based on ISO 646-IRV (characters 0x00..0x7F) as it stood before the ISO/IEC 646:1991 revision, that is, with character 0x24 still denoting the "international currency sign" (¤) instead of the dollar sign ($):

	!"#¤%&'()*+,-./0123456789:;<=>?@
	ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
	abcdefghijklmnopqrstuvwxyz{|}
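The only difference from plain ASCII described here is the meaning of 0x24. A minimal sketch of a single-byte decoder for the primary set follows; the function name `decode_primary` and the `pre_1991` switch are assumptions made for illustration, and the graphic range is taken to be 0x20..0x7E.

```python
def decode_primary(byte: int, pre_1991: bool = True) -> str:
    """Decode one graphic byte of the primary set (assumed range 0x20..0x7E)."""
    if not 0x20 <= byte <= 0x7E:
        raise ValueError(f"not a primary-set graphic character: {byte:#04x}")
    if byte == 0x24 and pre_1991:
        # Before the ISO/IEC 646:1991 revision, 0x24 is the currency sign.
        return "\u00a4"
    # Every other position coincides with ASCII.
    return chr(byte)


print(decode_primary(0x24))                  # ¤
print(decode_primary(0x24, pre_1991=False))  # $
```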

The supplementary set (characters 0x80..0xFF) contains a selection of spacing and non-spacing graphic characters, additional symbols and some locations reserved for future standardisation.

Two byte characters

Letters with diacritics, which have no single-byte code in the primary or supplementary set, are coded on two bytes. The first byte, the "non-spacing diacritical mark", is followed by a letter from the primary set, e.g.:

small e with acute accent (é) = [Acute]+e

In total 13 diacritical marks can be followed by the selected characters from the primary set:

Accent              Code  Second character           Result
Grave               0xC1  AEIOUaeiou                 ÀÈÌÒÙàèìòù
Acute               0xC2  ACEILNORSUYZacegilnorsuyz  ÁĆÉÍĹŃÓŔŚÚÝŹáćéģíĺńóŕśúýź
Circumflex          0xC3  ACEGHIJOSUWYaceghijosuwy   ÂĈÊĜĤÎĴÔŜÛŴŶâĉêĝĥîĵôŝûŵŷ
Tilde               0xC4  AINOUainou                 ÃĨÑÕŨãĩñõũ
Macron              0xC5  AEIOUaeiou                 ĀĒĪŌŪāēīōū
Breve               0xC6  AGUagu                     ĂĞŬăğŭ
Dot                 0xC7  CEGIZcegz                  ĊĖĠİŻċėġż
Umlaut or diæresis  0xC8  AEIOUYaeiouy               ÄËÏÖÜŸäëïöüÿ
Ring                0xCA  AUau                       ÅŮåů
Cedilla             0xCB  CGKLNRSTcklnrst            ÇĢĶĻŅŖŞŢçķļņŗşţ
Double acute        0xCD  OUou                       ŐŰőű
Ogonek              0xCE  AEIUaeiu                   ĄĘĮŲąęįų
Caron               0xCF  CDELNRSTZcdelnrstz         ČĎĚĽŇŘŠŤŽčďěľňřšťž
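The table can be turned into a lookup almost mechanically. The sketch below is one possible way to decode a byte string that contains only primary-set bytes and two-byte accented letters. The names `TWO_BYTE`, `PAIRS` and `decode` are invented for this example. It passes bytes below 0x80 through as ASCII and rejects supplementary-set bytes and free-standing diacritics (lead byte plus space), which a full decoder would need to handle.

```python
import unicodedata

# Lead byte -> (allowed follow letters, resulting characters),
# transcribed from the table above.
TWO_BYTE = {
    0xC1: ("AEIOUaeiou", "ÀÈÌÒÙàèìòù"),
    0xC2: ("ACEILNORSUYZacegilnorsuyz", "ÁĆÉÍĹŃÓŔŚÚÝŹáćéģíĺńóŕśúýź"),
    0xC3: ("ACEGHIJOSUWYaceghijosuwy", "ÂĈÊĜĤÎĴÔŜÛŴŶâĉêĝĥîĵôŝûŵŷ"),
    0xC4: ("AINOUainou", "ÃĨÑÕŨãĩñõũ"),
    0xC5: ("AEIOUaeiou", "ĀĒĪŌŪāēīōū"),
    0xC6: ("AGUagu", "ĂĞŬăğŭ"),
    0xC7: ("CEGIZcegz", "ĊĖĠİŻċėġż"),
    0xC8: ("AEIOUYaeiouy", "ÄËÏÖÜŸäëïöüÿ"),
    0xCA: ("AUau", "ÅŮåů"),
    0xCB: ("CGKLNRSTcklnrst", "ÇĢĶĻŅŖŞŢçķļņŗşţ"),
    0xCD: ("OUou", "ŐŰőű"),
    0xCE: ("AEIUaeiu", "ĄĘĮŲąęįų"),
    0xCF: ("CDELNRSTZcdelnrstz", "ČĎĚĽŇŘŠŤŽčďěľňřšťž"),
}

# Flatten into (lead byte, follow byte) -> character. NFC normalisation
# guards against the literals above having been stored in decomposed form.
PAIRS = {}
for lead, (letters, results) in TWO_BYTE.items():
    results = unicodedata.normalize("NFC", results)
    assert len(letters) == len(results)
    for letter, result in zip(letters, results):
        PAIRS[(lead, ord(letter))] = result


def decode(data: bytes) -> str:
    """Decode primary-set bytes and two-byte accented letters only."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in TWO_BYTE:
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte sequence")
            pair = (b, data[i + 1])
            if pair not in PAIRS:
                raise ValueError(f"combination not allowed: {b:#04x} {data[i + 1]:#04x}")
            out.append(PAIRS[pair])
            i += 2
        elif b < 0x80:
            out.append(chr(b))  # assumption: treat the primary set as ASCII
            i += 1
        else:
            raise ValueError(f"byte {b:#04x} is not handled in this sketch")
    return "".join(out)


print(decode(b"caf\xc2e"))  # café
```

A table lookup is used here instead of composing a base letter with a combining mark because the standard defines an explicit list of allowed pairs, so anything outside the list is reported as an error.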

Codepage layout

The references to combining characters in the range U+0300..U+036F for the codes 0xC1..0xCF in the chart below only indicate which “accent” is usually intended by that lead byte. ISO/IEC 6937 does not encode any combining characters whatsoever. Instead, it defines an explicit list of precomposed characters.

One small anomaly is that Latin small letter g with cedilla is coded as if it carried an acute accent, that is, with a 0xC2 lead byte. Because the descender of the lowercase g interferes with a cedilla, the lowercase letter is usually drawn with a turned comma above instead: Ģ ģ.
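This anomaly matters for a decoder that builds characters by composition rather than by table lookup. The sketch below covers only the acute and cedilla lead bytes and uses Python's standard `unicodedata.normalize`. The function name `compose` is hypothetical, and the approach is an illustration, not how the standard defines the mapping. It also does not check whether a pair is in the allowed list.

```python
import unicodedata

ACUTE = "\u0301"    # combining mark usually intended by lead byte 0xC2
CEDILLA = "\u0327"  # combining mark usually intended by lead byte 0xCB


def compose(lead: int, letter: str) -> str:
    """Compose a base letter with the mark implied by the lead byte."""
    if lead == 0xC2 and letter == "g":
        mark = CEDILLA  # the anomaly: 0xC2 + g means g with cedilla
    elif lead == 0xC2:
        mark = ACUTE
    elif lead == 0xCB:
        mark = CEDILLA
    else:
        raise ValueError(f"lead byte {lead:#04x} is not covered by this sketch")
    return unicodedata.normalize("NFC", letter + mark)


# Without the special case, 0xC2 + g would come out as U+01F5 (g with acute)
# rather than the intended U+0123 (g with cedilla).
naive = unicodedata.normalize("NFC", "g" + ACUTE)
print(f"U+{ord(naive):04X}", f"U+{ord(compose(0xC2, 'g')):04X}")  # U+01F5 U+0123
```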

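Unicode splits the character at 0xE2 into two separate capitals, D with stroke (U+0110) and Eth (U+00D0). The two capitals look alike, whereas the corresponding lowercase letters usually look different and have separate codes here as well (0xF2 for đ and 0xF3 for ð).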
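One consequence is that converting from Unicode to ISO/IEC 6937 and back cannot always restore the original capital. A minimal sketch follows; the names `TO_6937` and `decode_e2` and the `prefer_eth` switch are assumptions for illustration only.

```python
# Unicode character -> ISO/IEC 6937 byte, for the four letters discussed above.
TO_6937 = {
    "\u0110": 0xE2,  # capital D with stroke
    "\u00d0": 0xE2,  # capital Eth, same byte as D with stroke
    "\u0111": 0xF2,  # small d with stroke
    "\u00f0": 0xF3,  # small eth
}


def decode_e2(prefer_eth: bool = False) -> str:
    """Pick a Unicode capital for byte 0xE2; the caller must know the language."""
    return "\u00d0" if prefer_eth else "\u0110"


# Both capitals collapse onto one byte, the lowercase letters do not.
assert TO_6937["\u0110"] == TO_6937["\u00d0"]
assert TO_6937["\u0111"] != TO_6937["\u00f0"]
```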

ISO/IEC 6937 (Latin)
Each row lists its cells in column order _0 to _F. A cell shows the character followed by its Unicode code point; "--" marks a position with no character allocated. Combining marks are shown on a dotted circle. Rows 0_, 1_, 8_ and 9_ are empty in this chart.
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
0_
1_
2_  SP 0020   ! 0021   " 0022   # 0023   $ 0024   % 0025   & 0026   ' 0027   ( 0028   ) 0029   * 002A   + 002B   , 002C   - 002D   . 002E   / 002F
3_  0 0030   1 0031   2 0032   3 0033   4 0034   5 0035   6 0036   7 0037   8 0038   9 0039   : 003A   ; 003B   < 003C   = 003D   > 003E   ? 003F
4_  @ 0040   A 0041   B 0042   C 0043   D 0044   E 0045   F 0046   G 0047   H 0048   I 0049   J 004A   K 004B   L 004C   M 004D   N 004E   O 004F
5_  P 0050   Q 0051   R 0052   S 0053   T 0054   U 0055   V 0056   W 0057   X 0058   Y 0059   Z 005A   [ 005B   \ 005C   ] 005D   ^ 005E   _ 005F
6_  ` 0060   a 0061   b 0062   c 0063   d 0064   e 0065   f 0066   g 0067   h 0068   i 0069   j 006A   k 006B   l 006C   m 006D   n 006E   o 006F
7_  p 0070   q 0071   r 0072   s 0073   t 0074   u 0075   v 0076   w 0077   x 0078   y 0079   z 007A   { 007B   | 007C   } 007D   ~ 007E   --
8_
9_
A_  NBSP 00A0   ¡ 00A1   ¢ 00A2   £ 00A3   --   ¥ 00A5   --   § 00A7   ¤ 00A4   ‘ 2018   “ 201C   « 00AB   ← 2190   ↑ 2191   → 2192   ↓ 2193
B_  ° 00B0   ± 00B1   ² 00B2   ³ 00B3   × 00D7   µ 00B5   ¶ 00B6   · 00B7   ÷ 00F7   ’ 2019   ” 201D   » 00BB   ¼ 00BC   ½ 00BD   ¾ 00BE   ¿ 00BF
C_  --   ◌̀ 0300   ◌́ 0301   ◌̂ 0302   ◌̃ 0303   ◌̄ 0304   ◌̆ 0306   ◌̇ 0307   ◌̈ 0308   --   ◌̊ 030A   ◌̧ 0327   --   ◌̋ 030B   ◌̨ 0328   ◌̌ 030C
D_  ― 2015   ¹ 00B9   ® 00AE   © 00A9   ™ 2122   ♪ 266A   ¬ 00AC   ¦ 00A6   --   --   --   --   ⅛ 215B   ⅜ 215C   ⅝ 215D   ⅞ 215E
E_  Ω 2126   Æ 00C6   Đ/Ð 0110/00D0   ª 00AA   Ħ 0126   --   Ĳ 0132   Ŀ 013F   Ł 0141   Ø 00D8   Œ 0152   º 00BA   Þ 00DE   Ŧ 0166   Ŋ 014A   ŉ 0149
F_  ĸ 0138   æ 00E6   đ 0111   ð 00F0   ħ 0127   ı 0131   ĳ 0133   ŀ 0140   ł 0142   ø 00F8   œ 0153   ß 00DF   þ 00FE   ŧ 0167   ŋ 014B   SHY 00AD


ETS 300 706 version

The ETS 300 706 standard for World System Teletext bases its G2 set on ISO 6937.[4] That set is a superset of the right-hand side of T.61, and almost a superset of the right-hand side of T.51: only the characters at positions 0xD6, 0xD7 and 0xFF differ. Diacritic codes in the ETS version are specified as being "for association with" characters from the G0 set in use,[4] such as US-ASCII or BS_viewdata. This version is shown in the chart below.

World System Teletext, Latin G2 Set (ETS 300 706:1997)[4]
Each row lists its cells in column order _0 to _F. A cell shows the character followed by its Unicode code point; "--" marks a position with no character allocated. Combining marks are shown on a dotted circle. Rows 8_ and 9_ are empty in this chart.
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
8_
9_
A_  NBSP 00A0   ¡ 00A1   ¢ 00A2   £ 00A3   $ 0024   ¥ 00A5   # 0023   § 00A7   ¤ 00A4   ‘ 2018   “ 201C   « 00AB   ← 2190   ↑ 2191   → 2192   ↓ 2193
B_  ° 00B0   ± 00B1   ² 00B2   ³ 00B3   × 00D7   µ 00B5   ¶ 00B6   · 00B7   ÷ 00F7   ’ 2019   ” 201D   » 00BB   ¼ 00BC   ½ 00BD   ¾ 00BE   ¿ 00BF
C_  --   ◌̀ 0300   ◌́ 0301   ◌̂ 0302   ◌̃ 0303   ◌̄ 0304   ◌̆ 0306   ◌̇ 0307   ◌̈ 0308   ◌̣ 0323   ◌̊ 030A   ◌̧ 0327   ◌̲ 0332   ◌̋ 030B   ◌̨ 0328   ◌̌ 030C
D_  ― 2015   ¹ 00B9   ® 00AE   © 00A9   ™ 2122   ♪ 266A   ₠ 20A0   ‰ 2030   ∝ 221D   --   --   --   ⅛ 215B   ⅜ 215C   ⅝ 215D   ⅞ 215E
E_  Ω 2126   Æ 00C6   Đ/Ð 0110/00D0   ª 00AA   Ħ 0126   --   Ĳ 0132   Ŀ 013F   Ł 0141   Ø 00D8   Œ 0152   º 00BA   Þ 00DE   Ŧ 0166   Ŋ 014A   ŉ 0149
F_  ĸ 0138   æ 00E6   đ 0111   ð 00F0   ħ 0127   ı 0131   ĳ 0133   ŀ 0140   ł 0142   ø 00F8   œ 0153   ß 00DF   þ 00FE   ŧ 0167   ŋ 014B   ■ 25A0

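The three positions where the two right-hand sets disagree can be written down as a small table, using the code points listed in the charts above. This is a sketch only: the names `DIFFERING` and `same_in_both` are invented here, and the check says nothing about positions that are allocated in one set but not the other.

```python
# Byte -> (character in T.51 / ISO/IEC 6937, character in the ETS 300 706 G2 set)
DIFFERING = {
    0xD6: ("\u00ac", "\u20a0"),  # not sign / U+20A0
    0xD7: ("\u00a6", "\u2030"),  # broken bar / per mille sign
    0xFF: ("\u00ad", "\u25a0"),  # soft hyphen / black square
}


def same_in_both(byte: int) -> bool:
    """True unless the byte is one of the three positions that differ."""
    return byte not in DIFFERING


for byte, (t51, ets) in sorted(DIFFERING.items()):
    print(f"{byte:#04x}: T.51 U+{ord(t51):04X}, ETS U+{ord(ets):04X}")
```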

References

  1. "T.51 : Latin based coded character sets for telematic services". www.itu.int. Archived from the original on 2019-10-08. Retrieved 2019-11-14.
  2. Petersen, J. K. (2002-05-29). The Telecommunications Illustrated Dictionary. CRC Press. p. 888. ISBN 9781420040678.
  3. Supplementary Set of ISO/IEC 6937:1992. The high-ASCII half of the character set. (The left-hand side is U.S. ASCII.)
  4. ETSI (1997). "15.6.3 Latin G2 Set". Enhanced Teletext specification (PDF). p. 116. ETS 300 706.
