Unicode 6.2.0
Released: 2012 September 26 (Announcement)
Version 6.2.0 has been superseded by the latest version of the Unicode Standard.
Unicode 6.2.0 is a minor version of the Unicode Standard. This page summarizes the important changes for the Unicode Standard, Version 6.2.0. In the discussion below, Version 6.2.0 may be abbreviated as "Unicode 6.2" or "Version 6.2."
B. Version Information
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Unicode Character Database Changes
G. Unicode Standard Annex Changes
Version 6.2 of the Unicode Standard is a special release dedicated to the early publication of the newly encoded Turkish lira sign. This version also rolls in various minor corrections for errata and other small updates for the Unicode Character Database. In addition, there are some significant changes to the Unicode algorithms for text segmentation and line breaking, including changes to the line break property to improve line breaking for emoji symbols.
For detailed property changes see Section F. Unicode Character Database Changes.
Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 6.2:
This version of the Unicode Standard is synchronized with ISO/IEC 10646:2012, plus the accelerated publication of a single character: U+20BA TURKISH LIRA SIGN.
Version 6.2 of the Unicode Standard consists of the core specification, the delta and archival code charts for this version, the Unicode Standard Annexes, and the Unicode Character Database (UCD).
The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.
Version 6.2.0 of the Unicode Standard should be referenced as:
The Unicode Consortium. The Unicode Standard, Version 6.2.0, (Mountain View, CA: The Unicode Consortium, 2012. ISBN 978-1-936213-07-8)
A complete specification of the contributory files for Unicode 6.2 is found on the page Components for 6.2.0. That page also provides the recommended reference format for Unicode Standard Annexes.
The navigation bar on the left of this page provides links to the core specification as a single file, to individual chapters, and to the appendices. Also provided are links to the code charts, the radical-stroke indices to CJK ideographs, the Unicode Standard Annexes, and the data files for Version 6.2 of the Unicode Character Database.
Several sets of code charts are available. They serve different purposes:
- The latest set of code charts for the Unicode Standard is available online. Those charts are always the most current code charts available, and may be updated at any time. The charts are organized by scripts and blocks for easy reference. An online index by character name is also provided.
For Unicode 6.2.0 in particular, two additional sets of code chart pages are provided:
- A set of delta code charts showing the block in which the Turkish lira sign was added for Unicode 6.2.0. That character is visually highlighted in the relevant chart. These delta code charts also include blocks which contain significant glyph changes to fix errata.
- A set of archival code charts that represent the entire set of characters, names and representative glyphs at the time of publication of Unicode 6.2.0.
The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.
A property value constraint has been added to guarantee that no new characters will be added to the standard with Decomposition_Mapping values whose first character has a non-zero Canonical_Combining_Class. There are four exceptions, which were encoded long ago, prior to Unicode 2.1.
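The constraint can be explored with a short script. The following is a minimal sketch, not a normative check: it assumes the constraint concerns canonical decompositions that expand to more than one character, and it relies on Python's `unicodedata` module, which reflects the Unicode version bundled with the Python runtime rather than Version 6.2 specifically. The function name `non_starter_decompositions` is hypothetical.

```python
import unicodedata


def non_starter_decompositions():
    """Return code points whose multi-character canonical decomposition
    begins with a character of non-zero Canonical_Combining_Class.

    Assumption: only canonical mappings of length two or more are
    considered; compatibility mappings and singletons are skipped.
    """
    found = []
    for cp in range(0x110000):
        mapping = unicodedata.decomposition(chr(cp))
        # An empty string means no decomposition; a leading "<tag>"
        # marks a compatibility decomposition.
        if not mapping or mapping.startswith("<"):
            continue
        parts = mapping.split()
        if len(parts) < 2:
            continue
        first = chr(int(parts[0], 16))
        if unicodedata.combining(first) != 0:
            found.append(cp)
    return found


if __name__ == "__main__":
    for cp in non_starter_decompositions():
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

Because the policy forbids new characters of this kind, the list should stay the same size regardless of which later Unicode version the runtime ships with.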
Note: The Unicode Character Encoding Stability Policy restricts possible future changes to the Unicode Standard, but is not formally a part of the standard itself.
Textual changes are minimal in this version, and are essentially limited to adding a description for the new Turkish lira sign.
Character Assignment Overview
One new character assignment was made in the BMP for the Unicode Standard, Version 6.2. This addition brings the total number of characters assigned in the standard to 110,117. (That is the traditional count, which totals up graphic and format characters, but omits surrogate code points, ISO control codes, noncharacters, and private-use allocations.)
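The traditional count can be approximated from General_Category values. This is a rough sketch under stated assumptions: it uses Python's `unicodedata` module, whose data corresponds to the Unicode version bundled with the Python runtime, so on a runtime newer than Version 6.2 the result will be larger than 110,117. The function name `traditional_character_count` is hypothetical.

```python
import unicodedata

# General_Category values left out of the traditional count:
# Cn = unassigned code points and noncharacters, Cs = surrogates,
# Cc = ISO control codes, Co = private-use allocations.
EXCLUDED = {"Cn", "Cs", "Cc", "Co"}


def traditional_character_count():
    """Count graphic and format characters known to this runtime."""
    return sum(
        1
        for cp in range(0x110000)
        if unicodedata.category(chr(cp)) not in EXCLUDED
    )


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    print("Traditional count:", traditional_character_count())
    # The single character added in Version 6.2.
    print(unicodedata.name("\u20BA"))
```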
No new blocks are defined in Version 6.2.
The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 6.2 can be found in UAX #44, Unicode Character Database.
Segmentation properties (Grapheme_Cluster_Break, Word_Break, Line_Break) have been modified to improve the segmentation of regional indicator symbols. Other modifications have been made to the Line_Break property values for pictographic symbols, to enable better line breaking behavior. A number of small corrections have also been made for numeric, East Asian width, script, and Unihan properties, and one name alias correction has been added.
Starting with Version 6.2, the encoding for the Unicode names list file (NamesList.txt) has been changed from Latin-1 to UTF-8. This change became possible because of an update of the charting tools which use the names list file in the production of the Unicode code charts.
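Tools that read NamesList.txt from several releases need to pick the encoding by version. The sketch below illustrates one way to do that; the helper names `names_list_encoding` and `decode_names_list` and the version-tuple parameter are hypothetical, and the only fact taken from the text above is the switch from Latin-1 to UTF-8 at Version 6.2.

```python
def names_list_encoding(version):
    """Return the encoding of NamesList.txt for a (major, minor) version.

    Assumption: every release before 6.2 used Latin-1, as stated above.
    """
    return "utf-8" if tuple(version) >= (6, 2) else "latin-1"


def decode_names_list(data, version):
    """Decode the raw bytes of NamesList.txt for the given version."""
    return data.decode(names_list_encoding(version))


if __name__ == "__main__":
    sample = "caf\u00e9"  # illustrative text, not taken from the file
    print(decode_names_list(sample.encode("latin-1"), (6, 1)))
    print(decode_names_list(sample.encode("utf-8"), (6, 2)))
```

Choosing by version avoids guessing the encoding from the bytes, which can misidentify Latin-1 data that happens to be valid UTF-8.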
The U-Source data and glyphs associated with UAX #45 have been added to the Unicode Character Database.
The Script_Extensions property was changed from provisional to informative.
In Version 6.2, many of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.
- UAX #9, Unicode Bidirectional Algorithm: No significant changes in this version.
- UAX #11, East Asian Width: A note was added to definition ED3 in Section 4 to explain the East Asian Halfwidth property of U+20A9 WON SIGN.
- UAX #14, Unicode Line Breaking Algorithm: The text was modified so that property values and rules prevent breaks between Regional Indicator (RI) characters. (Sequences of more than two RI characters should be separated by other characters, such as U+200B ZWSP.)
- UAX #15, Unicode Normalization Forms: Additional equivalences were added to the Design Goals.
- UAX #24, Unicode Script Property: The text was rewritten substantially to incorporate a fuller explanation of the Script_Extensions property and its property value assignments. A disclaimer was added about the stability of Script and Script_Extensions property values.
- UAX #29, Unicode Text Segmentation: The text was modified so that property values and rules prevent breaks between Regional Indicator (RI) characters. (Sequences of more than two RI characters should be separated by other characters, such as U+200B ZWSP.) Regular expressions have been clarified in Table 1b, Combining Character Sequences and Grapheme Clusters.
- UAX #31, Unicode Identifier and Pattern Syntax: No significant changes in this version.
- UAX #34, Unicode Named Character Sequences: No significant changes in this version.
- UAX #38, Unicode Han Database (Unihan): No significant changes in this version.
- UAX #41, Common References for Unicode Standard Annexes: No significant changes in this version.
- UAX #42, Unicode Character Database in XML: No significant changes in this version.
- UAX #44, Unicode Character Database: The status of Script_Extensions was updated to informative and the type of Bidi_Mirroring was updated from String to Miscellaneous. The Unicode_1_Name property was marked as obsolete. A clarification was added regarding change control for normative and informative property values.
- UAX #45, U-Source Ideographs: UAX #45 has been updated from a Unicode Technical Report to a Unicode Standard Annex for this version. The data files for UAX #45 have been added to the Unicode Character Database.
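The Regional Indicator behavior described for UAX #14 and UAX #29 can be illustrated with a toy segmenter. This is a deliberately simplified sketch, not an implementation of either annex: it assumes the Regional Indicator characters are the range U+1F1E6..U+1F1FF, it breaks between every other pair of code points, and it ignores all remaining segmentation rules. The function names are hypothetical.

```python
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF  # Regional Indicator Symbol Letters A..Z


def is_regional_indicator(ch):
    return RI_FIRST <= ord(ch) <= RI_LAST


def split_keeping_ri_runs(text):
    """Split text into chunks, never breaking between two adjacent
    Regional Indicator characters. All other code points become
    separate chunks, which is a simplification for illustration only."""
    chunks = []
    for ch in text:
        previous_is_ri = bool(chunks) and is_regional_indicator(chunks[-1][-1])
        if previous_is_ri and is_regional_indicator(ch):
            chunks[-1] += ch  # no break between RI characters
        else:
            chunks.append(ch)
    return chunks


if __name__ == "__main__":
    fr = "\U0001F1EB\U0001F1F7"  # RI pair F + R
    de = "\U0001F1E9\U0001F1EA"  # RI pair D + E
    # Adjacent pairs merge into one unbreakable run of four RI characters.
    print(len(split_keeping_ri_runs(fr + de)))
    # Inserting U+200B ZWSP between the pairs restores a boundary.
    print(len(split_keeping_ri_runs(fr + "\u200B" + de)))
```

The second call shows why the annexes recommend separating longer RI sequences with a character such as U+200B: without it, the rule gives no boundary at which two adjacent pairs can be told apart.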