Linked Languages Resources
A contribution to the Web of Data
by Bernard Vatant, Mondeca
The resources of Lingvoj are now maintained by Ghislain Atemezing @ Mondeca.
Lingvoj Ontology v2.33. Added Portuguese translations, thanks to Mariana Curado Malta.
Updated the translation examples to conform to the latest ontology version.
Lingvoj Ontology v2.32. Added Japanese translations, thanks to Shuji Kamitsuna.
New dataset, linking ISO 639 codes to various linguistic resources, some of them not yet known in the linked data ecosystem. Both an HTML table and an RDF dump (Turtle file) are available. More resources may be added at any time.
You can now search lingvoj.org pages using the Freebase Search Widget. You can enter search values in any of the nine languages supported by Freebase search: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Chinese (zh), Japanese (ja) and Korean (ko). However, suggestions are offered in the first language of the list (English), as explained in the Freebase Search Cookbook. A future improvement will be to let you select your search language. On the French version of the pages, suggestions are in French.
How does it work? By the magic of shared keys: the ISO codes. The widget calls the Freebase API, filtering the autocomplete values by the type of thing we are looking for here: human languages. On selection, it checks whether the selected object has an ISO 639-3 code. If so, this code is used to redirect to the matching lingvoj.org page. If no ISO 639-3 code is found, nothing happens. In that case you can go to the Freebase record and add the code if it is really missing. It may also be that the language is not yet acknowledged by ISO.
Thanks to the Freebase team for their helpful support in figuring this out!
Note that the Freebase API answer might sometimes be slow or time out.
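The lookup-and-redirect step described above can be sketched as follows. This is a minimal illustration, not the widget's actual code: the record shape (a dictionary with an `iso639_3` key), the function name `redirect_target` and the page URL template are all assumptions made for the example, and no call to the Freebase API is made.

```python
from typing import Optional

# Hypothetical page pattern: the real lingvoj.org URL layout is not
# stated in the text, so a placeholder host is used here.
PAGE_TEMPLATE = "https://example.org/lingvoj/{code}"


def redirect_target(selected: dict) -> Optional[str]:
    """Return the page URL for the selected language record.

    `selected` stands in for the object picked in the search widget.
    Returns None when the record carries no ISO 639-3 code, which
    corresponds to the "nothing happens" case.
    """
    code = selected.get("iso639_3")
    if not code:
        return None
    return PAGE_TEMPLATE.format(code=code)
```

The ISO 639-3 code is the only thing the two sides need to share: the search service identifies the language, and the code alone is enough to build the address of the matching page.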
New individual pages for each language defined by an ISO 639 code (7874 languages to date), providing structured data and language resources.
Lingvoj Ontology v2.3 including representation of language resources, used in the above pages. (Turtle version)
Browse the full list of languages, or explore some examples : [ar], [fr], [sw], [zh]
2013-02-07 : Lingvoj Ontology v2.1
Version 2 includes more detailed properties to describe the levels of knowledge of a person in a language (understood, spoken, read, written) and the various ways a language is linked to a country (official, main, regional, minority) or to an organization, a project or an event. It also includes properties to describe the status of a language at a given date (living, endangered, extinct). All elements are named and defined in English, French (v2.1) and Spanish (v2.2), with more translations expected.
The ontology is available in Turtle syntax with inline comments. The HTML documentation is automatically generated from the Turtle file by redirecting the ontology namespace, via content negotiation, to the cool Parrot service.
This version also relies on the new Ontology Loose Coupling Annotation vocabulary to attach such properties to classes in popular ontologies without constraining their domain too much. Background reading on this new vocabulary is available in this blog post.
Why do we need that?
Languages are an endangered heritage
According to Ethnologue, almost 7,000 human languages are currently in use in the world. About half of them could be extinct before the end of this century. Only a small fraction of them are supported by a writing system and have a written heritage, and among those, fewer still are used in modern information systems and on the Web. A good indication of the number of languages used on the Web is provided by the multilingual editions of Wikipedia: to date 285 different languages, that is, less than 5% of all known languages. The ranking of languages by the importance of their respective Wikipedia editions is a fairly good indicator of the Web influence of their communities of speakers, but it is very different from the ranking based on the number of speakers.
We need languages as Linked Data
In current XML and RDF practice, languages are identified by tags, typically used in the "xml:lang" attribute. The allowed values of tags are defined by BCP 47. Those language tags are typically used for rdfs:label or rdfs:comment, and allow the filtering of such elements of description by language, for example in SPARQL queries. But they do not provide support for queries such as:
- "Can I find native speakers of Bengali in Berlin?"
- "Which books by Victor Hugo are translated into Arabic?"
- "Is this software documented in Chinese?"
To answer such queries, languages need to be represented as resources, likely to be linked to other resources representing books, people, organizations, places, events, products ... through dedicated properties. Such properties can be found in the Lingvoj Ontology. URIs for languages have been defined in lingvoj.org namespace since 2007, and many other URIs have been defined afterwards in the linked data cloud. Since 2010 lingvoj.org URIs mainly redirect to those of lexvo.org. More details below.
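The difference between a language tag and a language resource can be illustrated with a small in-memory sketch. The triples, the `ex:` identifiers and both helper functions are hypothetical and are not taken from the Lingvoj Ontology; they only show why a tag can filter labels, while a question about translations needs the language to be a linked resource.

```python
# Triples are (subject, predicate, object) tuples. A language-tagged
# literal is modelled as a (text, tag) pair. All names are illustrative.
TRIPLES = [
    ("ex:NotreDame", "rdfs:label", ("Notre-Dame de Paris", "fr")),
    ("ex:NotreDame", "rdfs:label", ("The Hunchback of Notre-Dame", "en")),
    ("ex:NotreDame", "ex:author", "ex:VictorHugo"),
    ("ex:NotreDame_ar", "ex:translationOf", "ex:NotreDame"),
    ("ex:NotreDame_ar", "ex:language", "ex:Arabic"),
    ("ex:Hernani", "ex:author", "ex:VictorHugo"),
]


def labels_with_tag(triples, tag):
    """Filter labels by language tag: all that a tag alone supports."""
    return [o[0] for s, p, o in triples
            if p == "rdfs:label" and isinstance(o, tuple) and o[1] == tag]


def works_translated_into(triples, author, language):
    """Find works by `author` that have a translation linked to `language`."""
    by_author = {s for s, p, o in triples if p == "ex:author" and o == author}
    in_language = {s for s, p, o in triples
                   if p == "ex:language" and o == language}
    return sorted(o for s, p, o in triples
                  if p == "ex:translationOf"
                  and s in in_language and o in by_author)
```

In this sketch `labels_with_tag(TRIPLES, "fr")` can only select a string, whereas `works_translated_into(TRIPLES, "ex:VictorHugo", "ex:Arabic")` follows links through `ex:Arabic`, which is possible only because the language is a node in the data rather than an annotation on a literal.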
2010-05-10 : lingvoj.org meets Lexvo.org
Since the launch of lingvoj.org in 2007, the linked data cloud has grown at a steady pace, and a growing number of URI sets have been published to identify human languages. Lexvo.org provides the most exhaustive of those so far, in which URIs for languages are integrated into a global approach to terminology. Through exchanges with Gerard de Melo, editor at Lexvo.org, it has been decided to deprecate lingvoj.org URIs for individual languages and redirect them, in favor of the more stable and exhaustive publication at Lexvo.org.
From that date, most lingvoj.org URIs for individual languages are redirected to lexvo.org URIs through content negotiation. The few exceptions are URIs of languages with no ISO 639-3 code, since lexvo.org URIs are built on those codes, and languages with a regional tag, such as en-us.
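From a client's point of view, content negotiation means sending an `Accept` header and letting the server choose the representation or redirect. The sketch below uses only the Python standard library; the language URI and the RDF media type are assumptions chosen for illustration, and the server's current behaviour is not guaranteed by this example.

```python
import urllib.request

# Assumed for illustration: the URI pattern and the RDF media type.
LANGUAGE_URI = "http://www.lingvoj.org/lang/fr"
RDF_MEDIA_TYPE = "application/rdf+xml"


def build_rdf_request(uri, media_type=RDF_MEDIA_TYPE):
    """Build a GET request asking for an RDF representation of `uri`."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def resolve(uri):
    """Follow any redirects and return the final URL.

    This performs a real network call; urlopen follows HTTP
    redirects by default.
    """
    request = build_rdf_request(uri)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.geturl()
```

Calling `resolve(LANGUAGE_URI)` would show where the request ends up after redirection, if the server is reachable.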
The lingvo-to-lexvo RDF file provides the mappings and equivalences between lingvoj.org and lexvo.org URIs. Applications using lingvoj.org URIs are invited to change their references accordingly, although the redirection mechanism should prevent those applications from breaking.
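For applications that prefer to update their references rather than rely on redirection, the rewrite amounts to a lookup. The sketch below uses a hand-written dictionary as a stand-in for the mapping file; the lingvoj.org URI pattern and the specific pairs are assumptions, and the actual file should be treated as the authority.

```python
# Stand-in for the published mappings. These pairs are illustrative
# assumptions, not values read from the actual RDF file.
LINGVOJ_TO_LEXVO = {
    "http://www.lingvoj.org/lang/fr": "http://lexvo.org/id/iso639-3/fra",
    "http://www.lingvoj.org/lang/sw": "http://lexvo.org/id/iso639-3/swa",
}


def to_lexvo(uri, mapping=LINGVOJ_TO_LEXVO):
    """Return the mapped URI, or the input unchanged if it has no mapping.

    Unmapped URIs cover the exceptions: languages without an
    ISO 639-3 code and tags with a regional part.
    """
    return mapping.get(uri, uri)


def rewrite_references(uris, mapping=LINGVOJ_TO_LEXVO):
    """Rewrite a list of URIs, preserving order."""
    return [to_lexvo(uri, mapping) for uri in uris]
```

Leaving unmapped URIs untouched keeps the rewrite safe to run repeatedly and avoids inventing targets for the exceptions.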
2009-04-06 : Lingvoj Ontology v1.3 introducing the use of dcterms:language, as a superproperty of various lingvoj object properties, and its inverse property "is language of", used to link to active Wikipedia in the language when available (265 such languages to date).
2007-11-29 : Lingvoj Ontology v1.1 including the Translation class, which allows declaring facts such as: the resource A in original language L1 has been translated into resource B in target language L2 by the translator Z. Examples of translations.
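A translation fact of this kind could be modelled as below. This is only a sketch of the shape of the statement: the `ex:` property names and the identifiers are placeholders, not the terms actually defined in the Lingvoj Ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    source: str           # resource A, the original work
    source_language: str  # L1
    target: str           # resource B, the translated work
    target_language: str  # L2
    translator: str       # Z

    def as_triples(self, node):
        """Return the fact as (subject, predicate, object) tuples about `node`."""
        return [
            (node, "rdf:type", "ex:Translation"),
            (node, "ex:source", self.source),
            (node, "ex:sourceLanguage", self.source_language),
            (node, "ex:target", self.target),
            (node, "ex:targetLanguage", self.target_language),
            (node, "ex:translator", self.translator),
        ]
```

Treating the translation as a node of its own, rather than a single link from A to B, is what leaves room for both languages and the translator to be attached to the same fact.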
2007-10-09 : Finally, with the precious help of the Linking Open Data community, publication with proper content negotiation has been achieved. It works well with Firefox. For some reason this content negotiation is not well supported by Internet Explorer.
Note that this results in new URIs for languages. URIs used in previous versions are no longer supported. Cool URIs never change, which means the previous ones were not cool, and the new ones should be stable from now on.
Resources provided here have no official status, but URIs in the lingvoj namespaces are intended to remain "cool", which means stable and dereferenceable.