Towards the Multilingual Web of Data

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (15 September 2018) | Viewed by 40939

Special Issue Editors


Guest Editor
Data Science Institute/Insight Centre for Data Analytics, National University of Ireland Galway, H91 CF50 Galway, Ireland
Interests: natural language processing; semantic web

Guest Editor
Ontology Engineering Group, Artificial Intelligence Department, Universidad Politécnica de Madrid, Madrid, Spain
Interests: semantic web; linguistic linked data; multilingualism; query interpretation; ontology matching

Special Issue Information

Dear Colleagues,

The MDPI Information Journal invites submissions to a special issue on “Towards the Multilingual Web of Data”.

The Web of Data has increasingly become a space where concepts are described not only with logic and ontologies but also with linguistic information in the form of multilingual lexicons, terminologies, and thesauri. In particular, this has led to the creation of a growing cloud of linguistically linked open data, which bridges the world of ontologies with dictionaries, corpora and other linguistic resources. This raises several challenges, including ontology localisation, cross-lingual question answering, cross-lingual ontology and data matching, and the representation of lexical information on the Web of Data.

Furthermore, NLP and machine learning for linked data can benefit from exploiting multilingual language resources, such as annotated corpora, wordnets, and bilingual dictionaries, if they are themselves formally represented and linked by following the linked data principles. A critical mass of language resources as linked data on the Web is leading to a new generation of linked-data-aware NLP techniques and tools, which, in turn, will serve as a basis for a richer, multilingual Web.

This Special Issue is concerned with groundbreaking topics at the interface of the Semantic Web, language resources and NLP, with particular emphasis on multilingual aspects.

Topics of the call

  • Linguistic Linked Open Data
  • NLP on the Semantic Web
  • Multilinguality and semantics
  • Best practices for multilingual linked data
  • Validation, quality and legal issues for multilingual linked data
  • Language resources published as linked data
  • Ontologies, terminologies and models for multilingual linked data
  • Publishing language resources as linked data using language description models such as OntoLex-lemon
  • Cross-lingual information access, search and retrieval
  • NLP and machine learning approaches for the Semantic Web and Linked Data

Papers must be 9–15 pages long and formatted according to the MDPI template. Complete instructions for authors can be found at:

https://www.mdpi.com/journal/information/instructions

Important Dates
Submission: 15 December 2017
Notification: January 2018

Dr. John P. McCrae
Dr. Jorge Gracia
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Linked Data
  • Semantic Web
  • Linguistic Linked Data
  • Multilinguality
  • Language Resources
  • Lexicography
  • Ontology-Lexicon Interface
  • Natural Language Processing
  • Ontologies
  • Terminologies

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)


Editorial

2 pages, 167 KiB  
Editorial
Foreword to the Special Issue: “Towards the Multilingual Web of Data”
by John P. McCrae and Jorge Gracia
Information 2019, 10(2), 56; https://doi.org/10.3390/info10020056 - 9 Feb 2019
Cited by 1 | Viewed by 2720
Abstract
We are pleased to introduce this special issue on the topic of “Towards the Multilingual Web of Data”, which we feel is a timely and valuable topic in our increasingly multilingual and interconnected world [...]
(This article belongs to the Special Issue Towards the Multilingual Web of Data)

Research

17 pages, 1880 KiB  
Article
Towards the Representation of Etymological Data on the Semantic Web
by Anas Fahad Khan
Information 2018, 9(12), 304; https://doi.org/10.3390/info9120304 - 30 Nov 2018
Cited by 15 | Viewed by 5282
Abstract
In this article, we look at the potential for a wide-coverage modelling of etymological information as linked data using the Resource Description Framework (RDF) data model. We begin with a discussion of some of the most typical features of etymological data and the challenges that these might pose to an RDF-based modelling. We then propose a new vocabulary for representing etymological data, the OntoLex-lemon Etymological Extension (lemonETY), based on the OntoLex-lemon model. Each of the main elements of our new model is motivated with reference to the preceding discussion.

24 pages, 2248 KiB  
Article
Semantic Modelling and Publishing of Traditional Data Collection Questionnaires and Answers
by Yalemisew Abgaz, Amelie Dorn, Barbara Piringer, Eveline Wandl-Vogt and Andy Way
Information 2018, 9(12), 297; https://doi.org/10.3390/info9120297 - 24 Nov 2018
Cited by 14 | Viewed by 6436
Abstract
Extensive collections of data of linguistic, historical and socio-cultural importance are stored in libraries, museums and national archives with enormous potential to support research. However, a sizable portion of the data remains underutilised because of a lack of the required knowledge to model the data semantically and convert it into a format suitable for the semantic web. Although many institutions have produced digital versions of their collections, semantic enrichment, interlinking and exploration are still missing from digitised versions. In this paper, we present a model that provides structure and semantics to a non-standard linguistic and historical data collection, using the example of the Bavarian dialects in Austria at the Austrian Academy of Sciences. We followed a semantic modelling approach that utilises the knowledge of domain experts and the corresponding schema produced during the data collection process. The model is used to enrich, interlink and publish the collection semantically. The dataset includes questionnaires and answers as well as supplementary information about the circumstances of the data collection (person, location, time, etc.). The semantic uplift is demonstrated by converting a subset of the collection to a Linked Open Data (LOD) format, where domain experts evaluated the model and the resulting dataset for its support of user queries.

16 pages, 591 KiB  
Article
Annotating a Low-Resource Language with LLOD Technology: Sumerian Morphology and Syntax
by Christian Chiarcos, Ilya Khait, Émilie Pagé-Perron, Niko Schenk, Jayanth, Christian Fäth, Julius Steuer, William Mcgrath and Jinyan Wang
Information 2018, 9(11), 290; https://doi.org/10.3390/info9110290 - 19 Nov 2018
Cited by 15 | Viewed by 6618
Abstract
This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low-resource languages in general. Cuneiform texts are invaluable sources for the study of history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, has vast research potential, but lacks the modern means for computational processing and analysis. Our project, Machine Translation and Automated Analysis of Cuneiform Languages, aims to fill this gap by bringing together corpus data, lexical data, linguistic annotations and object metadata. The project’s main goal is to build a pipeline for machine translation and annotation of Sumerian Ur III administrative texts. The rich and structured data is then to be made accessible in the form of (Linguistic) Linked Open Data (LLOD), which should open it to a larger research community. Our contribution is two-fold: in terms of language technology, our work represents the first attempt to develop an integrative infrastructure for the annotation of morphology and syntax on the basis of RDF technologies and LLOD resources. With respect to Assyriology, we work towards producing the first syntactically annotated corpus of Sumerian.

30 pages, 2555 KiB  
Article
Conversion of the English-Xhosa Dictionary for Nurses to a Linguistic Linked Data Framework
by Frances Gillis-Webber
Information 2018, 9(11), 274; https://doi.org/10.3390/info9110274 - 6 Nov 2018
Cited by 4 | Viewed by 4551
Abstract
The English-Xhosa Dictionary for Nurses (EXDN) is a bilingual, unidirectional printed dictionary in the public domain, with English and isiXhosa as the language pair. By extending the digitisation efforts of EXDN from a human-readable digital object to a machine-readable state, using Resource Description Framework (RDF) as the data model, semantically interoperable structured data can be created, thus enabling EXDN’s data to be reused, aggregated and integrated with other language resources, where it can serve as a potential aid in the development of future language resources for isiXhosa, an under-resourced language in South Africa. The methodological guidelines for the construction of a Linguistic Linked Data framework (LLDF) for a lexicographic resource, as applied to EXDN, are described, where an LLDF can be defined as a framework: (1) which describes data in RDF, (2) using a model designed for the representation of linguistic information, (3) which adheres to Linked Data principles, and (4) which supports versioning, allowing for change. The result is a bidirectional lexicographic resource, previously bounded and static, now unbounded and evolving, with the ability to extend to multilingualism.
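
As a rough illustration of points (1) to (3) in this abstract, the sketch below serialises one bilingual entry as RDF in Turtle using OntoLex-lemon terms. The base URI, the helper functions, and the sample word pair are hypothetical and are not taken from EXDN or from the article. Versioning (point 4) is not shown, and lemmas are assumed to contain no characters that need escaping in a Turtle literal.

```python
from urllib.parse import quote

BASE = "http://example.org/lexicon/"  # hypothetical base URI, not from the article

PREFIXES = (
    "@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .\n"
    "@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .\n"
)


def entry_uri(lang, lemma):
    """Build an illustrative URI for a lexical entry."""
    return f"<{BASE}{lang}/{quote(lemma)}>"


def sense_uri(lang, lemma):
    """Build an illustrative URI for the single sense of an entry."""
    return f"<{BASE}{lang}/{quote(lemma)}#sense>"


def lexical_entry(lang, lemma):
    """Describe one lemma as an ontolex:LexicalEntry with one form and one sense."""
    return (
        f"{entry_uri(lang, lemma)} a ontolex:LexicalEntry ;\n"
        f'    ontolex:canonicalForm [ a ontolex:Form ; ontolex:writtenRep "{lemma}"@{lang} ] ;\n'
        f"    ontolex:sense {sense_uri(lang, lemma)} .\n"
    )


def translation(src_lang, src, tgt_lang, tgt):
    """Link the two senses with a vartrans:Translation node."""
    return (
        "[] a vartrans:Translation ;\n"
        f"    vartrans:source {sense_uri(src_lang, src)} ;\n"
        f"    vartrans:target {sense_uri(tgt_lang, tgt)} .\n"
    )


def entry_to_turtle(src_lang, src, tgt_lang, tgt):
    """Serialise a source/target word pair as a small Turtle document."""
    return "\n".join([
        PREFIXES,
        lexical_entry(src_lang, src),
        lexical_entry(tgt_lang, tgt),
        translation(src_lang, src, tgt_lang, tgt),
    ])


# Illustrative pair only; not drawn from the dictionary itself.
print(entry_to_turtle("en", "water", "xh", "amanzi"))
```

Because the translation is stated between senses rather than between strings, the same data can be read in either direction, which is one way to picture how a unidirectional printed dictionary could become a bidirectional resource.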

22 pages, 1304 KiB  
Article
Language-Agnostic Relation Extraction from Abstracts in Wikis
by Nicolas Heist, Sven Hertling and Heiko Paulheim
Information 2018, 9(4), 75; https://doi.org/10.3390/info9040075 - 29 Mar 2018
Cited by 15 | Viewed by 8100
Abstract
Large-scale knowledge graphs, such as DBpedia, Wikidata, or YAGO, can be enhanced by relation extraction from text, using the data in the knowledge graph as training data, i.e., using distant supervision. While most existing approaches use language-specific methods (usually for English), we present a language-agnostic approach that exploits background knowledge from the graph instead of language-specific techniques and builds machine learning models only from language-independent features. We demonstrate the extraction of relations from Wikipedia abstracts, using the twelve largest language editions of Wikipedia. From those, we can extract 1.6 M new relations in DBpedia at a level of precision of 95%, using a RandomForest classifier trained only on language-independent features. We furthermore investigate the similarity of models for different languages and show an exemplary geographical breakdown of the information extracted. In a second series of experiments, we show how the approach can be transferred to DBkWik, a knowledge graph extracted from thousands of Wikis. We discuss the challenges and first results of extracting relations from a larger set of Wikis, using a less formalized knowledge graph.
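
The distant-supervision idea in this abstract can be pictured with a minimal sketch: entities linked in an abstract are labelled positive for a relation when the knowledge graph already holds the corresponding triple, and each candidate is described only by features that do not depend on the language of the text. The toy graph, the `label_candidates` helper, and the two positional features are hypothetical simplifications, not the authors' feature set. Treating every unmatched entity as a negative example is also an assumption made here for brevity, and no classifier is trained.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    subject: str
    obj: str
    position: int              # 0-based rank of the linked entity in the abstract
    relative_position: float   # position divided by the number of linked entities
    label: bool                # True if the knowledge graph already has the triple


def label_candidates(subject, linked_entities, relation, kg):
    """Label each linked entity using triples already present in the graph.

    `kg` is a set of (subject, relation, object) tuples standing in for a
    real knowledge graph; no text of the abstract is inspected at all.
    """
    n = len(linked_entities)
    return [
        Candidate(subject, entity, i, i / n, (subject, relation, entity) in kg)
        for i, entity in enumerate(linked_entities)
    ]


# Toy data for illustration only.
toy_kg = {("Mannheim", "country", "Germany")}
links_in_abstract = ["Germany", "Rhine", "Neckar"]

for candidate in label_candidates("Mannheim", links_in_abstract, "country", toy_kg):
    print(candidate)
```

Labelled candidates of this kind could then be passed to any standard classifier. Since the features never look at the words of the abstract, the same pipeline applies unchanged to every language edition.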

13 pages, 1278 KiB  
Article
Multilingual and Multiword Phenomena in a lemon Old Occitan Medico-Botanical Lexicon
by Andrea Bellandi, Emiliano Giovannetti and Anja Weingart
Information 2018, 9(3), 52; https://doi.org/10.3390/info9030052 - 28 Feb 2018
Cited by 14 | Viewed by 5475
Abstract
This article illustrates the progress made in representing a multilingual and multi-alphabetical Old Occitan medico-botanical lexicon in the context of the project Dictionnaire de Termes Médico-botaniques de l’Ancien Occitan (DiTMAO). The chosen lexical model of reference is lemon, which has been extended according to some specific linguistic and lexical features of the lexicon. In particular, issues and solutions concerning the modeling of multilingual and multiword phenomena are discussed, as well as the way they are managed through LexO, a web editor developed in the context of the project.