| Field | Value |
|---|---|
| Description | Language resources are a cornerstone of linguistic research and of the development of natural language processing (NLP) tools, but discovering relevant resources remains a challenging task. This is because relevant metadata records are spread across different repositories, and it is currently impossible to query all these repositories in an integrated fashion, as they use different data models and vocabularies. In this paper we present a first attempt to collect and harmonize the metadata of different repositories, thus making them queryable and browsable in an integrated way. We use RDF and linked data technologies for this purpose and provide a first level of harmonization of the vocabularies used in the different resources by mapping them to standard RDF vocabularies, including Dublin Core and DCAT. Further, we present an approach that relies on NLP, and in particular on word sense disambiguation techniques, to harmonize resources by mapping the values of attributes (such as the type, license, or intended use of a resource) to normalized values. Finally, as there are duplicate entries within the same repository as well as across different repositories, we also report results on the detection of these duplicates. |
| International | Yes |
| Conference name | 4th Workshop on Linked Data in Linguistics (LDL'15) at ACL-IJCNLP 2015 |
| Type of participation | 960 |
| Conference location | Beijing, China |
| Peer reviewed | Yes |
| ISBN or ISSN | 978-1-941643-57-0 |
| DOI | |
| Conference start date | 31/07/2015 |
| Conference end date | 31/07/2015 |
| Start page | 39 |
| End page | 48 |
| Proceedings title | Proc. of 4th Workshop on Linked Data in Linguistics (LDL'15) at ACL-IJCNLP 2015 |
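The abstract describes mapping repository-specific metadata to standard RDF vocabularies (Dublin Core and DCAT), normalizing attribute values, and detecting duplicate entries. Below is a minimal Python sketch of those three ideas under stated assumptions. It is not the authors' implementation. The source field names (`resourceName`, `licence`, `languages`, ...), the example URI, and the lookup tables are hypothetical. Two techniques are deliberately simpler than what the paper describes. Value normalization uses a plain lookup table rather than word sense disambiguation. Duplicate detection uses exact matching on case- and whitespace-folded titles. The sketch also emits plain literals for `dct:license` and `dct:language`, although Dublin Core gives those properties class ranges (`dct:LicenseDocument`, `dct:LinguisticSystem`), so a fuller mapping would emit IRIs.

```python
"""Sketch: map heterogeneous metadata records to Dublin Core / DCAT triples.

All source field names, the example URI and the lookup tables below are
hypothetical illustrations, not the schema of any real repository.
"""

DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical repository-specific field names mapped to standard properties.
FIELD_MAP = {
    "resourceName": DCT + "title",
    "title": DCT + "title",
    "licence": DCT + "license",
    "license": DCT + "license",
    "languages": DCT + "language",
    "keyword": DCAT + "keyword",
}

# Toy lookup table standing in for WSD-based value normalization.
LICENSE_NORMALIZATION = {
    "cc-by": "CC-BY",
    "cc by 4.0": "CC-BY",
    "creative commons attribution": "CC-BY",
}


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def normalize(prop, value):
    """Map known license strings to a canonical label; trim everything else."""
    value = value.strip()
    if prop == DCT + "license":
        return LICENSE_NORMALIZATION.get(value.lower(), value)
    return value


def record_to_ntriples(subject_uri, record):
    """Convert one flat record into N-Triples lines, skipping unmapped fields."""
    lines = [f"<{subject_uri}> <{RDF_TYPE}> <{DCAT}Dataset> ."]
    for field, raw in record.items():
        prop = FIELD_MAP.get(field)
        if prop is None:
            continue  # no known mapping: skip rather than guess
        for value in (raw if isinstance(raw, list) else [raw]):
            literal = escape_literal(normalize(prop, value))
            lines.append(f'<{subject_uri}> <{prop}> "{literal}" .')
    return lines


def duplicate_groups(records):
    """Group record IDs whose titles match after case/whitespace folding."""
    groups = {}
    for rid, rec in records.items():
        title = rec.get("title") or rec.get("resourceName") or ""
        key = " ".join(title.lower().split())
        if key:
            groups.setdefault(key, []).append(rid)
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    example = {
        "resourceName": "Example parallel corpus",
        "licence": "Creative Commons Attribution",
        "languages": ["en", "de"],
        "internalId": "42",  # unmapped, so it is dropped
    }
    for line in record_to_ntriples("http://example.org/resource/1", example):
        print(line)
```

Fields with no known mapping are dropped instead of guessed. This keeps the harmonized graph conservative, because any field that does not map cleanly is left out rather than attached to the wrong standard property.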