Research reports
Other publications:
The Influence of the Order of Adding Documents to Datasets on Terminological Saturation
Year: 2018

Research areas
  • Computer science and information technology

Data
Description
This document reports the results of an experimental study of how the order in which documents are added to datasets affects the measurement of terminological saturation. The research is motivated by the fact that real-world document collections are retrospective, so terminological drift over time is often present in them. We empirically investigated how to cope with this temporal drift and how it influences terminological saturation. Our premise was that documents can be added to the processed datasets in several different orders with respect to their time of publication: (i) chronological; (ii) reversed-chronological; (iii) bi-directional; and (iv) random. Experiments were performed on three real-world document collections from different domains in which collections of high-quality documents were available as scientific papers. We also checked, in the presence of different levels of noise, whether the orders differ in their sensitivity to excessive noise. Based on a comparison of the experimental results, we recommend the reversed-chronological order of adding documents to datasets as preferable, since it demonstrated the most balanced performance.
International
Yes
Entity
Place
Pages
Reference/URL
https://www.researchgate.net/publication/322626440_The_Influence_of_the_Order_of_Adding_Documents_to_Datasets_on_Terminological_Saturation
Publication type
Technical Report TS-RTDC-TR-2018-1

This activity belongs to research reports

Participants

Related research groups, departments, centres and R&D&I institutes
  • Creator: Research Group: Ontology Engineering Group
  • Department: Artificial Intelligence