Description
|
|
---|---|
This document reports the results of an experimental study of how the order in which documents are added to datasets affects the measurement of terminological saturation. The motivation for this research is that real-world document collections are retrospective, so terminological drift over time is often present in them. We empirically investigated ways to cope with this temporal drift and its influence on terminological saturation. Our premise was that documents could be added to the processed datasets in several different orders based on their time of publication: (i) chronological; (ii) reversed-chronological; (iii) bi-directional; and (iv) random. Experiments were performed on three real-world document collections from different domains, each consisting of high-quality documents in the form of scientific papers. We also checked, in the presence of different levels of noise, whether the orders differ in their sensitivity to excessive noise. Based on a comparison of the experimental results, we recommend the reversed-chronological order of adding documents to datasets, as it demonstrated the most balanced performance. | |
International
|
Yes |
Entity
|
|
Place
|
|
Pages
|
|
Reference/URL
|
https://www.researchgate.net/publication/322626440_The_Influence_of_the_Order_of_Adding_Documents_to_Datasets_on_Terminological_Saturation |
Publication type
|
Technical Report TS-RTDC-TR-2018-1 |
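
The four orders of adding documents named in the description can be illustrated with a minimal Python sketch. It is not taken from the report: the function name `order_documents`, the `(id, publication date)` tuple layout, and the `seed` parameter are hypothetical, and the bi-directional order is assumed here to alternate between the newest and the oldest remaining document, which may differ from the definition used in the report.

```python
import random
from datetime import date
from typing import List, Tuple

# Hypothetical document representation: (document id, publication date).
Doc = Tuple[str, date]


def order_documents(docs: List[Doc], order: str, seed: int = 0) -> List[Doc]:
    """Return docs in the sequence in which they would be added to datasets."""
    chrono = sorted(docs, key=lambda d: d[1])  # oldest first
    if order == "chronological":
        return chrono
    if order == "reversed-chronological":
        return chrono[::-1]  # newest first
    if order == "bi-directional":
        # Assumption: alternate newest, oldest, second newest, second oldest, ...
        result: List[Doc] = []
        lo, hi = 0, len(chrono) - 1
        take_newest = True
        while lo <= hi:
            if take_newest:
                result.append(chrono[hi])
                hi -= 1
            else:
                result.append(chrono[lo])
                lo += 1
            take_newest = not take_newest
        return result
    if order == "random":
        shuffled = list(chrono)
        random.Random(seed).shuffle(shuffled)  # seeded so runs are repeatable
        return shuffled
    raise ValueError(f"unknown order: {order}")


# Toy collection with one document per year (illustrative data only).
docs = [("c", date(2012, 1, 1)), ("a", date(2010, 1, 1)),
        ("e", date(2014, 1, 1)), ("b", date(2011, 1, 1)),
        ("d", date(2013, 1, 1))]
for name in ("chronological", "reversed-chronological", "bi-directional", "random"):
    print(name, [doc_id for doc_id, _ in order_documents(docs, name)])
```

In a saturation experiment, the returned sequence would then be cut into increments and added to the growing datasets one increment at a time. The fixed seed only makes the random order reproducible between runs.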