Description
|
|
---|---|
Assessing the completeness of a document collection within a domain of interest is a complicated task that requires substantial effort. Even if an automated technique is used, for example terminology saturation measurement based on automated term extraction, run times grow quickly with the size of the input text. In this paper, we address this issue and propose an optimized approach based on partitioning the collection of documents into disjoint constituents and computing the required term candidate ranks (using the c-value method) independently, followed by a merge of the partial bags of extracted terms. It is proven in the paper that such an approach is formally correct: the total c-values can be represented as the sums of the partial c-values. The approach is also validated experimentally and yields encouraging results: the necessary run time decreases and parallelization is straightforward, without any loss in quality. | |
International
|
Yes |
Conference name
|
15th International Conference on ICT in Education, Research and Industrial Applications. Integration, Harmonization and Knowledge Transfer |
Type of participation
|
960 |
Conference venue
|
Kherson, Ukraine |
Reviewers
|
Yes |
ISBN or ISSN
|
1613-0073 |
DOI
|
|
Conference start date
|
12 June 2019 |
Conference end date
|
15 June 2019 |
From page
|
1 |
To page
|
16 |
Proceedings title
|
Proceedings of the 15th International Conference on ICT in Education, Research and Industrial Applications. Integration, Harmonization and Knowledge Transfer. Volume I: Main Conference |
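
The merge step mentioned in the abstract (partial bags of extracted terms combined so that total c-values are the sums of partial c-values) could be sketched roughly as follows. This is only an illustration and not the authors' implementation: it assumes the per-partition c-values have already been computed by some extractor that is not shown, and the function name `merge_partial_bags` and all sample terms and scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def merge_partial_bags(partial_bags: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Merge per-partition term bags by summing each term's partial score.

    Assumption: each bag maps a term candidate to the partial c-value
    computed on one disjoint partition of the document collection.
    """
    merged = defaultdict(float)
    for bag in partial_bags:
        for term, partial_score in bag.items():
            merged[term] += partial_score
    return dict(merged)


# Hypothetical partial scores for two disjoint partitions (illustrative values only).
partition_a = {"term extraction": 1.5, "document collection": 2.0}
partition_b = {"term extraction": 0.5, "run time": 1.0}

merged = merge_partial_bags([partition_a, partition_b])

# Rank term candidates by their merged score, highest first.
ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
```

Because each partition is processed independently in this sketch, the per-partition computation could be handed to separate workers, with only the inexpensive merge performed at the end.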