Observatorio de I+D+i UPM

Research reports
Journal articles:
A survey of stemming algorithms in information retrieval
Research areas
  • Computer science and information technology
During the last fifty years, improved information retrieval techniques have become necessary because of the huge amount of information people have available, which continues to increase rapidly due to the use of new technologies and the Internet. Stemming is one of the processes that can improve information retrieval in terms of accuracy and performance. This paper provides a detailed assessment of the current status of the stemming process framed in an information retrieval application field by tracing its historical evolution. Papers presenting the first approaches for stemming were reviewed to extract their main features, benefits and drawbacks. Additionally, papers dealing with stemmers for non-English languages or with some more recent proposals were also consulted and compiled. Finally, experimental papers defining the most well-known methods and metrics aimed at evaluating and classifying stemmers were also taken into account to expose their contributions and results. Even if not all researchers agree on the benefits and drawbacks of using stemming in an information retrieval process in general terms, many of them agree on its benefits in specific contexts, such as when the language is highly inflective, when documents are short or when there is limited space for storing data. Some researchers also state that the nature of the documents can influence the performance and the accuracy of the stemmer. Despite many researchers having investigated this field over many years, there are still some open questions, such as how to evaluate a stemmer independently of the information retrieval process, or how much a stemmer improves an information retrieval application in terms of speed. As a summary, some guidelines are also provided to help readers to determine which is the best stemmer for their needs and the tasks they have to carry out.
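To make the idea of stemming in a retrieval setting concrete, here is a minimal sketch in Python. It is not one of the algorithms surveyed in the paper: the suffix list, the minimum stem length, and the names `toy_stem`, `build_index` and `search` are all illustrative assumptions. It only shows how conflating word variants to a common stem lets a query match documents that use a different inflection, and how a crude rule can over-stem.

```python
# Toy suffix-stripping stemmer (an illustrative assumption, not a published algorithm).
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical rule list, checked in this order
MIN_STEM_LEN = 3                     # hypothetical guard against stripping short words


def toy_stem(word):
    """Strip the first matching suffix, keeping at least MIN_STEM_LEN characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stem to the set of ids of documents containing a word with that stem."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(toy_stem(token), set()).add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents whose index entry matches the stemmed query term."""
    return index.get(toy_stem(query), set())


# Made-up example documents.
docs = {
    "d1": "stemmers improve retrieval",
    "d2": "a stemmer conflates words",
    "d3": "indexing documents",
}
index = build_index(docs)

print(search(index, "stemmers"))  # matches both d1 and d2 via the stem "stemmer"
print(search(index, "indexed"))   # matches d3 via the stem "index"
print(toy_stem("news"))           # over-stemming: "news" is reduced to "new"
```

Because several surface forms share one index entry, the index holds fewer distinct terms, which relates to the storage benefit the abstract mentions. The `news` case illustrates why stemmer accuracy needs to be evaluated.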
Journal title
Information Research-an International Electronic Journal
JCR impact factor
Impact information
JCR data for 2013
Issue number
From page
To page
This activity belongs to research reports
  • Author: Cristian Moral Martos (UPM)
  • Author: Angelica de Antonio Jimenez (UPM)
  • Author: Ricardo Imbert Paredes (UPM)
  • Author: Jaime Ramirez Rodriguez (UPM)
Related R&D&I research groups, departments, centres and institutes
  • Creator: Department: Lenguajes y Sistemas Informáticos e Ingeniería de Software