UPM R&D&I Observatory

Research reports
Research Publications in journals:
A survey of stemming algorithms in information retrieval
Year: 2014
Research Areas
  • Information technology and data processing
Information
Abstract
Over the last fifty years, improved information retrieval techniques have become necessary because of the huge amount of information available to people, which continues to grow rapidly with the use of new technologies and the Internet. Stemming is one of the processes that can improve information retrieval in terms of accuracy and performance. This paper provides a detailed assessment of the current status of stemming within the field of information retrieval by tracing its historical evolution. Papers presenting the first approaches to stemming were reviewed to extract their main features, benefits and drawbacks. Papers dealing with stemmers for non-English languages, as well as some more recent proposals, were also consulted and compiled. Finally, experimental papers defining the best-known methods and metrics for evaluating and classifying stemmers were taken into account to present their contributions and results. Although researchers do not all agree on the benefits and drawbacks of stemming in information retrieval in general terms, many agree that it is beneficial in specific contexts, such as when the language is highly inflective, when documents are short, or when storage space is limited. Some researchers also state that the nature of the documents can influence the performance and accuracy of the stemmer. Despite years of research in this field, some questions remain open, such as how to evaluate a stemmer independently of the information retrieval process, or how much a stemmer speeds up an information retrieval application. In summary, the paper also provides guidelines to help readers determine which stemmer best suits their needs and the tasks they have to carry out.
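To make the role of stemming in retrieval concrete, here is a minimal sketch under stated assumptions. It is a toy suffix-stripping stemmer, not the Porter, Lovins or any other algorithm reviewed in the survey. It conflates inflected word forms so that a query term such as "connections" can match documents that contain "connected" or "connecting". The suffix list and the names `SUFFIXES`, `toy_stem`, `build_index` and `search` are hypothetical and used only for illustration.

```python
# Toy suffix-stripping stemmer for illustration only (NOT Porter/Lovins).
# Hypothetical suffix list, ordered longest first so the longest match wins.
SUFFIXES = ("ions", "ings", "ion", "ing", "ed", "s")


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first (longest) matching suffix, keeping >= min_stem chars."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index mapping each stem to the ids of documents containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(toy_stem(token), set()).add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Boolean AND search: ids of documents containing every query stem."""
    results = None
    for term in query.split():
        ids = index.get(toy_stem(term), set())
        results = ids if results is None else results & ids
    return results or set()


if __name__ == "__main__":
    docs = {
        "d1": "connected networks",
        "d2": "connecting the network",
        "d3": "retrieval systems",
    }
    index = build_index(docs)
    # "connections" and "connected" both reduce to the stem "connect".
    print(sorted(search(index, "connections network")))  # ['d1', 'd2']
```

Without stemming, the query "connections network" would match neither d1 nor d2, because neither document contains the exact token "connections". With this toy stemmer, both documents are retrieved. Real stemmers apply far more careful rules to limit over-stemming and under-stemming, the kinds of errors the evaluation metrics discussed in the survey aim to measure.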
International
Yes
JCR
Yes
Title
Information Research-an International Electronic Journal
ISSN
1368-1613
Impact factor JCR
0.66
Impact info
JCR data for 2013
Volume
19
Journal number
1
From page
0
To page
0
Month
March
Ranking
Participants
  • Author: Cristian Moral Martos (UPM)
  • Author: Angelica de Antonio Jimenez (UPM)
  • Author: Ricardo Imbert Paredes (UPM)
  • Author: Jaime Ramirez Rodriguez (UPM)
Related research groups, departments and institutes
  • Creator: Department: Lenguajes y Sistemas Informáticos e Ingeniería de Software