Observatorio de I+D+i UPM

Research Reports
Communications at congresses:
Tagging Spanish Texts: the Problem of se
Year: 2008
Research Areas
  • Artificial intelligence
Information
Abstract
Automatic tagging in Spanish has historically faced many problems because of certain grammatical constructions. One of these traditional pitfalls is the particle "se". This particle is a multifunctional and polysemous word used in many different contexts. Many taggers do not distinguish among the possible uses of "se" and thus give poor results on it. In keeping with the philosophy of free software, we have taken a free annotation tool as a basis and have improved and enhanced its behaviour by adding new rules at different levels and by modifying certain parts of the code to allow for its possible implementation in other EAGLES-compliant tools. In this paper, we present the analysis carried out with different annotators to select the tool, the results obtained in all cases, the improvements added, and the advantages of the modified tagger.
International
Yes
Congress
Sixth International Conference on Language Resources and Evaluation (LREC 2008)
960
Place
Marrakech, Morocco
Reviewers
Yes
ISBN/ISSN
2-9517408-4-0
Start Date
29/05/2008
End Date
29/05/2008
From page
11
To page
11
LREC 2008 Conference Abstracts
Participants
  • Participant: Javier Puche Alosete
  • Author: Guadalupe Aguado De Cea (UPM)
  • Author: José Ángel Ramos Gargantilla (UPM)
Related Research Groups, Departments and Institutes
  • Creator: Research Group: Ontology Engineering Group (LIA). Laboratorio Inteligencia Artificial. Grupo de Ingeniería Ontológica
  • Department: Artificial Intelligence
  • Department: Linguistics Applied to Science and Technology