Research reports
Conference papers:
A Knowledge Engineering Approach to Recognizing and Extracting Sequences of Nucleic Acids from Scientific Literature
Year: 2010

Research areas
  • Computer science and information technology

Details
Description
In this paper we present a knowledge engineering approach to automatically recognizing and extracting genetic sequences from scientific articles. A preliminary recognizer based on a finite state machine first extracts all candidate DNA/RNA sequences. These candidates are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing manuscripts containing genetic sequences. We evaluated the approach on a test set of 211 full-text articles in PDF format containing 3134 genetic sequences, achieving 87.76% precision and 97.70% recall. The method can support text mining, information extraction, and information retrieval research on large collections of documents containing genetic sequences.
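The two-stage pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the separator set, the minimum length, and both filter rules are hypothetical stand-ins for the paper's recognizer and knowledge base, chosen only to show the shape of the technique (a state-machine scan for candidates, a rule-based filter, and a precision/recall computation).

```python
# Assumption: a candidate is a run of nucleotide letters that may be broken
# by layout noise (spaces, hyphens, digits, primes) as in typeset articles.
NUCLEOTIDES = set("ACGTUacgtu")
SEPARATORS = set(" -0123456789'")


def find_candidates(text, min_len=10):
    """Scan text with a two-state machine: outside or inside a sequence."""
    candidates, current = [], []
    for ch in text + "\n":  # the sentinel newline flushes the last candidate
        if ch in NUCLEOTIDES:
            current.append(ch.upper())  # enter or stay in the INSIDE state
        elif ch in SEPARATORS and current:
            continue  # tolerate layout noise while inside a sequence
        else:
            # Any other character ends the sequence; keep it if long enough.
            if len(current) >= min_len:
                candidates.append("".join(current))
            current = []
    return candidates


def is_plausible(seq):
    """Hypothetical rules standing in for the paper's knowledge base."""
    if "T" in seq and "U" in seq:
        return False  # mixing DNA-only and RNA-only bases is suspicious
    return len(set(seq)) >= 3  # very low diversity suggests a false positive


def precision_recall(extracted, gold):
    """Precision and recall of extracted sequences against a gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


text = "Primer 5'-ACGTACGTAGCTAGCT-3' was used with a cat."
sequences = [s for s in find_candidates(text) if is_plausible(s)]
```

In this sketch, ordinary words made only of the letters a, c, g, t and u (such as "a cat") are rejected by the length threshold rather than by the rules; a real knowledge base would need far richer checks to reach the reported precision.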
International
Yes
Conference name
EMBC 2010 - IEEE EMBS Conference
Type of participation
960
Conference venue
Buenos Aires, Argentina
Peer reviewed
Yes
ISBN or ISSN
978-1-4244-4124-2
DOI
Conference start date
31/08/2010
Conference end date
04/09/2010
From page
1081
To page
1084
Proceedings title
Merging Medical Humanism and Technology

This activity belongs to research reports

Participants

Related research groups, departments, centers and R&D&I institutes
  • Creator: Research group: Grupo de Informática Biomédica (LIA)
  • R&D&I center or institute: Centro de Tecnología Biomédica (CTB)