Research reports
Journal articles:
First-Order Logic Rule Induction for Information Extraction in Web Resources
Year: 2012

Research areas
  • Telematics

Data
Description
Information extraction from web pages, commonly known as screen scraping, is usually performed through wrapper induction, a technique based on the internal structure of HTML documents. The main limitation of such techniques is that a generated wrapper is only useful for the web page it was designed for. To overcome this, we have designed a system that generates first-order logic rules that can be used to extract data from web pages. These rules are based on visual features such as font size, element positioning, or content type. Thus, they do not depend on a document's structure and can be applied to different sites. The system has been evaluated on a set of web pages, which has served to identify several design patterns used across the Web.
International
Yes
ISI JCR
Yes
Journal title
International Journal of Artificial Intelligence Tools
ISSN
0218-2130
JCR impact factor
0.217
Impact information
2011 JCR Science Edition
Volume
21
DOI
10.1142/S0218213012500327
Issue number
6
From page
1250032-1
To page
1250032-20
Month
December
Ranking
Q4 105/111 in COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

This activity belongs to research reports

Participants

Related research groups, departments, centers, and R&D&I institutes
  • Creator: Research group: Grupo de Sistemas Inteligentes
  • Department: Ingeniería de Sistemas Telemáticos