Observatorio de I+D+i UPM


Research reports

Journal articles:

First-Order Logic Rule Induction for Information Extraction in Web Resources

Year: 2012

Research areas

Telematics

Data

Description
Information extraction from web pages, commonly known as screen scraping, is usually performed through wrapper induction, a technique based on the internal structure of HTML documents. The main limitation of such techniques is that a generated wrapper is only useful for the web page it was designed for. To overcome this, we have designed a system that generates first-order logic rules that can be used to extract data from web pages. These rules are based on visual features such as font size, element positioning, or content type. Thus, they do not depend on a document's structure and can be applied to different sites. The system has been evaluated on a set of web pages, which has served to identify several design patterns used across the Web.
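As a rough illustration of what a structure-independent rule over visual features might look like, the following minimal sketch applies one hand-written rule to two pages. It is not the system described in the article, and it does not perform rule induction: the `Element` schema, the predicate helpers, the `title_rule` and its thresholds are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Element:
    """A rendered page element described only by visual features (hypothetical schema)."""
    text: str
    font_size: float   # in pixels
    top: float         # vertical offset in pixels from the top of the page
    content_type: str  # e.g. "text", "image", "link"


# A predicate tests one element; a rule is a conjunction of predicates.
Predicate = Callable[[Element], bool]


def font_size_at_least(min_px: float) -> Predicate:
    return lambda e: e.font_size >= min_px


def positioned_above(max_top: float) -> Predicate:
    return lambda e: e.top <= max_top


def has_content_type(kind: str) -> Predicate:
    return lambda e: e.content_type == kind


def extract(elements: Iterable[Element], rule: List[Predicate]) -> List[str]:
    """Return the text of every element that satisfies all predicates in the rule."""
    return [e.text for e in elements if all(p(e) for p in rule)]


# Hypothetical rule, written by hand rather than induced:
# title(X) :- content_type(X, text), font_size(X) >= 24, top(X) <= 200.
title_rule = [has_content_type("text"), font_size_at_least(24), positioned_above(200)]

# Two pages with different markup can still be described by the same visual features.
page_a = [
    Element("Site A headline", 32, 40, "text"),
    Element("Body paragraph", 14, 300, "text"),
]
page_b = [
    Element("logo.png", 0, 10, "image"),
    Element("Site B headline", 28, 120, "text"),
    Element("Footer note", 10, 900, "text"),
]

titles_a = extract(page_a, title_rule)
titles_b = extract(page_b, title_rule)
```

Because the rule never refers to tags or DOM paths, the same `title_rule` can be evaluated on both pages, which is the property the description attributes to rules built on visual features.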
International	Yes
ISI JCR	Yes
Journal title	International Journal of Artificial Intelligence Tools
ISSN	0218-2130
JCR impact factor	0.217
Impact information	2011 JCR Science Edition
Volume	21
DOI	10.1142/S0218213012500327
Issue number	6
From page	1250032-1
To page	1250032-20
Month	December
Ranking	Q4, 105/111 in COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

This activity belongs to research reports

Participants

Author: Jose Ignacio Fernandez Villamor UPM
Author: Carlos Angel Iglesias Fernandez UPM
Author: Mercedes Garijo Ayestaran UPM

Related research groups, departments, centers, and R&D&I institutes

Creator: Research group: Grupo de Sistemas Inteligentes
Department: Ingeniería de Sistemas Telemáticos