| Field | Value |
|---|---|
| Description | Information extraction from web pages, commonly known as screen scraping, is usually performed through wrapper induction, a technique based on the internal structure of HTML documents. The main limitation of such techniques is therefore that a generated wrapper is only useful for the web page it was designed for. To overcome this, we have designed a system that generates first-order logic rules that can be used to extract data from web pages. These rules are based on visual features such as font size, element positioning or content type. Thus, they do not depend on the structure of a particular document and can be applied to different sites. The system has been evaluated on a set of web pages, and this evaluation has served to identify several design patterns used across the Web. |
| International | Yes |
| ISI JCR | Yes |
| Journal title | International Journal of Artificial Intelligence Tools |
| ISSN | 0218-2130 |
| JCR impact factor | 0.217 |
| Impact information | 2011 JCR Science Edition |
| Volume | 21 |
| DOI | 10.1142/S0218213012500327 |
| Issue | 6 |
| First page | 1250032-1 |
| Last page | 1250032-20 |
| Month | December |
| Ranking | Q4 (105/111 in COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) |
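The abstract describes extraction rules expressed over visual features (font size, position, content type) rather than over the HTML tree. Purely as an illustration of that idea, the sketch below encodes two Horn-clause-style rules as Python functions over rendered boxes. The `Box` model, the predicate names (`largest_font`, `in_upper_half`, `looks_like_price`, `below`), the 60-pixel gap and both rules are hypothetical. They are not taken from the paper and do not reproduce how the described system learns or represents its rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rendered page element described only by visual features (hypothetical model)."""
    text: str
    font_size: float  # in CSS pixels
    x: float          # left edge
    y: float          # top edge; grows downwards
    width: float
    height: float


# --- Unary predicates over visual features (hypothetical names) ---

def largest_font(b, page):
    """True if no element on the page uses a larger font than b."""
    return b.font_size == max(o.font_size for o in page)


def in_upper_half(b, page):
    """True if b starts in the upper half of the rendered page."""
    page_bottom = max(o.y + o.height for o in page)
    return b.y < page_bottom / 2


PRICE_RE = re.compile(r"^[$€£]\s?\d+(?:[.,]\d{2})?$")


def looks_like_price(b, page):
    """Content-type test: the text is a currency symbol followed by an amount."""
    return bool(PRICE_RE.match(b.text.strip()))


# --- Binary predicate relating two elements by position ---

def below(a, b, max_gap=60.0):
    """True if a starts under b within max_gap pixels and they overlap horizontally."""
    gap = a.y - (b.y + b.height)
    overlap = min(a.x + a.width, b.x + b.width) - max(a.x, b.x)
    return 0 <= gap <= max_gap and overlap > 0


# Rule 1 (hypothetical):  title(X) :- largest_font(X), in_upper_half(X).
def title(page):
    return [x for x in page if largest_font(x, page) and in_upper_half(x, page)]


# Rule 2 (hypothetical):  price(Y) :- title(X), below(Y, X), looks_like_price(Y).
def price(page):
    return [y for x in title(page) for y in page
            if below(y, x) and looks_like_price(y, page)]
```

For example, on a page made of `Box("Acme Widget", 28, 40, 40, 300, 34)`, `Box("€19.99", 16, 40, 90, 80, 20)` and `Box("Free shipping", 12, 40, 700, 120, 16)`, `title` selects the "Acme Widget" box and `price` selects the "€19.99" box. Because the rules only inspect rendering and text shape, they can in principle be run on a page from another site with a completely different DOM, which is the portability argument made in the abstract. Whether they extract the right elements there depends on that site following the same visual conventions.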