Automatic generation of Web mining environments
25 February 1999
Maurizio Cibelli, Gennaro Costagliola
Abstract
The main problem in retrieving information from the World Wide Web is the enormous number of unstructured documents and resources, and the resulting difficulty of locating and tracking appropriate sources. This paper presents a web mining environment (WME) that can find, extract, and structure information related to a particular domain from web documents, using general-purpose indices. The WME architecture includes a web engine filter (WEF), which sorts and reduces the answer set returned by a web engine; a data source pre-processor (DSP), which processes HTML layout cues to collect and qualify page segments; and a heuristic-based information extraction system (HIES), which finally retrieves the required data. Furthermore, we present a web mining environment generator, WMEG, that allows naive users to generate a WME specific to a given domain by providing a set of specifications.
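The abstract describes a three-stage pipeline (WEF, DSP, HIES) that a generator (WMEG) specializes from a user-supplied domain specification. The sketch below is one possible way to wire such stages together, not the authors' implementation: the `DomainSpec` class, the keyword-count ranking in the filter, the set of block tags treated as layout cues, the "headings first" extraction heuristic, and every function name are hypothetical assumptions made for illustration. It assumes well-formed HTML and Python 3.9+.

```python
import re
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class DomainSpec:
    """Hypothetical domain specification a user hands to the generator."""
    keywords: list[str]
    # field name -> regex with exactly one capture group
    patterns: dict[str, str] = field(default_factory=dict)


def web_engine_filter(results, spec, top_k=10):
    """WEF sketch (assumed): rank web-engine hits by keyword counts, drop zero scores.

    Each result is assumed to be a dict with 'url' and 'snippet' keys.
    """
    def score(r):
        text = r["snippet"].lower()
        return sum(text.count(k.lower()) for k in spec.keywords)
    ranked = sorted(results, key=score, reverse=True)
    return [r for r in ranked if score(r) > 0][:top_k]


# Block-level tags treated as layout cues that delimit page segments (an assumption).
BLOCK_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "p", "li", "td", "th"}


class _SegmentCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.segments = []  # collected (tag, text) pairs
        self._stack = []    # open block tags with their text fragments

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._stack.append((tag, []))

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1][0] == tag:
            name, parts = self._stack.pop()
            text = " ".join("".join(parts).split())  # normalize whitespace
            if text:
                self.segments.append((name, text))

    def handle_data(self, data):
        if self._stack:
            self._stack[-1][1].append(data)


def data_source_preprocessor(html):
    """DSP sketch (assumed): split a page into (layout_tag, text) segments."""
    parser = _SegmentCollector()
    parser.feed(html)
    parser.close()
    return parser.segments


def heuristic_extractor(segments, spec):
    """HIES sketch (assumed): first pattern match per field, headings tried first."""
    ordered = sorted(segments, key=lambda s: 0 if s[0] in {"title", "h1", "h2", "h3"} else 1)
    record = {}
    for name, pattern in spec.patterns.items():
        rx = re.compile(pattern, re.IGNORECASE)
        for _tag, text in ordered:
            m = rx.search(text)
            if m:
                record[name] = m.group(1).strip()
                break
    return record


def generate_wme(spec):
    """WMEG sketch (assumed): bind a DomainSpec into a ready-to-run mining pipeline."""
    def wme(results, fetch):
        records = []
        for hit in web_engine_filter(results, spec):
            record = heuristic_extractor(data_source_preprocessor(fetch(hit["url"])), spec)
            if record:
                records.append({"url": hit["url"], **record})
        return records
    return wme


if __name__ == "__main__":
    spec = DomainSpec(
        keywords=["conference", "deadline"],
        patterns={"event": r"^(.+?)\s+call for papers", "deadline": r"deadline:\s*(.+)"},
    )
    pages = {  # stand-in for real page fetching
        "http://example.org/cfp": "<html><head><title>SPIE 1999 Call for Papers</title></head>"
                                  "<body><p>Submission deadline: 1 March 1999</p></body></html>",
    }
    results = [
        {"url": "http://example.org/cfp", "snippet": "Conference call for papers, deadline soon"},
        {"url": "http://example.org/recipes", "snippet": "Cooking tips"},
    ]
    print(generate_wme(spec)(results, fetch=pages.__getitem__))
```

In this sketch the generator returns a closure, so a "naive user" only supplies keywords and field patterns; the filtering, segmentation, and extraction stages stay fixed. A real system would use far richer layout cues and heuristics than the tag set and heading-first ordering assumed here.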
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Maurizio Cibelli and Gennaro Costagliola "Automatic generation of Web mining environments", Proc. SPIE 3695, Data Mining and Knowledge Discovery: Theory, Tools, and Technology, (25 February 1999); https://doi.org/10.1117/12.339984
CITATIONS
Cited by 3 scholarly publications.
KEYWORDS
Mining
Databases
Digital signal processing
Prototyping
Internet
Precision measurement
Data mining