The Web is the largest knowledge repository ever. In recent years there has been considerable interest in languages and approaches providing structured (eg XML) and semantic (eg Semantic Web) representation of Web content. However, most of the information available is still accessed via Web pages in HTML and documents in PDF, both of which have internal encoding conceived to present content on screen to human users. This makes automatic information extraction problematic.
Information Extraction from Presentation-Oriented Documents
Massimo Ruffolo;Ermelinda Oro
2012
Abstract
The Web is the largest knowledge repository ever. In recent years there has been considerable interest in languages and approaches providing structured (eg XML) and semantic (eg Semantic Web) representation of Web content. However, most of the information available is still accessed via Web pages in HTML and documents in PDF, both of which have internal encoding conceived to present content on screen to human users. This makes automatic information extraction problematic.File in questo prodotto:
Non ci sono file associati a questo prodotto.
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


