An acquisition, search and retrieval system based on Zope/Plone

C Lucchesi;M Martinelli;G Vasarelli
2004

Abstract

The number of paper documents that need to be digitized is huge, and it is useful to have a system to capture, search and retrieve them online in a simple way. In this paper we present an acquisition and information retrieval system based on Zope/Plone that allows the quick definition of customized data types and easy management of the storage of digitized documents. By extending Archetypes, it is possible to obtain the relevant interfaces and dynamic validations that allow multiple users to input such documents simply and quickly. In addition, the Python client, which has been designed to work over HTTP/HTTPS, automates the acquisition phases and the delivery of the data to the server. By making the storage of data independent of the ZODB (whose limits are highlighted by our benchmarks) and dependent only on transactional file systems and the PostgreSQL DBMS, the system can scale to millions of documents and hundreds of gigabytes of images. The architecture is fully compliant with web standards and with its design principles. The approach is applied to a real case study: the acquisition, search and retrieval of millions of paper documents belonging to the Italian Registry of the ".it" ccTLD, managed by IIT-CNR.
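The delivery step handled by the Python client can be illustrated with a minimal sketch that uses only the standard library. It is not the authors' client: the endpoint URL, the form field names (`doc_id`, `file`) and the helper `build_upload_request` are hypothetical, chosen only to show how a scanned page and its metadata might be packaged as a multipart/form-data POST over HTTP/HTTPS.

```python
import io
import uuid
from urllib.request import Request


def build_upload_request(url, fields, filename, payload, content_type="image/tiff"):
    """Build (but do not send) a multipart/form-data POST for one scanned page.

    All names here are illustrative assumptions, not the API of the system
    described in the abstract.
    """
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # Plain metadata fields, one multipart section each.
    for name, value in fields.items():
        body.write(f"--{boundary}\r\n".encode("ascii"))
        body.write(
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode("utf-8")
        )
        body.write(str(value).encode("utf-8") + b"\r\n")

    # The scanned image itself, sent as a file section.
    body.write(f"--{boundary}\r\n".encode("ascii"))
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode("utf-8")
    )
    body.write(f"Content-Type: {content_type}\r\n\r\n".encode("ascii"))
    body.write(payload + b"\r\n")

    # Closing boundary marks the end of the multipart body.
    body.write(f"--{boundary}--\r\n".encode("ascii"))

    data = body.getvalue()
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(data)),
    }
    return Request(url, data=data, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it to the server. Building the request separately from sending it keeps the packaging logic testable without a network connection.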
Istituto di informatica e telematica - IIT
Content-management-system (CMS)
Plone
Workflow
HTML
CSS


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/141742