Enabling content-based image retrieval in very large digital libraries

Lucchese C; Perego R; Bolettieri P; Esuli A; Falchi F; Rabitti F
2009

Abstract

Enabling effective and efficient Content-Based Image Retrieval (CBIR) on Very Large Digital Libraries (VLDLs) is today an important research issue. While there exist well-known approaches for information retrieval on textual content for VLDLs, the search for an effective CBIR method that also scales to very large collections is still open. A practical effect of this situation is that most of the image retrieval services currently available for VLDLs are based only on textual metadata. In this paper, we report on our experience in creating a collection of 106 million images, i.e., the CoPhIR collection, the largest currently available to the scientific community for research purposes. We discuss the various issues arising from working with such a large collection and dealing with a complex retrieval model on information-rich features. We present the non-trivial process of image crawling and descriptive feature extraction, using the European EGEE computer GRID. The feature extraction phase is often ignored when discussing the scalability issue while, as we show in this work, it could be one of the toughest issues to be solved in order to make CBIR feasible on VLDLs.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
ISBN: 9788888506852
Keywords: Image similarity search; Image crawling; Descriptive feature extraction
Files in this item:

prod_91973-doc_21104.pdf (authorized users only)
Description: Enabling content-based image retrieval in very large digital libraries
Type: Publisher's version (PDF)
Size: 195.96 kB
Format: Adobe PDF

prod_91973-doc_36759.pdf (authorized users only)
Description: cover and preface of the proceedings
Type: Publisher's version (PDF)
Size: 155.87 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/62321