Enabling content-based image retrieval in very large digital libraries
Lucchese C; Perego R; Bolettieri P; Esuli A; Falchi F; Rabitti F
2009
Abstract
Enabling effective and efficient Content-Based Image Retrieval (CBIR) on Very Large Digital Libraries (VLDLs) is today an important research issue. While there exist well-known approaches for information retrieval on textual content for VLDLs, the search for an effective CBIR method that can also scale to very large collections is still open. A practical effect of this situation is that most of the image retrieval services currently available for VLDLs are based only on textual metadata. In this paper, we report on our experience in creating a collection of 106 million images, i.e., the CoPhIR collection, the largest currently available to the scientific community for research purposes. We discuss the various issues arising from working with such a large collection and dealing with a complex retrieval model on information-rich features. We present the non-trivial process of image crawling and descriptive feature extraction, using the European EGEE computer GRID. The feature extraction phase is often ignored when discussing the scalability issue while, as we show in this work, it could be one of the toughest issues to be solved in order to make CBIR feasible on VLDLs.

| File | Description | Type | Size | Format | Access |
|---|---|---|---|---|---|
| prod_91973-doc_21104.pdf | Enabling content-based image retrieval in very large digital libraries | Publisher's version (PDF) | 195.96 kB | Adobe PDF | Authorized users only |
| prod_91973-doc_36759.pdf | Cover and preface of the proceedings | Publisher's version (PDF) | 155.87 kB | Adobe PDF | Authorized users only |


