Title | Image Retrieval in DLs - A Large Scale Multicollection Experimentation |
Publication type | Conference paper |
Year of publication | 2017 |
Authors | Jean-Philippe Moreux, Guillaume Chiron |
Conference name | IFLA News Media Section, Dresden |
Meeting date | 2017/08/15 |
Keywords | automatic image classification; CBIR; data mining; deep learning; digital libraries; heritage documents; image retrieval; metadata; OCR |
Abstract | While digital heritage libraries historically first operated in image mode, they quickly took advantage of OCR technology to index printed collections and consequently improve the scope and performance of the information retrieval service offered to users. But access to iconographic resources has not progressed in the same way, and these resources remain in the shadows: manual, incomplete and heterogeneous indexing, and data silos by iconographic genre. Today, however, it would be possible to make better use of these resources, especially by exploiting the enormous volumes of OCR produced during the last two decades, and thus to promote these engravings, drawings, photographs, maps, etc. for their own value but also as an attractive entry point into the collections, supporting discovery and serendipity from document to document and collection to collection. This article presents an ETL (extract-transform-load) approach to this need, which aims to: identify and extract iconography wherever it may be found, in image collections but also in print (dailies, magazines, monographs); transform, harmonize and enrich the descriptive metadata (in particular with automatic classification tools); load it all into a web portal dedicated to iconographic research. The approach is pragmatically dual, since it involves leveraging existing digital resources and (virtually) off-the-shelf technologies. |