TY - CONF
T1 - Image Retrieval in DLs - A Large Scale Multicollection Experimentation
T2 - IFLA News Media Section, Dresden
Y1 - 2017
A1 - Jean-Philippe Moreux
A1 - Guillaume Chiron
KW - automatic image classification
KW - CBIR
KW - data mining
KW - deep learning
KW - digital libraries
KW - heritage documents
KW - image retrieval
KW - metadata
KW - OCR
AB -

While digital heritage libraries were historically first delivered in image mode, they quickly took advantage of OCR technology to index printed collections and thereby improve the scope and performance of the information retrieval service offered to users. Access to iconographic resources has not progressed in the same way, however, and these resources remain in the shadows: indexing is manual, incomplete, and heterogeneous, and data is siloed by iconographic genre. Today it would be possible to make better use of these resources, especially by exploiting the enormous volumes of OCR produced during the last two decades, and thus to promote these engravings, drawings, photographs, maps, etc. for their own value but also as an attractive entry point into the collections, supporting discovery and serendipity from document to document and collection to collection. This article presents an ETL (extract-transform-load) approach to this need that aims to: identify and extract iconography wherever it may be found, in image collections but also in print (dailies, magazines, monographs); transform, harmonize, and enrich the descriptive metadata (in particular with automatic classification tools); and load it all into a web portal dedicated to iconographic research. The approach is doubly pragmatic, since it leverages both existing digital resources and (virtually) off-the-shelf technologies.

JA - IFLA News Media Section, Dresden
ER -