Title: Innovative Approaches of Historical Newspapers: Data Mining, Data Visualization, Semantic Enrichment
Publication type: Conference paper
Year of publication: 2016
Authors: Jean-Philippe Moreux, Caroline Kageneck
Conference name: IFLA News Media Section
Meeting date: 2016/08/11
Organizer: IFLA
Conference location: Lexington, USA
Keywords: data mining; data visualisation; digital humanities; digital mediation; metadata; named entities recognition; OCR; OLR; semantic enrichment
Abstract

In this age of Big Data, this paper describes how digital libraries can apply innovative approaches at large scale to add value to old newspapers and offer better experiences of them.

On the one hand, the state-of-the-art OLR (optical layout recognition) technique of one of the largest heritage press digitization projects in Europe (Europeana Newspapers, www.europeana-newspapers.eu, 2012-2015) was used in a data mining experiment. Data analysis was applied to quantitative metadata derived from an 850K-page subset of six 19th- and 20th-century French newspaper titles from the BnF collection. The METS/ALTO XML data was analyzed with data mining and data visualization techniques, which show promising ways of producing knowledge about historical newspapers that is of great interest to library professionals (management of digitization programs, curation and mediation of newspaper collections) and to end users, particularly the digital humanities community.

On the other hand, the Retronews web portal showcases how advanced semantic annotation techniques can improve retrieval efficiency on a digital newspaper collection, and thus the rediscovery and reappropriation of these documents by various types of users: teachers, students, researchers, and the general public.

Translated title (French): Approches innovantes pour la presse patrimoniale : fouille de données, visualisation de données, enrichissement sémantique