Title | Data Mining Historical Newspaper Metadata - Old News Teaches History |
Publication type | Conference paper |
Publication year | 2016 |
Authors | Jean-Philippe Moreux |
Conference name | IFLA News Media Section Conference |
Meeting date | 2016/04/21 |
Organizer | Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky |
Conference location | Hamburg |
Keywords | ALTO; data mining; data visualisation; digital libraries; metadata; METS; OCR; OLR; press |
Abstract | In this age of Big Data, this paper describes how the state-of-the-art OLR (optical layout recognition) technique used in one of the largest heritage press digitization projects in Europe (www.europeana-newspapers.eu, 2012–2015) was applied in a data mining experiment. Data analysis was performed on descriptive metadata (numbers of pages, articles, words, illustrations, ads…) derived from a subset of the Europeana Newspapers collection. The METS/ALTO XML data of an 850K-page subset, covering six 19th–20th century French newspaper titles from the collection, was analyzed with data mining and data visualization techniques. These techniques show promising ways of producing knowledge about historical newspapers. This knowledge is of great interest both for digital libraries (management of digitization programs, curation, and mediation of newspaper collections) and for the digital humanities. Equipped with basic tools widely used in libraries (XSL, spreadsheets, chart generators), we show that simple newspaper metadata can give insights into the history of the press and into history itself. |
URL | http://altomator.github.io/EN-data_mining/ |
Translated title (French) | Fouiller les métadonnées de la presse ancienne |