Title | Prediction of Selection Decision of Document Using Bibliographic Data at the National Library of France (BnF) |
Publication type | Book chapter |
Publication year | 2012 |
Authors | Ahmed Ben Salah, Geneviève Cron, Nicolas Ragot, Thierry Paquet |
Book title | Archiving 2012, Copenhagen, June 12-15, 2012 |
Pages | 135–140 |
Publisher | Society for Imaging Science and Technology |
City | Copenhagen |
ISBN | 978-0-89208-300-8 |
Abstract | Document selection is a very important step in mass digitization projects. This is especially true at the BnF, where digitization may or may not include OCR, depending on the expected OCR results. Consequently, the selection task is complex and time-consuming, owing to the number of documents to be processed and the diversity of the selection criteria to consider. To improve and simplify this task through automation, we studied the relationship between bibliographic data and document selection decisions. We used two statistical analyses: a correspondence factor analysis and a multiple correspondence analysis. Our analysis showed that, for example, documents in format "4 or GR FOL" published "between 1961 and 1990" in Morocco are more likely to be "Selected". In contrast, documents in format "16 or 8" published "between 1871 and 1800" in English or Spanish are more likely to be "Not Selected". |
URL | http://www.imaging.org/IST/store/epub.cfm?abstrid=45326 |