International Journal of Digital Curation | Vol. 2, Issue 1 | 2017-06-01
“The Naming of Cats”: Automated Genre Classification
This paper builds on work presented at ECDL 2006 on automated genre classification, a step toward automating metadata extraction from digital documents for ingest into digital repositories such as those run by archives, libraries and eprint services (Kim & Ross, 2006b). We previously proposed dividing the features of a document into five types (visual layout features, language model features, stylometric features, semantic structure features, and contextual features that treat the document as an object linked to previously classified objects and other external sources) and examined visual and language model features. The current paper compares classifiers based on image features with classifiers based on stylometric features in a binary classification task, and shows that certain genres have strong image features that enable effective separation of documents belonging to the genre from a large pool of other documents.
Keywords: semantic structure; eprint; visual layout; language model features; classifiers; contextual features; libraries; automated genre classification; automating metadata extraction; digital repositories
Seamus Ross, Yunhyong Kim. “The Naming of Cats”: Automated Genre Classification. International Journal of Digital Curation, 2(1).