Conference Paper (published)

Using Titles vs. Full-text as source for automated semantic document annotation

Details

Citation

Galke L, Mai F, Schelten A, Brunsch D & Scherp A (2017) Using Titles vs. Full-text as source for automated semantic document annotation. In: Proceedings of the Knowledge Capture Conference K-Cap 2017. Knowledge Capture Conference 2017, Austin, TX, USA, 04.12.2017-06.12.2017. New York: ACM, p. Article 20. https://doi.org/10.1145/3148011.3148039

Abstract
We conduct the first systematic comparison of automated semantic annotation based on either the full-text or only the title metadata of documents. Apart from the prominent text classification baselines kNN and SVM, we also compare recent techniques of Learning to Rank and neural networks, and revisit the traditional methods logistic regression, Rocchio, and Naive Bayes. On three of our four datasets, classification using only titles reaches over 90% of the quality achieved when using the full-text.
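
The "over 90% of the quality" figure is a ratio between a title-only score and a full-text score on the same test documents. The sketch below shows one way such a ratio could be computed. It assumes sample-averaged F1 as the quality measure, which the abstract does not specify. The function names `sample_f1` and `relative_quality` and the three toy documents are invented for illustration and are not taken from the paper.

```python
from typing import List, Set


def sample_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Average the per-document F1 over a multi-label test set."""
    scores = []
    for g, p in zip(gold, pred):
        if not g and not p:
            # Nothing to annotate and nothing predicted: count as perfect.
            scores.append(1.0)
            continue
        true_positives = len(g & p)
        # F1 = 2PR / (P + R) simplifies to 2*TP / (|gold| + |pred|).
        scores.append(2 * true_positives / (len(g) + len(p)))
    return sum(scores) / len(scores)


def relative_quality(title_score: float, fulltext_score: float) -> float:
    """Fraction of the full-text score that the title-only score retains."""
    return title_score / fulltext_score


# Toy, invented annotations (hypothetical labels, not from any real dataset).
gold = [{"economics", "trade"}, {"health"}, {"energy", "policy"}]
pred_fulltext = [{"economics", "trade"}, {"health"}, {"energy"}]
pred_title = [{"economics"}, {"health"}, {"energy"}]

f1_full = sample_f1(gold, pred_fulltext)
f1_title = sample_f1(gold, pred_title)
ratio = relative_quality(f1_title, f1_full)
print(f"title-only retains {ratio:.1%} of the full-text F1")
```

In a real comparison, the two prediction lists would come from the same classifier trained once on titles and once on full-text, so that the ratio isolates the effect of the input source.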

Keywords
Multi-label classification; document analysis; semantic annotation

Journal
Proceedings of the Knowledge Capture Conference, K-CAP 2017

Status	Published
Funders	European Commission
Publication date	31/12/2017
URL	http://hdl.handle.net/1893/28018
Publisher	ACM
Place of publication	New York
ISBN	9781450355537
Conference	Knowledge Capture Conference 2017
Conference location	Austin, TX, USA
Dates	04/12/2017-06/12/2017