The limits of Big Data for analyzing reading



Rowberry S (2019) The limits of Big Data for analyzing reading. Participations, 16 (1), pp. 237-257.

Companies including Jellybooks and Amazon have introduced analytics to collect, analyze and monetize the user’s reading experience. Ebook apps and hardware collect implicit data about reading, including progress and speed, and encourage readers to share more data through social networks. These practices generate large data sets with millions, if not billions, of data points. For example, a copy of the King James Bible on the Kindle features over two million shared highlights. The allure of big data suggests that these metrics can be used at scale to gain a better understanding of how readers interact with books. While data collection practices continue to evolve, it is unclear how the metrics relate to the act of reading. For example, Kindle software tracks which words a reader looks up, but cannot distinguish accidental look-ups from deliberate ones, or otherwise link the act to the reader’s comprehension. In this article, I analyze patent filings and ebook software source code to assess the disconnect between data collection practices and the act of reading. The metrics capture data associated with software use rather than reading; they therefore offer a poor approximation of the reading experience and must be corroborated by further data.

Reader Analytics; Amazon; Kindle; Ebooks; Big Data; Critical Code Studies; Patents

Participations: Volume 16, Issue 1

Publication date: 31/05/2019
Publication date online: 31/05/2019
Date accepted by journal: 07/03/2019

People (1)


Dr Simon Rowberry

Lecturer, Communications, Media and Culture