Book Chapter

Representativeness and Corpus Sampling

Details

Citation

Wiegand V (2026) Representativeness and Corpus Sampling. In: International Encyclopedia of Language and Linguistics. 3rd ed. Elsevier, pp. 283–289. https://doi.org/10.1016/b978-0-323-95504-1.01324-7

Abstract
Corpus compilation involves a range of decisions to ensure that the resulting corpus is as suitable as possible to address the intended research question(s). Accordingly, the corpus should contain language use that is typical of the language variety, registers, or discourses that it is meant to represent. The article introduces the concepts of ‘representativeness’ and ‘corpus sampling’ and outlines how different types of corpora raise distinct considerations. It also argues that representativeness and corpus sampling are particularly relevant principles that should inform the development of future large language models.

Status: Published
Publication date: 31/12/2026
Publication date online: 30/06/2026
Publisher: Elsevier
ISBN: 9780443157851

People

Dr Viola Wiegand

Lecturer in Education (TESOL), Education