Corpus Linguistics

Module code: ED7023

In this module you will investigate issues relating to lexis (vocabulary) using the methods of corpus linguistics. A 'corpus' (plural form 'corpora') is a collection of naturally occurring texts that is stored on computer and carefully sampled to represent some variety of language, such as standard written American English in the 2000s, spoken British English in the 1980s, or short essays by L2 learners.

Increasingly, corpora are revolutionising lexical studies, dictionaries and other vocabulary-related materials by offering a unique window on the frequency and use of language patterns by speakers from different backgrounds. Because a corpus is in electronic form, vocabulary items can be easily searched, counted and investigated using a wide array of tools. We will explore how these resources can be exploited so that language teachers, learners and researchers can improve their understanding of the usage and meaning of words and phrases.

You will learn how to retrieve different types of lexical items from a study corpus and how to investigate the use of these items using a range of methods. Together we will explore the characteristics, strengths and weaknesses of the main published corpora and software available for applied English language/linguistics studies. We will also investigate lexical characteristics across different varieties of language (such as genres or regional varieties), across different groups of speakers (defined, for example, by age, sex or social class) and across different time periods.

Topics covered

Corpus benefits and limitations, corpus design and representativeness
Investigating (apparent) synonymy, collocations and phraseology
Annotating (labelling) lexical features in a corpus
Variation and change in usage
Corpus keywords
Compiling and analysing do-it-yourself corpora
Uses and implications of corpus-based lexical studies for English language teaching and applied linguistics