Corpus Statistics with R

We are delighted to offer a tutorial on Corpus Statistics with R on the pre-conference day (Tuesday, 8 October 2019) at KONVENS 2019 in Erlangen. The full-day tutorial will begin with a general introduction to R and its use for corpus linguistic research in the morning, followed by advanced statistical methods in the afternoon. Participants who are already familiar with R and corpus analysis are welcome to attend the afternoon session only.

The tutorial takes place in room KH1.012.


9:30 – 11:00 Working with R & RStudio
coffee break
11:30 – 13:00 Analyzing corpus frequency data
lunch break
14:30 – 16:00 Linear and mixed effects models
coffee break
16:30 – 18:00 Time series analysis


Please register for KONVENS and select the statistics tutorial option. The tutorial is already included in the main conference fee. Alternatively, you can register only for the workshop/tutorial day at a reduced rate.


The tutorial will be taught by Andreas Blombach, Philipp Heinrich, and Stefan Evert (FAU Erlangen-Nürnberg).


Session 1

Data and material for session 1: session-1.zip

Session 2

Data and material for session 2: session-2.zip

Extended versions of the slides and further materials can be found in the SIGIL Course (Unit 2) and the LREC tutorial.

Session 3

Data and material for session 3: session-3.zip

Session 4

Data and material for session 4: session-4.zip

Before you arrive

All participants are asked to bring their own laptop with reasonably up-to-date versions of R and RStudio installed. If you're new to R and RStudio, the resources provided by Andy Field should get you started.

Please also install the following packages before arriving on site:

  • tidyverse
  • data.table
  • R.utils
  • tibbletime
  • anomalize
  • Hmisc
  • caret
  • tseries
  • car
  • GGally
  • ggcorrplot
  • psych
  • lme4
  • MuMIn
  • dfoptim
  • zipfR (v0.6-66, also available here)
  • corpora

You can install all packages with the following command from within R:
install.packages(c("tidyverse", "data.table", "R.utils", "tibbletime", "anomalize", "Hmisc", "caret", "tseries", "car", "GGally", "ggcorrplot", "psych", "lme4", "MuMIn", "dfoptim", "zipfR", "corpora"))
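
If some of the packages are already installed, you may prefer to install only the missing ones. The following is a minimal sketch of how that could be done. It is not part of the tutorial materials: the helper functions `missing_packages` and `has_min_version` are hypothetical names introduced here for illustration, and treating v0.6-66 as a minimum version for zipfR is our assumption.

```r
# Package names copied from the list above.
pkgs <- c("tidyverse", "data.table", "R.utils", "tibbletime", "anomalize",
          "Hmisc", "caret", "tseries", "car", "GGally", "ggcorrplot",
          "psych", "lme4", "MuMIn", "dfoptim", "zipfR", "corpora")

# Hypothetical helper: return the entries of `pkgs` that are absent
# from the installed package library.
missing_packages <- function(pkgs,
                             installed = rownames(utils::installed.packages())) {
  setdiff(pkgs, installed)
}

# Hypothetical helper: TRUE if `pkg` is installed in at least `min_version`.
has_min_version <- function(pkg, min_version) {
  if (!pkg %in% rownames(utils::installed.packages())) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= min_version
}

to_install <- missing_packages(pkgs)
if (length(to_install) == 0) {
  message("All tutorial packages are installed.")
} else {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Only install when run interactively (e.g. from the RStudio console).
  if (interactive()) {
    install.packages(to_install)
  }
}

# Assumption: v0.6-66 (from the list above) is treated as a minimum version.
if (!has_min_version("zipfR", "0.6-66")) {
  message("zipfR is missing or older than v0.6-66.")
}
```

Running the block a second time after installation should report that all tutorial packages are installed.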

If you encounter any problems, please send us an e-mail.