R as a calculator

Try entering some basic calculations!
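
If you're not sure where to start, here are a few calculations you could type into the console, one per line (the numbers are arbitrary):

```r
1 + 2      # addition
10 - 4     # subtraction
6 * 7      # multiplication
9 / 2      # division
2^10       # exponentiation
17 %% 5    # remainder of a division (modulo)
17 %/% 5   # integer division
sqrt(16)   # square root
```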

Objects

To work with data, we use objects. Objects are stored in the current environment. Single values, data structures and functions are all objects.

To access an object, it needs a name. A name should start with a letter; after that, you’re free to combine letters, numbers, dots and underscores however you like.
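
The line that creates the object isn't reproduced here. Judging from the output further down, it was presumably an assignment along these lines (`<-` is R's assignment operator):

```r
# Store the value 42 in an object called my_first_object
my_first_object <- 42
```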

Execute the line above by placing the cursor somewhere in it and pressing CTRL+Enter (or by clicking “Run” in RStudio).

Notice that it will now appear in the environment.

You can now access the object’s value by entering its name:

## [1] 42

Create a few more objects. You can also assign the value of an existing object to a new object (thereby copying it), do calculations with objects and assign the result of these to objects. (What happens when you multiply an object by itself and assign the result to the same object?)
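
A small sketch of what this could look like (the names x, y and z and all values are made up for illustration):

```r
x <- 7           # a new object
y <- x           # copy the value of x into a new object y
z <- x + y * 2   # calculate with objects and store the result
x <- x * x       # overwrite x with its own square
x                # 49
y                # still 7: y holds a copy, not a link to x
```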

You can also see the objects in the current environment by executing this function:
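
The function in question is most likely ls(); a minimal sketch:

```r
my_first_object <- 42   # make sure there is something to list
ls()                    # names of all objects in the current environment
```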

## [1] "my_first_object"

To get rid of an object, use rm():
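
For example (the object is created first so that the sketch runs on its own):

```r
my_first_object <- 42
rm(my_first_object)         # remove a single object by name
exists("my_first_object")   # FALSE: the object is gone
```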

To get rid of all objects in the current environment:
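
One common way to do this is to hand the result of ls() to rm(). In this sketch, a and b are throwaway objects created just for the demonstration:

```r
a <- 1
b <- 2
rm(list = ls())   # ls() returns all object names, rm() removes every one of them
ls()              # character(0): the environment is empty again
```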

Functions

Functions like these can be recognised by the round brackets.

A function has a name (the characters before the brackets) and may have different arguments (between the brackets), separated by commas. Arguments may be either optional or obligatory.

For example, the following function produces five random numbers between 1 and 10:
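
The call itself isn't shown here; it was presumably something like the following (the object name draw is an assumption, and your numbers will differ because the draw is random):

```r
# Draw five different numbers from 1 to 10 (no number can occur twice)
draw <- sample(x = 1:10, size = 5)
draw
```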

## [1]  9  5 10  7  4

To take a sample with replacement (allowing the same number to be drawn multiple times), we can specify the optional argument replace:
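
A sketch that would give output of this shape. The size of 11 is read off the output below; without replace = TRUE, drawing 11 values from only 10 would fail:

```r
# Draw 11 numbers from 1 to 10; repeats are allowed
draw <- sample(x = 1:10, size = 11, replace = TRUE)
draw
```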

##  [1]  7  1  1  9  3  4 10  1 10  6  2

To look up what a function does and how it works, you can access the built-in documentation by typing ? followed by the function’s name: ?sample

If you enter a function’s arguments in the exact same order as seen in its documentation, you don’t need to specify the names of its arguments. If you do specify them, however, you are free to enter them in any order you want:
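
To see that both styles do the same thing, this sketch fixes the random seed before each call. The range 1:100 and the object names are assumptions for illustration:

```r
set.seed(1)                               # same seed, same random numbers
by_position <- sample(1:100, 5)           # documented order: x first, then size
set.seed(1)
by_name <- sample(size = 5, x = 1:100)    # named arguments in a different order
identical(by_position, by_name)           # TRUE
```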

## [1] 64 46 43 38 56

Installing and loading packages

R already provides quite a lot of functions, but sooner or later, you’ll need some more …

A package is a collection of functions and/or data sets, usually for a certain range of applications (e.g. plotting, linear mixed-effects models, corpus analysis, …).

When packages are installed, they are stored locally (e.g. on a hard drive). The set of installed packages can be thought of as a library: if you need a certain package in your current session, you can check it out (thus activating it).

To install a package (or several, by providing a vector of package names):

install.packages("name_of_package")
install.packages(c("package1", "package2", "package3"))

By default, dependencies are also installed (= packages which are required for your new package to work properly).

To activate an installed package:

library("name_of_package")
library(name_of_package)

(For whatever reason, quotation marks are optional in this case.)

You can also use RStudio to install, update, activate and deactivate packages.

A much more extensive tutorial (useful even for advanced users): https://www.datacamp.com/community/tutorials/rpackages-guide

We’ll need some packages later, so let’s activate them:
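
Going by the startup messages below, the two packages in question are tidyverse and data.table. Assuming both are already installed, that would be:

```r
library(tidyverse)    # attaches ggplot2, dplyr, tidyr, readr and friends
library(data.table)   # provides fread(), among other things
```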

## -- Attaching packages ------------------------------------------------------------------------ tidyverse 1.2.1.9000 --
## v ggplot2 3.2.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts -------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:purrr':
## 
##     transpose

Logical Operators

To check if values are equal, if one is greater than another etc., we need logical operators.
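
The values of a and b aren't given here. The sketch below uses two made-up numbers with a greater than b, which is consistent with the results shown in this section:

```r
a <- 5   # hypothetical values
b <- 3

a == b   # equal?                   FALSE
a > b    # greater than?            TRUE
a != b   # not equal?               TRUE
a < b    # less than?               FALSE
a <= b   # less than or equal?      FALSE
a >= b   # greater than or equal?   TRUE
```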

Are a and b equal?

## [1] FALSE

Is a greater than b?

## [1] TRUE

Also:

## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] TRUE

On its own, the exclamation mark (“not”) negates an expression:

## [1] TRUE

We can use & (AND) and | (OR) to combine conditions:
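
Two made-up conditions that both come out TRUE (a and b are again hypothetical values):

```r
a <- 5
b <- 3

a > b & b > 0    # TRUE: both conditions are TRUE
a < b | b < 10   # TRUE: at least one condition is TRUE
```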

## [1] TRUE
## [1] TRUE

If we want to know whether exactly one of the two sides is TRUE, we need XOR (exclusive OR):
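
In R, this is the function xor(). A minimal sketch with literal values:

```r
xor(TRUE, FALSE)   # TRUE: exactly one side is TRUE
xor(TRUE, TRUE)    # FALSE: both sides are TRUE
```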

## [1] TRUE
## [1] FALSE

Side note: There’s also && and ||, which work on single values only and stop evaluating as soon as the result is clear. You’ll need these for if statements.

Data types

There are lots of types of data in R. Luckily, we won’t need all of them.

These are the most important basic types:

  • numbers (integers and doubles)
  • logical values (TRUE and FALSE)
  • characters (= strings)

(Use typeof() to determine the basic type of objects.)
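
A quick sketch with one literal value per type (the values themselves are arbitrary):

```r
typeof(42)            # "double": numbers are doubles by default
typeof(42L)           # "integer": the L suffix marks an integer
typeof(TRUE)          # "logical"
typeof("forty-two")   # "character"
```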

Data structures:

  • vectors
  • factors (ordered and unordered)
  • matrices
  • data sets (data.frame, tibble and data.table)
  • lists

Vectors

Probably the most important data structure in R, a vector contains elements of the same basic type (for different types, you’ll need lists).
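
Vectors are usually built with c() ("combine"). The values here are taken from the output below; the name numbers is an assumption:

```r
numbers <- c(3, 4.6, 64, 42)   # a vector of four doubles
numbers
```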

## [1]  3.0  4.6 64.0 42.0

Vectors of characters/strings or logical values are also possible:
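
Again with c(); the values match the output below, and the names words and flags are assumptions:

```r
words <- c("colourless", "green", "ideas")   # character vector
flags <- c(TRUE, FALSE, TRUE, TRUE, FALSE)   # logical vector
words
flags
```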

## [1] "colourless" "green"      "ideas"
## [1]  TRUE FALSE  TRUE  TRUE FALSE

To get the number of elements in a vector, use length():
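
For instance (numbers is a hypothetical name for the numeric vector from above):

```r
numbers <- c(3, 4.6, 64, 42)
length(numbers)   # 4
```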

## [1] 4

We can use square brackets to access elements of a vector:
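
Note that R starts counting at 1. A sketch with the vectors from above (their names are assumptions):

```r
numbers <- c(3, 4.6, 64, 42)
words   <- c("colourless", "green", "ideas")

numbers[1]   # first element: 3
numbers[2]   # second element: 4.6
words[2]     # "green"
```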

## [1] 3
## [1] 4.6
## [1] "green"

We can also use vectors of numbers to access several elements:
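
The index vectors below are a guess that reproduces the output shown here; the vector names are assumptions:

```r
numbers <- c(3, 4.6, 64, 42)
flags   <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
words   <- c("colourless", "green", "ideas")

numbers[1:3]       # elements 1 to 3
flags[c(2, 3, 4)]  # elements 2, 3 and 4
words[c(1, 3)]     # first and third element
```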

## [1]  3.0  4.6 64.0
## [1] FALSE  TRUE  TRUE
## [1] "colourless" "ideas"

Vectors can be part of new vectors:
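
A sketch that builds the longer vector seen in the output below (both object names are assumptions):

```r
numbers <- c(3, 4.6, 64, 42)
more_numbers <- c(numbers, 48, 120, 5, 32)   # the old vector plus four new values
more_numbers
```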

## [1]   3.0   4.6  64.0  42.0  48.0 120.0   5.0  32.0

To sort a vector, use sort():
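
For example, in ascending and descending order (more_numbers is a hypothetical name for the vector from above):

```r
more_numbers <- c(3, 4.6, 64, 42, 48, 120, 5, 32)

sort(more_numbers)                      # ascending (the default)
sort(more_numbers, decreasing = TRUE)   # descending
```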

## [1]   3.0   4.6   5.0  32.0  42.0  48.0  64.0 120.0
## [1] 120.0  64.0  48.0  42.0  32.0   5.0   4.6   3.0

Mathematical operators and many functions are vectorised, which means that when you apply them to a vector, you get a vector in return:
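
These calls reproduce the output shown below. They are a reconstruction, not necessarily the original code, and the vector name is an assumption:

```r
more_numbers <- c(3, 4.6, 64, 42, 48, 120, 5, 32)

more_numbers + 2          # adds 2 to every element
more_numbers * 3          # multiplies every element by 3
sqrt(more_numbers)        # square root of every element
round(more_numbers)       # rounds every element to a whole number
round(more_numbers, 1)    # rounds to one decimal place
```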

## [1]   5.0   6.6  66.0  44.0  50.0 122.0   7.0  34.0
## [1]   9.0  13.8 192.0 126.0 144.0 360.0  15.0  96.0
## [1]  1.732051  2.144761  8.000000  6.480741  6.928203 10.954451  2.236068
## [8]  5.656854
## [1]   3   5  64  42  48 120   5  32
## [1]   3.0   4.6  64.0  42.0  48.0 120.0   5.0  32.0

R provides some useful methods to search inside of vectors:

Use %in% to check if a vector contains a certain value:
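
For example (the value 42 and the vector name are assumptions):

```r
more_numbers <- c(3, 4.6, 64, 42, 48, 120, 5, 32)

42 %in% more_numbers   # TRUE: 42 is one of the elements
```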

## [1] TRUE

which() returns the position(!) of elements that meet your conditions (by using logical operators, see above):
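
Two conditions that would yield the positions shown below (the conditions and the vector name are assumptions):

```r
more_numbers <- c(3, 4.6, 64, 42, 48, 120, 5, 32)

which(more_numbers == 42)   # 4: the fourth element equals 42
which(more_numbers > 40)    # 3 4 5 6: positions of all elements above 40
```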

## [1] 4
## [1] 3 4 5 6

Since this is a vector itself, you can use it to access the elements of the original vector by their position:

## [1]  64  42  48 120

But this might be a little easier:
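
Both ways of subsetting side by side, as a sketch (the condition and the vector name are assumptions):

```r
more_numbers <- c(3, 4.6, 64, 42, 48, 120, 5, 32)

more_numbers[which(more_numbers > 40)]   # via positions
more_numbers[more_numbers > 40]          # directly via the logical condition
```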

## [1]  64  42  48 120

You can also combine conditions:
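
For example, all elements greater than 40 and smaller than 50 (the thresholds are a guess that fits the output below):

```r
more_numbers <- c(3, 4.6, 64, 42, 48, 120, 5, 32)

more_numbers[more_numbers > 40 & more_numbers < 50]   # 42 48
```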

## [1] 42 48

Data sets

We can import data in a number of ways. Plain-text CSV files are the most straightforward to read in, but there are packages to read in other file formats (Excel, SPSS, JSON, etc.).

Reading in data

Those with some R experience probably already know read.table(), read.csv(), read.csv2() etc. Alternatively, the readr functions (read_csv(), read_csv2() etc.) read in a data set as a tibble, which is a little faster. For really large files, the data.table package offers the function fread().

Which function is best suited to read in a specific file depends on the file format and the file’s formatting (field separator, decimal point, etc.). (Exception: fread(). fread() doesn’t care and usually figures this out by itself.)

For CSV files in European format (semicolon as field separator, comma as decimal point), use read_csv2():
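
The actual file isn't available here, so this sketch first writes a tiny semicolon-separated file to a temporary location and then reads it back. The temporary path is an assumption; the two rows are copied from the output below:

```r
library(readr)

# Create a small stand-in file so the example runs on its own
path <- tempfile(fileext = ".csv")
writeLines(c("Lemma;s.Genitiv;es.Genitiv",
             "Leben;3761;0",
             "Spiel;1479;192"), path)

# Semicolon as field separator, comma as decimal mark
gen_blogs <- read_csv2(path)
gen_blogs
```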

## Using ',' as decimal and '.' as grouping mark. Use read_delim() for more control.
## Parsed with column specification:
## cols(
##   Lemma = col_character(),
##   s.Genitiv = col_double(),
##   es.Genitiv = col_double()
## )
## # A tibble: 17,512 x 3
##    Lemma       s.Genitiv es.Genitiv
##    <chr>           <dbl>      <dbl>
##  1 Leben            3761          0
##  2 Blog             2570          0
##  3 Internet         1847          0
##  4 Artikel          1757          0
##  5 Erachten         1666          0
##  6 Monat            1562          6
##  7 Spiel            1479        192
##  8 Wissen           1463          0
##  9 Unternehmen      1260          0
## 10 Film             1241        265
## # ... with 17,502 more rows

To specify data types for certain columns:
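
One way to do this in readr is the col_types argument. As before, the sketch uses a small stand-in file (an assumption) with rows copied from the output:

```r
library(readr)

path <- tempfile(fileext = ".csv")
writeLines(c("Lemma;s.Genitiv;es.Genitiv",
             "Leben;3761;0",
             "Spiel;1479;192"), path)

# Read both frequency columns as integers instead of doubles
gen_blogs <- read_csv2(path,
                       col_types = cols(Lemma      = col_character(),
                                        s.Genitiv  = col_integer(),
                                        es.Genitiv = col_integer()))
gen_blogs
```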

## Using ',' as decimal and '.' as grouping mark. Use read_delim() for more control.
## # A tibble: 17,512 x 3
##    Lemma       s.Genitiv es.Genitiv
##    <chr>           <int>      <int>
##  1 Leben            3761          0
##  2 Blog             2570          0
##  3 Internet         1847          0
##  4 Artikel          1757          0
##  5 Erachten         1666          0
##  6 Monat            1562          6
##  7 Spiel            1479        192
##  8 Wissen           1463          0
##  9 Unternehmen      1260          0
## 10 Film             1241        265
## # ... with 17,502 more rows

The first argument is a file path. Since the folder “data” is located in my current working directory, I don’t need to specify the full/absolute path.

Alternatively, you can use file.choose() to select a file: read_csv2(file.choose())

Try it out!

There’s also read_csv() for classic CSV files (comma as field separator, . as decimal point), read_tsv() for files with tab stops as field separators, and read_delim(), the parent function where you can specify everything yourself.

RStudio offers some options to read in files a little more comfortably: File -> Import Dataset

  • From Text (base)…: base R functions: read.table() etc.: data.frame
  • From Text (readr)…: tidyverse/readr style: tibble
  • From Excel…: Excel files (tibble)
  • From SPSS…: SPSS files (tibble)
  • From SAS…: SAS files (tibble)
  • From Stata…: Stata files (tibble)

After choosing suitable import options and clicking “Import”, you can see the resulting R command on the console. You can then copy it to your script to speed up the process in the future.

Example: Opening an Excel file:

## # A tibble: 17,512 x 3
##    Lemma       s.Genitiv es.Genitiv
##    <chr>           <dbl>      <dbl>
##  1 Leben            3761          0
##  2 Blog             2570          0
##  3 Internet         1847          0
##  4 Artikel          1757          0
##  5 Erachten         1666          0
##  6 Monat            1562          6
##  7 Spiel            1479        192
##  8 Wissen           1463          0
##  9 Unternehmen      1260          0
## 10 Film             1241        265
## # ... with 17,502 more rows

If you’ve got your own data with you, now’s the time to try to open it!

Accessing parts of a data set

To access a column (usually a statistical variable), enter the data set’s name, followed by a dollar sign and the name of the column. We get a vector of values (let’s not display all of them by using head()):
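
A sketch with a toy excerpt of the data (the first seven rows from the output above), built by hand so that it runs without the original file:

```r
library(dplyr)   # part of the tidyverse; provides tibble()

gen_blogs <- tibble(
  Lemma      = c("Leben", "Blog", "Internet", "Artikel", "Erachten", "Monat", "Spiel"),
  s.Genitiv  = c(3761, 2570, 1847, 1757, 1666, 1562, 1479),
  es.Genitiv = c(0, 0, 0, 0, 0, 6, 192)
)

head(gen_blogs$Lemma)       # first six values of the column Lemma
head(gen_blogs$s.Genitiv)   # first six values of the column s.Genitiv
```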

## [1] "Leben"    "Blog"     "Internet" "Artikel"  "Erachten" "Monat"
## [1] 3761 2570 1847 1757 1666 1562

Just as with vectors, you can use square brackets to subset a data set. You just have to provide two values: row and column.
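
The row goes before the comma, the column after it; leaving one of them empty means "all". The indices below are a guess that matches the output shown here, applied to a five-row toy excerpt of the data:

```r
library(dplyr)

gen_blogs <- tibble(
  Lemma      = c("Leben", "Blog", "Internet", "Spiel", "Film"),
  s.Genitiv  = c(3761, 2570, 1847, 1479, 1241),
  es.Genitiv = c(0, 0, 0, 192, 265)
)

gen_blogs[3, 3]         # row 3, column 3
gen_blogs[3, c(1, 3)]   # row 3, columns 1 and 3
gen_blogs[3, ]          # row 3, all columns
gen_blogs[, 3]          # all rows, column 3
```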

## # A tibble: 1 x 1
##   es.Genitiv
##        <dbl>
## 1          0
## # A tibble: 1 x 2
##   Lemma    es.Genitiv
##   <chr>         <dbl>
## 1 Internet          0
## # A tibble: 1 x 3
##   Lemma    s.Genitiv es.Genitiv
##   <chr>        <dbl>      <dbl>
## 1 Internet      1847          0
## # A tibble: 17,512 x 1
##    es.Genitiv
##         <dbl>
##  1          0
##  2          0
##  3          0
##  4          0
##  5          0
##  6          6
##  7        192
##  8          0
##  9          0
## 10        265
## # ... with 17,502 more rows

To select certain columns, select() is also useful:
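
A sketch on a toy excerpt of the data; the exact calls are a reconstruction from the output below:

```r
library(dplyr)

gen_blogs <- tibble(
  Lemma      = c("Leben", "Blog", "Internet", "Spiel", "Film"),
  s.Genitiv  = c(3761, 2570, 1847, 1479, 1241),
  es.Genitiv = c(0, 0, 0, 192, 265)
)

select(gen_blogs, es.Genitiv, s.Genitiv, Lemma)   # pick columns in the given order
select(gen_blogs, -es.Genitiv)                    # everything except es.Genitiv
```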

## # A tibble: 17,512 x 3
##    es.Genitiv s.Genitiv Lemma      
##         <dbl>     <dbl> <chr>      
##  1          0      3761 Leben      
##  2          0      2570 Blog       
##  3          0      1847 Internet   
##  4          0      1757 Artikel    
##  5          0      1666 Erachten   
##  6          6      1562 Monat      
##  7        192      1479 Spiel      
##  8          0      1463 Wissen     
##  9          0      1260 Unternehmen
## 10        265      1241 Film       
## # ... with 17,502 more rows
## # A tibble: 17,512 x 2
##    Lemma       s.Genitiv
##    <chr>           <dbl>
##  1 Leben            3761
##  2 Blog             2570
##  3 Internet         1847
##  4 Artikel          1757
##  5 Erachten         1666
##  6 Monat            1562
##  7 Spiel            1479
##  8 Wissen           1463
##  9 Unternehmen      1260
## 10 Film             1241
## # ... with 17,502 more rows

You can also rename variables:
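
Inside select(), this works with new_name = old_name. A sketch on a toy excerpt of the data (the object name renamed is an assumption):

```r
library(dplyr)

gen_blogs <- tibble(
  Lemma      = c("Leben", "Blog", "Internet", "Spiel", "Film"),
  s.Genitiv  = c(3761, 2570, 1847, 1479, 1241),
  es.Genitiv = c(0, 0, 0, 192, 265)
)

# Select all three columns and rename two of them on the way
renamed <- select(gen_blogs, Lemma, s_genitive = s.Genitiv, es_genitive = es.Genitiv)
renamed
```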

## # A tibble: 17,512 x 3
##    Lemma       s_genitive es_genitive
##    <chr>            <dbl>       <dbl>
##  1 Leben             3761           0
##  2 Blog              2570           0
##  3 Internet          1847           0
##  4 Artikel           1757           0
##  5 Erachten          1666           0
##  6 Monat             1562           6
##  7 Spiel             1479         192
##  8 Wissen            1463           0
##  9 Unternehmen       1260           0
## 10 Film              1241         265
## # ... with 17,502 more rows

If you just want to rename a column while keeping all other columns, rename() might be more practical:
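
A sketch on a toy excerpt of the data (the object name renamed is an assumption):

```r
library(dplyr)

gen_blogs <- tibble(
  Lemma      = c("Leben", "Blog", "Internet", "Spiel", "Film"),
  s.Genitiv  = c(3761, 2570, 1847, 1479, 1241),
  es.Genitiv = c(0, 0, 0, 192, 265)
)

# Only s.Genitiv gets a new name; all other columns stay as they are
renamed <- rename(gen_blogs, s_genitive = s.Genitiv)
renamed
```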

## # A tibble: 17,512 x 3
##    Lemma       s_genitive es.Genitiv
##    <chr>            <dbl>      <dbl>
##  1 Leben             3761          0
##  2 Blog              2570          0
##  3 Internet          1847          0
##  4 Artikel           1757          0
##  5 Erachten          1666          0
##  6 Monat             1562          6
##  7 Spiel             1479        192
##  8 Wissen            1463          0
##  9 Unternehmen       1260          0
## 10 Film              1241        265
## # ... with 17,502 more rows

select() is also useful to change the order of columns:
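
One way to move a column to the front is to name it first and let everything() fill in the rest. This is a sketch on toy data and may not be the original call; the object name reordered is an assumption:

```r
library(dplyr)

gen_blogs <- tibble(
  Lemma      = c("Leben", "Blog", "Internet", "Spiel", "Film"),
  s.Genitiv  = c(3761, 2570, 1847, 1479, 1241),
  es.Genitiv = c(0, 0, 0, 192, 265)
)

# s.Genitiv first, then all remaining columns in their original order
reordered <- select(gen_blogs, s.Genitiv, everything())
reordered
```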

## # A tibble: 17,512 x 3
##    s.Genitiv Lemma       es.Genitiv
##        <dbl> <chr>            <dbl>
##  1      3761 Leben                0
##  2      2570 Blog                 0
##  3      1847 Internet             0
##  4      1757 Artikel              0
##  5      1666 Erachten             0
##  6      1562 Monat                6
##  7      1479 Spiel              192
##  8      1463 Wissen               0
##  9      1260 Unternehmen          0
## 10      1241 Film               265
## # ... with 17,502 more rows

Filtering data sets

You’ll often want to get parts of a data set not according to their position, but according to certain conditions which must be fulfilled. That’s what filter() is for.

gen_blogs has 17512 rows – let’s just use the lemmas which appear at least five times in any form (arbitrary choice):
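
A sketch of such a filter, reading "at least five times in any form" as the sum of both columns. The toy data contains two rows from the output plus one made-up rare lemma ("Seltenwort" and its counts are hypothetical):

```r
library(dplyr)

gen_blogs <- tibble(
  Lemma      = c("Leben", "Spiel", "Seltenwort"),   # last row is invented
  s.Genitiv  = c(3761, 1479, 2),
  es.Genitiv = c(0, 192, 1)
)

# Keep only the rows where both genitive forms together occur at least 5 times
frequent <- filter(gen_blogs, s.Genitiv + es.Genitiv >= 5)
frequent
```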

## # A tibble: 4,360 x 3
##    Lemma       s.Genitiv es.Genitiv
##    <chr>           <dbl>      <dbl>
##  1 Leben            3761          0
##  2 Blog             2570          0
##  3 Internet         1847          0
##  4 Artikel          1757          0
##  5 Erachten         1666          0
##  6 Monat            1562          6
##  7 Spiel            1479        192
##  8 Wissen           1463          0
##  9 Unternehmen      1260          0
## 10 Film             1241        265
## # ... with 4,350 more rows

If several conditions have to be fulfilled, they can be separated by commas:
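
The thresholds here (more than 100 occurrences of each form) are a guess based on the output below; the data is a toy excerpt and the object name both_forms is an assumption:

```r
library(dplyr)

gen_blogs <- tibble(
  Lemma      = c("Leben", "Blog", "Internet", "Spiel", "Film"),
  s.Genitiv  = c(3761, 2570, 1847, 1479, 1241),
  es.Genitiv = c(0, 0, 0, 192, 265)
)

# Both conditions must hold for a row to be kept
both_forms <- filter(gen_blogs, s.Genitiv > 100, es.Genitiv > 100)
both_forms
```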

## # A tibble: 16 x 3
##    Lemma      s.Genitiv es.Genitiv
##    <chr>          <dbl>      <dbl>
##  1 Spiel           1479        192
##  2 Film            1241        265
##  3 Projekt         1215        725
##  4 Beitrag          757        281
##  5 Licht            406        128
##  6 Begriff          376        171
##  7 Vortrag          307        135
##  8 Gerät            242        366
##  9 Netzwerk         181        104
## 10 Bundestag        164        367
## 11 Produkt          136        189
## 12 Werk             133        348
## 13 Vertrag          126        189
## 14 Verlag           125        124
## 15 Widerstand       122        106
## 16 Protest          107        110

Logical AND works the same way:
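
The same filter written with & instead of a comma (again on toy data, with assumed thresholds and an assumed object name):

```r
library(dplyr)

gen_blogs <- tibble(
  Lemma      = c("Leben", "Blog", "Internet", "Spiel", "Film"),
  s.Genitiv  = c(3761, 2570, 1847, 1479, 1241),
  es.Genitiv = c(0, 0, 0, 192, 265)
)

both_forms <- filter(gen_blogs, s.Genitiv > 100 & es.Genitiv > 100)
both_forms
```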

## # A tibble: 16 x 3
##    Lemma      s.Genitiv es.Genitiv
##    <chr>          <dbl>      <dbl>
##  1 Spiel           1479        192
##  2 Film            1241        265
##  3 Projekt         1215        725
##  4 Beitrag          757        281
##  5 Licht            406        128
##  6 Begriff          376        171
##  7 Vortrag          307        135
##  8 Gerät            242        366
##  9 Netzwerk         181        104
## 10 Bundestag        164        367
## 11 Produkt          136        189
## 12 Werk             133        348
## 13 Vertrag          126        189
## 14 Verlag           125        124
## 15 Widerstand       122        106
## 16 Protest          107        110

Try to …

  • select all rows where s.Genitiv is exactly 100
  • select all rows where the lemma is not “Äußer”
  • select all rows where es.Genitiv is between 100 and 200

Besides “Äußer”, there are some other lemmas in the data set which shouldn’t be in there.
Let’s throw them out by using %in%:
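
The full list of unwanted lemmas isn't given here, so this sketch uses "Äußer" plus a made-up placeholder ("Xxx") on a tiny toy data set:

```r
library(dplyr)

gen_blogs <- tibble(Lemma = c("Leben", "Äußer", "Xxx", "Film"))   # toy data
unwanted  <- c("Äußer", "Xxx")                                    # hypothetical list

# Keep only the rows whose lemma is NOT in the list of unwanted lemmas
cleaned <- filter(gen_blogs, !Lemma %in% unwanted)
cleaned
```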

More cleanup, using string functions and regular expressions:

Words ending in -nis have been improperly lemmatised (-niss):
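
A sketch of how such lemmas can be found and repaired with a regular expression. It uses base R's grepl() and sub() rather than whatever the original code used; stringr's str_detect() and str_replace() would do the same job. The vector lemmas is a toy example:

```r
lemmas <- c("Ereigniss", "Ergebniss", "Leben")

grepl("niss$", lemmas)                 # which lemmas end in "niss"?
lemmas[grepl("niss$", lemmas)]         # show the affected lemmas
fixed <- sub("niss$", "nis", lemmas)   # replace the ending, leave the rest alone
fixed
```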

##  [1] "Bündniss"                  "Ereigniss"                
##  [3] "Verhältniss"               "Ergebniss"                
##  [5] "Verständniss"              "Aktionsbündniss"          
##  [7] "Gedächtniss"               "Selbstverständniss"       
##  [9] "Verzeichniss"              "Wahlergebniss"            
## [11] "Gefängniss"                "Bedürfniss"               
## [13] "Arbeitsverhältniss"        "Bekenntniss"              
## [15] "Geheimniss"                "Wahlgeheimniss"           
## [17] "Beschäftigungsverhältniss" "Geständniss"              
## [19] "Missverständniss"          "Bankgeheimniss"           
## [21] "Kapitalverhältniss"        "Erlebniss"                
## [23] "Unverständniss"            "Briefgeheimniss"          
## [25] "Presseerzeugniss"          "Fernmeldegeheimniss"      
## [27] "Vertragsverhältniss"       "Einverständniss"          
## [29] "Gleichniss"                "Inhaltsverzeichniss"      
## [31] "Mietverhältniss"           "Arbeitsgedächtniss"       
## [33] "Begräbniss"                "Jahrhundertereigniss"     
## [35] "Textverständniss"          "Untersuchungsergebniss"   
## [37] "Verhängniss"               "Ärgerniss"

There are also very few lemmas with non-alphanumerical characters at the end:
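
One possible way to spot and drop these, sketched in base R on made-up examples (the two faulty entries are invented for illustration):

```r
lemmas <- c("Film", "Blog-", "Spiel.")   # the last two are hypothetical

# [^[:alnum:]]$ matches a final character that is neither a letter nor a digit
bad <- grepl("[^[:alnum:]]$", lemmas)
bad
lemmas[!bad]                             # keep only the clean lemmas
```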

Adding columns

If you want to add a column to an existing data.frame, tibble or data.table, the vector needs to have the same length as the other columns.

There are quite a few ways to do this. Imho, the easiest one is this:
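
Presumably this means assigning to a new column with the dollar sign. A sketch on a toy excerpt of the data, using nchar() to count the characters of each lemma:

```r
library(dplyr)

gen_blogs <- tibble(
  Lemma      = c("Leben", "Blog", "Internet", "Spiel", "Film"),
  s.Genitiv  = c(3761, 2570, 1847, 1479, 1241),
  es.Genitiv = c(0, 0, 0, 192, 265)
)

# nchar() returns one value per lemma, so the lengths match
gen_blogs$Length <- nchar(gen_blogs$Lemma)
gen_blogs
```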

## # A tibble: 4,354 x 4
##    Lemma       s.Genitiv es.Genitiv Length
##    <chr>           <dbl>      <dbl>  <int>
##  1 Leben            3761          0      5
##  2 Blog             2570          0      4
##  3 Internet         1847          0      8
##  4 Artikel          1757          0      7
##  5 Erachten         1666          0      8
##  6 Monat            1562          6      5
##  7 Spiel            1479        192      5
##  8 Wissen           1463          0      6
##  9 Unternehmen      1260          0     11
## 10 Film             1241        265      4
## # ... with 4,344 more rows

mutate() can be used to add several columns at once, to change existing columns, and to do calculations with columns:
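
A sketch that reproduces the two new columns from the output below on toy data. Rounding to two decimals is inferred from the displayed values, and the object name result is an assumption:

```r
library(dplyr)

gen_blogs <- tibble(
  Lemma      = c("Leben", "Blog", "Internet", "Spiel", "Film"),
  s.Genitiv  = c(3761, 2570, 1847, 1479, 1241),
  es.Genitiv = c(0, 0, 0, 192, 265)
)

result <- mutate(gen_blogs,
                 Total   = s.Genitiv + es.Genitiv,         # sum of both forms
                 Frac_es = round(es.Genitiv / Total, 2))   # share of the es form
result
```

Columns created earlier in the same mutate() call (here Total) can be used right away in later ones.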

## # A tibble: 4,354 x 6
##    Lemma       s.Genitiv es.Genitiv Length Total Frac_es
##    <chr>           <dbl>      <dbl>  <int> <dbl>   <dbl>
##  1 Leben            3761          0      5  3761    0   
##  2 Blog             2570          0      4  2570    0   
##  3 Internet         1847          0      8  1847    0   
##  4 Artikel          1757          0      7  1757    0   
##  5 Erachten         1666          0      8  1666    0   
##  6 Monat            1562          6      5  1568    0   
##  7 Spiel            1479        192      5  1671    0.11
##  8 Wissen           1463          0      6  1463    0   
##  9 Unternehmen      1260          0     11  1260    0   
## 10 Film             1241        265      4  1506    0.18
## # ... with 4,344 more rows

Optional step: a new column with the number of syllables

## Loading required package: sylly
## Hyphenation (language: de)
## 
  |                                                                       
  |====================================                             |  55%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |=====================================                            |  57%
  |                                                                       
  |=====================================                            |  58%
  |                                                                       
  |======================================                           |  58%
  |                                                                       
  |======================================                           |  59%
  |                                                                       
  |=======================================                          |  59%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |=======================================                          |  61%
  |                                                                       
  |========================================                         |  61%
  |                                                                       
  |========================================                         |  62%
  |                                                                       
  |=========================================                        |  62%
  |                                                                       
  |=========================================                        |  63%
  |                                                                       
  |=========================================                        |  64%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |==========================================                       |  65%
  |                                                                       
  |===========================================                      |  65%
  |                                                                       
  |===========================================                      |  66%
  |                                                                       
  |===========================================                      |  67%
  |                                                                       
  |============================================                     |  67%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |=============================================                    |  68%
  |                                                                       
  |=============================================                    |  69%
  |                                                                       
  |=============================================                    |  70%
  |                                                                       
  |==============================================                   |  70%
  |                                                                       
  |==============================================                   |  71%
  |                                                                       
  |==============================================                   |  72%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |===============================================                  |  73%
  |                                                                       
  |================================================                 |  73%
  |                                                                       
  |================================================                 |  74%
  |                                                                       
  |================================================                 |  75%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |==================================================               |  76%
  |                                                                       
  |==================================================               |  77%
  |                                                                       
  |==================================================               |  78%
  |                                                                       
  |===================================================              |  78%
  |                                                                       
  |===================================================              |  79%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |====================================================             |  81%
  |                                                                       
  |=====================================================            |  81%
  |                                                                       
  |=====================================================            |  82%
  |                                                                       
  |======================================================           |  82%
  |                                                                       
  |======================================================           |  83%
  |                                                                       
  |======================================================           |  84%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=======================================================          |  85%
  |                                                                       
  |========================================================         |  85%
  |                                                                       
  |========================================================         |  86%
  |                                                                       
  |========================================================         |  87%
  |                                                                       
  |=========================================================        |  87%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |==========================================================       |  88%
  |                                                                       
  |==========================================================       |  89%
  |                                                                       
  |==========================================================       |  90%
  |                                                                       
  |===========================================================      |  90%
  |                                                                       
  |===========================================================      |  91%
  |                                                                       
  |===========================================================      |  92%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |============================================================     |  93%
  |                                                                       
  |=============================================================    |  93%
  |                                                                       
  |=============================================================    |  94%
  |                                                                       
  |=============================================================    |  95%
  |                                                                       
  |==============================================================   |  95%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |===============================================================  |  96%
  |                                                                       
  |===============================================================  |  97%
  |                                                                       
  |===============================================================  |  98%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |================================================================ |  99%
  |                                                                       
  |=================================================================|  99%
  |                                                                       
  |=================================================================| 100%

Sorting

Use arrange() to change the order of rows:

## # A tibble: 4,354 x 7
##    Lemma   s.Genitiv es.Genitiv Length Total Frac_es Syllables
##    <chr>       <dbl>      <dbl>  <int> <dbl>   <dbl>     <dbl>
##  1 Jahr            6       4378      4  4384    1            1
##  2 Tag            43       3401      3  3444    0.99         1
##  3 Land            7       2659      4  2666    1            1
##  4 Buch            0       1669      4  1669    1            1
##  5 Staat          48       1585      5  1633    0.97         1
##  6 Wort           31       1062      4  1093    0.97         1
##  7 Text           43        970      4  1013    0.96         1
##  8 Kind            3        896      4   899    1            1
##  9 Volk           29        879      4   908    0.97         1
## 10 Projekt      1215        725      7  1940    0.37         2
## # ... with 4,344 more rows

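Wrap a column in desc() to sort in descending order, as in the output above, where the rows run from the highest to the lowest value of es.Genitiv.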
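The call that produced the output above is not shown, so here is a minimal sketch on a toy tibble. The four rows are copied from that output; the object name genitives_demo is made up for this example and is not the data set used in the tutorial.

```r
library(dplyr)

# Toy data: four rows copied by hand from the output above.
# `genitives_demo` is a hypothetical name, not the tutorial's data set.
genitives_demo <- tibble(
  Lemma      = c("Land", "Projekt", "Jahr", "Tag"),
  s.Genitiv  = c(7, 1215, 6, 43),
  es.Genitiv = c(2659, 725, 4378, 3401)
)

# arrange() sorts in ascending order by default
ascending <- arrange(genitives_demo, es.Genitiv)

# desc() reverses the order for that column
descending <- arrange(genitives_demo, desc(es.Genitiv))
descending
```

Like the other dplyr verbs, arrange() returns a new tibble and leaves genitives_demo itself unchanged, so assign the result to a name if you want to keep the sorted version.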

You can also sort by several columns:

## # A tibble: 4,354 x 7
##    Lemma s.Genitiv es.Genitiv Length Total Frac_es Syllables
##    <chr>     <dbl>      <dbl>  <int> <dbl>   <dbl>     <dbl>
##  1 DJ           18          0      2    18    0            1
##  2 Ei            0          7      2     7    1            1
##  3 Öl           45          5      2    50    0.1          1
##  4 Abo          22          0      3    22    0            1
##  5 Abt           0         18      3    18    1            1
##  6 Akt           0          9      3     9    1            1
##  7 All          32          0      3    32    0            1
##  8 Amt          15        186      3   201    0.93         1
##  9 Arm           4          8      3    12    0.67         1
## 10 Bad           0         20      3    20    1            1
## # ... with 4,344 more rows
## # A tibble: 4,354 x 7
##    Lemma                s.Genitiv es.Genitiv Length Total Frac_es Syllables
##    <chr>                    <dbl>      <dbl>  <int> <dbl>   <dbl>     <dbl>
##  1 Jugendmedienschutz-~        11         11     32    22    0.5          9
##  2 Bundesverfassungsge~         9          0     31     9    0            9
##  3 Mammographie-Screen~         5          0     31     5    0            7
##  4 Jugendmedienschutzs~         2         13     31    15    0.87         9
##  5 Urheberrechtswahrne~         0          8     31     8    1            9
##  6 Bundesverteidigungs~         7          0     30     7    0           11
##  7 Beschäftigtendatens~         0          5     30     5    1            9
##  8 Bundesgesundheitsmi~        22          0     28    22    0           10
##  9 Bundeswirtschaftsmi~        18          0     28    18    0            9
## 10 Verbraucherschutzmi~        11          0     28    11    0            9
## # ... with 4,344 more rows
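Judging from the two outputs above, the first appears to be sorted by Length and then by Lemma, the second by descending Length and then by descending s.Genitiv. A sketch of both calls on a toy tibble follows; the rows are copied from the first output, and genitives_demo is a hypothetical name.

```r
library(dplyr)

# Toy data: five rows copied by hand from the first output above,
# deliberately shuffled. `genitives_demo` is a hypothetical name.
genitives_demo <- tibble(
  Lemma      = c("Amt", "DJ", "Abo", "Ei", "Abt"),
  s.Genitiv  = c(15, 18, 22, 0, 0),
  es.Genitiv = c(186, 0, 0, 7, 18),
  Length     = c(3L, 2L, 3L, 2L, 3L)
)

# Sort by Length first; rows with the same Length are then sorted by Lemma
by_length_lemma <- arrange(genitives_demo, Length, Lemma)

# Longest lemmas first; ties are broken by the highest s.Genitiv count
by_length_desc <- arrange(genitives_demo, desc(Length), desc(s.Genitiv))

by_length_lemma
by_length_desc
```

The order of the arguments matters: each later column is only consulted to break ties in the columns before it.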

Summarising data

  • group_by() creates a grouped tibble.
  • summarise() is then used for arbitrary operations (sums, means, standard deviations, …), which are performed separately for each group.

## # A tibble: 30 x 4
##    Length Lemma_count s_genitives es_genitives
##     <int>       <int>       <dbl>        <dbl>
##  1      2           3          63           12
##  2      3          51        1223         5356
##  3      4         243        8883        21801
##  4      5         281       19506         6635
##  5      6         448       18188         3588
##  6      7         401       18062         4715
##  7      8         426       15099         2195
##  8      9         428       10238         2927
##  9     10         366        6078         1972
## 10     11         340        6441         2477
## # ... with 20 more rows
## # A tibble: 11 x 4
##    Syllables Lemma_count s_genitives es_genitives
##        <dbl>       <int>       <dbl>        <dbl>
##  1         1         426       13989        34642
##  2         2        1412       56535        11310
##  3         3        1143       28994         6973
##  4         4         719       11610         2481
##  5         5         382        4210         1102
##  6         6         137        1102          642
##  7         7          89        1309          347
##  8         8          21         172           12
##  9         9          20         226           58
## 10        10           3          22           62
## 11        11           2          19            0
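The two steps can be sketched on a toy tibble as follows. The six rows are copied from the sorting output earlier in this section; the object names are hypothetical, and the summary column names are taken from the output above. Since DJ, Ei and Öl are the only lemmas of length 2, the first row of the result should match the first row of the real summary (3 lemmas, 63 s-genitives, 12 es-genitives).

```r
library(dplyr)

# Toy data: six rows copied by hand from the sorting output.
# `genitives_demo` is a hypothetical name, not the tutorial's data set.
genitives_demo <- tibble(
  Lemma      = c("DJ", "Ei", "Öl", "Abo", "Abt", "Amt"),
  s.Genitiv  = c(18, 0, 45, 22, 0, 15),
  es.Genitiv = c(0, 7, 5, 0, 18, 186),
  Length     = c(2L, 2L, 2L, 3L, 3L, 3L)
)

# Step 1: group_by() marks the rows as belonging to groups (one per Length)
grouped <- group_by(genitives_demo, Length)

# Step 2: summarise() collapses each group into a single row
by_length <- summarise(
  grouped,
  Lemma_count  = n(),              # number of rows (lemmas) in the group
  s_genitives  = sum(s.Genitiv),   # total s-genitives in the group
  es_genitives = sum(es.Genitiv)   # total es-genitives in the group
)
by_length
```

Replacing Length with Syllables in group_by() would give a summary per syllable count instead, provided the data contain that column.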

Does the lemma end in s, ß, z or x?

## # A tibble: 4,354 x 8
##    Lemma      s.Genitiv es.Genitiv Length Total Frac_es Syllables Ends_in_s
##    <chr>          <dbl>      <dbl>  <int> <dbl>   <dbl>     <dbl> <fct>    
##  1 Leben           3761          0      5  3761    0            2 no       
##  2 Blog            2570          0      4  2570    0            1 no       
##  3 Internet        1847          0      8  1847    0            3 no       
##  4 Artikel         1757          0      7  1757    0            3 no       
##  5 Erachten        1666          0      8  1666    0            3 no       
##  6 Monat           1562          6      5  1568    0            2 no       
##  7 Spiel           1479        192      5  1671    0.11         1 no       
##  8 Wissen          1463          0      6  1463    0            2 no       
##  9 Unternehm~      1260          0     11  1260    0            4 no       
## 10 Film            1241        265      4  1506    0.18         1 no       
## # ... with 4,344 more rows
## # A tibble: 2 x 3
##   Ends_in_s      s    es
##   <fct>      <dbl> <dbl>
## 1 no        118188 46660
## 2 yes            0 10969
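One possible way to build such a column is to test each lemma against a regular expression and then summarise by the result. The sketch below is an assumption about the approach, not the code used for the output above. The first three rows are copied from that output, whereas "Haus" and "Satz" and their counts are invented purely for illustration; genitives_demo and the other object names are hypothetical.

```r
library(dplyr)

# Toy data. Leben, Blog and Spiel are copied from the output above;
# Haus and Satz (and their counts) are INVENTED for illustration only.
genitives_demo <- tibble(
  Lemma      = c("Leben", "Blog", "Spiel", "Haus", "Satz"),
  s.Genitiv  = c(3761, 2570, 1479, 0, 0),
  es.Genitiv = c(0, 0, 192, 12, 7)
)

# "[sßzx]$" matches lemmas whose last character is s, ß, z or x
with_flag <- mutate(
  genitives_demo,
  Ends_in_s = factor(ifelse(grepl("[sßzx]$", Lemma), "yes", "no"))
)

# Total s- and es-genitives for each value of Ends_in_s
by_ending <- summarise(
  group_by(with_flag, Ends_in_s),
  s  = sum(s.Genitiv),
  es = sum(es.Genitiv)
)
by_ending
```

The `$` anchors the pattern at the end of the string, so a lemma such as "Spiel", which contains an s elsewhere, is still labelled "no".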