<- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-08-06/olympics.csv') olympics
The tidyverse is a collection of R packages designed for data science, as a suite aimed at easening the data analysis in all its steps.
All packages share an underlying design philosophy, grammar, and data structures.
ggplot2
a system for creating graphics, based on the Grammar of Graphics
readr
a fast and friendly way to read rectangular data (csv, txt…)
tibble
a tibble is a re-imagining version of the data frame, keeping what time has proven to be effective and throwing out what has not
stringr
provides a cohesive set of functions designed to make working with strings as easy as possible
forcats
provides a suite of useful tools that solve common problems with factors
dplyr
provides a grammar of data manipulation, providing a consistent set of verbs that solve the most common data manipulation challenges
tidyr
provides a set of functions that help you get to tidy data
purrr
enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors
Installing:
4 main verbs of dplyr:
filter
: keep the rows that match a conditionselect
: keep columns by name, index or conditionmutate
: transform existent variables or create new onessummarise
: do some summary statistics and reduce dataLet’s work with some data. We are going to use data from the Olympic Games.
An explanation of the dataset can be found here
Let’s work with some data. We are going to use data from the Olympic Games.
An explanation of the dataset can be found here
# A tibble: 271,116 × 15
id name sex age height weight team noc games year season city
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 1 A Dijia… M 24 180 80 China CHN 1992… 1992 Summer Barc…
2 2 A Lamusi M 23 170 60 China CHN 2012… 2012 Summer Lond…
3 3 Gunnar … M 24 NA NA Denm… DEN 1920… 1920 Summer Antw…
4 4 Edgar L… M 34 NA NA Denm… DEN 1900… 1900 Summer Paris
5 5 Christi… F 21 185 82 Neth… NED 1988… 1988 Winter Calg…
6 5 Christi… F 21 185 82 Neth… NED 1988… 1988 Winter Calg…
7 5 Christi… F 25 185 82 Neth… NED 1992… 1992 Winter Albe…
8 5 Christi… F 25 185 82 Neth… NED 1992… 1992 Winter Albe…
9 5 Christi… F 27 185 82 Neth… NED 1994… 1994 Winter Lill…
10 5 Christi… F 27 185 82 Neth… NED 1994… 1994 Winter Lill…
# ℹ 271,106 more rows
# ℹ 3 more variables: sport <chr>, event <chr>, medal <chr>
4 main verbs of dplyr:
filter
: keep the rows that match a condition
select
: keep columns by name
mutate
: transform existent variables or create new ones
summarise
: do some summary statistics and reduce data
filter
# A tibble: 13,372 × 15
id name sex age height weight team noc games year season city
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 4 Edgar L… M 34 NA NA Denm… DEN 1900… 1900 Summer Paris
2 17 Paavo J… M 28 175 64 Finl… FIN 1948… 1948 Summer Lond…
3 17 Paavo J… M 28 175 64 Finl… FIN 1948… 1948 Summer Lond…
4 17 Paavo J… M 28 175 64 Finl… FIN 1948… 1948 Summer Lond…
5 20 Kjetil … M 20 176 85 Norw… NOR 1992… 1992 Winter Albe…
6 20 Kjetil … M 30 176 85 Norw… NOR 2002… 2002 Winter Salt…
7 20 Kjetil … M 30 176 85 Norw… NOR 2002… 2002 Winter Salt…
8 20 Kjetil … M 34 176 85 Norw… NOR 2006… 2006 Winter Tori…
9 21 Ragnhil… F 27 163 NA Norw… NOR 2008… 2008 Summer Beij…
10 40 Roald E… M 31 NA NA Norw… NOR 1960… 1960 Winter Squa…
# ℹ 13,362 more rows
# ℹ 3 more variables: sport <chr>, event <chr>, medal <chr>
filter
# A tibble: 88,863 × 15
id name sex age height weight team noc games year season city
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 2 A Lamusi M 23 170 60 China CHN 2012… 2012 Summer Lond…
2 9 Antti S… M 26 186 96 Finl… FIN 2002… 2002 Winter Salt…
3 12 Jyri Ta… M 31 172 70 Finl… FIN 2000… 2000 Summer Sydn…
4 13 Minna M… F 34 159 55.5 Finl… FIN 2000… 2000 Summer Sydn…
5 16 Juhamat… M 28 184 85 Finl… FIN 2014… 2014 Winter Sochi
6 18 Timo An… M 31 189 130 Finl… FIN 2000… 2000 Summer Sydn…
7 20 Kjetil … M 26 176 85 Norw… NOR 1998… 1998 Winter Naga…
8 20 Kjetil … M 26 176 85 Norw… NOR 1998… 1998 Winter Naga…
9 20 Kjetil … M 26 176 85 Norw… NOR 1998… 1998 Winter Naga…
10 20 Kjetil … M 26 176 85 Norw… NOR 1998… 1998 Winter Naga…
# ℹ 88,853 more rows
# ℹ 3 more variables: sport <chr>, event <chr>, medal <chr>
filter
# A tibble: 6,696 × 15
id name sex age height weight team noc games year season city
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 51 Nstor A… M 23 167 64 Spain ESP 2016… 2016 Summer Rio …
2 51 Nstor A… M 23 167 64 Spain ESP 2016… 2016 Summer Rio …
3 51 Nstor A… M 23 167 64 Spain ESP 2016… 2016 Summer Rio …
4 51 Nstor A… M 23 167 64 Spain ESP 2016… 2016 Summer Rio …
5 51 Nstor A… M 23 167 64 Spain ESP 2016… 2016 Summer Rio …
6 51 Nstor A… M 23 167 64 Spain ESP 2016… 2016 Summer Rio …
7 55 Antonio… M 26 170 65 Spain ESP 2016… 2016 Summer Rio …
8 63 Jos Lui… M 30 194 87 Spain ESP 2008… 2008 Summer Beij…
9 69 Tamara … F 19 185 72 Spain ESP 2008… 2008 Summer Beij…
10 80 Jos Mar… M 22 186 83 Spain ESP 1996… 1996 Summer Atla…
# ℹ 6,686 more rows
# ℹ 3 more variables: sport <chr>, event <chr>, medal <chr>
filter
# A tibble: 27 × 15
id name sex age height weight team noc games year season city
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 2210 Marina … F 26 164 55 Spain ESP 2012… 2012 Summer Lond…
2 9464 Ruth Be… F 37 191 72 Spain ESP 2016… 2016 Summer Rio …
3 9750 Mireia … F 25 170 59 Spain ESP 2016… 2016 Summer Rio …
4 17348 David C… M 21 183 86 Spain ESP 2004… 2004 Summer Athi…
5 21174 Maialen… F 33 161 55 Spain ESP 2016… 2016 Summer Rio …
6 23703 Sal Cra… M 23 192 98 Spain ESP 2008… 2008 Summer Beij…
7 23703 Sal Cra… M 31 192 98 Spain ESP 2016… 2016 Summer Rio …
8 26983 Gervasi… M 19 166 69 Spain ESP 2000… 2000 Summer Sydn…
9 26983 Gervasi… M 23 166 69 Spain ESP 2004… 2004 Summer Athi…
10 31296 Fernand… M 35 180 76 Spain ESP 2008… 2008 Summer Beij…
# ℹ 17 more rows
# ℹ 3 more variables: sport <chr>, event <chr>, medal <chr>
select
# A tibble: 271,116 × 3
name sport games
<chr> <chr> <chr>
1 A Dijiang Basketball 1992 Summer
2 A Lamusi Judo 2012 Summer
3 Gunnar Nielsen Aaby Football 1920 Summer
4 Edgar Lindenau Aabye Tug-Of-War 1900 Summer
5 Christine Jacoba Aaftink Speed Skating 1988 Winter
6 Christine Jacoba Aaftink Speed Skating 1988 Winter
7 Christine Jacoba Aaftink Speed Skating 1992 Winter
8 Christine Jacoba Aaftink Speed Skating 1992 Winter
9 Christine Jacoba Aaftink Speed Skating 1994 Winter
10 Christine Jacoba Aaftink Speed Skating 1994 Winter
# ℹ 271,106 more rows
select
# A tibble: 271,116 × 14
id sex age height weight team noc games year season city sport
<dbl> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
1 1 M 24 180 80 China CHN 1992… 1992 Summer Barc… Bask…
2 2 M 23 170 60 China CHN 2012… 2012 Summer Lond… Judo
3 3 M 24 NA NA Denmark DEN 1920… 1920 Summer Antw… Foot…
4 4 M 34 NA NA Denmark… DEN 1900… 1900 Summer Paris Tug-…
5 5 F 21 185 82 Netherl… NED 1988… 1988 Winter Calg… Spee…
6 5 F 21 185 82 Netherl… NED 1988… 1988 Winter Calg… Spee…
7 5 F 25 185 82 Netherl… NED 1992… 1992 Winter Albe… Spee…
8 5 F 25 185 82 Netherl… NED 1992… 1992 Winter Albe… Spee…
9 5 F 27 185 82 Netherl… NED 1994… 1994 Winter Lill… Spee…
10 5 F 27 185 82 Netherl… NED 1994… 1994 Winter Lill… Spee…
# ℹ 271,106 more rows
# ℹ 2 more variables: event <chr>, medal <chr>
select
# A tibble: 271,116 × 9
id name sex age height weight team noc games
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr>
1 1 A Dijiang M 24 180 80 China CHN 1992…
2 2 A Lamusi M 23 170 60 China CHN 2012…
3 3 Gunnar Nielsen Aaby M 24 NA NA Denmark DEN 1920…
4 4 Edgar Lindenau Aabye M 34 NA NA Denmark… DEN 1900…
5 5 Christine Jacoba Aaftink F 21 185 82 Netherl… NED 1988…
6 5 Christine Jacoba Aaftink F 21 185 82 Netherl… NED 1988…
7 5 Christine Jacoba Aaftink F 25 185 82 Netherl… NED 1992…
8 5 Christine Jacoba Aaftink F 25 185 82 Netherl… NED 1992…
9 5 Christine Jacoba Aaftink F 27 185 82 Netherl… NED 1994…
10 5 Christine Jacoba Aaftink F 27 185 82 Netherl… NED 1994…
# ℹ 271,106 more rows
select
Special functions:
starts_with(x)
: names that start with xends_with(x)
: names that end with xcontains(x)
: selects all variables whose name contains xmatches(x)
: selects all variables whose name contains the regular expression xnum_range("x", 1:5, width = 2)
: selects all variables (numerically) from x01 to x05all_of(c("x", "y", "z"))
: selects variables provided in a character vectorany_of(c("x", "y", "z"))
: selects variables provided in a character vectoreverything()
: selects all variablesselect
Petal.Length Petal.Width
1 1.4 0.2
2 1.4 0.2
3 1.3 0.2
4 1.5 0.2
5 1.4 0.2
6 1.7 0.4
7 1.4 0.3
8 1.5 0.2
9 1.4 0.2
10 1.5 0.1
11 1.5 0.2
12 1.6 0.2
13 1.4 0.1
14 1.1 0.1
15 1.2 0.2
16 1.5 0.4
17 1.3 0.4
18 1.4 0.3
19 1.7 0.3
20 1.5 0.3
21 1.7 0.2
22 1.5 0.4
23 1.0 0.2
24 1.7 0.5
25 1.9 0.2
26 1.6 0.2
27 1.6 0.4
28 1.5 0.2
29 1.4 0.2
30 1.6 0.2
31 1.6 0.2
32 1.5 0.4
33 1.5 0.1
34 1.4 0.2
35 1.5 0.2
36 1.2 0.2
37 1.3 0.2
38 1.4 0.1
39 1.3 0.2
40 1.5 0.2
41 1.3 0.3
42 1.3 0.3
43 1.3 0.2
44 1.6 0.6
45 1.9 0.4
46 1.4 0.3
47 1.6 0.2
48 1.4 0.2
49 1.5 0.2
50 1.4 0.2
51 4.7 1.4
52 4.5 1.5
53 4.9 1.5
54 4.0 1.3
55 4.6 1.5
56 4.5 1.3
57 4.7 1.6
58 3.3 1.0
59 4.6 1.3
60 3.9 1.4
61 3.5 1.0
62 4.2 1.5
63 4.0 1.0
64 4.7 1.4
65 3.6 1.3
66 4.4 1.4
67 4.5 1.5
68 4.1 1.0
69 4.5 1.5
70 3.9 1.1
71 4.8 1.8
72 4.0 1.3
73 4.9 1.5
74 4.7 1.2
75 4.3 1.3
76 4.4 1.4
77 4.8 1.4
78 5.0 1.7
79 4.5 1.5
80 3.5 1.0
81 3.8 1.1
82 3.7 1.0
83 3.9 1.2
84 5.1 1.6
85 4.5 1.5
86 4.5 1.6
87 4.7 1.5
88 4.4 1.3
89 4.1 1.3
90 4.0 1.3
91 4.4 1.2
92 4.6 1.4
93 4.0 1.2
94 3.3 1.0
95 4.2 1.3
96 4.2 1.2
97 4.2 1.3
98 4.3 1.3
99 3.0 1.1
100 4.1 1.3
101 6.0 2.5
102 5.1 1.9
103 5.9 2.1
104 5.6 1.8
105 5.8 2.2
106 6.6 2.1
107 4.5 1.7
108 6.3 1.8
109 5.8 1.8
110 6.1 2.5
111 5.1 2.0
112 5.3 1.9
113 5.5 2.1
114 5.0 2.0
115 5.1 2.4
116 5.3 2.3
117 5.5 1.8
118 6.7 2.2
119 6.9 2.3
120 5.0 1.5
121 5.7 2.3
122 4.9 2.0
123 6.7 2.0
124 4.9 1.8
125 5.7 2.1
126 6.0 1.8
127 4.8 1.8
128 4.9 1.8
129 5.6 2.1
130 5.8 1.6
131 6.1 1.9
132 6.4 2.0
133 5.6 2.2
134 5.1 1.5
135 5.6 1.4
136 6.1 2.3
137 5.6 2.4
138 5.5 1.8
139 4.8 1.8
140 5.4 2.1
141 5.6 2.4
142 5.1 2.3
143 5.1 1.9
144 5.9 2.3
145 5.7 2.5
146 5.2 2.3
147 5.0 1.9
148 5.2 2.0
149 5.4 2.3
150 5.1 1.8
Sepal.Width Petal.Width
1 3.5 0.2
2 3.0 0.2
3 3.2 0.2
4 3.1 0.2
5 3.6 0.2
6 3.9 0.4
7 3.4 0.3
8 3.4 0.2
9 2.9 0.2
10 3.1 0.1
11 3.7 0.2
12 3.4 0.2
13 3.0 0.1
14 3.0 0.1
15 4.0 0.2
16 4.4 0.4
17 3.9 0.4
18 3.5 0.3
19 3.8 0.3
20 3.8 0.3
21 3.4 0.2
22 3.7 0.4
23 3.6 0.2
24 3.3 0.5
25 3.4 0.2
26 3.0 0.2
27 3.4 0.4
28 3.5 0.2
29 3.4 0.2
30 3.2 0.2
31 3.1 0.2
32 3.4 0.4
33 4.1 0.1
34 4.2 0.2
35 3.1 0.2
36 3.2 0.2
37 3.5 0.2
38 3.6 0.1
39 3.0 0.2
40 3.4 0.2
41 3.5 0.3
42 2.3 0.3
43 3.2 0.2
44 3.5 0.6
45 3.8 0.4
46 3.0 0.3
47 3.8 0.2
48 3.2 0.2
49 3.7 0.2
50 3.3 0.2
51 3.2 1.4
52 3.2 1.5
53 3.1 1.5
54 2.3 1.3
55 2.8 1.5
56 2.8 1.3
57 3.3 1.6
58 2.4 1.0
59 2.9 1.3
60 2.7 1.4
61 2.0 1.0
62 3.0 1.5
63 2.2 1.0
64 2.9 1.4
65 2.9 1.3
66 3.1 1.4
67 3.0 1.5
68 2.7 1.0
69 2.2 1.5
70 2.5 1.1
71 3.2 1.8
72 2.8 1.3
73 2.5 1.5
74 2.8 1.2
75 2.9 1.3
76 3.0 1.4
77 2.8 1.4
78 3.0 1.7
79 2.9 1.5
80 2.6 1.0
81 2.4 1.1
82 2.4 1.0
83 2.7 1.2
84 2.7 1.6
85 3.0 1.5
86 3.4 1.6
87 3.1 1.5
88 2.3 1.3
89 3.0 1.3
90 2.5 1.3
91 2.6 1.2
92 3.0 1.4
93 2.6 1.2
94 2.3 1.0
95 2.7 1.3
96 3.0 1.2
97 2.9 1.3
98 2.9 1.3
99 2.5 1.1
100 2.8 1.3
101 3.3 2.5
102 2.7 1.9
103 3.0 2.1
104 2.9 1.8
105 3.0 2.2
106 3.0 2.1
107 2.5 1.7
108 2.9 1.8
109 2.5 1.8
110 3.6 2.5
111 3.2 2.0
112 2.7 1.9
113 3.0 2.1
114 2.5 2.0
115 2.8 2.4
116 3.2 2.3
117 3.0 1.8
118 3.8 2.2
119 2.6 2.3
120 2.2 1.5
121 3.2 2.3
122 2.8 2.0
123 2.8 2.0
124 2.7 1.8
125 3.3 2.1
126 3.2 1.8
127 2.8 1.8
128 3.0 1.8
129 2.8 2.1
130 3.0 1.6
131 2.8 1.9
132 3.8 2.0
133 2.8 2.2
134 2.8 1.5
135 2.6 1.4
136 3.0 2.3
137 3.4 2.4
138 3.1 1.8
139 3.0 1.8
140 3.1 2.1
141 3.1 2.4
142 3.1 2.3
143 2.7 1.9
144 3.2 2.3
145 3.3 2.5
146 3.0 2.3
147 2.5 1.9
148 3.0 2.0
149 3.4 2.3
150 3.0 1.8
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5.0 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
7 4.6 3.4 1.4 0.3
8 5.0 3.4 1.5 0.2
9 4.4 2.9 1.4 0.2
10 4.9 3.1 1.5 0.1
11 5.4 3.7 1.5 0.2
12 4.8 3.4 1.6 0.2
13 4.8 3.0 1.4 0.1
14 4.3 3.0 1.1 0.1
15 5.8 4.0 1.2 0.2
16 5.7 4.4 1.5 0.4
17 5.4 3.9 1.3 0.4
18 5.1 3.5 1.4 0.3
19 5.7 3.8 1.7 0.3
20 5.1 3.8 1.5 0.3
21 5.4 3.4 1.7 0.2
22 5.1 3.7 1.5 0.4
23 4.6 3.6 1.0 0.2
24 5.1 3.3 1.7 0.5
25 4.8 3.4 1.9 0.2
26 5.0 3.0 1.6 0.2
27 5.0 3.4 1.6 0.4
28 5.2 3.5 1.5 0.2
29 5.2 3.4 1.4 0.2
30 4.7 3.2 1.6 0.2
31 4.8 3.1 1.6 0.2
32 5.4 3.4 1.5 0.4
33 5.2 4.1 1.5 0.1
34 5.5 4.2 1.4 0.2
35 4.9 3.1 1.5 0.2
36 5.0 3.2 1.2 0.2
37 5.5 3.5 1.3 0.2
38 4.9 3.6 1.4 0.1
39 4.4 3.0 1.3 0.2
40 5.1 3.4 1.5 0.2
41 5.0 3.5 1.3 0.3
42 4.5 2.3 1.3 0.3
43 4.4 3.2 1.3 0.2
44 5.0 3.5 1.6 0.6
45 5.1 3.8 1.9 0.4
46 4.8 3.0 1.4 0.3
47 5.1 3.8 1.6 0.2
48 4.6 3.2 1.4 0.2
49 5.3 3.7 1.5 0.2
50 5.0 3.3 1.4 0.2
51 7.0 3.2 4.7 1.4
52 6.4 3.2 4.5 1.5
53 6.9 3.1 4.9 1.5
54 5.5 2.3 4.0 1.3
55 6.5 2.8 4.6 1.5
56 5.7 2.8 4.5 1.3
57 6.3 3.3 4.7 1.6
58 4.9 2.4 3.3 1.0
59 6.6 2.9 4.6 1.3
60 5.2 2.7 3.9 1.4
61 5.0 2.0 3.5 1.0
62 5.9 3.0 4.2 1.5
63 6.0 2.2 4.0 1.0
64 6.1 2.9 4.7 1.4
65 5.6 2.9 3.6 1.3
66 6.7 3.1 4.4 1.4
67 5.6 3.0 4.5 1.5
68 5.8 2.7 4.1 1.0
69 6.2 2.2 4.5 1.5
70 5.6 2.5 3.9 1.1
71 5.9 3.2 4.8 1.8
72 6.1 2.8 4.0 1.3
73 6.3 2.5 4.9 1.5
74 6.1 2.8 4.7 1.2
75 6.4 2.9 4.3 1.3
76 6.6 3.0 4.4 1.4
77 6.8 2.8 4.8 1.4
78 6.7 3.0 5.0 1.7
79 6.0 2.9 4.5 1.5
80 5.7 2.6 3.5 1.0
81 5.5 2.4 3.8 1.1
82 5.5 2.4 3.7 1.0
83 5.8 2.7 3.9 1.2
84 6.0 2.7 5.1 1.6
85 5.4 3.0 4.5 1.5
86 6.0 3.4 4.5 1.6
87 6.7 3.1 4.7 1.5
88 6.3 2.3 4.4 1.3
89 5.6 3.0 4.1 1.3
90 5.5 2.5 4.0 1.3
91 5.5 2.6 4.4 1.2
92 6.1 3.0 4.6 1.4
93 5.8 2.6 4.0 1.2
94 5.0 2.3 3.3 1.0
95 5.6 2.7 4.2 1.3
96 5.7 3.0 4.2 1.2
97 5.7 2.9 4.2 1.3
98 6.2 2.9 4.3 1.3
99 5.1 2.5 3.0 1.1
100 5.7 2.8 4.1 1.3
101 6.3 3.3 6.0 2.5
102 5.8 2.7 5.1 1.9
103 7.1 3.0 5.9 2.1
104 6.3 2.9 5.6 1.8
105 6.5 3.0 5.8 2.2
106 7.6 3.0 6.6 2.1
107 4.9 2.5 4.5 1.7
108 7.3 2.9 6.3 1.8
109 6.7 2.5 5.8 1.8
110 7.2 3.6 6.1 2.5
111 6.5 3.2 5.1 2.0
112 6.4 2.7 5.3 1.9
113 6.8 3.0 5.5 2.1
114 5.7 2.5 5.0 2.0
115 5.8 2.8 5.1 2.4
116 6.4 3.2 5.3 2.3
117 6.5 3.0 5.5 1.8
118 7.7 3.8 6.7 2.2
119 7.7 2.6 6.9 2.3
120 6.0 2.2 5.0 1.5
121 6.9 3.2 5.7 2.3
122 5.6 2.8 4.9 2.0
123 7.7 2.8 6.7 2.0
124 6.3 2.7 4.9 1.8
125 6.7 3.3 5.7 2.1
126 7.2 3.2 6.0 1.8
127 6.2 2.8 4.8 1.8
128 6.1 3.0 4.9 1.8
129 6.4 2.8 5.6 2.1
130 7.2 3.0 5.8 1.6
131 7.4 2.8 6.1 1.9
132 7.9 3.8 6.4 2.0
133 6.4 2.8 5.6 2.2
134 6.3 2.8 5.1 1.5
135 6.1 2.6 5.6 1.4
136 7.7 3.0 6.1 2.3
137 6.3 3.4 5.6 2.4
138 6.4 3.1 5.5 1.8
139 6.0 3.0 4.8 1.8
140 6.9 3.1 5.4 2.1
141 6.7 3.1 5.6 2.4
142 6.9 3.1 5.1 2.3
143 5.8 2.7 5.1 1.9
144 6.8 3.2 5.9 2.3
145 6.7 3.3 5.7 2.5
146 6.7 3.0 5.2 2.3
147 6.3 2.5 5.0 1.9
148 6.5 3.0 5.2 2.0
149 6.2 3.4 5.4 2.3
150 5.9 3.0 5.1 1.8
# A tibble: 271,116 × 4
name medal noc year
<chr> <chr> <chr> <dbl>
1 A Dijiang <NA> CHN 1992
2 A Lamusi <NA> CHN 2012
3 Gunnar Nielsen Aaby <NA> DEN 1920
4 Edgar Lindenau Aabye Gold DEN 1900
5 Christine Jacoba Aaftink <NA> NED 1988
6 Christine Jacoba Aaftink <NA> NED 1988
7 Christine Jacoba Aaftink <NA> NED 1992
8 Christine Jacoba Aaftink <NA> NED 1992
9 Christine Jacoba Aaftink <NA> NED 1994
10 Christine Jacoba Aaftink <NA> NED 1994
# ℹ 271,106 more rows
# A tibble: 271,116 × 2
name medal
<chr> <chr>
1 A Dijiang <NA>
2 A Lamusi <NA>
3 Gunnar Nielsen Aaby <NA>
4 Edgar Lindenau Aabye Gold
5 Christine Jacoba Aaftink <NA>
6 Christine Jacoba Aaftink <NA>
7 Christine Jacoba Aaftink <NA>
8 Christine Jacoba Aaftink <NA>
9 Christine Jacoba Aaftink <NA>
10 Christine Jacoba Aaftink <NA>
# ℹ 271,106 more rows
mutate
# A tibble: 271,116 × 15
id name sex age height weight team noc games year season city
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 1 A Dijia… M 24 1.8 80 China CHN 1992… 1992 Summer Barc…
2 2 A Lamusi M 23 1.7 60 China CHN 2012… 2012 Summer Lond…
3 3 Gunnar … M 24 NA NA Denm… DEN 1920… 1920 Summer Antw…
4 4 Edgar L… M 34 NA NA Denm… DEN 1900… 1900 Summer Paris
5 5 Christi… F 21 1.85 82 Neth… NED 1988… 1988 Winter Calg…
6 5 Christi… F 21 1.85 82 Neth… NED 1988… 1988 Winter Calg…
7 5 Christi… F 25 1.85 82 Neth… NED 1992… 1992 Winter Albe…
8 5 Christi… F 25 1.85 82 Neth… NED 1992… 1992 Winter Albe…
9 5 Christi… F 27 1.85 82 Neth… NED 1994… 1994 Winter Lill…
10 5 Christi… F 27 1.85 82 Neth… NED 1994… 1994 Winter Lill…
# ℹ 271,106 more rows
# ℹ 3 more variables: sport <chr>, event <chr>, medal <chr>
mutate
# A tibble: 271,116 × 16
id name sex age height weight team noc games year season city
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 1 A Dijia… M 24 180 80 China CHN 1992… 1992 Summer Barc…
2 2 A Lamusi M 23 170 60 China CHN 2012… 2012 Summer Lond…
3 3 Gunnar … M 24 NA NA Denm… DEN 1920… 1920 Summer Antw…
4 4 Edgar L… M 34 NA NA Denm… DEN 1900… 1900 Summer Paris
5 5 Christi… F 21 185 82 Neth… NED 1988… 1988 Winter Calg…
6 5 Christi… F 21 185 82 Neth… NED 1988… 1988 Winter Calg…
7 5 Christi… F 25 185 82 Neth… NED 1992… 1992 Winter Albe…
8 5 Christi… F 25 185 82 Neth… NED 1992… 1992 Winter Albe…
9 5 Christi… F 27 185 82 Neth… NED 1994… 1994 Winter Lill…
10 5 Christi… F 27 185 82 Neth… NED 1994… 1994 Winter Lill…
# ℹ 271,106 more rows
# ℹ 4 more variables: sport <chr>, event <chr>, medal <chr>, bmi <dbl>
mutate
# A tibble: 271,116 × 17
id name sex age height weight team noc games year season city
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 1 A Dijia… M 24 1.8 80 China CHN 1992… 1992 Summer Barc…
2 2 A Lamusi M 23 1.7 60 China CHN 2012… 2012 Summer Lond…
3 3 Gunnar … M 24 NA NA Denm… DEN 1920… 1920 Summer Antw…
4 4 Edgar L… M 34 NA NA Denm… DEN 1900… 1900 Summer Paris
5 5 Christi… F 21 1.85 82 Neth… NED 1988… 1988 Winter Calg…
6 5 Christi… F 21 1.85 82 Neth… NED 1988… 1988 Winter Calg…
7 5 Christi… F 25 1.85 82 Neth… NED 1992… 1992 Winter Albe…
8 5 Christi… F 25 1.85 82 Neth… NED 1992… 1992 Winter Albe…
9 5 Christi… F 27 1.85 82 Neth… NED 1994… 1994 Winter Lill…
10 5 Christi… F 27 1.85 82 Neth… NED 1994… 1994 Winter Lill…
# ℹ 271,106 more rows
# ℹ 5 more variables: sport <chr>, event <chr>, medal <chr>, bmi <dbl>,
# imb <dbl>
summarise
# A tibble: 1 × 2
n average_height
<int> <dbl>
1 271116 175.
summarise
Summary functions
min(x)
, max(x)
, quantile(x, p)
mean(x)
, median(x)
,
sd(x)
, var(x)
, IQR(x)
n()
, n_distinct(x)
sum(x > 10)
, mean(x > 10)
.by
argument# A tibble: 120 × 5
sex season sport n average_height
<chr> <chr> <chr> <int> <dbl>
1 M Summer Basketball 3280 195.
2 M Summer Judo 2708 177.
3 M Summer Football 5733 177.
4 M Summer Tug-Of-War 170 182.
5 F Winter Speed Skating 2081 167.
6 M Winter Cross Country Skiing 5748 178.
7 F Summer Athletics 11666 169.
8 M Winter Ice Hockey 4702 181.
9 M Summer Swimming 13345 184.
10 M Summer Badminton 717 180.
# ℹ 110 more rows
group_by
# A tibble: 271,116 × 15
# Groups: sex, season, sport [120]
id name sex age height weight team noc games year season city
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 1 A Dijia… M 24 180 80 China CHN 1992… 1992 Summer Barc…
2 2 A Lamusi M 23 170 60 China CHN 2012… 2012 Summer Lond…
3 3 Gunnar … M 24 NA NA Denm… DEN 1920… 1920 Summer Antw…
4 4 Edgar L… M 34 NA NA Denm… DEN 1900… 1900 Summer Paris
5 5 Christi… F 21 185 82 Neth… NED 1988… 1988 Winter Calg…
6 5 Christi… F 21 185 82 Neth… NED 1988… 1988 Winter Calg…
7 5 Christi… F 25 185 82 Neth… NED 1992… 1992 Winter Albe…
8 5 Christi… F 25 185 82 Neth… NED 1992… 1992 Winter Albe…
9 5 Christi… F 27 185 82 Neth… NED 1994… 1994 Winter Lill…
10 5 Christi… F 27 185 82 Neth… NED 1994… 1994 Winter Lill…
# ℹ 271,106 more rows
# ℹ 3 more variables: sport <chr>, event <chr>, medal <chr>
group_by
# A tibble: 120 × 5
# Groups: sex, season [4]
sex season sport n average_height
<chr> <chr> <chr> <int> <dbl>
1 F Summer Alpinism 1 NaN
2 F Summer Archery 1015 167.
3 F Summer Art Competitions 377 160
4 F Summer Athletics 11666 169.
5 F Summer Badminton 740 168.
6 F Summer Basketball 1256 182.
7 F Summer Beach Volleyball 276 179.
8 F Summer Boxing 72 169.
9 F Summer Canoeing 1380 170.
10 F Summer Croquet 6 NaN
# ℹ 110 more rows
|>
)Often, we want to use several verbs (filter, arrange, group_by, summarise…)
Multiple operations are difficult to read, or require to create multiple intermediate objects:
|>
)Often, we want to use several verbs (filter, arrange, group_by, summarise…)
Multiple operations are difficult to read, or require to create multiple intermediate objects:
pull
case_when
arrange
bind_cols
, bind_rows
, left_join
, inner_join
and other joins (not explained here)
pull
) [1] "Basketball" "Judo"
[3] "Football" "Tug-Of-War"
[5] "Speed Skating" "Cross Country Skiing"
[7] "Athletics" "Ice Hockey"
[9] "Swimming" "Badminton"
[11] "Sailing" "Biathlon"
[13] "Gymnastics" "Art Competitions"
[15] "Alpine Skiing" "Handball"
[17] "Weightlifting" "Wrestling"
[19] "Luge" "Water Polo"
[21] "Hockey" "Rowing"
[23] "Bobsleigh" "Fencing"
[25] "Equestrianism" "Shooting"
[27] "Boxing" "Taekwondo"
[29] "Cycling" "Diving"
[31] "Canoeing" "Tennis"
[33] "Modern Pentathlon" "Figure Skating"
[35] "Golf" "Softball"
[37] "Archery" "Volleyball"
[39] "Synchronized Swimming" "Table Tennis"
[41] "Nordic Combined" "Baseball"
[43] "Rhythmic Gymnastics" "Freestyle Skiing"
[45] "Rugby Sevens" "Trampolining"
[47] "Beach Volleyball" "Triathlon"
[49] "Ski Jumping" "Curling"
[51] "Snowboarding" "Rugby"
[53] "Short Track Speed Skating" "Skeleton"
[55] "Lacrosse" "Polo"
[57] "Cricket" "Racquets"
[59] "Motorboating" "Military Ski Patrol"
[61] "Croquet" "Jeu De Paume"
[63] "Roque" "Alpinism"
[65] "Basque Pelota" "Aeronautics"
case_when
)olympics |>
mutate(
age_class = case_when(
age < 16 ~ "junior",
age > 50 ~ "senior",
.default = "elite"
)
)
# A tibble: 271,116 × 16
id name sex age height weight team noc games year season city
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 1 A Dijia… M 24 180 80 China CHN 1992… 1992 Summer Barc…
2 2 A Lamusi M 23 170 60 China CHN 2012… 2012 Summer Lond…
3 3 Gunnar … M 24 NA NA Denm… DEN 1920… 1920 Summer Antw…
4 4 Edgar L… M 34 NA NA Denm… DEN 1900… 1900 Summer Paris
5 5 Christi… F 21 185 82 Neth… NED 1988… 1988 Winter Calg…
6 5 Christi… F 21 185 82 Neth… NED 1988… 1988 Winter Calg…
7 5 Christi… F 25 185 82 Neth… NED 1992… 1992 Winter Albe…
8 5 Christi… F 25 185 82 Neth… NED 1992… 1992 Winter Albe…
9 5 Christi… F 27 185 82 Neth… NED 1994… 1994 Winter Lill…
10 5 Christi… F 27 185 82 Neth… NED 1994… 1994 Winter Lill…
# ℹ 271,106 more rows
# ℹ 4 more variables: sport <chr>, event <chr>, medal <chr>, age_class <chr>
arrange
)# A tibble: 271,116 × 15
id name sex age height weight team noc games year season city
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
1 22700 "James… M 27 175 72 Unit… USA 1896… 1896 Summer Athi…
2 25072 "Evang… M NA NA NA Gree… GRE 1896… 1896 Summer Athi…
3 40851 "Charl… M 23 NA NA Grea… GBR 1896… 1896 Summer Athi…
4 41160 "Dimit… M 21 NA NA Gree… GRE 1896… 1896 Summer Athi…
5 58546 "Gyula… M 24 NA NA Hung… HUN 1896… 1896 Summer Athi…
6 66362 "Franc… M 21 170 69 Unit… USA 1896… 1896 Summer Athi…
7 68911 "Albin… M 21 NA NA Fran… FRA 1896… 1896 Summer Athi…
8 91250 "Georg… M NA NA NA Gree… GRE 1896… 1896 Summer Athi…
9 93606 "Ioann… M NA NA NA Gree… GRE 1896… 1896 Summer Athi…
10 117571 "Alajo… M 24 NA NA Hung… HUN 1896… 1896 Summer Athi…
# ℹ 271,106 more rows
# ℹ 3 more variables: sport <chr>, event <chr>, medal <chr>
25-29 Nov 2024