Week 1 Worksheet Exercises
Mind your language: A brief introduction to R, RStudio, and other tools of the trade
Aims
By the end of the session, you will:
- understand how to use the most important panels in the RStudio interface
- create an RStudio Project to store your work throughout the course
- begin using R scripts (.R) and Quarto notebooks (.qmd) to record and document your coding progress
- understand data types and basic operations in the R language
- understand the principles behind functions
- know how to install, load and use functions from user-written packages
- gain familiarity with some useful functions from packages included in the tidyverse ecosystem
Exercise 1: Install R and RStudio, and perform basic settings
To install R and RStudio on your personal computers, follow the steps outlined here based on your operating system.
Although you will only interact directly with RStudio in this module, R needs to be installed first so that RStudio can detect it and connect to it.
Once installed, open RStudio and explore its panes.
Make the following changes to the RStudio settings in the Global options:
- set RStudio to never save your workspace as .RData upon exiting;
- set RStudio to insert the native “pipe operator” when typing the Ctrl+Shift+M keyboard shortcut.
Advanced user exercise: leap year functions
If you have more advanced knowledge of R, here’s an exercise for you.
Suppose you want to write a function that lists all the leap years between two specified years. How would you go about writing it? What information do you need first? What steps would you take to build up the function? There are several ways of writing such a function, and you can find three options at the bottom of this worksheet. Work individually or in a small group.
- You first need to check the definition of a leap year and how it is calculated (Google?)
- Ask R to tell you what the “%%” operator does. You can ask R for help using the help() function or ?...
- When you are done, you can check your results against the example solutions given at the bottom of this worksheet.
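As a small illustration of the second hint (a sketch only, not part of the solutions): the help page for an operator is requested by quoting the operator, and %% returns the remainder of a division. The numbers used are arbitrary examples, and remainder_a and remainder_b are hypothetical object names.

```r
# Open the help page for the %% operator (operators must be quoted).
# Wrapped in interactive() so the script also runs outside RStudio.
if (interactive()) {
  help("%%")   # same page as typing ?"%%" in the Console
}

# %% gives the remainder after division
remainder_a <- 10 %% 3   # 10 = 3 * 3 + 1, so the remainder is 1
remainder_b <- 12 %% 4   # 12 divides evenly by 4, so the remainder is 0
remainder_a
remainder_b
```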
Exercise 2: Use R as a simple calculator
The most elementary yet still handy task you can use R for is to perform basic arithmetic operations. This is useful for getting a first experience doing things in the R language.
Let’s have a look at a few operations using the Console directly. Let’s say we want to know the result of adding up three numbers: 1, 3 and 5. In the Console pane, type the command below and then press Enter:
1 + 3 + 5
This will print out the result (9) in the Console:
[1] 9
The [1] in the result is the index of the first value printed on that line; in this case, our result consists of a single value.
We can also save the result of this operation as an object, so we can use it for further operations. We create objects by using the so-called assignment operator consisting of the characters <-.
A command involving <- can be read as “assign the value of the result from the operation on the right hand side (some expression) to the object on the left hand side (short name of object, single word, with no spaces)”.
For example, let’s save our result in an object called “nine”:
nine <- 1 + 3 + 5
Notice that there is no output printed in the Console this time. But there are also no error messages, so the operation must have run without problems. Instead, if we look at the Environment pane, we notice that it is no longer empty, but contains an object called “nine” that stores the value “9” in it. We can now use this object for other operations, such as:
nine - 3
nine + 15
nine / 3
nine * 9
We see the results of these operations printed out in the Console.
We can also check results of so-called relational operations. There are several relational operators that allow us to compare objects in R. The most useful of these are the following:
- > greater than, >= greater than or equal to
- < less than, <= less than or equal to
- == equal to
- != not equal to
When we use these to compare two objects in R, we end up with a logical object.
For example, let’s check whether 9 is greater than 5, and whether it is less than 8:
9 > 5
9 < 8
R treats our inputs as statements that we are asking it to evaluate, and we get the answers “TRUE” and “FALSE”, respectively, as we would expect. Let’s now check whether our object “nine” is equal to the number 9. We may assume that we can achieve this by typing “nine = 9”, but let’s see what that results in:
nine = 9
Did we get the result we expected? Nothing was printed in the output, so seemingly nothing happened… That’s because the “=” sign is also used as an assignment operator in R, just like “<-”. So we basically assigned the value “9” to the object “nine” again. To test for equality we must type the equal sign twice (==), which is the relational operator listed above. Let’s see:
nine == 9
Now we get the answer “TRUE”, as expected.
This distinction between “=” and “==” is important to keep in mind. What would have happened if we had tried to test whether our object “nine” equals the value “5” or not, and instead of the relational operator (==) we used the assignment operator (=)? Let’s see:
nine = 5
In the Console we again see no results printed, but if we check our Environment, we see that the value of the object “nine” was changed to 5. So it can be a dangerous business. We’ll be using the “<-” as assignment operator instead of “=” to avoid any confusion in this respect. The distinction between == and = will also emerge in other contexts later.
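The difference can be sketched with a separate, hypothetical object (called score here so that “nine” is left untouched); is_seven and still_seven are likewise made-up names for the illustration:

```r
score <- 7                   # assignment with <- creates the object, prints nothing
is_seven <- score == 7       # comparison with == evaluates to TRUE, score is unchanged
is_seven

score = 3                    # a single = assigns: score is now 3, nothing is printed
still_seven <- score == 7    # the comparison now evaluates to FALSE
still_seven
```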
So, try out the following commands in turn now and check if the results are what you’d expect:
nine == 9
nine == 5
five <- 9
nine == five
five = nine
nine == five
nine + five <= 10 # lower than or equal to ...
The text following the hash (#) in the last line is a comment. If you’d like to comment on any code you write, just add a hash (#) or a series of hashes in front of the comment so that R knows it should not evaluate it as a command. This will be useful when writing your commands in an R script that you can save for later, rather than interacting with R live in the Console.
Exercise 3: Create an RStudio Project containing a .R and a .qmd file
- Create a new folder set up as an R project; call the folder “HSS8005_labs”; when done, you should have an empty folder with a file called “HSS8005_labs.Rproj” in it
- Create a new R script (.R); once created, save it as “Lab_1.R” within the “HSS8005_labs” folder
- Create a new Quarto document (.qmd); once created, save it as “Lab_1.qmd” within the “HSS8005_labs” folder
You will work in each of these new documents in this lab to gain experience with them.
Exercise 4: Vector operations
Let’s learn a few vector operations. Type/copy the code below into the R script file you created earlier (“Lab_1.R”), and save it at the end for your records.
First, let’s use the c() function to concatenate vector elements:
x <- c(2.2, 6.2, 1.2, 5.5, 20.1)
To run this line of code in an R script, place the cursor on the line you want to execute and either click the small “Run” button in the upper-right corner of the script’s toolbar, or press Ctrl+Enter (on Windows PCs).
The vector called x that we just created appears in the Environment. We can examine some of its features:
class(x)
typeof(x)
length(x)
attributes(x)
These tell us something about the characteristics of the object, but not much about its content (apart from the fact that it has a length of 5). Functions such as min, max, range, mean, median, sum or summary give us some summary statistics about the object:
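A minimal sketch of these functions applied to x (redefined here with the same values as above so that the block runs on its own):

```r
x <- c(2.2, 6.2, 1.2, 5.5, 20.1)

min(x)      # smallest value: 1.2
max(x)      # largest value: 20.1
range(x)    # smallest and largest value together: 1.2 20.1
mean(x)     # arithmetic mean: 7.04
median(x)   # middle value once sorted: 5.5
sum(x)      # total: 35.2
summary(x)  # minimum, quartiles, median, mean and maximum in one call
```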
The seq() function lets us create a sequence from a starting point to an ending point. If you specify the by argument, you can skip values. For instance, if we wanted a vector of every 5th number between 0 and 100, we could write:
numbers <- seq(from = 0, to = 100, by = 5)
To print out the result in the console, we can simply type the name of the object:
numbers
A shorthand way to get a sequence between two numbers counting by 1s is to use the : operator. For example, print out all the numbers between 200 and 250:
200:250
To access a single element of a vector by its position in the vector, use the square brackets []:
x[2]
If you want to access more than one element of a vector, put a vector of the positions you want to access in the brackets:
x[c(2, 5)]
If you try to access an element past the length of the vector, it will return a missing value NA:
x[10]
If you accidentally subset a vector by NA (the missing value), you get the vector back with all its entries replaced by NA:
x[NA]
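Why does subsetting by NA return five values rather than one? A plain NA is a logical value, and a logical subsetting vector is recycled to the length of the vector being subset. The sketch below uses a separate example vector v (a hypothetical name) to contrast this with the integer missing value NA_integer_, which is treated as a single unknown position.

```r
v <- c(2.2, 6.2, 1.2, 5.5, 20.1)

class(NA)                 # "logical"
length(v[NA])             # 5: the logical NA is recycled across all positions
length(v[NA_integer_])    # 1: a single missing position gives a single NA
```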
Let’s say you want to modify one value in your vector. You can combine the square bracket subset [] with the assignment operator <- to replace a particular value:
x
x[3] <- 50.3
x
You can replace multiple values at the same time by using a vector for subsetting:
x
x[1:2] <- c(-1.3, 42)
x
If the replacement vector (the right-hand side) is shorter than what you are assigning to (the left-hand side), the values will “recycle” or repeat as necessary:
x[1:2] <- 3.2
x
x[1:4] <- c(1.2, 2.4)
x
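Recycling works cleanly when the number of positions being replaced is a multiple of the length of the replacement. A brief sketch on separate vectors v and w (hypothetical names, so that x is not changed) shows both the clean case and the warning R raises otherwise:

```r
v <- c(10, 20, 30, 40, 50)

v[1:4] <- c(1, 2)   # the pair 1, 2 is repeated twice to fill four positions
v                   # 1 2 1 2 50

# Replacing three positions with two values is not a clean fit:
# R raises a warning because 3 is not a multiple of 2.
# tryCatch() captures the warning text so we can look at it.
w <- c(10, 20, 30)
result <- tryCatch(
  {
    w[1:3] <- c(1, 2)
    "no warning"
  },
  warning = function(cnd) conditionMessage(cnd)
)
result
```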
You can also create a vector of characters (words, letters, punctuation, etc):
jedi <- c("Yoda", "Obi-Wan", "Luke", "Leia", "Rey")
Note that you cannot mix characters and numbers in the same vector. If you add a single character element, the whole vector gets converted to character.
### output is numeric
x
### output is now character
c(x, "hey")
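A short sketch of this conversion, using hypothetical vectors called nums, mixed and back:

```r
nums <- c(2.2, 6.2, 1.2)
class(nums)               # "numeric"

mixed <- c(nums, "hey")   # adding one character element converts everything
class(mixed)              # "character"
mixed                     # "2.2" "6.2" "1.2" "hey"

# Converting back turns anything that is not a number into NA.
# R normally warns about this; the warning is suppressed here for brevity.
back <- suppressWarnings(as.numeric(mixed))
back                      # 2.2 6.2 1.2 NA
```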
Logical vectors are just vectors that only contain the special R values TRUE or FALSE.
logical <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
logical
You could, but never should, shorten TRUE to T and FALSE to F. It’s easy for this shortening to go wrong, so it is better to spell out the full word. Also note that these values are case-sensitive, so each of the following will produce an error:
true
True
false
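One way the shortening can go wrong: TRUE and FALSE are reserved words, whereas T and F are ordinary objects that can be overwritten. The sketch below deliberately overwrites T to show the effect and removes the copy again straight away; it also captures the error raised by the misspelled name true. It is an illustration only, and msg is a hypothetical object name.

```r
T <- 0        # allowed: T is just an ordinary object name
isTRUE(T)     # FALSE, because T now holds the number 0
rm(T)         # remove our copy so that T means TRUE again
isTRUE(T)     # TRUE

# Lower-case true is not defined, so R reports an error;
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(true, error = function(cnd) conditionMessage(cnd))
msg
```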
Exercise 5: Data-frame operations on built-in datasets
There are several toy data frames built into R, and we can have a look at one to see what it looks like.
- Use the data() command to get a list of available built-in datasets;
- Choose one of the available datasets and load it into the RStudio Environment
- Open the dataset in the Viewer to quickly inspect it visually
- Check the following using the appropriate R functions:
- How many cases (rows) are in the dataset?
- How many variables (columns) are in the dataset?
- What is the type of the first variables in the dataset?
- Print the first few and last few entries in the dataset.
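One possible way to work through these steps, sketched with the built-in mtcars dataset (any other built-in dataset would do just as well). The data() listing and the View() call are only useful in an interactive RStudio session, so they are guarded with interactive().

```r
if (interactive()) {
  data()               # list the available built-in datasets
}

data(mtcars)           # load the chosen dataset into the Environment

if (interactive()) {
  View(mtcars)         # open the dataset in the RStudio data viewer
}

nrow(mtcars)           # number of cases (rows): 32
ncol(mtcars)           # number of variables (columns): 11
class(mtcars[[1]])     # type of the first variable (mpg): "numeric"
head(mtcars)           # first six rows
tail(mtcars)           # last six rows
```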
Exercise 6: Install and load R packages
Install and load the following R packages:
Spend a bit of time reading about these packages in their online documentation.
Exercise 7: Data frame operations in a Quarto document
In this task, let’s start using the other document we created, the .qmd file. This file format allows you to combine longer written text (such as detailed descriptions of your data analysis process or the main body of a report or journal article) with code chunks. To get you started using this file format, read Chapter 3.2. in TSD. Below we will focus only on the code chunks.
Compared to what you have done in the R script, in the main Quarto document a # refers to a heading level rather than a comment. If you want to include a code chunk, you can click on the +C tab in the upper-right corner of the .qmd document’s toolbar, or use the keyboard shortcut Ctrl+Alt+i. In the code chunk you would write in the same way as you did in the R script (they are basically mini-scripts). Within a code-chunk, therefore, the # still refers to a comment.
To execute a command within a code chunk, you can either run each line/selection separately using Ctrl+Enter as in the R script, or you can run the entire content of the chunk with the green right-pointing triangle-arrow in the upper-right corner of the chunk.
Let’s continue doing some operations on the mtcars dataset we looked at earlier, this time using some useful tidyverse functions.
Let’s subset the data frame by selecting certain rows or columns. In the tidyverse, you can do this with the filter() function for selecting rows and the select() function for selecting columns. Here we pipe the selections into head() to show the first few rows. You could also use the dplyr::slice_head() function.
mtcars |>
select(mpg, wt) |>
head()
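For comparison, a sketch of the same selection using dplyr::slice_head() instead of head(). It assumes the dplyr package is installed; n = 6 is chosen here to match the default number of rows that head() returns, and first_rows is a hypothetical object name.

```r
library(dplyr)

first_rows <- mtcars |>
  select(mpg, wt) |>
  slice_head(n = 6)    # keep the first six rows

first_rows
```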
To select the cars with eight cylinders:
mtcars |>
filter(cyl == 8)
To select rows by position, we can use the slice() function. For example, to get the 5th through 10th rows:
mtcars |>
slice(5:10)
If we pass a vector of integers to the select() function, we will get the variables corresponding to those column positions. So to get the first through third columns:
mtcars |>
select(1:3) |>
head()
If you call summary() on a data frame, it applies the vector version of the summary command to each column:
summary(mtcars)
These few tasks should be enough to get you started with R and RStudio.
If this was your first encounter with R, you can complete the R for Social Scientists online training too sometime during the week.
From next week we will begin working actively with real data and address specific data management challenges that arise from there.
Those of you who have worked on the advanced user exercise can check some optional solutions below.
# Option 1: vectorised subsetting
leap_year_v1 <- function(year1, year2) {
  year <- year1:year2
  # keep years divisible by 4 but not by 100, or divisible by 400
  year[(year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0]
}

# Option 2: loop over the years one at a time
leap_year_v2 <- function(year1, year2) {
  vector <- c()
  for (year in year1:year2) {
    if ((year %% 4 == 0) & (year %% 100 != 0) | (year %% 400 == 0)) {
      vector <- c(vector, year)
    }
  }
  return(vector)
}

# Option 3: build a TRUE/FALSE vector with ifelse()
leap_year_v3 <- function(year1, year2) {
  # make a vector of all years
  year <- year1:year2
  # find the leap years (TRUE/FALSE)
  leaps <- ifelse((year %% 4 == 0) & (year %% 100 != 0) | (year %% 400 == 0), TRUE, FALSE)
  year[leaps]  # return the leap years
}
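To check a solution, call it on a range whose leap years you already know. The sketch below repeats the first option so that it runs on its own; the two ranges are arbitrary examples, chosen to include a century year that is a leap year (2000) and one that is not (1900).

```r
leap_year_v1 <- function(year1, year2) {
  year <- year1:year2
  year[(year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0]
}

leap_year_v1(1996, 2008)   # 1996 2000 2004 2008
leap_year_v1(1896, 1904)   # 1896 1904 (1900 is skipped)
```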