Introduction to R for HSS8005+

Description

This coding playground introduces some basic R operations. Its aim is to introduce at a basic level those elements of the R programming language that will be most useful for the applied purposes of this course. It is therefore not a comprehensive introduction to R, but a pragmatic one.

This page contains interactive WebR code fields. If the WEBR STATUS below is 🟢Ready!, then the fields are ready to evaluate the code within them. You can edit the code in the fields as necessary, even completely replacing the default content. The ‘Start over’ refresh button at the top right end of the fields reinstates the original content. The code in the fields can also be copied, either by manually selecting it or using the copy button at the top right end of the fields.

Function list

Package::Function	Purpose
`help()`	The primary interface to R’s built-in help system. Its main argument is `topic`, usually a name or character string specifying the topic for which help is sought. See, for example, the rows below, where we use the function to get information on built-in arithmetic and relational operators.
`utils::help(Arithmetic)`	R’s built-in arithmetic operators
`utils::help(Comparison)`	R’s built-in relational operators

If you have more advanced knowledge of R and you find the exercises below too easy, here’s a programming challenge for you.

Suppose you want to write a function that lists all the leap years between two specified years. How would you go about writing it? What information do you need first? What steps would you take to build up the function? There are several ways of writing such a function, and you can find three options at the bottom of this worksheet. Work individually or in a small group.

Tip:

You first need to check the definition of a leap year and how it is calculated; you can Google this.
You then need to find out how the arithmetic operations involved in the calculation are implemented in R; try help(Arithmetic) or ?Arithmetic.
When you are done, you can check your results against the example solutions given at the bottom of this page.

Elements

Expressions

Expressions or statements are the fundamental units of work in R. The R computing environment evaluates expressions/statements written in the R programming language. Expressions/statements are made up of functions, operators (special characters) and data objects.

The code block below contains an example of each. Clicking Run Code will run all lines of code in the block; to run the expressions one-by-one, click on the desired code line and execute Ctrl + ENTER / Command + RETURN.
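The original interactive field is not reproduced here. As a minimal sketch, with values chosen purely for illustration and `four` as a hypothetical object name, the three kinds of element might look like this:

```r
# A function call: sum() adds up its arguments
sum(1, 3, 5)

# An operator: + adds the values on either side of it
2 + 2

# A data object: a name that stores a value for later use
four <- 2 + 2
four
```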

Operators and other special characters

We have encountered the + operator above. We have also found out from the R Documentation that + is one of several arithmetic operators. R organises its various operators into the following groups:

Arithmetic operators (see help(Arithmetic))
Assignment operators (see help(assignOps))
Comparison/Relational operators (see help(Comparison))
Logical operators (&, &&, |, ||, !, isTRUE, isFALSE)
Miscellaneous operators (:, %in%, %*%)
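A rough sketch of one or two operators from each group in action may help; the numbers are arbitrary and the object name `answer` is only an illustration:

```r
# Arithmetic operators
7 + 2     # addition
7 %/% 2   # integer division
7 %% 2    # remainder (modulo)

# Assignment operator
answer <- 7 * 6

# Comparison/relational operators
answer > 40
answer == 42

# Logical operators
TRUE & FALSE   # AND
TRUE | FALSE   # OR
!TRUE          # NOT

# Miscellaneous operators
1:5                  # sequence from 1 to 5
3 %in% c(1, 3, 5)    # is 3 among these values?
```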

Find out more about each by searching the R Documentation using the help() function:

There are also a number of other special characters used in R. For example, it’s worth being aware of the role of different types of quotes (check help(Quotes)). Single (') and double quotes (") delimit character constants. They can be used interchangeably but double quotes are preferred (and character constants are printed using double quotes), so single quotes are normally only used to delimit character constants containing double quotes (e.g. 'here are "double quotes" in single quotes'). Backslash (\) is used to start an escape sequence inside character constants.

To check how this works, let’s get back into Nineteen Eighty-Four territory:
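The page's original field is not available, so the strings below are stand-ins chosen to fit the theme. They sketch how the two kinds of quotes and the backslash escape behave:

```r
# Double quotes are the usual way to delimit a character constant
slogan <- "War is peace"

# Single quotes are handy when the text itself contains double quotes
party_line <- 'The Party says "two plus two make five"'

# Inside double quotes, a backslash escapes each inner double quote
escaped <- "The Party says \"two plus two make five\""

# Both spellings produce exactly the same string
identical(party_line, escaped)

print(party_line)      # print() displays the escape sequences
cat(party_line, "\n")  # cat() displays the text itself
```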

Functions

Most of the work in R is done using functions. It’s possible to create your own functions. This makes R extremely powerful and extendible. We’re not going to cover making your own functions in this course, but it’s important to be aware of this capability. There are plenty of good resources online for learning how to do this, including this one.

R as a simple calculator

The most elementary yet still handy task you can use R for is to perform basic arithmetic operations. This is useful for getting a first experience of doing things in the language. This is aided by R’s arithmetic operators, which we can find out more about if we type and execute the command help(Arithmetic). Generally, the help() function allows us to access R’s internal help pages.

Let’s say we want to know the result of adding up three numbers: 1, 3 and 5:
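In place of the original interactive field, the command would presumably be along these lines:

```r
# Add up the three numbers with the + operator
1 + 3 + 5
```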

This will print out the result (9).

The [1] in the result is not part of the value: it is the index of the first element printed on that line of output. Here the result is a single value, so only [1] appears.

We can also save the result of this operation as an object, so we can use it in further operations. We create objects by using the so-called assignment operator consisting of the characters <- (resembling a left-arrow).

A command involving <- can be read as “assign the value of the result from the operation on the right hand side (some expression) to the object on the left hand side (short name of object, single word, with no spaces)”.

For example, let’s save our result in an object called “nine”:
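A minimal sketch of that assignment, assuming the same sum as before is what gets stored:

```r
# Store the result of the sum in an object called nine
nine <- 1 + 3 + 5
```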

Notice that there is no output printed this time. But there are also no error messages, so the operation must have run without problems. Instead, if you are working in the Console within RStudio, the newly created objects will be listed in the Environment pane; you should have an object called “nine” there that stores the single value “9” in it. We can now use this object for other operations, such as:
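The particular operations below are illustrative guesses rather than the page's original examples:

```r
# Recreate the object so this block runs on its own
nine <- 1 + 3 + 5

nine + 1   # 10
nine * 2   # 18
nine / 3   # 3
```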

We can also check results of so-called relational operations. There are several relational operators that allow us to compare objects in R. The most useful of these are the following:

> greater than, >= greater than or equal to
< less than, <= less than or equal to
== equal to
!= not equal to

When we use these to compare two objects in R, we end up with a logical object.

For example, let’s check whether 9 is greater than 5, and whether it is lower than 8:
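These two comparisons can be sketched as:

```r
# Is 9 greater than 5?
9 > 5

# Is 9 lower than 8?
9 < 8
```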

R treats our inputs as statements that we are asking it to evaluate, and we get the answers “TRUE” and “FALSE”, respectively, as we would expect. Let’s now check whether our object “nine” is equal to the number 9. We may assume that we can achieve this by typing “nine = 9”, but let’s see what that results in:
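The command described in the text, written out:

```r
# A single = assigns; it does not compare
nine = 9
```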

Did we get the result we expected? Nothing was printed in the output, so seemingly nothing happened… That’s because the “=” sign is also used as an assignment operator in R, just like “<-”. So we basically assigned the value “9” to the object “nine” again. To use the equal sign as a comparison operator we must type it twice (==). Let’s see:
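A short sketch of the comparison, recreating the object first so the block runs on its own:

```r
nine <- 9

# A double == tests for equality and returns a logical value
nine == 9
```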

Now we get the answer “TRUE”, as expected.

This distinction between “=” and “==” is important to keep in mind. What would have happened if we had tried to test whether our object “nine” equals the value “5” or not, and instead of the comparison operator (==) we used the assignment operator (=)? Let’s see:
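The mistake described here might look like this in practice:

```r
nine <- 9

# Intended as a test, but a single = overwrites the object instead
nine = 5

# Printing the object shows that its value has changed
nine
```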

In the Console we again see no results printed, but if we check our Environment, we see that the value of the object “nine” was changed to 5. So it can be a dangerous business. We’ll be using the “<-” as assignment operator instead of “=” to avoid any confusion in this respect. The distinction between == and = will also emerge in other contexts later.

So, try out the following commands in turn now and check if the results are what you’d expect:
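The page's original list of commands is not reproduced here; the lines below are an assumed stand-in that exercises the same operators:

```r
# Start again from a known value
nine <- 9

nine == 5
nine != 5
nine >= 9
nine < 10  # is nine less than ten?
```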

The text following the hash sign (#) in the last line is a comment. If you’d like to comment on any code you write, just add a hash (#) or series of hashes in front of the comment so that R knows it should not evaluate it as a command. This will be useful when writing your commands in an R script that you can save for later, rather than interacting with R live in the Console.

Vectors, data types and structures

The basic elements of data in R are called vectors. R has 6 basic data types that you should be aware of:

character: a text string, e.g. “name”
numeric: a real or decimal number
integer: non-decimal number; often represented by a number followed by the letter “L”, e.g. 5L
logical: TRUE or FALSE
complex: complex numbers with real and imaginary parts
raw: raw bytes; rarely needed in everyday data analysis

R provides several functions to examine features of vectors and other objects, for example:

class() - what kind of object is it (high-level)?
typeof() - what is the object’s data type (low-level)?
length() - how long is it? What about two dimensional objects?
attributes() - does it have any metadata?
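As a rough illustration, with inputs made up for the purpose, these functions can be tried on a few simple values:

```r
class("name")   # "character"
class(5)        # "numeric"
typeof(5)       # "double": the low-level type behind "numeric"
typeof(5L)      # "integer"

length(c(TRUE, FALSE, TRUE))   # 3

# For a two-dimensional object, length() counts all cells;
# dim() reports the number of rows and columns
m <- matrix(1:6, nrow = 2)
length(m)   # 6
dim(m)      # 2 3

# A named vector carries its names as metadata
attributes(c(a = 1, b = 2))
```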

Vector operations

Let’s learn a few vector operations.

First, let’s use the c() function to concatenate or combine vector elements:
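The values below are assumed for illustration; the text only tells us that the resulting vector is called x and has five elements:

```r
# Combine five numbers into a single vector called x
x <- c(1, 3, 5, 7, 9)
```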

To run this line of code in an R script, place the cursor on the line you want to execute and either click on the small “Run” button in the upper-right corner of the script’s task bar, or press Ctrl+Enter (Command+Return on a Mac).

The vector called x that we just created appears in the Environment. We can examine some of its features:
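A sketch using the inspection functions introduced earlier, with x given assumed values:

```r
x <- c(1, 3, 5, 7, 9)   # illustrative values

class(x)        # "numeric"
typeof(x)       # "double"
length(x)       # 5
attributes(x)   # NULL: a plain vector has no metadata
```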

These tell us something about the characteristics of the object, but not much about its content (apart from the fact that it has a length of 5). Functions such as min, max, range, mean, median, sum or summary give us some summary statistics about the object:
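For instance, again assuming illustrative values for x:

```r
x <- c(1, 3, 5, 7, 9)   # illustrative values

min(x)      # 1
max(x)      # 9
range(x)    # 1 9
mean(x)     # 5
median(x)   # 5
sum(x)      # 25
summary(x)  # minimum, quartiles, median, mean and maximum together
```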

The seq() function lets us create a sequence from a starting point to an ending point. If you specify the by argument, you can skip values. For instance, if we wanted a vector of every 5th number between 0 and 100, we could write:
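One way to write this, with `fives` as a hypothetical object name (the original name is not shown in the text):

```r
# Every 5th number from 0 to 100
fives <- seq(from = 0, to = 100, by = 5)
```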

To print out the result in the console, we can simply type the name of the object:
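Continuing with the hypothetical name `fives`:

```r
fives <- seq(from = 0, to = 100, by = 5)

# Typing the name of an object on its own prints its contents
fives
```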

A shorthand version to get a sequence between two numbers counting by 1s is to use the : sign. For example, print out all the numbers between 200 and 250:
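That shorthand looks like this:

```r
# All whole numbers from 200 to 250
200:250
```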

To access a single element of a vector by position in the vector, use the square brackets []:
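A small sketch, with assumed values for x:

```r
x <- c(1, 3, 5, 7, 9)   # illustrative values

x[1]   # first element: 1
x[3]   # third element: 5
```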

If you want to access more than one element of a vector, put a vector of the positions you want to access in the brackets:
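For example, with the same assumed x:

```r
x <- c(1, 3, 5, 7, 9)   # illustrative values

x[c(1, 3)]   # first and third elements: 1 5
x[2:4]       # second to fourth elements: 3 5 7
```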

If you try to access an element past the length of the vector, it will return a missing value NA:
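A quick check of this behaviour, with x assumed to have five elements:

```r
x <- c(1, 3, 5, 7, 9)   # illustrative values

# There is no tenth element, so R returns NA
x[10]
```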

If you accidentally subset a vector by NA (the missing value), you get the vector back with all its entries replaced by NA:
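This can be sketched as follows:

```r
x <- c(1, 3, 5, 7, 9)   # illustrative values

# NA is recycled across all positions, so every entry comes back as NA
x[NA]
```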

Let’s say you want to modify one value in your vector. You can combine the square bracket subset [] with the assignment operator <- to replace a particular value:
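A minimal example, where both the position and the new value are arbitrary choices:

```r
x <- c(1, 3, 5, 7, 9)   # illustrative values

# Replace the second element
x[2] <- 100
x   # 1 100 5 7 9
```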

You can replace multiple values at the same time by using a vector for subsetting:
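For instance, with arbitrary positions and replacement values:

```r
x <- c(1, 3, 5, 7, 9)   # illustrative values

# Replace the first and fifth elements in one step
x[c(1, 5)] <- c(0, 50)
x   # 0 3 5 7 50
```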

If the replacement vector (the right-hand side) is shorter than what you are assigning to (the left-hand side), the values will “recycle” or repeat as necessary:
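A sketch of recycling, again with made-up values:

```r
x <- c(1, 3, 5, 7, 9)   # illustrative values

# Four positions on the left, two values on the right:
# the pair 0, 1 is used twice
x[1:4] <- c(0, 1)
x   # 0 1 0 1 9
```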

You can also create a vector of characters (words, letters, punctuation, etc):
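As an illustration, with `words` as a hypothetical object name and arbitrary contents:

```r
# A character vector with three elements
words <- c("war", "is", "peace")

class(words)    # "character"
length(words)   # 3
```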

Note that you cannot mix characters and numbers in the same vector. If you add a single character element, the whole vector is converted to character.
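The conversion can be seen in a small sketch (`mixed` is a hypothetical name):

```r
# Two numbers and one character string
mixed <- c(1, 2, "three")

# Every element has been converted to a character string
mixed          # "1" "2" "three"
class(mixed)   # "character"
```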

Logical vectors are just vectors that only contain the special values TRUE or FALSE.

TRUE and FALSE can be shortened to T and F, respectively. However, it is stylistically recommended to spell them out. Also, note that R is case-sensitive, so typing something like true, True or fALSE would give an error.
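A brief sketch of a logical vector, with `flags` as a hypothetical name:

```r
flags <- c(TRUE, FALSE, TRUE)

class(flags)   # "logical"
sum(flags)     # 2: TRUE counts as 1 and FALSE as 0

# T and F work as shorthand, but spelling the values out is safer
T == TRUE      # TRUE

# R is case-sensitive, so the next line would fail if uncommented
# true         # Error: object 'true' not found
```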

Solutions to the advanced exercise: leap year functions

First, we need to make sure that we have a definition of a leap year. We can go for this one, from the Royal Museums of Greenwich:

To be a leap year, the year number must be divisible by four – except for end-of-century years, which must be divisible by 400. This means that the year 2000 was a leap year, although 1900 was not.

So, we need to write a function that calculates whether a given number is divisible by another or not. We still need to simplify this further conceptually. If a number is divisible by another, that means that there is no ‘remainder’, or in other words, that the ‘remainder’ is 0. Is there a mathematical operation that could come in handy here? We may remember that the modulo operation gives the remainder when dividing. For example, “5 modulo 3 = 2” because \(5 = 3 \times 1 + 2\): 3 goes into 5 once, with a remainder of \(2\).

Knowing this will be extremely helpful, but we also need to know how the modulo operation is implemented in R. We could run ?Arithmetic in R to find a list of arithmetic operators, and we’d find that the operator %% stands for the modulo operation.
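A quick check of %% on the numbers mentioned so far:

```r
5 %% 3       # 2: the remainder when 5 is divided by 3

2000 %% 400  # 0: 2000 is divisible by 400
1900 %% 400  # 300: 1900 is not divisible by 400
1900 %% 4    # 0: but 1900 is divisible by 4
```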

We now have all the knowledge elements we need to get down to programming a leap-year function in R:
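The worksheet's own three solutions are not reproduced here. The sketch below is one possible approach, and the function name `leap_years` and its arguments `from` and `to` are assumptions made for illustration. It applies the Greenwich definition to a whole vector of years at once: a year qualifies if it is divisible by 4 and is not an end-of-century year, or if it is divisible by 400.

```r
# List all leap years between two years (inclusive)
leap_years <- function(from, to) {
  years <- from:to

  # Divisible by 4 but not by 100, or divisible by 400
  is_leap <- (years %% 4 == 0 & years %% 100 != 0) | (years %% 400 == 0)

  # Keep only the years for which the test is TRUE
  years[is_leap]
}

leap_years(1896, 1912)   # 1896 1904 1908 1912 (1900 is skipped)
leap_years(1996, 2004)   # 1996 2000 2004 (2000 is included)
```

Working on the whole vector avoids an explicit loop; a loop with if statements would be an equally valid way to express the same rule.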