Introduction to R for MassWateR

RStudio

RStudio is the go-to Integrated Development Environment (IDE) for R. RStudio includes many features that improve the user’s experience.

Let’s get familiar with RStudio.

Open R and RStudio

Find the RStudio shortcut on your computer and fire it up. You should see something like this:

There are four panes in RStudio:

  • Source: Your primary window for writing code to send to the console; this is where you write and save R “scripts”
  • Console: This is where code is executed in R
  • Environment, History, etc.: A tabbed window showing your working environment, code execution history, and other useful things
  • Files, plots, etc.: A tabbed window showing a file explorer, a plot window, list of installed packages, help files, and viewer

Scripting

In most cases, you will not enter and execute code directly in the console. Instead, you write code in a script and then send it to the console.

Open a new script from the File menu…

Executing code in RStudio

After you write code in an R script, you can send it to the Console to run it. There are two ways to do this. First, you can click the Run button at the top right of the scripting window. Second, you can press Ctrl+Enter (Cmd+Enter on a Mac). Either option runs the selected line(s) of the script. If nothing is selected, the line containing the cursor is run.

R language fundamentals

R is built around functions. The basic syntax of a function follows the form: function_name(arg1, arg2, ...).
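
As a rough illustration of the function_name(arg1, arg2, ...) form, the sketch below calls two base R functions with arguments matched by position and by name. The specific values are chosen only for illustration.

```r
# Arguments matched by position: from = 1, to = 10
seq(1, 10)

# The same call with arguments matched by name
seq(from = 1, to = 10)

# Mixing both: the first two by position, the step size by name
odds <- seq(1, 10, by = 2)
odds

# round() takes the number first, then the number of digits to keep
rounded <- round(3.14159, digits = 2)
rounded
```

Typing ?seq in the console opens the help page for a function, which lists its arguments and their defaults.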

With the base install, you will gain access to many functions (2320, to be exact). Some examples:

# print
print("hello world!")
[1] "hello world!"
# sequence
seq(1, 10)
 [1]  1  2  3  4  5  6  7  8  9 10
# random numbers
rnorm(100, mean = 10, sd = 2)
  [1]  4.410223  8.769791 10.488501  8.277897  8.707751  8.967271  8.413104
  [8]  7.626492  8.486753  7.146864 13.144194  8.484648 12.362187  9.786854
 [15] 14.175568 11.738758 10.164867 10.172953  7.608978  5.862138  9.620646
 [22] 10.223982 11.200023 12.245523 10.642966 10.461397 14.383220 11.934940
 [29] 12.040518 11.261272 10.742810 13.430168  8.689879 10.799983  6.641578
 [36] 13.827513  9.450230 10.586984  8.753153  7.950180 11.507410  8.025754
 [43] 11.202164 11.881891  7.972065  8.266329  8.734186  7.907956 11.781619
 [50]  9.146494  8.008290  9.410634  9.247903 11.634368 10.755694 12.239223
 [57]  7.552802  8.229018 10.389892 11.574668 11.743257  8.408076 11.849789
 [64] 14.557338  9.322093  7.718355  8.771761 10.181788 10.975010  9.295106
 [71] 12.171848 10.659645  9.398790 11.748145  8.166172  9.578124  8.781566
 [78] 11.571578 11.341385  8.949168  4.820264 10.241716  8.963576 10.778825
 [85] 12.935514 10.707099 12.091322  9.330601  9.123449  9.580845  7.304165
 [92]  8.534874  9.378703  8.838073  8.074070 11.133118 13.415193 12.945664
 [99]  8.294765 10.017813
# average 
mean(rnorm(100))
[1] 0.1511307
# sum
sum(rnorm(100))
[1] -5.591465

Very often you will see functions used like this:

my_random_sum <- sum(rnorm(100))

The first part of the line is the name of an object that you make up. The second bit, <-, is the assignment operator. This tells R to take the result of sum(rnorm(100)) and store it in an object named my_random_sum. It is stored in the environment and can be used by just executing its name in the console.

my_random_sum
[1] 12.78607
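
Because rnorm() draws new random numbers on every call, the stored value will differ each time the assignment is run. A minimal sketch, assuming you want repeatable results, uses set.seed() (not used elsewhere in this tutorial). The seed value 123 and the object names are arbitrary choices.

```r
# Fix the random number generator's starting point (123 is an arbitrary choice)
set.seed(123)
first_sum <- sum(rnorm(100))

# Resetting the same seed reproduces exactly the same draws
set.seed(123)
second_sum <- sum(rnorm(100))

# TRUE: both objects hold the same value
identical(first_sum, second_sum)
```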

What is the environment?

Running code has one of two outcomes. Either the output is printed directly in the console, or nothing is printed because you stored the result in an object using <-. Stored output is saved in the environment, which is the collection of named objects held in memory for your current R session.
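
The difference between printing and storing can be sketched with a few base R functions for inspecting the environment. The object names my_total and was_stored are made up for this example.

```r
# Printed only: nothing is added to the environment
sum(1:10)

# Stored: nothing is printed, but my_total now exists in the environment
my_total <- sum(1:10)
was_stored <- exists("my_total")
was_stored

# List the names of all objects in the current environment
ls()

# Remove an object when it is no longer needed
rm(my_total)
exists("my_total")
```

In RStudio, the Environment tab shows the same information as ls() and updates as objects are created or removed.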

Packages

The base installation of R is quite powerful. Packages allow you to include new methods for use in R.

CRAN

Many packages are available on CRAN, the Comprehensive R Archive Network. This is where you download R and also where most users get their packages. As of 2023-12-13, there were 20127 packages on CRAN!

Installing packages

When a package gets installed, that means the source code is downloaded and put into your library. A default library location is set for you.

We use the install.packages() function to download and install a package. Here, we install the readxl package, which is used below to import data from an Excel file.

install.packages("readxl")

You should see some text in the R console showing progress of the installation and a prompt after installation is done.

After installation, you can load a package using the library() function. This makes all functions in a package available for you to use.

library(readxl)

We also want to install MassWateR from CRAN.

# Install the package
install.packages("MassWateR")

An important aspect of packages is that you only need to download them once, but every time you start RStudio you need to load them with the library() function.
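
Since installation is only needed once, a script can check whether a package is already present before installing it. The sketch below is one possible way to do this. The helper install_if_missing() is hypothetical (it is not part of base R or MassWateR) and is built on the base function requireNamespace().

```r
# Hypothetical helper: install a package only if it is not already available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is available afterwards
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is downloaded here
install_if_missing("stats")

# For the packages used in this tutorial you would run:
# install_if_missing("readxl")
# install_if_missing("MassWateR")
```

The library() call is still needed in every new session, because the helper only installs the package and does not load it.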

Data structures in R

Now we can talk about R data structures. Simply put, a data structure is a way for programming languages to handle information storage.

Vectors (one-dimensional data)

The basic data format in R is a vector - a one-dimensional grouping of elements that have the same type. These are all vectors and they are created with the c (concatenate) function:

dbl_var <- c(1, 2.5, 4.5)
int_var <- c(1L, 6L, 10L)
log_var <- c(TRUE, FALSE, T, F)
chr_var <- c("a", "b", "c")

The four most common types of vectors are double (or numeric), integer, logical, and character. The following functions can return useful information about the vectors:

class(dbl_var)
[1] "numeric"
length(log_var)
[1] 4
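
The "same type" rule can be seen in a short sketch using base R only. The objects second_value and mixed are introduced here for illustration.

```r
dbl_var <- c(1, 2.5, 4.5)
int_var <- c(1L, 6L, 10L)

# typeof() reports the storage type; class() reports "numeric" for doubles
typeof(dbl_var)
typeof(int_var)

# Square brackets pull out elements by position (R counts from 1)
second_value <- dbl_var[2]
second_value

# Mixing types forces every element to one common type, here character
mixed <- c(1, "a", TRUE)
class(mixed)
mixed
```

The last result shows why a stray text entry in a column of numbers turns the whole column into character data.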

Data frames (two-dimensional data)

A collection of vectors represented as one data object is often described as two-dimensional data, like a spreadsheet, or in R speak, a data frame. Here’s a simple example:

ltrs <- c("a", "b", "c")
nums <- c(1, 2, 3)
logs <- c(T, F, T)
mydf <- data.frame(ltrs, nums, logs)
mydf
  ltrs nums  logs
1    a    1  TRUE
2    b    2 FALSE
3    c    3  TRUE

The only constraints required to make a data frame are:

  1. Each column (vector) contains a single type of data, although different columns can have different types.

  2. The number of observations in each column is equal.
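
Both constraints can be checked with a brief sketch that rebuilds the example data frame. The objects col_types and result, and the error message text, are made up for this illustration.

```r
ltrs <- c("a", "b", "c")
nums <- c(1, 2, 3)
logs <- c(TRUE, FALSE, TRUE)
mydf <- data.frame(ltrs, nums, logs)

# Each column is a vector and can be pulled out with $
mydf$nums

# Each column keeps its own single type
col_types <- sapply(mydf, class)
col_types

# Columns of unequal length (3 and 2 here) are rejected with an error
result <- tryCatch(
  data.frame(x = c(1, 2, 3), y = c("a", "b")),
  error = function(e) "error: columns differ in length"
)
result
```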

Getting your data into R

It is rare to enter your data manually in R. Most data analysis workflows begin with importing a dataset from an external source. We’ll be using the read_excel() function from the readxl package.

We can import the ExampleSites.xlsx dataset as follows. Note the use of a relative file path. You can see what R is using as your “working directory” using the getwd() function.

sitdat <- read_excel("data/ExampleSites.xlsx")
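
If the import fails with a "path does not exist" style error, the relative path probably does not match the working directory. A small sketch for checking this with base R functions follows. The object name sites_path is made up for this example.

```r
# Where R is currently looking for files
getwd()

# Build the relative path from its parts; file.path() joins them with "/"
sites_path <- file.path("data", "ExampleSites.xlsx")
sites_path

# TRUE if the file can be found relative to the working directory
file.exists(sites_path)
```

If the last line returns FALSE, either move the file into a data folder inside the working directory or adjust the path passed to read_excel().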

Let’s explore the dataset a bit.

# get the dimensions
dim(sitdat)
[1] 11  5
# get the column names
names(sitdat)
[1] "Monitoring Location ID"        "Monitoring Location Name"     
[3] "Monitoring Location Latitude"  "Monitoring Location Longitude"
[5] "Location Group"               
# see the first six rows
head(sitdat)
# A tibble: 6 × 5
  `Monitoring Location ID` `Monitoring Location Name` Monitoring Location Lati…¹
  <chr>                    <chr>                                           <dbl>
1 ABT-026                  Rte 2, Concord                                   42.5
2 ABT-062                  Rte 62, Acton                                    42.4
3 ABT-077                  Rte 27/USGS, Maynard                             42.4
4 ABT-144                  Rte 62, Stow                                     42.4
5 ABT-237                  Robin Hill Rd, Marlboro                          42.3
6 ABT-301                  Rte 9, Westboro                                  42.3
# ℹ abbreviated name: ¹​`Monitoring Location Latitude`
# ℹ 2 more variables: `Monitoring Location Longitude` <dbl>,
#   `Location Group` <chr>
# get the overall structure
str(sitdat)
tibble [11 × 5] (S3: tbl_df/tbl/data.frame)
 $ Monitoring Location ID       : chr [1:11] "ABT-026" "ABT-062" "ABT-077" "ABT-144" ...
 $ Monitoring Location Name     : chr [1:11] "Rte 2, Concord" "Rte 62, Acton" "Rte 27/USGS, Maynard" "Rte 62, Stow" ...
 $ Monitoring Location Latitude : num [1:11] 42.5 42.4 42.4 42.4 42.3 ...
 $ Monitoring Location Longitude: num [1:11] -71.4 -71.4 -71.4 -71.5 -71.6 ...
 $ Location Group               : chr [1:11] "Assabet" "Assabet" "Assabet" "Assabet" ...

You can also view a dataset in a spreadsheet style using the View() function:

View(sitdat)
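
The column names in this dataset contain spaces, so they need special handling when you pull out a single column. The sketch below uses a small stand-in data frame, built by hand from two of the rows shown above, so that it runs without the Excel file. The same syntax should apply to sitdat.

```r
# Stand-in data frame; check.names = FALSE keeps the spaces in the names
sites <- data.frame(
  "Monitoring Location ID" = c("ABT-026", "ABT-062"),
  "Location Group" = c("Assabet", "Assabet"),
  check.names = FALSE
)

# Backticks let $ work with a name that contains spaces
sites$`Location Group`

# Double square brackets take the name as an ordinary quoted string
ids <- sites[["Monitoring Location ID"]]
ids
```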

Summary

In this intro we learned about R and RStudio, some of the basic syntax and data structures in R, and how to import files.