# Simple Graphs in R

## Abstract

A well-made graph gives us valuable insight and helps us understand and analyse data. This post is a very simple introduction to graphs in R with ggplot2, illustrated with examples and code.

First we install and load the required package, tidyverse:

``install.packages("tidyverse")``
``library(tidyverse)``

Now we start with an example. The mpg dataset comes with ggplot2, which is loaded as part of the tidyverse:

```r
mpg
```
```
## # A tibble: 234 x 11
##    manufacturer model    displ  year   cyl trans   drv     cty   hwy fl
##    <chr>        <chr>    <dbl> <int> <int> <chr>   <chr> <int> <int> <chr>
##  1 audi         a4        1.80  1999     4 auto(l… f        18    29 p
##  2 audi         a4        1.80  1999     4 manual… f        21    29 p
##  3 audi         a4        2.00  2008     4 manual… f        20    31 p
##  4 audi         a4        2.00  2008     4 auto(a… f        21    30 p
##  5 audi         a4        2.80  1999     6 auto(l… f        16    26 p
##  6 audi         a4        2.80  1999     6 manual… f        18    26 p
##  7 audi         a4        3.10  2008     6 auto(a… f        18    27 p
##  8 audi         a4 quat…  1.80  1999     4 manual… 4        18    26 p
##  9 audi         a4 quat…  1.80  1999     4 auto(l… 4        16    25 p
## 10 audi         a4 quat…  2.00  2008     4 manual… 4        20    28 p
## # ... with 224 more rows, and 1 more variable: class <chr>
```

## A first visualisation of the data with ggplot

With the mapping = aes() argument, we choose which variables go on the x and y axes of our graph. Notice also that we “build up” the graph: ggplot() creates a coordinate system to which we add layers, starting with the choice of the data set and followed by the choice of the axes.

ggplot2 comes with many geom functions. Each one adds a different type of layer to the plot, and each one takes a mapping argument.

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) # displ = engine size, hwy = fuel efficiency
```

We notice a negative relationship between engine size and fuel efficiency. Does this give us enough evidence to back up our hypothesis about fuel efficiency and engine size? Probably not. Let’s continue exploring ggplot2.

Suppose we want to differentiate between certain types of observations. In this example, for instance, we have a variable, “class”, indicating the type of car (compact, midsize, SUV, and so on). As this can strongly affect fuel efficiency, we would like to graph the cars accordingly.

To show the class of each car on the previous graph, we can simply add “color = class” to the aes argument. Note that ggplot2 automatically assigns a unique color to each class and adds a legend to the graph.

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
```

Note also that we mapped class to the color aesthetic, but we could have mapped it to size (by adding “size = class” to the aes argument), to the alpha aesthetic (which controls the transparency of the points, by adding “alpha = class”), or to the shape of the points (by adding “shape = class” to the aes argument), as the following code and graphs show.

We also change the color of the points with the “color = "darkblue"” argument. It goes outside the aes argument because here the color does not convey any supplementary information about the data; it only makes the graph prettier. You can try other colors by putting any color name between the quotes.

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, size = class), color = "darkblue") # I am also changing the color
```
```
## Warning: Using size for a discrete variable is not advised.
```

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, alpha = class), color = "darkred")
```

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class), color = "darkgreen")
```
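One detail from the examples above is worth checking in code: a fixed color has to go outside aes(). As a small sketch (the object names p_set and p_mapped are mine), compare the two calls below. Inside aes(), the string "darkblue" is treated as a data value, so ggplot2 maps it through its default color scale instead of drawing dark blue points.

```r
library(ggplot2)

# Color set outside aes(): every point is drawn in dark blue
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "darkblue")

# Color placed inside aes(): "darkblue" becomes a one-level variable,
# so the points get a default scale color and a legend appears
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "darkblue"))

p_set
p_mapped
```

layer_data(), a ggplot2 helper that returns the data a layer actually draws, can be used to inspect the colour column of each plot.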

We can also see the drawbacks of each method:

• size: not advised, because the sizes are difficult to tell apart and big points can cover up small ones.
• alpha: it looks okay on this graph, but in general it can be difficult to see all the observations.
• shape: ggplot2 will only use six shapes at a time. By default, additional groups go unplotted when you use this aesthetic, so we cannot use it for more than six classes. The mpg data has seven classes, so one of them is missing from the graph.

I think color is the best choice, but of course this depends on the exact data we use. (Color also makes your data more friendly 🙂) Note also that we can change the color scale by adding a palette.
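As a minimal sketch of that last remark, one way to change the color scale is to add scale_color_brewer() to the plot. The palette name "Set1" is my own choice for illustration; it is one of the ColorBrewer palettes that ggplot2 can use.

```r
library(ggplot2)

# Same scatterplot as before, colored by class
p_palette <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class)) +
  # Assumption: "Set1" is just one possible palette; any qualitative
  # palette with at least 7 colors would cover the 7 classes in mpg
  scale_color_brewer(palette = "Set1")

p_palette
```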

### Rules of thumb for ggplot options:

• size of a point in mm
• color as a character string
• shape of a point as a number, as shown in the following picture:

## Facets

Another way to differentiate between the classes is to show each of them on a separate graph. Personally I do not like this method, because we lose the direct comparison between the classes (it is difficult to picture all the observations together), but it is easily done with the “facet_wrap” function.

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "darkblue") +
  facet_wrap(~ class, nrow = 2)
```

## Geoms – Geometric objects

A geom is the geometrical object that a plot uses to represent data. For example, bar charts use bar geoms, line charts use line geoms, boxplots use boxplot geoms, and so on. To change the geom in our plot, we simply change the geom function that we add to ggplot().

For instance, we can plot the same data with a smooth line instead of points:

```r
ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy), color = "darkblue") # I am also changing the color
```
```
## `geom_smooth()` using method = 'loess'
```

Note that not every aesthetic works with every geom, and not every geom suits every question. It does not make sense, for instance, to draw the above graph with geom_bar, but it would if we wanted to show the frequency of a variable.

ggplot2 provides over 30 geoms, and extension packages provide even more (see https://www.ggplot2-exts.org for a sampling, or the cheatsheets at http://rstudio.com/cheatsheets).

Finally, geom_smooth is a good choice for our data, but it would be even better to show the observed points as well. To do this, we simply add multiple geoms to our graph:

```r
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "darkblue") +
  geom_smooth(mapping = aes(x = displ, y = hwy))
```
```
## `geom_smooth()` using method = 'loess'
```

Notice that we use the same aes argument twice, so we are repeating ourselves. It is better to write it once, in the ggplot() call, where it becomes a global mapping that applies to every geom we add.

If we put an aes mapping in a geom function, ggplot2 treats it as a local mapping for that layer: it extends or overrides the global mapping for that layer only. This makes it possible to display different aesthetics in different layers, as shown in the following graph:

```r
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) + # adds a local color mapping, for this layer only
  geom_smooth() # uses only the global aes settings
```
```
## `geom_smooth()` using method = 'loess'
```

## Basic statistical graphs

Now let’s turn to basic and simple statistical transformations of the data. The following graph shows the frequency of each car class in our dataset. Here geom_bar does not simply plot the raw data: it first calculates the frequency (count) of each class and then draws the bars. We can also add a fill option to color the bars by class.

```r
ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class, fill = class))
```
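To see that the bar heights really are computed rather than read from a column, one can compare them with a manual count. This is only a sketch: layer_data() is a ggplot2 helper that returns the data a layer actually draws, and the object names are my own.

```r
library(ggplot2)

p_bar <- ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class, fill = class))

# Data computed by the bar layer: one row per class, with a `count` column
bar_counts <- layer_data(p_bar)$count

# Manual count of the same variable; table() sorts the class names
# alphabetically, which is also the order used on the x axis
manual_counts <- as.vector(table(mpg$class))

# TRUE if geom_bar computed the same counts as table()
all(bar_counts == manual_counts)
```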

Other geoms also compute new values to plot:

• Bar charts, histograms, and frequency polygons bin the data and show the frequency (count) in each bin.
• Smoothers fit a model to the data, then plot predictions from the model.
• Boxplots compute a summary of the distribution and display a box accordingly.
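These transformations can be tried in the same way as the bar chart. As a rough sketch, the boxplot below summarises hwy within each class, and the histogram counts the observations of hwy in bins. The binwidth of 2 is an arbitrary choice of mine.

```r
library(ggplot2)

# Boxplot: ggplot2 computes the median, quartiles and whiskers per class
p_box <- ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = class, y = hwy))

# Histogram: ggplot2 bins hwy and counts the observations in each bin
p_hist <- ggplot(data = mpg) +
  geom_histogram(mapping = aes(x = hwy), binwidth = 2)

p_box
p_hist
```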

This was just a simple introduction to graphs. Now we can try to visualise all kinds of data.