Hi All,
In the first and second parts of this ggplot series, I covered the geom and scale layers. Now, let’s dig into the various types of geoms.
1. geom_boxplot()
This geom creates a boxplot to show the distribution of a variable. The x and y variables are defined in the aes() argument of the ggplot() function. Setting x = 1 creates a single boxplot; setting x to a column name (typically a factor) creates multiple boxplots placed side by side.
Example 1: single boxplot
library(ggplot2); library(dplyr)  # dplyr provides the %>% pipe
mtcars %>% ggplot(aes(x = 1, y = disp)) + geom_boxplot()

Example 2: multiple boxplots
mtcars %>% ggplot(aes(x=factor(cyl), y=disp)) + geom_boxplot()
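As a quick check of what the two examples above compute, here is a minimal sketch (my own addition, assuming a recent ggplot2 where layer_data() is available) that builds both plots and inspects the statistics behind the boxes. The middle column returned by layer_data() is each box's median.

```r
library(ggplot2)

# x = 1: every row shares the same x, so one box is drawn
p_single <- ggplot(mtcars, aes(x = 1, y = disp)) + geom_boxplot()

# x = factor(cyl): one box per cylinder level (4, 6, 8)
p_multi <- ggplot(mtcars, aes(x = factor(cyl), y = disp)) + geom_boxplot()

# layer_data() returns the statistics computed by stat_boxplot()
single_stats <- layer_data(p_single)
multi_stats  <- layer_data(p_multi)

# lower / middle / upper are the 25th percentile, median and 75th percentile
print(multi_stats[, c("x", "lower", "middle", "upper")])
```

The single plot yields one row of statistics and the grouped plot yields one row per cylinder level.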

2. geom_abline()
geom_abline(slope=xx, intercept=xx) –> adds a line with specified slope and intercept to the plot.
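A typical use is overlaying a fitted regression line. The sketch below is an illustration of my own (the lm() model and the red color are assumptions, not from the original): it takes the slope and intercept from lm() and passes them to geom_abline().

```r
library(ggplot2)

# Fit a simple linear model of mpg on weight
fit <- lm(mpg ~ wt, data = mtcars)
intercept_hat <- coef(fit)[["(Intercept)"]]
slope_hat     <- coef(fit)[["wt"]]

# geom_abline() draws y = intercept + slope * x across the whole panel
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_abline(slope = slope_hat, intercept = intercept_hat, colour = "red")

# The second layer stores the slope and intercept it will draw
abline_data <- layer_data(p, 2)
```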
3. geom_bar()
Creates a bar chart, where x is a factor (categorical) variable.
Usage: geom_bar(position='stack', stat = "count", width = NULL, col = NULL, fill = NULL)
The arguments:
- position = how the bars will be arranged
- “stack” –> default –> different categories will be stacked.
- “identity” –> the bars overlap instead of being stacked.
- “fill” –> stacks the bars and rescales each stack to the same height (1), showing proportions.
- “dodge” –> side by side.
–> Dodging preserves the vertical position of a geom while adjusting the horizontal position.
–> We can adjust the horizontal position using the argument position = position_dodge(width = xx).
–> example:
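posn.d <- position_dodge(width = 0.9)
# Plot 3: Redraw dynamite plot
# (m is a ggplot object created earlier, with a factor on x and a grouping fill;
#  mean_sdl() needs the Hmisc package installed; fun replaces the deprecated fun.y)
m +
  stat_summary(fun = mean, geom = "bar", position = posn.d) +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
               geom = "errorbar", width = 0.1, position = posn.d)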
Output: [Figure: bar plot with dodge width = 0.9]
The smaller the dodge width, the more the bars within each category overlap. Using a dodge width of 0.3 gives the following result (I set alpha = 0.6 only to make the overlapping bars visible).
[Figure: bar plot with dodge width = 0.3]
- stat = the statistical transformation applied to the data for this layer; it determines the y values.
- ‘count’ –> default –> the bar height represents the number of cases in each category.
- ‘identity’ –> leaves the data as is (leaves the y values unchanged).
- width = bar width.
- col = the color of bar’s outline
- fill = the color of the inside of the bar.
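To compare the position options and the default stat = "count" side by side, here is a small sketch using mtcars (the choice of cyl and am is mine, not from the original). layer_data() exposes the computed count column and the stacked ymax values.

```r
library(ggplot2)

# Bars per cylinder group, filled by transmission type (am: 0 = automatic, 1 = manual)
base <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am)))

p_stack <- base + geom_bar()                    # default position = "stack"
p_dodge <- base + geom_bar(position = "dodge")  # groups side by side
p_fill  <- base + geom_bar(position = "fill")   # every bar rescaled to height 1

# With the default stat = "count", bar heights are the number of rows per x value
totals <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
count_data <- layer_data(totals)
print(count_data[, c("x", "count")])
```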
4. geom_col()
Usage: geom_col(width = NULL, fill = NULL, position = 'stack').
It’s a variant of geom_bar(): a shortcut for geom_bar(stat = "identity"), which leaves the y values as they are.
Example:
ggplot(disease_counts, aes(x = region, y = total_cases, fill = disease)) +
  # Add a column geometry with the proper position value.
  geom_col(position = 'fill')
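Since disease_counts is not shown in this post, the sketch below uses a tiny hypothetical data frame (cases, with made-up numbers) to show that geom_col() plots the y values as given, and that position = "fill" turns them into within-column proportions.

```r
library(ggplot2)

# Hypothetical pre-aggregated data (not the original disease_counts)
cases <- data.frame(
  region      = c("A", "A", "B", "B"),
  disease     = c("flu", "measles", "flu", "measles"),
  total_cases = c(30, 10, 5, 15)
)

# geom_col() uses total_cases as-is (stat = "identity"), stacked by default
p_stack <- ggplot(cases, aes(x = region, y = total_cases, fill = disease)) +
  geom_col()

# position = "fill" rescales each column so its segments sum to 1
p_fill <- ggplot(cases, aes(x = region, y = total_cases, fill = disease)) +
  geom_col(position = "fill")

stack_data <- layer_data(p_stack)
fill_data  <- layer_data(p_fill)
```

Region A totals 40 cases and region B 20, so the filled version shows 0.75/0.25 splits in both columns.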
5. geom_histogram()
Usage: geom_histogram(binwidth = NULL, position='stack', na.rm = FALSE, aes(...)).
Arguments:
- binwidth = the width of each bin. If neither binwidth nor bins is specified, the histogram uses 30 bins by default.
- By default, the y-axis of the histogram shows the count of values in each bin. There are 4 computed variables that can be mapped to the y axis: count (default), density, ncount and ndensity. To change what the y axis shows, map one of these computed variables to y. Note: wrap the variable name in double dots (..name..). Since ggplot2 3.3.0, the preferred form is after_stat(name), e.g. after_stat(density).
- geom_histogram(aes(y = ..count..)) –> default. –> number of points in bin
- geom_histogram(aes(y = ..density..)) or geom_histogram(aes(y = stat(density))) –> density of points in bin, scaled to integrate to 1.
- The vertical scale of a ‘density histogram’ shows units that make the total area of all the bars add to 1. This makes it possible to show the density curve of the population using the same vertical scale.
- For a histogram colored by a column (a stacked histogram), density is computed separately within each group, so each group's bars integrate to 1 on their own rather than across the whole histogram. As a workaround, use geom_histogram(aes(y = ..count../sum(..count..))) to show each bin's share of the whole population.
- geom_histogram(aes(y = ..ncount..)) –> normalized count, scaled to maximum of 1
- geom_histogram(aes(y = ..ndensity..)) –> normalized density, scaled to maximum of 1
- We can also define our own custom y aesthetic, such as geom_histogram(aes(y = ..count../sum(..count..))).
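- We can also add other aesthetics, such as fill.
- center = the position of the center of one bin; the remaining bins are laid out relative to it. By default, bins are centered on multiples of binwidth (on whole numbers when binwidth = 1).
- center = 0.5 –> each bin is centered at a whole number + 0.5, so with binwidth = 1 the bin edges fall on whole numbers.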
[Figure: histogram with the default center]
[Figure: histogram with center = 0.5]
- Example 1: histogram of x, with the y axis showing the normalized density (ndensity)
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 1, fill = "#377EB8", aes(y = ..ndensity..))
- Example 2: histogram of x colored by a column –> creates a stacked histogram.
ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_histogram(binwidth = 1, aes(y = ..ndensity..))
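The computed variables and the center argument can be combined. This sketch (assuming ggplot2 3.3.0 or later for after_stat()) shows each bin's share of all cars and uses center = 0.5 so that bin edges land on whole numbers; layer_data() lets us check both.

```r
library(ggplot2)  # after_stat() needs ggplot2 >= 3.3.0

# Bars show each 1-mpg bin's share of all cars instead of raw counts
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 binwidth = 1, center = 0.5)

bins <- layer_data(p)

# center = 0.5 with binwidth = 1 puts the bin edges (xmin, xmax) on whole numbers
print(head(bins[, c("xmin", "xmax", "count", "y")]))
```

Empty bins are kept with a count of 0, so the counts still add up to the 32 rows of mtcars and the shares add up to 1.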
6. geom_point()
Things that we can add inside geom_point():
- alpha –> for transparency.
  Example: ggplot(diamonds, aes(x = carat, y = price, color = clarity)) + geom_point(alpha = 0.4)
- aes() –> x, y, color (outline color), fill (inside color; only used by shapes 21 to 25), size, alpha, shape (shape of the point), stroke (outline width).
Point shapes are selected by numeric codes from 0 to 25, shown in the chart below.
Note that all arguments inside aes() can also be set directly in the geom, outside aes(). However, when a shape code is put inside aes(), ggplot2 treats it as a mapped variable rather than a fixed shape, so we cannot use the code number directly. For example, writing something like
ggplot(mtcars) + geom_point(aes(x=wt, y=mpg, col=my_color, fill=cyl, size=10, shape=1))
will result in an error:
Error: A continuous variable can not be mapped to shape.
To solve this error, we have to add another layer, scale_shape_identity(), so that the complete syntax will be:
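ggplot(mtcars, aes(x = wt, y = mpg, fill = cyl)) +
  # shape = 23 inside aes() is used as-is thanks to scale_shape_identity();
  # the fixed color and size go outside aes() so they are not mapped to scales
  geom_point(aes(shape = 23), col = "#4ABEFF", size = 10) +
  scale_shape_identity()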
If we set the shape outside aes(), we do not need scale_shape_identity().
Example:
my_color <- "#4ABEFF"  # any color string
ggplot(mtcars, aes(x = wt, y = mpg, fill = cyl)) +
  geom_point(size = 10, shape = 23, color = my_color)
Color (and likewise shape) can also be mapped to a column. For example: ggplot(diamonds, aes(x = carat, y = price)) + geom_point(aes(color = clarity))
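The three ways of setting a point's shape discussed above can be compared in one sketch. It uses only ggplot2 and mtcars; the sizes and colors are arbitrary choices of mine.

```r
library(ggplot2)

# 1. Fixed shape outside aes(): no extra scale needed (23 = filled diamond)
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(shape = 23, fill = "#4ABEFF", size = 3)

# 2. Shape mapped to a discrete column: one default shape per level
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(shape = factor(cyl)))

# 3. Shape code inside aes() plus scale_shape_identity(): the code is used as-is
p_identity <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(shape = 23)) +
  scale_shape_identity()

fixed_shapes    <- layer_data(p_fixed)$shape
mapped_shapes   <- layer_data(p_mapped)$shape
identity_shapes <- layer_data(p_identity)$shape
```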
ggplot() + geom_point() has a shortcut version –> quick plot, a.k.a. qplot(). (Note that qplot() was deprecated in ggplot2 3.4.0.)
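The notation is similar to that of the base plot() function: qplot(x, y, data, geom = 'auto'). With geom = 'auto', qplot() picks the geom from what is supplied: x and y give a scatterplot, y alone gives a scatterplot against the index, and x alone gives a histogram (or a bar chart when x is discrete).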
Example:
# qplot() with x only
# just x supplied = histogram (a bar chart here, because x is discrete)
qplot(x = factor(cyl), data = mtcars)
# qplot() with y only
# just y supplied = scatterplot, with x = seq_along(y)
qplot(y = qsec, data = mtcars)
# qplot() with x and y
# both x and y supplied = scatterplot
qplot(x = factor(cyl), y = factor(vs), data = mtcars)
# qplot() with geom set to jitter manually
# geom = 'jitter'
qplot(x = factor(cyl), y = factor(vs), data = mtcars, geom = 'jitter')
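Because qplot() is deprecated, it may help to see roughly equivalent ggplot() calls for the four examples above. This is a sketch of my own, not qplot()'s internal code; for the x-only case it uses geom_bar() because x is discrete.

```r
library(ggplot2)

# x only (discrete x) -> bar chart of counts
bar_plot <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

# y only -> scatterplot against the row index
index_plot <- ggplot(mtcars, aes(x = seq_along(qsec), y = qsec)) + geom_point()

# x and y -> scatterplot
scatter_plot <- ggplot(mtcars, aes(x = factor(cyl), y = factor(vs))) + geom_point()

# geom = 'jitter' -> points with random jitter to reduce overplotting
jitter_plot <- ggplot(mtcars, aes(x = factor(cyl), y = factor(vs))) + geom_jitter()
```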
7. geom_beeswarm(data, cex, alpha)
Needs the additional package ggbeeswarm: library(ggbeeswarm).
The arguments:
- cex = scaling factor for the spacing between points (the point size itself is set with size).
- alpha = transparency. The smaller the alpha, the more transparent the points.
Example:
# Load libraries for beeswarm plots and for filter() / %>%
library(ggbeeswarm)
library(dplyr)
md_speeding %>%
  filter(vehicle_color == 'RED') %>%
  ggplot(aes(x = gender, y = speed)) +
  # set point spacing (cex) to 0.5 and alpha to 0.8
  geom_beeswarm(alpha = 0.8, cex = 0.5) +
  # add a transparent boxplot on top of the points
  geom_boxplot(alpha = 0)
8. geom_density()
Creates a kernel density estimate (KDE), a smoothed estimate of the probability density function (PDF) of a continuous variable.
Usage: geom_density(alpha, fill, bw).
Arguments:
- bw = bandwidth (the smoothing parameter; larger values give smoother curves)
- fill = color that will fill the KDE graph.
Example:
gap2007 %>%
  ggplot(aes(x = lifeExp, fill = continent)) +
  geom_density(alpha = 0.3)
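gap2007 is not defined in this post, so the sketch below uses mtcars instead to show the effect of bw (the two bandwidth values are arbitrary choices for illustration). A small bandwidth follows the data closely and produces taller, narrower peaks; a large one smooths them out.

```r
library(ggplot2)

# Same variable, two bandwidths
p_narrow <- ggplot(mtcars, aes(x = mpg)) +
  geom_density(bw = 1, fill = "steelblue", alpha = 0.3)
p_wide <- ggplot(mtcars, aes(x = mpg)) +
  geom_density(bw = 5, fill = "steelblue", alpha = 0.3)

# stat_density() evaluates the estimate on a grid of 512 points by default
narrow <- layer_data(p_narrow)
wide   <- layer_data(p_wide)
```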
9. geom_emoji()
Creates an emoji scatter plot instead of a dot scatter plot. Don’t forget to install and load the required package, emoGG (hosted on GitHub):
devtools::install_github("dill/emoGG")
library(emoGG)
Example:
ggplot(data = ToothGrowth) +
  geom_emoji(aes(dose, len), emoji = "1f439") +
  labs(x = "Dose (mg/day)", y = "Tooth length")
10. geom_lollipop()
Creates a bar-chart-like graph, but each bar is drawn as a lollipop (a segment topped with a point). geom_lollipop() comes from the ggalt package.
Example:
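library(ggalt)  # provides geom_lollipop()
# language_counts: hypothetical data frame with columns language and count
ggplot(language_counts) + geom_lollipop(aes(language, count), size = 2, col = "salmon")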
11. geom_line()
For creating a line chart.
Usage: geom_line(mapping = NULL, data = NULL, stat = "identity", position = "identity", aes(group = colx), ...)
Argument:
- group = colx –> if we want to create one line for each category in the column colx.
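A minimal sketch of the group argument, using R's built-in Orange dataset (my choice, not from the original): group = Tree draws one line per tree instead of a single zig-zag line through all the points.

```r
library(ggplot2)

# Orange: trunk circumference of 5 trees, each measured at 7 ages
p <- ggplot(Orange, aes(x = age, y = circumference, group = Tree)) +
  geom_line()

# Each distinct value in the group column becomes its own line
lines_data <- layer_data(p)
```

Mapping color = Tree instead would also split the lines, because discrete color mappings define groups automatically.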
There are still plenty of other geoms. I will keep updating this post.
Cheers.