Matplotlib
Matplotlib is one of Python’s libraries for creating visualizations. Other libraries, such as Seaborn and Bokeh, will be discussed in separate posts. Enjoy!
Steps in creating the plot using matplotlib
I. Import the Library
import matplotlib.pyplot as plt
II. Create the Plot
- Learn the differences between plt.xxx , df.plot(kind=xxx) , plt.axes() , and plt.subplot(...) .
- if it’s a series –> access the plotting functions from the library (plt.xxx)
- plt.plot(x, y, color = 'colorx', label = 'labelx') –> create line plot.
  plt.plot(x, y, color = 'colorx', marker = 'markerx', linestyle = 'stylex') –> create line plot with markers to represent the data points.
Notes:
- x, y = data
- color = the color used for the line –> ‘red’ (‘r’), ‘green’ (‘g’), ‘blue’ (‘b’), etc.
  character : color
  - 'b' : blue
  - 'g' : green
  - 'r' : red
  - 'c' : cyan
  - 'm' : magenta
  - 'y' : yellow
  - 'k' : black
  - 'w' : white
- marker = the symbol used to represent the data points –> '.', ',', 'o', 'D', 'd', '*', etc.
  character : description
  - '.' : point marker
  - ',' : pixel marker
  - 'o' : circle marker
  - 'v' : triangle_down marker
  - '^' : triangle_up marker
  - '<' : triangle_left marker
  - '>' : triangle_right marker
  - '1' : tri_down marker
  - '2' : tri_up marker
  - '3' : tri_left marker
  - '4' : tri_right marker
  - 's' : square marker
  - 'p' : pentagon marker
  - '*' : star marker
  - 'h' : hexagon1 marker
  - 'H' : hexagon2 marker
  - '+' : plus marker
  - 'x' : x marker
  - 'D' : diamond marker
  - 'd' : thin_diamond marker
  - '|' : vline marker
  - '_' : hline marker
- linestyle = the style of the line.
  character : description
  - '-' : solid line style
  - '--' : dashed line style
  - '-.' : dash-dot line style
  - ':' : dotted line style
- label = the string that will be used for the legend.
- calling plt.plot() several times draws several lines on the same axes.
Example:
    # Import matplotlib.pyplot
    from matplotlib import pyplot as plt

    # Plot the aapl time series in blue
    plt.plot(aapl, color='blue', label='AAPL')

    # Plot the ibm time series in green
    plt.plot(ibm, color='green', label='IBM')

    # Plot the csco time series in red
    plt.plot(csco, color='red', label='CSCO')

    # Plot the msft time series in magenta
    plt.plot(msft, color='magenta', label='MSFT')

    # Add a legend in the top left corner of the plot
    plt.legend(loc='upper left')

    # Specify the orientation of the xticks
    plt.xticks(rotation=60)

    # Display the plot
    plt.show()
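The color, marker, linestyle, and label arguments described above can be tried out with a small self-contained sketch. The data here (x, squares, doubles) is made up for illustration and is not from the original exercise:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, only for illustration
x = [1, 2, 3, 4, 5]
squares = [v ** 2 for v in x]
doubles = [2 * v for v in x]

plt.figure()

# Red dashed line with circle markers
line_a, = plt.plot(x, squares, color='r', marker='o', linestyle='--', label='x squared')

# Blue dotted line with diamond markers, drawn on the same axes
line_b, = plt.plot(x, doubles, color='b', marker='D', linestyle=':', label='2x')

# The legend entries come from the label arguments
plt.legend(loc='upper left')
```

To see the figure interactively, drop the matplotlib.use("Agg") line and call plt.show() at the end.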
- plt.contour(x, y, z, n, cmap='xxx') and plt.contourf(x, y, z, n, cmap='xxx').
  - create a contour map and a filled contour map of z, respectively, using a meshgrid x and y as the axes.
  - n = the number of contour levels. Optional; if omitted, matplotlib chooses the levels automatically.
  - cmap = color map –> maps the data values to corresponding colors.
    Note: It’s always useful to add plt.colorbar() whenever we work with colors so that the plot gets a color-bar legend.
  - several names for the color map:
    - unique –> jet, coolwarm, magma, viridis.
    - season –> summer, autumn, winter, spring.
    - overall colors –> Greens, Reds, Blues, Purples (these names are capitalized).
- Example 1:
    # Generate a default contour map of the array Z
    plt.subplot(2,2,1)
    plt.contour(X, Y, Z)

    # Generate a contour map with 20 contours
    plt.subplot(2,2,2)
    plt.contour(X, Y, Z, 20)

    # Generate a default filled contour map of the array Z
    plt.subplot(2,2,3)
    plt.contourf(X, Y, Z)

    # Generate a filled contour map with 20 contours
    plt.subplot(2,2,4)
    plt.contourf(X, Y, Z, 20)

    # Improve the spacing between subplots
    plt.tight_layout()

    # Display the figure
    plt.show()
- Example 2: contour with cmap and plt.colorbar().
    # Create a filled contour plot with a color map of 'viridis'
    plt.subplot(2,2,1)
    plt.contourf(X, Y, Z, 20, cmap='viridis')
    plt.colorbar()
    plt.title('Viridis')

    # Create a filled contour plot with a color map of 'gray'
    plt.subplot(2,2,2)
    plt.contourf(X, Y, Z, 20, cmap='gray')
    plt.colorbar()
    plt.title('Gray')

    # Create a filled contour plot with a color map of 'autumn'
    plt.subplot(2,2,3)
    plt.contourf(X, Y, Z, 20, cmap='autumn')
    plt.colorbar()
    plt.title('Autumn')

    # Create a filled contour plot with a color map of 'winter'
    plt.subplot(2,2,4)
    plt.contourf(X, Y, Z, 20, cmap='winter')
    plt.colorbar()
    plt.title('Winter')

    # Improve the spacing between subplots and display them
    plt.tight_layout()
    plt.show()
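The contour examples above assume that X, Y, and Z already exist. Here is a minimal sketch of how such arrays could be built with np.meshgrid. The function used for Z and the grid ranges are my own choices for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# 1-D coordinates along each axis (hypothetical ranges)
u = np.linspace(-2, 2, 41)
v = np.linspace(-1, 1, 21)

# meshgrid turns them into 2-D coordinate arrays of shape (len(v), len(u))
X, Y = np.meshgrid(u, v)

# Any function of X and Y gives a 2-D array Z to draw contours of
Z = np.sin(3 * np.sqrt(X ** 2 + Y ** 2))

plt.figure()
plt.contourf(X, Y, Z, 20, cmap='viridis')  # filled contours, 20 requested levels
plt.colorbar()                             # color-bar legend for the color map
plt.title('Filled contour of Z')
```

Note that plt.colorbar() adds a second, narrow axes to the figure to hold the color bar.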
- plt.hexbin(x, y, gridsize = (nx, ny), extent = []).
  - make a 2D histogram plot composed of hexagonal bins.
  - Notes:
    - gridsize = the number of hexagons in the x-direction and the y-direction.
    - extent = the area covered by the bins.
      Format: extent=[x_start, x_end, y_start, y_end]
- Example:
plt.hexbin(hp, mpg, gridsize=(15, 12), extent=[40, 235, 8, 48])
- plt.hist(data, bins = 10, range=None, color='colorname', density=False, histtype='bar') –> create a histogram showing the distribution of the data. It can also be used to create a population pyramid by making the bins horizontal.
  - Notes:
    - range = the lower and upper range of the bins –> (xmin, xmax)
    - density = optional, boolean –> If True, each bin shows the value of the probability density function, normalized such that the integral over the range is 1. Older matplotlib versions called this argument normed; it was removed in matplotlib 3.1.
    - histtype = the type of histogram to draw. Options: {‘bar’, ‘barstacked’, ‘step’, ‘stepfilled’}
      - ‘step’ generates a line plot.
      - ‘stepfilled’ generates an area plot.
    - Besides plotting a histogram, plt.hist() also returns a tuple (n, bins, patches).
      - n = array or list of arrays –> the values of the histogram bins
      - bins = array –> the edges of the bins.
      - patches = the drawn shapes that make up the plot (for histtype ‘bar’, one rectangle per bar). A patch is a 2D shape drawn on the axes.
- Example:
    values = [0,0.6,1.4,1.6,2.2,2.5,2.6,3.2,3.5,3.9,4.2,6]
    plt.hist(values, bins = 3)
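To make the return value of plt.hist() concrete, here is a small sketch that reuses the values list from the example and unpacks the (n, bins, patches) tuple:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

values = [0, 0.6, 1.4, 1.6, 2.2, 2.5, 2.6, 3.2, 3.5, 3.9, 4.2, 6]

plt.figure()
n, bins, patches = plt.hist(values, bins=3)

print(n)             # counts per bin: 4, 6 and 2
print(bins)          # bin edges: 0, 2, 4 and 6
print(len(patches))  # one rectangle per bar: 3
```

With 3 bins over the data range 0 to 6, the edges fall at 0, 2, 4, and 6, so the three bins hold 4, 6, and 2 values.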
- plt.hist2d(x, y, bins = (nx, ny), range = [[xmin, xmax], [ymin, ymax]])
  - make a 2D histogram plot composed of rectangular bins.
  - x, y –> input values –> 2 vectors of the same length.
    - x = values along the horizontal axis
    - y = values along the vertical axis
  - bins = (nx, ny) –> optional. default: 10
    - nx = the number of bins to use in the horizontal direction
    - ny = the number of bins to use in the vertical direction.
- Example:
    # Generate a 2-D histogram
    plt.hist2d(hp, mpg, bins=(20, 20), range=[[40, 235], [8, 48]])

    # Add a color bar to the histogram
    plt.colorbar()

    # Add labels, title, and display the plot
    plt.xlabel('Horse power [hp]')
    plt.ylabel('Miles per gallon [mpg]')
    plt.title('hist2d() plot')
    plt.show()
- plt.pcolor(data) –> create a pseudocolor plot with a non-regular rectangular grid.
- plt.scatter(x, y, s = None, c = None, alpha = 1) –> create scatter plot
  Notes:
  - s = the size of the points –> a single number, a vector, or the result of a math operation on a vector.
    Example 1: plt.scatter(gdp_cap, life_exp, s = pop)
    The sizes of the points in the scatter plot are based on pop.
    Example 2: plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2)
  - c = the color of the points –> a single color or a vector.
  - alpha = the opacity of the points (0 = fully transparent, 1 = fully opaque).
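A minimal sketch of a scatter plot that uses s, c, and alpha together. The gdp_cap, life_exp, and pop values below are invented for illustration and are not real statistics:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data, only for illustration
gdp_cap = [1000, 5000, 20000, 40000]
life_exp = [50, 62, 75, 81]
pop = [30, 120, 60, 10]

# A math operation on a vector gives one size per point
sizes = np.array(pop) * 2

plt.figure()
points = plt.scatter(gdp_cap, life_exp, s=sizes, c=life_exp, cmap='viridis', alpha=0.8)
plt.xscale('log')  # spread out the low GDP values
plt.colorbar()     # legend for the colors given by c
```

Because c is a vector of numbers here, each point is colored through the color map according to its life_exp value.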
- if it’s a dataframe –> access the plot directly from the dataframe with df.plot(kind = xxx) or df.plot.xxx().
- Format
- plotting all columns in the dataframe –> this is the master format: df.plot(x=None, y=None, kind='line', color=None, subplots=False, range=(a, b)).
- plotting certain column(s) in the dataframe –> df[column_list].plot(x=None, y=None, color=None, subplots=False).
Notes:
- x = the column label to use for the x-axis
- y = the column label to use for the y-axis
- kind = the graph type –> string.
  Each kind has its own specific arguments. The ones that you see above are the arguments for a line chart.
- ‘line’ : line plot (default)
- ‘bar’ : vertical bar plot
- ‘barh’ : horizontal bar plot
- ‘hist’ : histogram
df.plot(kind = 'hist', bins=10, density=False) or df.plot.hist(bins=10)
Notes:
- to plot a PDF (probability density function) histogram, use density = True (called normed in older matplotlib versions).
  - It does not mean that the sum of the bar heights is 1, but rather that the integral over the bars is 1.
- to plot a CDF (cumulative distribution function) histogram, use density=True and cumulative=True.
- ‘box’ : boxplot
df.plot(kind='box') or df.plot.box() or df[col].plot.box() or df[[col1, col2]].plot.box().
Notes:
- add the argument subplots=True to create subplots.
- ‘kde’ : Kernel Density Estimation plot
- ‘density’ : same as ‘kde’
- ‘area’ : area plot
- ‘pie’ : pie plot
- ‘scatter’ : scatter plot
df.plot(x='xdata', y='ydata', kind='scatter', s=None, c=None, subplots=False, **kwds).
Notes:
- x = the x data –> a column name, so it must be quoted
- y = the y data –> a column name, so it must be quoted
- s = the size of the dots
- c = the color of the dots. It can be a string like ‘red’, ‘green’, etc., or a column name whose values will be used to color the dots according to a colormap.
- ‘hexbin’ : hexbin plot
- color = the color of the geom in the plot. It’s a string such as ‘red’, etc.
- subplots = if it’s True, make separate subplots for each column.
- range = the range for the x axis.
- If we are only plotting a dataframe’s column, we still have to use df[column].plot() .
- Don’t use plt.plot(df.column) , even though it looks like a series that we want to plot.
- df.plot(kind='line') == df.plot.line() .
- Example:
    # Import pyplot
    import matplotlib.pyplot as plt

    # Extract influence['Change']: change
    change = influence['Change']

    # Make bar plot of change: ax
    change.plot(kind = 'bar')
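The dataframe plotting formats above can be sketched as follows. The dataframe, its column names (height, weight), and the random values are all assumptions made up for this illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical dataframe with two numeric columns
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'height': rng.normal(170, 10, 200),
    'weight': rng.normal(65, 8, 200),
})

# Histogram of one column, scaled as a probability density
plt.figure()
ax_hist = df['height'].plot(kind='hist', bins=10, density=True)

# One subplot per column, all in the same figure
axes_arr = df.plot(subplots=True)

# Scatter plot: x and y are given as quoted column names
ax_scatter = df.plot(x='height', y='weight', kind='scatter', c='red')
```

Each call returns the axes (or an array of axes when subplots=True), which is handy for further customization.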
- If there are multiple plots on distinct axes.
- using plt.axes().
- 3 options for args:
- None –> A new full window axes is added using subplot(111, **kwargs)
- tuple of floats rect = [left, bottom, width, height] –> A new axes is added with dimensions rect in normalized (0, 1) units.
- ax
- Example:
    # Create plot axes for the first line plot
    # define a rect
    plt.axes([0.05, 0.05, 0.425, 0.9])

    # Plot in blue the % of degrees awarded to women in the Physical Sciences
    plt.plot(year, physical_sciences, color='blue')

    # Create plot axes for the second line plot
    # define another rect
    plt.axes([0.525, 0.05, 0.425, 0.9])

    # Plot in red the % of degrees awarded to women in Computer Science
    plt.plot(year, computer_science, color='red')

    # Display the plot
    plt.show()
- using subplot.
- Format: plt.subplot(nrows, ncols, nth_subplot) .
- ordering: row-wise, from top left, indexed from 1.
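The row-wise, 1-based ordering of plt.subplot() can be checked with a small sketch. The 2×2 layout and the text labels are just illustrative choices:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

plt.figure()

# Indexes run from 1 to nrows * ncols, row by row from the top left
for i in range(1, 5):
    ax = plt.subplot(2, 2, i)
    ax.text(0.5, 0.5, f'subplot {i}', ha='center', va='center')

axes_list = plt.gcf().axes

# Positions are in figure coordinates, with (0, 0) at the bottom left
top_left = axes_list[0].get_position()
top_right = axes_list[1].get_position()
bottom_left = axes_list[2].get_position()
```

Subplot 1 ends up in the top left, subplot 2 to its right, and subplot 3 starts the second row.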
III. Customize/Decorate the Plot
Before learning about customizing the plot, it’s important to understand the anatomy of a figure in matplotlib.

- Anatomy of a Figure in matplotlib:
  - Figure:
    The whole figure. The figure keeps track of all the child Axes, a smattering of ‘special’ artists (titles, figure legends, etc.), and the canvas.
    The canvas is the object that actually does the drawing to get you your plot, but as the user it is more-or-less invisible to you.
    A figure can have any number of Axes, but to be useful should have at least one.
  - Axes:
    This is what you think of as ‘a plot’: it is the region of the image with the data space.
    A given figure can contain many Axes, but a given Axes object can only be in one Figure.
    The Axes contains two (or three in the case of 3D) Axis objects which take care of the data limits.
  - Axis:
    These are the number-line-like objects –> the real axis that we think of.
    They take care of setting the graph limits and generating the ticks (the marks on the axis) and ticklabels (strings labeling the ticks).
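The Figure / Axes / Axis relationship can be inspected directly. This is a minimal sketch with made-up data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# One Figure containing one Axes
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_title('An Axes inside a Figure')

print(type(fig).__name__)       # Figure
print(ax.figure is fig)         # True: an Axes belongs to exactly one Figure
print(fig.axes == [ax])         # True: the Figure keeps track of its child Axes
print(type(ax.xaxis).__name__)  # XAxis: one of the two Axis objects of this Axes
print(ax.get_xlim())            # the data limits along the x Axis
```

Note the spelling: Axes is the plot region, while Axis (ax.xaxis, ax.yaxis) is the number line with ticks and tick labels.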
- Decorating with plt.
- plt.annotate(s, xy, xytext, arrowprops) –> provide some context on the plot.
- Notes:
- s = the annotation text
- xy = the point to annotate
- xytext = the coordinate of the annotation text
- arrowprops –> optional = drawing an arrow for the annotation.
- Example:
    # Add a black arrow annotation
    plt.annotate('Maximum', xy = (yr_max, cs_max), xytext = (yr_max+5, cs_max+5), arrowprops = dict(facecolor = 'black'))
- plt.axis([xmin, xmax, ymin, ymax]) –> set the limits of the x and y axes.
  - plt.axis('off') –> turn off (hide) the axes.
- plt.colorbar() –> add a colorbar legend to a plot that involves colors.
- plt.grid(True) –> show grid lines on the plot.
- plt.legend(loc = 'the position') –> give a legend to the graph.
  - loc = the position of the legend on the graph. It’s a combination of a vertical (upper, center, lower) and a horizontal (left, center, right) position, for example 'upper left' or 'lower center'. 'best' lets matplotlib choose the position.
    Example:
        # Specify the label 'Computer Science'
        plt.plot(year, computer_science, color='red', label='Computer Science')

        # Specify the label 'Physical Science'
        plt.plot(year, physical_sciences, color='blue', label='Physical Science')

        # Add a legend at the position matplotlib considers best
        plt.legend(loc='best')
  - the legend entries are taken from the label argument. In the example above, the legend entries are ‘Computer Science’ and ‘Physical Science’ because those are the values passed to label.
- plt.style –> use pre-defined styles provided by matplotlib.
  - to check the available styles: plt.style.available.
    Example:
        In [3]: plt.style.available
        Out[3]:
        ['seaborn-dark',
         'ggplot',
         'seaborn-deep',
         'seaborn-muted',
         'seaborn-whitegrid',
         'seaborn-talk',
         'classic',
         'seaborn-poster',
         'seaborn-pastel',
         'seaborn-notebook',
         'seaborn-ticks',
         'seaborn-colorblind',
         'seaborn-paper',
         'seaborn-bright',
         'seaborn-white',
         'grayscale',
         'bmh',
         'seaborn-darkgrid',
         'fivethirtyeight',
         'seaborn-dark-palette',
         'dark_background']
  - to use the style: plt.style.use('style_name').
    For example:
    Plot with style ‘ggplot’:
    plt.style.use('ggplot')
    Plot with style ‘fivethirtyeight’:
    plt.style.use('fivethirtyeight')
- plt.text(x_coord, y_coord, 'the text').
  Example: plt.text(1550, 71, 'India')
- plt.tight_layout() –> improve the spacing between the subplots.
  - can be called without parameters
- plt.title("the plot title") .
- plt.twinx() –> create a twin Axes sharing the x-axis.
  - Creates a new Axes instance with an invisible x-axis and an independent y-axis positioned opposite to the original one (i.e. at right)
  - Example:
        plt.hist(pixels, bins=64, range=(0,256), density=True, color='red', alpha=0.4)
        plt.grid(False)

        # Use plt.twinx() to overlay the CDF on top of the histogram
        plt.twinx()

        # Display a cumulative histogram of the pixels
        plt.hist(pixels, bins=64, range=(0,256), density=True, cumulative=True, color='blue', alpha=0.4)
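A self-contained sketch of the same twinx() idea, using random numbers in place of real image pixels (the pixels array here is an assumption for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical pixel intensities in the range 0..255
rng = np.random.default_rng(1)
pixels = rng.integers(0, 256, size=5000)

plt.figure()

# Left y-axis: the histogram as a probability density
plt.hist(pixels, bins=64, range=(0, 256), density=True, color='red', alpha=0.4)
plt.grid(False)

# Right y-axis: a second Axes that shares the same x-axis
ax_cdf = plt.twinx()

# Cumulative histogram drawn on the twin Axes
n_cdf, edges, _ = ax_cdf.hist(pixels, bins=64, range=(0, 256), density=True,
                              cumulative=True, color='blue', alpha=0.4)
```

The two histograms have very different y-scales (the density is small, the cumulative curve reaches 1), which is why the independent right-hand y-axis is useful here.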
- plt.xscale('log') –> Put the x-axis on a logarithmic scale
- plt.xlabel('the xlabel') .
- plt.xlim([xmin, xmax]) –> set the x-axis range
  - can also work with a tuple –> plt.xlim((xmin, xmax)).
- plt.yscale('log') –> put the y-axis on a logarithmic scale
- plt.ylabel('the ylabel') .
- plt.ylim([ymin, ymax]) –> set the y-axis range.
  - can also work with a tuple –> plt.ylim((ymin, ymax)).
- plt.yticks(v1, v2, rotation=n) –> v1 is a vector of numbers giving the tick values, and v2 is an optional vector of tick labels. If v2 is not specified, then v1 will be used for the tick labels.
- plt.xticks(v1, v2, rotation=n) –> the same, for the x-axis.
  - Notes:
    - rotation = the angle, in degrees, by which to rotate the labels
  - Example:
        # Definition of tick_val and tick_lab
        tick_val = [1000,10000,100000]
        tick_lab = ['1k','10k','100k']

        # Adapt the ticks on the x-axis
        plt.xticks(tick_val, tick_lab)
- Decorating with axes.
- Decoration that we can do with axes:
  - ax.set_ylabel("% Change of Host Country Medal Count")
  - ax.set_title("Is there a Host Country Advantage?")
  - ax.set_xticklabels(editions['City'])
- how to get the axes out of a plot:
  assign the return value of the plotting call to a variable; that variable holds the axes.
  Example:
      ax = change.plot(kind = 'bar')

      # Customize the plot to improve readability
      ax.set_ylabel("% Change of Host Country Medal Count")
      ax.set_title("Is there a Host Country Advantage?")
      ax.set_xticklabels(editions['City'])
IV. Show the Plot
plt.show() .
Working with images in Matplotlib.
- Save the plot into an image
  plt.savefig('figname')
- loading/reading an image:
  - Syntax: plt.imread('image_file').
  - The loaded image is a NumPy array whose shape depends on the image type:
    - (M, N) for grayscale images.
    - (M, N, 3) for RGB images.
      - M×N is the size of the image in pixels (rows × columns).
      - The third dimension holds the color channels (RGB).
      - We can extract the color channels as follows:
            # Extract 2-D arrays of the RGB channels: red, green, blue
            red, green, blue = image[:,:,0], image[:,:,1], image[:,:,2]
    - (M, N, 4) for RGBA images.
      - RGBA = red, green, blue, alpha.
      - alpha defines the opacity
- showing the image
  - use both plt.imshow(img) and plt.show() –> imshow = image show –> display an image on a 2D regular raster.
  - format for plt.imshow(img) –> plt.imshow(img, extent = (xmin, xmax, ymin, ymax), aspect = xxx).
    - aspect = controls the aspect ratio of the axes –> ‘equal’, ‘auto’, or a float.
      - ‘equal’: ensures an aspect ratio of 1. Pixels will be square.
      - ‘auto’: the axes is kept fixed and the aspect is adjusted so that the data fits in the axes.
      - float, such as 0.5, 2, etc.
- Modifying the image’s color.
  Modifying the image’s color can be done by following these steps:
  - reduce img from a 3D NumPy array to a 2D NumPy array by summing over axis = 2 –> img.sum(axis=2).
    - remember that axis = 0 is the rows and axis = 1 is the columns, so axis = 2 is the ‘color channels’ dimension.
          # Load the image into an array: img
          img = plt.imread('480px-Astronaut-EVA.jpg')

          # 'Collapse' the 3D numpy array into a 2D numpy array by summing the red, green and blue channels,
          # so the idea is to remove the original color of the image and then give it a new color map.
          intensity = img.sum(axis=2)
    - summing over the color dimension removes the RGB information from the image, leaving one intensity value per pixel.
  - Now, we can apply a color map to this ‘naked’ image. Let’s say we want to use the color map ‘gray’:
        plt.imshow(intensity, cmap = 'gray')
        plt.show()
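Since the image file may not be at hand, here is a sketch of the same steps on a tiny synthetic RGB array. The 4×6 image and its channel values are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 4x6 RGB image: shape (M, N, 3)
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[:, :, 0] = np.arange(6) * 40  # red grows from left to right: 0, 40, ..., 200
image[:, :, 1] = 100                # constant green
image[:, :, 2] = 50                 # constant blue

# Each color channel is a 2-D array of shape (M, N)
red, green, blue = image[:, :, 0], image[:, :, 1], image[:, :, 2]

# Summing over axis=2 collapses the color channels into one intensity per pixel
intensity = image.sum(axis=2)

plt.figure()
plt.imshow(intensity, cmap='gray')  # apply a new color map to the 2-D array
```

The leftmost pixels have intensity 0 + 100 + 50 = 150 and the rightmost 200 + 100 + 50 = 350, so the gray image gets brighter from left to right.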
- Rescaling image entities (pixels)
- just do a math operation (multiplication) on the image numpy array.
- Example:
    # Load the image into an array: image
    image = plt.imread('640px-Unequalized_Hawkes_Bay_NZ.jpg')

    # Extract minimum and maximum values from the image: pmin, pmax
    pmin, pmax = image.min(), image.max()
    print("The smallest & largest pixel intensities are %d & %d." % (pmin, pmax))

    # Rescale the pixels: rescaled_image
    rescaled_image = 256*(image - pmin) / (pmax - pmin)
    print("The rescaled smallest & largest pixel intensities are %.1f & %.1f." %
          (rescaled_image.min(), rescaled_image.max()))
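The rescaling formula can be checked on a tiny array without loading a file. The pixel values below are invented for illustration:

```python
import numpy as np

# Hypothetical low-contrast image: intensities only span 50..200
image = np.array([[50.0, 100.0],
                  [150.0, 200.0]])

pmin, pmax = image.min(), image.max()

# Shift so the minimum becomes 0, then stretch so the maximum becomes 256
rescaled_image = 256 * (image - pmin) / (pmax - pmin)

print(rescaled_image.min(), rescaled_image.max())  # 0.0 256.0
```

With the factor 256 the brightest pixel maps to exactly 256; if the result has to fit into 8-bit values (0 to 255), a factor of 255 would be the safer choice.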
Erasing/Clearing the Previous Created Plots
This is especially useful when you modify a plot and don’t want the previous plot still to get in the way in the new plot.
- plt.cla() –> clear the current axes
- plt.clf() –> clear the current figure
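A small sketch of the difference between the two calls, using a throwaway line plot:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure()
plt.plot([1, 2, 3])
n_lines_before = len(plt.gca().lines)  # 1 line on the current axes

plt.cla()                              # clears the axes content, the axes itself stays
n_lines_after_cla = len(plt.gca().lines)

plt.clf()                              # clears the whole figure, including its axes
n_axes_after_clf = len(fig.axes)

print(n_lines_before, n_lines_after_cla, n_axes_after_clf)  # 1 0 0
```

So plt.cla() is enough when only the drawing should be reset, while plt.clf() also removes the axes from the figure.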
About Subplots
Subplots are multiple plots created in one figure.
Let’s say that we have a dataframe consisting of several columns and we want to plot all the columns as line graphs.
If we don’t use subplots, all lines are plotted on the same axes with the same units.
If we use subplots, each line is plotted on its own axes with its own units, but still in the same figure.
Various ways of creating a subplot:
- define it as the argument subplots=True in the plot function.
  Example:
  df.plot(subplots=True)
- Use plt.subplots()
  Example:
  fig, axes = plt.subplots(nrows=2, ncols=1) –> this line creates a figure and 2 axes arranged vertically (2 rows and 1 column). Axes are the containers for the subplots. The axes are returned as a NumPy array.
  Output:
      In [2]: fig
      Out[2]: <Figure size 640x480 with 2 Axes>

      In [3]: axes
      Out[3]:
      array([<matplotlib.axes._subplots.AxesSubplot object at 0x7fd35d52ef28>,
             <matplotlib.axes._subplots.AxesSubplot object at 0x7fd35d03d278>],
            dtype=object)
  To fill the axes with subplots:
  - access the axes by their indexes.
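Filling the axes by index could look like the following sketch. The data and the titles are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, only for illustration
x = [1, 2, 3, 4]
y_top = [1, 4, 9, 16]
y_bottom = [4, 3, 2, 1]

# 2 rows, 1 column: axes is a NumPy array with 2 elements
fig, axes = plt.subplots(nrows=2, ncols=1)

# Index 0 is the top subplot, index 1 the bottom one
axes[0].plot(x, y_top, color='blue')
axes[0].set_title('Top subplot')

axes[1].plot(x, y_bottom, color='red')
axes[1].set_title('Bottom subplot')

# Keep titles and tick labels from overlapping
fig.tight_layout()
```

With more than one row and more than one column, axes becomes a 2-D array and is indexed as axes[row, col].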