Python Visualization with Matplotlib

Matplotlib

Matplotlib is one of several Python libraries for creating visualizations. Other options include:

  1. seaborn
  2. bokeh
  3. Altair –> a declarative statistical visualization library for Python, built on top of the Vega-Lite visualization grammar.

Seaborn and Bokeh will be discussed in different posts. Enjoy!

Steps in creating the plot using matplotlib

I. Import the Library

import matplotlib.pyplot as plt

II. Create the Plot

  1. Learn the differences between plt.xxx , df.plot(kind=xxx) ,  plt.axes() , and plt.subplot(...) .
  2. if it’s a series  –> accessing from the library ( plt. )
    1. plt.plot(x, y, color = 'colorx', label = 'labelx')   –> create line plot.
      plt.plot(x, y, color, marker, linestyle) –> create line plot with dots to represent the data points.
      Notes:

      1. x, y = data
      2. color = the color used for the line –> ‘red’ (‘r’), ‘green’ (‘g’), ‘blue’ (‘b’), etc.
        character color
        'b' blue
        'g' green
        'r' red
        'c' cyan
        'm' magenta
        'y' yellow
        'k' black
        'w' white
      3. marker = the symbol to be used to represent the data points  –> .  ,  o  D d  *  .
        character description
        '.' point marker
        ',' pixel marker
        'o' circle marker
        'v' triangle_down marker
        '^' triangle_up marker
        '<' triangle_left marker
        '>' triangle_right marker
        '1' tri_down marker
        '2' tri_up marker
        '3' tri_left marker
        '4' tri_right marker
        's' square marker
        'p' pentagon marker
        '*' star marker
        'h' hexagon1 marker
        'H' hexagon2 marker
        '+' plus marker
        'x' x marker
        'D' diamond marker
        'd' thin_diamond marker
        '|' vline marker
        '_' hline marker
      4. linestyle = the style of the line.
        character description
        '-' solid line style
        '--' dashed line style
        '-.' dash-dot line style
        ':' dotted line style
      5. label = the string that will be used for the legend.
      6. Calling plt.plot() several times draws several lines on the same axes.
    2. plt.contour(x, y, z, n, cmap='xxx')    and plt.contourf(x, y, z, n, cmap='xxx') .
      1. Create a contour map and a filled contour map of z, respectively, using the meshgrid arrays x and y as the axes.
      2. n = the number of contour levels to draw. Optional; if omitted, matplotlib chooses the levels automatically.
      3. cmap = color map –> maps the data values to corresponding colors.
        Note: it is useful to call plt.colorbar() whenever a plot encodes values as colors, so that the plot gets a color-scale legend.

        1. Some color map names (they are case-sensitive):
          1. unique  –> jet, coolwarm, magma, viridis.
          2. season –> summer, autumn, winter, spring.
          3. overall colors –> Greens, Reds, Blues, Purples.
    3. plt.hexbin(x, y, gridsize = (nx, ny), extent = []) .
      1. Make a 2D histogram plot composed of hexagonal bins.
      2. Notes:
        1. gridsize = the number of hexagons in the x and y directions, (nx, ny).
        2. extent = the area covered by the bins.
          Format: extent=[x_start, x_end, y_start, y_end]
      3. Example:
        plt.hexbin(hp, mpg, gridsize=(15, 12), extent=[40, 235, 8, 48])
    4. plt.hist(data, bins = 10, range=None, color='colorname', density=False, histtype='bar')  –> create a histogram showing the distribution of the data. It can also be used for a population pyramid by drawing the bars horizontally (orientation='horizontal').
      1. Notes:
        1. range = the lower and upper range of the bins  –> (xmin, xmax)
        2. density = optional, boolean  –> if True, each bar shows the value of the probability density at that bin, normalized so that the integral over the range is 1. (Older matplotlib versions called this argument normed; it has since been removed in favor of density.)
        3. histtype = the type of histogram to draw.  Options: {‘bar’, ‘barstacked’, ‘step’, ‘stepfilled’}
          1. ‘step’ draws an unfilled outline, which looks like a line plot.
          2. ‘stepfilled’ draws a filled outline, which looks like an area plot.
      2. Besides plotting a histogram, plt.hist() also returns a tuple (n, bins, patches).
        1. n = array or list of arrays  –> the values (counts or densities) of the histogram bins.
        2. bins = array  –> the edges of the bins.
        3. patches = the drawn shapes (for example the bar rectangles) that make up the histogram.
    5. plt.hist2d(x, y, bins = (nx, ny), range = [[xmin, xmax], [ymin, ymax]])
      1. Make a 2D histogram plot composed of rectangular bins.
      2. x, y –> input values  –> 2 vectors of the same length.
        1. x = values along the horizontal axis
        2. y = values along the vertical axis
      3. bins = (nx, ny)  –> optional. default: 10
        1. nx  = the number of bins to use in the horizontal direction
        2. ny  = the number of bins to use in the vertical direction.
    6. plt.pcolor(data)  –> Create a pseudocolor plot with a non-regular rectangular grid.
    7. plt.scatter(x, y, s =None, c = None, alpha = 1)  –> create scatter plot
      Notes:

      1. s = the size of the points  –>  a single number, a vector, or the result of arithmetic on a vector.
        Example 1:  plt.scatter(gdp_cap, life_exp, s = pop)
        The size of each point in the scatter plot is taken from pop.
        Example 2: plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2)
      2. c = the color of the points –> a single color or a vector.
      3. alpha = the opacity of the geom.
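The plt. calls listed above can be tried together in a minimal sketch like the one below. The data is made up, and the Agg backend is an assumption chosen only so the script runs without a display; drop that line and call plt.show() to see the figures interactively.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up data, for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
values = rng.normal(size=1000)

# Line plot: two plt.plot() calls draw two lines on the same axes.
plt.figure()
line_sin, = plt.plot(x, np.sin(x), color='r', marker='o', linestyle='--', label='sin')
line_cos, = plt.plot(x, np.cos(x), color='b', linestyle='-', label='cos')
line_axes = plt.gca()

# Filled contour map of zz over a meshgrid, with a colorbar as the color legend.
plt.figure()
u = np.linspace(-2, 2, 41)
xx, yy = np.meshgrid(u, u)
zz = np.exp(-(xx ** 2 + yy ** 2))
plt.contourf(xx, yy, zz, 20, cmap='viridis')
plt.colorbar()

# Histogram: keep the return values to inspect the bins afterwards.
plt.figure()
n, bins, patches = plt.hist(values, bins=10, range=(-4, 4), density=True)

# Scatter plot: point sizes from a vector, colors from a vector, half-transparent.
plt.figure()
sizes = 40 * np.abs(values[:50])
points = plt.scatter(x, np.sin(x), s=sizes, c=x, alpha=0.5)
```

With density=True, the bar heights in n multiplied by the bin widths add up to 1, which is what "the integral over the range is 1" means in practice.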
  3. if it’s a dataframe –> accessing directly from the dataframe ( df.plot(kind = xxx) )   or  df.plot.xxx() .

    1. Format
      1. plotting all columns in the dataframe  –> this is the master format   :     df.plot(x=None, y=None, kind='line', color=None, subplots=False, range=(a, b)).
      2. plotting certain column(s) in the dataframe   –> df[column_list].plot(x=None, y=None, color=None, subplots=False) .
        Notes:

        1. x = the column to use for the x axis
        2. y = the column(s) to use for the y axis
        3. kind = the graph type –> string.
          Each kind accepts its own set of extra arguments. The ones that you see above are the arguments for a line chart.

          1. ‘line’ : line plot (default)
          2. ‘bar’ : vertical bar plot
          3. ‘barh’ : horizontal bar plot
          4. ‘hist’ : histogram
            df.plot(kind = 'hist', bins=10, density=False)  or  df.plot.hist(bins=10)
            Notes:

            1. to plot a PDF (probability density function) histogram, use density=True (called normed in older versions).
              1. This does not mean that the bar heights sum to 1, but rather that the integral over the bars (height times width) is 1.
            2. to plot a CDF (cumulative distribution function) histogram, use density=True and cumulative=True.
          5. ‘box’ : boxplot
            df.plot(kind='box')  or  df.plot.box()   or  df[col].plot.box()   or  df[[col1, col2]].plot.box()  .
            Notes:

            1. add argument subplots=True  for creating subplots.
          6. ‘kde’ : Kernel Density Estimation plot
          7. ‘density’ : same as ‘kde’
          8. ‘area’ : area plot
          9. ‘pie’ : pie plot
          10. ‘scatter’ : scatter plot
            df.plot(x='xdata', y='ydata', kind='scatter', s=None, c=None, subplots=False, **kwds)  .
            Notes:

            1. x = the column for the x data  –> a column name, given as a string
            2. y = the column for the y data  –> a column name, given as a string
            3. s = the size of the dots
            4. c = the color of the dots. It can be a color string like ‘red’, ‘green’, etc., or
              a column name whose values will be used to color the dots according to a colormap.
          11. ‘hexbin’ : hexbin plot
        4. color = the color of the lines, bars, or markers in the plot. It’s a string such as ‘red’, etc.
        5. subplots = if it’s True, make separate subplots for each column.
        6. range = the range of values to bin. It is meant for kind='hist'; for other kinds, set the axis limits with xlim and ylim instead.
        7. To plot a single column of a dataframe, select it first: df[column].plot() .
        8. Prefer df[column].plot() over plt.plot(df.column) ; the latter also draws the line but bypasses the pandas plotting conveniences.
        9. df.plot(kind='line')  ==  df.plot.line() .
    2. Example:
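As a rough illustration of the dataframe route, the sketch below builds a small made-up dataframe and plots it in several of the kinds listed above. The column names (hp, mpg, weight) and their values are hypothetical, and the Agg backend is an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: three numeric columns of random values.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    'hp': rng.uniform(40, 235, size=100),
    'mpg': rng.uniform(8, 48, size=100),
    'weight': rng.uniform(1500, 5000, size=100),
})

# All columns as lines on one set of axes (kind='line' is the default).
ax_line = df.plot(kind='line')

# Selected columns, one subplot per column.
axes_sub = df[['hp', 'mpg']].plot(subplots=True)

# Density histogram of a single column, drawn on a fresh figure.
plt.figure()
ax_hist = df['mpg'].plot(kind='hist', bins=10, density=True)

# Box plot of two columns: note the double brackets for a list of columns.
ax_box = df[['hp', 'mpg']].plot.box()

# Scatter plot: x and y are column names; c names the column used for coloring.
ax_scatter = df.plot(x='hp', y='mpg', kind='scatter', s=20,
                     c='weight', colormap='viridis')
```

Each call returns the axes it drew on (or an array of axes when subplots=True), which is convenient for further decoration.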
  4. If there are multiple plots on distinct axes.
    1. using plt.axes() .
      1. 3 options for args:
        1. None  –> A new full window axes is added using subplot(111, **kwargs)
        2. tuple of floats rect = [left, bottom, width, height]  –> A new axes is added with dimensions rect in normalized (0, 1) units.
        3. ax
    2. using subplot.
      1. Format: plt.subplot(nrows, ncols, nth_subplot)  .
      2. ordering: row-wise, from top left, indexed from 1.
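To make the difference between plt.axes() and plt.subplot() concrete, here is a minimal sketch with made-up data. The rectangle values are arbitrary choices, and the Agg backend is an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)  # made-up data

# plt.axes(): place each axes by hand with [left, bottom, width, height],
# given in normalized (0, 1) figure units.
fig_a = plt.figure()
main_ax = plt.axes([0.1, 0.1, 0.8, 0.8])
plt.plot(x, np.sin(x), color='b')
inset_ax = plt.axes([0.6, 0.6, 0.25, 0.25])  # small axes on top of the main one
plt.plot(x, np.cos(x), color='r')

# plt.subplot(): let matplotlib lay out a grid; the index is row-wise from 1.
fig_b = plt.figure()
top_ax = plt.subplot(2, 1, 1)     # 2 rows, 1 column, first subplot
plt.plot(x, np.sin(x), color='b')
bottom_ax = plt.subplot(2, 1, 2)  # 2 rows, 1 column, second subplot
plt.plot(x, np.cos(x), color='r')
```

plt.axes() suits insets and irregular layouts, while plt.subplot() is less work when the plots fit a regular grid.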

III. Customize/Decorate the Plot

Before customizing the plot, it is important to understand the anatomy of a figure in matplotlib.

Anatomy of Figure in matplotlib. Figure is taken from matplotlib.org.
  1. Anatomy of Figure in matplotlib:
    1. Figure:
      The whole figure. The figure keeps track of all the child Axes, a smattering of ‘special’ artists (titles, figure legends, etc.), and the canvas.
      The canvas is the object that actually does the drawing to get you your plot, but as the user it is more-or-less invisible to you.
      A figure can have any number of Axes, but to be useful it should have at least one.
    2. Axes:
      This is what you think of as ‘a plot’: it is the region of the image with the data space.
      A given figure can contain many Axes, but a given Axes object can only be in one Figure.
      The Axes contains two (or three in the case of 3D) Axis objects, which take care of the data limits.
    3. Axis:
      These are the number-line-like objects  –> the real axis that we think of.
      They take care of setting the graph limits and generating the ticks (the marks on the axis) and ticklabels (strings labeling the ticks).
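The Figure, Axes, and Axis objects described above can be inspected directly. The sketch below is only an illustration with made-up data; it assumes the Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# One Figure holding one Axes.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# The Figure keeps track of its child Axes and of the canvas that draws it.
child_axes = fig.axes
canvas = fig.canvas

# The Axes owns two Axis objects, which handle limits, ticks, and tick labels.
x_axis = ax.xaxis
y_axis = ax.yaxis
x_limits = ax.get_xlim()
tick_positions = x_axis.get_ticklocs()

print(type(fig).__name__, type(ax).__name__, type(x_axis).__name__)
```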
  2. Decorating with plt.
    1. plt.annotate(text, xy, xytext, arrowprops)  –> provide some context on the plot.
      1. Notes:
        1. text = the annotation text (named s in older matplotlib versions)
        2. xy = the point to annotate
        3. xytext = the coordinates of the annotation text
        4. arrowprops –> optional = a dict of properties for drawing an arrow from the text to the point.
    2. plt.axis([xmin, xmax, ymin, ymax])  –> set the x and y axis limits.
      1. plt.axis('off')  –> turn off (hide) the axes.
    3. plt.colorbar()  –> add a colorbar legend to a plot that involves colors.
    4. plt.grid(True)  –> draw grid lines on the plot.
    5. plt.legend(loc = 'the position')  –> give a legend to the graph.
      1. loc = the position of the legend on the graph. It’s a combination of vertical (upper, center, lower) and horizontal (left, center, right) position, for example 'upper left' or 'lower right'.
      2. The legend entries come from the label argument of each plotting call. In the original example, the entries are ‘Computer Science’ and ‘Physical Science’ because those were the values passed as label.
    6. plt.style  –> use pre-defined styles provided by matplotlib.
      1. to check the available styles:  plt.style.available .
      2. to use a style: plt.style.use('style_name') .
        For example:
        Plot with style ‘ggplot’:
        plt.style.use('ggplot')
        Plot with style ‘fivethirtyeight’:
        plt.style.use('fivethirtyeight')
    7. plt.text(x_coord, y_coord, 'the text') .
      Example:   plt.text(1550, 71, 'India')
    8. plt.tight_layout()   –> improve the spacing between the subplots. It can be called without arguments.
    9. plt.title("the plot title") .
    10. plt.twinx()  –> create a twin Axes sharing the x-axis.
      1. Creates a new Axes instance with an invisible x-axis and an independent y-axis positioned opposite to the original one (i.e. at right).
    11. plt.xscale('log')  –> Put the x-axis on a logarithmic scale
    12. plt.xlabel('the xlabel')  .
    13. plt.xlim([xmin, xmax])  –> set the x-axis range.
      1. also works with a tuple –> plt.xlim((xmin, xmax)) .
    14. plt.yscale('log')  –> put the y-axis on a logarithmic scale
    15. plt.ylabel('the ylabel') .
    16. plt.ylim([ymin, ymax])  –> set the y-axis range.
      1. can also work with tuple –> plt.ylim((ymin, ymax)) .
    17. plt.yticks(v1, v2, rotation=n)   –> v1 is a vector of tick values and v2 is a vector of tick labels. v2 is optional; if it is not specified, v1 is used for the tick labels.
    18. plt.xticks(v1, v2, rotation=n)  –> the same arguments, applied to the x-axis.
      1. Notes:
        1. rotation = the angle, in degrees, by which to rotate the tick labels
      2. Example:
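Several of the plt. decorations above can be combined on one plot, as in the sketch below. The numbers are made up purely for illustration (only the two label strings echo the legend example mentioned earlier), and the Agg backend is an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up numbers, for illustration only.
years = [1970, 1980, 1990, 2000, 2010]
computer_science = [13.6, 30.2, 29.4, 28.1, 17.6]
physical_science = [13.8, 24.6, 32.1, 41.0, 40.7]

plt.figure()
plt.plot(years, computer_science, color='r', label='Computer Science')
plt.plot(years, physical_science, color='b', label='Physical Science')

# Titles and axis labels.
plt.title('Made-up shares per field')
plt.xlabel('Year')
plt.ylabel('Share (%)')

# Tick values, tick labels, and label rotation; then the axis ranges.
plt.xticks([1970, 1990, 2010], ['1970', '1990', '2010'], rotation=60)
plt.xlim((1965, 2015))
plt.ylim((0, 50))

# Grid, legend (entries come from the label arguments), annotation, free text.
plt.grid(True)
plt.legend(loc='upper left')
plt.annotate('peak', xy=(1980, 30.2), xytext=(1990, 45),
             arrowprops=dict(facecolor='black'))
plt.text(2000, 5, 'illustrative data')

plt.tight_layout()
ax = plt.gca()
```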
  3. Decorating with axes.
    1. Decoration that we can do with axes:
      1. ax.set_ylabel("% Change of Host Country Medal Count")
      2. ax.set_title("Is there a Host Country Advantage?")
      3. ax.set_xticklabels(editions['City'])
    2. How to get the axes out of a plot:
      assign the return value of a pandas plotting call (for example df.plot()) to a variable; that variable holds the axes.
      Example:
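A minimal sketch of the axes-based approach follows. The editions dataframe here is a made-up stand-in with hypothetical City and Change columns, not the original data, and the Agg backend is an assumption so that the script runs without a display. Tick positions are set explicitly before the labels so that each label lines up with one data point.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the 'editions' dataframe; values are made up.
editions = pd.DataFrame({
    'City': ['City A', 'City B', 'City C'],
    'Change': [10.0, -5.0, 20.0],
})

# The pandas plotting call returns the axes it drew on.
ax = editions.plot(y='Change', kind='line', marker='o')

# Decorate through the axes object instead of through plt.
ax.set_ylabel("% Change of Host Country Medal Count")
ax.set_title("Is there a Host Country Advantage?")
ax.set_xticks(range(len(editions)))      # one tick per row
ax.set_xticklabels(editions['City'])     # label each tick with the city name
```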

IV. Show the Plot

plt.show() .

Working with Images in Matplotlib

  1. Save the plot into an image
    plt.savefig('figname')
  2.  loading/reading an image:
    1. Syntax:  plt.imread('image_file') .
    2. The resulting image loaded is a NumPy array with different dimensions:
      1. (M, N) for grayscale images.
      2. (M, N, 3) for RGB images.
        1. M×N is the dimensions of the image.
        2. The third dimensions are referred to as color channels (RGB).
        3. We can extract the color channels as follows:
      3. (M, N, 4) for RGBA images.
        1. RGBA = red, green, blue, alpha.
        2. alpha defines the opacity
  3. showing the image
    1. use both plt.imshow(img)  and plt.show() –> imshow = image show –> display an image on a 2D regular raster.
    2. format for plt.imshow(img)  –> plt.imshow(img, extent = (xmin, xmax, ymin, ymax), aspect = xxx) .
      1. aspect = controls the aspect ratio of the axes.  –> ‘equal’, ‘auto’, float.
        1. ‘equal’: ensures an aspect ratio of 1. Pixels will be square.
        2. ‘auto’: the axes is kept fixed and the aspect is adjusted so that the data fit in the axes. Pixels will generally not be square.
        3. float, such as 0.5, 2, etc
  4. Modifying the image’s color.
    Starting from an RGB image loaded with plt.imread(), modifying its color can be done in these steps:

    1. Reduce img from a 3D numpy array to a 2D numpy array by summing over axis = 2  –> img.sum(axis=2) .
    2. Remember that axis = 0 is the rows and axis = 1 is the columns, so axis = 2 is the ‘color channels’ dimension.
    3. Summing over the color dimension removes the RGB information, so each pixel is left with a single intensity value.
    4. Now we can apply a color map to this ‘naked’ image. For example, to use the color map ‘gray’, pass cmap='gray' to plt.imshow() .
  5. Rescaling image intensities (pixels)
    1. just do a math operation (multiplication) on the image numpy array.
    2. Example:
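The image steps above can be sketched without an image file by using a small random array as a stand-in for the result of plt.imread(). That stand-in, the array size, and the scaling factor are all assumptions for illustration; the Agg backend is assumed so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for img = plt.imread('image_file'): an (M, N, 3) RGB array in [0, 1).
rng = np.random.default_rng(2)
img = rng.random((4, 6, 3))

# Extract the color channels along the third dimension.
red, green, blue = img[:, :, 0], img[:, :, 1], img[:, :, 2]

# Collapse the color dimension: (M, N, 3) -> (M, N).
intensity = img.sum(axis=2)

# Show the 2D array through a color map, with a colorbar as the legend.
plt.figure()
plt.imshow(intensity, cmap='gray', aspect='equal')
plt.colorbar()

# Rescale the pixel values with a plain multiplication on the array.
scaled = 0.5 * img
```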

Erasing/Clearing the Previous Created Plots

This is especially useful when you modify a plot and don’t want the previous plot to get in the way of the new one.

  1. plt.cla()   –> clear the axes
  2. plt.clf()    –> clear the figures
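The effect of the two calls can be checked by counting what is left afterwards. This is a small sketch under the assumption of the Agg backend, so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig = plt.figure()
plt.plot([1, 2, 3])
ax = plt.gca()
lines_before = len(ax.lines)

# plt.cla() empties the current axes but keeps the axes in the figure.
plt.cla()
lines_after_cla = len(ax.lines)
axes_after_cla = len(fig.axes)

# plt.clf() empties the current figure, removing its axes as well.
plt.clf()
axes_after_clf = len(fig.axes)

print(lines_before, lines_after_cla, axes_after_cla, axes_after_clf)
```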

About Subplots

Creating subplots means drawing multiple plots in one figure.
Let’s say that we have a dataframe consisting of several columns and we want to plot all the columns as line graphs.
If we don’t use subplots, all lines are plotted on the same axes, sharing the same scale.
If we use subplots, each line is plotted on its own axes with its own scale, but still in the same figure.

Various ways of creating a subplot:

  1. define it as argument subplots=True  in the plot function.
    Example:
    df.plot(subplots=True)
  2. Use plt.subplots()
    Example:
    fig, axes = plt.subplots(nrows=2, ncols=1)   –> this line creates a figure and 2 axes arranged vertically (2 rows and 1 column).  Axes are the containers for the subplots. The axes are returned as a NumPy array.

    To fill the axes with subplots:

    1. access the axes by its indexes.
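Filling the axes by index might look like the sketch below. The data is made up, and the Agg backend is an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)  # made-up data

# One figure, two axes stacked vertically; axes is a NumPy array of length 2.
fig, axes = plt.subplots(nrows=2, ncols=1)

# Index into the array to pick the axes to draw on.
axes[0].plot(x, np.sin(x), color='r')
axes[0].set_title('sin')
axes[1].plot(x, np.cos(x), color='b')
axes[1].set_title('cos')

plt.tight_layout()  # keep the titles and tick labels from overlapping
```

With more than one row and more than one column, axes becomes a 2D array and is indexed as axes[row, col].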
