Plotly Python
By: Yijiang Zhao and Cynthia Chen
Graphs in Python
Welcome to Graphs in Python! We will be using Plotly
. You can check out our Deepnote, which has the code and example graphs on it, which can be good starter code for your graphs.
Plotly
is a library which allows you to easily make interactive graphs (and these interactive graphs can be added to your articles on our website)!
We'll be using plotly.graph_objects
because it is highly customizable, but alternatives include plotly.express
which returns figures of the same type as plotly.graph_objects
and streamlines much of the process of making graphs, but is less customizable.
Plotly Graph Object Figures
In plotly.graph_objects
, you will often begin with making a figure.
(or however you would like to call your figure). This just a blank canvas to which you can add your traces. Traces are essentially the name plotly
gives to a collection of data that you are planning to plot or visualize.
After you've made your figure fig
, you can add your traces. (You can also define your traces, layout, etc. when creating your figure, but I think it is nice style to separate the initializing, adding traces, and formatting).
If, for example, I had 2-dimensional data that suited a scatterplot, I could add a trace in the form of a scatter plot.
Figure Layout
You can edit features of a graph's layout, e.g. the x-axis, title, etc. This is great for making your graph look better and more readable. The function to do so is go.Figure.update_layout()
. You can define layout when you initialize the figure, but I think it is cleaner to update later.
Layout has many parameters, which you can read about at the Plotly Layout documentation. However, for most things, you should just know that there are parameters:
Parameter | Type |
---|---|
title | String |
xaxis | Dictionary or Layout.XAxis type |
yaxis | Dictionary or Layout.YAxis type |
legend | Dictionary or Layout.Legend type |
template | Dictionary or Layout.Template type |
As you can see, it's quite complicated. However, for template, you should always set it to the HODP template, which takes cares of a lot of font, size, and other formatting.
An important parameter you should know is how to set the title, labels, and legend. Below, we have updated a layout which has title "Total Votes", X-axis label "Year", Y-axis label "Total Votes", and legend title "Political Party".
You can see that all of the Layout.XAxis
, Layout.YAxis
, and Layout.Legend
types have a parameter called title
which in turn has a parameter called text
. By changing the text
of the title
of each of the layout's parameters, you can set the x-axis label, y-axis label, and legend title, respectively.
note
Many features of a figure's layout, like the axis and legend, take other plotly
classes as arguments. Many of these classes share different features - e.g. title
is a class in all the axis and legend classes. This is also the case for the different traces or types of graphs.
Because these things are standardized, it's actually not terrible to read through the documentation for many of these classes and understand what features you can customize.
Template
This is the HODP template. You can just copy-paste this. If you're interested in the specific details, check out the Plotly documentation on Layouts.
Figure Traces
Scatterplots / Line Plots
Scatterplots are a great way to graph 2-dimensional data (e.g. you have a predictor and a response, independent and dependent, etc). It essentially plots a point for each row of data (x,y)
on a graph, so that you may see how all data points look like relative to others.
They're frequently used to accompany different models as a way to visually show why a given model makes sense.
For example, a linear regression model is often overlayed on top of a scatterplot of the data so that viewers can visually see how well (or poorly) the model fits the data. Other uses include using a color scale for different points on the graph to visually separate different categories of data.
Additionally, there are various other features you can change:
Parameter | Use | Input | Example |
---|---|---|---|
marker_color | Specifies what colors corresponds to this trace | Pass in the color of the markers / lines | go.Scatter(..., marker_color = '#C63F3F') |
name | Determines the name of that trace on the legend | Pass in a string | go.Scatter(..., name = 'one category of data') |
mode | Determines how the scatter plot points appear | Pass in a string, you can use either lines , markers | go.Scatter(..., mode = 'lines+markers') |
You can find more parameters at the Plotly Scatter documentation.
note
A lot of the parameters to different type of graphs, e.g. scatter or histogram, etc., are intuitive and have standard names. 2-dimensional data almost always has
Bubble Charts
Bubble charts are scatter plots in which a third dimension of the data is shown through the size of markers. Each marker/bubble can also have its own color which could represent a label/category or another value.
Histograms
Histograms are generally used when representing the distribution of numerical data among categories. Each category (also called a bin) has a count, which is represented by the height of the bar. where the data are binned and the count for each bin is represented. More generally, in plotly a histogram is an aggregated bar chart, with several possible aggregation functions (e.g. sum, average, count...). Also, the data to be binned can be numerical data but also categorical or date data.
More Histograms
Pie Charts
Pie charts are typically used for 1-dimensional categorical data. In plotly go, this means that you use go.Pie()
. It takes in many inputs, but the most important ones are values
, e.g. the numerical counts of each category, and labels
, the names of all categories.
For example, if I had a pie chart of the break down of HODP bootcampers by class year, then the labels would be an array of class years, e.g. [2021, 2022, ...]
, and the values would be the number of bootcampers in each class year, e.g. [10, 20, ...]
.
Additional helpful feature and effects include:
Parameter | Use | Input | Example |
---|---|---|---|
marker_colors | Specifies what colors correspond to what categories | Pass in a list of colors | go.Pie(..., marker_colors = ['#C63F3F', '#F4B436', '#83BFCC']) |
textinfo | Determines what information appears on the graph itself | Pass in a string, you can use either label , value , percent | go.Pie(..., textinfo = 'label+value') |
hoverinfo | Determines what information appears on hover | Pass in a string, you can use either label , value , percent | go.Pie(..., hoverinfo = 'label+percent') |
Bar Charts
Bar charts are great for comparing data across groups, e.g. looking at some variable Y
over time X
and comparing at each value of X
what Y
is for the different categories. In plotly go, you call go.Bar()
and include the x
and y
.
Each go.Bar()
adds one category, e.g. one trace, so to compare multiple groups of data on the same bar chart, you call go.Bar()
for each one.
Additional helpful feature and effects include:
Parameter | Use | Input | Example |
---|---|---|---|
name | Name of the category in the legend | Pass in a string | go.Bar(..., name = 'Category A') |
marker_color | A single color or list of colors for each bar | Pass in a color | go.Bar(..., marker_color = '#C63F3F') |
hovertext | Determines the hover text for each bar for each data | Pass in list of strings | go.Bar(..., hovertext = ['10 points', '12 points']) |
To change the barmode - e.g. whether it's stacked, grouped (side-by-side), etc. - update the figure layout using ``` fig.update_layout(barmode = 'group') ``` and the barmode is either `group`, `stack`, or `relative`.
Box Plots
Box plots are good for displaying key summary information, like the quantiles and outliers, and especially for comparing these statistics across different data. It allows viewers to easily see if there is a difference in mean or spread across groups.
In plotly go, you call go.Bar()
and pass in either an x
or a y
depending on whether you want the box plots to be horizontal or vertical.
Additional helpful feature and effects include:
Parameter | Use | Input | Example |
---|---|---|---|
name | Name of the group in the legend | Pass in a string | go.Box(..., name = 'Category A') |
marker_color | A color for filling the box | Pass in a color | go.Box(..., marker_color = '#C63F3F') |
line_color | A color for the lines | Pass in a color | go.Box(..., line_color = '#C63F3F') |
boxpoints | Controls what data points are included (in addition to box) | Either 'all', False, 'suspectedoutliers', 'outliers' | go.Box(..., boxpoints = 'outliers') |
Heat Maps
Heatmaps are used to show relationships between two variables, one plotted on each axis. By observing how cell colors change across each axis, you can observe if there are any patterns in value for one or both variables. Heatmaps are great for visualizing 2D data where each pixel has a value associated with it (for example, COVID-19 hotspots!)