This is the coding component of the Data Visualization with Plotly for Python workshop.
Please find the workshop slides here
Workshop Best Practices
If you have not installed plotly yet, run the following:
pip install plotly
Requirement already satisfied: plotly in ./opt/anaconda3/lib/python3.8/site-packages (5.3.1)
Requirement already satisfied: tenacity>=6.2.0 in ./opt/anaconda3/lib/python3.8/site-packages (from plotly) (8.0.1)
Requirement already satisfied: six in ./opt/anaconda3/lib/python3.8/site-packages (from plotly) (1.15.0)
Note: you may need to restart the kernel to use updated packages.
import plotly.express as px
import pandas as pd
from plotly.figure_factory import create_table
The data we are using today is the built-in demo data of the plotly.express module. The demo data is from Gapminder and records each country's life expectancy, population, and GDP per capita every five years from 1952 to 2007. As you may know, in a data science life cycle, up to 80% of the time may be devoted to cleaning and processing data. The demo data we are using today has already been cleaned and is ready for use. The main goal of today is to explore the different functionalities of the Plotly library, and this cleaned data will serve our purpose well.
# define our data
gapminder = px.data.gapminder()
Let's preview our data.
| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 | AFG | 4 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 | AFG | 4 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 | AFG | 4 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 | AFG | 4 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 | AFG | 4 |
table = create_table(gapminder.head(5)) # display the first 5 rows
data_china = gapminder.query("country == 'China'")
# Let's visualize population (variable name: "pop") using a bar chart!
# We will use px.bar(data, x=[x variable name], y=[y variable name])
bar = px.bar(data_china, x = "year", y = "pop")
Next, let's add some more functionality to our graph.
bar = px.bar(data_china, x = "year", y = "pop", color = "lifeExp", color_continuous_scale = "Earth")
bar = px.bar(data_china, x = "year", y = "pop", color = "lifeExp", color_continuous_scale = "Magenta")
# let's say we want to take a snapshot of the world in 2007
# and observe if the GDP influences the life expectancy
gapminder2007 = gapminder.query("year == 2007") # 2007 is an integer value, so it does not need to be enclosed in another pair of quotes
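The quoting rule for `query` is easier to see on a small example. The sketch below uses a tiny hand-made DataFrame (hypothetical values, not the workshop data) to contrast a string condition, an integer condition, and a combined one.

```python
import pandas as pd

# A tiny hand-made frame (hypothetical values) with one string and one integer column.
df = pd.DataFrame({
    "country": ["China", "China", "Japan"],
    "year": [2002, 2007, 2007],
})

# String values need their own quotes inside the query string.
china = df.query("country == 'China'")

# Integer values are written bare.
y2007 = df.query("year == 2007")

# Conditions can be combined with "and".
china_2007 = df.query("country == 'China' and year == 2007")

# Equivalent boolean-mask form of the combined query.
same = df[(df["country"] == "China") & (df["year"] == 2007)]
print(china_2007.equals(same))
```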
# make a basic scatter plot with px.scatter(data, x, y)
# life expectancy is "lifeExp", GDP per capita is "gdpPercap"
scatter = px.scatter(gapminder2007, x = "gdpPercap", y = "lifeExp")
# let's color the data points by continent using the "color" attribute
scatter = px.scatter(gapminder2007, x = "gdpPercap", y = "lifeExp", color = "continent", hover_data = ['country', 'year'])
# let's say we want to advance our scatter plot into a bubble chart
# and we want the size of the data points to change according to population size
# we would want to pass in the "size" attribute
scatter = px.scatter(gapminder2007, x = "gdpPercap", y = "lifeExp",
color = "continent", hover_data = ['country', 'year'],
size = "pop")
# what do you notice about the bubbles?
# size_max defaults to 20.
# Should we change something?
scatter = px.scatter(gapminder2007, x = "gdpPercap", y = "lifeExp",
color = "continent", hover_data = ['country', 'year'],
size = "pop", size_max = 60)
See the full documentation on scatter plots in Plotly Express, which covers size_max.
# we need to know which countries the bubbles represent
# so that we can get insights about outliers
# let's add in another attribute called "hover_name", which will be an overarching title for the hover label.
scatter = px.scatter(gapminder2007, x = "gdpPercap", y = "lifeExp",
color = "continent", hover_data = ['year'],
size = "pop", size_max = 60,
hover_name = 'country')
Facet charts can be used to split a multi-trace data visualization into a matrix of data panels. In Plotly Express, if you want to facet a larger plot into multiple smaller plots, you simply pass the variable you want to facet by into an attribute called "facet_col".
# copy the previous code, add in "facet_col" attribute
# isolate by continent
# setting legends
# log x-axis: transformation of graphs
scatter = px.scatter(gapminder2007, x = "gdpPercap", y = "lifeExp",
color = "continent", hover_data = ['year'],
size = "pop", size_max = 60,
hover_name = 'country',
facet_col = "continent",
log_x = True)
Write it here:
Let's create an animation so that we can observe the change in GDP vs. life expectancy over the years. Code the following:
# start by copying the previous code
# change the data attribute to the entire dataset, "gapminder"
scatter = px.scatter(gapminder, x = "gdpPercap", y = "lifeExp",
color = "continent", hover_data = ['year'],
size = "pop", size_max = 60,
hover_name = 'country',
facet_col = "continent",
log_x = True,
animation_frame = "year",
animation_group = 'country',
range_y = [25, 90],
labels = {"pop": "Population",
"lifeExp": "Life Expectancy"})
Choropleths are thematic maps in which areas are shaded or patterned in proportion to the measurement of the displayed statistical variable. We are going to shade our maps by lifeExp using our entire dataset.
What is locations = "iso_alpha"?
# add in "locations"
# 'color', "hover_name", "animation_frame", "projection"
# "color_continuous_scale"
choro = px.choropleth(gapminder, locations = "iso_alpha", color = "lifeExp",
hover_name = "country", animation_frame = "year",
projection = "natural earth", color_continuous_scale = "Viridis")
# add in other hover info with "hover_data"
# "color_continuous_scale": because we want the scale to be continuous rather than discrete
# sequential = continuous
# try changing a color scale (e.g. Viridis)
# try changing the "projection" to orthographic
fig = px.line(gapminder, x="year", y="lifeExp", color="continent", line_group="country", hover_name="country",
line_shape="spline", render_mode="svg")
fig = px.area(gapminder, x="year", y="lifeExp", color="continent", line_group="country", hover_name="country")
Got Questions?
Ask the Data Services Team at NYU Shanghai Library
Ending Credits
Tutorial framework: Plotly for Python documentation
Reference: Kaggle
Modified and Organized by: Pamela Pan