This is the coding component of the Data Visualization with Plotly for Python workshop.
Please find the workshop slides here
Workshop Best Practices
If you have not installed plotly yet, run the following:
pip install plotly
Requirement already satisfied: plotly in ./opt/anaconda3/lib/python3.8/site-packages (5.3.1)
Requirement already satisfied: tenacity>=6.2.0 in ./opt/anaconda3/lib/python3.8/site-packages (from plotly) (8.0.1)
Requirement already satisfied: six in ./opt/anaconda3/lib/python3.8/site-packages (from plotly) (1.15.0)
Note: you may need to restart the kernel to use updated packages.
import plotly.express as px
type(px)
module
import pandas as pd
type(pd)
module
from plotly.figure_factory import create_table
type(create_table)
function
The data we are using today is the built-in demo data of the plotly.express module.
It comes from Gapminder and records the life expectancy, population, and GDP per capita of each country every five years from 1952 to 2007.
As you may know, up to 80% of the time in a data science life cycle may be devoted to cleaning and processing data. The demo data we are using today has already been cleaned and is ready for use.
The main goal of today is to explore the different functionalities of the Plotly library, and this cleaned data will serve that purpose well.
# define our data
gapminder = px.data.gapminder()
Let's preview our data.
gapminder.head()
| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 | AFG | 4 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.853030 | AFG | 4 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.100710 | AFG | 4 |
| 3 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.197138 | AFG | 4 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 | AFG | 4 |
table = create_table(gapminder.head(5))  # display the first 5 rows as a table figure
table
data_china = gapminder.query("country == 'China'")
print(data_china)
| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|---|
| 288 | China | Asia | 1952 | 44.00000 | 556263527 | 400.448611 | CHN | 156 |
| 289 | China | Asia | 1957 | 50.54896 | 637408000 | 575.987001 | CHN | 156 |
| 290 | China | Asia | 1962 | 44.50136 | 665770000 | 487.674018 | CHN | 156 |
| 291 | China | Asia | 1967 | 58.38112 | 754550000 | 612.705693 | CHN | 156 |
| 292 | China | Asia | 1972 | 63.11888 | 862030000 | 676.900092 | CHN | 156 |
| 293 | China | Asia | 1977 | 63.96736 | 943455000 | 741.237470 | CHN | 156 |
| 294 | China | Asia | 1982 | 65.52500 | 1000281000 | 962.421381 | CHN | 156 |
| 295 | China | Asia | 1987 | 67.27400 | 1084035000 | 1378.904018 | CHN | 156 |
| 296 | China | Asia | 1992 | 68.69000 | 1164970000 | 1655.784158 | CHN | 156 |
| 297 | China | Asia | 1997 | 70.42600 | 1230075000 | 2289.234136 | CHN | 156 |
| 298 | China | Asia | 2002 | 72.02800 | 1280400000 | 3119.280896 | CHN | 156 |
| 299 | China | Asia | 2007 | 72.96100 | 1318683096 | 4959.114854 | CHN | 156 |

# Let's visualize population (variable name: "pop") using a bar chart!
# We will use px.bar(dataset, x=[x variable name], y=[y variable name])
bar = px.bar(data_china, x = "year", y = "pop")
bar.show()
Next, let's add more features to our graph: color the bars by life expectancy and choose a continuous color scale.
bar = px.bar(data_china, x = "year", y = "pop", color = "lifeExp", color_continuous_scale = "Earth")
bar.show()
bar = px.bar(data_china, x = "year", y = "pop", color = "lifeExp", color_continuous_scale = "Magenta")
bar.show()
# let's say we want to take a snapshot of the world in 2007
# and observe if the GDP influences the life expectancy
gapminder2007 = gapminder.query("year == 2007")  # 2007 is an integer, so it does not need to be enclosed in another pair of quotes
# make a basic scatter plot with px.scatter(data, x, y)
# life expectancy is "lifeExp", GDP per capita is "gdpPercap"
scatter = px.scatter(gapminder2007, x = "gdpPercap", y = "lifeExp")
scatter.show()
# let's color the data points by continent using the "color" argument, and show extra columns on hover with "hover_data"
scatter = px.scatter(gapminder2007, x = "gdpPercap", y = "lifeExp", color = "continent", hover_data = ['country', 'year'])
scatter.show()
# let's say we want to turn our scatter plot into a bubble chart
# where the size of each data point changes according to population size
# for that, we pass in the "size" argument
scatter = px.scatter(gapminder2007, x = "gdpPercap", y = "lifeExp",
color = "continent", hover_data = ['country', 'year'],
size = "pop")
scatter.show()
# what do you notice about the bubbles?
# size_max, the maximum bubble size in pixels, defaults to 20.
# Should we change something?
scatter = px.scatter(gapminder2007, x = "gdpPercap", y = "lifeExp",
color = "continent", hover_data = ['country', 'year'],
size = "pop", size_max = 60)
scatter.show()
See the full documentation on scatter plots in plotly.express for more on size_max.
# we need to know which countries the bubbles represent
# to help us get insights into outliers
# let's add another argument called "hover_name", which sets the bold title at the top of the hover label
scatter = px.scatter(gapminder2007, x = "gdpPercap", y = "lifeExp",
color = "continent", hover_data = ['year'],
size = "pop", size_max = 60,
hover_name = 'country')
scatter.show()
Facet charts split a multi-trace data visualization into a matrix of smaller panels.
In plotly.express, to facet a larger plot into multiple smaller plots, you simply pass the column you want to facet by to the argument "facet_col".
# copy the previous code and add the "facet_col" argument
# to isolate each continent in its own panel
# setting legends
# set log_x = True to draw the x-axis on a log scale, which spreads out the skewed GDP values
scatter = px.scatter(gapminder2007, x = "gdpPercap", y = "lifeExp",
color = "continent", hover_data = ['year'],
size = "pop", size_max = 60,
hover_name = 'country',
facet_col = "continent",
log_x = True)
scatter.show()
Write it here:
Let's create an animation so that we can observe the yearly change in GDP per capita vs. life expectancy. Code the following:
# start by copying the previous code
# change the data argument to the entire dataset, "gapminder"
scatter = px.scatter(gapminder, x = "gdpPercap", y = "lifeExp",
color = "continent", hover_data = ['year'],
size = "pop", size_max = 60,
hover_name = 'country',
facet_col = "continent",
log_x = True,
animation_frame = "year",
animation_group = 'country',
range_y = [25, 90],
labels = {"pop": "Population",
"lifeExp": "Life Expectancy"}
)
scatter.show()
Choropleths are thematic maps in which areas are shaded or patterned in proportion to the value of the statistical variable being displayed.
We are going to shade our maps by lifeExp using the entire dataset.
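What is "locations" = 'iso_alpha'? The "locations" argument names the column that identifies each map region. The iso_alpha column holds three-letter ISO 3166-1 alpha-3 country codes (e.g. AFG, CHN), which matches the default location mode of px.choropleth ("ISO-3").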
# add in 'locations'
# 'color', "hover_name", "animation_frame", "projection"
# "color_continuous_scale"
choro = px.choropleth(gapminder, locations = "iso_alpha", color = "lifeExp",
hover_name = "country", animation_frame = "year",
projection = "natural earth", color_continuous_scale = "Viridis")
choro.show()
# exercise: add other hover info with "hover_data"
# "color_continuous_scale" gives a continuous rather than discrete color scale, since lifeExp is a continuous numeric variable
# sequential color scales such as Viridis are designed for continuous data
# try changing the color scale (e.g. from Viridis to Plasma)
# try changing the "projection" to "orthographic"
fig = px.line(gapminder, x="year", y="lifeExp", color="continent", line_group="country", hover_name="country",
line_shape="spline", render_mode="svg")
fig.show()
fig = px.area(gapminder, x="year", y="lifeExp", color="continent", line_group="country", hover_name="country")
fig.show()
Got Questions?
Ask the Data Services Team at NYU Shanghai Library (shanghai.library@nyu.edu)
Ending Credits
Tutorial framework: Plotly for Python documentation
Reference: Kaggle
Modified and Organized by: Pamela Pan (pp1813@nyu.edu)