Visualizing COVID-19 Data Beautifully in Python (in 5 Minutes or Less!!) | by Nik Piepenbreier | Towards Data Science 7/11/20, 09:51


Visualizing COVID-19 Data Beautifully in Python (in 5 Minutes or Less!!)


Making Matplotlib a Little Less Painful!

Nik Piepenbreier


Apr 5 · 4 min read

https://towardsdatascience.com/visualizing-covid-19-data-beautifully-in-python-in-5-minutes-or-less-affc361b2c6a

Let’s create some beautiful data visualizations in Python! Source: Nik Piepenbreier.

Matplotlib may be the de facto data visualization library for Python, but it’s
not always the prettiest. In this post, we’ll explore how to turn a drab,
default Matplotlib graph into a beautiful data visualization. We’ll explore
COVID-19 data to see how the virus has spread throughout different
countries.

Let’s Load in Our Data


We’ll be using data from this wonderful GitHub repository, which is updated
automatically every day. We’ll load the data into a Pandas dataframe directly
from the URL, so each run of the code picks up the latest figures.

# Section 1 - Loading our Libraries
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import matplotlib.ticker as ticker
# If you're working in a Jupyter notebook:
%matplotlib inline

# Section 2 - Loading and Selecting Data
df = pd.read_csv('https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv')
countries = ['Canada', 'Germany', 'United Kingdom', 'US', 'France', 'China']
# .copy() avoids a SettingWithCopyWarning when we add a column below
df = df[df['Country'].isin(countries)].copy()

# Section 3 - Creating a Summary Column
df['Cases'] = df[['Confirmed', 'Recovered', 'Deaths']].sum(axis=1)


Loading in our data and creating summary variables. Source: Nik Piepenbreier

In Section 1 of the Gist above, we’re loading our libraries. We’ll be
making use of Pandas and Matplotlib for this tutorial.

In Section 2, we read the data into a dataframe df, and then select
only the countries in our list countries. Selecting the data makes the
resulting visualization a little more readable.

In Section 3, we create a summary column that aggregates the total
number of cases across our confirmed cases, recovered cases, and any
individuals who have died as a result of COVID-19.
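
To see what the filtering and the summary column do without downloading anything, here is a minimal sketch on a tiny hand-made table. The rows and numbers are made up for illustration; only the column names follow the ones used in the Gist above.

```python
import pandas as pd

# Hypothetical rows that mimic the columns used in the Gist above.
raw = pd.DataFrame({
    "Date": ["2020-04-01", "2020-04-01", "2020-04-01"],
    "Country": ["Canada", "Brazil", "US"],
    "Confirmed": [100, 50, 1000],
    "Recovered": [10, 5, 100],
    "Deaths": [1, 2, 20],
})

countries = ["Canada", "US"]

# Keep only the rows whose Country is in our list; .copy() lets us
# add a column afterwards without a SettingWithCopyWarning.
subset = raw[raw["Country"].isin(countries)].copy()

# Row-wise sum (axis=1) across the three count columns.
subset["Cases"] = subset[["Confirmed", "Recovered", "Deaths"]].sum(axis=1)

print(subset[["Country", "Cases"]])
```

The Brazil row is dropped by isin, and each remaining row gets a Cases value that is simply the sum of its three count columns.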


Preparing our Dataframes for Data Visualization


Now that we have our data stored within a dataframe, let’s prepare two
further dataframes that will hold our data in crosstabs, which will allow us
to more easily visualize the data.

# Section 4 - Restructuring our Data
df = df.pivot(index='Date', columns='Country', values='Cases')
countries = list(df.columns)
covid = df.reset_index('Date')
covid.set_index(['Date'], inplace=True)
covid.columns = countries

# Section 5 - Calculating Rates per 100,000
populations = {'Canada':37664517, 'Germany': 83721496 , 'United Kingdom': 67802690 , 'US'
percapita = covid.copy()
for country in list(percapita.columns):
    percapita[country] = percapita[country]/populations[country]*100000


Preparing our data for visualization. Source: Nik Piepenbreier

Let’s explore what we did here in a bit of detail:

In Section 4, we pivot our dataframe df, creating columns out of countries,
with the number of cases as the data fields. We store the result in a new
dataframe called covid, set its index to the date, and assign the country
names to the column headers.

In Section 5, we copy covid into a new dataframe called percapita. Using a
dictionary that stores each country’s population, we divide every value by
the relevant population and multiply by 100,000 to get the number of cases
per 100,000 people.
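
Here is a small sketch of the same two steps on made-up data. The populations are round, hypothetical numbers chosen only to keep the arithmetic easy to follow, and instead of the for-loop it divides by a Series, which is one possible alternative rather than the approach used above.

```python
import pandas as pd

# Hypothetical long-format data: one row per date and country.
long_df = pd.DataFrame({
    "Date": ["2020-04-01", "2020-04-01", "2020-04-02", "2020-04-02"],
    "Country": ["Canada", "US", "Canada", "US"],
    "Cases": [100, 1000, 150, 1500],
})

# One row per date, one column per country.
wide = long_df.pivot(index="Date", columns="Country", values="Cases")

# Made-up round populations (not real figures).
populations = {"Canada": 1_000_000, "US": 10_000_000}

# Dividing by a Series aligns on the column names, so every country
# is divided by its own population in a single step.
per_100k = wide / pd.Series(populations) * 100_000

print(per_100k)
```

Because pandas matches the Series index against the column names, a country missing from the dictionary would show up as a column of NaN rather than raising an error, which is worth keeping in mind if you change the country list.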

If you ever want to learn how to unpivot data, check out this tutorial on the
melt function available in Pandas.
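
As a quick taste of what unpivoting looks like, here is a minimal sketch that turns a small wide table back into long format. The data is made up, and the names wide and long_again are just for illustration.

```python
import pandas as pd

# A small wide table like the pivoted one: dates as index, countries as columns.
wide = pd.DataFrame(
    {"Canada": [100, 150], "US": [1000, 1500]},
    index=pd.Index(["2020-04-01", "2020-04-02"], name="Date"),
)

# melt() needs Date as a regular column, so reset the index first.
long_again = wide.reset_index().melt(
    id_vars="Date", var_name="Country", value_name="Cases"
)

print(long_again)
```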

Let’s take a look at how our data has been transformed from the beginning
up until now:


How we’ve reshaped our data to tell a story. Source: Nik Piepenbreier

Creating our First Visualization: Cases over Time

Let’s begin by creating our first visualization that will demonstrate the
number of total cases over time in various countries:

# Section 6 - Generating Colours and Style
colors = {'Canada':'#045275', 'China':'#089099', 'France':'#7CCBA2', 'Germany':'#FCDE9C'
plt.style.use('fivethirtyeight')

# Section 7 - Creating the Visualization
plot = covid.plot(figsize=(12,8), color=list(colors.values()), linewidth=5, legend=False
plot.yaxis.set_major_formatter(ticker.StrMethodFormatter('{x:,.0f}'))
plot.grid(color='#d4d4d4')
plot.set_xlabel('Date')
plot.set_ylabel('# of Cases')

# Section 8 - Assigning Colour
for country in list(colors.keys()):
    plot.text(x = covid.index[-1], y = covid[country].max(), color = colors[country], s

# Section 9 - Adding Labels
plot.text(x = covid.index[1], y = int(covid.max().max())+45000, s = "COVID-19 Cases by Country"
plot.text(x = covid.index[1], y = int(covid.max().max())+15000, s = "For the USA, China, Germany, France, United Kingd
plot.text(x = percapita.index[1], y = -100000,s = 'datagy.io Source: https://github.com/datasets/


Creating our First Visualization. Source: Nik Piepenbreier

Let’s explore what we did here in a bit more detail:

In Section 6, we created a dictionary that contains hex values for different
countries. Storing this in a dictionary will allow us to easily call it later in a
for-loop. We also assign the FiveThirtyEight style to add some general
formatting, which we’ll heavily build upon.
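
One detail worth knowing: passing list(colors.values()) to the plot only lines up with the right countries when the dictionary is written in the same order as the dataframe’s columns. Here is a small sketch of a more defensive variant. The two-country palette and the columns list are hypothetical stand-ins, and the style check is an extra precaution rather than something the code above does.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical two-country palette, deliberately not in alphabetical order.
colors = {"US": "#DC3977", "Canada": "#045275"}
columns = ["Canada", "US"]  # stand-in for list(covid.columns)

# Fall back to the default style if the named one is not available.
style = "fivethirtyeight" if "fivethirtyeight" in plt.style.available else "default"
plt.style.use(style)

# Look colours up by column name rather than relying on dictionary order.
line_colors = [colors[name] for name in columns]
print(line_colors)
```

The resulting list can be passed to the color argument of the dataframe’s plot method in place of list(colors.values()).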

In Section 7, we create our first visualization using Pandas’ plot function.
We use the color parameter to assign the colors to the different columns. We
also use the set_major_formatter method to format values with separators
for thousands.
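
To make the format string a little less mysterious, here is a minimal sketch that applies the same formatter to a throwaway chart and then calls it directly on a single value. The plotted numbers are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1_500_000, 3_000_000])

# '{x:,.0f}' means: take the tick value x, add thousands separators (,),
# and show no decimal places (.0f).
formatter = ticker.StrMethodFormatter("{x:,.0f}")
ax.yaxis.set_major_formatter(formatter)

# Calling the formatter directly shows the label it builds for one value.
label = formatter(1500000)
print(label)
```

The format string follows Python’s regular str.format rules, so "{x:,.0f}".format(x=1500000) produces the same text outside of Matplotlib.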

Then, in Section 8, we create a for-loop that generates label text for the
various countries. This for-loop gets each country’s name from the keys in
the dictionary in the form of a list and iterates over this list. It places text
containing the country’s name to the right of the last x-value
(covid.index[-1] → the last date in the dataframe), at the current day’s
y-value (which will always be equal to the max value of that column).
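
The labelling loop can be sketched on its own with a tiny made-up dataframe. The numbers and the two-country palette are hypothetical, and this version reads the last value with .iloc[-1] instead of .max(), which lands in the same spot as long as the counts are cumulative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cumulative counts for two countries.
covid = pd.DataFrame(
    {"Canada": [10, 20, 35], "US": [100, 250, 400]},
    index=pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03"]),
)
colors = {"Canada": "#045275", "US": "#DC3977"}

ax = covid.plot(color=[colors[c] for c in covid.columns], legend=False)

labels = []
for country in covid.columns:
    # x: the last date in the index; y: that country's most recent value.
    text = ax.text(
        x=covid.index[-1],
        y=covid[country].iloc[-1],
        s=country,
        color=colors[country],
        va="center",
    )
    labels.append(text)
```

Each label takes its colour from the same dictionary as its line, so the chart stays readable without a legend.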

Finally, in Section 9, we add a title, subtitle, and source information about
the chart. We again position the text using values calculated from the data,
so that as the graph updates, these positions update dynamically!
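
The fixed offsets in Section 9 (such as +45000) are expressed in data units, so they may need adjusting as the case counts grow. One possible alternative, which is not what the code above does, is to position the text in axes-fraction coordinates. Here is a minimal sketch with made-up data and a placeholder subtitle.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [10, 200, 3000])

# transform=ax.transAxes switches to axes-fraction coordinates:
# (0, 0) is the bottom-left corner of the plot area, (1, 1) the top-right.
# A y-value above 1 therefore sits just above the plot, whatever the data range.
title = ax.text(0, 1.12, "COVID-19 Cases by Country",
                transform=ax.transAxes, fontsize=16, weight="bold")
subtitle = ax.text(0, 1.05, "Hypothetical subtitle text",
                   transform=ax.transAxes, fontsize=11)
```

With this approach the title stays in the same place on the figure even if the largest value in the data changes by orders of magnitude.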

This is our end result for the first chart:


Our First Visualization — Cases over Time by Country. Source: Nik Piepenbreier

Creating our Second Visualization: Cases per 100,000 People

To create our second visualization, we’ll make use of the code below:

percapitaplot = percapita.plot(figsize=(12,8), color=list(colors.values()), linewidth=5,
percapitaplot.grid(color='#d4d4d4')
percapitaplot.set_xlabel('Date')
percapitaplot.set_ylabel('# of Cases per 100,000 People')
for country in list(colors.keys()):
    percapitaplot.text(x = percapita.index[-1], y = percapita[country].max(), color = colors
percapitaplot.text(x = percapita.index[1], y = percapita.max().max()+25, s = "Per Capita COVID-19 Cases by Country"
percapitaplot.text(x = percapita.index[1], y = percapita.max().max()+10, s = "For the USA, China, Germany, France, Unit
percapitaplot.text(x = percapita.index[1], y = -55,s = 'datagy.io Source: https://github.com/datas


Creating our second visualization — cases per 100,000 people. Source: Nik Piepenbreier

This section mostly follows what we did for our first graph. This goes to
show how simple it is to update visualizations for different datasets once
you’ve set up a chart using Python!
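
Since the two charts share almost all of their code, one way to make that reuse explicit is to wrap the common steps in a function. This is only a sketch: plot_lines is a hypothetical helper, and the data, populations and palette below are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_lines(data, colors, ylabel):
    """Draw one styled line per column and label each line at its right end."""
    ax = data.plot(figsize=(12, 8), linewidth=5, legend=False,
                   color=[colors[c] for c in data.columns])
    ax.grid(color="#d4d4d4")
    ax.set_xlabel("Date")
    ax.set_ylabel(ylabel)
    for country in data.columns:
        ax.text(x=data.index[-1], y=data[country].iloc[-1],
                s=country, color=colors[country])
    return ax


# Hypothetical totals and a per-100,000 version of the same table.
totals = pd.DataFrame(
    {"Canada": [10, 20, 35], "US": [100, 250, 400]},
    index=pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03"]),
)
rates = totals / pd.Series({"Canada": 1_000_000, "US": 10_000_000}) * 100_000
colors = {"Canada": "#045275", "US": "#DC3977"}

ax_total = plot_lines(totals, colors, "# of Cases")
ax_rate = plot_lines(rates, colors, "# of Cases per 100,000 People")
```

Titles and source notes can then be added to each returned axes object separately, since those are the parts that differ between the two charts.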

This is our resulting visualization:


Our Second Visualization — Cases per 100,000 people per country. Source: Nik Piepenbreier

Conclusion: Beautiful COVID Visualizations with Matplotlib

In this post, we learned how to generate beautiful data visualizations using
COVID-19 datasets on GitHub. We can use the power of Python to have our
graphs update automatically based on today’s data.

Thanks so much for taking the time to read this!
Thanks for reading! Source: Nik Piepenbreier
