Comical way of storytelling with xkcd Matplotlib

Kashish Rastogi
7 min read · May 5, 2024

Making comical charts for infographics in Matplotlib with the Netflix TV Shows and Movies dataset.

Data visualization is a great way to tell a story, because humans understand charts more easily than tables of values. Charts help you absorb data quickly, but what if we could also make them fun?

Have you ever wondered about a comical way to represent data? In this article, we will use a style built into the Matplotlib library to create fun visualizations. Since we are visual by nature, we can use these skills to bring data to life with charts and stories.

We can make a bar chart, line chart, scatter chart, and many more charts in a comical way with Matplotlib's xkcd mode. The xkcd style mimics the hand-drawn look of the popular webcomic xkcd, a series created by Randall Munroe that is famous for its simple drawings and witty humor.

The data I am going to use is the Netflix Movies and TV Shows dataset from Kaggle. If you want an EDA for this data, do visit here.

Python interface with xkcd

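There is also a separate Python library named xkcd that uses the JSON interface of Randall's site to retrieve comic data; it supports both Python 2 and Python 3. The charts in this article do not need it, because they rely only on the xkcd style that ships with Matplotlib.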

Table of Contents

  • Plotting chart with xkcd
  • Description of Dataset
  • Netflix Timeline chart
  • Pie Chart
  • Bar Chart
  • Line Chart
  • Plotting charts with subplots
  • Infographics with xkcd
  • Where to use xkcd types of charts

Plotting chart with xkcd

To make a comical chart, we just need to put our plotting code inside the following block. Any Matplotlib code for a bar chart, line chart, or other chart placed inside this block will be drawn in the comical style.

with plt.xkcd():
    ...  # your Matplotlib plotting code goes here
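
As a minimal sketch of how the block behaves, the snippet below draws a tiny bar chart with made-up numbers and records the `path.sketch` setting before, inside, and after the block. The function name `xkcd_demo` and the data are placeholders for illustration only; the point is that the hand-drawn style applies only to what is created inside the `with` block.

```python
import matplotlib.pyplot as plt


def xkcd_demo():
    """Draw a tiny bar chart in xkcd style and report the sketch setting."""
    before = plt.rcParams["path.sketch"]       # usually None: straight lines
    with plt.xkcd():
        inside = plt.rcParams["path.sketch"]   # wobble parameters are active here
        fig, ax = plt.subplots(figsize=(5, 3))
        ax.bar(["Mon", "Tue", "Wed"], [3, 7, 5])   # made-up numbers
        ax.set_title("Cups of coffee (made-up data)")
    after = plt.rcParams["path.sketch"]        # restored when the block ends
    return fig, before, inside, after


fig, before, inside, after = xkcd_demo()
```

Charts created after the block go back to the regular Matplotlib look, so comical and standard charts can live in the same script.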

Importing the library

import pandas as pd
import matplotlib.pyplot as plt

Dataset

First, let’s see what the data looks like. Before starting the analysis we need to clean the data; to do so, follow the steps provided here.

df = pd.read_csv(r'D:\netflix_titles.csv')
df.head(2)

The data covers Netflix Movies and TV Shows and has features such as type, title, director, country, cast, year, and duration. In this article, we will see how to make comical charts and compare them with charts made in Plotly.
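
The line chart later in the article uses a `year_added` column, which is not part of the raw CSV and presumably comes from the cleaning step. A rough sketch of one way to derive it is shown here, assuming the `date_added` column holds strings such as "September 25, 2021"; the helper name `add_year_added` is hypothetical and this is not necessarily how the linked cleaning steps do it.

```python
import pandas as pd


def add_year_added(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with an integer `year_added` column."""
    out = frame.copy()
    # Some values may carry stray whitespace, so strip before parsing.
    parsed = pd.to_datetime(
        out["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    # Nullable integer type keeps rows with a missing date as <NA>.
    out["year_added"] = parsed.dt.year.astype("Int64")
    return out


# Tiny made-up sample, including a padded value and a missing one
sample = pd.DataFrame(
    {"date_added": ["September 25, 2021", " August 4, 2017", None]}
)
result = add_year_added(sample)
```

On the real data this would be called as `df = add_year_added(df)` right after loading the CSV.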

Netflix Journey

Before diving into how to make charts, let’s look around and make a journey chart of Netflix. To make this chart we need to combine scatter and line charts. This visual shows the journey of Netflix from DVDs to Netflix and Chill!

import numpy as np

# Main milestones: these labels go below the large points
tl_dates = [
    "1997\nFounded",
    "1999\nStart Monthly\nsubscription\nservice",
    "2004\nLaunches online\nDVD rental\nservice",
    "2007\nStreaming service",
    "2016\nGoes Global",
    "2021\nNetflix & Chill"
]
tl_x = [1, 2.2, 4, 5.3, 7.5, 8.5]

# Secondary events: these go on the stems between the milestones
tl_sub_x = [1.5, 3, 5, 6.5]
tl_sub_times = ["1998", "2000", "2006", "2012"]
tl_text = [
    "Netflix.com launched",
    "Starts\nPersonal\nRecommendations",
    "Billionth DVD Delivery",
    "UK Launch"
]

with plt.xkcd():
    # Set figure & Axes
    fig, ax = plt.subplots(figsize=(15, 6), constrained_layout=True)
    ax.set_ylim(-2, 1.75)
    ax.set_xlim(0, 10)

    # Timeline : line
    ax.axhline(0, xmin=0.1, xmax=0.85, c='#000', zorder=1)

    # Timeline : Date Points
    ax.scatter(tl_x, np.zeros(len(tl_x)), s=120, c='#000', zorder=2)
    ax.scatter(tl_x, np.zeros(len(tl_x)), s=30, c='#000', zorder=3)

    # Timeline : Time Points
    ax.scatter(tl_sub_x, np.zeros(len(tl_sub_x)), s=50, c='#777', zorder=4)

    # Date Text
    for x, date in zip(tl_x, tl_dates):
        ax.text(x, -0.55, date, ha='center',
                fontfamily='serif', fontweight='bold',
                color='#111', fontsize=12)

    # Stemplot : vertical lines, alternating above and below the timeline
    levels = np.zeros(len(tl_sub_x))
    levels[::2] = 0.3
    levels[1::2] = -0.3
    # use_line_collection is dropped here: it was removed in Matplotlib 3.8
    markerline, stemline, baseline = ax.stem(tl_sub_x, levels)
    plt.setp(baseline, zorder=0)
    plt.setp(markerline, marker=',', color='#000')
    plt.setp(stemline, color='#000')

    # Text for the secondary events
    for idx, x, time, txt in zip(range(1, len(tl_sub_x) + 1), tl_sub_x, tl_sub_times, tl_text):
        ax.text(x, 1.3 * (idx % 2) - 0.5, time, ha='center',
                fontfamily='serif', fontweight='bold',
                color='#111', fontsize=11)
        ax.text(x, 1.3 * (idx % 2) - 0.6, txt, va='top', ha='center',
                fontfamily='serif', color='#111')

    # Spine
    for spine in ["left", "top", "right", "bottom"]:
        ax.spines[spine].set_visible(False)

    # Ticks
    ax.set_xticks([])
    ax.set_yticks([])

    # Title
    ax.set_title("From DVD rental to Netflix & chill", fontweight="bold",
                 fontfamily='serif', fontsize=16, color='#111')
    plt.show()
Timeline in Matplotlib

Reference for the chart is taken from here

To enhance this chart you can add different colors to the timeline, where each color shows a different part of the Netflix journey.
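
One possible way to color the timeline is to draw a separate line segment between each pair of neighbouring milestones instead of a single black line. The sketch below is only an illustration: the helper `draw_colored_timeline` and the palette are hypothetical, and the x positions are copied from `tl_x` in the timeline code.

```python
import matplotlib.pyplot as plt


def draw_colored_timeline(ax, xs, colors):
    """Draw one horizontal segment per pair of neighbouring milestones."""
    segments = []
    for start, end, color in zip(xs[:-1], xs[1:], colors):
        (line,) = ax.plot([start, end], [0, 0], color=color,
                          linewidth=4, zorder=1)
        segments.append(line)
    return segments


milestones = [1, 2.2, 4, 5.3, 7.5, 8.5]  # same positions as tl_x
# Hypothetical palette: one color per phase between two milestones
phase_colors = ["#4c78a8", "#e45766", "#72b7b2", "#b279a2", "#f58518"]

with plt.xkcd():
    fig, ax = plt.subplots(figsize=(15, 3))
    segments = draw_colored_timeline(ax, milestones, phase_colors)
    # Milestone points drawn on top of the colored segments
    ax.scatter(milestones, [0] * len(milestones), s=120, c="#000", zorder=2)
    ax.set_yticks([])
```

In the full timeline, a call like this could stand in for the `ax.axhline(...)` line, with the rest of the chart left as it is.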

1. Pie chart in a comical way

Let’s see the ratio of Movies to TV Shows. In Plotly, this chart can be presented like this.

Pie chart in Plotly

But to spice things up and make it a little more interesting, we will create a chart like this:

# Count titles per type; rename_axis + reset_index(name=...) yields the
# same column names on both older and newer pandas versions
df_type = df['type'].value_counts().rename_axis('Type').reset_index(name='Count')

with plt.xkcd():
    explode = (0, 0.1)  # pull the second slice out slightly
    fig1, ax1 = plt.subplots(figsize=(5, 5), dpi=100)
    ax1.pie(df_type["Count"], explode=explode, labels=df_type["Type"],
            autopct='%1.1f%%', shadow=True)
    ax1.set_title('Most watched on Netflix')
    plt.show()
Pie Chart in Xkcd Matplotlib

As we can see, Netflix has more Movies than TV Shows. Note that the chart counts titles in the catalog, so it hints at audience preference rather than measuring it directly.

2. Bar chart in a comical way

Next, let’s look at the distribution of ratings on Netflix to find out which kind of content is most common. The chart below is made in Plotly.

Bar Chart in Plotly

Let’s make the bar chart with xkcd.

import numpy as np

# Count titles per rating, most common first
df_rating = df['rating'].value_counts().rename_axis('rating').reset_index(name='count')

with plt.xkcd():
    fig, ax = plt.subplots(figsize=(10, 4), dpi=100)
    y_pos = np.arange(len(df_rating))
    ax.barh(y_pos, df_rating['count'], align='center')
    ax.set_yticks(y_pos)
    ax.set_yticklabels(df_rating['rating'])
    ax.invert_yaxis()  # most common rating at the top
    ax.set_title('Distribution of Ratings')
    plt.show()
Bar Chart in Xkcd Matplotlib

Interpreting the visual

Most of the content is aimed at mature audiences, which suggests that most Netflix users are of mature age. The largest numbers of titles carry the TV-MA and TV-14 rating tags. The least common rating tag is NC-17, which means that no one aged 17 or under may watch these titles.

3. Line chart in a comical way

Let’s see how the number of TV Shows has changed over the years. This can help Netflix decide whether it should produce more TV Shows, and it can inform strategies to get the audience to watch more of them.

d1 = df[df["type"] == "TV Show"]
col = "year_added"  # assumed to come from the data cleaning step; not in the raw CSV

# Count TV Shows per year; works on both older and newer pandas versions
vc1 = d1[col].value_counts().rename_axis(col).reset_index(name="count")
vc1['percent'] = 100 * vc1['count'] / vc1['count'].sum()
vc1 = vc1.sort_values(col)

with plt.xkcd():
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(vc1[col], vc1["count"])
    ax.set_title('TV Shows impact over the years')
    plt.show()
Line Chart in Xkcd Matplotlib

Interpreting the visual

Going by the chart, demand for TV Shows picks up around the year 2017, and the picture changes after the year 2020. We can check Movies in the same way by selecting the Movie rows and comparing them with TV Shows year by year.
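
A small sketch of that comparison is given here, using a made-up toy table in place of the Netflix data. The helper `titles_per_year` is a hypothetical name, and the `year_added` column is assumed to exist as in the line chart code.

```python
import matplotlib.pyplot as plt
import pandas as pd


def titles_per_year(frame, content_type, year_col="year_added"):
    """Count titles of one type per year, ordered by year."""
    subset = frame[frame["type"] == content_type]
    return subset[year_col].value_counts().sort_index()


# Made-up rows standing in for the real dataset
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "TV Show", "Movie"],
    "year_added": [2019, 2020, 2019, 2020, 2020],
})

# One column per content type, aligned on the year
comparison = pd.DataFrame({
    "Movie": titles_per_year(toy, "Movie"),
    "TV Show": titles_per_year(toy, "TV Show"),
}).fillna(0).astype(int)

with plt.xkcd():
    fig, ax = plt.subplots(figsize=(10, 6))
    for name in comparison.columns:
        ax.plot(comparison.index, comparison[name], label=name)
    ax.set_title("Movies vs TV Shows added per year (toy data)")
    ax.legend()
```

Replacing `toy` with the cleaned `df` should give both lines on one chart, which makes the year-by-year comparison easier to read than two separate plots.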

4. Working with subplots

Let’s see how we can work with subplots in Matplotlib's xkcd mode. We will look at the directors with the most content from four countries: India, United States, Canada, and United Kingdom. For easy differentiation, I have given each country a different color.

from collections import Counter

colours = ["#4c78a8", "#e45766", "#72b7b2", "#b279a2"]
countries_list = ["India", "United States", "Canada", "United Kingdom"]

with plt.xkcd():
    plt.figure(figsize=(20, 8))
    for x, country in enumerate(countries_list, start=1):
        country_df = df[df["country"] == country]
        # Split the comma-separated director names and count each one;
        # empty names are dropped before picking the top 5
        names = ", ".join(country_df['director'].fillna("")).split(", ")
        counter_list = Counter(name for name in names if name != "").most_common(5)
        # Reverse so the top director ends up at the top of the bars
        labels = [name for name, _ in counter_list][::-1]
        values = [count for _, count in counter_list][::-1]
        # Integer ticks: every 1 for small counts, every 2 otherwise
        step = 1 if max(values) < 10 else 2
        values_int = range(0, max(values) + 1, step)
        plt.subplot(2, 2, x)
        plt.barh(labels, values, color=colours[x - 1])
        plt.xticks(values_int)
        plt.title(country)
    plt.suptitle('Popular Directors with the Highest content')
    plt.tight_layout()
    plt.show()
Subplots in Xkcd Matplotlib

Interpreting the visual

We have now seen how to make a pie chart, a bar chart, and a line chart, and how to work with subplots, using xkcd in Matplotlib. We can use these charts to represent data in a fun way.

Infographics in Matplotlib

Let’s combine all the charts we made with xkcd into a small comical infographic for the Netflix dataset. I made some changes to the bar and line charts by removing some of the spines (right, top, and bottom on the bar chart; right and top on the line chart) to make them look more presentable.

with plt.xkcd():
    fig = plt.figure(figsize=(15, 8))
    plt.subplots_adjust(wspace=0.35, hspace=0.40)

    # First axes (top-left): ratings bar chart
    ax1 = fig.add_subplot(2, 2, 1)
    ax1.barh(df_rating['rating'], df_rating['count'])
    ax1.annotate('TV-MA is highest', xy=(1400, 5), va='center', ha='center',
                 weight='bold', fontsize=15)
    ax1.set_title("Distribution of Ratings")
    ax1.xaxis.set_visible(False)
    ax1.spines['right'].set_visible(False)
    ax1.spines['top'].set_visible(False)
    ax1.spines['bottom'].set_visible(False)

    # Second axes (top-right): pie chart
    ax2 = fig.add_subplot(2, 2, 2)
    ax2.pie(df_type["Count"], explode=(0, 0.1),
            labels=df_type["Type"], autopct='%1.1f%%', shadow=True)
    ax2.set_title('Ratio of Movies vs TV shows')

    # Third axes: spans the whole bottom row (cells 3 and 4)
    ax3 = fig.add_subplot(2, 2, (3, 4))
    # Use the column name directly; `col` may have been reassigned earlier
    ax3.plot(vc1["year_added"], vc1["count"])
    ax3.set_title('TV shows over the Years')
    ax3.spines['right'].set_visible(False)
    ax3.spines['top'].set_visible(False)

    plt.tight_layout()
    plt.show()
Infographics in Xkcd Matplotlib

Interpreting the infographics

From this infographic we can see that Movies outnumber TV Shows and that TV-MA is the most common rating. The line chart shows that the number of TV Shows fluctuates over the years. We can add more images or charts to describe the use case.

Where to use xkcd type of charts

We normally use charts in our analysis or in presentations, and comical charts are a fun way to tell a story in those same settings.

End Note

We saw how to make boring charts interesting with the help of xkcd. We can add these types of charts to meetings or presentations, and they are fun to create. I hope you liked the journey from boring charts to comical visuals.

Kashish Rastogi

Data Analyst | Data Visualization | Storyteller | Tableau | Plotly