Python bubble plot with labels

Learn to plot bubble plots with examples using Python’s Matplotlib library

Bubble plots extend the scatter plot. A scatter plot shows two dimensions, x and y. A bubble plot adds a third dimension, z, which denotes weight and is drawn as the size of each marker. That way, a bubble plot conveys more information visually than a two-dimensional scatter plot.
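To make the idea concrete before loading the real data, here is a minimal sketch with made-up numbers. The values of x, y, and z and the scale factor of 10 are assumptions chosen only for illustration; the point is that z is passed to the s parameter of scatter.

```python
import matplotlib.pyplot as plt

# Made-up illustrative values, not taken from the immigration dataset
x = [1, 2, 3, 4, 5]
y = [10, 14, 9, 20, 16]
z = [5, 40, 15, 80, 30]  # third dimension, drawn as bubble size

fig, ax = plt.subplots(figsize=(6, 4))
# s sets the marker area in points squared; the factor 10 is an arbitrary scale
bubbles = ax.scatter(x, y, s=[10 * w for w in z], alpha=0.5)
ax.set_xlabel("x")
ax.set_ylabel("y")
# In a script, call plt.show() to display the figure
```

Note that s controls the area of the marker, not its radius, so a point with four times the weight looks four times as large in area.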

Data Preparation

For this tutorial, I will use a dataset that contains Canadian immigration information. It covers 1980 to 2013 and includes the number of immigrants from 195 countries. Import the necessary packages and the dataset:

import numpy as np  
import pandas as pd
df = pd.read_excel('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Data_Files/Canada.xlsx',
sheet_name='Canada by Citizenship',
skiprows=range(20),
skipfooter=2)

The dataset is too big to show in a screenshot here, so let’s look at the column names instead.

df.columns
# Output:
Index([ 'Type', 'Coverage', 'OdName', 'AREA', 'AreaName', 'REG',
'RegName', 'DEV', 'DevName', 1980, 1981, 1982,
1983, 1984, 1985, 1986, 1987, 1988,
1989, 1990, 1991, 1992, 1993, 1994,
1995, 1996, 1997, 1998, 1999, 2000,
2001, 2002, 2003, 2004, 2005, 2006,
2007, 2008, 2009, 2010, 2011, 2012,
2013],
dtype='object')

We will not use many of these columns. I dropped them and set the country name (‘OdName’) as the index.

df = df.drop(columns=['Type', 'Coverage', 'AREA', 'AreaName', 'REG', 'RegName', 'DEV', 'DevName']).set_index('OdName')
df.head()

I chose the data for Ireland and Brazil for this exercise. There is no special reason; I picked them at random.

Ireland = df.loc['Ireland']
Brazil = df.loc['Brazil']

Normalize the Data

There are a few different ways to normalize data. We normalize to bring the values into a similar range. The Ireland and Brazil immigration data have different ranges, so I scaled each series to a maximum of 1 by dividing the Ireland series by its maximum value and doing the same for the Brazil series. These normalized values will set the bubble sizes later.

i_normal = Ireland / Ireland.max()
b_normal = Brazil / Brazil.max()
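To compare the divide-by-maximum approach with one of the other options, here is a small sketch on a made-up Series. The numbers are assumptions for illustration and are not the real Ireland or Brazil figures; min-max scaling is shown only as an alternative and is not used in the rest of this tutorial.

```python
import pandas as pd

# Made-up yearly counts, not the real Ireland or Brazil figures
counts = pd.Series([200, 500, 1000, 800], index=[1980, 1981, 1982, 1983])

# Divide-by-maximum scaling, as used in this tutorial: the largest value becomes 1
max_scaled = counts / counts.max()

# Min-max scaling: the smallest value becomes 0 and the largest becomes 1
min_max_scaled = (counts - counts.min()) / (counts.max() - counts.min())

print(max_scaled.tolist())
print(min_max_scaled.tolist())
```

For bubble sizes, dividing by the maximum has a practical advantage: min-max scaling maps the smallest value to 0, which would make that bubble invisible.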

We will plot the Ireland and Brazil data against the years. It will be useful to have the years in a list.

years = list(range(1980, 2014))

Make the Bubble Plot

Just to see the difference, let’s plot the scatter plot first.

import matplotlib.pyplot as plt
plt.figure(figsize=(14, 8))
plt.scatter(years, Ireland, color='blue')
plt.scatter(years, Brazil, color='orange')
plt.xlabel("Years", size=14)
plt.ylabel("Number of immigrants", size=14)
plt.show()

Now, plot the bubble plot. We pass the normalized values defined earlier, multiplied by 2000, as the marker size s.

plt.figure(figsize=(12, 8))
plt.scatter(years, Brazil,
color='darkblue',
alpha=0.5,
s = b_normal * 2000)
plt.scatter(years, Ireland,
color='purple',
alpha=0.5,
s = i_normal * 2000,
)
plt.xlabel("Years", size=14)
plt.ylabel("Number of immigrants", size=14)
plt.show()

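The size of each bubble gives an idea of the number of immigrants: the smaller the bubble, the fewer the immigrants. Because each series was normalized by its own maximum, bubble sizes are comparable within a country but not between the two countries.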
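With two series on one plot, it helps to label them. Here is a minimal sketch, using made-up values and the hypothetical names series_a and series_b, that adds a legend through the label argument and marks one bubble with annotate. The "Series A" and "Series B" labels are placeholders for the country names.

```python
import matplotlib.pyplot as plt

# Made-up values standing in for two country series
years = [1980, 1981, 1982, 1983]
series_a = [100, 300, 200, 400]
series_b = [250, 150, 350, 300]

fig, ax = plt.subplots(figsize=(6, 4))
# label= gives each series an entry in the legend
ax.scatter(years, series_a, s=[v * 2 for v in series_a], alpha=0.5,
           color="darkblue", label="Series A")
ax.scatter(years, series_b, s=[v * 2 for v in series_b], alpha=0.5,
           color="purple", label="Series B")
# Put a text label on the largest bubble of series_a
ax.annotate("peak", (1983, 400), ha="center", va="center")
ax.set_xlabel("Years")
ax.set_ylabel("Number of immigrants")
legend = ax.legend(title="Country")
# In a script, call plt.show() to display the figure
```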

We can make this plot multicolored as well. To make the colors a bit more meaningful, we need each data series sorted. You will see the reason very soon.

c_br = sorted(Brazil)
c_ir = sorted(Ireland)

Now we will pass these values to change the colors.

plt.figure(figsize=(12, 8))
plt.scatter(years, Brazil,
            c=c_br,
            alpha=0.5,
            s=b_normal * 2000)
plt.scatter(years, Ireland,
            c=c_ir,
            alpha=0.5,
            s=i_normal * 2000)
plt.xlabel("Years", size=14)
plt.ylabel("Number of immigrants", size=14)
plt.show()

Now we have added another dimension, color. Because the color values are sorted, the color runs from the lowest to the highest count as the years advance, so it matches the actual counts only where a series rises steadily. This does not work that well when we plot two variables, because we did not explicitly define a color for each variable and the two series become hard to tell apart. It does a better job when we plot a single variable on the y-axis. Let’s plot the number of immigrants from Brazil per year to see the trend over the years.

plt.figure(figsize=(12, 8))
plt.scatter(years, Brazil,
c=c_br,
alpha=0.5,
s = b_normal * 2000)
plt.xlabel("Years", size=14)
plt.ylabel("Number of immigrants from Brazil", size=14)
plt.show()

I am sure you can see the change in colors with the number of immigrants very clearly here.
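As an alternative to sorting, the color can be tied directly to each point's own value by passing the unsorted values to c and adding a colorbar. This is a sketch on made-up counts, not the Brazil data, and the "viridis" colormap is just one possible choice.

```python
import matplotlib.pyplot as plt

# Made-up yearly counts, not the real Brazil figures
years = [1980, 1981, 1982, 1983, 1984]
counts = [300, 150, 600, 450, 900]

fig, ax = plt.subplots(figsize=(6, 4))
peak = max(counts)
# c=counts colors each bubble by its own value, whatever the order of the years
points = ax.scatter(years, counts, c=counts, cmap="viridis",
                    s=[2000 * v / peak for v in counts], alpha=0.5)
# The colorbar shows which color corresponds to which count
colorbar = fig.colorbar(points, ax=ax, label="Number of immigrants")
ax.set_xlabel("Years")
ax.set_ylabel("Number of immigrants")
# In a script, call plt.show() to display the figure
```

With this approach, a year with a low count gets a low-end color even when it falls between two high years, which the sorted version cannot show.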

That was all for the bubble plot in Matplotlib. I hope it was helpful.

How do you plot a bubble graph in Python?

A bubble chart can be created using the DataFrame.plot.scatter() method in pandas.
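Here is a minimal sketch of that approach. The DataFrame df_demo, its column names, and its values are all made up for illustration, and the factor of 20 is an arbitrary size scale.

```python
import pandas as pd

# Made-up data; the column names are hypothetical
df_demo = pd.DataFrame({
    "year": [1980, 1981, 1982, 1983],
    "immigrants": [100, 300, 200, 400],
    "weight": [5, 20, 10, 30],
})

# s takes one size per row, so the weight column drives the bubble area
ax = df_demo.plot.scatter(x="year", y="immigrants",
                          s=df_demo["weight"] * 20, alpha=0.5)
ax.set_ylabel("Number of immigrants")
```

The method returns a Matplotlib Axes object, so labels and legends can be added in the same way as with plt.scatter.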

How do you graph a bubble chart with 3 variables?

In Excel:
STEP 1: Right-click on a bubble and click on Format Data Series.
STEP 2: In the Format Series Panel, select the Fill icon.
STEP 3: Check Vary colors by point.
STEP 4: Your desired bubble chart with 3 variables is ready!

How do you make a bubble chart with multiple series?

To create a bubble chart with multiple series in Excel:
Click Insert > Other Charts, then select the bubble type you need from the Bubble section of the list.
In Excel 2013, click Insert > Insert Scatter (X, Y) or Bubble Chart, then select a bubble chart.

How many variables can be displayed in a bubble plot?

A bubble plot can display three quantitative variables at a time (x, y, and bubble size), plus a categorical grouping variable, typically shown through color.