How to Use a Heatmap in Python

In this Tableau tutorial, we are going to learn what a heat map is and the steps to create a Tableau Heat Map. You might be wondering how and where to start, and what you need to make this chart. Don't worry, we have answers to all your questions. All you have to do is open Tableau on your device and follow the steps below with us. (Note: We are using Tableau Desktop to make the density heat map.)

But wait, before moving on to the steps, let's first understand the concept of a heat map.

What is a Heat Map?

A density heat map is used to analyze the areas of a plot where data points are dense or scattered. Heat maps are particularly useful for large datasets with overlapping data values. They help analysts see the areas of greatest density and discover trends in the data. In the following section, we will learn how to create a density heat map in Tableau.
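Tableau builds the density marks for you. To make the idea of "dense versus scattered" concrete, here is a rough Python analogue that counts how many points fall into each cell of a grid. This is only a sketch under our own assumptions: the points are randomly generated, the 20 x 20 grid is an arbitrary choice, and this is not how Tableau computes its density marks.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical points: one dense cluster plus scattered background noise
rng = np.random.default_rng(6)
dense = rng.normal(loc=[0, 0], scale=0.5, size=(900, 2))
scattered = rng.uniform(-4, 4, size=(100, 2))
points = np.vstack([dense, scattered])

# Count how many points fall into each cell of a 20 x 20 grid
counts, x_edges, y_edges = np.histogram2d(
    points[:, 0], points[:, 1], bins=20, range=[[-4, 4], [-4, 4]]
)

fig, ax = plt.subplots()
# histogram2d puts x along the first axis, so transpose for imshow (rows = y)
im = ax.imshow(counts.T, origin="lower", extent=[-4, 4, -4, 4], cmap="inferno")
fig.colorbar(im, ax=ax, label="points per cell")
# plt.show()  # uncomment to display the plot when running as a script
```

The bright cells near the centre correspond to the dense regions, and the dim cells correspond to the scattered points.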

But before that, you may want to check out the Stacked Bar Chart in Tableau tutorial to compare your data.

How to Create a Tableau Heat Map

Let us learn how to create a heat map in Tableau. Here, we will create a density heat map using our sample dataset of sales in an electronics store.

Before we start creating our heat map, make sure your view meets the requirements below.

  • Columns: At least one continuous measure
  • Rows: At least one measure or dimension
  • Mark type: Density
  • Marks card: At least one dimension

Step 1: Add the Profit Measure

To begin with, we add the measure Profit to the Columns section.

We set the aggregation type to AVG, which is the average of the field values. We also make sure that the measure is Continuous.

Step 2: Add a Measure to the Rows Section

Next, we add another measure, Sales, to the Rows section and again select Average as the aggregation. You can set the aggregation type to suit your analytical requirements.

As you can see in the image below, an empty plot with two axes appears on the canvas.

Step 3: Add a Dimension Field

Next, we drag the dimension State onto the Detail card in the Marks section. This adds a circle for each state, plotted by its average sales and average profit.

Step 4: Select the Density Mark

Now, we convert this plot into a density heat map by changing the mark type to Density. The data points change from circles to density spots, so their colors follow a density gradient.

Dense regions with the most data points appear in red/orange, while areas with fewer or more scattered data points appear in greenish-blue shades. You can choose any color scheme you like for your heat map.

Step 5: Set Intensity and Opacity

To choose a color scheme, click the Color card and explore the options. From here, we can also set the intensity, opacity and border effects of the heat map.

We can select the color scheme from a long list of available options.

Step 6: Set the Size of the Tableau Heat Map

We can also increase or decrease the size of the density spots using the Size card.

Step 7: Create the Final Tableau Heat Map

This completes our density heat map in Tableau. To see more details about the data points, hover your cursor over the density points. The relevant details appear in the tooltip.

Summary

This concludes our tutorial on the Tableau Heat Map. Next, you can explore how to create a Pie Chart in Tableau to show the part-to-whole relationships in your data. I hope this article was helpful and that you have learned how to create a density heat map in Tableau. If you have any questions about the steps above, mention them in the comment section below.

Using a heatmap to visualise a confusion matrix, time-series movements, temperature changes, correlation matrix and SHAP interaction values

(Source: flaticon)

Heatmaps can bring your data to life. They are versatile and eye-catching, and they can highlight important relationships in your data in many situations. Specifically, we will discuss how you can use them to visualise:

  • A confusion matrix for model accuracy
  • Time-series data to show movement between groups
  • Time-series data to show temperature changes
  • A correlation matrix
  • Mean SHAP interaction values

Along the way, you will learn different ways to customise the heatmaps. We will discuss the code used to create them, and you can find the full project on GitHub.

What are Heatmaps?

Let's start by discussing what heatmaps are and why they are so useful. You can see an example in Figure 1. Variable 1 is on the y-axis. In this case, it can take 4 different values, and “V1–1” is the first value of variable 1. Similarly, variable 2 is on the x-axis. There is also a third variable, which is the value within each cell. The colour of each cell is determined by the value of this variable.

Figure 1: example heatmap (Source: author)

So using a heatmap we are able to visualise the relationships between 3 variables on a 2D plane. These relationships can be complicated. This is why colour is used. It can highlight important aspects of the relationship and make them easier to understand.

We should keep in mind that heatmaps are still limited. Variable 1 and variable 2 need to be discrete or categorical. Or, if they are continuous we need to be able to put them into groups. On the other hand, variable 3 needs to be a continuous variable. Hopefully, this will be clear when we discuss our 5 heatmaps below.
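To make the three-variable structure concrete, here is a minimal sketch with randomly generated data. The column names (var1, raw_x, value) and the bin edges are our own assumptions. Variable 1 is already categorical, while variable 2 starts out continuous and is grouped with pandas.cut before being used as an axis. The pivot table then turns the long-format rows into the 2D grid that seaborn's heatmap expects.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical long-format data: one row per observation
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "var1": rng.choice(["V1-1", "V1-2", "V1-3", "V1-4"], size=n),  # categorical
    "raw_x": rng.uniform(0, 100, size=n),  # continuous, must be grouped first
    "value": rng.normal(size=n),  # continuous third variable (cell colour)
})

# Group the continuous column into 4 bins so it can act as variable 2
df["var2"] = pd.cut(df["raw_x"], bins=[0, 25, 50, 75, 100],
                    labels=["V2-1", "V2-2", "V2-3", "V2-4"], include_lowest=True)

# Reshape into a 2D grid: rows = var1, columns = var2, cells = mean of value
grid = df.pivot_table(index="var1", columns="var2", values="value",
                      aggfunc="mean", observed=False)

fig, ax = plt.subplots()
sns.heatmap(grid, annot=True, fmt=".2f", cmap="coolwarm", ax=ax)
ax.set_xlabel("Variable 2")
ax.set_ylabel("Variable 1")
# plt.show()  # uncomment to display the plot when running as a script
```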

1) Confusion matrix

Our first heatmap, in Figure 2, is a visualisation of a confusion matrix. This comes from a model used to predict the language of a piece of text. The y-axis gives the actual language of the text, and the x-axis gives the language predicted by the model. The numbers on the diagonal give the counts of correct predictions, and the off-diagonal numbers give the counts of incorrect predictions. For example, English (eng) is incorrectly predicted as German (deu) 11 times.

Figure 2: confusion matrix (Source: author)

Visualising a confusion matrix like this is useful when your target variable has many classes. It can highlight where the model has gone wrong. For example, we see that the model most often confuses either Portuguese (por) for Spanish (spa) (124 times) or Spanish for Portuguese (84 times). This makes sense as, among all the languages, these two are the most lexically similar.

Heatmap code

To create this heatmap, we start by importing the required packages. The heatmap function comes from the seaborn package. We will be using the same packages for all 5 heatmaps, so make sure you have them installed.

The heatmap is populated using a 2D array whose values give the number of correct and incorrect predictions. The first subarray corresponds to the values in the first row of the heatmap in Figure 2. All of the heatmaps are populated using 2D arrays similar to this one, so if you want to reuse this code for another heatmap, you can replace this confusion matrix with your own 2D array.

For now, we have hardcoded the 2D array. The article below takes you through how we actually get these numbers. In short, we built a neural network using NLP techniques and then used this model to predict the language of the text in a test dataset. The numbers in the array come from these predictions.

Deep Neural Network Language Identification: Classify the language of a piece of text using a DNN and character tri-grams (towardsdatascience.com)
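The linked article covers the full pipeline. As a hedged illustration of the last step only, scikit-learn's confusion_matrix can turn lists of actual and predicted labels into this kind of 2D array. The labels below are a tiny made-up example, not the article's test data. Passing labels fixes the row and column order, and by scikit-learn's convention rows are actual classes and columns are predicted classes, which matches Figure 2.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted languages for a handful of test texts
languages = ["deu", "eng", "fra", "por", "spa"]
y_true = ["eng", "eng", "deu", "por", "spa", "spa", "fra", "por"]
y_pred = ["eng", "deu", "deu", "spa", "spa", "por", "fra", "por"]

# Rows = actual language, columns = predicted language
cm = confusion_matrix(y_true, y_pred, labels=languages)
conf_matrix_df = pd.DataFrame(cm, index=languages, columns=languages)
print(conf_matrix_df)
```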

Using this 2D array we create a pandas DataFrame (conf_matrix_df). We use the different languages as both the column and row names.

Lastly, we visualise this DataFrame using the seaborn heatmap function. Along with conf_matrix_df, we pass a few parameters. cmap sets the colour scheme, and setting it to ‘coolwarm’ gives us the red and blue cells. Setting annot to True writes the number in each cell. Without it, we would only have colours. fmt sets the format of these annotations (for example, ‘d’ for integers). We will see some variations of these parameters when creating the other heatmaps.

The last parameter is vmax, which defines the maximum value of the colour scale. If you do not pass a value for this parameter, it defaults to the largest value in the heatmap. In this case, that is the number of correct French (fra) predictions (i.e. 4999). We have set the value to 200 because this makes it easier to distinguish between the incorrect predictions. You can see what we mean in Figure 3, which was created using the default value for vmax.
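The sketch below puts these parameters together and draws the same matrix with and without vmax. The matrix is hypothetical and chosen for illustration. Only the eng→deu (11), por→spa (124) and spa→por (84) counts echo the numbers quoted above, and the other values are not the article's results. Side-by-side subplots are our own choice for the comparison.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical counts: rows = actual language, columns = predicted language
languages = ["deu", "eng", "fra", "por", "spa"]
conf_matrix = np.array([
    [950,  10,   5,   0,   2],
    [ 11, 940,   8,   1,   3],
    [  4,   6, 980,   2,   1],
    [  1,   2,   3, 820, 124],
    [  2,   1,   2,  84, 890],
])
conf_matrix_df = pd.DataFrame(conf_matrix, index=languages, columns=languages)

fig, (ax_default, ax_capped) = plt.subplots(1, 2, figsize=(12, 5))

# Default vmax: the scale tops out at the largest (diagonal) cell,
# so the small off-diagonal errors all get similar colours
sns.heatmap(conf_matrix_df, cmap="coolwarm", annot=True, fmt="d", ax=ax_default)
ax_default.set_title("Default vmax")

# vmax=200 caps the scale so differences among the errors stand out;
# any cell above 200 is drawn in the top colour
sns.heatmap(conf_matrix_df, cmap="coolwarm", annot=True, fmt="d",
            vmax=200, ax=ax_capped)
ax_capped.set_title("vmax=200")

for ax in (ax_default, ax_capped):
    ax.set_xlabel("Predicted")
    ax.set_ylabel("Actual")
# plt.show()  # uncomment to display the plot when running as a script
```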

Figure 3: confusion matrix with no vmax (Source: author)

2) Movement between groups

Our second heatmap shows how we can visualise the change in a categorical variable over time. Specifically, we show the air quality index (AQI) in cities in America. The y-axis gives the AQI levels in 2010, and the x-axis gives the levels in 2016. The cell values give the number of cities that moved from one level to another. For example, we can see that 20 cities improved from the ‘Unhealthy for Sensitive Groups’ level to the ‘Moderate’ level.

Figure 4: AQI levels through time (Source: author)

The AQI is a value between 0 and 500. The higher the value, the higher the level of air pollution. The AQI is calculated using 4 different pollutants: Nitrogen Dioxide (NO2), Sulphur Dioxide (SO2), Carbon Monoxide (CO) and Ozone (O3). To get the final AQI, we take the maximum AQI across these 4 pollutants. In Figure 5, you can see the AQI ranges for the different levels of concern. We have used these levels in our heatmap.

Figure 5: AQI levels (Source: AirNow)

To create the heatmap, we start by loading our dataset, which you can find on Kaggle. Readings are made daily, but we are only interested in the year of each reading, so we create a column with the year of the reading.

This is an example where the variable on the x- and y-axes was originally continuous. As mentioned, we need to group this variable. The aqiGroup function does this by returning a level based on the AQI value, using the same ranges as in Figure 5.
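The author's aqiGroup code is not reproduced here. Below is a minimal stand-in, aqi_group, which is our own hypothetical name. It uses the AirNow ranges (0–50 Good, 51–100 Moderate, 101–150 Unhealthy for Sensitive Groups, 151–200 Unhealthy, 201–300 Very Unhealthy, 301–500 Hazardous). The exact level labels the author used may differ.

```python
def aqi_group(aqi: float) -> str:
    """Map an AQI value (0-500) to its AirNow level of concern.

    Hypothetical stand-in for the article's aqiGroup function.
    """
    if aqi <= 50:
        return "Good"
    elif aqi <= 100:
        return "Moderate"
    elif aqi <= 150:
        return "Unhealthy for Sensitive Groups"
    elif aqi <= 200:
        return "Unhealthy"
    elif aqi <= 300:
        return "Very Unhealthy"
    else:
        return "Hazardous"


print(aqi_group(42))   # Good
print(aqi_group(124))  # Unhealthy for Sensitive Groups
```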

To get to our final 2D matrix, we need to do some data processing. We start by calculating the AQI value from the values of the 4 pollutants. Then, for each city, we calculate the maximum AQI in each year, so the values you see in the heatmap are based on the maximum AQI values in 2010 and 2016. Lastly, we use the aqiGroup function to group the AQI values.
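A sketch of these processing steps is below, using a few made-up rows. The column names ("City", "Date Local", "NO2 AQI", and so on) are our assumption about the Kaggle file's layout. For the grouping, we swap in pandas.cut with the AirNow bin edges instead of a per-value function. This vectorised alternative gives the same levels for AQI values in 0–500.

```python
import pandas as pd

# Hypothetical readings; column names are assumptions about the dataset
readings = pd.DataFrame({
    "City": ["Phoenix", "Phoenix", "Phoenix", "Denver", "Denver"],
    "Date Local": ["2010-01-01", "2010-06-01", "2016-03-01",
                   "2010-02-01", "2016-07-01"],
    "NO2 AQI": [40, 55, 30, 20, 25],
    "SO2 AQI": [10, 12, 8, 5, 6],
    "CO AQI": [15, 20, 10, 8, 9],
    "O3 AQI": [45, 120, 60, 35, 105],
})
readings["Year"] = pd.to_datetime(readings["Date Local"]).dt.year

# The AQI of a reading is the maximum across the four pollutants
pollutants = ["NO2 AQI", "SO2 AQI", "CO AQI", "O3 AQI"]
readings["AQI"] = readings[pollutants].max(axis=1)

# For each city, keep the maximum AQI recorded in each year
yearly = readings.groupby(["City", "Year"], as_index=False)["AQI"].max()

# Group into AirNow levels; (-1, 50] covers 0-50, (50, 100] covers 51-100, ...
bins = [-1, 50, 100, 150, 200, 300, 500]
levels = ["Good", "Moderate", "Unhealthy for Sensitive Groups",
          "Unhealthy", "Very Unhealthy", "Hazardous"]
yearly["Level"] = pd.cut(yearly["AQI"], bins=bins, labels=levels).astype(str)
print(yearly)
```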

We get all the AQI levels in 2016 and 2010 and then join these tables. In some cases, a city may have a reading in one year but not the other. When that happens, we replace the missing values with ‘No Reading’. The final dataset, AQI, contains a level in 2016 and 2010 for each city.
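One way to do this join is sketched below with made-up cities and levels. The column names and suffixes are our assumptions. An outer merge keeps cities that appear in only one year, and fillna then labels the gaps.

```python
import pandas as pd

# Hypothetical per-city levels for each year (e.g. output of the grouping step)
levels_2010 = pd.DataFrame({"City": ["Denver", "Phoenix", "Tulsa"],
                            "Level": ["Good", "Unhealthy", "Moderate"]})
levels_2016 = pd.DataFrame({"City": ["Denver", "Phoenix", "Boise"],
                            "Level": ["Moderate", "Moderate", "Good"]})

# Outer join keeps cities that only have a reading in one of the two years
AQI = levels_2010.merge(levels_2016, on="City", how="outer",
                        suffixes=("_2010", "_2016"))

# Missing years get an explicit 'No Reading' level
AQI[["Level_2010", "Level_2016"]] = AQI[["Level_2010", "Level_2016"]].fillna("No Reading")
print(AQI)
```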

Now that we have this dataset, we can use it to create the 2D array, hm_array, which is used to populate the heatmap. It has the same structure as the hardcoded array we saw for the first heatmap. For each combination of levels, we count the number of records in the AQI dataset. As before, we create a DataFrame from this 2D array, using the AQI levels as both the column and row names.
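The counting step can be sketched with two nested loops, as below. The small AQI table and the Level_2010/Level_2016 column names are our own hypothetical example.

```python
import pandas as pd

# Hypothetical joined dataset: one row per city with its level in each year
AQI = pd.DataFrame({
    "Level_2010": ["Good", "Moderate", "Moderate", "No Reading"],
    "Level_2016": ["Good", "Good", "Moderate", "Good"],
})
levels = ["Good", "Moderate", "No Reading"]

# hm_array[i][j] = number of cities at levels[i] in 2010 and levels[j] in 2016
hm_array = []
for level_2010 in levels:
    row = []
    for level_2016 in levels:
        match = (AQI["Level_2010"] == level_2010) & (AQI["Level_2016"] == level_2016)
        row.append(int(match.sum()))
    hm_array.append(row)

# Rows (y-axis) = 2010 level, columns (x-axis) = 2016 level
hm_df = pd.DataFrame(hm_array, index=levels, columns=levels)
print(hm_df)
```

pd.crosstab(AQI["Level_2010"], AQI["Level_2016"]) gives the same counts in one call. However, levels with no cities would then be missing and would need to be added back with reindex.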

Finally, we create the heatmap as before, this time with different parameter values. We use a different colour scheme, cmap, and set cbar to False to hide the colour bar. We also use the linewidths and linecolor parameters to give the heatmap black gridlines.
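Here is a minimal sketch of those parameters. The counts are made up, and "Blues" is only our guess at a different colour scheme, since the article does not say which cmap it used.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical counts of cities moving between levels (rows = 2010, columns = 2016)
levels = ["Good", "Moderate", "Unhealthy"]
hm_df = pd.DataFrame([[30, 5, 0],
                      [12, 40, 3],
                      [1, 8, 6]], index=levels, columns=levels)

fig, ax = plt.subplots()
sns.heatmap(
    hm_df,
    cmap="Blues",       # a different colour scheme (our choice)
    annot=True,
    fmt="d",
    cbar=False,         # hide the colour bar
    linewidths=1,       # width of the lines between cells
    linecolor="black",  # black gridlines
    ax=ax,
)
ax.set_ylabel("AQI level in 2010")
ax.set_xlabel("AQI level in 2016")
# plt.show()  # uncomment to display the plot when running as a script
```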

3) Temperature changes through time

Like the last heatmap, this one visualises time-series data, but now we show how a continuous variable changes over time. In Figure 6, you can see average global temperatures over time. There is a reading for every month from 1900 to 2016. You can clearly see the impact of climate change in the later years. Perhaps we have taken the term heatmap a bit too literally.

Figure 6: average temperature through time (Source: author)

We start by loading our dataset, which you can find on datahub. The dataset contains temperature readings from two different sources, and we select only the GISTEMP readings. We then create columns for the year and month of each reading.

Just as before, we create a 2D array to populate the heatmap. In the previous heatmaps, all the 2D arrays were square, with the same labels on both axes, but this does not have to be the case. For this heatmap, there is a subarray for each month (i.e. 1 to 12), and each subarray contains a temperature value for each year from 1900 to 2016. This gives us a 12x117 array. We create a DataFrame using the years as column names and the months as row names.
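The reshaping can be sketched with DataFrame.pivot. Our stand-in data is random, and the column names "Source", "Date" and "Mean" are assumptions about the datahub file, not confirmed details. Filtering to one source leaves one reading per month and year, so pivot produces the 12 x 117 grid directly.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly readings in long format (column names are assumptions)
dates = pd.date_range("1900-01-01", "2016-12-01", freq="MS")  # month starts
rng = np.random.default_rng(1)
temps = pd.DataFrame({
    "Source": "GISTEMP",
    "Date": dates,
    "Mean": rng.normal(0, 0.3, len(dates)),
})

# Keep one source, then split each date into a year and a month
temps = temps[temps["Source"] == "GISTEMP"].copy()
temps["Year"] = temps["Date"].dt.year
temps["Month"] = temps["Date"].dt.month

# Rows = months (1-12), columns = years (1900-2016): a 12 x 117 grid
temp_df = temps.pivot(index="Month", columns="Year", values="Mean")
print(temp_df.shape)  # (12, 117)
```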

We visualise this DataFrame just as before. The biggest difference is that we set the xticklabels parameter to 10, so only every 10th label on the x-axis is displayed. You can see this in Figure 6, where only the labels for 1900, 1910, 1920, etc. are displayed.
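A short sketch of the xticklabels setting is below, drawn on a random 12 x 117 grid that stands in for the real temperatures. The figure size and cmap are our own choices.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical 12 x 117 grid: months (rows) x years 1900-2016 (columns)
years = list(range(1900, 2017))
rng = np.random.default_rng(2)
temp_df = pd.DataFrame(rng.normal(0, 0.3, (12, len(years))),
                       index=range(1, 13), columns=years)

fig, ax = plt.subplots(figsize=(15, 4))
# xticklabels=10 uses the column names but labels only every 10th column
sns.heatmap(temp_df, cmap="coolwarm", xticklabels=10, ax=ax)
ax.set_xlabel("Year")
ax.set_ylabel("Month")
# plt.show()  # uncomment to display the plot when running as a script
```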

4) Correlation matrix

Our fourth heatmap may be one you’ve seen before. A common use is to visualise correlations in a dataset. For example, we have the correlation matrix of a house price dataset in Figure 7. We can use this to identify any multicollinearity that may cause issues in our model. For example, X3 and X4 are negatively correlated. The last row also gives the correlations with the target variable, Y. We can use this to understand if any of the features have significant relationships with Y.

Figure 7: correlation matrix (Source: author)

To create this heatmap, we start by loading our dataset, which you can find in UCI's machine learning repository. Using this dataset, we then create a correlation matrix. The result is a pandas DataFrame whose column and row names are the names of the features in the dataset.
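DataFrame.corr computes the pairwise (Pearson, by default) correlations. In the sketch below, the features X1–X4 and the target Y are random stand-ins for the house price data. We deliberately constructed X4 to be negatively correlated with X3 to mirror the pattern described for Figure 7.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the house price data: features X1-X4 and target Y
rng = np.random.default_rng(3)
n = 100
X3 = rng.normal(size=n)
data = pd.DataFrame({
    "X1": rng.normal(size=n),
    "X2": rng.normal(size=n),
    "X3": X3,
    "X4": -X3 + rng.normal(scale=0.3, size=n),  # built to correlate negatively with X3
})
data["Y"] = 2 * data["X1"] - data["X4"] + rng.normal(size=n)

# Pairwise Pearson correlations; rows and columns are labelled with the column names
corr = data.corr()
print(corr.round(2))
```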

You may have noticed that, in Figure 7, the cells above the diagonal are blank. To do this, we first need to create a mask. This is a 2D array similar to the ones we used to populate the previous heatmaps. Seaborn hides every cell where the mask is True, so the values should be True for the cells you want to leave blank and False for the cells you want to show.
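Such a mask can be built with NumPy's triu, as in the sketch below, which uses a small hypothetical correlation matrix. k=1 starts the upper triangle one step above the diagonal, so the diagonal itself stays visible, as in Figure 7.

```python
import numpy as np
import pandas as pd

# A hypothetical 3 x 3 correlation matrix to build the mask for
corr = pd.DataFrame([[1.0, 0.2, -0.5],
                     [0.2, 1.0, 0.1],
                     [-0.5, 0.1, 1.0]],
                    index=["X1", "X2", "X3"], columns=["X1", "X2", "X3"])

# True = hide the cell; everything strictly above the diagonal is hidden
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)
print(mask)
```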

Finally, we can display our heatmap. The only difference for this one is that we need to pass the mask as a parameter.
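Below is a self-contained sketch of the final call on random data. Fixing vmin=-1 and vmax=1 so the colour scale covers the full range of correlations is our own choice, not something stated in the article.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data and its correlation matrix
rng = np.random.default_rng(4)
data = pd.DataFrame(rng.normal(size=(100, 4)), columns=["X1", "X2", "X3", "X4"])
corr = data.corr()

# Hide the cells strictly above the diagonal
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

fig, ax = plt.subplots()
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, ax=ax)
# plt.show()  # uncomment to display the plot when running as a script
```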

5) SHAP interaction values

Our last heatmap can be used to highlight features that are important for model predictions. It is created by taking the average SHAP interaction values. It shows the average main effects on the diagonal. For example, we can see that the main effect is large for experience, degree, performance and sales. Similarly, the average interaction effects are on the off-diagonal. We can see that the experience.degree and performance.sales interaction effects are significant.

Figure 8: mean SHAP interaction values (Source: author)

We won’t go over the code used to create this heatmap. If you are interested you can find it in the article below. We go into depth on SHAP interaction values. We also create and interpret other plots using these values. These are used to interpret your machine learning models.

Analysing Interactions with SHAP: Using the SHAP Python package to identify and visualise interactions in your data (towardsdatascience.com)
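The author's own code is in the linked article. The following is a rough, independent sketch of the averaging step only. It assumes you already have SHAP interaction values as an array of shape (n_samples, n_features, n_features), for example from shap.TreeExplainer(model).shap_interaction_values(X). Here, a random symmetric array stands in for them. Taking absolute values before averaging, so that positive and negative effects don't cancel out, is our assumption about what "average" means here.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical interaction values: (n_samples, n_features, n_features)
features = ["experience", "degree", "performance", "sales"]
rng = np.random.default_rng(5)
interaction_values = rng.normal(size=(500, 4, 4))
# Real SHAP interaction matrices are symmetric per sample; symmetrise the toy data
interaction_values = (interaction_values + interaction_values.transpose(0, 2, 1)) / 2

# Mean absolute value over samples: diagonal = main effects,
# off-diagonal = interaction effects
mean_abs = np.abs(interaction_values).mean(axis=0)
mean_df = pd.DataFrame(mean_abs, index=features, columns=features)

fig, ax = plt.subplots()
sns.heatmap(mean_df, annot=True, fmt=".2f", cmap="coolwarm", ax=ax)
# plt.show()  # uncomment to display the plot when running as a script
```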

I hope you found this article helpful!

Image Sources

All images are my own or obtained from www.flaticon.com. In the case of the latter, I have a “Full license” as defined under their Premium Plan.

What is a heatmap in Python?

A heatmap is a visualisation or mapping that displays data using different colour representations. Usually, the higher the value of a data group, the darker its colour, which is commonly shown in red.

How does a heatmap work in visual communication design?

A heatmap can also be described as a visualisation or mapping that displays data through different colour representations. In a heatmap, the higher the value of a data group, the darker its colour, and this is usually represented in red.

Why use a heatmap?

A heatmap is a technique for understanding how visitors behave on a website by providing statistical data. With this tool, we can see which areas of our site are engaging and frequently visited, which helps us identify weaknesses in the site's layout.