How to use data table visualization in Python

Often when visualizing data using a bar chart, you’ll have to make a decision about the orientation of your bars. While there are no concrete rules, there are quite a few factors that can go into making this decision. For example, when grouping your data by an ordinal variable, you may want to display those groupings along the x-axis. On the other hand, when grouping your data by a nominal variable, or a variable that has long labels, you may want to display those groupings horizontally to aid in readability.

This recipe will show you how to create a horizontal bar chart using Python. Specifically, you’ll be using the pandas plot() method, which is a wrapper around the matplotlib pyplot API.

In this example, you'll be using the publicly available San Francisco bike share trip dataset to identify the top 15 bike stations with the highest average trip durations. You will then visualize these average trip durations using a horizontal bar chart. The steps in this recipe are divided into the following sections: Data Wrangling, Data Analysis, and Data Visualization.

You can find implementations of all of the steps outlined below in this example Mode report. Let’s get started.

Data Wrangling

You’ll use SQL to wrangle the data you’ll need for the analysis. For this example, you’ll be using the sf_bike_share_trips dataset available in Mode's Public Data Warehouse. Make sure your data source is set to the Mode Public Warehouse data source and run the following query to wrangle your data:

select *
from modeanalytics.sf_bike_share_trips

Once the SQL query has completed running, rename your SQL query to SF Bike Share Trip Rankings so that you can easily identify it within the Python notebook.


Data Analysis

Now that you have your data wrangled, you’re ready to move over to the Python notebook to prepare your data for visualization. Inside of the Python notebook, start by importing the Python modules that you'll be using throughout the remainder of this recipe:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

Mode pipes the results of your SQL queries into pandas dataframes, which you can look up by query name in the variable datasets. You can use the following line of Python to access the results of your SQL query as a dataframe and assign them to a new variable:

df = datasets['SF Bike Share Trip Rankings']

As previously mentioned, your goal is to visualize the 15 start stations with the highest average trip duration. You can analyze the dataframe to find these stations using the following method chain on our existing dataframe object:

x = df.groupby('start_station_name')['duration'].mean().sort_values().tail(15)
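To see what each link of that method chain does, here is a minimal sketch on a toy table. The station names and durations are made up for illustration, and top_n is a hypothetical stand-in for the 15 used in the recipe.

```python
import pandas as pd

# Toy stand-in for the trips table (station names and durations are made up)
trips = pd.DataFrame(
    {
        "start_station_name": ["A", "A", "B", "B", "C", "C"],
        "duration": [100, 300, 50, 150, 900, 1100],
    }
)

top_n = 2  # the recipe uses 15

avg_duration = (
    trips.groupby("start_station_name")["duration"]  # one group per station
    .mean()          # average duration per station
    .sort_values()   # ascending, so the largest averages end up last
    .tail(top_n)     # keep the top_n largest
)
print(avg_duration)
```

Because the result is sorted in ascending order, the station with the highest average comes last, which is the row a horizontal bar chart draws at the top.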

You now have a new pandas Series assigned to the variable x that contains the top 15 start stations with the highest average trip durations. Now that the data is aggregated, you are ready to visualize it.

Data Visualization

To create a horizontal bar chart, you will use the pandas plot() method. You can specify that you would like a horizontal bar chart by passing barh to the kind argument:

x.plot(kind='barh')

With the default settings, pandas returns a plain horizontal bar chart.


You can use a bit of matplotlib styling functionality to further customize and clean up the appearance of your visualization.
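One possible styling pass is sketched below. It is an illustration rather than the recipe's own code: the toy Series stands in for x, and the colors, figure size, and axis label text are assumptions. It uses the StrMethodFormatter import from earlier to add thousands separators to the x-axis ticks.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import StrMethodFormatter

# Toy stand-in for the aggregated Series x (values are made up)
x = pd.Series(
    [1200.0, 1850.5, 2400.25],
    index=["Station A", "Station B", "Station C"],
    name="duration",
)

ax = x.plot(kind="barh", figsize=(8, 4), color="#86bf91", zorder=2, width=0.85)

# Remove the box around the plot
for side in ("top", "right", "left", "bottom"):
    ax.spines[side].set_visible(False)

# Light dashed vertical grid lines drawn behind the bars
ax.xaxis.grid(True, linestyle="dashed", alpha=0.4, color="#999999", zorder=1)

# Axis labels (the label text is an assumption)
ax.set_xlabel("Average Trip Duration", labelpad=12)
ax.set_ylabel("Start Station", labelpad=12)

# Thousands separators on the x-axis ticks
ax.xaxis.set_major_formatter(StrMethodFormatter("{x:,g}"))

plt.tight_layout()
```

In a notebook the figure renders inline, so the Agg backend line can be dropped there.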

Ever thought you could build a real-time dashboard in Python without writing a single line of HTML, CSS, or JavaScript?

Yes, you can! In this post, you’ll learn:

  1. How to import the required libraries and read input data
  2. How to do a basic dashboard setup
  3. How to design a user interface
  4. How to refresh the dashboard for real-time or live data feed
  5. How to auto-update components

Can’t wait and want to jump right in? Here's the code repo and the video tutorial.

What’s a real-time live dashboard?

A real-time live dashboard is a web app used to display Key Performance Indicators (KPIs).

If you want to build a dashboard to monitor the stock market, IoT Sensor Data, AI Model Training, or anything else with streaming data, then this tutorial is for you.


1. How to import the required libraries and read input data

Here are the libraries that you’ll need for this dashboard:

  • Streamlit (st). As you might’ve guessed, you’ll be using Streamlit for building the web app/dashboard.
  • time, NumPy (np). Because you don’t have a live data source, you’ll need to simulate a live data feed. Use NumPy to generate data and make it live (looped) with the time library (unless you already have a live data feed).
  • Pandas (pd). You’ll use pandas to read the input data source. In this case, you’ll use a Comma Separated Values (CSV) file.
  • Plotly Express (px). You’ll use it for the interactive charts.

Go ahead and import all the required libraries:

import time  # to simulate a real time data, time loop

import numpy as np  # np mean, np random
import pandas as pd  # read csv, df manipulation
import plotly.express as px  # interactive charts
import streamlit as st  # 🎈 data web app development

You can read your input data from a CSV file by using pd.read_csv(). But remember, this data source could be streaming from an API, a JSON or an XML object, or even a CSV that gets updated at regular intervals.

Next, add the pd.read_csv() call within a new function get_data() so that it gets properly cached.

What's caching? It's simple. Adding the decorator @st.experimental_memo will make the function get_data() run once. Then every time you rerun your app, the data will stay memoized! This way you can avoid downloading the dataset again and again. In recent Streamlit releases, st.cache_data replaces this decorator. Read more about caching in Streamlit docs.

dataset_url = "https://raw.githubusercontent.com/Lexie88rus/bank-marketing-analysis/master/bank.csv"

# read csv from a URL
@st.experimental_memo
def get_data() -> pd.DataFrame:
    return pd.read_csv(dataset_url)

df = get_data()
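If memoization is new to you, the idea can be sketched without Streamlit. The example below uses functools.lru_cache from the standard library as a stand-in, not Streamlit's decorator, and load_rows and its return value are hypothetical. It only shows that the function body runs once while later calls are served from the cache.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive body actually runs


@lru_cache(maxsize=None)
def load_rows(source: str) -> tuple:
    """Pretend to do an expensive download; the result is cached per `source`."""
    global call_count
    call_count += 1
    return (("admin.", 30), ("technician", 41))


first = load_rows("bank.csv")
second = load_rows("bank.csv")  # served from the cache, the body does not run again
print(call_count)
```

The Streamlit decorator applies the same idea to get_data(), so the CSV is not downloaded again on each rerun of the script.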


2. How to do a basic dashboard setup

Now let’s set up a basic dashboard. Use st.set_page_config() with parameters serving the following purposes:

  • page_title sets the web app title, shown in the HTML tag <title> and in the browser tab
  • page_icon sets the favicon (also shown in the browser tab)
  • layout="wide" renders the web app/dashboard with a wide-screen layout

st.set_page_config(
    page_title="Real-Time Data Science Dashboard",
    page_icon="✅",
    layout="wide",
)

3. How to design a user interface

A typical dashboard contains the following basic UI design components:

  • A page title
  • A top-level filter
  • KPIs/summary cards
  • Interactive charts
  • A data table

Let’s drill into them in detail.

Page title

The title is rendered as the <h1> tag. To display the title, use st.title(). It’ll take the string “Real-Time / Live Data Science Dashboard” and display it as the page title.

# dashboard title
st.title("Real-Time / Live Data Science Dashboard")

Top-level filter

First, create the filter by using st.selectbox(). It’ll display a dropdown with a list of options. To generate it, take the unique elements of the job column from the dataframe df. The selected item is saved in an object named job_filter:

# top-level filters
job_filter = st.selectbox("Select the Job", pd.unique(df["job"]))

Now that your filter UI is ready, use job_filter to filter your dataframe df.

# dataframe filter
df = df[df["job"] == job_filter]

KPIs/summary cards

Before you can design your KPIs, divide your layout into a 3 column layout by using st.columns(3). The three columns are kpi1, kpi2, and kpi3. st.metric() helps you create a KPI card. Use it to fill one KPI in each of those columns.

The label argument of st.metric() displays the KPI title. The value argument shows the actual metric, and add-ons like delta let you compare the KPI value with the KPI goal.

# create three columns
kpi1, kpi2, kpi3 = st.columns(3)

# fill in those three columns with respective metrics or KPIs
kpi1.metric(
    label="Age ⏳",
    value=round(avg_age),
    delta=round(avg_age) - 10,
)

kpi2.metric(
    label="Married Count 💍",
    value=int(count_married),
    delta=-10 + count_married,
)

kpi3.metric(
    label="A/C Balance $",
    value=f"$ {round(balance,2)} ",
    delta=-round(balance / count_married) * 100,
)
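The KPI snippet above reads avg_age, count_married, and balance, which are not defined in the code shown so far. One plausible way to compute them is sketched below on a toy dataframe. The definitions are assumptions rather than the author's code, and the age_new and balance_new columns are the simulated columns created in step 4.

```python
import numpy as np
import pandas as pd

# Toy rows with the columns the KPI cards read (values are made up)
df = pd.DataFrame(
    {
        "age_new": [30, 40, 50],
        "balance_new": [1000.0, 2000.0, 3000.0],
        "marital": ["married", "single", "married"],
    }
)

avg_age = np.mean(df["age_new"])                          # average simulated age
count_married = int((df["marital"] == "married").sum())   # number of married clients
balance = np.mean(df["balance_new"])                      # average simulated balance
```

Note that the third card's delta divides balance by count_married, so a guard for a zero count may be worth adding when a filter selection contains no married clients.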

Interactive charts

Split your layout into 2 columns and fill them with charts. Unlike the metric above, use the with clause to fill the interactive charts in the respective columns:

  • Density_heatmap in fig_col1
  • Histogram in fig_col2

# create two columns for charts
fig_col1, fig_col2 = st.columns(2)

with fig_col1:
    st.markdown("### First Chart")
    fig = px.density_heatmap(
        data_frame=df, y="age_new", x="marital"
    )
    st.write(fig)
   
with fig_col2:
    st.markdown("### Second Chart")
    fig2 = px.histogram(data_frame=df, x="age_new")
    st.write(fig2)

Data table

Use st.dataframe() to display the data frame. Remember, your data frame gets filtered based on the filter option selected at the top:

st.markdown("### Detailed Data View")
st.dataframe(df)

4. How to refresh the dashboard for real-time or live data feed

Since you don’t have a real-time or live data feed yet, you’re going to simulate one from your existing data frame.

To simulate it, use a for loop from 0 to 200 seconds (as an option, on every iteration you’ll have a one-second sleep/pause):

for seconds in range(200):

    df["age_new"] = df["age"] * np.random.choice(range(1, 5))
    df["balance_new"] = df["balance"] * np.random.choice(range(1, 5))
    time.sleep(1)

Inside the loop, use NumPy's random.choice to generate a random integer from 1 to 4 (range(1, 5) excludes 5). Use it as a multiplier to randomize the values of the age and balance columns that you’ve used for your metrics and charts.
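The randomization step can also be written as a small function, which makes it easier to check in isolation. This sketch swaps np.random.choice for a seeded np.random.default_rng generator so the output is repeatable, and it returns a copy instead of modifying the input. The simulate_tick name and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seeded only to make the sketch repeatable


def simulate_tick(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with age_new/balance_new scaled by random integers in 1..4."""
    out = df.copy()
    out["age_new"] = out["age"] * rng.integers(1, 5)  # upper bound 5 is exclusive
    out["balance_new"] = out["balance"] * rng.integers(1, 5)
    return out


base = pd.DataFrame({"age": [30, 45], "balance": [100, 250]})
tick = simulate_tick(base)
print(tick)
```

Working on a copy keeps the original columns untouched, which matters when the dataframe comes from a cached function.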

5. How to auto-update components

Now you know how to build a Streamlit web app!

To display the live data feed with auto-updating KPIs/Metrics/Charts, put all these components inside a single-element container using st.empty(). Assign it to a variable named placeholder.

Put your components inside the placeholder by using a with clause. This way you’ll replace them in every iteration of the data update. Use placeholder.container() in the with clause so that the block can hold all the UI components you created above.

The full code is the snippets from sections 1 to 5 combined into one script: the imports, the data loading, the page setup, the title and filter, and the update loop that wraps the KPIs, charts, and data table.

To run this dashboard on your local computer:

  1. Save the code as a single Python file, for example app.py.
  2. Open your Terminal or Command Prompt in the folder where app.py is stored.
  3. Execute streamlit run app.py to start the dashboard on your localhost. The link is displayed in your Terminal and also opens in a new tab in your default browser.

Wrapping up

Congratulations! You have learned how to build your own real-time live dashboard with Streamlit. I hope you had fun along the way.

If you have any questions, please leave them in the comments or reach out to me on LinkedIn.

What are the steps in visualizing data?

Steps for creating a data visualization:

  1. Define the question you want the data to answer.
  2. Understand the data and decide on its visual form.
  3. Identify the message you want to convey.
  4. Choose the visual form to use.
  5. Get creative with colors and shapes.

Why is data visualization needed?

Data visualization provides information that is very useful for business purposes. Decision makers in a company can easily see and understand how the company is performing, based on the variables available.

What is data visualization?

Data visualization is the set of processes a data analyst carries out to present data or information in a form that is easy for a lay audience to understand, such as charts, figures, and so on. Using visual data offers many benefits.

Which Python library is used to create data visualizations?

Matplotlib is a Python library used to create data visualizations that are more engaging and easier to understand.