Cara menggunakan mysql optimize query

Odds are, if you're doing any commercial programming, you've had to interact with a SQL database. They're a staple in programming due to how easy it is to store and retrieve data on them and the many free and open-source options available. Still, people have trouble figuring out how to optimize their queries correctly, so let's fix that.

While this text is focused on MySQL specifically, most of the discussion is also true for other databases. So, even if you don't use MySQL, you should learn something new that is going to be helpful on the SQL database you're using right now.

The database we'll use here is the employees db from MySQL itself. This repo has a docker-compose setup on how to start and run the database so you can run the commands along.

EXPLAIN to me

The first thing you have to learn to optimize queries in relational databases is to use the

SELECT * FROM employees WHERE first_name = "Joe";
4 command.
SELECT * FROM employees WHERE first_name = "Joe";
4 runs the query you've created through the database engine to check what it thinks it's going to take to run the command. It won't run the command, but it checks how many rows it believes it will need to access to satisfy the query.

So given the following table:

CREATE TABLE employees (
    emp_no      INT             NOT NULL,
    birth_date  DATE            NOT NULL,
    first_name  VARCHAR(14)     NOT NULL,
    last_name   VARCHAR(16)     NOT NULL,
    gender      ENUM ('M','F')  NOT NULL,    
    hire_date   DATE            NOT NULL,
    PRIMARY KEY (emp_no)
);

Enter fullscreen mode Exit fullscreen mode

And a query that finds all the employees named

SELECT * FROM employees WHERE first_name = "Joe";
6:

SELECT * FROM employees WHERE first_name = "Joe";

Enter fullscreen mode Exit fullscreen mode

You running a query with

SELECT * FROM employees WHERE first_name = "Joe";
4 before it all, so an
SELECT * FROM employees WHERE first_name = "Joe";
4 on the previous query would be:

EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 

Enter fullscreen mode Exit fullscreen mode

The

SELECT * FROM employees WHERE first_name = "Joe";
9 here is specific to MySQL to say you want the data printed in rows. Here's what it prints:

mysql> EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299645
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

Enter fullscreen mode Exit fullscreen mode

So when we ask the database what it thinks about this query, its response is a bit harsh. It says it has to look for 299645 rows (this table has 300024 entries, so it's almost every single row) to find out there are no employees named

SELECT * FROM employees WHERE first_name = "Joe";
6. There are no keys to be used (keys here would be database indexes), and we're using a
EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
1 clause to filter the results.

So, we know we have to find users given the first name, and the

SELECT * FROM employees WHERE first_name = "Joe";
4 response shows there are no indexes on this specific column, let's add one:

CREATE INDEX employees_first_name_idx ON employees (first_name);

Enter fullscreen mode Exit fullscreen mode

Now running

SELECT * FROM employees WHERE first_name = "Joe";
4 again:

mysql> EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ref
possible_keys: employees_first_name_idx
          key: employees_first_name_idx
      key_len: 58
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

Enter fullscreen mode Exit fullscreen mode

A single row checked!

How did this happen?

When you ask for the database to create an index on a column, it creates an optimized structure that allows you to find all rows associated with a specific value quickly. Think about it as if it maps a value ("Joe") to all rows that have that value on the column you created the index.

Indexes are the magical solution. We should create indexes for every column in the table, and we should be good, right?

"You should have an index for all columns you use for querying a table"

This is a pretty common misconception of how indexes work in relational databases. If you have an index for every column, every query is automatically optimized as the database can use all these indexes to find the rows, but this is not true.

MySQL can, in some cases, use more than one index when querying data. If we create separate indexes on

EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
4 and
EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
5 and try to find a specific user with values for both, we get this:

mysql> EXPLAIN SELECT * FROM employees WHERE last_name = 'Halloran' AND first_name = 'Aleksander'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: index_merge
possible_keys: employees_first_name_idx,employees_last_name_idx
          key: employees_last_name_idx,employees_first_name_idx
      key_len: 66,58
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using intersect(employees_last_name_idx,employees_first_name_idx); Using where

Enter fullscreen mode Exit fullscreen mode

So, we're still doing good, only one row checked, but the extra field has an interesting reference,

EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
6. MySQL sees two indexes here and decides to use them both for the query. It's still better than not having an index, but we're going through two separate data structures to find the value instead of a single one.

If we now get an index on both

EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
5 and
EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
4, this is the output:

mysql> EXPLAIN SELECT * FROM employees WHERE last_name = 'Halloran' AND first_name = 'Aleksander'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ref
possible_keys: employees_first_name_idx,employees_last_name_idx,employees_last_first_name_idx
          key: employees_last_first_name_idx
      key_len: 124
          ref: const,const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

Enter fullscreen mode Exit fullscreen mode

The server digs into a single index to find the row that matches the expected value instead of two. Having multiple rows in an index also helps with covering indexes, a feature we'll discuss later.

Picking index columns

Picking indexes is about how you query the table and what fields are part of the queries. When looking for employees, we want to find them by first and last name, so we have an index on both of them. When creating an index for multiple columns, the order of the columns matters.

Given we have an employee called

EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
9, our index would have it referenced under the entry
mysql> EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299645
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
0 (the index is
mysql> EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299645
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
1), so it only works for queries where you are looking for
EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
5 and
EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
4 or
EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
5 alone. A query that only looks for
EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
4 can't use this index as the index matches left-to-right. For that, you'd need an index that starts with
EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
4.

It's also not valid for a query where you need to find someone given a

EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
5 or a
EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
4, as you'd need both columns to be the leftmost columns in the indexes used, this would be a case where having separate indexes on the columns would be helpful. MySQL would run a union on both indexes to perform the query.

Now, if you're querying all or most columns in the index, what order should they have?

The columns with the biggest variety in values should come first. You can quickly calculate an average of how common values are in a column with a query like the following:

mysql> SELECT COUNT(DISTINCT emp_no)/COUNT(*) FROM employees;
+---------------------------------+
| COUNT(DISTINCT emp_no)/COUNT(*) |
+---------------------------------+
|             1.0000              |
+---------------------------------+

Enter fullscreen mode Exit fullscreen mode

The primary key is the perfect example. Every row has a unique value, so we get 1. We now want the columns we'll use in the index to be as close to 1 as we can, let's look at

EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
4:

mysql> SELECT COUNT(DISTINCT first_name)/COUNT(*) FROM employees;
+-------------------------------------+
| COUNT(DISTINCT first_name)/COUNT(*) |
+-------------------------------------+
|               0.0042 |
+-------------------------------------+

Enter fullscreen mode Exit fullscreen mode

Then look at

EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
5:

SELECT * FROM employees WHERE first_name = "Joe";
0

Enter fullscreen mode Exit fullscreen mode

So

EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
5 gives us better filtering than
EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
4, making it the best column on the index's left side. Still, this is an average, and as you might have found out the hard way in other places, averages are good at hiding outliers. One here would impact query performance directly so we also want to check if there are visible outliers on these columns:

SELECT * FROM employees WHERE first_name = "Joe";
1

Enter fullscreen mode Exit fullscreen mode

So no outliers here. Values seem to be pretty close to each other. Let's look at

EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G 
4:

SELECT * FROM employees WHERE first_name = "Joe";
2

Enter fullscreen mode Exit fullscreen mode

Not bad either. Values aren't that far from each other. The order we decided on for the index is a pretty good one.

Now, something else you need to account for when creating indexes is how you will sort the results?

Just like the database uses the index to find rows quickly, it can also sort the results if you're sorting on the same columns in the index order. For our

mysql> EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299645
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
1 index it would mean to
CREATE INDEX employees_first_name_idx ON employees (first_name);
5 or
CREATE INDEX employees_first_name_idx ON employees (first_name);
6. If multiple columns are sorted, they all have to be in the same direction, so either all
CREATE INDEX employees_first_name_idx ON employees (first_name);
7 or all
CREATE INDEX employees_first_name_idx ON employees (first_name);
8. If they're not all in the same order the database can't use the index itself to sort the results and would have to resort to temporary tables to load the results and sort them.

Covering indexes

And another important reason to have indexes that hold multiple columns is the covering indexes feature that MySQL provides. When your query only loads the primary key and columns in the index, the database doesn't even have to look at the tables to read the results. It reads everything from the index alone. Here's what it looks like:

SELECT * FROM employees WHERE first_name = "Joe";
3

Enter fullscreen mode Exit fullscreen mode

The hint here is

CREATE INDEX employees_first_name_idx ON employees (first_name);
9, which means all the data is read from the index alone. As the index already contains all the information we need (all indexes include the primary key for the table), the database loads everything from it and returns, not even reaching out to the table. This is the best-case scenario for queries, especially if your index fits into memory.

Creating multiple indexes

Creating indexes isn't free. While they make it faster for us to find data, they also slow down any changes to the table as writing to columns with indexes will cause these indexes to be updated. So you have to strike a balance between making as many queries as possible fast but also allowing for fast

mysql> EXPLAIN SELECT * FROM employees WHERE first_name = "Joe"\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ref
possible_keys: employees_first_name_idx
          key: employees_first_name_idx
      key_len: 58
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)
0 commands.

Summary

So, when optimizing, remember to:

  • Use
    SELECT * FROM employees WHERE first_name = "Joe";
    
    4 to find out how the database thinks your query will perform - check the command docs here;
  • Create indexes that cover the filtering and ordering you want to perform on your queries;
  • Evaluate the best order for the columns when creating indexes;
  • Try to load as much information from covering indexes as possible;

One of the best references to optimizing MySQL databases is the High Performance MySQL that is in its 4th edition and covers from database internals to how to design your database schema to make the most of MySQL. If you're running apps on MySQL, you should read it. If you're not using MySQL, it's very likely there is a book just like this one for it as well, and you should invest some time in reading it.

Bagaimana cara melakukan optimasi query pada MySQL?

Cara optimalisasi database mysql.
Menggunakan SELECT nama_kolom sebagai ganti SELECT * ... .
Buatlah kolom yang penting saja dan minimalisir membuat banyak kolom. ... .
Hindari penggunaan DISTINCT pada syntax SQL. ... .
Hindari mencari data menggunakan Cartesian atau CROSS JOIN. ... .
Gunakan wildcard atau % pada akhiran kata saja..

Langkah langkah optimasi query?

Ringkasan dari Tips Optimasi Query.
Gunakan kode seragam di seluruh aplikasi standar..
Hindari ketidakcocokan jenis data untuk indeks kolom..
Hindari fungsi pada kolom indeks..
Pindahkan kondisi dari klausa HAVING ke klausa WHERE..
Gunakan joins bukan nested selects, jika memungkinkan..

Apa itu optimasi query?

Optimasi Query adalah suatu proses untuk menganalisa query untuk menentukan sumber- sumber apa saja yang digunakan oleh query tersebut dan apakah penggunaan dari sumber tersebut dapat dikurangi tanpa merubah output.

MySQL query untuk apa?

MySQL Query adalah perintah atau instruksi yang dapat digunakan untuk mengelola database atau tabel dalam database MySQL. Query lebih dikenal dengan sebutan SQL (Structured Query Language) yang artinya adalah sebuah bahasa yang digunakan untuk mengakses data dalam basis data relasional.