Odds are, if you're doing any commercial programming, you've had to interact with a SQL database. They're a staple in programming because of how easy it is to store and retrieve data with them and because of the many free and open-source options available. Still, people have trouble figuring out how to optimize their queries correctly, so let's fix that.

While this text is focused on MySQL specifically, most of the discussion is also true for other databases. So, even if you don't use MySQL, you should learn something new that will help with the SQL database you're using right now. The database we'll use here is the employees sample database from MySQL itself. This repo has a docker-compose setup that starts and runs the database so you can run the commands along.

EXPLAIN to me

The first thing you have to learn to optimize queries in relational databases is to use the EXPLAIN command. EXPLAIN runs the query you've written through the database engine's planner to check what it thinks it will take to run it. It won't run the query, but it estimates how many rows it will need to access to satisfy it.

Our starting point is the employees table from that sample database.
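As a rough sketch of that table, here's a version you can run without a MySQL server, using Python's built-in sqlite3 module as a stand-in. The column names follow the employees table of the MySQL sample database; the SQLite types and the two sample rows are assumptions made for illustration.

```python
import sqlite3

# SQLite stand-in for the MySQL `employees` table. The MySQL sample schema
# uses INT, DATE, VARCHAR and ENUM('M','F') columns; SQLite only needs loose
# type names, so TEXT is used for everything that is not the primary key.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employees (
        emp_no     INTEGER PRIMARY KEY,
        birth_date TEXT NOT NULL,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        gender     TEXT NOT NULL CHECK (gender IN ('M', 'F')),
        hire_date  TEXT NOT NULL
    )
    """
)

# Hypothetical sample rows, only here so the table is not empty.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "1960-01-01", "Ana", "Silva", "F", "1990-05-01"),
        (2, "1962-03-15", "Bob", "Stone", "M", "1991-07-20"),
    ],
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employees)")]
row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(columns, row_count)
```

The only index at this point is the primary key on emp_no, which is what matters for the rest of the discussion.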
Against it, we run a query that finds all the employees whose first name is Joe.
You run EXPLAIN by placing it in front of the whole query, so for the previous query it goes right before the SELECT.
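Here's a minimal sketch of asking the planner about that query. It uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in, so the output format differs from MySQL's EXPLAIN; the trimmed-down table is an assumption, and the MySQL form of the statement is shown in a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down, hypothetical version of the employees table.
conn.execute(
    "CREATE TABLE employees ("
    " emp_no INTEGER PRIMARY KEY,"
    " first_name TEXT NOT NULL,"
    " last_name TEXT NOT NULL)"
)

# MySQL form: EXPLAIN SELECT * FROM employees WHERE first_name = 'Joe'\G
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employees WHERE first_name = 'Joe'"
).fetchall()

# Each plan row has four fields; the last one is the readable description.
details = [row[3] for row in plan]
print(details)  # describes a full table scan, as nothing indexes first_name
```

Just like in MySQL, the query itself is never executed here. Only the plan is returned.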
The \G at the end of the statement is specific to the MySQL client and asks for each column of the result to be printed on its own line instead of as a wide table.
So when we ask the database what it thinks about this query, its response is a bit harsh. It estimates it has to look at 299645 rows (this table has 300024 entries, so that's almost every single row) to find out there are no employees named Joe. There are no keys it can use (keys here are database indexes), and it's using a WHERE clause to filter the results.

So, we know we have to find employees by first name, and the EXPLAIN response shows there are no indexes on this specific column, so let's add one.
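A sketch of adding that index and checking the plan again, once more with SQLite standing in for MySQL. The index name idx_first_name is made up for this example; the CREATE INDEX statement itself has the same shape in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down, hypothetical version of the employees table.
conn.execute(
    "CREATE TABLE employees ("
    " emp_no INTEGER PRIMARY KEY,"
    " first_name TEXT NOT NULL,"
    " last_name TEXT NOT NULL)"
)

# Same statement shape works in MySQL; the index name is an assumption.
conn.execute("CREATE INDEX idx_first_name ON employees (first_name)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employees WHERE first_name = 'Joe'"
).fetchall()

# The last field of each plan row is the readable description.
details = [row[3] for row in plan]
print(details)  # now an index search instead of a full table scan
```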
Now let's run EXPLAIN again.
A single row checked! How did this happen? When you ask the database to create an index on a column, it creates an optimized structure that lets it quickly find all rows associated with a specific value. Think of it as a map from a value ("Joe") to all the rows that have that value in the indexed column.

Indexes are the magical solution. We should create an index for every column in the table and we should be good, right?

"You should have an index for all columns you use for querying a table"

This is a pretty common misconception of how indexes work in relational databases. The idea is that if you have an index for every column, every query is automatically optimized because the database can use all these indexes to find the rows, but this is not true.

MySQL can, in some cases, use more than one index when querying data. That's what happens if we create separate indexes on first_name and last_name and then look for a specific employee using values for both.
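To make the "map from a value to rows" picture concrete, here's a toy model in plain Python. It is not how MySQL stores indexes (those are B-trees), and the rows are made up; it only shows what looking up two separate single-column indexes and combining the results amounts to.

```python
from collections import defaultdict

# Hypothetical rows: emp_no -> (first_name, last_name).
rows = {
    1: ("Joe", "Doe"),
    2: ("Joe", "Smith"),
    3: ("Ann", "Doe"),
    4: ("Joe", "Doe"),
}

# One toy "index" per column: each maps a value to the set of matching rows.
by_first_name = defaultdict(set)
by_last_name = defaultdict(set)
for emp_no, (first_name, last_name) in rows.items():
    by_first_name[first_name].add(emp_no)
    by_last_name[last_name].add(emp_no)

# Two separate lookups, then an intersection of the two result sets.
matches = sorted(by_first_name["Joe"] & by_last_name["Doe"])
print(matches)  # [1, 4]
```

Each lookup is fast on its own, but the database still has to walk two structures and merge what it finds.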
So, we're still doing well, with only one row checked, but the Extra field has an interesting entry: Using intersect. MySQL sees two indexes here and decides to use them both for the query. It's still better than not having an index, but we're going through two separate data structures to find the value instead of a single one.

If we now create a single index on both last_name and first_name, the output changes.
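A sketch of such a two-column index, again with SQLite standing in for MySQL. The index name idx_last_first and the trimmed-down table are assumptions. The sketch also checks which WHERE clauses the planner can serve from that index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# hire_date is kept so that SELECT * needs more than the indexed columns.
conn.execute(
    "CREATE TABLE employees ("
    " emp_no INTEGER PRIMARY KEY,"
    " first_name TEXT NOT NULL,"
    " last_name TEXT NOT NULL,"
    " hire_date TEXT NOT NULL)"
)

# Same statement shape works in MySQL; the index name is an assumption.
conn.execute("CREATE INDEX idx_last_first ON employees (last_name, first_name)")


def plan_for(where_clause):
    """Return the planner's description for a SELECT * with this filter."""
    sql = "EXPLAIN QUERY PLAN SELECT * FROM employees WHERE " + where_clause
    return " ".join(row[3] for row in conn.execute(sql))


both_columns = plan_for("last_name = 'Doe' AND first_name = 'Joe'")
last_name_only = plan_for("last_name = 'Doe'")
first_name_only = plan_for("first_name = 'Joe'")

for plan in (both_columns, last_name_only, first_name_only):
    print(plan)
```

In this sketch the first two filters are answered by a search on the index, while the filter on first_name alone falls back to scanning the table.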
The server digs into a single index to find the row that matches the expected value instead of two. Having multiple columns in an index also helps with covering indexes, a feature we'll discuss later.

Picking index columns

Picking indexes is about how you query the table and what fields are part of the queries. When looking for employees, we want to find them by first and last name, so we have an index on both of them.

When creating an index on multiple columns, the order of the columns matters. Our index is on (last_name, first_name), so every employee is filed under their last name followed by their first name. That means it only works for queries where you are looking for last_name and first_name, or for last_name alone. A query that only looks for first_name can't use this index, because the index is matched left to right. For that, you'd need an index that starts with first_name.

It's also not useful for a query where you need to find someone given a last_name or a first_name, as you'd need each column to be the leftmost column of some index. This is a case where having separate indexes on the columns would be helpful: MySQL would run a union on both indexes to perform the query.

Now, if you're querying all or most columns in the index, what order should they have? The columns with the biggest variety in values should come first. You can quickly estimate this for a column by dividing its number of distinct values by the total number of rows in the table.
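A minimal sketch of that calculation, with SQLite standing in for MySQL and a handful of made-up rows. The query shape (distinct values divided by total rows) is an assumption based on the description above; the helper name selectivity is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " emp_no INTEGER PRIMARY KEY,"
    " first_name TEXT NOT NULL,"
    " last_name TEXT NOT NULL)"
)
# Hypothetical rows: 2 distinct first names, 3 distinct last names.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Joe", "Doe"), (2, "Joe", "Smith"), (3, "Ann", "Lee"), (4, "Joe", "Doe")],
)


def selectivity(column):
    """Distinct values divided by total rows; 1.0 means every row is unique."""
    # Column names cannot be bound as parameters, so only allow known ones.
    if column not in {"emp_no", "first_name", "last_name"}:
        raise ValueError("unknown column: " + column)
    # MySQL form: SELECT COUNT(DISTINCT col) / COUNT(*) FROM employees;
    # the 1.0 factor forces floating-point division in SQLite.
    sql = f"SELECT COUNT(DISTINCT {column}) * 1.0 / COUNT(*) FROM employees"
    return conn.execute(sql).fetchone()[0]


for column in ("emp_no", "first_name", "last_name"):
    print(column, selectivity(column))
```

The closer the number is to 1, the fewer rows each value points to, and the better that column works at the left of an index.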
The primary key is the perfect example. Every row has a unique value, so we get 1. We want the columns we'll use in the index to be as close to 1 as we can, so we run the same calculation for first_name.
Then we do the same for last_name. In this data set, last_name gives us better filtering than first_name, making it the best column for the index's left side.

Still, this is an average, and as you might have found out the hard way in other places, averages are good at hiding outliers. An outlier here would impact query performance directly, so we also want to check whether any value in these columns is far more common than the rest, for instance by counting rows per value and looking at the most frequent ones. For last_name there are no outliers, and values seem to be pretty close to each other. For first_name it's not bad either, as values aren't that far from each other. The order we decided on for the index is a pretty good one.

Now, something else you need to account for when creating indexes is how you will sort the results. Just like the database uses the index to find rows quickly, it can also use it to sort the results if you're sorting on the same columns in the index order. For our (last_name, first_name) index, that means ORDER BY last_name or ORDER BY last_name, first_name. If multiple columns are sorted, they all have to go in the same direction, so either all ASC or all DESC. If they're not all in the same direction, the database can't use the index itself to sort the results and has to resort to temporary tables to load the results and sort them.

Covering indexes

Another important reason to have indexes that hold multiple columns is the covering indexes feature that MySQL provides. When your query only loads the primary key and columns in the index, the database doesn't even have to look at the table to read the results. It reads everything from the index alone.

In the EXPLAIN output for such a query, the hint is Using index in the Extra column, which means all the data is read from the index alone. As the index already contains all the information we need (all indexes include the primary key for the table), the database loads everything from it and returns, not even reaching out to the table.
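Here's a sketch of the difference between a covered and a non-covered query, using SQLite as a stand-in. SQLite reports this as COVERING INDEX in its plan, where MySQL would show Using index; the table, index name, and helper function are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " emp_no INTEGER PRIMARY KEY,"
    " first_name TEXT NOT NULL,"
    " last_name TEXT NOT NULL,"
    " hire_date TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_last_first ON employees (last_name, first_name)")


def plan_for(select_list):
    """Return the planner's description for a lookup by last_name."""
    sql = (
        "EXPLAIN QUERY PLAN SELECT " + select_list +
        " FROM employees WHERE last_name = 'Doe'"
    )
    return " ".join(row[3] for row in conn.execute(sql))


# Only the primary key and indexed columns: the index alone has everything.
covered = plan_for("emp_no, last_name, first_name")
# hire_date is not in the index, so the table has to be read as well.
not_covered = plan_for("emp_no, last_name, hire_date")

print(covered)
print(not_covered)
```

Adding a single column that lives outside the index is enough to lose the benefit, which is one more reason to avoid SELECT * when you only need a few fields.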
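A covering index is the best-case scenario for queries, especially if the index fits into memory.

Creating multiple indexes

Creating indexes isn't free. While they make it faster for us to find data, they also slow down any changes to the table, as writing to indexed columns causes those indexes to be updated too. So you have to strike a balance between making as many queries as possible fast and still allowing for fast writes.

Summary

So, when optimizing, remember to check what the database plans to do with EXPLAIN, create indexes that match the way you query the table, put the columns with the most variety first in multi-column indexes, keep sorting and covering indexes in mind when picking columns, and weigh every new index against the cost it adds to writes.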
One of the best references on optimizing MySQL databases is High Performance MySQL, now in its 4th edition, which covers everything from database internals to how to design your database schema to make the most of MySQL. If you're running apps on MySQL, you should read it. If you're not using MySQL, it's very likely there is a book just like this one for your database as well, and you should invest some time in reading it.