I’ve been using this semi-relational, or SemiSQL, if you’d like, approach to SQL since around 2009. At first, I’d store the data as text or longtext types using PHP’s serialize(), but then I quickly moved to JSON. In late 2015, with MySQL 5.7.8, our prayers were answered and the native support for JSON datatype was introduced. 🙏 Let’s have a look at the best practices using JSON in MySQL 8.
SemiSQL vs NoSQL
For most cases, I’d recommend Mongo DB as NoSQL storage for JSON data. It’s a great and scalable database system featuring plenty of filtering, sorting, indexing, projecting and even aggregating functions, which can supply for basic relations.
However, if your project uses heavy relational logic, MySQL with SemiSQL approach may suit you better. Or maybe your project already uses MySQL and switching databases is simply too much work.
With MySQL’s JSON datatype and plenty of JSON functions, you gain advantage using both non-relational (NoSQL) and relational (SQL) database features where needed. All without the need to maintain the data integrity manually 🙏. At least for 1:1 relations.
Creating JSON storage
Creating a key-value table for JSON storage is simple
CREATE TABLE listings (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
data JSON NOT NULL
);INSERT INTO listings (data) VALUES
('{
"title": "First Listing",
"public": false,
"text": "Lorem ipsum",
"address": "Klimentská 1215/25, 110 00, Prague 1"
}'),
('{
"title": "Second Listing",
"public": true,
"text": "Dolor sit amet",
"address": "Rybná 716/24, 110 00, Prague 1"
}');
Read the full data
Read partial data, preserving JSON types
SELECT id, data->'$.title' as 'title', data->'$.public' as 'public'FROM listings;1 "First Listing" false (bool)
2 "Second Listing" true (bool)
Read partial data, unquoting JSON types
SELECTid,
data->>'$.title' 'title',
data->>'$.public' 'public'
FROM listings;1 First Listing false (string)
2 Second Listing true (string)
These -> and ->> are shortcuts for JSON_EXTRACT(column, path) and JSON_UNQUOTE(JSON_EXTRACT(column, path)) functions respectively.
I do not recommend to unquote other types than strings, especially booleans, as this would break logical operations.
SELECTid,
data->>'$.title' 'title',
data->'$.public' 'public'
FROM listings;1 First Listing false (bool)
2 Second Listing true (bool)
Updating data
You can update the whole data object simply via
UPDATE listings SET data = '{ ... }' WHERE id = 1;Or update single keys via
UPDATE listingsSET data = JSON_SET(data, '$.address', CAST('{
"street": "Klimentská 1215/25",
"city": "Prague 1",
"zip": "CZ-11000",
"gps": [50.0920, 14.4304]
}' AS JSON))
WHERE id = 1;
You can also use JSON data for filters, like this:
UPDATE listingsSET data = JSON_SET(data, '$.address', CAST('{
"street": "Rybná 716/24",
"city": "Prague 1",
"zip": "CZ-11000",
"gps": [50.0903, 14.4260]
}' AS JSON))
WHERE data->>'$.title' = 'Second Listing';
For multi–row update, I prefer to
# for updating the whole data packagesINSERT INTO listings (id, data) VALUES(1, '{ ... }'),
(2, '{ ... }')
ON DUPLICATE KEY UPDATE data = VALUES(data);# for updating only specified keys
# unfortunately, this breaks on ACID checks, as described belowINSERT INTO listings (id, data) VALUES
(1, '{ "public": false }'),
(2, '{ "public": true, "reserved": false }')
ON DUPLICATE KEY UPDATE data = JSON_MERGE_PATCH(data, VALUES(data));
Indexes and virtual (generated) columns
The true beauty of MySQL’s JSON functions lies with generated columns.
ALTER TABLE listingsADD title VARCHAR(255)AS (data->>'$.title') STORED
AFTER id,ADD public BIT
AS (data->'$.public' = true) STORED
AFTER title,ADD country VARCHAR(2)
AS (SUBSTRING(data->>'$.address.zip', 1, 2)) STORED
NOT NULL
AFTER public,ADD city VARCHAR(15)
AS (data->>'$.address.city') STORED
NOT NULL
AFTER country,ADD gps POINT
AS (POINT(data->'$.address.gps[0]', data->'$.address.gps[1]')) STORED
NOT NULL
AFTER city;
NOT NULL columns will also ACID–check JSON’s integrity 👍
Now you can select data like this
SELECT id, title, data->>'$.text' 'text', country, cityFROM listings
WHERE public AND title LIKE 'Second%';2 Second Listing Dolor sit amet CZ Prague 1
Indexes
Always a good idea to add indexes, even redundant ones. Especially if your table hits considerable more reads than writes, what, in most cases, happen.
ALTER TABLE listingsADD INDEX (public DESC,country),
ADD INDEX (public DESC),
ADD INDEX (country),
ADD INDEX (city),
ADD INDEX (title),
ADD SPATIAL (gps);
All indexes require GENERATED colums to be STORED, as in not VIRTUAL. Also, SPATIAL index only supports NON NULL columns.
Projection columns
List–view projection
Let’s say your JSON object in data column is heavier than that. Let’s say:
UPDATE listingsSET data = JSON_SET(data,
'$.contact', CAST('{
"name": "Lorem ipsum",
"phone": "+000000000000",
"email": ""
}' AS JSON),
'$.photos', CAST('[
{"url": "...", "width": 3240, "height": 1620},
{"url": "...", "width": 2880, "height": 1620},
{"url": "...", "width": 1620, "height": 3240}
]' AS JSON)
# plus plenty of other data
)
WHERE id = 2;
You’d want to load the data column for viewing the entry’s detail, but for list views, it’s better not to load the whole big packages.
Let’s make a projection for what you need only, using a VIRTUAL column.
ALTER TABLE listingsADD listing JSON AS(JSON_OBJECT(
'title', data->'$.title',
'photo', data->'$.photos[0]',
'address', CONCAT_WS(', ',
data->>'$.address.street',
data->>'$.address.city',
SUBSTRING(data->>'$.address.zip', 1, 2)
),
'gps', data->'$.address.gps'
))
VIRTUAL AFTER gps;
Virtual columns are not stored, obviously, and are only generated when selected.
SELECT listingFROM listings
WHERE public;2 {
"gps": [50.0903, 14.426],
"photo": {"url": "...", "width": 3240, "height": 1620},
"title": "Second Listing",
"address": "Klimentská 1215/25, Prague 1, CZ"
}
Fulltext projections
Let’s extract a fulltext string from data and store them for fulltext index.
ALTER TABLE listingsADD ft VARCHAR(255) AS(IF(public, CONCAT_WS(' ',
data->'$.title',
data->'$.text',
JSON_EXTRACT(data->'$.address', '$.*'),
JSON_EXTRACT(data->'$.contact', '$.*')
), NULL))
STORED AFTER listing,ADD FULLTEXT (ft);
Now you can perform fulltext–optimized search like this:
SELECT id, listingFROM listings
WHERE MATCH(ft) AGAINST('Kliment*' IN BOOLEAN MODE);2 {
"gps": [50.0903, 14.426],
"photo": {"url": "...", "width": 3240, "height": 1620},
"title": "Second Listing",
"address": "Klimentská 1215/25, Prague 1, CZ"
}
Datetime and timestamp
These are not generated columns in true sense of the phrase, but I take them for good praxis anyways.
ALTER TABLE listingsADD dt DATETIMEDEFAULT CURRENT_TIMESTAMP
AFTER data,ADD INDEX (dt DESC), # optionally ADD ts TIMESTAMP
DEFAULT CURRENT_TIMESTAMP
ON UPDATE CURRENT_TIMESTAMP
AFTER dt;
Column dt keeps the row’s insert datetime, while ts shows the row’s last update timestamp.
Overall table structure
If TLDR, the full structure of a smart JSON-based table for semi-relational purposes would look like this
CREATE TABLE listings (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(255) AS
(data->>'$.title') STORED,
public BIT AS
(data->'$.public' = true) STORED,
country VARCHAR(2) AS
(SUBSTRING(data->>'$.address.zip', 1, 2)) STORED NOT NULL,
city VARCHAR(15) AS
(data->>'$.address.city') STORED NOT NULL,
gps POINT AS
(POINT(data->'$.address.gps[0]', data->'$.address.gps[1]'))
STORED NOT NULL,
listing JSON AS
(JSON_OBJECT(
'title', data->'$.title',
'photo', data->'$.photos[0]',
'address', CONCAT_WS(', ',
data->>'$.address.street',
data->>'$.address.city',
SUBSTRING(data->>'$.address.zip', 1, 2)
),
'gps', data->'$.address.gps'
)) VIRTUAL,
ft VARCHAR(255) AS
(IF(public, CONCAT_WS(' ',
data->'$.title',
data->'$.text',
JSON_EXTRACT(data->'$.address', '$.*'),
JSON_EXTRACT(data->'$.contact', '$.*')
), NULL)) STORED,
data JSON NOT NULL,
dt DATETIME
DEFAULT CURRENT_TIMESTAMP NOT NULL,
ts TIMESTAMP
DEFAULT CURRENT_TIMESTAMP
ON UPDATE CURRENT_TIMESTAMP NOT NULL,
KEY (public DESC, country),
KEY (public DESC),
KEY (country),
KEY (city),
KEY (title),
SPATIAL KEY (gps),
KEY (dt DESC),
FULLTEXT KEY (ft)
);
Quirks to watch for
Retyping
Casting JSON types into MySQL types can be tricky from time to time. Especially using ->> (JSON_UNQUOTE). In example, booleans go like this
{"public": true} SELECT data->'$.public' true{"public": "true"} SELECT data->'$.public' "true"
{"public": "true"} SELECT data->>'$.public' true
{"public": true} SELECT (data->'$.public' = true) 1
{"public": true} SELECT (data->>'$.public' = true) 0
{"public": "true"} SELECT (data->>'$.public' = true) 0
Combined use of bits (1, 0) and booleans (true, false) for logical operations might lead to unexpected results.
Order inside JSON objects
Will not be preserved.
{ "title": ... , "public": ... , "text": ... }
will become
{ "text": ... , "title": ... , "public": ... }
If I got you worried now, order in arrays
will be preserved, of course
[5, 2, 1] will be [5, 2, 1] no matter what 😇
Not a big deal, but better to keep this in mind.