The Database Migrations team is in charge of creating and maintaining all the internal tools for Shopify teams to carry out schema changes safely and with minimal downtime. One of our latest investigations involved using the popular Large Hadron Migrator (LHM) gem to perform schema changes in MySQL databases, after safety-checking that the migrations can be performed without losing any data. In particular, when adding a NOT NULL
columns to an existing table.
In this post, I'll share what we learned and our recommendations for doing so to your own database tables.
Defining Schema Change Safety
First, it’s important to understand how schema changes safety is defined.
For this purpose, we should be aware of the procedure LHM uses to execute migrations, as it uses the shadow-table mechanism to ensure there is minimal downtime while the migrations are being carried out. In a nutshell, LHM creates a new table (known as shadow table) with the schema change applied, and sets up triggers on the original table to populate any data related operations (INSERT
, UPDATE
and DELETE
) to the shadow table. Then, it starts copying records, in batches, from the original table to the shadow one. When all records have been copied to the shadow table, LHM automatically renames the two tables and then drop the triggers from the original one.
This procedure ensures there’s minimal downtime (see MySQL rename table limitations), while migrations are taking place, but it introduces a new set of potential problems, as we need to make sure that the batched insertions don’t drop records in the process, which may happen depending on the schema change applied.
Therefore, schema changes are identified as safe if:
- After the migration has started:
INSERT
,UPDATE
andDELETE
operations targeting the original table can populate data to the shadow one (via MySQL triggers), without crashing. This is known as backward compatibility. - Once the migration has finished: the number of records in the shadow table must be equal to the number of records in the original table (via MySQL triggers and LHM batched insertions).
Research Case: Adding a NOT NULL Column
As stated in the title, this investigation focuses on one specific set of schema changes, those adding NOT NULL
constrained columns to a table. Considering how LHM works internally, one can only guess what values are going to be populated to the newly added column, for all the original table records, which lack a value for that column in the first place.
Therefore, the investigation of these schema changes safety will consider the following factors:
- The inclusion of a
DEFAULT
value for the new column definition, in the same migration. - The inclusion of a
UNIQUE INDEX
for the new column, in the same migration. - The mode the MySQL instance is configured to (strict or non-strict).
Setting Up the Experiment
In order to carry out the investigation, multiple steps are defined to simulate how LHM will carry out the migrations while iterating on the considered factors. For demonstration purposes, let’s assume the migration adds a NOT NULL
column called “last_name” to a table named “users” .
2. Table creation: Simulate how LHM would apply the migration.
3. Triggers definition: Simulate how LHM would set up MySQL triggers.
Running the Experiment
Once the experiment setup is defined, we can execute data-related SQL operations to check how MySQL triggers will populate them from the original to the shadow table. These statements are dependent on the SQL operation we are testing (INSERT
, UPDATE
and DELETE
).
1. Populate initial data: Create initial records to execute the investigation queries on.
2. Activate the SQL triggers: Run specific queries to activate the LHM simulated triggers.
3. Compare the results: Execute a SELECT
query both in the original and in the shadow table, in order to compare their records (both in length and content).
Concatenating all the Experiment Setup section steps alongside these ones, we can determine the safety of adding NOT NULL
column schema changes, depending on the factors stated in section 2 ( DEFAULT
clause, UNIQUE INDEX
existence, and MySQL mode). A complete back-to-back experiment, for INSERT
operations, would look like this:
As the number of records between the original and shadow tables is different, we conclude that performing INSERT operations, when there is a NOT NULL
, DEFAULT
defined schema-change, that also introduces a UNIQUE INDEX
on that column, can produce data loss when the MySQL instance is configured with an strict mode.
Results
Similar to how we built an experiment case in the previous section, we can iterate on the experiment factors (operation type, DEFAULT
value inclusion, UNIQUE INDEX
presence, and MySQL mode) to build a matrix of schema change safety for all the resulting combinations.
As a reminder, schema change safety is determined by answering two questions:
-
Will the migration be backwards compatible? In other words, whether
INSERT
,UPDATE
andDELETE
operations targeting the original table can populate data to the shadow one (via MySQL triggers), without crashing. - Will the migration introduce data loss? In other words, whether the number of records in the shadow table equals the one in the original table, once the migration has finished.
Column spec |
Schema change |
MySQL mode |
Operation |
Backward compatible? |
Data loss? |
NOT NULL, with DEFAULT value |
Not includes UNIQUE INDEX |
STRICT_ALL_TABLES |
INSERT |
Yes |
No |
UPDATE |
Yes |
No |
|||
DELETE |
Yes |
No |
|||
NO_ENGINE_SUBSTITUTION |
INSERT |
Yes |
No |
||
UPDATE |
Yes |
No |
|||
DELETE |
Yes |
No |
|||
Includes UNIQUE INDEX |
STRICT_ALL_TABLES |
INSERT |
Yes |
Yes |
|
UPDATE |
Yes |
Yes |
|||
DELETE |
Yes |
No |
|||
NO_ENGINE_SUBSTITUTION |
INSERT |
Yes |
Yes |
||
UPDATE |
Yes |
Yes |
|||
DELETE |
Yes |
No |
|||
NOT NULL without DEFAULT value |
Not includes UNIQUE INDEX |
STRICT_ALL_TABLES |
INSERT |
No |
- |
UPDATE |
No |
- |
|||
DELETE |
No |
- |
|||
NO_ENGINE_SUBSTITUTION |
INSERT |
Yes |
No* |
||
UPDATE |
Yes |
No* |
|||
DELETE |
Yes |
No* |
|||
Includes UNIQUE INDEX |
STRICT_ALL_TABLES |
INSERT |
No |
- |
|
UPDATE |
No |
- |
|||
DELETE |
No |
- |
|||
NO_ENGINE_SUBSTITUTION |
INSERT |
Yes |
Yes |
||
UPDATE |
Yes |
Yes |
|||
DELETE |
Yes |
No* |
* The number of records in the shadow table matches the one in the original table, but an implicit DEFAULT
value is chosen for the new column. For this experiment, the value was the empty string (“”), but it will vary depending on the data type (check MySQL implicit defaults).
Conclusions
Considering the matrix of cases from previous section:
-
Avoid adding a NOT NULL column without a DEFAULT value.
Schema changes introducingNOT NULL
columns must define aDEFAULT
value to avoid unexpected results when the migrations are taking place.
In the worst case (when the MySQL instance is configured with a strict mode), the table affected by the migration will break compatibility for existing applications, as previously used data-related operations (INSERT
/UPDATE
/DELETE
) could not be populated to the shadow table, after the migrations starts.
In the best case (when the MySQL instance is configured with a non-strict mode), the records populated from the original to the shadow table, either by the MySQL triggers or LHM batched insertions, will receive an implicitDEFAULT
value for the new column, which is probably undesirable.
-
Be extremely cautious when adding a UNIQUE INDEX.
The introduction ofUNIQUE
indexes in schema-changes engines that use the shadow table mechanism to carry out their migrations proved to be dangerous as it can lead to data loss when there are duplicate values, for the index covered columns, prior to the migrations.
It’s recommended that developers check for the existence of duplicates before they add aUNIQUE
index on a set of columns.
Acknowledgements
This investigation was completed thanks to the feedback provided by all the DB Migrations team members: Bastian Bartmann, Sergey Fedorov, Anya Zenkina, Xiaoli Liang; and the dedicated guidance of Shuhao Wu.
If building systems from the ground up to solve real-world problems interests you, our Engineering blog has stories about other challenges we have encountered. Visit our Engineering career page to find out about our open positions. Join our remote team and work (almost) anywhere. Learn about how we’re hiring to design the future together—a future that is digital by design.