One optimization that I’ve employed several times involves batching write queries. MySQL has some very efficient ways to load multiple rows of data in a single query. Multi-row inserts are one common way to do it, but if you’re adventurous you can also try using LOAD DATA INFILE.

Multi-row inserts are just what you would assume: they allow you to insert multiple rows with a single INSERT statement.  The syntax is simple; after the VALUES keyword, you enclose your first row’s data in parentheses, then add a comma, and proceed to the next row’s data, and repeat as many times as needed.

The benefit of using multi-row inserts is significantly less overhead.  There is only one SQL query to parse; there is only one transaction; and indexes only need to be redone once.

Batching writes makes sense for some data, but not for others.  In my experience, I’ve mostly used this technique when dealing with non-critical data, such as recording the last time a user logged in.  It isn’t critical that the data is recorded immediately, and in the unlikely event that the data is lost, it isn’t that big of a deal.

When a user logs in, I record that information somewhere fast, possibly using an asynchronous call.  Redis is a good candidate for storing this information, as it has a native LIST type.  Every so often (once a minute, for example), a script will aggregate all of the pending login information into a single query, and clear the list.  Whatever method you use for storing the data temporarily, you’ll need to make sure that you do not lose data by removing it before it is processed.

There’s one potential flaw with this way of doing things: what if we want to update existing rows instead of inserting new ones?  There are a couple ways you can handle this.

First, if you ensure your temporary data is always 100% reliable, you can use INSERT … ON DUPLICATE KEY UPDATE to do multiple updates at a time.  You’ll just need to insert rows that you know already exist, and then use the UPDATE clause to do the actual work.  If there’s a chance you might have non-existent rows in your data, however, this method will result in bogus rows in your database.  Obviously, this is not ideal.

Second, you can do a bulk insert (using multi-row inserts or LOAD DATA INFILE) into a temporary table in your database.  Then, you can perform a multi-table UPDATE to update the table you are intending to write the data to.  This requires an extra query over the first method, but ensures that extraneous data is not inserted into your table.

Share →

2 Responses to Optimizing MySQL: Batching your write queries

  1. Joseph Scott says:

    More than once I’ve been saved by multi-row batch inserts, definitely a huge win when used appropriately.

  2. If you want adventure, try crafting BINLOG statements for batching writes :)

Leave a Reply

%d bloggers like this: