Speed improvement story 1
I’ve accumulated a number of tech stories from various projects over the years. I thought I’d write some down and share. This first one… I’ve told the highlight portion during interviews years ago, and it probably comes across as either total BS, or me seemingly not knowing what I’m talking about, or what not, but it did happen, and… in some ways isn’t *that* exciting, but there’s a lesson or two you can pull from it.
Started working at a company that was importing data from their clients. One client had a large and growing dataset. Data was imported so we could do some other processing on it – generating daily and weekly reports, IIRC. The import started to take a *long* time. When it started, the import took around an hour. Then two. By the time I started working there, it as 23 hours. You can sort of see where this is going. At some point, it’ll take longer than a day, and you wouldn’t be able to run ‘daily’ numbers if you don’t have a still set of source data.
Couple of folks had been ‘working on it’ for some time – on and off for a couple months, once they understood it would be a problem.
Once I took a look, it took me a couple days. I got the import process down from around 24 hours to around… 30 minutes. The original guys took what I’d done and tweaked it to 20 minutes, but… 30 was still sufficient.
When I’ve told this story during interviews, it was normally for “what’s a tech achievement you’re proud of?” And I tell it, and half the time, I don’t think they believed me.
So… little more background. This was early 2000s, and the database was MySQL (3? 4?) running on a moderately large linux box and spinning disks.
The key thing I did was to chunk up the original data file in to smaller segments (100k rows each, for example). The I created a temp table ‘in memory’ – that was key. I’d import 100k records in to memory, then copy them to a disk table. Then repeat. It was comparatively blazing fast. I also disabled some indexes during the import/copy so that it wasn’t trying to update the disk index so often.
Lessons? If something’s not working as expected, see if you can reframe the problem some. Running mysqlimport directly on a large file may sometimes be the only way you can handle things, but you may be able to reframe the problem and take other steps to get the same results.