{"id":52,"date":"2022-10-29T16:37:12","date_gmt":"2022-10-29T20:37:12","guid":{"rendered":"https:\/\/kimsal.com\/blog\/?p=52"},"modified":"2022-10-29T16:37:12","modified_gmt":"2022-10-29T20:37:12","slug":"speed-improvement-story-1","status":"publish","type":"post","link":"https:\/\/kimsal.com\/blog\/2022\/10\/29\/speed-improvement-story-1\/","title":{"rendered":"Speed improvement story 1"},"content":{"rendered":"\n<p>I&#8217;ve accumulated a number of tech stories from various projects over the years.  I thought I&#8217;d write some down and share.  This first one&#8230; I&#8217;ve told the highlight portion during interviews years ago, and it probably comes across as either total BS, or me seemingly not knowing what I&#8217;m talking about, or what not, but it did happen, and&#8230; in some ways isn&#8217;t *that* exciting, but there&#8217;s a lesson or two you can pull from it.<\/p>\n\n\n\n<p>Started working at a company that was importing data from their clients.  One client had a large and growing dataset.  Data was imported so we could do some other processing on it &#8211; generating daily and weekly reports, IIRC.  The import started to take a *long* time.  When it started, the import took around an hour.  Then two.  By the time I started working there, it as 23 hours.  You can sort of see where this is going.  At some point, it&#8217;ll take longer than a day, and you wouldn&#8217;t be able to run &#8216;daily&#8217; numbers if you don&#8217;t have a still set of source data.<\/p>\n\n\n\n<p>Couple of folks had been &#8216;working on it&#8217; for some time &#8211; on and off for a couple months, once they understood it would be a problem.<\/p>\n\n\n\n<p>Once I took a look, it took me a couple days.  I got the import process down from around 24 hours to around&#8230; 30 minutes.  The original guys took what I&#8217;d done and tweaked it to 20 minutes, but&#8230; 30 was still sufficient.<\/p>\n\n\n\n<p>When I&#8217;ve told this story during interviews, it was normally for &#8220;what&#8217;s a tech achievement you&#8217;re proud of?&#8221;  And I tell it, and half the time, I don&#8217;t think they believed me.<\/p>\n\n\n\n<p>So&#8230; little more background.  This was early 2000s, and the database was MySQL (3? 4?) running on a moderately large linux box and spinning disks.<\/p>\n\n\n\n<p>The key thing I did was to chunk up the original data file in to smaller segments (100k rows each, for example).  The I created a temp table &#8216;in memory&#8217; &#8211; that was key.  I&#8217;d import 100k records in to memory, then copy them to a disk table.  Then repeat.  It was comparatively blazing fast.  I also disabled some indexes during the import\/copy so that it wasn&#8217;t trying to update the disk index so often.<\/p>\n\n\n\n<p>Lessons?  If something&#8217;s not working as expected, see if you can reframe the problem some.  Running mysqlimport directly on a large file may sometimes be the only way you can handle things, but you may be able to reframe the problem and take other steps to get the same results.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve accumulated a number of tech stories from various projects over the years. I thought I&#8217;d write some down and share. This first one&#8230; I&#8217;ve told the highlight portion during interviews years ago, and it probably comes across as either total BS, or me seemingly not knowing what I&#8217;m talking about, or what not, but&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","footnotes":""},"categories":[2],"tags":[],"class_list":["post-52","post","type-post","status-publish","format-standard","hentry","category-lessons"],"_links":{"self":[{"href":"https:\/\/kimsal.com\/blog\/wp-json\/wp\/v2\/posts\/52","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kimsal.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kimsal.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kimsal.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kimsal.com\/blog\/wp-json\/wp\/v2\/comments?post=52"}],"version-history":[{"count":0,"href":"https:\/\/kimsal.com\/blog\/wp-json\/wp\/v2\/posts\/52\/revisions"}],"wp:attachment":[{"href":"https:\/\/kimsal.com\/blog\/wp-json\/wp\/v2\/media?parent=52"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kimsal.com\/blog\/wp-json\/wp\/v2\/categories?post=52"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kimsal.com\/blog\/wp-json\/wp\/v2\/tags?post=52"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}