{"id":47,"date":"2008-05-24T14:42:34","date_gmt":"2008-05-24T19:42:34","guid":{"rendered":"http:\/\/vgable.com\/blog\/2008\/05\/24\/memcopy-memmove-and-speed-over-safety\/"},"modified":"2009-02-24T07:13:35","modified_gmt":"2009-02-24T12:13:35","slug":"memcopy-memmove-and-speed-over-safety","status":"publish","type":"post","link":"https:\/\/vgable.com\/blog\/2008\/05\/24\/memcopy-memmove-and-speed-over-safety\/","title":{"rendered":"memcopy, memmove, and Speed over Safety"},"content":{"rendered":"<blockquote><p>&#8220;We should forget about small efficiencies, say about 97% of the time: <strong>premature optimization is the root of all evil<\/strong>.&#8221; <\/p><\/blockquote>\n<p>&#8211;Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268.<\/p>\n<p>If you&#8217;ve programmed in C, you&#8217;ve seen <code><a href=\"http:\/\/www.cplusplus.com\/reference\/clibrary\/cstring\/memcpy.html\">memcpy<\/a><\/code> (pronounced &#8220;mem-copy&#8221;).  It&#8217;s the preferred way to copy a block of memory somewhere.  There is a safer and equally easy to use alternative, <code><a href=\"http:\/\/www.cplusplus.com\/reference\/clibrary\/cstring\/memmove.html\">memmove<\/a>.<\/code>  But it has never gained traction, in spite of it&#8217;s advantages.  As we will see, this is unfortunate, and may be an example of a programming culture that values (superficial) speed over safety.<\/p>\n<p><strong>The Difference<\/strong><\/p>\n<p>The difference between <code>memcpy<\/code> and <code>memmove<\/code> is simple: when the source and destination blocks of memory overlap (for example, if they are the same), <code>memmove<\/code> works, but <code>memcpy<\/code>&#8216;s behavior is <em>undefined<\/em>.  It might work.  It might crash.  It might corrupt data.  It might behave differently during debugging.<\/p>\n<p><code>memmove<\/code> is safer.<\/p>\n<p><code>memcpy<\/code> <em>can be<\/em> faster, and usually is.  There are less restrictions on it&#8217;s implementation, so more can be done to optimize it.  But not necessarily a lot more &#8212; in fact, it could even be slower then <code>memmove<\/code>, and sometimes this is the case.  On some systems, <code>memcpy<\/code> may just be <code>memmove<\/code>.<\/p>\n<p><strong>Faster<\/strong><\/p>\n<p>So how much faster can <code>memcpy<\/code> be then <code>memmove<\/code>? They both take <em>O(N)<\/em> time to copy <em>N<\/em> bytes of data.  So in some computer-science circles, <code>memcpy<\/code> wouldn&#8217;t be considered faster then <code>memmove<\/code>.<\/p>\n<p>Algorithmically, <code>memmove<\/code> needs one extra <code>if<\/code>-statement before it starts to copy; to determine if it needs to copy front-to-back, or back-to-front. (See <a href=\"http:\/\/vincentgable.com\/code\/memcpy_memove_lab.c\">this reference implementation<\/a> for an example.) The only other advantage <code>memcpy<\/code> <em>may<\/em> have are esoteric processor-specific instructions that assume <a href=\"http:\/\/www.devx.com\/tips\/Tip\/13825\">restricted pointers<\/a>.  So unless there is a &#8220;<code>memcpy<\/code> instruction&#8221;, we can expect the difference in speed to be pretty small.<\/p>\n<p>But the real proof is in the pudding and &#8230; <strong><code>memmove<\/code> is faster then <code>memcpy<\/code><\/strong>!  At least on my laptop, with <a href=\"http:\/\/vincentgable.com\/code\/memcpy_memove_lab.c\">this test<\/a>; basically copying 4MB of memory 100 times.  See for yourself:<br \/>\n<code><br \/>\n$ gcc -O0  memcpy_memove_lab.c && .\/a.out <br \/>\n&nbsp;&nbsp;&nbsp;Surprise!<br \/>\n&nbsp;&nbsp;&nbsp;memmove is 1.404409 times faster then memcpy<\/p>\n<p>gcc -O3  memcpy_memove_lab.c && .\/a.out<br \/>\n&nbsp;&nbsp;&nbsp;Surprise!<br \/>\n&nbsp;&nbsp;&nbsp;memmove is 1.054571 times faster then memcpy<br \/>\n<\/code><\/p>\n<p>&#8220;This must be an unfair test!&#8221;, you&#8217;re probably thinking. Yes, it is.  Or at least it&#8217;s a <strong>dangerously narrow test<\/strong>.  But it is also an honest and simple test.  By that I mean, it is the <em>first<\/em> code I hammered out to just get some numbers for this article.  Proper benchmarking is <strong>very<\/strong> difficult.  I&#8217;m not going to attempt it.<\/p>\n<p>The real lesson from this naive benchmark is that <strong>you must measure <em>your code<\/em> before concluding that an optimization is really faster<\/strong>.  I would never have guessed that <code>memmove<\/code> would be up to 40% faster, at copying 4MB.  But it was &#8212; in this particular instance.<\/p>\n<p>On a related note, a significantly faster <code>memcpy<\/code> (say 2x) won&#8217;t have an appreciable impact on an application&#8217;s performance, unless the application spends a surprisingly large portion of it&#8217;s time copying memory (<a href=\"http:\/\/en.wikipedia.org\/wiki\/Amdahl's_law\">Amdahl&#8217;s law<\/a>).  For example, let&#8217;s say that 1 out of every 20 seconds is spent copying memory with <code>memmove<\/code>.  (That&#8217;s a lot of time just moving bits around!  Programs should do something with bits, not just move them.)  So we replace <code>memmove<\/code> with <code>memcpy<\/code>, and <em>this<\/em> <code>memcpy<\/code> is a full 2x faster then <code>memmove<\/code> (which is optimistic).  Surprisingly, we only get a 2.5% speedup!  Remember that 1 in 20 is only 5% of the program&#8217;s total time.  Cutting this time in half eliminates 2.5% of the program&#8217;s total execution time.  If we can find an optimization that speeds up the other 95% of the program by just 2.7%, we get better performance overall.<\/p>\n<p>So <code>memcpy<\/code> is unlikely to make a large difference to program performance, <em>in general<\/em>.  Switching <code>memcpy<\/code> implementations is a &#8220;local optimization&#8221;, that has much less value then changing the algorithm that&#8217;s requiring all that duplication.  It may even <a href=\"http:\/\/vgable.com\/blog\/2008\/03\/07\/low-level-optimization-decays-with-time\/\">suddenly become slower<\/a> when hardware is upgraded.<\/p>\n<p><strong>Safer<\/strong><\/p>\n<p>How much safer is <code>memmove<\/code>?  This is a hard dimension to quantify, and I don&#8217;t have a satisfying answer.  My instinct tells me that it isn&#8217;t dramatically safer.  I don&#8217;t have any data to support this, but I believe it&#8217;s very rare to be copying memory into itself; compared to other memory-management errors, like a double-<code>free()<\/code>.<\/p>\n<p>But the bottom line is that, <strong>there are less ways your program can fail if you use <code>memmove<\/code> over <code>memcpy<\/code>.<\/strong>  Period.  Since <code>memcpy<\/code>&#8216;s behavior is <em>undefined<\/em> when the source and destination overlap, it can be a vicious bitch to debug.<\/p>\n<p><strong>Speed over Safety<\/strong><\/p>\n<p><code>memcpy<\/code> is preferred by a significant majority of C programmers.  I don&#8217;t know exactly how many.  But <a href=\"http:\/\/www.googlefight.com\/index.php?lang=en_GB&#038;waord1=memcpy&#038;word2=memmove\">a google fight<\/a> shows that <code>memcpy<\/code> is almost 6x more talked about then <code>memmove<\/code> (as of 2008-04-11).  Anecdotally, <code>memmove<\/code> is mostly unheard of in my experience.  It seems like the call of &#8220;faster&#8221; really is a siren&#8217;s-song for developers; luring them into dangerous code.<\/p>\n<p>I think this is very unfortunate.  Especially, because the performance advantage of <code>memcpy<\/code> just isn&#8217;t that big in general! (Sometimes it&#8217;s even harmful).  Given the <a href=\"http:\/\/www.codinghorror.com\/blog\/archives\/000099.html\">unreliability of software<\/a>, anything that elements bugs is a Very Good Thing.<\/p>\n<p>I wish I knew the full story of <code>memcpy<\/code> winning the popularity contest with <code>memmove<\/code>.  By accident or design, it has left us with a <strong>programming culture that values <em>superficial<\/em> speed over safety<\/strong>.<\/p>\n<p><strong>For Further Reading:<\/strong><\/p>\n<p><a href=\"http:\/\/www.embedded.com\/columns\/technicalinsights\/19205567?_requestid=12115\">Optimizing <code>memcpy<\/code><\/a> &#8212; includes some graphs showing the tradeoffs between optimizing for large chunks of memory (say copying pictures), and small data structures.<\/p>\n<p><a href=\"http:\/\/www.codinghorror.com\/blog\/archives\/000061.html\">Why aren&#8217;t my optimizations optimizing?<\/a> &#8212; &#8220;Optimizing code is a tricky business.&#8221;<\/p>\n<p><a href=\"http:\/\/vincentgable.com\/code\/memcpy_memove_lab.c\">memcpy_memove_lab.c<\/a> &#8212; The naive benchmark from this article, plus a reference implementation of memcpy and memmove.<\/p>\n<p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.&#8221; &#8211;Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268. If you&#8217;ve programmed in C, you&#8217;ve seen memcpy (pronounced &#8220;mem-copy&#8221;). It&#8217;s the preferred way to copy [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12,4],"tags":[35,36,37],"class_list":["post-47","post","type-post","status-publish","format-standard","hentry","category-design","category-programming","tag-hubris","tag-memcpy","tag-memmove"],"_links":{"self":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/posts\/47","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/comments?post=47"}],"version-history":[{"count":0,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/posts\/47\/revisions"}],"wp:attachment":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/media?parent=47"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/categories?post=47"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/tags?post=47"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}