“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”
–Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268.
If you’ve programmed in C, you’ve seen
memcpy (pronounced “mem-copy”). It’s the preferred way to copy a block of memory somewhere. There is a safer and equally easy-to-use alternative,
memmove. But it has never gained traction, in spite of its advantages. As we will see, this is unfortunate, and may be an example of a programming culture that values (superficial) speed over safety.
The difference between memcpy and
memmove is simple: when the source and destination blocks of memory overlap (for example, if they are the same),
memmove works, but
memcpy’s behavior is undefined. It might work. It might crash. It might corrupt data. It might behave differently during debugging.
memmove is safer.
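To make the overlap case concrete, here is a minimal sketch. The helper name shift_right and its parameters are made up for illustration. It shifts the first len bytes of a buffer toward the end, so the source and destination ranges overlap whenever gap is smaller than len. That is exactly the situation where memmove is defined and memcpy is not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: move the first `len` bytes of `buf` forward by
 * `gap` bytes. Source [buf, buf+len) and destination
 * [buf+gap, buf+gap+len) overlap whenever gap < len, so memmove is
 * required here; calling memcpy instead would be undefined behavior. */
void shift_right(char *buf, size_t cap, size_t len, size_t gap)
{
    /* The shifted bytes must still fit inside the buffer. */
    assert(len <= cap && gap <= cap - len);
    memmove(buf + gap, buf, len);
}
```

For example, calling shift_right(buf, sizeof buf, 7, 2) on a 16-byte buffer holding "abcdef" (7 bytes with the terminator) leaves "ababcdef" in the buffer, because the first two bytes are not overwritten.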
memcpy can be faster, and usually is. There are fewer restrictions on its implementation, so more can be done to optimize it. But not necessarily a lot more. In fact, it could even be slower than
memmove, and sometimes this is the case. On some systems,
memcpy may just be memmove.
So how much faster can
memcpy be than
memmove? They both take O(N) time to copy N bytes of data. So in some computer-science circles,
memcpy wouldn’t be considered faster than memmove.
memmove needs one extra
if-statement before it starts to copy, to determine whether it needs to copy front-to-back or back-to-front. (See this reference implementation for an example.) The only other advantage
memcpy may have is esoteric processor-specific instructions that assume restricted pointers. So unless there is a “
memcpy instruction”, we can expect the difference in speed to be pretty small.
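The extra if-statement can be sketched as follows. This is my own byte-at-a-time illustration, not the reference implementation linked above and not how any real C library does it; the name sketch_memmove is invented. It assumes a flat address space, so that comparing the two addresses as integers is meaningful.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative memmove: one branch chooses the copy direction so that
 * no source byte is overwritten before it has been read. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Assumption: a flat address space, so integer comparison of the
     * two addresses reflects their order in memory. */
    if ((uintptr_t)d < (uintptr_t)s) {
        for (size_t i = 0; i < n; i++)      /* front-to-back */
            d[i] = s[i];
    } else if ((uintptr_t)d > (uintptr_t)s) {
        for (size_t i = n; i > 0; i--)      /* back-to-front */
            d[i - 1] = s[i - 1];
    }
    /* d == s: the blocks are identical, nothing to copy. */
    return dst;
}
```

Everything after the branch is the same kind of loop a simple memcpy would run, which is why the cost of the check is a constant and does not grow with N.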
But the real proof is in the pudding and …
memmove is faster than
memcpy! At least on my laptop, with this test, which basically copies 4MB of memory 100 times. See for yourself:
$ gcc -O0 memcpy_memove_lab.c && ./a.out
memmove is 1.404409 times faster then memcpy
$ gcc -O3 memcpy_memove_lab.c && ./a.out
memmove is 1.054571 times faster then memcpy
“This must be an unfair test!”, you’re probably thinking. Yes, it is. Or at least it’s a dangerously narrow test. But it is also an honest and simple test. By that I mean, it is the first code I hammered out to just get some numbers for this article. Proper benchmarking is very difficult. I’m not going to attempt it.
The real lesson from this naive benchmark is that you must measure your code before concluding that an optimization is really faster. I would never have guessed that
memmove would be up to 40% faster, at copying 4MB. But it was — in this particular instance.
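If you want to take your own measurement, a timing harness along these lines is enough to get a first number. This is a rough sketch written for illustration, not the contents of memcpy_memove_lab.c. The names copy_fn and time_copies are made up, and it shares the weaknesses of any naive benchmark: it reports CPU time from clock(), runs a single buffer size, and makes no attempt to control caches or compiler cleverness.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Both memcpy and memmove fit this signature. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical helper: CPU seconds spent calling `copy` `reps` times
 * on `size` bytes, or -1.0 on bad arguments or allocation failure. */
double time_copies(copy_fn copy, size_t size, int reps)
{
    if (size == 0 || reps <= 0)
        return -1.0;

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    /* Touch both buffers first so page faults are not timed. */
    memset(src, 0xAB, size);
    memset(dst, 0, size);

    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        copy(dst, src, size);
    clock_t end = clock();

    /* Read the result so the copies are harder to optimize away. */
    volatile unsigned char sink = dst[size - 1];
    (void)sink;

    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Comparing time_copies(memcpy, 4 * 1024 * 1024, 100) with time_copies(memmove, 4 * 1024 * 1024, 100) mirrors the 4MB, 100-iteration setup described above. Expect the ratio to vary by machine, compiler, and optimization level.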
On a related note, a significantly faster
memcpy (say 2x) won’t have an appreciable impact on an application’s performance, unless the application spends a surprisingly large portion of its time copying memory (Amdahl’s law). For example, let’s say that 1 out of every 20 seconds is spent copying memory with
memmove. (That’s a lot of time just moving bits around! Programs should do something with bits, not just move them.) So we replace memmove with
memcpy, and this
memcpy is a full 2x faster than
memmove (which is optimistic). Surprisingly, we only get a 2.5% speedup! Remember that 1 in 20 is only 5% of the program’s total time. Cutting this time in half eliminates 2.5% of the program’s total execution time. If we can find an optimization that speeds up the other 95% of the program by just 2.7%, we get better performance overall.
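The arithmetic is easy to check in a few lines. The function names below are invented for this sketch, and it reads “speeds up by 2.7%” as “takes 2.7% less time”, which is the reading under which the comparison above works out.

```c
#include <assert.h>

/* Fraction of total run time removed when a part that takes `fraction`
 * of the run has its own time cut by `reduction` (0.5 = twice as fast). */
double time_saved(double fraction, double reduction)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    assert(reduction >= 0.0 && reduction < 1.0);
    return fraction * reduction;
}

/* Overall speedup (old time divided by new time), per Amdahl's law. */
double overall_speedup(double fraction, double reduction)
{
    return 1.0 / (1.0 - time_saved(fraction, reduction));
}
```

Here time_saved(0.05, 0.5) is 0.025, the 2.5% from the example, while time_saved(0.95, 0.027) is about 0.0257, slightly more. In old-time-over-new-time terms, overall_speedup(0.05, 0.5) is only about 1.026.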
memcpy is unlikely to make a large difference to program performance, in general. Switching
memcpy implementations is a “local optimization” that has much less value than changing the algorithm that’s requiring all that duplication. Such an optimization may even suddenly become slower when the hardware is upgraded.
How much safer is
memmove? This is a hard dimension to quantify, and I don’t have a satisfying answer. My instinct tells me that it isn’t dramatically safer. I don’t have any data to support this, but I believe it’s very rare to be copying memory into itself, compared to other memory-management errors like a double-free.
But the bottom line is that there are fewer ways your program can fail if you use memmove instead of
memcpy. Period. Since
memcpy’s behavior is undefined when the source and destination overlap, a bug it causes can be a vicious bitch to debug.
Speed over Safety
memcpy is preferred by a significant majority of C programmers. I don’t know exactly how many. But a Google fight shows that
memcpy is almost 6x more talked about than
memmove (as of 2008-04-11). Anecdotally,
memmove is mostly unheard of in my experience. It seems like the call of “faster” really is a siren’s song for developers, luring them into dangerous code.
I think this is very unfortunate, especially because the performance advantage of
memcpy just isn’t that big in general! (Sometimes it’s even harmful.) Given the unreliability of software, anything that eliminates bugs is a Very Good Thing.
I wish I knew the full story of
memcpy winning the popularity contest with
memmove. By accident or design, it has left us with a programming culture that values superficial speed over safety.
For Further Reading:
memcpy — includes some graphs showing the tradeoffs between optimizing for large chunks of memory (say copying pictures), and small data structures.
Why aren’t my optimizations optimizing? — “Optimizing code is a tricky business.”
memcpy_memove_lab.c — The naive benchmark from this article, plus a reference implementation of memcpy and memmove.