I have latched onto an idea, but don’t have the resources to follow up on it: could a static-analysis tool identify repeated patterns of code, across many code bases, that should be extracted out as subroutines and higher-level functions? How universal would these “emergent libraries” be?
My inspiration here is Section 4.1 Identifying Common Functions, in the excellent paper Some Thoughts on Security After Ten Years of qmail 1.0 (PDF), by Daniel J. Bernstein,
Most programmers would never bother to create such a small function. But several words of code are saved whenever one occurrence of the
dup2()
/close()
pattern is replaced with one call tofd_move();
replacing a dozen occurrences saves considerably more code than were spent writing the function itself. (The function is also a natural target for tests.) The same benefit scales to larger systems and to a huge variety of functions;fd_move()
is just one example. In many cases an automated scan for common operation sequences can suggest helpful new functions, but even without automation I frequently find myself thinking “Haven’t I seen this before?” and extracting a new function out of existing code.
What’s particularly fascinating to me are the new operations we might find.
Before I was exposed to the Haskell prelude I hadn’t known about the fundamentally useful foldl
and foldr
operations. I had written dozens of programs that used accumulation, but it’s generalization hadn’t occurred to me — and probably never would have. Static analysis can help uncover generalizations that we might have missed, or didn’t think were important, but turn out in practice to be widely used operations.