One of the first things we learn as developers is that if you're going to write any amount of code, you're going to need a strategy for code reuse. Everyone is familiar with the curse of verbose, repetitive code and its effects on maintainability. We've all run into problems where we couldn't be sure that every place X was done had been correctly updated, and that uncertainty leads to subtle, hard-to-squash bugs. Unfortunately, while we've all learned that repetition in code is a bad thing, I think we've failed to find an algorithm for removing duplicate code.
I'm beginning to believe that "best practices" are a tool for people who can't be bothered to do a basic cost/benefit analysis for themselves. I'm not saying I'm smarter than the people who coin popular adages; I would just rather understand the 'why' than the 'how' on issues where I'm supposed to follow a convention.
One of the things that is almost never discussed is the idea that code-reuse may not be our ultimate goal. Everybody touts how their style of programming reduces redundant code by X%, but is that even a good thing? When we're so fired up to exterminate code duplication from our code base, we need to realize that like everything else, duplicate code may have some benefits to go along with the long list of costs.
Costs of Duplicate Code
Everybody knows the cost of duplicate code, right? More code means more space for bugs to hide in. Duplicate code may become out-of-sync with its clones. Everybody has to reinvent the wheel. From a maintainability perspective, it's very clear that we ought to at least seriously consider removing as much code duplication from our product as possible.
To be perfectly clear, I'm not advocating copypasta. I completely agree that these are valid concerns when code gets duplicated, and I think they should be weighed against other concerns when deciding whether or not to extract a method/function/macro.
Benefits of Duplicate Code
What could the benefits of duplicate code be? To start with, I'd like to call on a word that Rich Hickey taught me (well, not me personally; more like everyone who watches his talks on InfoQ): complecting. If you're too lazy to look at that definition, the gist is this: complecting is weaving two things together. Now every time you see an opportunity to re-factor you have to ask yourself: should these things be tied together?
One problem that I often see is people jamming multiple paths of execution into a single procedure in order to reuse the existing code for a particular resource such as the database or what have you. Over time, new parameters are introduced to allow old calls to the procedure to continue as expected while adding new features to invocations going forward.
One reason this is a problem is that the complexity of the procedure starts to skyrocket. If you look at the available code paths, as opposed to the few code paths actually taken, you start to see plenty of dead ends and undefined behavior lurking in the unused branches. How can you tell if those paths are ever taken? Well, if you're lucky, your function takes mostly primitives and you can inspect all of the call-sites. If you're less lucky, your code will take some well-worn objects as parameters, and it will be a pain to trace every path where such an object is created to see whether it ends up passed into your function. This ranges between hard and impossible depending on the size of your code base and how well 'reused' your code is.
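A minimal sketch of the pattern described above, with invented names: a single "reused" procedure that has grown flag parameters over time, multiplying its nominal code paths far beyond the handful that real callers exercise.

```python
# Hypothetical illustration: a procedure that accumulated flag parameters
# so old callers keep working while new features piggyback on it.
# All names here are invented for the example.

def save_record(record, validate=True, audit=False, legacy_ids=False,
                skip_hooks=False):
    """One procedure, many interleaved code paths."""
    if legacy_ids and not validate:
        # Does any caller actually reach this combination? Hard to tell
        # without auditing every call-site.
        record["id"] = str(record["id"])
    if validate:
        if "id" not in record:
            raise ValueError("record must have an id")
    if audit and not skip_hooks:
        print(f"AUDIT: saving {record['id']}")
    # ... actual save would happen here ...
    return record

# Four boolean flags give 2**4 = 16 nominal flag combinations, but
# perhaps only two or three are ever used at real call-sites; the rest
# are dead ends or undefined behavior waiting to be triggered.
```

The cost shows up when you try to change one path: every unused combination is a branch you must reason about anyway.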
Why is complexity a problem? Well, every time you add another code path to a procedure, you need to carefully study the effects that the new path might have on old paths. Hopefully you can satisfy yourself quickly that there are no adverse interactions, but you may end up getting lost in all of the branches and sub-routines. If you're trying to verify you didn't break anything, you had better hope you have access to a pretty representative test suite or some thorough testers. As C.A.R. Hoare put it: "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies." Unfortunately the added complexity tends to push us toward the latter.
One thing that I've come to realize is that triggering regression testing is a pain. All of the worst bugs come out when you make changes to code that was already working and would not otherwise be part of your current task. Unfortunately, religiously re-factored code tends to tie everything together in one big knot, so a single change can trigger a project-wide regression test. The best way to avoid this is to only re-factor code shared by parts of the project that have to be tested in tandem anyway, no matter what changes are made to them.
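The knot above can be sketched in a few lines. Assume (names invented) a helper "reused" by two unrelated features; a change motivated by one feature now forces re-testing the other.

```python
# Sketch: one helper shared by two unrelated features. Changing it for
# one feature's sake requires regression-testing both. Names invented.

def format_amount(cents):
    # Shared by billing (invoices) and reporting (dashboards).
    return f"${cents / 100:.2f}"

def invoice_line(item, cents):
    # Billing feature.
    return f"{item}: {format_amount(cents)}"

def dashboard_total(cents):
    # Reporting feature.
    return f"Total: {format_amount(cents)}"

# A billing-driven change to format_amount (say, new rounding rules)
# silently alters every dashboard too, so reporting must be retested
# even though nobody "touched" it.
```

Had billing and reporting each owned a copy, the change would have been local and the regression surface small, which is exactly the trade-off being weighed.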
The code that is the hardest to measure by far is library code. In some circumstances, someone has implemented a particular algorithm that is needed in other places. My take on this is that a function should represent one particular algorithm for performing a task, and while it's not off-limits to ever touch it, one should be very mindful when working on it to ensure that the changes being made are in the best interest of the whole project. Unless I know that every caller of the function should receive the update, I would much rather create a new function and migrate the callers one-by-one as needed. Of course, this can't be a general rule because, as I stated above, I'm decidedly against general rules.
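The migrate-callers-one-by-one approach might look like this, with invented function names: the original stays untouched for existing callers, and a new variant carries the changed behavior until each caller is verified and moved over.

```python
# Sketch of migrating callers individually instead of editing a shared
# library function in place. Function names are invented.

def normalize_name(name):
    # Original, widely-used behavior: trim and lowercase only,
    # leaving internal whitespace alone.
    return name.strip().lower()

def normalize_name_v2(name):
    # New behavior needed by one caller: also collapse internal runs
    # of whitespace. Existing callers of normalize_name are unaffected.
    return " ".join(name.split()).lower()

# Callers migrate one at a time, as each is verified:
legacy_caller = normalize_name        # not yet migrated
migrated_caller = normalize_name_v2   # migrated and tested
```

The duplication is temporary and deliberate: it confines the regression risk to the callers you have actually moved.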
Overall, I hope that you will have more to think about when you are making the decision to re-factor code and that you'll take care not to create a burden for future maintainers (or even your future self).