When duplication helps

Duplication: do you address it yourself, or do you let the compiler address it? Every time you see duplication, I consider it a choice you have to make. “Well, duplication is bad, I should abstract it” is the common answer. But keeping the duplication can also be an answer, and here is why.

From the compiler’s point of view, an abstraction shouldn’t affect the behavior of the code; it should exist purely to organize it. That is, if you take optimizations and concurrency with a grain of salt. An abstraction should benefit the reader: the other programmers who will read this code later. This includes your future self, so you had better make sure the code is readable.

But how will this work when the program needs to change? The programmer will attempt to change the code to fit new requirements, and if the abstraction doesn’t fit anymore, the idea behind the abstraction has become outdated.

To get rid of the now-wrong abstraction, you first identify how the abstraction works and what its concrete counterparts are. Implicit rules of the language become explicit: inheritance turns into explicit delegation, then into a duplicated calling structure, and further into duplicated code.
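A minimal sketch of that unrolling in Python. The class names (`Logger`, `AuditLogger` and its variants) are hypothetical and only serve as illustration; the point is that each step makes one more implicit rule visible while the behavior stays the same.

```python
# Step 0: the abstraction. AuditLogger gets format() through inheritance,
# an implicit rule of the language.
class Logger:
    def format(self, msg):
        return "[log] " + msg


class AuditLogger(Logger):
    def audit(self, user, msg):
        return self.format(user + ": " + msg)


# Step 1: explicit delegation. The inherited lookup is now a visible call,
# so the calling structure is duplicated in the wrapper method.
class AuditLoggerDelegating:
    def __init__(self):
        self._logger = Logger()

    def format(self, msg):
        return self._logger.format(msg)

    def audit(self, user, msg):
        return self.format(user + ": " + msg)


# Step 2: duplicated code. Nothing is shared anymore, so this copy
# can change without touching Logger.
class AuditLoggerUnrolled:
    def audit(self, user, msg):
        return "[log] " + user + ": " + msg
```

All three variants return the same string for the same input, which is what makes the last one a safe starting point for the actual change.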

And then you have it unrolled, right in front of your eyes, to the most modifiable state you need, all the way down to assembly if you wish. Then you are free to make the needed change: a specific change to the code that takes in the new requirement. The skill is knowing what to write so that the code as a whole can be re-abstracted afterwards, instead of remaining the old code with the new requirement attached as a blob. Code is never in a perfect state, but leaving it as that blob is neglecting the code.

How would this re-abstracting work? An oversimplified example: take a for-each loop and add the requirement that you only visit the even indices. Assuming imperative programming, by this plan you would unroll the for-each loop into a for loop with a counter. Then you make the change by modifying the advancement part. That puts the new requirement in and the old idea out.
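As a rough sketch in Python, with hypothetical function names and assuming the items live in a list: the first function is the original for-each loop, the second is the unrolled counter loop with only the advancement changed.

```python
def collect_all(items):
    # The original abstraction: a for-each loop that visits every element.
    result = []
    for item in items:
        result.append(item)
    return result


def collect_even_unrolled(items):
    # Unrolled into a loop with an explicit counter. The only change for
    # the new requirement is the advancement: 2 instead of 1.
    result = []
    i = 0
    while i < len(items):
        result.append(items[i])
        i += 2
    return result
```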

But that syntax has too much variability for this idea, so the code is under-abstracted for it. I find it ugly to set the counter to zero and to write the loop condition that keeps it within bounds; that is too much focus on the how, and it is prone to typing errors. So you re-abstract. It might sound silly to do this for a for loop, but remember that this is an oversimplified example. The for-each loop was the clearer choice while it lasted, even if you knew it wouldn’t fit new requirements. I argue it is better to keep the code as simple as it can be for your current requirements. You bring benefit to the future reader by keeping it simple, and by not holding on to that code once new requirements come in. Otherwise the tendency toward future-proofing is too strong. This fits with developing incrementally: each version of your project gets the refactoring it needs and remains usable.
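One way the re-abstracted loop for the even indices could look in Python, again with a hypothetical function name and assuming `items` is a sequence that supports slicing. The counter, the start value and the bounds check are gone; the slice states the idea directly.

```python
def collect_even(items):
    # Back to a for-each loop. The slice items[::2] carries the
    # "every second element, starting at index 0" idea.
    result = []
    for item in items[::2]:
        result.append(item)
    return result
```

If the requirement changes again, this is once more the code you unroll first.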

Real requirements can go against your whole architecture. The programmer will start to panic and realize the design is crumbling while changing the code to fit the requirement. It is divide and conquer against the code: at every step you break open the parts needed to fit the requirement. Abstracting is solidifying the ideas, while keeping duplication is accepting that you don’t want to maintain the current underlying idea to that extent. Keeping duplication to a degree is delaying the second step of this plan to the future, a future where this idea is perhaps not relevant anymore, where its importance has dropped and it isn’t the center of your design anymore.

The center of your design serves as the opposite: the parts you don’t want to change. There you accept the high risk of abstracting for the high reward it gives: in the months that follow, you don’t have to work with code that is too open.

In summary:

  1. Code has a lifetime, bound to whenever the idea behind it changes.
  2. Duplication is bliss while discovering change.