The 80% rule.


EDIT: A later post cast doubt, rather ironically, on the Impacted Set as a good example!

Look, you want job security in the software industry? Here's how you do it.

Examine software fundamentals, tease out the problems associated with those fundamentals and learn to solve those problems. These fundamentals are shared by all computer languages, across all applications, are unchanged in forty years and are unlikely to change any time soon. As are their associated problems. Hence your workplace desirability.

The most fundamental of these fundamentals reveals itself in a trivial observation: software is composed of interacting parts. It has structure.

The timeless problem of structure is that when you change one part, you may also have to change others because these interacting parts ... interact. This means locking horns with your favourites: coupling and ripple effects. From a programmer's - and corporate accountant's - point of view, if a program's heavily coupled, its design is crap. No buts. No ifs. No arguments. If a program is not heavily coupled, then it may or may not be crap; only further investigation will tell. Low coupling does not imply good design, but presents a crisp, inescapable prerequisite.

So, how do you tell - at a glance - whether a program's heavily coupled? You measure a property of its methods. Then you establish a threshold coverage - ooh, say, 80% [1] - such that only when 80% of the program's methods meet a certain property value can that program be considered loosely coupled. If they don't, then that puppy's heavily coupled and crappage ensues.

So what property do you measure? How do you measure coupling [2]?

Take method x() and count all the methods that depend on it, either directly or indirectly: that's your property. That's the impacted set of x(), the set of methods most likely to have to change (in a worst-case scenario) when x() changes. That, simply, is the coupling.
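
For the curious, here's a minimal Java sketch of that count. It assumes the call graph is already to hand as a map from each method to the methods it calls directly. The class name ImpactedSet, the map format and the decision to leave x() out of its own impacted set are all illustrative assumptions, not a description of the tooling behind this post.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ImpactedSet {

    // calls: each method mapped to the methods it calls directly
    // (a hypothetical input format). Returns every method that depends
    // on target, directly or indirectly.
    static Set<String> of(String target, Map<String, List<String>> calls) {
        // Invert the call graph: callee -> its direct callers.
        Map<String, Set<String>> callers = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : calls.entrySet()) {
            for (String callee : entry.getValue()) {
                callers.computeIfAbsent(callee, k -> new HashSet<>()).add(entry.getKey());
            }
        }
        // Breadth-first walk up the chain of callers, starting at target.
        Set<String> impacted = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(target);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String caller : callers.getOrDefault(current, Set.of())) {
                // Assumption: a method is not counted in its own impacted set.
                if (!caller.equals(target) && impacted.add(caller)) {
                    queue.add(caller);
                }
            }
        }
        return impacted;
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = Map.of(
            "a", List.of("b"),
            "b", List.of("x"),
            "c", List.of("x"),
            "x", List.of("y"));
        System.out.println(of("x", calls));        // a, b and c, in some order
        System.out.println(of("y", calls).size()); // 4
    }
}
```

Here a() depends on x() only through b(), yet it still lands in the impacted set: that's the "indirectly" part doing its work.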

Combining both ideas, your job-securing, lightning-fast program evaluation will look like this: 80% of this program's methods are depended upon by up to 10 other methods: this program's design is crap. But pay me and I'll do much better!

Or: 80% of this program's methods are depended upon by up to 17 other methods: ugh. You really need me working here.

Or: 80% of this program's methods are depended upon by up to 23 other methods: my rate's doubling with every minute.

But ... which figure do you choose? 10, 17 or 23? Is there an absolute figure for sloppy design? Well, no. But this litmus test allows us to compare any two programs. So let's turn the problem around and re-examine some Java programs that this blog's already studied (and others besides) to find a nice low coupling figure to aspire to. Once we identify the good, all the rest will be the bad.

Figure 1 shows the impacted set for Spring 3.2.0.RC1 [3].

Figure 1: Spring's impacted set.

Yeah, so it's a CDF graph and graphs put you in the kitchen at parties. But this one's interesting, honest: find 80% on the Y-axis, follow it across till it hits the curve and read down to the X-axis. Thus: 80% of Spring's methods are depended upon by up to 8 other methods. Is this good? Is this high or low coupling? We shall see.
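
Reading the 80% figure off a CDF amounts to sorting the impacted set sizes and picking the value that 80% of the methods sit at or below. A rough Java sketch, assuming the sizes have already been measured; the names EightyPercentRule and thresholdAt, and the sample numbers, are made up for illustration:

```java
import java.util.Arrays;

class EightyPercentRule {

    // Smallest size s such that at least `percent` percent of the methods
    // have an impacted set of s methods or fewer.
    static int thresholdAt(int percent, int[] impactedSetSizes) {
        if (impactedSetSizes.length == 0) {
            throw new IllegalArgumentException("no methods to measure");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 1 and 100");
        }
        int[] sorted = impactedSetSizes.clone();
        Arrays.sort(sorted);
        // How many methods are needed to reach the coverage, rounded up.
        long needed = ((long) percent * sorted.length + 99) / 100;
        return sorted[(int) needed - 1];
    }

    public static void main(String[] args) {
        // Invented impacted set sizes for a ten-method program.
        int[] sizes = {0, 1, 1, 2, 3, 3, 5, 8, 21, 40};
        System.out.println(thresholdAt(80, sizes)); // 8
    }
}
```

For this toy program the verdict would read: 80% of its methods are depended upon by up to 8 other methods. The two stragglers at 21 and 40 don't move the figure, which is the point of choosing a percentile over a maximum.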

Figure 2: The impacted set of JUnit 4.11.

80% of JUnit's methods are depended upon by up to 9 other methods. We're going in the wrong direction. We want fewer depending methods, not more.

Figure 3: The impacted set of Lucene 5.2.1.

80% of Lucene's methods are depended upon by up to 9 other methods. No help here.

Figure 4: The impacted set of Log4j 5.2.1.

80% of Log4j's methods are depended upon by up to 6 other methods. Wow, that seems pretty good. Its 80th percentile impacted set is 33% lower than, for example, Lucene's; you might say it's a substantial 33% less coupled. Well done, Log4j. But can we go lower?
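
That 33% is nothing more than the gap between the two 80th-percentile figures, taken as a share of the larger one. A tiny Java sketch of the sum, using only the two values quoted above; CouplingComparison and percentLower are hypothetical names:

```java
class CouplingComparison {

    // Percentage by which candidate is lower than baseline.
    static double percentLower(int candidate, int baseline) {
        return 100.0 * (baseline - candidate) / baseline;
    }

    public static void main(String[] args) {
        // 80th-percentile impacted set sizes quoted in this post.
        int log4j = 6;
        int lucene = 9;
        System.out.printf("%.0f%%%n", percentLower(log4j, lucene)); // 33%
    }
}
```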

Figure 5: The impacted set of Ant 6.2.1.

80% of Ant's methods are depended upon by up to 8 other methods. We're heading in the wrong direction again.

Figure 6: The impacted set of FitNesse 20151230.

80% of FitNesse's methods are depended upon by up to 10 other methods. Eeeewww ...

Figure 7: The impacted set of Antlr 2.7.7.

80% of Antlr's methods are depended upon by up to 12 other methods. OMFG!

BUT..!

Then we come ...

... to Netty ...

Figure 8: The impacted set of Netty 4.0.36.

Just look at that jaw-dropping low coupling. 80% of Netty's methods are depended upon by up to just 5 other methods!

Here, then, is the gold standard. And Netty's no tiny, easy-to-design toy: it has 13,000 methods of loosely coupled gorgeousness. So if Netty's programmers can do it, why can't you?

And Netty's not even alone. Here's JGroups.

Figure 9: JGroups' impacted set.

And this little minx.

Figure 10: The impacted set of Spoiklin Soice.

The conclusion may be drawn that building systems according to the 80% rule - with an 80th percentile impacted set of just 5 methods or fewer - is a demonstrably achievable goal, and because it's achievable, larger 80th percentile values suddenly become commercially questionable.

So there's your job security: learn how to design programs as well as Netty's programmers do and your pension's safe.

Summary.

Yes, this post glossed over a slew of subtleties. The 80% threshold is arbitrary. Some programs might conceivably be less coupled at even higher thresholds. Everything described here is a metric and all metrics can be faked [4]. And many more besides.

Despite this, the metric above would seem to capture the age-old phenomenon of coupling.

Do not think for a moment that coupling has been solved and modern programmers have simply wafted on to sexier challenges. Fundamental problems do not "age" away. As you can see from the above examples, coupling continues to generate massive and unnecessary costs in our industry.

Yet clearly some people can manage it better than others. The programmers of Netty look forward to comfortable retirements. Do you?

(Thanks to Mairbek for pointing out Netty as a design worthy of study.)