Evidence-based principles.


A previous post tried to find objective correlations1, 2 between the number of times a method was updated and its structural properties.

Given that the probability of a method's being updated during a project's lifetime depends entirely on whether that method partakes in the features arbitrarily desired by customers, it might have been guessed that such end-user whimsy would completely randomize any connection between method and property, such that no correlations whatsoever would exist.

Yet this was not the case.

True, most properties indeed showed no correlation. But some did. The correlations were weak, but not negligible, thus presenting at least some evidence that the following three principles really do help reduce software development cost:

  1. Manage method size.
  2. Manage method impact set - the number of all other methods that the given method depends on, directly or transitively.
  3. Manage the number of transitive dependencies - running through methods.
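
As a rough illustration of the second principle, the impact set of a method can be computed by walking a dependency graph. This is only a minimal sketch under stated assumptions: the class name `ImpactSetSketch`, the `dependsOn` map and the method names `a` to `d` are all hypothetical, and the tooling actually used for the experiment may well work differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ImpactSetSketch {

    // Returns every method that `start` depends on, directly or transitively.
    // The start method itself is excluded, even if a cycle leads back to it.
    static Set<String> impactSet(Map<String, Set<String>> dependsOn, String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String current = work.pop();
            for (String callee : dependsOn.getOrDefault(current, Set.of())) {
                if (!callee.equals(start) && seen.add(callee)) {
                    work.push(callee);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical dependency graph: a -> b -> c and a -> d.
        Map<String, Set<String>> dependsOn = Map.of(
                "a", Set.of("b", "d"),
                "b", Set.of("c"));
        System.out.println(impactSet(dependsOn, "a").size()); // prints 3
        System.out.println(impactSet(dependsOn, "c").size()); // prints 0
    }
}
```

Here `a` has an impact set of three methods, because `c` is reached through `b`, whereas `c` depends on nothing and has an empty impact set.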

Programmers have known of these principles for years: the first being rather obvious, the latter two combining to say, roughly, don't make spaghetti code. It is nevertheless reassuring to find support for these principles - however slight - carved into the actual code, rather than stemming wholesale from mere subjective opinion.

Program structure, of course, extends beyond method-level. Java enjoys at least two other levels: class-level and package-level. Can we find evidence for the above three principles at these two higher levels?

The experiment was re-run3, this time counting the number of times a class or package was updated during a project's lifetime and correlating this with the properties4 of that class or package. Table 1 shows the results for all three levels.

Property  S.    D. f.  C. c.  T. d.  I.S.  A. p. c.  A.    C. d.  Impd. S.  C.     M.     D.    D. o.
Method    0.36  0.31   0.28   0.25   0.29  0.08      0.13  0.16   0.01      0.15   0.13   0.18  0.01
Class     0.32  0.21   0.27   0.23   0.06  0.38      0.20  0.12   0.14      -0.10  -0.10  0.16  0.18
Package   0.41  0.32   0.39   0.31   0.05  0.42      0.27  0.22   0.27      0.0    0.0    0.31  0.19
Avg.      0.36  0.32   0.31   0.26   0.13  0.29      0.20  0.17   0.14      0.02   0.01   0.22  0.13

Table 1: Spearman correlations of structural properties with number of times a method, class5 and package6 was updated.
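
For readers unfamiliar with it, a Spearman correlation is simply a Pearson correlation calculated on the ranks of the values rather than on the values themselves. The following is a minimal sketch of that calculation, not the code used to produce Table 1: the class name `SpearmanSketch` and the sample data in `main` are invented purely for illustration.

```java
import java.util.Arrays;

final class SpearmanSketch {

    // Assigns 1-based ranks; tied values share the average of their positions.
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int idx = 0; idx < n; idx++) order[idx] = idx;
        Arrays.sort(order, (a, b) -> Double.compare(values[a], values[b]));
        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) j++;
            double averageRank = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) ranks[order[k]] = averageRank;
            i = j + 1;
        }
        return ranks;
    }

    // Plain Pearson correlation; returns NaN if either input is constant.
    static double pearson(double[] x, double[] y) {
        double meanX = Arrays.stream(x).average().orElse(0.0);
        double meanY = Arrays.stream(y).average().orElse(0.0);
        double covariance = 0.0, varianceX = 0.0, varianceY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }
        return covariance / Math.sqrt(varianceX * varianceY);
    }

    // Spearman correlation: Pearson correlation of the two rank vectors.
    static double spearman(double[] x, double[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("length mismatch");
        return pearson(ranks(x), ranks(y));
    }

    public static void main(String[] args) {
        // Hypothetical data: a property (say, size) and update counts for five methods.
        double[] size = {12, 40, 7, 95, 33};
        double[] updates = {1, 3, 0, 4, 5};
        System.out.println(spearman(size, updates));
    }
}
```

Because only ranks matter, the measure rewards any consistently increasing relationship between a property and the number of updates, not merely a straight-line one.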

Looking at the average correlations over all three levels, we find - alas - many structural properties show no correlation (<0.2) with number of updates, and none shows a strong correlation. Again, however, three properties do show some correlation, albeit weak: size, number of transitive dependencies and absolute potential coupling.

Two of these properties, size and number of transitive dependencies, we've encountered already, and both show correlations across all levels. Absolute potential coupling, by contrast, gains importance only at class- and package-level.

Absolute potential coupling is just the number of other elements that a given element can see. For example, a method can see all private methods in the same class, but cannot see private methods in other classes in the same package, and can only see public methods in public classes in other packages.
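
The visibility rules just described can be sketched as a small counting exercise. Everything below is an assumption made for illustration: `PotentialCouplingSketch`, `MethodInfo` and `Access` are hypothetical names, and the model is deliberately simplified, ignoring protected access and nested classes.

```java
import java.util.List;

final class PotentialCouplingSketch {

    enum Access { PRIVATE, PACKAGE_PRIVATE, PUBLIC }

    // Hypothetical, simplified description of one method in a program.
    static final class MethodInfo {
        final String packageName;
        final String className;
        final boolean classIsPublic;
        final String name;
        final Access access;

        MethodInfo(String packageName, String className, boolean classIsPublic,
                   String name, Access access) {
            this.packageName = packageName;
            this.className = className;
            this.classIsPublic = classIsPublic;
            this.name = name;
            this.access = access;
        }
    }

    // True if `from` may call `target` under the simplified access rules.
    static boolean canSee(MethodInfo from, MethodInfo target) {
        boolean samePackage = from.packageName.equals(target.packageName);
        boolean sameClass = samePackage && from.className.equals(target.className);
        if (sameClass) return true;                                   // private methods included
        if (samePackage) return target.access != Access.PRIVATE;      // other classes, same package
        return target.classIsPublic && target.access == Access.PUBLIC; // other packages
    }

    // Counts the other methods that `from` can see.
    static long absolutePotentialCoupling(MethodInfo from, List<MethodInfo> all) {
        return all.stream().filter(m -> m != from && canSee(from, m)).count();
    }

    public static void main(String[] args) {
        MethodInfo run = new MethodInfo("app", "Main", true, "run", Access.PUBLIC);
        List<MethodInfo> program = List.of(
                run,
                new MethodInfo("app", "Main", true, "helper", Access.PRIVATE),
                new MethodInfo("app", "Other", false, "secret", Access.PRIVATE),
                new MethodInfo("lib", "Api", true, "call", Access.PUBLIC),
                new MethodInfo("lib", "Impl", false, "work", Access.PUBLIC));
        System.out.println(absolutePotentialCoupling(run, program)); // prints 2
    }
}
```

In this made-up program `run` can see `helper` (same class) and `call` (public method of a public class), but neither the private `secret` nor `work`, which hides inside a non-public class. Tightening access modifiers therefore lowers the count directly.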

Thus, absolute potential coupling measures (approximately) how poorly a program is encapsulated, with this correlation showing that the more methods a given method is exposed to, the more often that method will be updated. This, once more, programmers know well.

It would be a shame, however, to abandon the impact set principle at method-level simply because it loses support at class- and package-level. Thus we can summarise the findings on all levels by establishing the four evidence-based principles:

  1. Manage Size.
  2. Manage method Impact set.
  3. Manage absolute Potential coupling.
  4. Manage the number of Transitive dependencies.

Summary.

Programmers employ many structural principles in the production of high-quality code, some based on personal experience, some gleaned from books of worthy authority. These principles remain of inestimable value.

Some principles, however, exert so powerful an influence that cold statistics can harvest faint traces of their passing from source code alone. Despite the weakness of the correlations that identify these principles, they clearly boast better objective evidence than principles that lack any evidence whatsoever. These are the evidence-based principles. They in no way render other principles useless, but merely bask in the hazy sunshine of ever-so-slightly increased credibility.