The tragedy of package coupling.


Stockholm picture.

Familiarity.

Definition: "A package is a grouping of related types providing access protection and name space management."

Thus speaks Oracle.

Let us celebrate this venerable structural stalwart with a rousing tour of examples showing just how classes are grouped and protected within their loving confines. Supplying these specimens will be no Java weakling found huddled in an internet basement, but Apache's flagship software management tool, Maven itself. Specifically, the maven-core-3.0.5.jar has been casually selected as the lucky donor.

The contents of these packages will be shown as Spoiklin diagrams in which a circle represents a class, a straight line represents a dependency from a class drawn above to one drawn below, and a curved line represents a dependency from a class drawn below to one drawn above. The colour of a class indicates the relative number of transitive dependencies in which it partakes: the redder, the more transitive dependencies.

Pretty as a package.

Image generated by Spoiklin Soice

Figure 1: Package org.apache.maven.project.artifact

Figure 1 shows the artifact package. It is a handsome beast. The study and pursuit of good software structure rewards in many ways but none compares to the cash savings it yields. Governing the predictability of the potential cost of change, good structure minimises the deep, sinewy coupling that thrashes quarterly profit-and-loss accounts all over the world. The artifact package, however, shows not a trace of this coupling plague. Dependencies between its classes lie open for all to read, rendering the task of tracing potential ripple-effect vectors a triviality. The only class of any connectedness weight, MavenMetadataSource (its name, alas, clipped on the right), stands proudly over its four subordinates daring all comers to find a single difficult connection to follow. Indeed, the following shows the four transitive dependencies running through MavenMetadataSource:

  1. DefaultMetadataSource -> MavenMetadataSource -> ArtifactWithDependencies
  2. DefaultMetadataSource -> MavenMetadataSource -> InvalidDependencyVersionException
  3. DefaultMetadataSource -> MavenMetadataSource -> ProjectRelocation
  4. DefaultMetadataSource -> MavenMetadataSource -> MavenMetadataCache
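
For readers who like to see the counting done, here is a minimal Java sketch of how such a list might be produced. It assumes an acyclic dependency graph and uses only the edges visible in the four paths listed above, not the full contents of the package; the class name PathsThrough and its methods are hypothetical and belong to neither Maven nor Spoiklin Soice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: enumerates transitive dependencies (root-to-leaf paths)
// in an acyclic dependency graph and picks out those passing through one class.
final class PathsThrough {

    // Only the edges that appear in the article's list of four paths.
    static Map<String, List<String>> listedEdges() {
        Map<String, List<String>> dependsOn = new LinkedHashMap<>();
        dependsOn.put("DefaultMetadataSource", Arrays.asList("MavenMetadataSource"));
        dependsOn.put("MavenMetadataSource", Arrays.asList(
                "ArtifactWithDependencies",
                "InvalidDependencyVersionException",
                "ProjectRelocation",
                "MavenMetadataCache"));
        return dependsOn;
    }

    // Every path that starts at a class nobody depends on and ends at a class
    // that depends on nothing. Assumes the graph has no cycles.
    static List<List<String>> maximalPaths(Map<String, List<String>> dependsOn) {
        Set<String> dependedOn = new HashSet<>();
        for (List<String> targets : dependsOn.values()) {
            dependedOn.addAll(targets);
        }
        List<List<String>> paths = new ArrayList<>();
        for (String cls : dependsOn.keySet()) {
            if (!dependedOn.contains(cls)) {
                walk(dependsOn, cls, new ArrayList<>(), paths);
            }
        }
        return paths;
    }

    // Depth-first walk that records the current path whenever it reaches a leaf.
    private static void walk(Map<String, List<String>> dependsOn, String cls,
                             List<String> path, List<List<String>> paths) {
        path.add(cls);
        List<String> next = dependsOn.getOrDefault(cls, Collections.emptyList());
        if (next.isEmpty()) {
            paths.add(new ArrayList<>(path));
        }
        for (String target : next) {
            walk(dependsOn, target, path, paths);
        }
        path.remove(path.size() - 1);
    }

    // Keeps only the paths that contain the given class.
    static List<List<String>> through(List<List<String>> paths, String cls) {
        List<List<String>> hits = new ArrayList<>();
        for (List<String> path : paths) {
            if (path.contains(cls)) {
                hits.add(path);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (List<String> path : through(maximalPaths(listedEdges()), "MavenMetadataSource")) {
            System.out.println(String.join(" -> ", path));
        }
    }
}
```

Run against those edges, the sketch finds four paths through MavenMetadataSource, one per subordinate class.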

Figure 2: Package org.apache.maven

Figure 2 presents the maven package, surely the pivotal package examined and another beauty. That DefaultMaven class presides over a loyal and well-heeled court, showering them with short crisp dependencies. Free from gnarly tangled coupling, the picturesque sunburst pattern finds an almost perfect real-world incarnation, the programmers having designed with a skill and an effortlessness rare in commercial Java package structuring.

Figure 3: Package org.apache.maven.lifecycle

Finally, figure 3 graces us with the lifecycle package. Perhaps not as cohesive as its forerunners, lifecycle nevertheless charms with its scarcity and soothes with its couplinglessness, its classes either haughtily independent of one another or holding hands in simple geometric relationship.

Hardly a programmer in the world would harbour the slightest fear of maintaining any of these three packages. It might even be fun. Maven, then, has treated us to a masterclass of package design.

The only problem is that it's all complete bollocks.

Expansions.

For a profession that shrieks with such visceral abhorrence at the mere suggestion of coupling, we programmers seem strikingly submissive. We talk a good game. We just do very little about it. To the outside world, we must look like a great mass of plumbers regularly staging glittering international conferences on the evils of leakage, describing the almost mythical leaks that used to happen in the bad old days and congratulating ourselves on the fantastically leak-free infrastructures installed daily into houses the world over, only for the average home-owner to enter her newly appointed bathroom and see copper pipes dangling everywhere with water spraying from every seal.

As class-diagrams showing the contents of packages, the above three diagrams are all very well. They have their place. They sometimes yield enormous value. But this illuminates merely one of the two aspects of the package, namely the, "Grouping of related types," aspect; the other, that of, "Providing access protection," must manifest itself via the coupling (or lack thereof) of the package's contents within the context of the entire system. (The Java compiler takes care of the third aspect, that of, "Name space management," by highlighting naming collisions, so we need not consider that integrity here.) Thus as instruments exposing the structural integrity of Maven's packages from a coupling point of view, the above diagrams fail hilariously.

A transformation called a structural expansion helps make this point. This involves taking any of the figures above and expanding each class to include all those other classes with which it shares transitive dependencies, that is, to show all those other classes with which it is coupled.
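
One plausible reading of this transformation, sketched in Java: a class shares a transitive dependency with every class it can reach by following dependencies and with every class that can reach it, so a first-order expansion is the union of those two sets. The class name StructuralExpansion, its methods and the single-letter class names are all hypothetical; this is not how Spoiklin Soice itself is implemented.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a first-order structural expansion.
final class StructuralExpansion {

    // Every class reachable from the starting classes by following the edges,
    // the starting classes included.
    static Set<String> reach(Map<String, Set<String>> edges, Set<String> start) {
        Set<String> seen = new TreeSet<>(start);
        Deque<String> todo = new ArrayDeque<>(start);
        while (!todo.isEmpty()) {
            for (String next : edges.getOrDefault(todo.pop(), Collections.emptySet())) {
                if (seen.add(next)) {
                    todo.push(next);
                }
            }
        }
        return seen;
    }

    // Flips every edge, turning "depends on" into "is depended on by".
    static Map<String, Set<String>> reverse(Map<String, Set<String>> edges) {
        Map<String, Set<String>> flipped = new HashMap<>();
        edges.forEach((from, targets) -> targets.forEach(
                to -> flipped.computeIfAbsent(to, k -> new HashSet<>()).add(from)));
        return flipped;
    }

    // The package's classes plus everything they depend on, directly or
    // transitively, plus everything that depends on them.
    static Set<String> expand(Map<String, Set<String>> dependsOn, Set<String> packageClasses) {
        Set<String> expansion = reach(dependsOn, packageClasses);
        expansion.addAll(reach(reverse(dependsOn), packageClasses));
        return expansion;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dependsOn = new HashMap<>();
        dependsOn.put("A", new HashSet<>(Arrays.asList("B")));
        dependsOn.put("B", new HashSet<>(Arrays.asList("C")));
        dependsOn.put("D", new HashSet<>(Arrays.asList("B")));
        dependsOn.put("F", new HashSet<>(Arrays.asList("G")));
        // Expanding a one-class "package" holding only B.
        System.out.println(expand(dependsOn, new HashSet<>(Arrays.asList("B"))));
    }
}
```

In the toy graph, expanding B pulls in A and D (which depend on it) and C (on which it depends), while F and G, sharing no transitive dependency with B, stay out of the picture.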

Figure 4: Structural expansion of org.apache.maven.project.artifact of figure 1

Figure 4 shows the structural expansion of figure 1, of the artifact package. Only now do we see the extent to which that beautiful set of classes in figure 1 truly depends on, and is depended on by, the rest of the system. Coupling has run riot. The MavenMetadataSource class, for example, so laudable when seated in isolation, suddenly appears brutish. It now flings dependencies far and wide, in turn finding itself confused by invading dependencies to the point where tracing potential ripple-effects becomes dizzying.

And this is the best of the three diagrams.

Figure 5: Structural expansion of org.apache.maven of figure 2

Figure 5 presents the structural expansion of the maven package shown in figure 2. Transformed utterly, it has unmasked a savage truth: that this package is coupled to an enormous number of classes in other packages.

Figure 6: Structural expansion of org.apache.maven.lifecycle of figure 3

Figure 6 completes the wayward set. The lifecycle package appears to have been a cruel illusion all along for when seen from the coupling point of view it has simply vanished.

(Strictly speaking, these latter diagrams show first-order structural expansions; a second-order structural expansion would be achieved by taking a structural expansion of figure 6, for example, to produce a beast two expansions away from figure 3.)

Why coupling defeats us.

If packages define themselves along two dimensions - that of, "Grouping," and of, "Access" - then we may demand that they account for themselves accordingly. They must demonstrate their integrity with respect to both dimensions. A, "Grouping integrity" - if it must be called so - might measure a package's grouping-of-related-types-ness (which lies beyond the scope of this post). Its, "Access integrity," would establish the degree to which it protects package contents from outside interference: largely along this axis, then, rests the package's resistance or susceptibility to coupling. Maven, and perhaps the programming community at large, would seem to have well understood this first aspect: nothing suggests impropriety in its herding of classes into their chosen packages (not that a thorough semantic analysis was undertaken). The access integrity of some of its key packages, however, fares less well.

The initial class diagrams above only reinforce this lopsidedness, providing an artificially constrained environment from which external coupling has been banished, presenting, in essence, packages defanged, shorn of the means of defending themselves against barbarian interconnectedness. The structural expansion simply throws all this under a spotlight, underscoring the deficiencies of this blinkered view. A package, once subjected to a structural expansion, that finds itself instantly obliterated before a hurricane of dependencies has just as little right to pronounce itself well-structured as a package bloated with hundreds of dis-unified and squabbling classes; both should be put to the refactoring sword. Yet this seems uncommon practice.

Why is this so? Why is access integrity the perennial poor cousin of grouping integrity? It may be that certain entities of programming naturally consume greater attention than others: programmers think in terms of methods, of classes and of packages because these blaze explicit in their day-to-day work; they must be typed into being. Few packages arise unannounced, surprising the programmer at a project's end. Transitive dependencies, however, establish themselves implicitly. No one ever types a dependency. Instead, the programmer's fingers type up first one method and then another; the third element, the dependency, sneaks in through a back door, furtive, invisible. Suddenly transitive dependencies cascade through a system without anyone having actually created them. Perhaps if programmers were forced to type dependencies - to create method a() and method b() and then actually type connection(a, b) - then coupling would blip onto the mental radar, brash and unignorable. Or, more extremely, suppose transitive dependencies were the unit of functional distribution, with programmers vying to connect to jealously guarded dependency trees whose keepers feared dependency first and craved functionality only second: such programs might look very different from those that slither behind our applications today.

What should we do?

It is a truth universally resented that there are only two ways to lose weight: eat less or exercise more. Similarly, there are only two ways to manage package coupling: reduce the number of transitive dependencies on packages or reduce the number of transitive dependencies from packages. Note that coupling concerns not just the number of dependencies but of transitive dependencies. A single dependency may leave one package and terminate in another making both packages coupled but weakly so. If, however, that dependency fails to terminate quickly and instead explodes into countless others snaring package after package in its path then the latter case offers far more vectors for ripple-effects than the former and hence couples the packages with far greater force.

A simple solution to package coupling exists but, as with eating less and exercising more, few wish to listen. As it happens, it too comes in two parts. Firstly, if you wish to reduce the number of transitive dependencies on a package then allow just one dependency right from the beginning. Though this sounds impractical, no Java package requires more than one in-coming dependency; as long as the class depended-on can create the other necessary classes within the package then it alone need be the only public class in the package, all else being package-private. Secondly, to reduce the number of transitive dependencies leaving a package, have each package accessible only via interfaces (facades) to the services which it provides and export these to an interface repository elsewhere. Yes, this interface repository will then be a package with a large number of in-coming dependencies but all these dependencies will terminate immediately, falling as they do on interfaces only, not on implementation classes with yet more implementation dependencies. This essentially divides packages into implementation repositories and interface repositories, a solution as elegant as it is rarely deployed. (Good design helps to further segregate these interface repositories.)
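
The two parts might look something like the following Java sketch. Every name in it is hypothetical and none comes from Maven. To keep it in one runnable file, all types appear together and without the public modifier; in a real code base each would sit in its own file, the interface would live in a separate interface-only package, and the interface and the entry-point class would be the only public types.

```java
import java.util.HashMap;
import java.util.Map;

// Interface repository: in a real layout this would be a public interface in
// a package holding interfaces only. Dependencies arriving here terminate
// immediately, since an interface has no implementation dependencies.
interface VersionLookup {
    String versionOf(String artifactId);
}

// Implementation package: the one class that would be public. Every in-coming
// dependency lands here, and it creates the other classes itself.
final class VersionServices {
    private VersionServices() {
    }

    static VersionLookup create() {
        return new CachingVersionLookup(new VersionTable());
    }
}

// Package-private (no access modifier): unreachable from other packages.
final class VersionTable {
    private final Map<String, String> versions = new HashMap<>();

    VersionTable() {
        versions.put("maven-core", "3.0.5"); // illustrative data only
    }

    String find(String artifactId) {
        return versions.getOrDefault(artifactId, "unknown");
    }
}

// Package-private implementation of the exported interface.
final class CachingVersionLookup implements VersionLookup {
    private final VersionTable table;
    private final Map<String, String> cache = new HashMap<>();

    CachingVersionLookup(VersionTable table) {
        this.table = table;
    }

    @Override
    public String versionOf(String artifactId) {
        // Consults the table once per artifact id, then answers from the cache.
        return cache.computeIfAbsent(artifactId, table::find);
    }
}

// A client, which in a real layout would live in some other package.
final class VersionClient {
    public static void main(String[] args) {
        VersionLookup lookup = VersionServices.create();
        System.out.println(lookup.versionOf("maven-core"));
    }
}
```

The client names only VersionLookup and VersionServices; VersionTable and CachingVersionLookup can be renamed, split or deleted without a single ripple leaving the package.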

A passing word on Maven.

It would be unfair to leave the impression either that Maven is unique in its handling of the access integrity of its packages - there seems little reason to presume Maven's unusualness in this respect - or indeed that all its packages fared ill. Many of Maven's packages handle their access integrity flawlessly. Again, as a rule of thumb, if a package of classes retains its approximate configuration once subjected to a structural expansion, if it remains recognizable without having been overwhelmed by waves of foreign classes, then it can be said to enjoy access integrity.

Figure 7: Package org.apache.maven.classrealm before and after expansion

Figure 7 shows package classrealm before (left) and after (right) a structural expansion. The degree to which the two images resemble each other - particularly in their depth, or, roughly, the number of horizontal grey bars - testifies to the high level of access integrity that classrealm's design achieved.

Figure 8: Package org.apache.maven.toolchain before and after expansion

Figure 8 shows package toolchain before (left) and after (right) a structural expansion. Screen real-estate limits explanatory power but this too evidences excellent coupling control.

The unfortunate point remains, however, that such shining access integrity should characterize not only the small outlying packages but also the core heavy-lifters. And such, alas, is not the case.

Summary.

We are losing the war on coupling.

Little will change until front-line programmers tear off their blindfolds and see coupling as the enemy it is, until they learn to identify the stinking lairs in which coupling gestates, until they fix bayonets and get in nice and close.