Encapsulation theory fundamentals
Introduction
Divide and conquer. It's the standard approach to managing the complexity of a large task. Caesar did it. Charlemagne did it. Civil engineers do it. Programmers do it. Today we speak of modularisation, encapsulation, and information-hiding: terms often confused, but seldom denounced. David Parnas, in his seminal 1972 paper, "On the Criteria to Be Used in Decomposing Systems into Modules," coined the phrase, "Information hiding," and it's been practised ever since. Indeed, approaching a large programming task any other way is almost unthinkable.
In object orientation, classes may be encapsulated into subsystems (or packages or name spaces - languages tend to choose their own terms for the, "Capsule" of encapsulation); programmers do this because they, "Know," it's useful. It, "Feels," useful. It, "Feels," right. Given a group of classes to encapsulate, programmers will try to find similarities between classes and group those classes together, perhaps to encapsulate the, "List of difficult design decisions or design decisions which are likely to change," perhaps to group the classes so that they can be released and re-used together. However they choose to do so, they will capture the system from some perspective and parcel it accordingly.
For one hundred classes, some programmers might find that fifty subsystems suffice; others, twenty; still others, thirty-seven. And yet, though they may not agree on the exact number to select, they all know that encapsulation can be taken too far. Very few programmers would, given the hundred classes above, produce one subsystem per class. This just, "Feels," wrong; it feels like over-encapsulation. There's somehow an increase in complexity, as though having, "More subsystems than necessary," (from whatever perspective offers such a comparison) is as bad as having no subsystems at all.
Despite all these feelings, however, to my knowledge the benefits of encapsulation have never been, in the mathematical sense, proved. Never. There's simply no proof - no objective, timeless, unequivocal proof - that encapsulating a group of classes into subsystems actually does help manage the complexity of a system. This lack of proof has led critics of encapsulation to declare it unscientific and un-mathematical; a design choice; an art; an opinion. Yes, there may be some good opinions out there; there may even be plenty of great software built upon these good opinions; there may even be, for those oh-so briefest of brief moments, some consensus among the functionalists and proceduralists, among the object-orientationists and the agilists, among the relationalists and test-driven-designers (etc.) about how and what to encapsulate: but still no concrete proof exists. There are just rules-of-thumb; guidelines; metrics; patterns; best practices; industry standards; industry-standard best practices; touchy-feely; wishy-washy; ungrounded, unfounded and unbounded.
Perhaps the reason that encapsulation has resisted mathematical illumination for so long (1972 was thirty-five years ago) is that software is such a sloppy and peculiarly human business. How could anything as notional as encapsulation ever be proved? Well, this article will attempt to take an infinitesimal step closer to making encapsulation a science, by trying to unearth the fundamental laws of encapsulation. It will take the following approach: it will examine a mathematically simple system (an equipoised system, which obeys such restrictions as having the same number of classes in each subsystem), and it will examine it from a single perspective - that of the potential structural complexity of a system: that is, the number of dependencies the system would have if every permitted dependency between its classes were formed.
It cannot be stressed enough that this will necessarily yield a deeply theoretical result: no production systems are equipoised, and even if they were, there would be countless different, "Complexities," that could be investigated. You may argue that another definition of complexity is better, but it's hoped that you will at least find that the definition used here is not entirely without merit.
Let's look at an example which will try to present the foundation for encapsulation theory in graphical form. Imagine we are given twelve classes to encapsulate any way we wish, as long as we respect one requirement: no matter how many subsystems we choose to have, the same number of classes must be in each subsystem. This doesn't, of course, happen in real software development: the subsystems we choose will be abstractions relevant to the problem we are trying to solve, and those abstractions will dictate how many subsystems we use and how many classes each contains. But suspend disbelief for a moment and take the requirement as given. Can as simple a decision as the number of subsystems affect the complexity of the system? Well, there are many different forms of complexity, and we will examine just one, the potential structural complexity: the maximum number of dependencies permissible between classes. It is precisely this potential structural complexity that we will monitor as the number of subsystems changes.
To begin with, we might choose to put all twelve classes in one subsystem, which effectively means that we don't want to use any information hiding whatsoever, as all classes will be visible to all others. Figure 1 shows our system with all possible dependencies. (For simplicity, we will show a two-way dependency between classes as a single line, so in reality there are twice as many dependencies as shown.)
Figure 1: One subsystem with twelve classes: 132 dependencies.
With one subsystem there are a possible 132 dependencies between all classes (we'll later prove this); this means that this one subsystem has a potential structural complexity of 132. Now, let's split our system into two subsystems of six classes each. When we were dealing with just one subsystem, the problem of accessibility failed to arise. Accessibility concerns whether a class is visible to all others outside its subsystem or whether it's information-hidden and thus not visible outside its subsystem; the former class we can call, "Public," the latter, "Subsystem-private." When all classes are in just one subsystem, it doesn't matter whether they are public or subsystem-private, as they can all see one another (and thus form dependencies towards one another) irrespective of their visibility. When we are dealing with more than one subsystem, however, we must take into account accessibility: we must state which classes are public and which are subsystem-private. We will issue the further, arbitrary directive that there must be exactly one public class in each subsystem; all the rest will be subsystem-private (and thus no classes outside the subsystem will be able to form dependencies towards them). We'll colour the public classes green and the subsystem-private classes red. See figure 2.
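The visibility rule just described can be checked by brute force. The following is a minimal sketch rather than anything taken from the original article: the function name, the choice of the first class in each subsystem as its public class, and the tuple representation are all illustrative assumptions. It enumerates every ordered pair of classes and counts those pairs the rule permits.

```python
from itertools import permutations


def count_allowed_dependencies(num_classes, num_subsystems):
    """Count the one-way dependencies permitted by the visibility rule.

    Assumptions (illustrative): subsystems are equal in size, and the
    first class of each subsystem is its single public class.
    """
    if num_classes % num_subsystems != 0:
        raise ValueError("classes must divide evenly among subsystems")
    size = num_classes // num_subsystems

    # Each class is described by (subsystem index, is_public).
    classes = [(sub, idx == 0)
               for sub in range(num_subsystems)
               for idx in range(size)]

    allowed = 0
    # permutations() yields every ordered pair of distinct classes,
    # so a two-way dependency is counted as two dependencies.
    for (src_sub, _), (dst_sub, dst_public) in permutations(classes, 2):
        # A dependency is permitted inside a subsystem, or towards
        # a public class of another subsystem.
        if src_sub == dst_sub or dst_public:
            allowed += 1
    return allowed


print(count_allowed_dependencies(12, 1))  # 132
print(count_allowed_dependencies(12, 2))  # 72
```

With a single subsystem every ordered pair is permitted, giving 12 x 11 = 132. With two subsystems of six, the 60 internal dependencies are joined by only 12 external ones, each aimed at the other subsystem's public class.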
Figure 2: Two subsystems with six classes in each: 72 dependencies.
When our twelve classes are evenly distributed over two subsystems, the potential structural complexity drops dramatically: now there are only 72 dependencies that can possibly be formed. This makes sense, of course, as now information hiding comes into play, effectively forbidding dependencies that were previously permitted. It may seem, then, as though ever-increasing encapsulation - essentially adding more and more subsystems - should further reduce the potential structural complexity. Could it be this simple? Let's take a look at evenly distributing our twelve classes over three subsystems, four classes in each. See figure 3.
Figure 3: Three subsystems with four classes in each: 60 dependencies.
The potential structural complexity with three subsystems is 60. Yet again, we see that increasing the number of subsystems has decreased the number of possible dependencies in our system. It looks as though our assumption is correct: increasing encapsulation reduces potential structural complexity. There's a slight concern, however. Figure 3 for some reason looks a little busier than figure 2. Why is this? Could it be that the subsystem-internal dependencies are now spilling over into subsystem-external dependencies? We don't yet know, so we'll plough ahead. Let's look at the twelve classes distributed over four subsystems, three classes in each. See figure 4.
Figure 4: Four subsystems with three classes in each: 60 dependencies.
This is strange: increasing the number of subsystems from three to four has had precisely no effect on the potential structural complexity. The number of possible dependencies with four subsystems is exactly the same as with three, namely 60. So much for our assumption that merely increasing encapsulation reduces potential structural complexity. So what happens when we split our twelve classes over six subsystems? See figure 5.
Figure 5: Six subsystems with two classes in each: 72 dependencies.
The potential structural complexity is now 72: it's begun to rise again; and the rise is even more prominent when we choose twelve subsystems, with one class in each. See figure 6.
Figure 6: Twelve subsystems with one class in each: 132 dependencies.
Now the potential structural complexity has soared once more to 132. What can this mean? Now that there are so many subsystems, has the number of subsystem-external dependencies begun to dominate the number of subsystem-internal dependencies? Does this mean that there's some number of subsystems that minimises the potential structural complexity? The answer to that question is the entire purpose of this article. And the answer is a categorical yes. This article will prove that, given certain circumstances, encapsulation reduces potential structural complexity, and that there is a point beyond which further encapsulation increases potential structural complexity. In graphical terms again, if we plot the potential structural complexity against the number of subsystems for the given example, we get figure 7.
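The six configurations above can be tallied in one pass. The sketch below is illustrative only: the function name and the split into internal and external counts are assumptions derived from the rules of the example (equal-sized subsystems, exactly one public class in each), not formulas quoted from the article.

```python
def split_dependencies(n, r):
    """Return (internal, external) permitted one-way dependencies
    for n classes spread evenly over r subsystems, assuming exactly
    one public class per subsystem."""
    if n % r != 0:
        raise ValueError("n must be divisible by r")
    size = n // r
    # Inside each subsystem every class may depend on every other one.
    internal = r * size * (size - 1)
    # Every class may also depend on the single public class of each
    # of the other r - 1 subsystems.
    external = n * (r - 1)
    return internal, external


for r in (1, 2, 3, 4, 6, 12):
    internal, external = split_dependencies(12, r)
    print(f"{r:>2} subsystems: {internal} internal + {external} external"
          f" = {internal + external}")
```

Under these assumptions the totals reproduce the figures: 132, 72, 60, 60, 72 and 132. The internal counts fall (132, 60, 36, 24, 12, 0) exactly as the external counts rise (0, 12, 24, 36, 60, 132), and external dependencies first outnumber internal ones at four subsystems.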
Figure 7: Potential structural complexity with increasing number of subsystems.
This article will present not one but three laws. Laws in the real sense, not advice masquerading as laws like, "Program to an interface, not an implementation," or, "The law of Demeter," but inviolable laws like: r = n/v. (Indeed, the second law of encapsulation states that the number of subsystems that minimises the potential structural complexity of the example system above is the square root of the number of classes: the square root of twelve, approximately 3.5. This is why the potential structural complexity rises after four subsystems.) These are laws that are, for their particular circumstances, provably true for all time, everywhere. Laws that were true when the Vikings attacked Ireland in the ninth century (if only there had been software to apply them to). Laws that will be true in ten million years' time. Laws that are true now in the Andromeda galaxy, where the Chief Programmer of The Court of Xxxnnn'Thhharg The Twelfth is scratching one of his heads, pondering a dubious encapsulation strategy, as he watches his beloved Exorbitant Lethality Weapon Delivery System impolitely blow itself to smithereens on the launch pad.
This article will prove, given certain circumstances, that an unencapsulated system has a higher potential structural complexity than an encapsulated system. This article will prove, given certain circumstances, that one configuration of subsystems has a lower potential structural complexity than all others. This article will prove, given certain circumstances, that there exists a number of subsystems in an encapsulated system for which the potential structural complexity is no lower than that of the same classes unencapsulated. This article will prove, given certain circumstances, that an encapsulated system's potential structural complexity scales linearly with the number of constituent classes, whereas an unencapsulated system's potential structural complexity scales quadratically with the number of constituent classes.
This article will prove, given certain circumstances, that there exists a rigorous, mathematical relationship between encapsulation, information-hiding and structural complexity. This article will prove, given certain circumstances, that encapsulation is a, "Good thing."
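As a preview of where the square root comes from, here is a short derivation for the twelve-class example's rules only (equal-sized subsystems, exactly one public class in each). It is a sketch, not the article's own proof: the symbols n (number of classes), r (number of subsystems) and s (potential structural complexity) are notation chosen here, and r is treated as a continuous variable for the purpose of differentiation.

```latex
% Internal: each of the r subsystems holds n/r classes, and every class
% may depend on the other n/r - 1 classes in its subsystem.
% External: every class may depend on the single public class of each
% of the other r - 1 subsystems.
\begin{align*}
s(r) &= \underbrace{r \cdot \frac{n}{r}\left(\frac{n}{r} - 1\right)}_{\text{internal}}
      + \underbrace{n\,(r - 1)}_{\text{external}}
      = n\left(\frac{n}{r} + r - 2\right) \\[4pt]
\frac{ds}{dr} &= n\left(1 - \frac{n}{r^{2}}\right) = 0
      \quad\Longrightarrow\quad r = \sqrt{n} \\[4pt]
\frac{d^{2}s}{dr^{2}} &= \frac{2n^{2}}{r^{3}} > 0
      \quad\text{(so the stationary point is a minimum)} \\[4pt]
s(1) &= s(n) = n\,(n - 1)
\end{align*}
```

For n = 12 this gives r of roughly 3.46, and the two nearest whole configurations, three and four subsystems, both yield 12(4 + 3 - 2) = 60, matching figures 3 and 4. The last line matches figures 1 and 6: one subsystem and twelve subsystems both permit 12 x 11 = 132 dependencies.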
Ed Kirwan,
Copyright (C) Edmund Kirwan 2007. All rights reserved.