Interface over-segregation.

Programmers easily spot bloated interfaces, and usually carry with them an assortment of "knives and stabbing weapons" for just such encounters. A previous post presented an interface-efficiency equation and demonstrated an algorithm, fueled by this equation, to guide this butchery.

A trickier problem to spot, however, is when the members of a family of interfaces have been cut so small that a skillful re-combination might offer design benefits.

Put another way: if a collection of small interfaces spontaneously coalesced back into one large interface, and the programmer had to split that large interface, would the same small interfaces reappear? If they do, then those small interfaces have retained their claim to independence. If they do not, then this might suggest over-segregation of the interfaces and an unmerited allocation of behavior among them.
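The reappearance test can be made concrete with a small sketch. It assumes each interface is represented simply as the set of its method names, and it counts an original interface as having "reappeared" only if some re-derived interface has exactly the same method set. The class name ReappearanceCheck and that strict definition are illustrative assumptions, not part of the analysis tooling used for this post.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper: each interface is modelled as the set of its method names. */
final class ReappearanceCheck {

    /**
     * Counts the original interfaces whose exact method set is found again
     * among the re-derived interfaces (assumed definition of "reappears").
     */
    static long countReappearing(List<Set<String>> original, List<Set<String>> rederived) {
        return original.stream()
                .filter(rederived::contains) // Set equality ignores element order
                .count();
    }
}
```

A looser definition, such as "all of the interface's methods land in the same re-derived interface," would treat a wholesale merger as survival, so the strict version above is the more demanding test.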

Let us take a look at a recently reviewed program, Apache Lucene, to see how successfully its interfaces have been segregated when considered as related collections. Here, we will assume that interfaces within the same package are "related."

Figure 1 shows the 6 interfaces in Lucene's org.apache.lucene.search.spans package, which contain a total of 25 methods (this analysis makes no distinction between interfaces and abstract classes).

Figure 1: Interfaces in Lucene's spans package.

We shall gather all these methods into a single interface and decompose that interface based entirely on objective interface efficiency calculations.

(Recall that if class A is a client of interface I, and I has 10 methods of which A calls 10, then I is 100% efficient with respect to A. If A only uses 3 of the methods, then I is only 30% efficient. If a second class B uses 6 of the methods, then I's efficiency is the average for both clients = (30% + 60%) / 2 = 45%.)
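The arithmetic in that recap can be sketched in a few lines of Java. This is only an illustration of the averaging described above; the class and method names (InterfaceEfficiency, forClient, average) are invented for the example and do not come from the previous post.

```java
import java.util.List;

/** Hypothetical helper illustrating the efficiency calculation recalled above. */
final class InterfaceEfficiency {

    /** Fraction of an interface's methods that a single client calls. */
    static double forClient(int methodsCalled, int interfaceSize) {
        if (interfaceSize <= 0 || methodsCalled < 0 || methodsCalled > interfaceSize) {
            throw new IllegalArgumentException("invalid method counts");
        }
        return (double) methodsCalled / interfaceSize;
    }

    /** Mean efficiency over all clients; each entry is the number of methods one client calls. */
    static double average(List<Integer> methodsCalledPerClient, int interfaceSize) {
        if (methodsCalledPerClient.isEmpty()) {
            throw new IllegalArgumentException("an interface needs at least one client");
        }
        double sum = 0.0;
        for (int called : methodsCalledPerClient) {
            sum += forClient(called, interfaceSize);
        }
        return sum / methodsCalledPerClient.size();
    }

    public static void main(String[] args) {
        // Client A calls 3 of 10 methods, client B calls 6 of 10: roughly 45% on average.
        System.out.println(average(List.of(3, 6), 10));
    }
}
```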

Figure 2 shows the resulting hypothetical reallocation of methods among the freshly segregated interfaces using the algorithm introduced in the previous post.
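The algorithm from the previous post is not reproduced here. As a rough stand-in only, the sketch below uses a much simpler rule: methods called by exactly the same set of clients are placed in the same interface. Under that rule every client of a resulting interface calls every method in it. The class name UsageGrouping and the map-based representation of method usage are assumptions made for illustration, and the real algorithm may well group methods differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified regrouping rule; NOT the algorithm from the previous post. */
final class UsageGrouping {

    /**
     * Input: method name -> names of the client classes that call it.
     * Output: client set -> the methods called by exactly that set of clients.
     * Each value of the result is one candidate interface.
     */
    static Map<Set<String>, Set<String>> groupByClients(Map<String, Set<String>> clientsOfMethod) {
        Map<Set<String>, Set<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : clientsOfMethod.entrySet()) {
            // Copy the client set so the grouping key cannot change under us.
            Set<String> clients = new TreeSet<>(entry.getValue());
            groups.computeIfAbsent(clients, k -> new TreeSet<>()).add(entry.getKey());
        }
        return groups;
    }
}
```

A rule this strict tends to produce many tiny interfaces, which is the very over-segregation this post is concerned with; a practical algorithm has to trade efficiency against the number of interfaces.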

Figure 2: Lucene's spans package interfaces re-imagined.

The re-allocated interfaces of figure 2 have largely retained their integrity and only one has disappeared. The greatest impact is the combining of interfaces ConjunctionSpans and Spans into interface 2, which indicates that clients use both interfaces together. Even so, there seems little wrong in keeping these interfaces separate, as they are in figure 1. These interfaces thus justify their current configuration.

If we look at another Lucene package, however, we see a different story. Package org.apache.lucene.analysis.tokenattributes contains 9 interfaces with a total of 23 methods (see figure 3).

Figure 3: Interfaces in Lucene's tokenattributes package.

If we combine the interfaces of figure 3 and then use our algorithm to split the resulting large interface into an efficient collection, we arrive at figure 4.

Figure 4: Lucene's tokenattributes package interfaces re-imagined.

Figure 4 has reduced the collection from 9 to just 4 interfaces. Interface 1 contains largely the CharTermAttribute interface with minor additions, and interface 3 is a combination of two small interfaces. Interface 2, however, has amalgamated 4 entire interfaces into one, suggesting that - from an efficiency point of view alone - the interface collection merits further investigation.

Of course, programmers segregate interfaces for more reasons than just interface efficiency: it may be that the smaller interfaces reflect different implementations that may be combined in various forms, or that their semantic distinctness justifies the separation.

This is, furthermore, merely a static code analysis, and static analyses never answer design questions: they only pose questions. Nevertheless, the question posed here is clear: what motivates the splitting of the methods of interface 2 in the current code-base?

Summary

The Interface Segregation Principle does not advise breaking large interfaces into smaller ones; it advises breaking large inefficient interfaces into smaller efficient ones. If all twenty client classes call all sixty methods of an interface (admittedly something of a rarity in modern software systems), then that interface is 100% efficient and should not be decomposed on efficiency grounds.

Small interfaces are a pragmatic compromise, but maximally efficient large interfaces are the goal.

Over-segregating interfaces can result in interface fragments that do more to cloud design intent than to clarify it.