Thank you, Mr Pearson: notes.

1. The eight programs analysed were:

Where programs were very large, only the core jar files were analysed.

The analysis requires that successive releases use the same Java compiler version, because a method is identified as changed by whether its bytecode changed. Hence the Hadoop releases are split in two around a compiler update.

All data available on request.

2. As usual, the caveat must be made that the only way to ensure that a method has been updated between project releases is to manually check all methods, and this was NOT done in this case.

Instead, the project releases were passed through a code analyser which checked automatically for changes by comparing the before and after bytecode of each method. This will not catch all method changes, and could conceivably flag unchanged methods as changed. Hence the results of this experiment are not definitive.
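The comparison described above can be sketched roughly as follows. This is a minimal illustration in Python, not the analyser actually used: the function name `changed_methods` and the input format (a dictionary mapping a method signature to its raw bytecode) are assumptions, and extracting the bytecode from the jar files is left out.

```python
def changed_methods(before, after):
    """Return signatures of methods whose bytecode differs between two releases.

    `before` and `after` are hypothetical inputs: dicts mapping a method
    signature (str) to that method's bytecode (bytes). Methods present in
    only one release are ignored here, since they were added or removed
    rather than changed.
    """
    common = before.keys() & after.keys()
    return {sig for sig in common if before[sig] != after[sig]}


# Toy data: the byte strings stand in for real method bodies.
release_1 = {"A.f()V": b"\x2a\xb1", "A.g()I": b"\x03\xac", "A.h()V": b"\xb1"}
release_2 = {"A.f()V": b"\x2a\xb1", "A.g()I": b"\x04\xac", "A.k()V": b"\xb1"}

print(changed_methods(release_1, release_2))  # {'A.g()I'}
```

A byte-for-byte comparison like this also shows why the compiler version matters: a different compiler can emit different bytecode for identical source, which would flag an unchanged method as changed.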

3. The properties investigated were:

  1. Size - size (in bytecode) of the given method.
  2. Dependencies from - number of dependencies leading from a method.
  3. Conditional count - number of conditionals in the bytecode, which roughly corresponds to conditionals in the source code, for example if-statements and loop boundary checks.
  4. Impact set - size of the impact set of a given method, that is, the complete set of all other methods that the given method depends on, either directly or transitively.
  5. Middle-man - whether a method is a middle-man, that is, a method that could potentially be removed, with its parent doing all the work that it did.
  6. Complectation - number of complected methods of multiple transitive dependencies. If a transitive dependency has multiple dependencies on another transitive dependency then it may be possible to access one method of the target transitive dependency through multiple paths. This sometimes (but by no means always) suggests an unnecessary duplication of method invocation, artificially raising the impact set of the system and thus exposing the system to unnecessary potential ripple effects.
  7. Transitive dependencies - number of all the transitive dependencies involving a method.
  8. Potential coupling - the absolute potential coupling of this method. For example, the absolute potential coupling of function A is the number of other functions that A could depend on, i.e., that are accessible from A.
  9. Amplification - the amplification generated by this method. This is essentially a measure of how the number of transitive dependencies in which this method is involved is increased as a combinatorial effect of dependencies on and from this method.
  10. Circular dependencies - number of circular dependencies between methods. For example, if function a() calls function b() and function b() calls function c() and function c() calls function a(), then a(), b() and c() form a circular dependency.
  11. Dependencies on - number of dependencies on a method.
  12. Impacted set - size of the impacted set of a given method, that is, the complete set of all other methods that depend on the given method, either directly or transitively.
  13. Duplication - number of common sequences of method invocations greater than a minimum value (currently 2). For example, if function a() calls functions x(), y() and z(), and function b() calls functions x(), y() and z(), then both a() and b() will be shown as sharing common calling sequences x(), y() and z().
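As an illustration of property 10, the a(), b(), c() example can be checked with a small sketch. It assumes the call graph is available as a dictionary from each method to the set of methods it calls. The names `reachable` and `methods_in_cycles` are hypothetical, and this is not necessarily how the analyser counts circular dependencies: it only reports which methods sit on at least one cycle.

```python
def reachable(graph, start):
    """Return every method reachable from `start` by following one or more calls."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def methods_in_cycles(graph):
    """Return the methods that can reach themselves, i.e. that lie on a cycle."""
    return {method for method in graph if method in reachable(graph, method)}


# a() calls b(), b() calls c(), c() calls a(); d() calls a() but nothing calls d().
calls = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": {"a"}}

print(sorted(methods_in_cycles(calls)))  # ['a', 'b', 'c']
```

Here d() depends on the cycle but is not part of it, so it is not reported.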

4. We would like our properties to be independent of one another, but many are likely to depend on method size: larger methods are more likely to contain more dependencies, more cyclomatic complexity, etc. This is possibly suggested by the attempt, below, to find One Metric To Rule Them All by combining existing ones, which produced results no better than the uncombined versions. We will examine this in a later post.

| Property combination | Change coefficient |
| --- | --- |
| Size | 0.43 |
| Dependencies from | 0.45 |
| Conditional count | 0.35 |
| Impact set | 0.35 |
| Size + Impact set | 0.45 |
| Size + Dependencies from | 0.43 |
| Size + Dependencies from + Conditional count + Impact set | 0.45 |

5. There are two separate metrics: impacted set and impact set, see Figure 1 below.

Impacted set is the set of all methods that depend directly or indirectly on a method. Impact set is the set of all methods that a given method directly or indirectly depends on. The impact set seems to correlate with method updates whereas the impacted set does not. Who knew?
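The difference between the two sets can be made concrete with a small sketch over a toy call graph. The graph format (a dictionary from each method to the set of methods it calls) and the function names are assumptions for illustration, not the analyser's own interface.

```python
def _reach(graph, start):
    """Return all nodes reachable from `start`, excluding `start` itself."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    seen.discard(start)  # both sets are defined over *other* methods
    return seen


def impact_set(calls, method):
    """Methods that `method` depends on, directly or transitively."""
    return _reach(calls, method)


def impacted_set(calls, method):
    """Methods that depend on `method`, directly or transitively."""
    # Reverse every edge, then reuse the same reachability search.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    return _reach(callers, method)


# Hypothetical call graph: main calls parse and run, and so on.
calls = {
    "main": {"parse", "run"},
    "parse": {"read"},
    "run": {"read", "log"},
    "read": set(),
    "log": set(),
}

print(sorted(impact_set(calls, "run")))    # ['log', 'read']
print(sorted(impacted_set(calls, "run")))  # ['main']
```

The same search serves both metrics; only the direction of the edges changes.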

Figure: Impacted set versus impact set

Figure 1: Left: all methods. Middle: Impacted set of a selected method. Right: Impact set of that same selected method.