You write small methods, yes?


As programmers, most of us keep our methods small.

Indeed, Robert C. Martin's famed "Clean Code" tells us, "The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that." (For Java, apply sed s/function/method/g.)

But is this actually true? And how small is small?

Let's side-step trying to identify an absolute number of lines of code or size of bytecode. Instead let's try a relative size.

Take all the methods in a code-base, and order those methods in terms of size (how much bytecode is in each compiled method). Then take the top, say, 20% of those methods. Let's call these the biggest 20%.

How much code is in these methods? More specifically: what percentage of the entire code-base do these biggest 20% contain?

If all methods were of equal size, then these biggest 20% would contain 20% of the entire code-base.
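To make the measurement concrete, here is a minimal Java sketch of the calculation described above. It assumes the method sizes have already been measured somehow (for instance as bytecode lengths); the class name, the method name and the sample numbers are hypothetical and are not taken from any of the programs examined.

```java
import java.util.Arrays;

// Sketch: what percentage of all code lives in the biggest N% of methods?
// Sizes are assumed to be pre-measured (e.g. bytecode length per method).
final class MethodSizeSkew {

    // Returns the percentage (0..100) of total size held by the biggest
    // `topPercent` percent of methods. The count of "biggest" methods is
    // rounded down, so 20% of 7 methods means the single biggest method.
    static double shareOfBiggest(int[] methodSizes, int topPercent) {
        int[] sorted = methodSizes.clone();
        Arrays.sort(sorted); // ascending, so the biggest methods sit at the end
        int n = sorted.length;
        int topCount = (int) ((long) n * topPercent / 100);

        long total = 0;
        long top = 0;
        for (int i = 0; i < n; i++) {
            total += sorted[i];
            if (i >= n - topCount) {
                top += sorted[i];
            }
        }
        return total == 0 ? 0.0 : 100.0 * top / total;
    }

    public static void main(String[] args) {
        int[] equalSizes = {10, 10, 10, 10, 10}; // hypothetical: perfectly even
        int[] skewedSizes = {5, 5, 5, 5, 80};    // hypothetical: one monster method

        System.out.println(shareOfBiggest(equalSizes, 20));  // prints 20.0
        System.out.println(shareOfBiggest(skewedSizes, 20)); // prints 80.0
    }
}
```

The evenly sized example confirms the baseline: with equal methods, the biggest 20% hold exactly 20% of the code. Any skew pushes the figure higher.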

But of course code is not evenly distributed over all methods. So what percentage of code should these biggest 20% contain? Think about this before you look at the table below.

If the biggest 20% contain 30% of all code, you might think that's good: the system isn't too skewed and there aren't many monster methods lurking out there. (Or perhaps all methods are monsters, but thankfully this is seldom encountered.)

If the biggest 20% contain 90% of the code, however, you might consider this a highly skewed system, with many monster methods which are sure to attract most of the code-changes and hence be the most expensive methods in the code-base. And managers really don't like expensive methods.

Perhaps 50% of the code residing in the biggest 20% would seem like a fair threshold, above which you might think a refactoring due.

Let's examine some random, open-source Java programs on GitHub and see what percentages they reveal.

Program            # methods    % code in biggest 20%
---------------    ---------    ---------------------
swagger-core            2307                       78
checkstyle              7561                       78
spark (core)           42743                       77
santuario-java          4679                       77
tomcat                 26840                       77
ant                    11173                       74
atmosphere              4164                       73
zxing                   1979                       73
jackson                 6732                       71
mybatis-3               3113                       71
dubbo                  12093                       71
dropwizard              3102                       70
logstash (core)         2519                       67
redisson               13475                       66
junit4                  1836                       61
RxJava                 10159                       57

Table 1: How much code resides in the biggest 20% of methods

Of course, this isn't a rigorous, statistical sample of the entire Java code-sphere, but it's hardly insignificant at 154,475 methods. And it looks like we programmers don't write small methods after all, but stuff more than half the code into just one-fifth of the methods.

Does anyone know why?