
Copyright © 2009 Corvus International Inc. All Rights Reserved
Beware of counting LOC, my son!
The code you write, online or batch
Beware the public word and shun
The voluminous standard patch!
...with apologies to Lewis Carroll
Counting Lines of Code (LOC)? You kidding me? In this day and age, counting up command lines of a 3GL to size a system seems old-fashioned, if not plain weird. Lines of Code are soooo 1980s COBOL...
But how would we otherwise size a system? If we use Function Points (FP) in one of their many guises, we often resort to "backfiring" into LOC. Ditto with other measures. But why this obsession, first with "sizing" a system and secondly with sizing it in LOC?
That's What the Tools Use
The primary purpose of sizing in LOC is to be able to use one of the available estimation tools. All the major tools (QSM's SLIM-Estimate®, Galorath's SEER-SEM®, CostXPert®, COCOMO (I or II, your choice), Softstar Systems' Costar®, SPR's KnowledgePLAN®, or pretty much any other I've ever used or heard* of) drive the estimate primarily from the size of the delivered system. And guess what unit the size used by all these tools is measured in? Right. Lines of Code.
The Basic Equation
The basic equation for converting a "size" to an effort is:
EFFORT = a * SIZE^b
Where a and b are "constants" (not really, but I kept it simple for this article). Typical values for a and b are:
a = 2.94
b = 1.1
Most of the tools are also able to convert the effort into a putative schedule. A typical formula for this is:
SCHEDULE = c * EFFORT^d
where c and d are also "constants" that have typical values of:
c = 3.0
d = 0.33
Ideally, we would calculate the empirical values of these constants from historical project data.
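As a rough illustration, the effort and schedule equations can be turned into a few lines of Python. This is a minimal sketch, not how any of the tools named above actually works. The function names are made up for this example. The units (size in thousands of LOC, effort in staff months, schedule in calendar months) are an assumption the article does not state. The calibration routine is an ordinary least-squares fit in log-log space, which is one common way to estimate such constants from historical project data.

```python
import math

# Typical constants quoted in the article; real values should be calibrated.
A, B = 2.94, 1.1   # EFFORT = A * SIZE ** B
C, D = 3.0, 0.33   # SCHEDULE = C * EFFORT ** D


def estimate_effort(size, a=A, b=B):
    """Effort from size (assumed units: KLOC in, staff months out)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return a * size ** b


def estimate_schedule(effort, c=C, d=D):
    """Schedule from effort (assumed units: staff months in, months out)."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return c * effort ** d


def fit_power_law(xs, ys):
    """Least-squares fit of y = k * x ** p in log-log space; returns (k, p)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching (x, y) observations")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    p = sxy / sxx                      # slope of the log-log line
    k = math.exp(mean_y - p * mean_x)  # intercept, mapped back from log space
    return k, p


if __name__ == "__main__":
    size_kloc = 100  # hypothetical system size
    effort = estimate_effort(size_kloc)
    print(f"effort:   {effort:.1f} staff months")
    print(f"schedule: {estimate_schedule(effort):.1f} months")
```

With real history, you would pass the sizes and efforts of your own completed projects to fit_power_law to obtain a and b, and then their efforts and durations to obtain c and d.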
Why LOC?
Why is LOC the base unit for all of these tools? The answer is simple:
Historical--this is what people "used to" count.
It used to be meaningful--back in the days when there were only a few simple languages, primitive operating systems, no packages and little reuse, whatever knowledge was in the final system was knowledge you put there. Since acquiring this knowledge is the task, a count of LOC was a pretty good indicator of the effort to acquire this knowledge.
We gotta use something--if we want to use scope or size (of the product) based estimation, we have to use some unit.
We really want to measure knowledge--this is the thing we really want to measure. The problem is we can't. There is no unit for knowledge, no definition for knowledge, and no way of quantifying it.
We really want to measure knowledge we don't have--even if we could measure knowledge and there was a unit for it, the determining factor in the effort and schedule on a project is the knowledge we haven't (yet) obtained.
LOC is as good as any--despite its manifest disadvantages, it turns out that no other metric is any particular improvement over LOC.
We Actually Measure the Substrate
None of our system size metrics measures the system "size"; they measure the amount of space the system knowledge would take up in one particular form. They measure the "substrate" on which the system knowledge is placed. Then we make the assumption that the amount of space the system knowledge takes up is proportional to the amount of knowledge in the system (which is proportional to the amount of knowledge we have to get, which is what takes the effort and time).
LOC actually measures how much paper we would use if we factored the knowledge into a 3GL pre-compile source form and printed it out.
Function Points measure how much space the knowledge would take up if we factored it into file, screen, and report layouts and then printed it out.
Use Case counts measure how much room the knowledge of the system would occupy if we transcribed it into a Use Case format and printed it out.
Etc.
But it Doesn't Matter That Much Anyway...
In a later article, I will show why your choice of system sizing metric is not nearly as important as you might think it should be.
* Not all "estimation tools" use LOC. There are some tools and approaches that go straight to an effort, without calculating the effort-schedule relationship. This is a tad dangerous, since without an associated schedule, we are tempted to simply divide the effort by a larger number of people to reduce the schedule. Unfortunately, we cannot build a 10,000 Staff Hour system in one hour using 10,000 people. As Fred Brooks observed: the bearing of a child is a nine woman-month job, but we cannot do it in one month using nine women.
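The danger described in this footnote can be shown with a little arithmetic. The sketch below contrasts the naive "divide effort by headcount" calculation with a schedule equation of the form SCHEDULE = c * EFFORT^d, using the typical constants quoted earlier in this article. It assumes that the equation takes effort in staff months and returns calendar months, and that a staff month is 152 staff hours. Neither assumption comes from the article, and the function names are illustrative only.

```python
HOURS_PER_STAFF_MONTH = 152  # assumption for this sketch, not from the article


def naive_schedule_hours(effort_hours, staff):
    """Elapsed hours if effort could be split perfectly across any headcount."""
    if staff <= 0:
        raise ValueError("staff must be positive")
    return effort_hours / staff


def equation_schedule_months(effort_hours, c=3.0, d=0.33,
                             hours_per_staff_month=HOURS_PER_STAFF_MONTH):
    """Calendar months from SCHEDULE = c * EFFORT ** d (effort in staff months)."""
    if effort_hours <= 0:
        raise ValueError("effort_hours must be positive")
    effort_staff_months = effort_hours / hours_per_staff_month
    return c * effort_staff_months ** d


if __name__ == "__main__":
    effort_hours = 10_000
    naive = naive_schedule_hours(effort_hours, staff=10_000)
    months = equation_schedule_months(effort_hours)
    print(f"naive division:    {naive:.0f} hour(s) with 10,000 people")
    print(f"schedule equation: {months:.1f} months, whatever the headcount")
```

The point of the comparison is that the schedule equation has no headcount term at all: once the effort is fixed, adding people does not shorten the estimated schedule.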