2. Complexity Computation

The principal goal Fundamentally, this program counts lines of non-comment source lines, multiplies by a “nesting factor” for each level of logic nesting and divides by a scaling factor so that the typical results lie roughly in the same range as pmccabe results. That happens to be approximately 20.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

2.1 Parsing Method

The method chosen for parsing the source has an effect on what gets seen (scored) by the program.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

2.1.1 Complexity Measurement Parsing

This program examines the actual source a human looks at when the file is opened, provided it is not pre-processed by unifdef, See section unifdef. This was chosen because uncompiled code adds to the complexity of what a human must understand. However, sometimes the source will contain unbalanced braces a la:

#if FOO
  for (int ix = foo;;) {
#else
  for (int ix = bar;;) {
#endif
    code...
  }

rendering code that cannot be parsed correctly. unifdef-ing makes it parsable. Unfortunately, because the practice of ifdef-ing unbalanced curly braces is so common, this program cannot rely on finding the correct closing brace.

CAVEAT: for the purposes of this program, procedures end when either a matching closing brace is found or a closing curly brace is found in column 1, whichever comes first. If the closing brace in column one does not match the procedure opening brace, the procedure is considered unscorable.

Fortunately, unscorable procedures are relatively unusual.

CAVEAT2: K&R procedure headers are not recognized. If anything other than an opening curly brace appears after the parameter list will cause the code recognizer to go back into “look for a procedure header” mode. K&R procedures are not just not scored, they are completely ignored.

This should probably get fixed, though.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

2.1.2 Post-PreProcessing Parsing

Another approach would be to use the C compiler and analize the tokens coming out of the preprocessor. The drawbacks are that macro expansions will add to the complexity, even though they do not add to human perceived complexity, and uncompiled code do not add to the complexity measure. The benefit, of course, is that you know for certain where a procedure body starts and ends.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

2.1.3 During PreProcessing Parsing

This would require going into the C preprocessor code and cause macros to not be expanded. Again, the great benefit is that you know for certain you can find the starting and ending braces for every procedure body. The downsides are the extra work and, again, the uncompiled code won’t get counted in the complexity measure.

This might be a useful exercise to do some day, just to see how helpful it might be. Being able to recognize all procedure bodies without fail would be a good thing.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

2.1.4 `pmccabe` Parsing

The pmccabe parsing actually inspired the method for this program. Thd difference is that pmccabe will always keep scanning until a procedure body’s closing curly brace is found, even if that means counting the code from several following procedure definitions. The consequence of this is that this program’s code will see some procedures that pmccabe will not, and vice versa.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

2.2 Complexity Measurement Algorithm

Fundamentally, this program counts non-comment source lines and examines elements of parenthesized expressions. This score is multiplied by a nesting scoring factor for each layer of code nesting.

A parenthesized expression is scanned for operators. If they are all arithmetic operators, or all arithmetic and one relational operator, the score is zero. If all the operators are boolean ands or they are all ors, then the score is one. An assignment operator with arithmetic operators also scores one. If you mix relational operators and all ands or all ors, the score is the number of boolean elements. If you mix ands and ors at the same parenthetical level, the two counts are multiplied, unless the boolean element count is higher.

Fundamentally, do not use multiple relational or boolean operators at the same parenthetical level, unless they are all boolean ands or they are all boolean ors. If you use boolean operators and relational operators in one expression, you are charged one statement for each boolean element.

After scoring each statement and any parenthesized expressions, the score is multiplied by any encompassing controlled block and added to the score of that block. A “controlled block” is a curly-braced collection of statements controlled by one of the statement controlling statements do, for, else, if, switch, or while. Stand alone blocks for scoping local variables do not trigger the multiplier.

You may trace the scores of parenthesized expressions and code blocks (see section trace output file). You will see the raw score of the code block or expression.

The final score is the outermost score divided by the “scaling factor”, See section complexity scaling factor.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

2.3 Complexity Scores

The “Complexity Scores” table shows the score of each procedure identified that also exceeded the threshold score, See section —threshold. The entries on each line are:

The computed score
The number of lines between the opening and closing curly braces
The number of non-comment, non-blank lines found there
The name of the source file
The line number of the opening curly brace
The name of the procedure

The output is sorted by the score and then the number of non-comment lines. Procedures with scores below the threshold are not displayed.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

2.4 Complexity Statistics

The statistics are displayed both as a table and as a histogram, See section Example Output. It is under the control of the —histogram option. The statistics are for each non-comment source line and each source line is given the score of its encompassing procedure. This way, larger procedures are given proportionally more weight than one line procedures.

The histogram is broken up into three ranges. Scores of 0 through 99 are displayed in 10 point groupings, 100 through 999 in 100 point groupings and 1000 and above (good grief!!, but they exist) are in 1000 point groupings. The number of asterisks represent the number of lines of code that are in procedures that score in the specified range.

The tabular statistics are also based on lines, not procedures.

‘Average line score’: This is the procedure score times the non-comment line count, all added up and divided by the total non-comment source lines found.
‘25%-ile score’
‘50%-ile score’
‘75%-ile score’
‘Highest score’: Since the distribution of scores is nothing like a bell curve, the mean and standard deviation do not give a very clear picture of the distribution of the scores. Typically, the standard deviation is larger than the average score. So, instead the program prints the the four quartile scores. The score for which 25, 50, and 75 percent of code is scored less than, plus the highest scoring procedure (100 percent of code scores less than or equal to that score).
‘Unscored procedures’: If any procedures were found that could not be scored, the number of such procedures is printed.

[ < ]

[ > ]

[ << ]

[ Up ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

2.5 Scoring Adjustments

Scores can be adjusted with three different options:

‘nesting-penalty’: See section —nesting-penalty.
‘demi-nesting-penalty’: See section —demi-nesting-penalty.
‘scale’: See section —scale.

The raw score is the number of lines or statements, whichever is greater, adjusted by a factor for the depth of the logic. Statements are nested when they are inside of a block of statements for a “block” statement (viz., “do”, “for”, “if”, “switch” or “while”). Statements within blocks used to constrain the scope of variables (not controlled by a block statement) are not multiplied by this factor.

Expressions are nested when contained within parentheses. The cost of these is different. Block level nesting multiplies the score for the block by the --nesting-penalty factor (2.0 by default). Nested expressions are multiplied by the --demi-nesting-penalty, the square root of --nesting-penalty by default.

Some attempt is made to judge the complexity of an expression. A complicated expression is one that contains an assignment operator, more than one relation operator, or a mixture of “and” and “or” operators with any other different kind of non-arithmetic operator. Expression scores are minimized by:

Doing assignments outside of boolean expressions, or at least parenthesizing them.
Parenthesizing each relationship operation in an expression of multiple “and” and/or “or” operations. Yes, precedence parses them correctly, but it is less clear.
Parenthesizing groups of “and” and “or” operations so that operators of only one type appear at one level. For example, the first expression below instead of the second. Yes, precedence means the effect is the same, but we’re after code clarity so that correctness is more obvious.
1: ((a && b) || (c && d)) 2: (a && b || c && d)
The first adds 2 to the raw score (before dividing by the scaling factor). The latter will add 5, assuming a demi-nesting-penalty of 1.41.

[ << ]

[ >> ]

[Top]

[Contents]

[Index]

[ ? ]

This document was generated by Bruce Korb on May 15, 2011 using texi2html 1.82.

2. Complexity Computation

2.1 Parsing Method

2.1.1 Complexity Measurement Parsing

2.1.2 Post-PreProcessing Parsing

2.1.3 During PreProcessing Parsing

2.1.4 pmccabe Parsing

2.2 Complexity Measurement Algorithm

2.3 Complexity Scores

2.4 Complexity Statistics

2.5 Scoring Adjustments

2.1.4 `pmccabe` Parsing