Learn how to document your code for best results
As a programming newbie, you’re often told to comment your code as much as possible. But it’s not long before you read an article telling you to do the opposite. Confused? This article will give you a clear picture of when to comment and when to avoid it.
The principle is simple: Let the language and the runtime guarantee that what you document about the software is true, as early as possible.
I’ll start off with an undocumented function and step it up in the order that I would usually implement things. First, the easy basics, then automated enforcement that the documentation tells the truth and, finally, the weaker forms of documentation for when automated enforcement is not available.
1. No Documentation
Let’s start with this simple function, the factorial p!, straight from your basic programming exercises. The example is in Python, but the ideas apply to other programming languages as well.
    def fact(p):
        ret = 1
        if p == 0:
            return 1
        for i in range(1, p + 1):
            ret *= i
        return ret
2. Naming
You might be tempted to add a comment like this:
""" This is the factorial """
But comments often aren’t read, and this one is redundant once you have decent naming.
Instead, give the function a descriptive name, such as factorial.
The documentation is now built into the code that gets run, not layered next to it. People have to read this name if they want to use your function. This is one good reason to keep functions short and single-purpose: the function name can then sit close to the code and describe it accurately.
Any other improvements that help readers understand your code (e.g., using private members where relevant) are a decent form of documentation.
Next, rename the parameter to n, to remind users that it’s an integer.
After all, though factorial is only defined for integers, it can be extended to non-integers. The Gamma function is a sort of extended factorial.
An advanced user might wonder if that is used in our implementation, and with good reason: scipy.special.factorial, for example, does indeed give an answer for non-integer float inputs, using the Gamma interpolation of factorial. We need to clarify that’s not what we’re offering here.
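As a minimal sketch of where the two renames leave us (the body is the same simple loop as before, and the comparison with math.gamma is added here only for illustration), the names now signal that this is the integer factorial rather than the Gamma-based extension:

```python
import math


def factorial(n):
    """Iterative factorial; the parameter name n hints at an integer."""
    ret = 1
    for i in range(1, n + 1):
        ret *= i
    return ret


# For integers, the Gamma function agrees with factorial: Gamma(n + 1) == n!
print(factorial(5), math.gamma(5 + 1))

# For a non-integer, Gamma still gives an answer ...
print(math.gamma(1.5 + 1))

# ... while this implementation refuses, because range() needs an integer.
try:
    factorial(1.5)
except TypeError as err:
    print("TypeError:", err)
```

The refusal here is a side effect of range(), not a deliberate check, which is why the later steps make the restriction explicit.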
3. Type Hints
The name of the parameter is a kind of comment, and you could write a comment to say that it is an int (as is done in scipy.special.factorial). But there is a better way of letting readers know that the parameter is an integer, namely, type hints.
    def factorial3(n: int) -> int: ...
Python development tooling, such as an IDE or a static type checker like mypy, will warn you of most type errors. The principle: Let the computer document it for you and minimize reliance on the human brain. For example, if your implementation won’t accept the float 3.0 as if it were an int, the type hint documents that right away, a detail you might otherwise forget to write down.
Comments can be out of date or plain wrong. For example, scipy.special.factorial (not ragging on SciPy — it’s a great library!) says in its comment “n : int” but in fact will accept floats and even return a value following the Gamma function.
Our examples are in Python, but if you’re using a compile-time typed language like Java or C++, then when you declare types, the compiler will actually block many mistakes at development-time, guaranteeing that this “documentation” is accurate as early as possible.
Still, assertions are better at catching fine-grained constraints, like the ban on negative numbers as input, unless you are using a language with a more advanced type system.
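A small sketch of what type hints do and do not cover, assuming the same iterative body as before. The hints are recorded on the function, where tools can read them, but Python itself does not enforce them at runtime, so a negative argument slips through:

```python
def factorial3(n: int) -> int:
    ret = 1
    for i in range(1, n + 1):
        ret *= i
    return ret


# The hints are stored on the function, where tooling can inspect them.
print(factorial3.__annotations__)

# A static type checker such as mypy should reject a call like
# factorial3(1.5), since a float is not an int.

# But the hint says nothing about sign, and nothing checks it at runtime:
print(factorial3(-2))  # prints 1, which is wrong for a factorial
```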
4. Assertions, Exceptions or Pre- and Post-Conditions
You can do the type hints one better. Assert that the value is an integer and assert that it is non-negative. That way, it’s not just the development tooling winking at you; the runtime will shut down mistakes as they happen.
    def factorial4(n: int) -> int:
        assert isinstance(n, int), f"{n} is not an integer and so unsupported."
        assert n >= 0, f"{n} is negative and so unsupported."
        ...
There are different ways to throw the error.
Assertions only help while they are enabled (Python skips assert statements when run with the -O flag), and since there is no point in allowing such mistakes to proceed, keep them enabled. Alternatively, you might throw a ValueError. If you’re using a language or library that supports programming by contract, with pre- and post-conditions (a kind of structured assertion), that will do the job too.
    import icontract

    @icontract.require(
        lambda n: isinstance(n, int) and n >= 0,
        "n must be a nonnegative integer",
    )
    def factorial_precondition(n: int) -> int:
        return __iterative_factorial(n)
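To make the idea of a precondition concrete without depending on a library, here is a hand-rolled sketch. The decorator name require and the function factorial_with_contract are hypothetical, and this is not how icontract is implemented. It only illustrates that a contract is a check that runs before the function body and doubles as readable documentation:

```python
import functools
from typing import Callable


def require(condition: Callable[..., bool], description: str):
    """Hypothetical precondition decorator: check arguments before the call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not condition(*args, **kwargs):
                # Simplification: a plain ValueError stands in for a
                # dedicated contract-violation error type.
                raise ValueError(f"Precondition violated: {description}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


@require(lambda n: isinstance(n, int) and n >= 0, "n must be a nonnegative integer")
def factorial_with_contract(n: int) -> int:
    ret = 1
    for i in range(1, n + 1):
        ret *= i
    return ret


print(factorial_with_contract(5))

try:
    factorial_with_contract(-2)
except ValueError as err:
    print(err)
```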
At this point, the computer really is checking our work. But the assertions or other exceptions have a second value: They are a form of documentation that cannot go wrong. The reader knows that what an assertion says is true.
Going back to our piñata, scipy.special.factorial, the comment says “If n < 0, the return value is 0.”
Technically, that is true of this implementation, but it is confusing for a user who knows that factorial is undefined on negative numbers. Even the factorial-extension Gamma, though defined across the real and even the complex numbers, is undefined on negative integers. Much better to throw an error and make it quite clear what is allowed and what is not.
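For comparison, a sketch of the explicit-exception variant mentioned above. The function name factorial_strict is hypothetical. It uses TypeError for a wrong type and ValueError for a negative value, which is one common Python convention rather than the only option. Unlike assert statements, these checks still run under python -O:

```python
def factorial_strict(n: int) -> int:
    # Note: bool is a subclass of int, so True and False pass this check.
    if not isinstance(n, int):
        raise TypeError(f"{n!r} is not an integer and so unsupported.")
    if n < 0:
        raise ValueError(f"{n} is negative and so unsupported.")
    ret = 1
    for i in range(1, n + 1):
        ret *= i
    return ret


# Both kinds of bad input are rejected with a clear message.
for bad_input in (-2, 1.5):
    try:
        factorial_strict(bad_input)
    except (TypeError, ValueError) as err:
        print(type(err).__name__, err)
```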
5. Unit Testing
Unit tests resemble assertions: They check that the system does what you expect during development. As documentation, they are weaker than assertions because they are far from the code, both in the files where they live and in the time at which they run. They’re also less likely to be read by other devs.
Still, if the developers are continually running the tests, as they should be, they’ll start seeing errors soon, without needing to dig into comments.
For example, our simple iterative factorial is broken! It returns the value 1 for negative values, which is certainly wrong, for factorial and even in the Gamma extension. This test will show that we expect an error but unfortunately get a return value.
    import pytest

    def test_factorial():
        with pytest.raises(Exception):
            factorial(-2)
You can even put miniature unit tests inside docstrings using doctest, a module in Python’s standard library. This puts the test as close to the code as possible, where it can best serve as documentation (though it can also clutter the view).
    def factorial_doctest(n) -> int:
        """
        Run this with python -m doctest -v factorial.py
        >>> factorial_doctest(3)
        6
        >>> factorial_doctest(0)
        1
        >>> factorial_doctest(1.5)
        Traceback (most recent call last):
            ...
        TypeError: 'float' object cannot be interpreted as an integer
        """
Accordingly, integration tests are a step further from the code and less suited to documenting it. Instead, they document system-wide behavior.
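The doctest example above shows only the docstring. As a self-contained sketch, assuming the same iterative body as the earlier examples, the doctests can also be run programmatically with the standard doctest module instead of from the command line (the helper name run_factorial_doctests is hypothetical):

```python
import doctest


def factorial_doctest(n: int) -> int:
    """
    >>> factorial_doctest(3)
    6
    >>> factorial_doctest(0)
    1
    >>> factorial_doctest(1.5)
    Traceback (most recent call last):
        ...
    TypeError: 'float' object cannot be interpreted as an integer
    """
    ret = 1
    for i in range(1, n + 1):
        ret *= i
    return ret


def run_factorial_doctests() -> doctest.TestResults:
    """Parse the docstring's examples and run them; returns failed/attempted counts."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        factorial_doctest.__doc__,
        {"factorial_doctest": factorial_doctest},  # names visible to the examples
        "factorial_doctest",
        None,  # no source file name
        0,
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


print(run_factorial_doctests())
```

If the implementation and the examples ever drift apart, the failed count becomes nonzero, which is what keeps this form of documentation honest.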
6. Logging
If bad things are going to happen and you have no way to prevent it in advance, you can at least log it. Back to SciPy.
If you pass a float, you get this on standard error:
DeprecationWarning: Using factorial() with floats is deprecated
Apparently they used a Gamma-based implementation, and later, realizing that only integers should be supported, they wanted to restrict input to int. Though unable to do that because of dependent applications, they at least write the warning to a place where it might get read. Devs may not read documentation, but when they run into problems, they do read the logs.
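A sketch of the same idea using the standard warnings module. The function name and the Gamma fallback are assumptions for illustration, not SciPy’s actual implementation. Integer input takes the exact path, while float input still works but leaves a trace where developers may see it:

```python
import math
import warnings


def factorial_lenient(n):
    """Hypothetical factorial that tolerates floats but warns about them."""
    if isinstance(n, float):
        warnings.warn(
            "Using factorial() with floats is deprecated",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this function
        )
        return math.gamma(n + 1)
    ret = 1
    for i in range(1, n + 1):
        ret *= i
    return ret


# DeprecationWarning is hidden by default in many contexts, so opt in here.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    factorial_lenient(3.0)

for warning in caught:
    print(f"{warning.category.__name__}: {warning.message}")
```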
7. Comments
We have now reached our penultimate resort, the comment. There is a place for comments, but only, in my experience, in these three cases:
a. Public APIs should be documented with inline comments, followed by generation of HTML with a tool such as Doxygen or Sphinx for Python, Javadoc for Java, etc. You should structure the comments with tags, following the principle that when a computer can check on structure, it should do just that.
An example of the structured comments in Python:
    :param n: A nonnegative integer
    :return: The product of all integers from 1 up to and including n; or for 0, return 1.
But where the function is not a public API, use these structured fields sparingly: Such comments are for getting across a specific message about something unusual, not for meticulously documenting all aspects of usage.
b. Surprising facts should be documented. For example, users might not know that the factorial of zero is defined as 1, since it doesn’t seem to fit into the way factorial is defined as “the product of all numbers from 1 to n.” Likewise, strange workarounds and ugly hacks should be commented in code.
c. Algorithms should be named, since the computer cannot show and guarantee what algorithm is in use, nor can the reader necessarily identify it at a glance. For example, factorial can be calculated with multiplication or approximated by various algorithms for Gamma. A user with demanding numerical applications might want to know that.
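The three cases can be combined in one docstring. The following is only a sketch for a function treated as a public API: the reStructuredText-style fields are one common convention, the wording is an example, and the :raises: line is an addition made here. inspect.getdoc returns the cleaned-up text that help() and documentation tools work from:

```python
import inspect


def factorial(n: int) -> int:
    """
    Compute n! with a simple iterative multiplication loop.

    Note that the factorial of zero is defined as 1.

    :param n: A nonnegative integer
    :return: The product of all integers from 1 up to and including n;
        or for 0, return 1.
    :raises ValueError: If n is negative
    """
    if n < 0:
        raise ValueError(f"{n} is negative and so unsupported.")
    ret = 1
    for i in range(1, n + 1):
        ret *= i
    return ret


print(inspect.getdoc(factorial))
```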
    def factorial(n: int) -> int:
        """
        This is a simple iterative implementation of factorial.
        The input must be a non-negative integer.
        Note that the factorial of zero is defined as 1.
        """
8. External Documentation
The comments were the penultimate resort; external documentation is the very last. RTFM is a fine slogan, but few do read the manual, since it is so far from the code that it easily gets stale. Still, if you need an overview of functionality or tutorials, you can put it in a document that is maintained separately from the code.
But watch out: You must continually review and reread these documents if you want them to reflect what is in the ever-changing code.
Documentation should tell the truth. Your language tooling and runtime have many ways of guaranteeing reliability, so use them. Put the documentation as close to the code as possible, and enforce its accuracy automatically, as early in the development process as you can.
The runnable code for the various implementations is here, with unit tests: https://github.com/doitintl/commenting