How to document source code responsibly
[Edit March 2021]: Hey! It’s 2021, and I discovered an awesome tool that lets you efficiently communicate decisions / context about your code without polluting it. Check out https://www.usecodex.com/.
The story begins with me, frustrated after a code review comment left by my teammate:
What this class does? Please document its purpose and how is it used within <whatever> process?
This comment triggered a chain of long replies, involving attaching links, citing widely accepted best practices, mentioning the company’s leadership principles, and the legendary “Clean Code” by Robert C. Martin. It turned intense and philosophical.
I have recently switched from a sweet start-up to a full-blown enterprise corporation. While still in observation mode, I found it funny to actually see most of the Seven Ineffective Coding Habits of Many Programmers described by Kevlin Henney (comments are mentioned explicitly, haha).
So I have decided to use the power of my frustration to express my point of view here 😃
Basics
Commenting standards are open to interpretation (like many software-related matters). Let’s agree (well, I suggest that you agree) on an invariant basis for reasoning about the topic.
- Code should be written for humans
- Commenting is an additional tool that a developer can choose to use or not
- Comments are part of code
I believe most people would immediately agree with the first item, while the other two need a deeper dive.
Commenting is just another development tool
For the second argument, consider this naive example I made up:
It is not good code; you’d have to invest a few seconds to try and understand what it does.
Now, we can make it easier for the reader by applying tools like:
- Better naming
- Structuring + naming better
- Commenting
It’s obvious that commenting is the least preferred option. (I would like to mention again that we are judging the code only in terms of clarity.) You read the comment and get a better idea of what’s going on in the code.
It does the job, maybe not well, but now the code is definitely easier to understand. Let’s try and apply another tool: renaming things.
By using proper naming, we can now better understand what the code does.
And, finally, let’s restructure and rename:
We see that by applying different tools we provide more clarity, with varying degrees of success.
The point is that commenting is indeed only one of the tools available for making code more understandable. Sometimes it is more successful, sometimes it is not.
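To make the comparison concrete, here is a minimal sketch on a made-up example. Everything in it is an assumption for illustration only: the `ClarityTools` class, the `User` type, and the `{age, activeFlag}` array layout. The same filter is written four times: naive, commented, renamed, and finally restructured and renamed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class ClarityTools {

    // 1. Naive: the reader has to guess what the indices and numbers mean.
    static List<int[]> f(List<int[]> a) {
        List<int[]> r = new ArrayList<>();
        for (int[] x : a) {
            if (x[1] == 1 && x[0] >= 18) {
                r.add(x);
            }
        }
        return r;
    }

    // 2. Commenting: identical code, the intent is explained next to it.
    // Each entry is {age, activeFlag}; keep the active users aged 18 or older.
    static List<int[]> fCommented(List<int[]> a) {
        List<int[]> r = new ArrayList<>();
        for (int[] x : a) {
            if (x[1] == 1 && x[0] >= 18) {
                r.add(x);
            }
        }
        return r;
    }

    // 3. Better naming: same structure, but the names carry the intent.
    static final int AGE = 0;
    static final int ACTIVE_FLAG = 1;
    static final int ADULT_AGE = 18;

    static List<int[]> activeAdults(List<int[]> users) {
        List<int[]> result = new ArrayList<>();
        for (int[] user : users) {
            if (user[ACTIVE_FLAG] == 1 && user[AGE] >= ADULT_AGE) {
                result.add(user);
            }
        }
        return result;
    }

    // 4. Structuring + naming: a small type replaces the raw array.
    static final class User {
        final int age;
        final boolean active;

        User(int age, boolean active) {
            this.age = age;
            this.active = active;
        }

        boolean isAdult() {
            return age >= ADULT_AGE;
        }
    }

    static List<User> activeAdultUsers(List<User> users) {
        return users.stream()
                .filter(user -> user.active && user.isAdult())
                .collect(Collectors.toList());
    }
}
```

All four variants return the same users; only the effort needed to read them differs. In the last variant there is nothing left for a comment to explain.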
When would you prefer comments over the other tools?
Comments shine mostly when:
- the reader needs context that cannot be expressed using the existing entities
- the code has to make special assumptions about its environment; in that case, the “special” part of the environment should be normalized and the comment removed
- the comment “discusses difficult or subtle algorithms and data structures” (source)
- other tools cannot be applied (due to size / time limitations); in that case, the comments are obviously temporary
- heavy performance optimizations involve some ugliness; well, OK
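As a hedged illustration of the “subtle algorithm” case, here is a small sketch of a binary search (the `SortedSearch` class and `indexOf` method are hypothetical names). The single comment carries context that neither naming nor restructuring can express: why the midpoint is computed in a slightly odd way.

```java
final class SortedSearch {

    /** Returns the index of target in the ascending array, or -1 when it is absent. */
    static int indexOf(int[] sorted, int target) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            // Deliberately not (low + high) / 2: for very large arrays that sum
            // can overflow int and turn into a negative index.
            int mid = low + (high - low) / 2;
            if (sorted[mid] < target) {
                low = mid + 1;
            } else if (sorted[mid] > target) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }
}
```

Without the comment, a well-meaning maintainer could “simplify” the expression and reintroduce the overflow.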
Comments are part of source code, and should be treated alike
A comment is tightly bound to the code it relates to. When the code changes, the comment needs to be changed or removed as well. Imagine how confusing a dangling comment that no longer relates to the code would be!
While comments are technically not code (they do not matter to the machine), we agreed that the main purpose of code is to be clear to a human, so we should treat comments the same way as, for example, naming variables. (And that can also be challenging.)
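A small sketch of how such a dangling comment comes about, with hypothetical names and values (`SessionTimeouts`, the 30 and 15 minute figures): the first constant was changed during a refactoring while its comment was left behind, and the second states the same fact in code only.

```java
import java.time.Duration;

final class SessionTimeouts {

    // Sessions expire after 30 minutes.
    // ^ Stale: the value below was later changed to 15 minutes, the comment was not.
    static final int TIMEOUT = 900;

    // The same fact expressed in code: there is no separate sentence left to drift.
    static final Duration SESSION_TIMEOUT = Duration.ofMinutes(15);

    static boolean isExpired(Duration idleTime) {
        return idleTime.compareTo(SESSION_TIMEOUT) >= 0;
    }
}
```

The compiler checks neither version of the sentence, which is why a comment has to be reviewed and refactored together with the code it describes.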
The cost of commenting
Having agreed on the three basic arguments, let’s consider the cost of using comments:
- Comments need maintenance: when refactoring code, you have to refactor the comments as well.
- Good comments are hard to write. They should be precise and relevant, and precision requires stability: well-defined requirements and an environment that doesn’t change. Lack of change is a luxury in modern software projects.
- A temptation or culture that promotes comments keeps developers from writing cleaner, self-explanatory code.
- Comments hide design issues: needing a comment often means the design of the system cannot be expressed effectively with clear and understandable entities and abstractions.
While there is a place for comments here and there to mitigate temporary issues, most commonly commenting is evidence of an inability to use other tools (renaming, decoupling, encapsulation) to express the intentions of the code.
Documentation
There are two main consumers of written code: maintainers and external users.
Maintainers are mainly concerned with the following questions:
- What the code does
- How the code does it
External users want to know:
- How to use the code
Documentation For External Users
Traditionally, in-code documentation describes how to use the code. That’s why it is common to document the API provided by a module / package. What makes having documentation in code so convenient is:
- Proximity to the code, i.e. portability
- The ability of tools / IDEs to populate some parts of the documentation automatically
- The ability of IDEs to parse the documentation and show inline hints
Now, when working on a project, consider whether you have any external users: people who will use your code and need to understand how to use it properly.
If you do, go ahead and write good documentation: describe the arguments, their types, and the assumptions; enjoy the automated tools that can easily generate nice HTML and PDF.
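As a hedged sketch of documentation that earns its place, consider a hypothetical `MoneyConverter.convert` method (the class, the method, and its rounding rule are all assumptions made up for this example). The Javadoc states what the signature alone cannot: the direction of the rate, the rounding, and the failure cases.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class MoneyConverter {

    /**
     * Converts an amount from a source currency to a target currency.
     *
     * <p>The rate is expressed as "units of target currency per one unit of
     * source currency". The result is rounded to 2 decimal places.
     *
     * @param amount amount in the source currency; must not be negative
     * @param rate   target units per one source unit; must be positive
     * @return the converted amount with scale 2, rounded with {@link RoundingMode#HALF_EVEN}
     * @throws IllegalArgumentException if amount is negative or rate is not positive
     */
    static BigDecimal convert(BigDecimal amount, BigDecimal rate) {
        if (amount.signum() < 0) {
            throw new IllegalArgumentException("amount must not be negative");
        }
        if (rate.signum() <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return amount.multiply(rate).setScale(2, RoundingMode.HALF_EVEN);
    }
}
```

An external user can call this correctly without reading the body, which is exactly the question this kind of documentation is supposed to answer.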
Pay attention, though: often the purpose of adding a formatted comment is not to explain how to use the code, but to include the method in the listing of the exposed API, i.e. the code is self-explanatory and the documentation is only added for completeness.
Consider this example:
CurrencyConverter.convertUsdToEur(15.0);
Your IDE will hint at what type of argument is required. It is very clear what this function does because of the naming.
You can end up writing documentation that doesn’t really help an external user understand anything about the code.
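public class CurrencyConverter {
    // Placeholder rate so the snippet compiles; the value is an assumption, not a real quote.
    private static final double RATE = 0.85;

    /**
     * Convert USD to EUR.
     * @param usdAmount amount in USD
     * @return amount in EUR
     */
    public static Double convertUsdToEur(final Double usdAmount) {
        return usdAmount * RATE;
    }
}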
Maybe you’ll want to have it because the documentation is dumped and nicely displayed on a web page / PDF document, and without the formatted comment the function would just “disappear” from the listing. But its added value in explaining the code is zero.
Documentation For Maintainers
Okay, we have a project and there’s a team of developers that work on it.
You want new / other developers to engage effectively and start developing. They need to understand the overall design of the system, the terms and abstractions used, and what the code does and how.
So, most likely, your project has:
- Design documentation
- API spec
- Readme
- Contribution guide
- FAQ
- Wiki
You will want to keep all of these up-to-date — this is your project’s documentation.
Not the code.
The information you would want to keep documented inline in the code, but not in one of the items mentioned above, is very limited. Anything else would be duplication of information, which is bad for obvious reasons.
You may want to keep the relevant information close to the source code for the sake of convenience, but keep in mind that keeping it up-to-date will require an effort. Moreover, outdated documentation might be harmful.
You should document responsibly.
Summary
This is a rather extreme coding standard for commenting and documenting, and it can be summarized as:
In the long term, good code should have nothing but code and external API documentation.
No over-documenting; the code should be self-explanatory. The project should have good documentation and guides, but documentation should be minimal in the source code.
Of course, reality is hard, and no idealistic approach can really be achieved in practice. Sometimes extreme opinions are intentionally manipulative.
I hope this post will help convince other developers to eliminate the ineffectiveness of over-documenting source code and allow them to focus on the coding aspects that really matter.
I couldn’t find any research that measures the time wasted by teams that practice over-documenting, but from personal experience it is significant and, for me personally, frustrating.
P.S.
A week after writing this post I bumped into this funny tweet by @codinghorror. It perfectly illustrates the idea 😂
References
Putting comments in code: the good, the bad, and the ugly by Bill Sourour (an excellent read)
Seven Ineffective Coding Habits of Many Programmers by Kevlin Henney (ITT 2016)
A Survey of Improving Computer Program Readability to Aid Modification