Clean Functions – Clean Code


Now let’s take a look at rules that give you clean functions.

First off, your functions should be smaller than you think. Many people recommend functions of 20 lines, probably a holdover from the days when that was how much code fit on a computer screen, but Uncle Bob recommends a maximum of 5 lines, and I wholeheartedly agree with him. I hear you! 5 lines sounds like too little; it sounds like you won’t be able to write a function that does anything useful.

Well, you should give it a try. The trick to building useful programs with small functions is to use many of them and compose your logic from well-named, easy-to-understand, small blocks.
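To make the idea concrete, here is a minimal sketch of logic composed from small, named functions. The domain (pricing an order), the type names and the constants are all made up for illustration; only the shape matters.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical order-pricing example; every name and number is illustrative.
public record OrderLine(decimal UnitPrice, int Quantity);

public static class OrderPricing
{
    private const decimal VatRate = 0.20m;              // assumed tax rate
    private const decimal FreeShippingThreshold = 50m;  // assumed threshold
    private const decimal ShippingFee = 5m;             // assumed flat fee

    // Reads like a summary; each step lives in its own small function.
    public static decimal TotalToPay(IEnumerable<OrderLine> lines)
    {
        var subtotal = Subtotal(lines);
        return subtotal + Vat(subtotal) + Shipping(subtotal);
    }

    private static decimal Subtotal(IEnumerable<OrderLine> lines) =>
        lines.Sum(line => line.UnitPrice * line.Quantity);

    private static decimal Vat(decimal subtotal) => subtotal * VatRate;

    private static decimal Shipping(decimal subtotal) =>
        subtotal >= FreeShippingThreshold ? 0m : ShippingFee;
}
```

No function here is longer than a handful of lines, and a reader who only cares about the total never has to look at how shipping is decided.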

In a future post I will do a refactoring screencast that guides you through breaking a big, bad 400-line function into a swarm of small functions, to drive this message home. For now just remember: 5 is plenty. 5 is your MAXIMUM; your functions should be smaller. And no, those terrible one-liners where you declare an object, instantiate it, call 20 methods and then return another object don’t count as “one line”. Those count as a whole module and you should stop writing them. Remember that clean code means code that is easy to understand, code that guides you. Rules like a maximum line count exist to help us reach that goal; being clever with the interpretation of a rule won’t help.

Functions should do one thing only, which is much easier when your functions are small. Just as importantly, they should work on only one level of abstraction. Mixing levels is terrible for a reader approaching a function for the first time: they are trying to figure out what the function does, and they get bombarded with details from deeper levels that they don’t care about yet.

An example? Well, I’ve seen (and written) functions that send scheduled emails that have mixed concepts like:

  • When the emails should be sent
  • Who should receive these emails
  • What data you need to send
  • How to get the data you need to build these emails
  • Building the emails themselves, down to the HTML tags

So not only was this function doing many different things, it was also operating on many different abstraction layers, such as SharePoint CAML queries, HTML syntax, and string concatenation. All of those things should be done by different modules of a program, not by the same monster function.
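As a rough sketch of what the top-level function could look like once those concerns are pulled apart: the job below only knows the order of the steps, and each step is handed in from elsewhere. All the names are hypothetical, and plain delegates are used to keep the example short; in a real code base these would more likely be interfaces implemented by separate modules.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types standing in for the concerns listed above.
public record Recipient(string Name, string Address);
public record Email(string To, string HtmlBody);

public class ScheduledEmailJob
{
    private readonly Func<DateTime, bool> isDue;                   // when to send
    private readonly Func<IEnumerable<Recipient>> findRecipients;  // who, and how to query them
    private readonly Func<Recipient, Email> buildEmail;            // what to send; HTML lives here
    private readonly Action<Email> send;                           // delivery

    public ScheduledEmailJob(
        Func<DateTime, bool> isDue,
        Func<IEnumerable<Recipient>> findRecipients,
        Func<Recipient, Email> buildEmail,
        Action<Email> send)
    {
        this.isDue = isDue;
        this.findRecipients = findRecipients;
        this.buildEmail = buildEmail;
        this.send = send;
    }

    // One level of abstraction: no queries, no HTML, no string concatenation.
    public void Run(DateTime now)
    {
        if (!isDue(now)) return;
        foreach (var recipient in findRecipients())
            send(buildEmail(recipient));
    }
}
```

Someone reading Run learns what the job does without having to read a single query or HTML tag.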

We have already talked about naming things, and the same rules apply when naming functions. The names should be meaningful and descriptive. Don’t be afraid of using long names, especially for local functions. A long name is better than a short one if it explains the intent of the function better, and your IDE will autocomplete it, so you won’t break your fingers typing it.

Arguments make a function harder to understand. They are usually necessary, but not always.

logger.Log(message); //Is way easier to understand than
logger.Log(message, Level.Info, "http://localhost", false);

If you need to pass many arguments, you can probably move some of them to the constructor of the class. Try to use the fewest arguments possible, and remember:

enum ArgumentSize
{
    Best = 0,
    MaxValue = 3
}

Don’t pass a boolean. That screams that the function is doing multiple things! Write two functions, one for the true case and one for the false case. Give them proper names, so in the future you won’t be wondering what calling Log() with a false at the end actually does.
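Here is a small sketch of that split. The logger is hypothetical, and the meaning given to the flag (info versus error) is an assumption made only for this example, not the meaning of the flag in the Log() call shown earlier.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory logger; the meaning of the flag is assumed for this sketch.
public class Logger
{
    public List<string> Entries { get; } = new();

    // Before: Log("disk full", true) tells the reader nothing about what 'true' means.
    public void Log(string message, bool isError)
    {
        if (isError) LogError(message);
        else LogInfo(message);
    }

    // After: the name at the call site says what happens.
    public void LogInfo(string message) => Entries.Add("INFO: " + message);

    public void LogError(string message) => Entries.Add("ERROR: " + message);
}
```

Compare reading logger.Log("disk full", true) with reading logger.LogError("disk full") six months from now.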

If you need to pass multiple related arguments to a function, it’s best to group them together in a new object. The classic example is this:

public void DrawLine(int x1, int y1, int x2, int y2){}
public void DrawLine(Point start, Point end){}
//Which is easier to understand?
DrawLine(10, 1, 15, 2);
var start = new Point(10, 1);
var end = new Point(15, 2);
DrawLine(start, end);

This also adheres better to the rule about working on the same level of abstraction. A function that draws lines should be dealing with points, not with integers.

You should avoid hidden side effects. These happen when a function makes unexpected changes to the state of the program. It can be modifying an instance variable or, worse, a global. It can also be modifying the parameters it received. Avoid this when possible and, when it is impossible, make the name convey the side effect so you remove the “hidden” part of the equation. That way at least the developer won’t be surprised.

One kind of side effect is the output parameter: a parameter that is passed to a function so that the function can return its result in it. It can be quite hard to realize that the result of a function is in one of its parameters, so it is better to avoid this situation.
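A quick side-by-side, using a made-up Scores helper, shows the difference at the call site. Both overloads compute the same thing; only the way the result comes back changes.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, used only to contrast the two styles.
public static class Scores
{
    // Output parameter: the result hides in the argument list.
    public static void Average(IReadOnlyList<int> scores, out double result)
    {
        result = scores.Average();
    }

    // Return value: the result is where a reader expects to find it.
    public static double Average(IReadOnlyList<int> scores) => scores.Average();
}
```

With the first overload the caller writes Scores.Average(values, out var mean); and has to know that mean is filled in by the call. With the second, var mean = Scores.Average(values); needs no explanation.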

A nice convention to adopt is Command Query Separation (CQS), which states that a function should be either a query or a command. A query returns a value and has no side effects. A command is the opposite: it has side effects and returns void. There’s a lot to say about CQS, so we will cover it in a future blog post. In the meantime, just remember that adhering to CQS makes your functions a lot easier to understand.
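As a minimal illustration, here is a hypothetical ShoppingCart whose methods are each either a query or a command, never both. The class and its members are invented for this sketch.

```csharp
using System.Collections.Generic;

// Hypothetical class; every member is either a query or a command.
public class ShoppingCart
{
    private readonly List<string> items = new();

    // Queries: return a value and change nothing.
    public int ItemCount() => items.Count;

    public bool Contains(string item) => items.Contains(item);

    // Commands: change state and return nothing.
    public void Add(string item) => items.Add(item);

    public void Clear() => items.Clear();
}
```

Because the queries never change anything, you can call them as often as you like, in any order, without worrying about what they might break.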

This has been said before but is worth repeating: don’t return error codes, use exceptions. It makes your code easier to follow, and it makes it harder to accidentally ignore an error and then spend days debugging it.
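To see why, compare the two styles on a made-up Account class. The class, the method names and the -1 error code are all assumptions chosen for the example.

```csharp
using System;

// Hypothetical account, used only to contrast error codes with exceptions.
public class Account
{
    public decimal Balance { get; private set; }

    public Account(decimal openingBalance) => Balance = openingBalance;

    // Error-code style: nothing forces the caller to look at the returned value.
    public int WithdrawWithErrorCode(decimal amount)
    {
        if (amount > Balance) return -1;
        Balance -= amount;
        return 0;
    }

    // Exception style: the failure cannot be ignored by accident.
    public void Withdraw(decimal amount)
    {
        if (amount > Balance)
            throw new InvalidOperationException("Insufficient funds.");
        Balance -= amount;
    }
}
```

A caller that writes account.WithdrawWithErrorCode(500m); and moves on will carry on as if the money had been withdrawn. The same mistake with Withdraw stops the program at the exact line where things went wrong.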

Author: Maurizio Pozzobon

Maurizio has 5+ years of experience developing solutions in the insurance industry. He is passionate about doing the right thing right, so he works in a tight loop with his clients to deliver the best solution possible.