This article is about high-level optimization, i.e. I will explain how I usually approach optimizing a program without going into the gory details. There are a million books and web pages out there about low-level optimizations and the tricks involved, so I will not dive into those (I could not possibly hope to cover them in a single blog entry anyway).
Many of the points made here may sound like a matter of course, yet I have failed to obey them myself often enough in the past to know that sometimes the temptation to just start changing stuff around in my programs is bigger than my rationality. All of the points presented here can be found in any reasonably good book about optimization, but I know enough programmers who have never read any of them (to be honest: I have not read too many of them either). A little repetition of some obvious points will therefore not hurt.
Let's start by assuming I have a program that I think should run faster, consume fewer resources, or become smaller, and I want to optimize the hell out of it. My first advice to myself is usually:
Step away from the keyboard, Michael, and ask yourself whether an optimization step is really necessary. Are you solving a definite problem? Or do you just want to do it because you can, to prove to yourself what a hotshot programmer you are?
Optimization does not come without costs: it is usually harder to maintain an optimized program than an unoptimized one, as many optimizations sacrifice maintainability for, e.g., speed. I also need to spend time doing the optimizations (usually more than planned, quite often way more than planned), and my time is not free either. These are the costs to be paid, and I should only pay them for a reason and not just because I can.
If I have convinced myself that my program really needs optimization, it is now time to start measuring (despite my inner voice usually shouting something along the lines of: come on, Michael, just get started already!). Depending on what I want to optimize for, I might measure the memory footprint or the size of the program. I will concentrate on the most common case, where I want to make my program run faster; in this case I will probably measure wall clock time. Not everything I try will actually benefit the total execution time of the program (e.g. because compilers know a lot of tricks these days, and some of the changes I try might cause one of these tricks to stop working, thereby slowing down the program). It is therefore necessary to have a solid starting point to compare with.
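To make this concrete, here is a minimal sketch of such a baseline measurement in Python; the function `work_to_optimize` is a hypothetical stand-in for the real workload:

```python
import time

def work_to_optimize():
    # Hypothetical stand-in for the real workload:
    # sum the squares of the first million integers.
    return sum(i * i for i in range(1_000_000))

# Record a solid baseline of wall clock time before touching anything.
start = time.perf_counter()
result = work_to_optimize()
elapsed = time.perf_counter() - start
print(f"baseline: {elapsed:.4f} s")
```

In practice I would repeat the measurement a few times and compare against the best or median run, since individual timings fluctuate.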
It is also important to have a way of verifying the correctness of my program. I will not go into details here, but having tests will not hurt, as many optimizations will alter the semantics of the program. Having a way to quickly revert changes to the program (version control) will also prove beneficial, but that is a topic for a different article altogether.
Having done that, I am now ready for the real thing (TM). But where to start? Staying with our example of optimizing for speed (which will persist to the end of this article), I usually start by unleashing the power of the compiler on the program. Playing around with the compiler's optimization options usually does not take long and does not hurt the maintainability of the program at all, so I try it first. I have also successfully played around with profile-guided optimization recently. In most cases this will yield only a couple percent of speed increase, but I have witnessed greater ones as well, and it is an easy thing to try.
After each optimization step, I need to make sure three things are taken care of:
- I need to verify the correctness of the optimized program. No one needs a lightning-fast program that produces the wrong results. And yes, some compiler optimizations will change the semantics of the program and may therefore produce wrong results.
- If the program is still correct, I need to measure its performance (in our example, probably wall clock time) and compare it to my previous measurement. And yes, some compiler optimizations may make the program slower.
- If the speed increased, I need to ask myself whether the program's performance is now sufficient. If yes, stop optimizing (even if that is hard, because you know of so many other things to try). If no, try the next technique described below.
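These three checks are easy to fold into a tiny harness that runs after every optimization step. A sketch in Python, where `reference_impl` is the trusted original and `optimized_impl` a hypothetical candidate (here, a closed-form replacement for a loop):

```python
import time

def reference_impl(n):
    # Known-correct but slow version: O(n) summation loop.
    return sum(range(n))

def optimized_impl(n):
    # Candidate optimization: closed-form Gauss formula, O(1).
    return n * (n - 1) // 2

def check_step(n=1_000_000, target_seconds=0.5):
    # Check 1: correctness, compared against the trusted reference.
    expected = reference_impl(n)
    start = time.perf_counter()
    actual = optimized_impl(n)
    elapsed = time.perf_counter() - start
    assert actual == expected, "optimization changed the semantics!"
    # Checks 2 and 3: how fast is it now, and is that fast enough to stop?
    return elapsed, elapsed <= target_seconds

elapsed, fast_enough = check_step()
print(f"optimized step took {elapsed:.6f} s, fast enough: {fast_enough}")
```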
Now that the compilers have had their fun with the program, it is almost time to get my hands dirty. But before I can start making changes all over the place, I need to realize there is one critical piece of information I am still missing: where is my program spending all of its time? I used to think I knew that just by looking at the program. I am almost always wrong. Most of the people I know are as well. So before I spend my valuable time optimizing code that contributes only little to the total execution time, it is time to utilize another tool out of my programmer's toolbox: my trusted profiler (you have one in there, too, don't you?). It will tell me exactly where the so-called hotspots of my program are and where my optimizations will really make a difference.
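In Python, for example, the standard library's `cProfile` module produces exactly this kind of per-function report (the functions here are made up purely for illustration):

```python
import cProfile
import io
import pstats

def hot_function():
    # Deliberately expensive inner loop, so it dominates the profile.
    return sum(i * i for i in range(200_000))

def cold_function():
    return len("barely any work")

def main():
    for _ in range(5):
        hot_function()
    cold_function()

# Profile main() and print the functions sorted by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
print(buffer.getvalue())
```

The report makes it obvious that `hot_function` is where optimization effort would pay off, while `cold_function` is not worth touching.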
Most profilers work on a per-function basis. Therefore, I can already hear some people complaining:
Of course my code spends most of its time in ‘big_function_foo()’, all this profiling did not help me at all!
If you are one of those people complaining, I suggest thinking a little more about the size of your functions. There are lots of rules of thumb among programmers regarding the maximum size of a function, e.g. no bigger than a screen, or the minimum functionality that makes sense on its own. If you write your functions accordingly, you will get nice and meaningful profiler output as a side effect. I wish I always obeyed these rules!
Of course, there are also profilers that work on a per-line basis. I know from experience not to take the line numbers they spit out too seriously (because of program transformations done by the compiler), but the information presented by these tools will allow me to at least isolate the regions of code that need optimizing, even if my functions are bigger than suggested in the paragraph above (of course this never happens. Never. Really! 🙂 ).
Now that I know where to start optimizing, there is one last piece of high-level advice I want to share. Before starting to unroll loops or trying out any of the fancy low-level techniques presented in the books about optimization, I once again take a step back and think about algorithms and data structures for a while. What complexity do the algorithms at my hotspots have? Is there maybe an alternative algorithm with lower complexity? Or is it possible to change my data structures to avoid repeating calculations over and over again, thereby trading memory space for speed? In my experience, these and similar considerations have more potential to speed up programs than anything else. Only after this potential has been thoroughly evaluated should low-level optimizations come into play.
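A tiny illustration of the kind of win such a reconsideration can bring, sketched in Python with made-up data: replacing a repeated linear scan of a list with lookups in a set turns an O(n·m) hotspot into O(n + m), trading a bit of memory for a lot of speed:

```python
def count_common_slow(haystack, needles):
    # O(n * m): every needle triggers a linear scan of the haystack list.
    return sum(1 for needle in needles if needle in haystack)

def count_common_fast(haystack, needles):
    # O(n + m): build a set once, then each membership test is
    # (amortized) constant time -- memory traded for speed.
    haystack_set = set(haystack)
    return sum(1 for needle in needles if needle in haystack_set)

haystack = list(range(10_000))
needles = list(range(5_000, 15_000))

# Both versions must agree before the fast one may replace the slow one.
assert count_common_slow(haystack[:100], needles[:100]) == \
       count_common_fast(haystack[:100], needles[:100])
print(count_common_fast(haystack, needles))  # → 5000
```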
Let's go through the three checks I mentioned earlier once again (a little repetition won't hurt 😉 ):
- Is my program still correct?
- Is it faster than before?
- Is my program fast enough now for me to stop the optimization process?
As mentioned above, these should be evaluated after every optimization step, including the ones sketched in the last paragraph.
This almost brings me to the end of this article. All other optimization techniques I would describe as lower-level (but if you disagree with me on this point, feel free to leave a comment).
There is one last technique I would like to mention here: I could wait for six months and then buy a faster computer. This has been a very important and successful optimization in the past, as processing speeds used to increase a lot during such a span. But as I already mentioned in another article, those times may be over. Herb Sutter describes it best in his article titled The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software (this is the last time I will link to this article, I promise 🙂 ). Unless your software has been parallelized already, you may not see any speed increases at all during the next few months (years?), except possibly due to increased cache sizes. Which brings me back to my favorite topic of parallelization, which in itself is kind of an optimization technique. Unfortunately, not one I would consider high-level. But that is a topic for another post…