Optimising Tail Recursion and Performance
Ryan Welch / November 14, 2015
4 min read • – views
Recursion is a very common paradigm in programming where you invoke a functon call on itself. Tail recursion is a special kind of recursion where the function call to itself is at the end of the function. It is significant in that it can always be redesigned as a loop which has much better performance and removes the overhead of calling the function repeatedly and growing the stack.
Compiler optimisations
Modern compilers are incredibly complex and good at their jobs, some modern compilers will optimise tail recursion for us, meaning that we can enjoy the benefits of writing recursively whilst everything is taken care for us under the hood.
To prove that tail recursion is optimised we will set up a simple experiment.Below is a simple C program that recursively calls the function count()
and adds 1 to our total num
each time until we reach 500000 where we return.
count.c
1#include <stdio.h>23int count(int num) {4if(num == 500000) {5printf("We did it!\n");6return 0;7}89num = num + 1;1011return count(num);12}1314int main() {15count(0);16return 0;17}18
The program should print out We did it
once we reach 500000. So let's compile it with gcc, and then test it.
1$ gcc recursion.c -o recursion
2$ ./recursion
3Segmentation fault (core dumped)
4
That's not what we expected, we get the very descriptive error message Segmentation fault instead.
Taking a closer look and debugging our program in gdb shows the function count()
called thousands of times, this is not what we expected. We expected the compiler to optimise our recursive function into a loop so we should not be seeing the count()
function being called more than once.
It turns out after reading the documentation that the compiler doesn't optimise our code by default, so let's try again with optimisations turned on. The flag -O3
tells gcc to use maximum optimisation and apply all of them to our code. A list of optimisations and what they do is here.
1$ gcc recursion.c -O3 -o recursion
2$ ./recursion
3We did it!
4
Success! Our program runs perfectly without any core dumps.
Analysing the optimisations
We're not done yet though! Let's take a look at how the compiler actually optimises these calls. I'm dumping the assembly code generated by gcc by using the -S
flag which tells gcc to stop just before it compiles the assembly code.
I've stripped away some of the assembly code and only left the relevant bits to make it easier to read.
1$ gcc recursion.c -S -o recursion
2
recursion_unoptimised.asm
1count:2.LFB0:3.cfi_startproc4pushq %rbp5.cfi_def_cfa_offset 166.cfi_offset 6, -167movq %rsp, %rbp8.cfi_def_cfa_register 69subq $16, %rsp10movl %edi, -4(%rbp)11cmpl $500000, -4(%rbp)12jne .L213movl $.LC0, %edi14call puts15movl $0, %eax16jmp .L317.L2:18addl $1, -4(%rbp)19movl -4(%rbp), %eax20movl %eax, %edi21call count22.L3:23leave24.cfi_def_cfa 7, 825ret26.cfi_endproc27.LFE0:28.size count, .-count29.globl main30.type main, @function31
As you can see in this unoptimised version, we have a call to count from within the count function.
The cmpl instruction computes a conditional (whether the current count is equal to 500000) and sets a flag. The instruction jne is a conditional branch that jumps to the location L2 if the flag is false, L2 then calls count.
The jmp call is an unconditional branch and terminates the program once the flag is true. This is clearly unoptimised and will add a new level to the stack each time count is called eventually leading to a stack overflow.
1$ gcc recursion.c -S -O3 -o recursion
2
recursion_optimised.asm
1count:2.LFB24:3.cfi_startproc4subq $8, %rsp5.cfi_def_cfa_offset 166movl $.LC0, %edi7call puts8xorl %eax, %eax9addq $8, %rsp10.cfi_def_cfa_offset 811ret12.cfi_endproc13.LFE24:14.size count, .-count15.section .text.startup,"ax",@progbits16.p2align 4,,1517.globl main18.type main, @function19
The compiler does a much better job with the optimisations on, there are no branches as in the previous code and there are no calls to the count
function from within itself. So it has successfully optimised the code to exclude tail recursion.
Closing thoughts
Whilst gcc does a good job at optimising it does not apply optimisations by default. Languages such as Java do not optimise tail recursion at all. Therefore where recusrion is critical I strongly recommend you to not rely on the compilers optimisations. Having said that, there's plenty of cases where using recursion makes your life much easier.