Optimising Tail Recursion and Performance

Ryan Welch / November 14, 2015

4 min read– views

Recursion is a very common paradigm in programming where you invoke a functon call on itself. Tail recursion is a special kind of recursion where the function call to itself is at the end of the function. It is significant in that it can always be redesigned as a loop which has much better performance and removes the overhead of calling the function repeatedly and growing the stack.

Compiler optimisations

Modern compilers are incredibly complex and good at their jobs, some modern compilers will optimise tail recursion for us, meaning that we can enjoy the benefits of writing recursively whilst everything is taken care for us under the hood.

To prove that tail recursion is optimised we will set up a simple experiment.Below is a simple C program that recursively calls the function count() and adds 1 to our total num each time until we reach 500000 where we return.

count.c
1
#include <stdio.h>
2
3
int count(int num) {
4
if(num == 500000) {
5
printf("We did it!\n");
6
return 0;
7
}
8
9
num = num + 1;
10
11
return count(num);
12
}
13
14
int main() {
15
count(0);
16
return 0;
17
}
18

The program should print out We did it once we reach 500000. So let's compile it with gcc, and then test it.

1
$ gcc recursion.c -o recursion
2
$ ./recursion
3
Segmentation fault (core dumped)
4

That's not what we expected, we get the very descriptive error message Segmentation fault instead.

Taking a closer look and debugging our program in gdb shows the function count() called thousands of times, this is not what we expected. We expected the compiler to optimise our recursive function into a loop so we should not be seeing the count() function being called more than once.

It turns out after reading the documentation that the compiler doesn't optimise our code by default, so let's try again with optimisations turned on. The flag -O3 tells gcc to use maximum optimisation and apply all of them to our code. A list of optimisations and what they do is here.

1
$ gcc recursion.c -O3 -o recursion
2
$ ./recursion
3
We did it!
4

Success! Our program runs perfectly without any core dumps.

Analysing the optimisations

We're not done yet though! Let's take a look at how the compiler actually optimises these calls. I'm dumping the assembly code generated by gcc by using the -S flag which tells gcc to stop just before it compiles the assembly code.

I've stripped away some of the assembly code and only left the relevant bits to make it easier to read.

1
$ gcc recursion.c -S -o recursion
2
recursion_unoptimised.asm
1
count:
2
.LFB0:
3
.cfi_startproc
4
pushq %rbp
5
.cfi_def_cfa_offset 16
6
.cfi_offset 6, -16
7
movq %rsp, %rbp
8
.cfi_def_cfa_register 6
9
subq $16, %rsp
10
movl %edi, -4(%rbp)
11
cmpl $500000, -4(%rbp)
12
jne .L2
13
movl $.LC0, %edi
14
call puts
15
movl $0, %eax
16
jmp .L3
17
.L2:
18
addl $1, -4(%rbp)
19
movl -4(%rbp), %eax
20
movl %eax, %edi
21
call count
22
.L3:
23
leave
24
.cfi_def_cfa 7, 8
25
ret
26
.cfi_endproc
27
.LFE0:
28
.size count, .-count
29
.globl main
30
.type main, @function
31

As you can see in this unoptimised version, we have a call to count from within the count function.

The cmpl instruction computes a conditional (whether the current count is equal to 500000) and sets a flag. The instruction jne is a conditional branch that jumps to the location L2 if the flag is false, L2 then calls count.

The jmp call is an unconditional branch and terminates the program once the flag is true. This is clearly unoptimised and will add a new level to the stack each time count is called eventually leading to a stack overflow.

1
$ gcc recursion.c -S -O3 -o recursion
2
recursion_optimised.asm
1
count:
2
.LFB24:
3
.cfi_startproc
4
subq $8, %rsp
5
.cfi_def_cfa_offset 16
6
movl $.LC0, %edi
7
call puts
8
xorl %eax, %eax
9
addq $8, %rsp
10
.cfi_def_cfa_offset 8
11
ret
12
.cfi_endproc
13
.LFE24:
14
.size count, .-count
15
.section .text.startup,"ax",@progbits
16
.p2align 4,,15
17
.globl main
18
.type main, @function
19

The compiler does a much better job with the optimisations on, there are no branches as in the previous code and there are no calls to the count function from within itself. So it has successfully optimised the code to exclude tail recursion.

Closing thoughts

Whilst gcc does a good job at optimising it does not apply optimisations by default. Languages such as Java do not optimise tail recursion at all. Therefore where recusrion is critical I strongly recommend you to not rely on the compilers optimisations. Having said that, there's plenty of cases where using recursion makes your life much easier.

© 2023 Ryan Welch. All rights reserved.