For most C programmers, the printf function is something they learned about on their first day of C coding, filed under "mind-boggling but harmless," and stopped thinking about.
A fairly common printf bug is a format string whose specifiers
do not match the arguments: for instance, a format string of "%s %s"
with only one character pointer argument. printf then fetches a
second pointer that was never passed, so it either prints garbage
or crashes the program. For a long time, this sort of problem was
considered harmless, at least in a security context.
It turns out that this is not harmless at all. Consider a setuid program that takes a file name as an argument, and, if an error occurs, does something like this:
    char buffer[1024];

    snprintf(buffer, sizeof(buffer), "%s: %s",
             filename, strerror(errno));
    /* buffer now contains user input and is used as a format string */
    fprintf(stderr, buffer);
    syslog(LOG_NOTICE, buffer);
Now assume a user invokes this program with a file name of "%s%s%s"
and causes it to display the error message. First, the snprintf
call puts %s%s%s: followed by the error message into
buffer. Then fprintf treats buffer
as a format string, interprets the next three words on the
stack as character pointers, and tries to print the strings they
point to. Result: garbage is displayed, or the program crashes.
So far, this behavior still fits
the mind-boggling but harmless category.
However, if you read the printf manual page carefully, you
will notice the %n format specifier, which interprets the next argument
as an integer pointer and stores the number of characters output so far in the
location indicated by the pointer. An attacker who controls the format
string can thus make printf write to memory, not just read from it.
The exact mechanics of a format string attack are, well, mind-boggling,
and well beyond the scope of this book. If you're interested in the details,
please refer to the collection of URLs given at the end of this chapter.
Before we discuss the consequences of format bugs, let's define exactly what a format bug is, and what it isn't. If you pass a constant format string to, say, printf, and the format specifiers it contains do not match the list of arguments, that's a format bug, but usually not a security-relevant one, and we're not going to bother with it here. The real format bugs occur when you pass something as (part of) a format string that comes from outside the control of your program, including any kind of user input (such as command line arguments or environment variables) and information obtained from the network (e.g. DNS hostnames). If a potential attacker is able to modify a format string, then it's a format string bug. And here's why this is so bad:
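First, an attacker can overwrite arbitrary memory locations, storing
an integer or a short integer (using %n and %hn,
respectively). Unlike a buffer overflow, which overwrites everything
between the buffer and the return address, a format string attack can
modify the return address alone. This defeats protection mechanisms
such as Stackguard or libsafe, which are designed to catch writes that
run past the end of a buffer.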
In addition, this attack is not confined to stack locations. An attacker can target other, very promising locations such as relocation information. For instance, if a binary in ELF format wants to call a function from a shared library, it cannot jump to that address directly because there's no guarantee that the library always appears at the same address in the process's virtual memory. Instead, it uses an array of pointers to these functions. By manipulating these pointers, an attacker can redirect a totally innocent function call like exit(1) to a different, malicious function. You may find yourself asking: why modify arcane stuff such as relocation table function pointers instead of the program code itself? The reason is that on most operating systems, the memory segment containing the program code is write-protected, and any attempt to modify the code will simply result in a crash.
Format bugs are quite commonplace, unfortunately. They have been discovered in network services like wu-ftpd, for instance.
Fortunately, format bugs are relatively easy to find. The standard C library has just a small number of functions that take format strings as arguments: printf, its variants, and syslog. You can find all calls to these functions simply by running grep on the source code. You can further automate the process with a small Perl script that weeds out calls that pass string constants as format arguments, which are harmless.
Fixing format bugs is usually simple: just replace calls such as
printf(msg) with printf("%s", msg), and you're safe.
However, quite often you have functions of your own that take a format string and a variable number of arguments, and are supposed to print the result, e.g. to a log file. In this case, you do of course want to pass the format string given by the caller on to printf. What you can do here is grep your source code for calls to your logging function, and make sure you never pass user-supplied data as the format string argument.