How to make C memory safe

Making strcpy() safe without forking code

Oct 08, 2023

Abstract

There is a simple solution to make C memory safe (or at least, safer), without forking existing code. It just requires extending FORTIFY_SOURCE with simple macros that communicate memory-bounds information to the compiler.

Introduction

One of the most famous issues in cybersecurity is that the C programming language is not memory-safe. Many solutions have been proposed for this problem, but they all suffer from requiring a fork. If a project is converted to another programming language like D, Zig, or Rust, it will no longer compile on older systems. The same thing happens with C language extensions, such as those proposed by Microsoft.

Open-source projects typically support older systems, often 20 years old. Forking is not an option. Thus, the world’s system code remains unsafe.

A good example is OpenSSL. Microsoft used it as an example to demonstrate memory-safe extensions to C. This created a fork: the existing project used by everybody, and a Microsoft-only project used by nobody. Since a core principle of OpenSSL is that it must work on 20-year-old computers, it’s never going to adopt Microsoft’s extensions.

The solution, of course, is to extend C with memory-safety features that don’t fork the code.

Macros

The only way to do this is to wrap things in macros. An example would be something like:

void foobar(char *buf, size_t length CHK_COUNT(buf));

This alone would be a massive upgrade to the world’s system code. Most APIs already have size information in their interfaces. Today’s compilers (clang, gcc) already have the ability to automatically do buffer length checks when they know the size (using FORTIFY_SOURCE). All that’s needed is a way to tell the compiler that such information exists.

The above macro alone would solve about 50% of the memory-safety issues in C.

But of course, we want a whole family of macros. For example, sometimes we know an array count, sometimes the size of the buffer in bytes, and sometimes an end-pointer. Thus, we should have CHK_COUNT(), CHK_SIZE(), and CHK_ENDPTR() that are all logically the same.
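A minimal sketch of how such a family could be declared so that it costs nothing on compilers that know nothing about it. The empty fallback definitions and the three sum_*() functions are assumptions made up for illustration; no current compiler gives these macros any meaning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical annotations. A compiler or analyzer that understands them
 * would predefine them; everywhere else they expand to nothing, so the
 * code still builds on old toolchains. */
#ifndef CHK_COUNT
#define CHK_COUNT(ptr)   /* this parameter is the element count of ptr */
#endif
#ifndef CHK_SIZE
#define CHK_SIZE(ptr)    /* this parameter is the size of ptr in bytes */
#endif
#ifndef CHK_ENDPTR
#define CHK_ENDPTR(ptr)  /* this parameter points one past the end of ptr */
#endif

/* The same bound expressed three ways. */
long sum_count(const int *v, size_t count CHK_COUNT(v));
long sum_size(const int *v, size_t bytes CHK_SIZE(v));
long sum_endptr(const int *v, const int *end CHK_ENDPTR(v));

long sum_endptr(const int *v, const int *end CHK_ENDPTR(v))
{
    assert(v != NULL && v <= end);
    long total = 0;
    while (v < end)
        total += *v++;
    return total;
}

long sum_count(const int *v, size_t count CHK_COUNT(v))
{
    /* An element count converts directly to an end pointer. */
    return sum_endptr(v, v + count);
}

long sum_size(const int *v, size_t bytes CHK_SIZE(v))
{
    /* A byte size converts to an element count. */
    return sum_count(v, bytes / sizeof *v);
}
```

Because each form converts to the others, a compiler that understood one of them could check all three the same way.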

Another need is to track memory allocations. The gcc/clang compilers already have rough ways of doing this. They can easily be extended by this system to do it more completely, using a variety of features.

Adding these macros to the code changes it, but does not fork it. It’ll still work everywhere the old code did. It no more forks code than any other change.

Alternate APIs

Many of the world’s APIs need no change, as they already include buffer size information. This is the ideal case: we just add macros to existing function/structure definitions and leave execution fundamentally unchanged.

But in other cases, we’ll need to upgrade APIs. This presents difficulties.

The FORTIFY_SOURCE feature shows us how to fix the problem. When it knows the size of a buffer, it’ll replace unsafe functions with those where the size information is used. For example, when it knows the buffer size, it’ll automatically change strcpy() to the following function:

char * __strcpy_chk(char * dest, const char * src, size_t destlen);
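Roughly, fortified headers achieve this with a compiler builtin that reports the size of an object when it is known at compile time. The sketch below imitates that with the __builtin_object_size() builtin from gcc and clang. sketch_strcpy_chk(), SKETCH_STRCPY(), and copy_greeting() are hypothetical stand-ins, not glibc’s code. Unlike the real __strcpy_chk(), which aborts the program, the sketch returns NULL on overflow so the behavior is easy to observe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for __strcpy_chk(): copy only if src fits,
 * terminator included, in destlen bytes. */
char *sketch_strcpy_chk(char *dest, const char *src, size_t destlen)
{
    assert(dest != NULL && src != NULL);
    size_t need = strlen(src) + 1;
    if (need > destlen)
        return NULL;            /* the real function aborts here */
    memcpy(dest, src, need);
    return dest;
}

/* __builtin_object_size() yields (size_t)-1 when the compiler cannot
 * determine the size, which makes the check above always pass. */
#define SKETCH_STRCPY(dest, src) \
    sketch_strcpy_chk((dest), (src), __builtin_object_size((dest), 1))

/* The compiler can see how large buf is, so the bound can be passed
 * along without the caller writing it. */
size_t copy_greeting(void)
{
    char buf[16];
    if (SKETCH_STRCPY(buf, "hello") == NULL)
        return 0;
    return strlen(buf);
}
```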

The problem is that FORTIFY_SOURCE only does this for a few built-in functions. What we want is to extend it to any function.

This proposal suggests something like the following:

char * FOO_strcpy(char * dest, const char * src, size_t destlen
  CHK_SIZE(dest)) CHK_REPLACES(strcpy);

The CHK_REPLACES() macro tells the compiler that when it sees a call to strcpy() and knows the buffer size info, it should silently replace the unsafe call with the safer one. There could be multiple replacement functions, depending upon the buffer size information known, so the compiler picks the best one.
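As a sketch of what that rewrite might amount to, the block below defines both macros as empty (so it builds today), gives FOO_strcpy() a body, and shows a call site before and after the substitution. The overflow behavior (return NULL and leave dest empty) is an assumption, since the proposal does not specify it, and struct user and set_name() are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical annotations, empty on compilers that do not know them. */
#ifndef CHK_SIZE
#define CHK_SIZE(ptr)
#endif
#ifndef CHK_REPLACES
#define CHK_REPLACES(fn)
#endif

char *FOO_strcpy(char *dest, const char *src, size_t destlen
    CHK_SIZE(dest)) CHK_REPLACES(strcpy);

/* Assumed behavior: refuse the copy rather than overflow. */
char *FOO_strcpy(char *dest, const char *src, size_t destlen)
{
    assert(dest != NULL && src != NULL);
    size_t need = strlen(src) + 1;
    if (need > destlen) {
        if (destlen > 0)
            dest[0] = '\0';
        return NULL;
    }
    memcpy(dest, src, need);
    return dest;
}

struct user {
    char name[8];
};

/* What the programmer writes, and what stays in the source:
 *     strcpy(u->name, name);
 * What a supporting compiler would call instead, since it knows
 * sizeof u->name: */
int set_name(struct user *u, const char *name)
{
    return FOO_strcpy(u->name, name, sizeof u->name) != NULL;
}
```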

This allows us to change the functions at the leaves of our code, which is usually trivial, without having to go back and make substantive changes throughout the rest of the code. This greatly reduces the potential for introducing bugs into a code base. Old code in a project can continue to use the implicit version of the functions while new code can be written using the explicit versions.

Forking Libraries

Forking can be a problem for binaries (executables, libraries) as well as source.

The problem we have to consider is that an executable and the libraries it loads may not have been compiled with the same settings, and so may be out of sync.

Using the last section as an example, a newer program may be compiled to swap calls to strcpy() for calls to FOO_strcpy(), but the library it loads may be an older one without that function, so you’ll get an error.

There is a solution to this. A stub FOO_strcpy() that calls strcpy(), discarding the buffer size info, can be automatically compiled into executables. This can be marked as a lower priority than the library version, such that if a library supports the safer function, it’ll be used; otherwise, the less safe function will be used.
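One way such a stub might look, using the weak attribute that gcc and clang already support. This is only an illustration of the idea: a strong definition of the same symbol takes precedence over a weak one when both are present at static link time, but how a dynamic loader ranks the two depends on the platform, so the priority rule described above would likely need toolchain support.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fallback stub, assumed here to be emitted into the executable by the
 * toolchain. It has the safer function's signature but none of its
 * checking. */
__attribute__((weak))
char *FOO_strcpy(char *dest, const char *src, size_t destlen)
{
    (void)destlen;      /* bound is discarded: no safer than strcpy() */
    assert(dest != NULL && src != NULL);
    return strcpy(dest, src);
}
```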

This restricts our possible solutions somewhat. There are a number of solutions to problems that can be hidden by macros, but which would break libraries. For example, one possible solution to memory allocation is to allocate an additional (size_t) before pointers. This would only work if all the code in a project is compiled with the new settings.
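To make the allocation example concrete, here is a minimal sketch of a size header stored just before the pointer handed to the caller. The names chk_malloc(), chk_size(), and chk_free() are hypothetical. It also shows why mixed builds break: a pointer from chk_malloc() must never reach plain free(), and a pointer from plain malloc() must never reach chk_size().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored immediately before the caller's pointer. Only size is
 * used; the other members pad the header so the pointer returned to the
 * caller stays aligned for common types. */
union chk_header {
    size_t size;
    long double pad_ld;
    long long pad_ll;
    void *pad_ptr;
};

void *chk_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(union chk_header))
        return NULL;            /* header + size would overflow */
    union chk_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;               /* caller sees the bytes after the header */
}

/* Only valid for pointers that came from chk_malloc(). */
size_t chk_size(const void *p)
{
    assert(p != NULL);
    return ((const union chk_header *)p - 1)->size;
}

void chk_free(void *p)
{
    if (p != NULL)
        free((union chk_header *)p - 1);
}
```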

Refactoring with warnings

A big problem with FORTIFY_SOURCE is that it gives no feedback. It’ll try to replace strcpy() with a safer version when it can, but if it can’t, the compiler will be silent, calling the unsafe function.

This needs to change. If we add CHK_REPLACES() to a function, then we want to know every case where calling the replacement fails, so we can fix it.

Warnings need to be managed properly for refactoring, such as with the traditional warning levels. We want a set of check levels that become more and more aggressive the higher the level is set in command-line flags.

This allows programmers to solve the easiest problems first, then with each build solve increasingly difficult problems. Each release would become a little safer, rather than requiring a huge change all at once.

At the highest warning levels, we’ll want warnings even when CHK_REPLACES() succeeds. Presumably, eventually all code wants to have explicit bounds information rather than implicit calls. At that point, the unsafe functions can be completely deprecated.

Ownership

I’ve thought through the above solution only as it relates to buffer overflows. I haven’t really thought about memory allocation bugs like double-frees or use-after-frees.

Presumably, we can add macros under the same principles, things that wouldn’t demand a fork of the code, hoping to only improve the situation rather than absolutely fix it. A ton of Rust code contains the unsafe keyword, so I don’t think we need an absolute solution to be competitive in safety.

One solution would look to Rust’s “ownership” solution for inspiration.

Another solution might look to Taligent’s Pink, a 30-year-old project from Apple. It had clear delineation of functions/structures that would “adopt” (take ownership) or “orphan” (give up ownership). It’s less comprehensive than Rust’s solution, but one that was designed specifically for C.

An inelegant solution

This solution is ugly. I think everyone wants a clean, neat solution that hides buffer bounds. But this solution does the opposite: it makes C uglier and uglier.

This doesn’t matter. For one thing, it fits the C philosophy of being a systems language where such language features aren’t invisible. For another thing, the only thing that matters is getting something adopted. People won’t adopt pretty things that require a fork.

Programmers will adopt this, if supported by compilers. It costs them nothing to start decorating variables and functions with macros. If nothing else, it’s self-documenting code.

We don’t even need to upgrade compilers like gcc and clang for this to add benefit. Instead, it can be added to static analyzers. They can generate warnings about unsafe buffers which can be silenced by adding such macros that they recognize.

Rust compatibility

Both C and Rust are systems languages that should ideally coexist, so that calling from one to the other is seamless.

This can be greatly improved with such macros. We can define rules for how the APIs of one would be translated into APIs of the other, such as auto-generating code stubs that can do the translation.

Conclusion

We have a simple technique for improving C memory safety staring us in the face. We just need to extend the functionality of FORTIFY_SOURCE, using macros to communicate buffer bounds to the compiler and supply alternative functions.

This solution isn’t perfect; it’ll leave a lot of memory still unsafe. It’s not elegant, as it peppers code with ugly macros. But it achieves the most important thing: anybody can trivially upgrade their code for the future without forking off the past.

Cybersect
