Some notes on Sockets programming
At some point in a college class on “networking” they will teach you how to do “Sockets” programming. There are many adequate guides to this, and you can ask AIs to explain concepts to you.
But the guides I’ve seen have problems. I don’t want to write my own guide from scratch, so in this blogpost, I’m just writing up some notes about those problems. This is a work in progress.
To work along with this, I’ve created a simple GitHub project with some programs demonstrating some of the concerns here. It’s also a work in progress.
Signals
You can’t port scan industrial networks, because things crash. A major reason is that the programs receive a SIGPIPE signal.
Unix sockets are file descriptors and hence inherit Unix file properties. One of those properties is that when you write to a “pipe” whose other end has disappeared, your program gets a SIGPIPE signal. If you don’t handle the signal, your program crashes. You can trigger this locally with a command like “cat /dev/random | myprog”, where myprog deliberately crashes a few milliseconds after starting. The cat program will report a pipe error.
The same happens on the network. If you are writing bytes to a connected socket and the opposite end disappears, you’ll get a SIGPIPE signal.
It happens so rarely that you’ll likely never see it. The events have to happen at precisely the right time. Your program has to give the kernel some bytes to send on the connection, but before they are sent, the other end has to terminate the connection with a RST packet.
These conditions happen with port scanners like nmap and masscan. They ask for a response from the server, but often close their end while the server is still responding, crashing the industrial-control device.
The upshot is that you always have to do something to handle the signal. This is most easily done by adding a signal handler for it, as demonstrated here in my sample.
Most texts on sockets programming don’t mention it, and the few that do, mention it only in passing, as if this were an optional thing that you can usually ignore. You can’t. You must always deal with it.
getaddrinfo()
The target for a client has an address and a port number. This is usually specified on the command line, in a config file, or somehow provided by the user. The program needs to parse this, and if it’s not a numeric address, do a DNS lookup on the name.
The correct way to do this is with getaddrinfo(). It’s unlikely that the sockets programming guide will tell you this.
The correct way to call the function is demonstrated here in my tcp-client.c sample program. The correct way to retrieve the results is shown here.
Among its features is that it parses IPv4 and IPv6 addresses automatically. In other words, your code will be agnostic to which stack is used. At no point in tcp-client.c is there anything that refers to either IPv6 or IPv4, yet both are handled. This includes printing the address, shown here with getnameinfo().
When using getaddrinfo(), you’ll also use getnameinfo() to format the numeric address, and gai_strerror() to format errors into useful text messages. You should consider all three of these functions together as a single unit. In particular, since any reasonable software is going to include extensive logging of what’s going on, you’ll find yourself using getnameinfo() a lot.
This function getaddrinfo() comes from the transition to IPv6, but most books that taught your professor how to do this thing come from IPv4 days. Therefore, your textbook/professor learned something like gethostbyname() or inet_aton(). When they use getaddrinfo(), they often don’t use it in a truly IPv4/IPv6 agnostic manner.
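To make the above concrete, here’s a minimal sketch of an agnostic TCP connect in the style of my tcp-client.c (the function name tcp_connect() is just my label for this sketch, not something from the sample). Notice that nothing in it mentions IPv4 or IPv6: getaddrinfo() resolves the name, the code loops over whatever addresses come back, and getnameinfo() formats the result for logging:

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Resolve host/port and connect to the first address that works.
 * Works identically for IPv4 and IPv6; returns a socket or -1. */
static int tcp_connect(const char *host, const char *port) {
    struct addrinfo hints, *list, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* either IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    int err = getaddrinfo(host, port, &hints, &list);
    if (err) {
        /* gai_strerror() turns the error code into readable text */
        fprintf(stderr, "%s: %s\n", host, gai_strerror(err));
        return -1;
    }
    for (ai = list; ai; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            char addr[256];
            /* getnameinfo() formats the address for logging, again
             * without caring which IP version it is */
            if (getnameinfo(ai->ai_addr, ai->ai_addrlen, addr, sizeof(addr),
                            NULL, 0, NI_NUMERICHOST) == 0)
                fprintf(stderr, "connected to %s\n", addr);
            break;
        }
        close(fd);
        fd = -1;
    }
    freeaddrinfo(list);
    return fd;
}

int main(void) {
    int fd = tcp_connect("localhost", "80");
    if (fd == -1)
        puts("connect failed (no server listening, perhaps)");
    else
        close(fd);
    return 0;
}
```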
Byte-order, like htons()
A classic problem is the fact that the order (of multibyte integers) on the network wire may be represented differently than within the CPU.
This is rarely a difficult problem except for C programmers. That language allows a toxic and broken idiom of casting external data structures onto internal memory.
Since the Berkeley Sockets API was written by C programmers, this sometimes appears in the structures they present, like sockaddr_in. In such cases, you need to use functions like htons() to re-order these integers.
It’s really dumb, no other language does this crap, and if you are confused by it, it’s because you are smart and sane.
What you see in tcp-client.c and the other sample programs is that if you do things correctly (such as using getaddrinfo()), then you don’t need to mess around with this. The code treats these data structures as opaque objects, so you don’t need to worry about it.
In short, if you are using functions like htons() or ntohl() in your C code, you are doing something confused and messy that can be done more cleanly and elegantly another way.
There are other needs for such byte-order conversions, outside the sockets API. The correct way to do them is something like:
num = buf[0]*256 + buf[1];
This works in most languages, including C. The thing you want to avoid in C/C++ is anything that looks like the following:
num = ntohs(*(short*)buf);
DNS lookups
While getaddrinfo() gets an IP address given a name, it can’t be used for other DNS queries, such as looking up the MX record.
There’s a “DNS resolver” library that comes along with Sockets in all the systems I’ve looked at (Linux, Windows, macOS, BSD, QNX) that can do this. You call res_init() followed by res_query() or res_search().
I’ve written my own simple version of the dig program that does this, showing how to call res_search(). However, instead of using the built-in way of parsing the response, my program does its own custom parsing.
Since this is as universally available as Sockets on all platforms (including Windows), it should be taught as part of the Sockets API.
Windows sockets
Sockets on Windows has a bizarre history, but the upshot is that most things work. You can write your code targeting macOS and Linux, and then the process of also making it work on Windows is straightforward.
The biggest problem is dealing with POSIX descriptors vs. Windows handles.
Windows has the same basic architecture as Linux, with kernel resources tracked as small integers that index into a table. Unfortunately, whereas Unix descriptors start at zero, Windows handles start around 20,000.
But the principle still works: a socket is also a file handle, so you can use Windows APIs like WriteFile() in order to send data across a TCP connection.
However, Windows maintains a separate table of POSIX descriptors for POSIX functions like open() and write(). You can still use those functions, but you need to translate from the Windows handle to the POSIX descriptor with a function like _open_osfhandle().
Several of my sample programs can be compiled on Windows so you can use them as examples, though for simplicity’s sake, I write most of them just for Linux and macOS.
send()
This function might not send as much of the data as you requested. If you send 100 bytes, it might return 42, meaning only 42 bytes were transmitted, and you’ll have to try again for the remaining 58 bytes. This happens under high load, when the kernel no longer has space to buffer the data on behalf of the app.
Thus, you must check the return value, verify the number of bytes sent is the number requested, and handle the case when not all bytes have been sent.
You can see this with a packet-sniffer when monitoring email traffic. BASE64 encoding of attachments are nicely regular with lines of 72 characters each. When you see lines that are longer or shorter than that, you know there’s a chunk missing due to a send() failure to transmit all the bytes.
recv()
Receiving bytes has the same problem: if you ask to receive 100 bytes, and the other end sends only 42, the function will return with 42. Technically, it might return just 1 byte at a time.
Conversely, it may return more than you want. For example, many protocols are text-based, where the receiver assumes a line of text at a time, terminated by a line-feed (‘\n’). They may request 100 bytes knowing that the line of text will be shorter. But in fact, even though the protocol specifies only a single line at a time, the other side may send multiple lines at once.
For example, even though the protocol expects one request line at a time, a single recv() from a web client might return two requests at once:
GET / HTTP/1.0
GET / HTTP/1.0
This sort of thing has been widely exploited in the past to do things that crash the server or bypass user authentication.
Resource limits
If your program creates enough connections (enough socket descriptors), it’s going to run into resource limits. On Linux, the default per-process soft limit on descriptors is only about a thousand (traditionally 1024).
There are APIs that you can use to discover the limits of a program, such as how many sockets/descriptors can be opened simultaneously.
These APIs are essentially a non-optional part of the Sockets API. If you create a server that accepts incoming connections, eventually somebody is going to create all the connections and cause it to fail. It could be accidental, such as those who run port-scanners on industrial control networks, or intentional, such as an attack on a service made public on the Internet.
select() and poll()
There are two ways to handle multiple connections. One way is to fork (fork()) worker threads or processes, assigning a thread/process to each TCP connection. Another way is by polling using either the select() or poll() system calls to see if something new has arrived.
The descriptions of these functions aren’t very good in the textbooks I’ve seen. Among the problems is that you also want to test for sendability. As described above about send(), when kernel buffer space is exhausted, you can’t send data. Therefore, you also need to use select()/poll() to test for that. I’ve got some sample projects in the GitHub project to show them better, but they aren’t complete yet.
Scalability
The Sockets API is only the base infrastructure that you won’t actually use in the real world. That’s because while it works for a small amount of traffic, it quickly falls down for a lot of traffic.
For example, take the model where you fork worker threads/processes to handle each TCP connection. You are now using the kernel’s thread scheduling mechanism as a packet dispatcher. All the processes are blocked on the recv() function call, a packet arrives, and the kernel needs to wake up and unblock the one thread to which that packet belongs. Not only is this inefficient in general, it doesn’t scale well. In other words, it’s not an O(n) process, but more like O(n^2).
The select/poll function calls are better, but not by much. Their cost also grows more like O(n^2) as traffic increases.
Back 20 years ago, this was called the “c10k” problem.
The solution is epoll (on Linux), kqueue (on macOS and BSDs), and completion ports (on Windows and Solaris). They efficiently dispatch packets to waiting listeners with something like O(log n) scaling.
Since it’s different for every platform, most code uses libraries, like libuv, to take care of this for them.
Thus, when writing any practical server, you might want to focus on libuv APIs instead of Socket APIs. However, you really need to understand what Socket APIs are doing under the surface to really understand what’s going on.
Conclusion
The typical documents show how to get started with Sockets programming, but there are a lot of important features they gloss over. I’ve been working on documenting these things, but I’ve only just started.