I’m fascinated by this tweet because of the huge number of people who give the wrong answer.
At the same time
Synchronous both does and does not mean “happening at the same time”.
Let’s say we are together in a boat rowing. If we both start rowing “at the same time”, then our oars will foul with each other, and we’ll end up going in circles. Instead, we need to coordinate our strokes so they happen at the right time.
This is what synchronize means: not so much at the same time, but together in a coordinated way.
Consider using one of those big saws lumberjacks used to cut down a tree. While one lumberjack was pulling, the other was pushing. They weren’t doing the same thing at the same time, but their actions were synchronized.
Or consider a dance. One partner leads. Each dancer is doing something different, but their actions are synchronized. Indeed, the entire ballroom floor requires dancers to synchronize their movements to avoid running into each other. Contrast this with slam dancing in a mosh pit, where people flail around, unsynchronized.
So, dance moves are synchronized, and it doesn’t mean doing the same thing at the same time.
Serial communications
There are both an old and a new view of asynchronous communications.
Back in the 1960s we had serial lines between devices. Bits were often sent using high and low voltages. But let’s say there’s a long string of 1s, such as 111111111111, where the voltage doesn’t change. Exactly how many 1s are there?
The clocks on both ends, on the sender and receiver, weren’t very good, and communication would quickly get desynchronized. The sender might think it sent ten 1s, while the receiver might think there are only nine.
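To make the ten-versus-nine mismatch concrete, here is a minimal sketch in Python. The function name and the 10% drift figure are hypothetical, chosen only to illustrate the arithmetic, and are not taken from any real hardware.

```python
def bits_counted(bits_sent: int, sender_bit_time: float, receiver_bit_time: float) -> int:
    """How many bits the receiver counts during a run with no voltage change."""
    duration = bits_sent * sender_bit_time        # how long the line stays at one voltage
    return round(duration / receiver_bit_time)    # receiver divides by its own idea of a bit


# Assumption for illustration: the receiver's clock runs 10% slow.
print(bits_counted(10, 1.0, 1.1))  # 9
print(bits_counted(10, 1.0, 1.0))  # 10
```

With nothing on the wire marking where one bit ends and the next begins, the receiver can only divide elapsed time by its own clock, so any drift eventually turns into a miscounted bit.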
There are a number of solutions to this problem.
One solution is to use a second wire sending a clock signal. This keeps the clocks on both ends synchronized, and now you can send a billion 1s on the data wire (with no change in voltage) and know exactly how many bits were sent.
Another solution is to send bits that guarantee a voltage transition. The early RS-232 standard (still used today) would send a start bit and a stop bit, one at a high voltage, the other at a low voltage. The clocks wouldn’t drift too much in the eight bits in between.
RS-232 is called asynchronous, partly because there’s no clock signal, but partly because there’s no coordination between the sender and receiver. Consider one end connected to a teletype. A user could press a key, sending a character across the RS-232 link to a mainframe/minicomputer. From the mainframe’s point of view, this event is a surprise and interrupts whatever it was otherwise doing.
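The start/stop framing can be sketched in a few lines of Python. This assumes the common arrangement of one start bit, eight data bits sent least-significant bit first, and one stop bit, written as logical 0s and 1s. The mapping to actual voltages is left out, and the function names are hypothetical.

```python
def frame_byte(value: int) -> list[int]:
    """Wrap one byte in a start bit (0) and a stop bit (1), data bits LSB first."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    data_bits = [(value >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def unframe(bits: list[int]) -> int:
    """Check the start and stop bits, then rebuild the byte from the data bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


print(frame_byte(0xFF))           # [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(unframe(frame_byte(0x41)))  # 65
```

Because the stop bit and the next start bit are always opposite values, every character begins with a transition the receiver can use to restart its count, even when the data is all 1s.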
The Apache model
When the browser connects to a webserver, it first establishes a TCP connection. This connection starts with SYN packets, meaning synchronize. Here, the word doesn’t mean “at the same time” but “coordinated”. The main thing coordinated here is the exact sequence numbers each side will use when sending data.
At a higher level are the threads/processes within the browser and web server.
A browser creates a thread that first creates a TCP connection, then sends the HTTP requests, then waits for a response. While waiting for the response, the operating-system kernel puts that thread/process into a blocked state, meaning it puts it to sleep, and assigns the CPU to go off and execute other (non-blocked) threads.
When the Apache-style web-server receives the connection, it creates a thread/process to handle it. That thread then waits for the incoming request, meaning it’s blocked, gone to sleep, etc.
The HTTP conversation can go back and forth many times, with many sequential requests from the browser, replied to in order by the server. The threads on both ends spend most of their time blocked, waiting while the data is in transit and for the other side to respond.
Thus, in the Apache model, each TCP connection is handled by a thread/process. The operating-system is juggling them. When a network packet arrives, the operating-system figures out which thread it belongs to, and then wakes it up. That thread does some processing, sends packets going back the other direction, and goes back to sleep.
There are usually multiple connections between a web browser and a server, so there are multiple things happening at the same time, but in terms of an individual connection, things are lockstep, coordinated, between sender and receiver. Each connection is synchronized.
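The thread-per-connection model described above can be sketched roughly as follows. This is a toy in Python, not Apache’s actual code: the function names are hypothetical, the "protocol" is just an echo with a prefix, and it assumes each small request arrives in a single read.

```python
import socket
import threading


def handle_connection(conn: socket.socket) -> None:
    """One thread per connection: wait for a request, reply, and repeat, in order."""
    with conn:
        while True:
            request = conn.recv(1024)   # blocks: the kernel puts this thread to sleep
            if not request:             # an empty read means the client closed the connection
                break
            conn.sendall(b"reply to " + request)


def serve(listener: socket.socket) -> None:
    """Accept connections forever, handing each one to its own thread."""
    while True:
        conn, _addr = listener.accept()
        threading.Thread(target=handle_connection, args=(conn,), daemon=True).start()


def ask(port: int, message: bytes) -> bytes:
    """A blocking client: send a request, then stop and wait for the response."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        return client.recv(1024)


listener = socket.create_server(("127.0.0.1", 0))   # port 0: let the OS pick a free port
threading.Thread(target=serve, args=(listener,), daemon=True).start()
port = listener.getsockname()[1]
reply = ask(port, b"hello")
print(reply)
```

Every `recv()` here is a stop-and-wait point. Each side knows exactly what it expects next on its connection, which is the sense in which the connection is synchronized.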
The problem with this model is that operating-systems can only juggle about a thousand threads. After that point, the system spends more time in the kernel juggling threads than in the HTTP process. Too many connections would cause a server to lock up, spending 100% of its time in the kernel doing no useful work.
Nginx
Apache was the dominant web server through around 2010. If you were building web-sites, you were using Apache. But then as your website scaled, you complained about this performance problem, where even trying to load balance across a lot of servers didn’t seem to help.
The industry started switching to another architecture, used by nginx (engine-x). It uses a single thread (per CPU) to handle all TCP connections.
It sits in an event loop waiting for a packet to arrive on any of those TCP connections, whether there are a thousand or even a million connections. When a packet arrives, it handles that packet, does the appropriate work, then goes back to waiting for the next packet to arrive.
It’s not coordinated with any particular web browser on the other end.
In the Apache synchronous model, a thread sees a predictable sequence of events, happening one after the other. Specifically, the model is “do A first, then B”.
In the nginx asynchronous model, a thread sees a bunch of unexpected, uncoordinated events. Specifically, the model is “whenever A happens, do B”.
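The “whenever A happens, do B” loop can be sketched with Python’s standard `selectors` module. This is a toy under stated assumptions, not how nginx is actually written: the function name is hypothetical, replies are tiny so they are sent directly rather than buffered, and the loop runs on a background thread only so the demo clients can run in the same script.

```python
import selectors
import socket
import threading


def event_loop(listener: socket.socket) -> None:
    """One thread, many connections: whenever a socket has an event, handle it."""
    selector = selectors.DefaultSelector()
    listener.setblocking(False)
    selector.register(listener, selectors.EVENT_READ)
    while True:
        for key, _events in selector.select():     # sleep until any socket has an event
            sock = key.fileobj
            if sock is listener:                   # event: a new connection arrived
                conn, _addr = listener.accept()
                conn.setblocking(False)
                selector.register(conn, selectors.EVENT_READ)
                continue
            data = sock.recv(1024)                 # event: data arrived on some connection
            if data:
                sock.sendall(b"reply to " + data)  # a real server would buffer large replies
            else:                                  # event: the peer closed the connection
                selector.unregister(sock)
                sock.close()


listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick a free port
threading.Thread(target=event_loop, args=(listener,), daemon=True).start()
port = listener.getsockname()[1]

# Three clients are connected at once and speak in reverse order.
clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(3)]
for i in reversed(range(3)):
    clients[i].sendall(b"client %d" % i)
replies = [client.recv(1024) for client in clients]
for client in clients:
    client.close()
print(replies)
```

The loop never waits on any one connection. It has no idea which client will speak next, so it simply reacts to whichever event the kernel reports.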
JavaScript and NodeJS
This “stop and wait for the other side” synchronization model is easy for programmers but hard on computers, so this nginx server technique soon migrated to the web browser.
This is called “async” programming in JavaScript. Among other things, it can spew out multiple requests to servers without caring about the order things come back. Execution doesn’t stop-and-wait.
This first became popular in browsers, as a web-page was built from multiple requests to a web-server, with constantly live updates as new things became available. Consider Gmail as an example. As a new email arrives at Gmail, the web-mail app in the browser is updated to show the new email. This is an asynchronous event, not really coordinated with anything.
JavaScript was long popular in browsers, but eventually some goofballs started running it on servers as well. The most popular way of doing this is NodeJS. From the beginning, NodeJS was based upon the same async architecture as nginx.
One of the problems with NodeJS is that it has to read files. Each time it reads data, the hard drive has to spend many milliseconds retrieving it. If NodeJS had to stop and wait each time, it’d be slow. Therefore, it has an asynchronous way of reading files.
Specifically, the fs.readFile() API starts a file read request, but returns immediately, before the data is read. One of the parameters is a callback function that’ll be called, asynchronously, when the file has been read.
But not all NodeJS scripts want this, so there is an fs.readFileSync() function that stops the entire script, waiting for the response.
The synchronous function is synchronized with the hard-drive.
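The two calling styles can be imitated in Python to show the difference. This is an analogy, not the NodeJS API: `read_file` and `read_file_sync` are hypothetical names, and a helper thread stands in for whatever mechanism performs the read in the background. Unlike NodeJS, the callback here runs on that helper thread.

```python
import os
import tempfile
import threading
from typing import Callable


def read_file_sync(path: str) -> bytes:
    """Stop and wait: the caller is blocked until the data is available."""
    with open(path, "rb") as f:
        return f.read()


def read_file(path: str, callback: Callable[[bytes], None]) -> threading.Thread:
    """Start the read and return immediately; callback runs once the data is read."""
    def worker() -> None:
        callback(read_file_sync(path))

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


with tempfile.TemporaryDirectory() as folder:
    paths = []
    for name in ("a.txt", "b.txt", "c.txt"):
        path = os.path.join(folder, name)
        with open(path, "wb") as f:
            f.write(name.encode())
        paths.append(path)

    results: list[bytes] = []
    pending = [read_file(p, results.append) for p in paths]  # three reads in flight at once
    for thread in pending:
        thread.join()                     # only so the demo can print before exiting
    print(sorted(results))                # [b'a.txt', b'b.txt', b'c.txt']
    print(read_file_sync(paths[0]))       # b'a.txt'
```

The async version can have many reads outstanding precisely because the caller is not coordinated with any of them. The sync version handles one at a time because the caller waits in step with each read.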
What the original question was probably asking is why you can read many things from the hard-drive using the async API, but only one thing at a time using the sync API. Don’t they mean opposite things?
The answer is that sync doesn’t mean “at the same time” but “coordinated”. The async API isn’t coordinated with the hard-drive; the sync operation is.
Conclusion
If you look up the word “synchronize” in dictionaries like the OED or Merriam-Webster, they overwhelmingly refer to things happening “at the same time”, matching the original question above.
But they are missing the context that it’s not really about the same time, but more that they are coordinated. The Wiktionary definition gets this right.
It’s that connotation that’s important here, the coordination of two entities, such as both sides of a network connection or with a hard drive.
What I find fascinating here is the intractability of the problem. A person who actually thinks deeply about this thing, which they’ve probably been using for more than a decade, who researches manuals and dictionaries, will find the resolution to the problem elusive. Asking on Twitter doesn’t really help, because you get a lot of well meaning folks who tell you what the answer might be, rather than what it really is.
This post, by the way, gives the correct answer. I’m not just guessing, I know what it is.