Yes, the OSI Model was based on IBM mainframes
So very mainframey
ESR, a famous nerd, disagrees with my claim that the “OSI Model” was based upon IBM “mainframes”. It was, but it’s complicated.
ESR’s version of history is right—that the OSI Model was driven by European academics opposed to American DoD researchers. But there’s another view of this history, where these academics were still influenced by Big Government, Big Telcos, and Big Mainframes.
In this post, I describe how they are related. I explain how the idea of a “model” comes from mainframe networks, how the lower layers were based directly on the mainframe network stack, and how the upper layers were inspired by mainframe thinking.
What’s a Mainframe?
The “mainframe” comes from the era where computers were so expensive that an organization could only afford one of them. Such early computers didn’t even have the typical user interface of a keyboard and screen. They were just buttons and panels of “blinking lights” that were programmed with punch cards.
As Moore’s Law turned, there was a fork in technology. One fork led to the democratization of computing, with more and more power on a user’s desktop. But another fork led to smarter devices attached to the mainframe, with the mainframe still in control.
In this fork of technology, users had simple “terminals” that merely displayed content rather than running code. The app’s code still just ran on the central mainframe.
IBM was the biggest supplier of mainframe computers. In the 1970s, IBM accounted for about half of the entire computer industry. American researchers and companies like Intel and Xerox often lived in a separate non-IBM world, which is why we have non-mainframe technology today. But much of the rest of the world still lived in the grasp of IBM.
Where OSI Comes From
The lead author of the OSI Model was Hubert Zimmermann, a French researcher who helped develop the CYCLADES network, from which the Internet gets its “end-to-end” principle.
But the Europeans in turn were building standards from the top down. At the top were the big national telephone companies and big industrial firms, with “stakeholders” and “committees”. As such, the original OSI Model was based upon industry more than academia. (The Internet was built bottom-up — nerds just built things, whatever worked became the standard).
The original 7 layers were contributed by Charles Bachman, an engineer working for Honeywell, one of IBM’s mainframe competitors. Honeywell was building a network stack modeled on IBM’s SNA.
Hence, the original OSI layers matched very closely with IBM’s “SNA” network layers, especially the bottom 3 layers. The easiest way of understanding what the OSI standards actually mean is to read documentation from the 1970s about how SNA actually worked. You really aren’t going to understand what the “Session Layer” really intends unless you learn about SNA’s “Data Flow Control (DFC)” functionality.
The important thing about IBM’s mainframe network stack was that it was essentially a single product. Each layer specified a piece of the larger product.
That’s where the OSI Model goes wrong. It sees the entire stack as having a fixed number of layers, where each layer has a different purpose.
That’s not how things work on the Internet. Layering happens, but in an ad hoc fashion. In RFC 791, the [Internet Protocol] runs over some sort of local link or local network, but the local details don’t matter. It might be Ethernet, or it might be pigeons. The local network may itself have sublayers, but that fact is opaque to the Internet as a whole.
The RFC 791 model is really only two sublayers: the Internet Protocol layer and the transport layer. Everything above that is just some sort of payload. What runs on top is as opaque as whatever links things below.
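Here’s a rough Python sketch of that two-layer view: an IPv4 header wrapped around an opaque payload, ready to be handed to whatever link happens to be underneath. The addresses and protocol number are just placeholders I made up for illustration.

```python
# A minimal sketch of the RFC 791 view: an IP datagram treats whatever is
# above it as an opaque payload, and whatever carries it below is someone
# else's problem. Field values here are illustrative, not a real packet.
import struct
import socket

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, per RFC 791."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_ipv4(src: str, dst: str, payload: bytes, proto: int = 17) -> bytes:
    """Wrap an opaque payload in an IPv4 header (protocol 17 = UDP by default)."""
    version_ihl = (4 << 4) | 5                # IPv4, 20-byte header
    total_len = 20 + len(payload)
    header = struct.pack("!BBHHHBBH4s4s",
                         version_ihl, 0, total_len,
                         0, 0,                # identification, flags/fragment
                         64, proto, 0,        # TTL, protocol, checksum placeholder
                         socket.inet_aton(src), socket.inet_aton(dst))
    checksum = ipv4_checksum(header)
    header = header[:10] + struct.pack("!H", checksum) + header[12:]
    return header + payload   # the link below (Ethernet, pigeons...) never looks inside

datagram = build_ipv4("192.0.2.1", "192.0.2.2", b"opaque application bytes")
print(len(datagram), "bytes, ready for any link layer")
```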
The idea of fixed vs. ad hoc layers led to much debate about SSL/TLS. People wanted to make SSL fit the fixed OSI Model, which really had no place for it. People struggled to accept that this was just an ad hoc layer, that it used transport below, providing encryption to the payload above.
The point is that the idea of fixed layers with assigned functionality comes from the IBM mainframe world. It is largely foreign to the Internet standards.
Layer #2 and SDLC
IBM mainframes inspired the fixed-function model as a whole, but also defined several of the layers specifically.
The OSI “Data Link” layer comes directly from IBM’s “SDLC,” which stands for “Synchronous Data Link Control”. It’s right there in the names, “Data Link”.
In the beginning (as far back as the 1800s), all you had was a “link”—a wire connecting two points. (There were also multidrop wires, but forget about that for the moment.)
Such links could have “dumb” devices on either end, devices without a CPU or even transistors. The early teletypes were just that sort of dumb device.
The early RS232 serial link standard had 25 pins. Only 3 wires were strictly necessary: transmit, receive, and a common ground. Other pins were used for control. For example, if the sender was transmitting faster than the receiver could keep up, the receiver would send current down a dedicated pin to tell the sender to pause. This could be handled with mechanical solenoids rather than needing software.
The invention of the 8-bit microprocessor changed things. Suddenly, it became practical to put smart software on either end of a link. Link “protocols” changed from being separate wires to data sent back and forth in packets. If the sender was transmitting too fast, the receiver would send a message in the other direction.
This was the birth of IBM’s “SDLC.” It was a packet protocol for serial links. Among its features was adding checksums and serial numbers to packets so that if they got corrupted, they could be resent. This was a big problem with the low-tech cables and connectors of the time, where electronic noise would corrupt packets.
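Here’s a toy Python sketch of the idea SDLC introduced: frames carry a sequence number and a checksum so corruption can be detected and the frame resent. (Real SDLC/HDLC uses a 16-bit CRC and bit-level framing; I’m using CRC-32 from the standard library just to show the principle.)

```python
# Toy framing: a sequence number and a checksum, so a corrupted frame can be
# detected and retransmitted.
import struct
import zlib

def frame(seq: int, payload: bytes) -> bytes:
    body = struct.pack("!B", seq & 0xFF) + payload
    return body + struct.pack("!I", zlib.crc32(body))

def unframe(data: bytes):
    body, crc = data[:-4], struct.unpack("!I", data[-4:])[0]
    if zlib.crc32(body) != crc:
        return None            # corrupted on the wire: ask the sender to retransmit
    return body[0], body[1:]   # (sequence number, payload)

f = frame(7, b"hello mainframe")
print(unframe(f))                          # (7, b'hello mainframe')
print(unframe(f[:5] + b"\x00" + f[6:]))    # None -- noise flipped a byte
```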
IBM’s SDLC quickly inspired standardization. One standards effort produced a close variant called “HDLC”. Another variant, called “LAPB”, was used in the telco X.25 standards.
The original Ethernet had no such thing. When Ethernet was standardized and pigeonholed into Layer #2, they needed to add something compatible with SDLC. This was called “LLC”, standardized as IEEE 802.2; it runs as a sublayer on top of the Ethernet “MAC” sublayer, both part of Layer #2.
You don’t see LLC today because it was never actually needed. It exists only to make the non-mainframe Ethernet conform to the mainframe model.
By the time LLC was created, it wasn’t really needed. Cable technology had reached the point where it would reliably transmit packets without corruption, and networks had reached the point where you wanted to retransmit lost packets “end-to-end” anyway, meaning not simply across the local link, but between the remote ends across the Internet.
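To see what LLC actually adds, here’s a rough sketch contrasting a plain “Ethernet II” frame (just an EtherType after the addresses) with an 802.3 frame carrying an 802.2 LLC header. The addresses and SAP values are illustrative only.

```python
# Ethernet II framing just puts an EtherType after the MAC addresses, while
# IEEE 802.3 + 802.2 inserts an LLC header (DSAP, SSAP, control) so the frame
# looks more like SDLC/HDLC.
import struct

DST = bytes.fromhex("ffffffffffff")
SRC = bytes.fromhex("020000000001")

def ethernet_ii(payload: bytes, ethertype: int = 0x0800) -> bytes:
    # DIX/Ethernet II: the type field says what the payload is (0x0800 = IPv4)
    return DST + SRC + struct.pack("!H", ethertype) + payload

def ieee_802_3_llc(payload: bytes, dsap: int = 0x04, ssap: int = 0x04) -> bytes:
    # 802.3: a length field instead of a type, then an 802.2 LLC header.
    # SAP 0x04 was historically used for SNA; control 0x03 = unnumbered info.
    llc = struct.pack("!BBB", dsap, ssap, 0x03)
    return DST + SRC + struct.pack("!H", len(llc) + len(payload)) + llc + payload

print(len(ethernet_ii(b"IP datagram here")))     # no LLC needed on the Internet
print(len(ieee_802_3_llc(b"SNA traffic here")))  # LLC added to look like SDLC
```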
Network Layer #3 Was Connection-Oriented
Hubert Zimmermann helped design CYCLADES, the early French network that influenced the design of the TCP/IP Internet. But the OSI Network Layer #3 did not work like CYCLADES—it worked like IBM’s SNA and telco X.25. It was “connection-oriented,” not “connectionless.”
Let me explain the difference.
The telephone system works according to connections or circuits. When you make a phone call using a traditional wired phone, you establish a virtual circuit consisting of a 64-kbps stream of bits flowing in each direction.
This is the “T carrier” system from the 1960s when the phone system was made digital using the newly invented transistor. Smaller streams are combined into larger streams, like the 1.544-Mbps T1 line and 45-Mbps T3 line. The telephone switch would then forward streams, so multiple incoming streams on one line may then be split up to flow out of different lines—each stream flowing toward its own destination.
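The arithmetic behind those numbers, if you want to check it yourself: a T1 multiplexes 24 of those 64-kbps voice channels plus 8 kbps of framing, and a T3 bundles 28 T1s plus its own overhead.

```python
# T-carrier arithmetic: where the 1.544 and roughly "45 Mbps" figures come from.
voice_channel_kbps = 64
t1_kbps = 24 * voice_channel_kbps + 8     # 1544 kbps = 1.544 Mbps
t3_kbps = 28 * t1_kbps + 1504             # 44736 kbps = 44.736 Mbps (framing/stuffing overhead)
print(t1_kbps, t3_kbps)
```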
When you dial the phone, every switch in between the caller and callee is contacted, and a 64-kbps stream is reserved. If there is congestion, the caller will instead hear the error message “no circuits are available” and the call won’t go through. You often hear this on New Year’s Eve when calling to wish family and friends well, or after a natural disaster when trying to reach loved ones in the area.
Once a call succeeds, congestion won’t affect it after that point. You’ve got a 64-kbps stream until the call ends.
Such streams of bits are inefficient for computer networks because they are flowing all the time, even when the computers have nothing to transmit. Computer data is bursty, sending a lot for a short period of time, but silent the rest of the time.
Computers want to send data in packets rather than in streams.
To handle this, the major telephone companies (the “telcos”) developed the X.25 packet switching standard. Your computer would then request the packet-equivalent of a virtual circuit. Each packet switch between the source and destination would be contacted, and a “connection” established.
One of the properties of such connections would be to require a minimum transfer rate, such as 1 Mbps. As long as you transmitted less than that minimum, your packets were guaranteed to go through the network. You could transmit faster, but those packets would be delivered “best effort.” When there’s congestion, such “best effort” packets could be dropped.
Likewise, when you used less than your guaranteed bandwidth, the switches in between would be using that opportunity to forward best-effort packets for other users.
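If you want a concrete picture, here’s a toy token-bucket sketch of how a switch might classify traffic against a committed rate. Everything about it is illustrative, not an actual X.25 implementation.

```python
# Traffic under the subscribed rate is marked guaranteed; anything above it is
# marked best-effort and is the first to be dropped under congestion.
class CommittedRate:
    def __init__(self, rate_bytes_per_sec: float, burst_bytes: float):
        self.rate = rate_bytes_per_sec
        self.tokens = self.burst = burst_bytes
        self.last = 0.0

    def classify(self, now: float, packet_bytes: int) -> str:
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return "guaranteed"
        return "best-effort"

link = CommittedRate(rate_bytes_per_sec=125_000, burst_bytes=10_000)  # ~1 Mbps committed
print(link.classify(0.00, 8_000))   # guaranteed
print(link.classify(0.01, 8_000))   # best-effort -- over the committed rate for now
```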
Thus, the packet-switched X.25 network was much more efficient, and therefore cheaper, for computer users.
The Internet is a packet-switched network as well, but it’s only “best effort”. You don’t establish a connection through the network ahead of time; you simply send the packet. You can’t reserve a minimum amount of bandwidth. If there’s congestion somewhere in the network, that packet will be lost—”dropped” by the router that’s unable to forward it out a congested link. Each packet finds its own way through the network, so when you send two back-to-back packets, they may follow different paths and arrive in the wrong order.
The Internet is therefore a connectionless network.
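From the application’s point of view, “connectionless” looks like this: with a UDP socket you just hand the network a datagram and hope. (The address below is a documentation address, not a real service.)

```python
# No call setup at all: build a datagram socket and send.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"best-effort datagram, no circuit reserved", ("192.0.2.1", 9999))
# No switch along the way reserved anything; if a router's outbound link is
# congested, this packet is simply dropped and nobody tells us.
sock.close()
```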
Both mainframe and telco X.25 networks demanded the “reliability” of a connection-oriented network, so the OSI Network Layer #3 specified it. The only option was a connection-oriented network. This is what Hubert Zimmermann put in the standard even though he himself helped develop CYCLADES, which had a connectionless network.
The standard was quickly amended to allow either a connectionless or connection-oriented network layer, so this is probably of little historic significance. The point is only that the mainframe ideals came first.
The other thing you need to know is that Ethernet was a Layer #3 protocol, a “Network Layer” protocol.
The Network Layer #3 is defined where a relay receives packets from one link, examines the destination address, then forwards that packet out the correct link in that direction.
This describes an Internet router (or X.25 switch). It also describes an Ethernet switch, because an Ethernet switch is a Layer #3 device.
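Here’s a minimal sketch of that relay function: look at the destination, consult a table, pick the outgoing link. The same loop fits an IP router, an X.25 switch, or an Ethernet switch keying on MAC addresses. The table entries are made up.

```python
# Layer #3 relaying in miniature: destination lookup, longest-prefix match,
# forward out the chosen link.
import ipaddress

forwarding_table = [
    (ipaddress.ip_network("10.0.0.0/8"),   "link-A"),
    (ipaddress.ip_network("192.0.2.0/24"), "link-B"),
    (ipaddress.ip_network("0.0.0.0/0"),    "uplink"),   # default route
]

def forward(dst: str) -> str:
    addr = ipaddress.ip_address(dst)
    # longest-prefix match: the most specific matching network wins
    matches = [(net, link) for net, link in forwarding_table if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(forward("192.0.2.55"))   # link-B
print(forward("203.0.113.9"))  # uplink (default)
```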
When Andrew Tanenbaum published his first “Computer Networks” textbook based upon the OSI Model in 1980, he clearly put Ethernet at Layer #3, because it was.
However, if you Google or ask the AI today, everyone will tell you that Ethernet is Layer #2, part of the “Data Link Layer.”
As mentioned above, it didn’t actually conform to the OSI definition. They had to add LLC to make Ethernet look more like SDLC/HDLC/LAPB, so that it could then carry SNA and X.25 traffic.
In other words, once they decided upon a rigid model with fixed functionality at each layer, they had to make Ethernet conform to its assigned layer, and added LLC to it.
The real model of today’s Internet is that one network may be layered on another. That might mean putting Internet packets inside Ethernet packets inside your home and office, which is common on the edges of the network.
On backbones, we see something else, like a system called MPLS to carry Internet traffic. MPLS is a network technology that itself may be layered on top of Ethernet, giving three layers of networks.
Internet traffic can be tunneled through VPNs, which layers the Internet on Internet, adding yet more ad hoc layers.
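A rough sketch of that ad hoc layering, where every layer just treats the one above it as opaque payload and you stack as many as the situation calls for (the names are purely illustrative):

```python
# Each layer wraps the one above it; depth is whatever the path demands.
def encapsulate(protocol: str, payload) -> dict:
    return {"proto": protocol, "payload": payload}

# At the edge: IP inside Ethernet.
edge = encapsulate("ethernet", encapsulate("ip", b"tcp segment"))

# On a backbone: the same IP packet inside MPLS inside Ethernet.
backbone = encapsulate("ethernet", encapsulate("mpls", encapsulate("ip", b"tcp segment")))

# Through a VPN: IP tunneled inside encrypted IP -- Internet layered on Internet.
vpn = encapsulate("ethernet",
                  encapsulate("ip",
                              encapsulate("vpn-encrypted",
                                          encapsulate("ip", b"tcp segment"))))

def depth(frame) -> int:
    return 1 + depth(frame["payload"]) if isinstance(frame, dict) else 0

print(depth(edge), depth(backbone), depth(vpn))   # 2 3 4
```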
The point is that OSI envisioned a fixed, connection-oriented Network Layer #3 that would need something like an SDLC-like Data Link Layer #2 beneath it. It was designed this way because that’s how IBM’s mainframe network worked.
But today’s Internet doesn’t work that way. The Internet Protocol can be encapsulated in anything, maybe within local Ethernet networks (with no LLC), maybe carried by pigeons.
Upper Three Layers (#5, #6, #7)
Back in the late 1970s, the lower 3 (or 4) layers existed as practical, real things that engineers could touch. Hubert Zimmermann had them in the CYCLADES. Xerox had them in its highly influential PUP and XNS standards. They were visible in the emerging TCP/IP standards of the future Internet.
The upper three layers were more theoretical. They roughly existed in IBM’s mainframe networks.
You can’t understand the OSI Session Layer #5 without looking at IBM’s mainframe network stack. The most prominent feature was the fact that some mainframe communications can be “half-duplex,” meaning only one side can transmit at a time. Another prominent feature is that transactions can be “batched” so that they all succeed or fail together, instead of some operations succeeding while later ones fail.
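Here’s a toy model of that half-duplex idea, loosely inspired by SNA’s Data Flow Control: only the side holding the “turn” may send, and it has to hand the turn over explicitly. This is an illustration, not SNA’s actual protocol.

```python
# Half-duplex "flip-flop": the turn decides who may transmit.
class HalfDuplexSession:
    def __init__(self):
        self.turn = "client"              # the client speaks first

    def send(self, who: str, data: bytes) -> None:
        if who != self.turn:
            raise RuntimeError(f"{who} tried to send out of turn")
        print(f"{who} -> {data!r}")

    def change_direction(self) -> None:   # hand the turn to the other side
        self.turn = "server" if self.turn == "client" else "client"

s = HalfDuplexSession()
s.send("client", b"RUN REPORT 42")
s.change_direction()
s.send("server", b"REPORT 42 COMPLETE")
# s.send("client", ...) here would raise: the client no longer holds the turn.
```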
The history of the OSI Presentation Layer #6 is even more difficult to understand.
Back in the day, each computer had its own way of representing data. They would have different word sizes, like 12-bit or 36-bit words. They would have different character sets, like the famous difference between IBM’s EBCDIC and US-ASCII. Structured data would often be written to disk by simply dumping the contents of memory, meaning memory layout was the file layout.
The lifecycle of data was to be created, processed, and eventually destroyed all on the same machine. If it was transferred between machines, it was usually between machines of the same type.
In those early days, the following rule was followed: the format of data was the property of where it was located.
As networking started to mean connecting computers of different types together, this became a problem. You couldn’t simply copy data from one machine to another because it would then have the wrong format. You had to convert it.
Who was responsible for the conversion? The sender? The receiver?
The answer they came up with is that this would be handled by the network stack itself. Networking computers of different types always required data conversion, so of course that should be a layer in the stack.
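You can still see what that conversion problem looked like, because Python ships EBCDIC codecs. The same text is a completely different byte sequence in EBCDIC (code page 500 here) than in ASCII:

```python
# The Presentation Layer problem in two lines: same characters, different bytes.
text = "HELLO, MAINFRAME"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp500")     # an EBCDIC code page

print(ascii_bytes.hex())    # 48454c4c4f2c204d41494e4652414d45
print(ebcdic_bytes.hex())   # entirely different byte values for the same text

# The OSI answer was to put this translation inside the network stack; the
# modern answer is that the bytes belong to the data itself and travel as-is.
print(ebcdic_bytes.decode("cp500") == text)   # True -- round-trips cleanly
```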
This was an issue for file transfer, but also for terminals. Different terminals had different character sets but also different control codes for drawing things on the screens. On Unix systems, one of the features of the Telnet protocol is communicating the terminal type, so libraries like “ncurses” can send the correct codes to do things like draw boxes on the screen.
At the time, sending the right terminal control codes was one of the most important features of the network for everyone involved in defining the OSI layers. They had terminals on their desktops, not personal computers.
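Here’s roughly what that terminal handling looks like today with Python’s curses binding: the library figures out which terminal it’s talking to (via the TERM variable these days) and emits the right control codes to draw a box, without the application hard-coding any escape sequences.

```python
# Draw a box on whatever terminal this happens to be; curses picks the codes.
import curses

def main(stdscr):
    stdscr.clear()
    win = curses.newwin(5, 40, 2, 4)   # height, width, y, x
    win.box()                          # the right box-drawing codes for this terminal
    win.addstr(2, 3, "hello from the terminal")
    win.refresh()
    win.getkey()                       # wait for a keypress before tearing down

curses.wrapper(main)                   # sets up and restores the terminal safely
```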
These concerns don’t exist anymore.
The most important change in thinking is that data conversion is the wrong solution. The format of data is a property of the data itself, not where it’s located.
A PDF or JPEG is the same format regardless of what device holds it. Trying to convert data will only corrupt it, especially if one side supports features the other side doesn’t, causing them to be removed. And it’s rare for two types of formats to support precisely the same feature set.
Not only do you not want conversion to be located in the network stack, you don’t want it to happen at all.
In the past, the lifecycle of data was contained in a single computer; today, data is routinely transferred among computers of different types. You might take a picture on your Android phone, send it to somebody using an iPhone, through a server running Windows. Conversion never happens in this sequence of events.
At the top of the network stack is the Application Layer #7. Is this simply the payload that’s above the network, outside of it? Or is this the highest part of the network stack, inside the stack?
Nobody really knows these days.
In the mainframe view of the world, you had services like VTAM and FTAM, the parts of the network stack that dealt with “terminals” and “file transfer” respectively. An application would never transfer a file itself; it would instead request file transfer from the FTAM service.
Therefore, in the mainframe view of the world, the entire network stack is integrated, and applications are fundamentally not really network aware. They require services from the operating system to accomplish goals without being precisely a network application.
That’s not how it worked on the Internet. If you have an email application, it’s very much a network application. It’s not a program that requests services from an email subsystem using some sort of “email API.” Instead, it implements the email protocols itself, using the “Sockets API” to send and receive payloads itself. Its implementation of email protocols is above the network stack—payload, not inside the network stack.
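A sketch of what that means in practice: an email client opens a socket and speaks SMTP itself. The hostname below is a placeholder and error handling is omitted; the point is that there’s no “email layer” in the stack to ask.

```python
# Speaking SMTP directly over the Sockets API -- the protocol lives in the
# application, not in the network stack.
import socket

def send_line(sock: socket.socket, line: str) -> str:
    sock.sendall(line.encode("ascii") + b"\r\n")
    return sock.recv(4096).decode("ascii", errors="replace")

with socket.create_connection(("mail.example.com", 25), timeout=10) as s:
    print(s.recv(4096).decode())               # server greeting, e.g. "220 ..."
    print(send_line(s, "HELO client.example.net"))
    print(send_line(s, "MAIL FROM:<alice@example.net>"))
    print(send_line(s, "RCPT TO:<bob@example.com>"))
    print(send_line(s, "DATA"))
    print(send_line(s, "Subject: test\r\n\r\nHello, Bob.\r\n."))
    print(send_line(s, "QUIT"))
```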
Now, web apps are a little bit different. A lot of programmers use web-specific APIs, both on the client side and the server side. The “web” is therefore considered an “inside” part of the stack.
But it’s still wrong to think of the web as conforming to the OSI Application Layer #7. It’s an ad hoc layer applied on top of the existing Internet, not something designed to fulfill fixed functionality defined in a specific layer.
For example, in HTTP/1.1, the web protocol runs on top of “Transport Layer Security” or TLS. In HTTP/3, TLS is integrated directly into QUIC, the transport the web now rides on—I suppose a sort of “web layer security” instead of “transport layer security”. It’s arbitrary and ad hoc.
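You can do that ad hoc layering by hand: take a TCP connection, wrap it in TLS, then speak HTTP through it. Nothing in this sketch cares which OSI layer TLS is supposed to live in.

```python
# HTTP/1.1 layered on TLS layered on TCP, done explicitly.
import socket
import ssl

host = "example.com"
context = ssl.create_default_context()

with socket.create_connection((host, 443)) as tcp:
    with context.wrap_socket(tcp, server_hostname=host) as tls:   # TLS layered on TCP
        tls.sendall(f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode())
        response = b""
        while chunk := tls.recv(4096):                            # HTTP layered on TLS
            response += chunk

print(response.split(b"\r\n", 1)[0].decode())    # e.g. "HTTP/1.1 200 OK"
```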
The Internet exists as a series of layers, but not the mainframe (and OSI) model of fixed layers.
Layers #1 and #4
I don’t really discuss these layers in this post.
Layer #1, the physical layer, transcends this discussion. It existed 100 years before all this nonsense and isn’t mainframe related. It’s just a fact of life: at some point, you connect two things with a wire. In much the same way that your model should include a “payload” above the network stack, you should think of a “physical wire” below the stack, outside of the stack.
Layer #4 is probably the layer that’s actually based upon Zimmermann’s work. Its definition is that it’s end-to-end — that whatever functionality it provides, it does so on the ends of the network instead of in between. To understand it, look at TCP/IP rather than IBM SNA.
Conclusion
The point is that both stories are correct. ESR is correct in saying that OSI was driven by European academics. But from another viewpoint, it’s really based upon IBM’s mainframe networking and is the antithesis of what we see as the Internet today. The fixed-function layers were wrongthink that came from IBM. Data Link #2 was IBM’s SDLC. The Network Layer #3 was connection-oriented. The upper 3 layers aspired to reach the wrong goals—mainframe goals.
The entire OSI Model is a lie. It was based upon IBM mainframe thinking and is not the actual model used by the Internet. I know that everything you read, everything you Google, every answer you get from AIs will disagree with me. Nonetheless, everyone is wrong.


Jeebus this brings back some memories. I worked on Datapoint computers for 9 years, and that included their Token Ring network (ARCnet? don't remember) starting around 1980. They wouldn't provide us with the documentation we wanted, so my boss told me to just start exploring, write a game, anything, just poke at it and see what you can document.
It was dynamite fun. At some point I had documented all their system calls well enough that my boss went to them (Austin? San Antonio? Wherever the Alamo is) and said, basically, you give us the real deal under NDA or we release this unofficial documentation to the public. He came back with the real deal.
Along the way I was poking at their Token Ring authentication for logging into terminals and found a flaw which was dumb as rocks. When you typed in the password on your terminal (which were real computers with local disc drives, not dumb tubes), it sent the username out as a broadcast asking if anyone had auth data for this user. If any node waved its hand and said "Me! I've got that user!" it would send the password, in the clear, to that node!
So I wrote a Stupid Little Test Program™ which sat in a loop looking for those username query packets, always answered "Me! I've got that user!", got the password and dumped it to the screen, and sent back the disappointing auth reply that sorry, no, I don't have auth data for that user, my bad.
I vaguely remember Datapoint fixing it lickety split quick, but that was a long time ago.
This piece really made me think about how deeply historical tech influences our current systems. It's fascinating to trace the lineage from those big mainframes to today's distributed networks. It's wild to consider that initial fork in technology, where one path led to user empowerment and the other to centralized control, and how even now with AI, we're constantly navigating that tension between centralization and distributed power.