A Charitable Research Foundation Devoted to Education, Consumer Protection, Scientific Advancement and Freedom...

A SECULARIST'S NON-BLOG ON TCP/IP (Vol 1.)
Collected from the Notefiles of the Chairman by Scott Shepard, student intern
January 2004, updated

Prefix by Dr. Jack A. Shulman
Chair, American Computer Science Association Inc.
(Skip my boring, scathing prefix rant about Mafia Bell, and go directly to the helpful notes >)

PREFIX

Don't you hate the term "BLOG"? I told Scott to use it in the title in a way I wouldn't find objectionable. Good work, Scott! This rant is a non-Blog!! I wish we could invent the ANTI-BLOG. But then, we'd need the ANTI-AT&T to do that. Poor little AT&T, had to merge with Comcast and SBCC. What? Poor who?

As Chairman of the 1988 Malaga OSI Committee on X.25, TCP/IP Interoperability, and X.400 Interoperability and X.500 Directories, during which time I was Co-Director of the X.400 Interoperability Laboratory at AT&T Bell Labs, I was frequently quoted in the following manner:

"Dr. Shulman seems to believe that TCP/IP is 'byte strong, word wimpy, router strong, crossbar deficient, and layer skimpy, universally inapplicable to anything, universally usable by everyone.' He says it will probably take over the future of the Info Super Highway despite being a bastardization of the original Starlan and DDCP in their earliest pre-commercialization forms."

This is a true quote, although I don't think I said "Starlan" or "DDCP" specifically. I still believe TCP/IP is wimpy: it has inadequate protection, too few encryption options, and does not address my original specification on Network Wide Virtual Memory (with Virtual Device) Paging, as visualized by the X.25/LANvx protocol used on the OSF/1 and Apollo One and Two workstations. Those were the days. I also happen to believe that TCP/IP is what we've got to work with at least until 2007. Yet even back in 1977 and 1983, at convocations, ALL OF US in the Northeastern DP Community (that is, except for the AT&T guys and the few early Sun Terminal bloggers) believed that TCP/IP, destined as it was to become the adoptive standard for OSI nets at the time, "really stinks". Funny that we all also embraced its use at the time. "For want of a nail, we embraced the use of a brad in its place." Now that's a direct quote.

Blame Hackers, Microsoft, Linux, Trojans or whatever you want for the modern SPAM, SECURITY and ADWARE problems of the Internet; they are all caused by the FUNDAMENTAL FAILINGS of TCP/IP and its implementations and deployments. It really STINKS. Which is why companies like Cisco have had to invent their own Security over-layers and why people have come to believe in VPN. Me, I believe in Beatles.

My assistant, Scott, has purported to assemble a collection of my personal notes from around the web, used in courses at ACSA University, on TCP, UDP, IP, RFCs, Classes and various elements of addressing.

What it does not address is my discontent with IP. In any HDLC or SDLC based network, or variants, I have the sufficiency of layers and function management to be able to define devices, starting addresses of the devices, and virtualization of the devices' very existence. For example, I could define nodes in a network and networks in the Universe using a 96 bit hybrid address, and still include a Level2 encryption key seed, if I wished, and then use that to demand page data to or from that device anywhere on the net, in any net in the world, across any intervening carrier, without regard for the arrangement of classes, gateways, routers, carrier weighting, IP over ATM with or without throttling, or even the sheerest issues of MTU and Freq.

With Internet Protocol and TCP/IP, we are stuck, stuck in a mud of hierarchical nets (supernets, nets and subnets), stuck with service flags, relevant organizations, predefined DNS's with Berkeley BIND a glaring error in the security hierarchy, archaisms from U-U, archaisms from early Telnet, and, well, a million other truisms.

When I was working on a project for the US Government, IBM and Microsoft, we came up with a way for Microsoft to overcome dissimilarity in its NT architecture's security procedures: by using "casting" it was able, if properly configured, to easily interconnect and maintain the configuration of attached nets and workstations to their VMS-like Server Architecture. This was based on adapting the similar methods used to allow CICS to cohabitate with Unix/370 and OS/370 on earlier IBM nets. Yet, to be frank with you, it works better if you think of TCP/IP nets as funnels with a strategic gateway at the top that creates a MATRIX of PCs and Servers and then overlaps connection to the nearest upstream routing combine. We've all experienced this not only with IBM and Microsoft based Server systems, but with VAX VMS, Primos, and even venerable DG and UNOx systems. Well, we were able to conquer problems with routers sending broadcast requests during WINS Winlogin dialogs to the WAN side for up to a minute before timing out and redirecting them to Domain Controllers, by statically addressing PCs and pointing DNS's to their upstream Domain Controller, and we were able to eliminate lengthy login delays during Personal Settings processing by latching the Default Gateway to the Domain Controller, so that Group Policy Object Discovery and Deployment took place 10% faster. All in violation of conventional Unix hacker methodologies: never violate the Bootp or DHCP "deification". This intransigent inflexibility rears its ugly head in the vast number of indignant Unix guys who insist that this introduces redundant packet trafficking, to which we reply: "Not after the local caches of ARP, WINS and DNS Resolver fill up, which takes place in what - ? - 4-8 packets?"

Any effort to step outside the TCP/IP box to solve a problem results in an almost programmed-in response from UNIX TCP/IP adherents, who have no idea that TCP/IP is one of the worst, in fact probably the worst, least robust data communications protocol ever conceived. It was developed of necessity for a product rooted in the 60's and 70's, before the modern era of advanced VVLSI and even before the advent of the Internet. It was conceived by mediocre thinking in a world full of text messages, low level minicomputer databases and non-streaming applications, and is in use today solely to create a CONTROLLABLE yet IRRELEVANT Networking superstructure that keeps the revenue streams at the Bell carriers and equipment manufacturers intact, while repressing any competing technology.

TCP/IP and its adoption as the core of the Internet is the byproduct of a conspiracy between the RBOCs, NSF and their keiretsu, designed to Monopolize the world wide information superhighway, and to repress core competition with these Bell Operating Companies and their Long Distance "codependents", on behalf of the ruthless Monopolistic tyrants who control them, while everyone from the US Government to the US Public, to the world population at large is forced to endure a weak in the knees, short sighted, mediocre, insecure, no-where, loser-mentality conceived medium that will require decades of evolution into "some other net of Internets" (SON of Internet) before it truly begins to show its capabilities. Meanwhile, all these RBOC "Geniuses" keep patting themselves on the back. For what, exactly? Inventing a no-where, does nothing medium that's a half a notch more and quite a bit worse than ISDN ("It Still Does Nothing"), to become what they call the WWW (Weak Wimpy Wonder-where-the-security-is), an exercise in mental self-gratification by people with no other ability but to pin their names on something they had nothing to do with the invention of.

Meanwhile, because the "inside the box" systems are all implemented "the same way", our more traditional UNIX systems all interconnect using the somewhat overly streamlined OSI model used for TCP/IP. So, all you have to do is give up your personality, throw away your intellectual property, undo any strategic innovative advantage, and suck up to Thompson, Metcalfe and Cerf, and while you're in line kissing the backsides of these three "intellectual giants" who've descended humanity into the seventh circle of Hell with their half baked clam chowder like implementation of a network with absolutely no imagination, you can write home to Mom and ask her to please take you home from Camp Grenada. Before you give up and just accept the inevitable monopoly imposed upon us all by Ma Bell, still alive and kicking after all these years. And this from me, a Ma Bell Graduate. I can't STAND backwards protocols and weak in the knees security infrastructures like TCP/IP and I never could. That's Ma Bell's middle name: security that's designed in and involves clubbing anyone they can label "hacker" over the head and beating them senseless, while defaming them and destroying their reputation. The Politics of Mafia Bell, Hell's Bells. Enough said.

For a nice set of primers on the ACTUAL OSI Model for Data Networking http://www.iso.ch/iso/en/StandardsQueryFormHandler.StandardsQueryFormHandler?scope=CATALOGUE&sortOrder=ISO&committee=ALL&isoDocType=ALL&title=true&keyword=OSI

But, note: it takes a reading. And to avail yourself of the JTC 1 Committee go here
http://www.iso.ch/iso/en/stdsdevelopment/tc/tclist/TechnicalCommitteeDetailPage.TechnicalCommitteeDetail?COMMID=1

Remember one thing: whether you are writing a way to take advantage of Router Gateway chains for a new Wifi Device, or implementing a V7 Security system, you are working with a protocol that is honored only for its universality, not its overall strength, and which is used because of its flexibility (relatively) but not its versatility. Keep in mind that we hope to leave it behind, for a MORE DYNAMIC MULTI-PATH DESIGN that is not so dependent upon Internet Numbers and old style Name Spaces with TLDs as it is upon Free Format, Versatile Name and Address Spaces and a direct correlation to Virtual and Dynamic, Reconfigurable Addressing.

And then you will understand why this graduate of the Black Hole project of AT&T, circa 60's, is discontented that it hasn't progressed a whole lot more, mired in forever reinvention of the wheel, forever trying to be compatible backwards, forwards and sideways, while not representing anything other than a switching voice definition of Datagramming and a switching Control Data definition of Packetizing. Even its original conceivers gain benefit only from its conception, and it continues to be a plague of old Algol like and Fortran like conventions adapted for a novel language environment of i/o (B, BCPL and C) enhanced to include the invisible hierarchy of the modern Registration quagmire, and other things we badly need to leave behind. Ports, and IP Addy's, and classes, and protocols that are older than the industry they claim to have newly awakened.

My apologies to colleagues such as Bob Metcalfe, Vint Cerf and Ken Thompson who it appears thought this would be the permanent legacy of the future of the Web, but sorry, IT'S JUST A BIG, WIMPY COP OUT brought about by those who know how to use and develop for it and by those who wanted control over the web. WE DESERVE WAY BETTER THAN TCP/IP. That being said, here are some valuable insights into its current strengths and underlying concepts for the Newbie. As I said before: "It's what we have to work with, it works (mostly) and it's not all that objectionable." Of course, I've since retracted the last codification, as I'm not one of those who deifies its authorship: in fact I think TCP/IP not only wimpy, but pimply, and slow. Nonetheless: with suitable homage to the sources of pages I've used in teaching exhibits over the years, many of the links work properly. Many don't. But Scott insisted this was the most valuable set of course notes he'd ever read, so I agreed to let him put it up. Missing pieces are found under the links. Caveat Emptor. Don't get lost in the "I know more than you" BS, and don't think knowing it is owning it, since truly mastering TCP/IP is to be owned BY IT. The CON JOB that no one who was conned by it would allow anyone else to discover, nor ever admit they were: "TCP/IP Forever! In TCP/IP we trust, all others pay cash..." Remember I said that...

-- Dr. Jack A. Shulman, Chairman
(acsa2000@acsa.net)

The Internet Protocol

Summary: The Internet Protocol provides a basic delivery service for transport protocols such as TCP and UDP. IP is responsible for getting data to its destination host and network. IP is not reliable, so the effort may fail.

Relevant STDs: 2 (http://www.iana.org/);
3 (includes RFCs 1122 and 1123);
4 (RFC 1812, republished);
5 (includes RFCs 791, 792, 919, 922, 950, and 1112)

Relevant RFCs: 781 (Timestamp Option);
791 (Internet Protocol);
815 (Fragmentation Reassembly);
919 (IP Broadcasts);
922 (Broadcasting on Sub-Nets);
950 (Sub-Net Recommendations);
1108 (Security Option);
1112 (IP Multicasting and IGMP v1);
1122 (Host Network Requirements);
1349 (Type-of-Service Flags);
1455 (Data-Link Security TOS Flags);
1812 (Router Requirements);
2113 (Router Alert Option)

As we learned in Chapter 1, An Introduction to TCP/IP, a variety of protocols are used for moving application data between different systems. We saw that hardware-specific protocols are used by devices when they need to exchange data directly, that the Internet Protocol is used to get IP datagrams across the different network segments to their final destination, and that TCP and UDP provide transport and connection management services to the application protocols used by end-user applications.

Although each of these layers provides unique and valuable services, the Internet Protocol is perhaps the most important to the overall operation of the Internet in general, since it is responsible for getting data from one host to another.

In this regard, IP can be thought of as being like a national delivery service that gets packages from a sender to a recipient, with the sender being oblivious to the routing and delivery mechanisms used by the delivery agent. The sender simply hands the package to the delivery agent, who then moves the package along until it is delivered.

For example, a package that is shipped from New York to Los Angeles is given to the delivery service (let's say UPS), with instructions on where the package has to go, although no instructions are provided on how the package should get to the destination. The package may have to go through Chicago first; the delivery agent at the New York UPS office makes that routing decision. Once the package reaches the Chicago UPS office, another delivery agent at that facility decides the best route for the package to take in order to get to Los Angeles (possibly going through Denver first, for example).

At each juncture, the local delivery agent does its best to get the package delivered using the shortest available route. When the package arrives at the Los Angeles facility, then another agent does its best to get it to the final destination system, using the destination address provided with the package to determine the best local routing.

Similarly, it is the function of IP to provide relaying and delivery decisions whenever an IP datagram has to be sent across a series of networks in order for it to be delivered to the final destination. The sending system does not dictate the full path that the datagram takes to the destination system; it simply chooses the best next hop that is available at that specific moment. If this involves sending the datagram through another intermediary system, then that system also makes routing decisions according to the current condition of the network, forwarding the data on until it arrives at the destination system, as specified in the datagram's header.

The IP Standard

IP is defined in RFC 791, which has been republished as STD 5 (IP is an Internet Standard protocol). However, RFC 791 contained some ambiguities that were clarified in RFC 1122 (Host Network Requirements). As such, IP implementations need to incorporate both RFC 791 and RFC 1122 in order to work reliably and consistently with other implementations.

RFC 791 begins by stating "The Internet Protocol is designed for use in interconnected systems of packet-switched computer communication networks. The Internet protocol provides for transmitting blocks of data called datagrams from sources to destinations. The Internet protocol also provides for fragmentation and reassembly of long datagrams, if necessary, for transmission through `small packet' networks."

RFC 791 goes on to say "The Internet Protocol is specifically limited in scope to provide the functions necessary to deliver a package of bits (an Internet datagram) from a source to a destination over an interconnected system of networks. There are no mechanisms to augment end-to-end data reliability, flow control, sequencing, or other services commonly found in host-to-host protocols."

That pretty much sums it up. A source system will send a datagram to a destination system, either directly (if the destination host is on the local network) or by way of another system on the local network. If the physical medium that connects the sending and receiving systems can carry the whole datagram in a single frame, IP will send all of the data in one shot. If this isn't possible, the data will be broken into fragments that are small enough for the physical medium to handle.

Once the datagram is sent, IP forgets about it and moves on to the next datagram. IP does not offer any error-correction, flow-control, or management services. It just sends datagrams from one host to another, one network at a time.

TIP: Remember this rule: the Internet Protocol is responsible only for getting datagrams from one host to another, one network at a time.

IP Datagrams Versus IP Packets

Hosts on an IP network exchange information using IP datagrams, which include both the units of data that contain whatever information is being exchanged and the header fields that describe that information (as well as describing the datagram itself). Whenever a device needs to send data to another system over an IP network, it will do so by creating an IP datagram, although the datagram is not what gets sent by IP, at least not in the literal sense.

Instead, IP datagrams get sent as IP packets, which are used to relay the IP datagrams to the destination system, one hop at a time. Although in many cases an IP datagram and an IP packet will be exactly the same, they are conceptually different entities, which is an important concept for understanding how IP actually works.

This concept is illustrated in Figure 2-1. In that example, Ferret needs to send an IP datagram to Fungi. However, since Fungi is on a remote network, Ferret has to send the packet containing the datagram to Sasquatch, who will then send another packet to Fungi.

Figure 2-1. IP datagrams versus IP packets

IP datagrams contain whatever data is being sent (and the associated IP headers), while IP packets are used to get the datagram to the destination system (as specified in the IP headers). These IP packets are sent using the framing mechanisms defined for the specific network medium in use on the local network, and are subject to network events such as fragmentation or loss. However, the datagram itself will always remain as the original piece of data that was sent by the original sender, regardless of anything that happens to any of the packets that are used to relay the datagram.

For example, Figure 2-2 shows a four-kilobyte datagram that is being sent from Ferret to Fungi. Since this datagram is too large for the Ethernet network to send in a single frame, the datagram is split into four IP packets, each of which is sent as an individual entity in its own Ethernet frame. Once all of the IP packets are received by the destination system, they will be reassembled into the original datagram and processed.

Figure 2-2. Datagram fragmentation overview
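The arithmetic behind fragmentation can be sketched in a few lines. The following is a minimal illustration, not an implementation of any particular IP stack: it assumes a 20-byte IP header with no options, and the function name `fragment` is hypothetical. Fragment offsets are expressed in 8-byte units, as RFC 791 specifies, so every fragment except the last must carry a multiple of 8 bytes of data. The exact number of fragments depends on the datagram size, the header size, and the MTU of the medium.

```python
def fragment(total_length, mtu, header_len=20):
    """Split a datagram into (offset, payload_bytes, more_fragments) tuples.

    total_length -- size of the original datagram, header included
    mtu          -- largest IP packet the medium can carry
    header_len   -- assumed IP header size (20 bytes, i.e. no options)
    """
    payload = total_length - header_len
    # All fragments but the last must carry a multiple of 8 data bytes,
    # because the offset field counts in 8-byte units.
    per_fragment = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < payload:
        size = min(per_fragment, payload - sent)
        more = sent + size < payload      # "more fragments" flag
        fragments.append((sent // 8, size, more))
        sent += size
    return fragments


if __name__ == "__main__":
    # A 4000-byte datagram crossing a medium with a 1500-byte MTU.
    for offset, size, more in fragment(4000, 1500):
        print(f"offset={offset:4d}  data={size:4d} bytes  more_fragments={more}")
```

Each tuple corresponds to one IP packet; the receiver uses the offsets to put the pieces back in order before handing the reassembled datagram up the stack.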

This model is necessary due to the way that IP provides a virtual network on top of the different physical networks that make up the global Internet. Since each of those networks has different characteristics (such as addressing mechanisms, frame sizes, and so forth), IP has to provide a mechanism for forwarding datagrams across those different networks consistently and cleanly. The datagram concept allows a host to send whatever data needs to be sent, while the IP packet allows the datagram to actually get sent across the different networks according to the characteristics of each of the intermediary networks.

This concept is fundamental to the design nature of the Internet Protocol, and is the key to understanding how IP operates on complex networks.

Local Versus Remote Delivery

The IP header stores the IP addresses of both the source and destination systems. If the destination system is on the same physical network as the sending system, then the sender will attempt to deliver the datagram directly to the recipient, as shown in Figure 2-3. In this model, the sender knows that the recipient is on the same local network, so it transmits the data directly to the recipient, using the low-level protocols appropriate for that network medium.

Figure 2-3. An example of local delivery

However, if the two systems are not connected to the same IP network, then the sender must find another node on the local network that is able to relay the IP datagram on to its final destination. This intermediate system would then have to deliver the datagram if the final recipient was directly accessible, or it would have to send the datagram on to yet another intermediary system for subsequent delivery. Eventually, the datagram would get to the destination system.

A slightly more complex representation of this can be seen in Figure 2-4. In that example, the sending system knows that the destination system is on a remote network, so it locates an intermediate system that can forward the data on to the final destination. It then locates the hardware address of the forwarding system, and passes the data to the intermediate system using the low-level protocols appropriate for the underlying medium. The intermediate system then examines the destination IP address of the datagram, chooses an exit interface, and sends the data to the final destination system using the low-level protocols appropriate to that network.

Figure 2-4. An example of routed delivery
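The choice between local and routed delivery comes down to a single membership test. The sketch below uses Python's standard `ipaddress` module; the function name `next_hop`, the /24 prefix length, and the gateway address are assumptions made for illustration only.

```python
import ipaddress


def next_hop(dest, local_net, gateway):
    """Return the IP address the sender should hand the packet to.

    If the destination is on the sender's own network, the packet goes
    straight to the destination (local delivery). Otherwise it goes to
    the gateway, which relays it onward (routed delivery).
    """
    dest_ip = ipaddress.ip_address(dest)
    if dest_ip in ipaddress.ip_network(local_net):
        return str(dest_ip)
    return gateway


if __name__ == "__main__":
    # Assumed setup: sender on 192.168.10.0/24 with a router at 192.168.10.1.
    print(next_hop("192.168.10.10", "192.168.10.0/24", "192.168.10.1"))
    print(next_hop("172.16.100.2", "192.168.10.0/24", "192.168.10.1"))
```

In both cases the sender then has to resolve the hardware address of whichever system `next_hop` returns, since that is the device the frame is physically addressed to.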

The two network models shown in Figure 2-3 and Figure 2-4 are both relatively simple, and together they represent the majority of the traffic patterns found on internal corporate networks. Most networks only have a few segments, with the target being no more than a handful of hops away from the originating system.

But once datagrams start travelling over the Internet, things can get very complex very quickly. Rather than having to deal with only one or two routers, all of a sudden you may be looking at a dozen or more hops. However, IP handles complex networks the same way it handles small networks: one hop at a time. Eventually, the datagrams will get through. This concept is illustrated in Figure 2-5, which shows five different network segments in between the sending and destination systems.

Figure 2-5. A complex, multi-hop network path

In the example shown in Figure 2-5, the sender has to give a packet to the local router, which will send another packet off to a router at the other end of a modem connection. The remote router then has to forward the data to yet another router across the carrier network, which has to send the data to its dial-up peer, which will finally deliver the datagram to the destination system. In order for all of this to work, however, each router must be aware of the path to the destination host, passing the data off to the next-hop router.
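The one-hop-at-a-time behavior can be simulated with a small sketch. Everything here is hypothetical: the router names R1 through R4, their tables, and the /24 prefixes are invented for illustration and are not taken from Figure 2-5. Each router consults only its own table and either delivers the datagram directly or passes it to the next router.

```python
import ipaddress

# Hypothetical topology: router -> ordered list of (prefix, next router).
# A next router of None means the network is directly attached.
ROUTERS = {
    "R1": [("192.168.10.0/24", None), ("0.0.0.0/0", "R2")],
    "R2": [("172.16.100.0/24", None), ("192.168.10.0/24", "R1"),
           ("192.168.30.0/24", "R3")],
    "R3": [("172.16.100.0/24", None), ("192.168.30.0/24", "R4"),
           ("0.0.0.0/0", "R2")],
    "R4": [("192.168.30.0/24", None), ("0.0.0.0/0", "R3")],
}


def trace(start, dest, max_hops=16):
    """Follow a datagram hop by hop; return (path, delivered)."""
    dest_ip = ipaddress.ip_address(dest)
    path = [start]
    router = start
    for _ in range(max_hops):
        # First matching entry wins; tables list specific routes first.
        for prefix, nxt in ROUTERS[router]:
            if dest_ip in ipaddress.ip_network(prefix):
                break
        else:
            return path, False        # no route: the datagram is dropped
        if nxt is None:
            return path, True         # destination network is attached here
        router = nxt
        path.append(router)
    return path, False                # hop limit exceeded


if __name__ == "__main__":
    print(trace("R1", "192.168.30.7"))
    print(trace("R2", "10.0.0.1"))
```

No router in this sketch knows the whole path. Each one only knows which neighbor is closer to the destination, which is all that IP requires.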

How IP finds remote hosts and networks

Every IP device--regardless of the function it serves--must have an IP address for every network that it is connected to. Most systems (such as PCs) only have a single network connection, and therefore only have a single IP address. But devices that have multiple network interfaces (such as routers or high-load devices like file servers) must have a dedicated IP address for every network connection.

When the IP protocols are loaded into memory, an inventory is taken of the available interfaces, and a map is built showing what networks the system is attached to. This map is called a routing table: it stores information such as the networks that the node is connected to and the IP address of the network interface connected to that network.

If a device only has a single interface, then there will be only one entry in the routing table, showing the local network and the IP address of the system's own network interface. But if a device is connected to multiple networks--or if it is connected to the same network several times--then there will be multiple entries in the routing table.

TIP: In reality, just about every IP device also has a "loopback" network, used for testing and debugging purposes. The loopback network is always numbered 127.0.0.0, while the loopback interface always has the IP address of 127.0.0.1. This means that routing tables will generally show at least two entries: one for the physical connection and one for the loopback network.

When a system has to send a datagram to another system, it looks at the routing table and finds the appropriate network interface to send the outbound traffic through. For example, the router shown in the top-left corner of Figure 2-5 has two network connections: an Ethernet link with the IP address of 192.168.10.1 and a serial connection with an IP address of 192.168.100.1. If this router needed to send data to 192.168.10.10, then it would use the Ethernet interface for that traffic. If it needed to send datagrams to 192.168.100.100, it would use the serial interface. Table 2-1 shows what the router's routing table would look like based on this information.

Table 2-1: The Default Routing Table for 192.168.10.1

Destination Network | Interface/Router
127.0.0.0 (loopback network) | 127.0.0.1 (loopback interface)
192.168.10.0 (local Ethernet network) | 192.168.10.1 (local Ethernet interface)
192.168.100.0 (local serial network) | 192.168.100.1 (local serial interface)
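A lookup against a table like Table 2-1 can be sketched as follows. The prefix lengths (/8 for the loopback network, /24 for the others) are assumptions, since the table lists only network numbers, and the `lookup` function is a hypothetical helper rather than any real tool's API.

```python
import ipaddress

# Table 2-1 as (destination network, local interface) pairs.
# Prefix lengths are assumed; the table itself gives only network numbers.
TABLE_2_1 = [
    ("127.0.0.0/8", "127.0.0.1"),
    ("192.168.10.0/24", "192.168.10.1"),
    ("192.168.100.0/24", "192.168.100.1"),
]


def lookup(table, dest):
    """Return the interface to use for dest, or None if no entry matches."""
    dest_ip = ipaddress.ip_address(dest)
    for network, interface in table:
        if dest_ip in ipaddress.ip_network(network):
            return interface
    return None


if __name__ == "__main__":
    for dest in ("192.168.10.10", "192.168.100.100", "172.16.100.2"):
        print(dest, "->", lookup(TABLE_2_1, dest))
```

The last address has no matching entry, so the lookup returns `None`: with only directly attached networks in the table, the router has nowhere to send that datagram.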

However, such a routing table would not provide any information about any remote networks or devices. In order for the router to send an IP datagram to 172.16.100.2, it would need to have an entry in the routing table for the 172.16.100.0 network. Systems are informed of these details by adding entries to the routing table. Most TCP/IP packages provide end-user tools that allow you to manually create and delete routing entries for specific networks and hosts. Using such a tool, you could inform the router that the 172.16.100.0 network is accessible via the router at 192.168.100.100. Once done, the routing table for the local router would be similar to the one shown in Table 2-2.

Table 2-2: The Routing Table for 192.168.10.1 with a Remote Route Added

Destination Network | Interface/Router
127.0.0.0 (loopback network) | 127.0.0.1 (loopback interface)
192.168.10.0 (local Ethernet network) | 192.168.10.1 (local Ethernet interface)
192.168.100.0 (local serial network) | 192.168.100.1 (local serial interface)
172.16.100.0 (remote carrier network) | 192.168.100.100 (next-hop router)

Since the router already knows how to send datagrams to 192.168.100.100, it now knows to send all datagrams for 172.16.100.2 to 192.168.100.100, under the assumption that the remote router would forward the packets for delivery. By adding entries for each network segment to the local routing table, you would be able to tell every device how to get datagrams to remote segments of the network. Such a routing table might look like the one shown in Table 2-3.

Table 2-3: Complete Routing Table for 192.168.10.1, Showing Entire Network

Destination Network | Interface/Router
127.0.0.0 (loopback network) | 127.0.0.1 (loopback interface)
192.168.10.0 (local Ethernet network) | 192.168.10.1 (local Ethernet interface)
192.168.100.0 (local serial network) | 192.168.100.1 (local serial interface)
172.16.100.0 (remote carrier network) | 192.168.100.100 (next-hop router)
192.168.110.0 (remote serial network) | 192.168.100.100 (next-hop router)
192.168.30.0 (remote Ethernet network) | 192.168.100.100 (next-hop router)

Unfortunately, you would have to add entries for every segment of the network to every device on the network in order for everything to function properly. Each router would have to have a map showing every network and the routers that were to be used for that network. This task can be a lot of work, and is also highly prone to human error.

Several application protocols can be used to build maps of the network and distribute them to all of your systems without human intervention. The most popular of these for private networks is the Routing Information Protocol (RIP), which uses UDP broadcasts to distribute routing tables every thirty seconds. Another popular protocol is Open Shortest Path First (OSPF), which provides the same basic functionality as RIP but with more detail and less overhead. For external networks, neither of these protocols works well enough to support a significant number of networks, and other protocols (such as the Border Gateway Protocol) are more common for those environments.
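The core idea behind RIP, merging a neighbor's advertised routes into the local table with the hop count increased by one, can be illustrated with a simplified sketch. This is not a RIP implementation: it omits the UDP transport, timers, split horizon, and the packet format, and the function name `merge_advertisement` and the table layout are hypothetical. The metric of 16 meaning "unreachable" is part of the RIP specification.

```python
INFINITY = 16  # RIP treats a hop count of 16 as unreachable


def merge_advertisement(table, neighbor, advertised):
    """Merge routes advertised by `neighbor` into `table`.

    table      -- dict: network -> (next_hop, metric)
    advertised -- dict: network -> metric as seen by the neighbor
    Returns True if the table changed.
    """
    changed = False
    for network, metric in advertised.items():
        new_metric = min(metric + 1, INFINITY)   # one extra hop via neighbor
        current = table.get(network)
        if current is None:
            if new_metric < INFINITY:
                table[network] = (neighbor, new_metric)
                changed = True
        elif current[0] == neighbor or new_metric < current[1]:
            # Always believe the current next hop; otherwise only accept
            # a strictly better route.
            if current != (neighbor, new_metric):
                table[network] = (neighbor, new_metric)
                changed = True
    return changed


if __name__ == "__main__":
    table = {
        "192.168.10.0/24": ("direct", 1),
        "192.168.100.0/24": ("direct", 1),
    }
    advert = {"172.16.100.0/24": 1, "192.168.10.0/24": 2}
    print(merge_advertisement(table, "192.168.100.100", advert))
    print(table)
```

Repeating this exchange between all neighboring routers at each update interval is what lets every router eventually learn a route to every network without manual configuration.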

In common practice, most network administrators run these dynamic routing protocols only on their routers (but not on their hosts) since they tend to consume a lot of CPU cycles, memory, and network bandwidth. They then define "default" routes at the hosts, pointing them to the router(s) that serve the local network that the host is attached to. By using this model, clients need to keep only one entry in their routing tables, while the dedicated routers worry about keeping track of the overall network topology.

Table 2-4 shows what this might look like from the perspective of our example router. Notice that it has routing entries only for the locally attached networks, and that it now knows to send any other datagrams to the default router at 192.168.100.100. That router would then forward all of the datagrams that it gets to its default router as well.

Table 2-4: A Simplified Routing Table for 192.168.10.1

Destination Network | Interface/Router
127.0.0.0 (loopback network) | 127.0.0.1 (loopback interface)
192.168.10.0 (local Ethernet network) | 192.168.10.1 (local Ethernet interface)
192.168.100.0 (local serial network) | 192.168.100.1 (local serial interface)
0.0.0.0 (default route) | 192.168.100.100 (next-hop router)
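With a default route in the table, more than one entry can match a destination, so the lookup has to prefer the most specific match. The sketch below uses longest-prefix matching, the standard way of resolving this; the prefix lengths are assumptions (the table lists only network numbers) and `best_route` is a hypothetical helper.

```python
import ipaddress

# Table 2-4 as (destination network, interface or next-hop router) pairs.
# Prefix lengths are assumed; 0.0.0.0/0 matches every IPv4 address.
TABLE_2_4 = [
    ("127.0.0.0/8", "127.0.0.1"),
    ("192.168.10.0/24", "192.168.10.1"),
    ("192.168.100.0/24", "192.168.100.1"),
    ("0.0.0.0/0", "192.168.100.100"),
]


def best_route(table, dest):
    """Return (network, via) for the most specific matching entry, or None."""
    dest_ip = ipaddress.ip_address(dest)
    matches = []
    for network, via in table:
        net = ipaddress.ip_network(network)
        if dest_ip in net:
            matches.append((net, via))
    if not matches:
        return None
    # The entry with the longest prefix is the most specific match.
    net, via = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), via


if __name__ == "__main__":
    for dest in ("192.168.10.10", "172.16.100.2"):
        print(dest, "->", best_route(TABLE_2_4, dest))
```

A destination on an attached network still uses the local interface, while anything else falls through to the default route and is handed to 192.168.100.100.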

Default routes can be built manually (using the tools provided with the IP software in use on the local system), or can be assigned during system boot (using a protocol such as BOOTP or DHCP). In addition, a protocol called Router Discovery can provide network devices with default route information dynamically, updating the devices' routing tables as the network topology changes.

The examples shown earlier illustrate that managing routing tables can be complex, even with relatively small networks. Unfortunately, the Internet consists of several hundred thousand such networks. If all of the routers connecting these networks together had to be tracked by all of the other routers, there would be so much router-management traffic that nothing else could get through. The Internet would collapse under its own weight.

Route aggregation

New address assignment schemes are being deployed that allow routes to be aggregated together. Now, when you request a block of Internet addresses from your Internet Service Provider, the ISP must assign one from a larger block that has already been assigned to them. This allows routing to happen at a much higher level. Rather than ISPs having to track and advertise thousands of network routes, they only have to advertise a few super-routes.

The ISP will still have to track all of the networks that are under it, but it won't have to advertise them to other ISPs. This feature cuts down on the amount of backbone router-update traffic immensely, without losing any functionality.

Geography-based aggregation schemes are also being deployed. For example, any network that begins with 194 is somewhere in Europe. This simple assignment allows major routers on the Internet to simply forward traffic for any network that begins with 194 to the backbone routers in Europe. Those routers will then forward the datagrams to the appropriate regional ISP, who will then relay the datagrams on to their final destination.

This process is conceptually similar to the way that area codes and prefixes help the phone company route a call. Telephone switches can route a long-distance call simply by examining the area code. The main switches in the remote area code will then examine the telephone number's three-digit prefix, and route the call to the appropriate central office. By the time you finish dialing the last four digits of the phone number, the call is practically already established.

By using aggregated routing techniques, IP datagrams can be moved around the Internet in much the same manner. Aggregation allows routers to use much smaller tables (around 50,000 routes instead of two million routes), which keeps CPU and memory requirements as low as possible, which, in turn, allows performance to be higher than it otherwise would be if every router had to keep track of every network's router path.
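The effect of aggregation can be sketched with Python's standard ipaddress module. The four customer networks below are made-up values chosen so that they fall inside one larger block; a real ISP's assignments would differ.

```python
import ipaddress

# Four hypothetical customer networks assigned out of one ISP block.
customer_networks = [
    ipaddress.ip_network("192.168.8.0/24"),
    ipaddress.ip_network("192.168.9.0/24"),
    ipaddress.ip_network("192.168.10.0/24"),
    ipaddress.ip_network("192.168.11.0/24"),
]

# The ISP tracks all four internally but advertises only the aggregate.
advertised = list(ipaddress.collapse_addresses(customer_networks))
print(advertised)  # [IPv4Network('192.168.8.0/22')]
```

Other routers need just the one /22 entry to reach all four networks, which is the table-size saving described above.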

For more information about hierarchical routing, refer to "Classless Inter-Domain Routing (CIDR)" in Appendix B, IP Addressing Fundamentals.

Datagram Independence

In the preceding section, we used an analogy of a telephone number to illustrate how routers are able to route datagrams to their final destination quickly, based on the destination IP address. However, we should also point out that IP packets are not at all like telephone calls.

Telephone networks use the concept of "circuits" to establish a point-to-point connection between two users. When two people establish a telephone call, a dedicated point-to-point connection is established and is preserved for the duration of the call. In contrast, IP networks treat every individual IP datagram as a totally unique entity, each of which is free to travel across whatever route is most suitable at that moment.

For example, if a user were to retrieve a document from a remote web server, the server would probably need to generate several IP datagrams in order to return the requested material. Each of these datagrams is considered to be a unique and separate entity, totally unrelated to the datagrams sent before or after.

Each of these datagrams may take whatever path is deemed most appropriate by the routers that are forwarding them along. Whereas the first datagram sent from the web server to the requesting client may travel across an underground fiber-optic cable, the second datagram may be sent across a satellite link, while a third may travel over a conventional network. This concept is illustrated in Figure 2-6.

Figure 2-6. Every IP datagram is an individual entity and may take a different route

These routing decisions are made by the routers in between the source and destination systems. As the network changes, the routers that are moving datagrams around will have to adapt to the changing environment. Many things can cause the network to change: network cables can be ripped up, or downstream routers can become too busy to service a request, or any number of other events can happen to cause a route to become unavailable.

A result of this independence is that datagrams may arrive at their destination out of sequence, since one of them may have gone over a fast network, while another may have been sent over a slow network. In addition, sometimes datagrams get duplicated, causing multiple copies of the same packet to arrive at the destination system.

This architecture is purposefully designed into IP: one of the original design goals for the Internet Protocol was for it to be able to survive large-scale network outages in case of severe damage caused during war-time. By allowing each datagram to travel along the most-available path, every datagram's chances of survival increase dramatically. IP does not care if some of them happen to arrive out of sequence, get lost in transit, or even arrive multiple times; its job is to move the datagram, not to keep track of it. Higher-level protocols deal with any problems that result from these events.

Furthermore, by treating every datagram as an individual entity, the network itself is relieved of the responsibility of having to track every connection. This means that the devices on the network can focus on moving datagrams along, and do not have to watch for the beginning and end of every web browser's session. This feature allows overall performance to be as high as the hardware will allow, with as little memory and CPU requirements as possible.

Housekeeping and Maintenance

Every system that receives a packet--whether the system is the final destination or a router along the delivery path--will inspect it. If the packet has become corrupt or has experienced some other form of transient failure, then it is destroyed right then and there rather than being forwarded on.

However, if a problem occurs that is semi-permanent--for example, if the current device does not have a routing table entry for the destination network, or if the packet does not meet certain criteria for forwarding across the next-hop network--then IP may call upon the Internet Control Message Protocol (ICMP) to return an error message back to the original sender, informing them of the failure. Although the datagram will still be destroyed by the last-hop device, it will also inform the sender of the problem, thereby allowing it to correct whatever condition was causing the failure to occur.

This distinction between transient and semi-permanent failures is important. Transient errors are caused by no fault of the sender (such as can happen when the Time-to-Live timer expires, or a checksum is miscalculated), while semi-permanent failures are problems with the packet or network that will always prevent delivery from occurring over this path. In the latter case, it is best either to inform the sender of the problem so that it can take whatever corrective actions are required, or to notify the application that tried to send the data of the problem.

Chapter 5, The Internet Control Message Protocol, discusses the error messages that are generated by ICMP whenever a semi-permanent problem is encountered. However, the remainder of this section also discusses some of the transient problems that may occur with IP delivery in particular.

Header checksums

Part of this integrity-checking service is handled through the use of a checksum applied against the IP datagram's header (but not against the data inside of the IP datagram). Every device that receives an IP datagram must compute a checksum over the IP header and compare the result with the value stored in the header's checksum field. If the values do not match, then the datagram is assumed to be corrupt and is discarded immediately.
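The following sketch shows one way the header check might be written. It uses the 16-bit one's-complement sum that RFC 791 defines for the header checksum; the sample header bytes and the function names are illustrative assumptions, not taken from any particular implementation.

```python
import struct


def ip_header_checksum(header):
    """One's-complement sum of the header's 16-bit words, complemented."""
    if len(header) % 2:
        raise ValueError("an IP header is a whole number of 16-bit words")
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def header_is_valid(header):
    """A header carrying a correct checksum field sums to zero."""
    return ip_header_checksum(header) == 0


# Sample 20-byte header with the checksum field (bytes 10-11) zeroed.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = ip_header_checksum(sample)
stamped = sample[:10] + struct.pack("!H", checksum) + sample[12:]

print(hex(checksum))             # 0xb861
print(header_is_valid(stamped))  # True
```

Only the header bytes are passed in, so the cost of the check does not grow with the size of the data that the datagram carries.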

The data portion of the IP datagram is not verified, for three reasons. First of all, a device would have to examine the entire datagram to verify the contents. This process would require additional CPU processing time, which is more often than not going to be a waste of time.

Second, the data portion of an IP datagram always consists of a higher-level datagram, such as those generated by TCP and UDP. Since these protocols provide their own error-checking routines, the recipient system will have to conduct this verification effort anyway. The theory is that datagrams will move faster if routers do not have to verify their contents, a task which will be handled by the destination system anyway.

Finally, some application protocols are capable of working with partially corrupt data. In those cases, IP would actually be performing a disservice if it were to throw away datagrams with invalid checksums, since the application protocol would never get them. Granted, most applications do not work this way, but most applications will also utilize some form of error-correction service to keep this from becoming a problem.

Time-to-Live

Another validation service provided by IP is checking to see if a datagram has outlived its usefulness. This is achieved through a Time-to-Live field provided in the IP datagram's header. When a system generates an IP packet, it stores a value in the Time-to-Live header field. Every system that forwards the packet decreases the value of the Time-to-Live field by one, before sending the datagram on. If the Time-to-Live value reaches zero before the datagram gets to its final destination, then the packet is destroyed.

The purpose of the Time-to-Live field is to keep datagrams that are caught in an undeliverable loop from tying up network resources. Let's assume that a pair of routers both have bad information in their routing table, with each system pointing to the other for final delivery. In this environment, a packet would be sent from one router to the other, which would then return the packet, with this process repeating forever. Meanwhile, more packets may be introduced to this network from external devices, and after a while, the network could become saturated.

But by using a Time-to-Live field, each of these routers would decrement the value by one every time it forwarded a packet. Eventually the Time-to-Live value would reach zero, allowing the datagram to be destroyed. This safeguard prevents routing loops from causing network meltdowns.
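The loop scenario can be simulated in a few lines. This is only a sketch of the behavior described above: the router names are hypothetical, and a real router would also return an ICMP error when it destroys the packet.

```python
def forward_in_loop(ttl):
    """Bounce a packet between two misconfigured routers until its TTL expires.

    Returns the routers that forwarded the packet before it was destroyed.
    """
    routers = ["router-a", "router-b"]  # hypothetical names
    forwarded_by = []
    current = 0
    while True:
        ttl -= 1  # each forwarding attempt decrements the field by one
        if ttl <= 0:
            break  # the packet is destroyed instead of being forwarded
        forwarded_by.append(routers[current])
        current = 1 - current  # hand the packet to the other router
    return forwarded_by


print(len(forward_in_loop(64)))  # 63
```

With a starting value of 64 the packet is forwarded 63 times and then destroyed, so a bad pair of routes costs a bounded amount of bandwidth per packet rather than an unlimited amount.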

The strict definition of the Time-to-Live field states that the value is a measure of time in seconds, or any forwarding act that took less than one second to perform. However, there are very few Internet routers that require a full second to perform forwarding, so this definition is somewhat misrepresentative. In actual practice, the Time-to-Live value is decremented for every hop, regardless of the actual time required to forward a datagram from one network segment to another.

It is also important to note that an ICMP failure-notification message gets sent back to the original sender when the Time-to-Live value reaches zero. For more information on this error message, refer to "Time Exceeded" in Chapter 5.

The default value for the Time-to-Live field should be set to 64 according to the Assigned Numbers registry (http://www.iana.org/ ). In addition, some of the higher-layer protocols also have default Time-to-Live values that they are supposed to use (such as 64 for TCP, and 1 for IGMP). These values are really only suggestions, however, and different implementations use different values, with some systems setting the Time-to-Live on all outgoing IP datagrams as high as 255.

Fragmentation and Reassembly

Every network has certain characteristics that are specific to the medium in use on that network. One of the most important characteristics is the maximum amount of data that a network can carry in a single frame (called the Maximum Transmission Unit, or "MTU"). For example, Ethernet can pass only 1500 bytes in a single frame, while the typical MTU for 16-megabit Token Ring is 17,914 bytes per frame.

RFC 791 specifies that the maximum allowed MTU size is 65,535 bytes, and that the minimum allowed MTU size is 68 bytes. No network should advertise or attempt to use a value that is greater than the maximum or smaller than the minimum. Several RFCs define the specific default MTU values that are to be used with different networking topologies. Table 2-5 lists the common MTU sizes for the most common media types, and also lists the RFCs (or other sources) that define the default MTU sizes for those topologies.

Table 2-5: Common MTU Sizes and the Related RFCs

| Topology | MTU (in bytes) | Defined By |
|---|---|---|
| Hyperchannel | 65,535 | RFC 1374 |
| 16 Mb/s Token Ring | 17,914 | IBM |
| 802.4 Token Bus | 8,166 | RFC 1042 |
| 4 Mb/s Token Ring | 4,464 | RFC 1042 |
| FDDI | 4,352 | RFC 1390 |
| DIX Ethernet | 1,500 | RFC 894 |
| Point-to-Point Protocol (PPP) | 1,500 | RFC 1548 |
| 802.3 Ethernet | 1,492 | RFC 1042 |
| Serial-Line IP (SLIP) | 1,006 | RFC 1055 |
| X.25 & ISDN | 576 | RFC 1356 |
| ARCnet | 508 | RFC 1051 |

Since an IP datagram can be forwarded across any route available, every IP packet that a forwarding device generates has to fit within the MTU of the underlying medium used on the transit network. If you're on an Ethernet network, then IP packets have to be 1500 bytes or smaller in order for them to be carried across that network as discrete entities, regardless of the size of the original datagram.

There are really two concepts at work here: the size of the original IP datagram and the size of the packets that are used to relay the datagram from the source to the destination. If the datagram is too large for the sending system's local MTU, then that system has to fragment the datagram into multiple packets for local delivery to occur. In addition, if any of those IP packets are too large to cross another network segment somewhere between the sender and final recipient, then the packets must be fragmented by that router as well, allowing them to be sent across that network.

On an isolated network, size rarely matters since all of the systems on that network will share the same maximum frame size (a server and a client can both use at most 1500-byte datagrams, if both of them are on the same Ethernet segment). However, once you begin to mix different network media together, size becomes very important.

For example, suppose that a web server were on a Token Ring network that used 4,464-byte packets, while the end users were on a separate Ethernet segment that used 1500-byte packets. The TCP/IP software on the server would generate IP datagrams (and packets) that were 4,464 bytes long (according to the MTU characteristics of the local network), but in order for the IP datagrams to get to the client, the router in between these two segments would have to fragment the large packets into smaller packets that were small enough to move over the Ethernet network, as illustrated in Figure 2-7.

Figure 2-7. One 4,464-byte packet being split into four 1500-byte packets

During the act of fragmentation, the router will do several things. First of all, it will examine the size of the data that is stored in the original packet, and then it will create as many fragments as are needed to move the original packet's data across the smaller segment. In the example shown in Figure 2-7, a single 4,464-byte IP packet would require four IP packets in order to travel across the 1500-byte Ethernet (the mathematics behind this process will be explained in a moment).

TIP: In this example, the destination host may not be able to reassemble the original datagram, since the datagram is larger than the MTU of the local Ethernet connection. RFC 1122 states that hosts must be able to reassemble datagrams of at least 576 bytes, and should be able to reassemble datagrams that are "greater than or equal to the MTU of the connected network(s)." In this case, the local MTU is 1500 bytes, although the original datagram was four kilobytes, so it is possible that the destination system would be unable to reassemble the original datagram. Although most systems do not have problems with this, it should not come as a surprise if a wireless hand-held device cannot reassemble 65 KB datagrams sent from high-speed servers.

When the original 4,464-byte packet is fragmented, the header of each new IP packet is given whatever information was found in the original packet's header, including the source and the destination IP addresses, the Time-to-Live value, the Type-of-Service flags, and so on.

With regard to fragmentation in particular, the most important of these fields is the Fragmentation Identifier field, which is used to mark each of the fragments as belonging to the same original IP datagram. The Fragmentation Identifier field is really more of a Datagram Identifier: it is a 16-bit "serial number" that the sending system generates whenever it creates a datagram. Whenever a packet gets fragmented, all of the resulting fragments use the original datagram's Fragmentation Identifier, and the destination system uses this information to collect all of the fragments together and then reassemble the datagram into its original form.

In addition, two fields within each of the fragments' IP headers will also be set, to reflect the fact that fragmentation has occurred. The fields that get set are the Fragmentation Offset and a Fragment Flags field (the latter is used to provide ordering and reassembly clues to the destination system).

Fragmentation Offset

This field is used to indicate the byte-range of the original datagram that a specific fragment provides. However, only the starting position of the byte-range is provided in this field (the remainder of the packet is assumed to contain the rest of that fragment). This starting position is stored in terms of eight-byte (64-bit) blocks of data. The Fragmentation Offset identifier allows the receiving system to re-order the fragments into their proper sequence once all of the fragments have arrived.

Fragment Flags

This field provides clues as to the current fragmentation status (if any). There are three one-bit flags, although only the last two are currently used. The first bit is reserved for future use and must always be set to 0. The second bit indicates whether or not fragmentation is allowed (0 means fragmentation is allowed and 1 means do not fragment). The third and final bit is used to indicate whether a current fragment is the last (0), or if more fragments will follow this one (1).
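As a rough sketch, the two fields can be pulled apart as follows. This assumes the RFC 791 header layout, in which the three flag bits and a 13-bit offset share a single 16-bit word; the function and key names are hypothetical.

```python
def decode_fragment_word(word):
    """Split the 16-bit flags/offset word taken from an IP header."""
    offset_blocks = word & 0x1FFF  # low 13 bits, counted in 8-byte blocks
    return {
        "reserved": (word >> 15) & 1,        # must always be 0
        "dont_fragment": (word >> 14) & 1,   # 1 means do not fragment
        "more_fragments": (word >> 13) & 1,  # 1 means more fragments follow
        "offset_blocks": offset_blocks,
        "offset_bytes": offset_blocks * 8,
    }


fields = decode_fragment_word(0x20B9)
print(fields["more_fragments"], fields["offset_blocks"], fields["offset_bytes"])
# 1 185 1480
```

The offset is multiplied by eight to recover the byte position, which is why fragment data must always start on an eight-byte boundary.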

In addition to these changes, the Total Packet Length field for each of the newly minted IP packets also gets set according to the size of the fragments (rather than the size of the original datagram).

The resulting IP packets are then sent over the Internet as independent entities, just as if they had originally been created that way. Fragments are not reassembled until they reach the destination system. Once they reach the final destination, however, they are reassembled by the IP software running on the destination system, where they are combined back into their original datagram form. Once the original datagram has been reassembled, the IP datagram's data is forwarded to the appropriate transport protocol for subsequent processing.

There are a few rules that you must remember when trying to understand how IP fragments get created:

Fragmentation only occurs on the data portion of a packet.

Packet headers are not included in the fragmentation process. If the original datagram is 4,464 bytes long, then at least 20 bytes of that datagram are being used to store header information, meaning that the data portion is 4,444 bytes long. This 4,444 bytes is what will get fragmented.

Each new fragment results in a new packet that requires its own IP headers, which consume at least 20 bytes in each new packet generated for a fragment. The IP software must take this factor into consideration when it determines the maximum amount of payload data that can be accommodated in each fragment, and thus the number of fragments that will be required for a particular MTU.

Fragmentation must occur on an eight-byte boundary. If a datagram contains 256 bytes of data, but only 250 bytes can fit into a fragment, then the first fragment contains only 248 bytes of data (248 is the largest number divisible by eight that's less than 250). The remaining 8 bytes (256 - 248 = 8) will be sent in the next fragment.

The Fragmentation Offset field is used to indicate which parts of the original datagram are in each fragment, by storing the byte count in quantities of eight-byte blocks. Rather than indicating that the starting position for a fragment's data is "248 bytes," the Fragmentation Offset field will show "31 blocks" (248 / 8 = 31). Also, note that the block count starts with 0 and not 1. This means that the 32nd block will be numbered 31 instead of 32.

As shown in Figure 2-7, in order for the original 4,464-byte IP datagram to be sent across the Ethernet network segment, four IP fragments will have to be created. Each of the new packets will contain an IP header (copied from the original datagram's header), plus however much data they could carry (although the quantity has to be divisible by eight). The result is four unique fragments, as shown in Figure 2-8.

The relevant fields from the original IP packet are shown in Table 2-6.

Table 2-6: Headers from the Original 4,464-byte Packet

| Fragment | Fragment Identifier | Reserved Flag | May Fragment Flag | More Fragments Flag | Fragment Offset | Packet Length |
|---|---|---|---|---|---|---|
| 1 | 321 | 0 | 0 | 0 | 0 | 4,464 |

Figure 2-8. The mathematics of datagram fragmentation

After converting the single 4,464-byte IP packet into four IP fragments that fit the 1500-byte MTU, the headers of each fragment will appear as shown in Table 2-7.

Table 2-7: Headers from Four 1500-byte Fragments

| Fragment | Fragment Identifier | Reserved Flag | May Fragment Flag | More Fragments Flag | Fragment Offset | Packet Length |
|---|---|---|---|---|---|---|
| 1 | 321 | 0 | 0 | 1 | 0 | 1,500 |
| 2 | 321 | 0 | 0 | 1 | 185 | 1,500 |
| 3 | 321 | 0 | 0 | 1 | 370 | 1,500 |
| 4 | 321 | 0 | 0 | 0 | 555 | 24 |

Each of the fragments contains the following header information:

Each fragment belongs to the same original datagram, so each of them shares the same "serial number" in the Fragmentation Identifier field (321 in this case).

The first bit in the 3-bit Flags field is reserved, and must be marked 0.

Each packet may be fragmented further, so the "May Fragment" flags are marked 0.

The "More Fragments" flag is used to indicate if more fragments are following after this fragment. Since the first three fragments all have another fragment coming behind them, they all have the More Fragments flag marked 1, while the last fragment identifies the end of the set by having a 0 in this field.

Since the first fragment marks the beginning of the original data, its Fragment Offset field is set to 0. Since the first fragment held 1,480 bytes of data, the second fragment has its Fragment Offset field set to 185 (1480 / 8 = 185). The second fragment was also able to store 1,480 bytes, so the Fragment Offset field for the third fragment is set to 370 ((1480 × 2) / 8 = 370). The third fragment was also able to hold 1,480 bytes, so the fourth fragment's Fragment Offset field is set to 555 ((1480 × 3) / 8 = 555).

In addition, each new IP packet created during the fragmentation process will also have its Total Packet Length field set to the size of the resulting IP packets, rather than set to the size of the original IP datagram.
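The arithmetic behind Table 2-7 can be reproduced with a short sketch. It assumes a 20-byte header with no IP options, and the function name fragment is hypothetical; it computes only the header values, not the packets themselves.

```python
def fragment(total_length, mtu, header_length=20):
    """Return (offset_in_blocks, more_fragments, packet_length) per fragment."""
    data_length = total_length - header_length
    # Largest payload that fits in the MTU and is a multiple of eight bytes.
    max_payload = (mtu - header_length) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU is too small to carry any fragment data")
    fragments = []
    sent = 0
    while sent < data_length:
        payload = min(max_payload, data_length - sent)
        more = 1 if sent + payload < data_length else 0
        fragments.append((sent // 8, more, header_length + payload))
        sent += payload
    return fragments


for row in fragment(4464, 1500):
    print(row)
# (0, 1, 1500)
# (185, 1, 1500)
# (370, 1, 1500)
# (555, 0, 24)
```

The last fragment is only 24 bytes long because just 4 bytes of data remain after three 1,480-byte payloads (4,444 - 4,440 = 4), plus the 20-byte header.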

In order for the destination system to reassemble the datagram, it must read the fragmentation-specific headers in each of the fragments as they arrive and order them into their correct sequence (as indicated by the Fragment Offset field). Since each fragment may arrive out of sequence (due to a slower link, a down segment, or whatever), the destination system has to store each fragment in memory until all of them have arrived before they can be rearranged and the data processed.

Once all of the fragments have been received, the system will examine their headers and find the fragment whose Fragment Offset is 0. The IP software will then read the data portion of the IP packet containing that fragment, recording the number of eight-byte blocks that it finds. Then it will locate the fragment that shows the Fragment Offset needed to continue reading the data, and then read that fragment's data into memory. This process will continue until all of the data has been read from all of the packets. Once a packet has been read that has the "More Fragments" flag set to 0--and if each of the Fragment Offset fields matches up without leaving any holes in the final datagram--then the process is complete.
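The reassembly walk can be sketched as follows. This is a simplified illustration: it assumes the fragments have already been matched by their Fragmentation Identifier, it ignores the reassembly timer, and it treats duplicate or overlapping fragments as a failure rather than handling them.

```python
def reassemble(fragments):
    """Rebuild a datagram's data from (offset_in_blocks, more_fragments, data).

    Returns the data as bytes, or None if a fragment is still missing.
    """
    ordered = sorted(fragments, key=lambda frag: frag[0])
    data = b""
    for offset_blocks, _more_fragments, payload in ordered:
        if offset_blocks * 8 != len(data):
            return None  # a hole: an earlier fragment has not arrived yet
        data += payload
    if not ordered or ordered[-1][1] != 0:
        return None  # the final fragment has not arrived yet
    return data


original = bytes(i % 251 for i in range(4444))
arrived = [
    (370, 1, original[2960:4440]),
    (0, 1, original[0:1480]),
    (555, 0, original[4440:]),
    (185, 1, original[1480:2960]),
]
print(reassemble(arrived) == original)  # True
```

Sorting by offset is what lets the fragments arrive in any order, while the length comparison catches any hole before the data is handed to the transport protocol.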

If all of the fragments do not arrive within the predefined time (normally 60 seconds on most Unix-like systems), then all of the fragments will be destroyed, and an error message will be sent to the original sender, using the ICMP "Time Exceeded" error message. For more information on this error message, refer to "Time Exceeded" in Chapter 5.

This process can get fairly tricky, and it may seem like an awful lot of overhead. However, there are many benefits offered by fragmentation. First and foremost, fragmentation allows IP to use whatever packet sizes are required by the underlying medium. Furthermore, any traffic that is local to your own network probably won't require fragmentation, so you can use large packets on your local network. If IP were forced to use a lowest-common-denominator approach of very small packets for all data, then local performance would always be miserable. But by using a flexible MTU size, the local network can run at full speed, with fragmentation only occurring whenever large datagrams must leave the local network.

TIP: RFC 791 states that all hosts must be prepared to accept an IP datagram of up to 576 bytes. Indeed, many of the early IP routers required that IP datagrams be cut into 576-byte fragments if they were to be forwarded over a different medium (regardless of that medium's MTU capacity).

In addition, there are some techniques that can be used by a sending system to determine the most efficient segment size when sending data to a remote network, thereby preventing fragmentation from occurring. TCP connections use a "Maximum Segment Size" header option that can be used to determine the MTU of the remote network, and most IP systems implement a technology called "Path MTU Discovery" that allows them to detect the largest available MTU on the end-to-end connection. For more information on the Maximum Segment Size option, refer to "Maximum Segment Size" in Chapter 7, The Transmission Control Protocol. For more information on Path MTU Discovery, refer to "Notes on Path MTU Discovery" in Chapter 5.
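The goal of both techniques can be shown with a small calculation. This sketch does not implement Path MTU Discovery itself; it only shows the value that such a mechanism is trying to find. The link list is a made-up path built from the MTUs in Table 2-5, and the 20-byte TCP header size assumes that no TCP options are in use.

```python
def path_mtu(link_mtus):
    """The largest packet that can cross every link without fragmentation."""
    return min(link_mtus)


def tcp_segment_size(mtu, ip_header=20, tcp_header=20):
    """Data bytes left after minimal IP and TCP headers (no options)."""
    return mtu - ip_header - tcp_header


# Hypothetical path: Token Ring server -> X.25 WAN link -> Ethernet client.
links = [4464, 576, 1500]
print(path_mtu(links))                    # 576
print(tcp_segment_size(path_mtu(links)))  # 536
```

A sender that limits itself to 536 bytes of TCP data per segment on this path would never trigger fragmentation at the X.25 link.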

Prioritization and Service-Based Routing

One of the key differences between IP and other networking protocols is that IP offers direct support for prioritization, allowing network hosts and routers to send important packets before less important packets. This feature is particularly crucial with applications that are sensitive to high levels of delay resulting from network congestion.

For example, assume that an organization has two high-speed networks that are interconnected by a relatively slow wide area network (WAN), and that a lot of data has to cross the WAN frequently. In this example, the routers could forward data across the WAN only at whatever rate was allowed by the WAN itself. If the WAN were fixed at a maximum throughput of 256 Kb/s, then the routers on the WAN could only send 262,144 bits across the WAN in a single second. This may be plenty of bandwidth for a few terminal emulation sessions--or even for a couple of simultaneous database updates--but it would not be enough for several simultaneous streaming video feeds in conjunction with those other applications.

The problem is that the routers just wouldn't be able to forward enough data across the WAN for all of the applications to work smoothly. The routers would have to start dropping packets once their buffers began filling up or as the queuing delays exceeded the maximum Time-to-Live values on some of the packets. UDP-based applications may not care much about these dropped packets, but TCP-based applications care very much about lost packets. They would attempt to resend any data that had not yet been acknowledged, and if congestion was sustained for a long period of time, then those applications would eventually just time out.

This may not matter with some applications, but it would be a very big deal with some others, particularly those that are crucial to the operation of the business itself. For example, if users were unable to enter sales orders into a remote database, the problem would be somewhat greater than if they were unable to access a recreational video.

In order to ensure that congestion doesn't break the mission-critical applications on your network, IP supports two key concepts: prioritization and type-of-service handling. Every IP datagram has an 8-bit field (called the "TOS byte") that consists of a three-bit precedence field used for prioritization and a four-bit field that indicates specific handling characteristics desired for a datagram (the last bit is currently unused).

By using three bits for precedence, IP provides eight distinct priority levels (0 through 7) for all IP traffic. Table 2-8 lists the values of the Precedence field and their meaning as defined in RFC 791, with the highest priority level being 7 and the lowest being 0.

Table 2-8: The Precedence Flags and Their Meaning

| Precedence | Definition |
|---|---|
| 0 | Routine (normal) |
| 1 | Priority |
| 2 | Immediate |
| 3 | Flash |
| 4 | Flash Override |
| 5 | Critical |
| 6 | Internetwork Control |
| 7 | Network Control |

Using these priority values, you could assign database applications a higher priority level than the streaming video traffic. The routers would then sift through data that was waiting in the queue, sending the higher priority traffic before sending the lower priority traffic. In this model, the database traffic would be sent out first, while the streaming video traffic would be forced to wait until bandwidth was available. Your mission-critical applications would continue to function smoothly, while the less-critical applications would take a back seat, possibly suffering dramatic performance losses.
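One simple way to picture this queuing behavior is a strict-priority queue, sketched below with Python's heapq module. Strict priority is an assumption made for illustration; real routers use a variety of scheduling schemes, and the class name and packet labels here are hypothetical.

```python
import heapq
import itertools


class PrecedenceQueue:
    """Outbound queue that releases the highest-precedence packet first."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # keeps arrival order within a level

    def enqueue(self, precedence, packet):
        # heapq is a min-heap, so negate the precedence to pop 7 before 0.
        heapq.heappush(self._heap, (-precedence, next(self._arrival), packet))

    def dequeue(self):
        _, _, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self):
        return len(self._heap)


queue = PrecedenceQueue()
queue.enqueue(0, "video-1")
queue.enqueue(2, "database-1")
queue.enqueue(0, "video-2")
queue.enqueue(2, "database-2")
order = [queue.dequeue() for _ in range(len(queue))]
print(order)  # ['database-1', 'database-2', 'video-1', 'video-2']
```

The database packets leave first even though a video packet arrived before them, which is exactly the trade-off described above: the lower-priority traffic waits for whatever bandwidth is left.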

The remaining four bits of the TOS byte provide administrators with the ability to implement per-datagram routing based on the characteristics of the datagram's data. Thus, an IP datagram that contains Usenet news traffic can be marked as desiring a "low-cost" service, while Telnet traffic can be marked as desiring a "low-latency" service.

Originally, there were only three types of service defined in RFC 791. These services were identified with unique bits that were either on or off, depending on whether or not the specific type of service was desired. However, this interpretation was modified by RFC 1349, which added a fourth service class, and which also stated that the bits were to be interpreted as numeric values rather than independent flags. By making them numeric, the four bits provided for a maximum of sixteen possible values (0 through 15), rather than four distinct options (although the values cannot be combined and must be used independently).
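The layout of the TOS byte can be sketched as follows, using the bit positions described earlier: three precedence bits, then the four-bit service value, then one unused bit. The function names are hypothetical, and the precedence names are the ones listed in Table 2-8.

```python
PRECEDENCE_NAMES = [
    "Routine", "Priority", "Immediate", "Flash",
    "Flash Override", "Critical", "Internetwork Control", "Network Control",
]


def encode_tos(precedence, service):
    """Pack a precedence level (0-7) and a numeric service value (0-15)."""
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must be 0 through 7")
    if not 0 <= service <= 15:
        raise ValueError("service must be 0 through 15")
    return (precedence << 5) | (service << 1)  # the last bit stays 0


def decode_tos(tos_byte):
    """Split a TOS byte back into its precedence and service fields."""
    precedence = (tos_byte >> 5) & 0x07
    return {
        "precedence": precedence,
        "precedence_name": PRECEDENCE_NAMES[precedence],
        "service": (tos_byte >> 1) & 0x0F,
        "unused": tos_byte & 0x01,
    }


print(hex(encode_tos(5, 0)))                # 0xa0
print(decode_tos(0xA0)["precedence_name"])  # Critical
```

Because the service value is read as a single four-bit number, only one service can be requested per datagram, which matches the rule that the values cannot be combined.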

There are a number of predefined Type-of-Service values that are registered with the Internet Assigned Numbers Authority (IANA). Some of the more common registered values are shown in Table 2-9.

For a detailed listing of all of the Type-of-Service values that are currently registered, refer to the IANA's online registry (accessible at http://www.isi.edu/in-notes/iana/assignments/ip-parameters).

Table 2-9: Type-of-Service Values and Their Meaning

| Value | Service | Description |
|---|---|---|
| 0 | Normal | When all of the Type-of-Service flags are off, the IP datagram is to be treated as a normal datagram, and is not to be given any special handling. Almost all IP datagrams are marked with all zeroes in the Type-of-Service field. |
| 1 | Minimize Delay | The Delay flag is used to request that IP route this packet over a network that provides lower latency than normal. This may be useful for an application such as Telnet, where the user would want to see their keystrokes echoed back to them quickly. The Delay flag may be set to either 0 (normal) or 1 (low delay). |
| 2 | Maximize Throughput | The Throughput flag is used to request that IP route this packet over a network that provides higher throughput than normal. This may be useful for an application such as FTP, where the user would want to download a lot of data very quickly. The Throughput flag may be set to 0 (normal) or 1 (high throughput). |
| 4 | Maximize Reliability | The Reliability flag is used to request that IP route this packet over a network that provides the most reliable service (perhaps as indicated by overall up-time, or by the number of secondary routes). This may be useful for an application such as NFS, where the user would want to be able to open a database on a remote server without worrying about a network failure. The Reliability flag may be set to 0 (normal) or 1 (high reliability). |
| 8 | Minimize Cost | The Cost flag was added by RFC 1349 and was not defined in RFC 791. For this reason, many systems do not recognize or use it. The Cost flag is used to request that IP route this packet over the least expensive route available. This may be useful for an application such as NNTP news, where the user would not need data very quickly. The Cost flag may be set to 0 (normal) or 1 (low cost). |
| 15 | Maximize Security | RFC 1455--an experimental specification for data-link layer security--states that this flag is used to request that IP route this packet over the most secure path possible. This may be useful with applications that exchange sensitive data over the open Internet. Since RFC 1455 is experimental, most vendors do not support this setting. |

In addition, the IANA's online registry also defines a variety of default Type-of-Service values that specific types of applications should use. Some of the more common application protocols and their suggested Type-of-Service values are shown in Table 2-10. For a detailed listing of all of the suggested default Type-of-Service values, refer to the IANA's online registry (accessible at http://www.isi.edu/in-notes/iana/assignments/ip-parameters).

Table 2-10: Suggested Type-of-Service Values for Common Application Protocols

| Application Protocol | Suggested TOS Value |
|---|---|
| Telnet | 8 |
| FTP Control Channel | 8 |
| FTP Data Channel | 4 |
| Trivial FTP | 8 |
| SMTP Commands | 8 |
| SMTP Data | 4 |
| DNS UDP Query | 8 |
| DNS TCP Query | 0 |
| DNS Zone Transfer | 4 |
| NNTP | 1 |
| ICMP Error Messages | 0 |
| SNMP | 2 |

It is important to note that not all of the TCP/IP products on the market today use these values. Indeed, many implementations do not even offer any mechanisms for setting these values, and will not treat packets that are flagged with these values any differently than packets that are marked for "normal" delivery. However, most of the Unix variants on the market today (including Linux, BSD, and Digital Unix) do support these values, and set the appropriate suggested default values for each of the major applications.

Administrators who have complex networks with multiple routing paths can use these Type-of-Service flags in conjunction with TOS-aware routers to provide deterministic routing services across their network. For example, an administrator might wish to send low-latency datagrams through a terrestrial fiber-optic connection rather than through a satellite link. Conversely, an administrator might wish to send a low-cost datagram through a slower (but fixed-cost) connection, rather than take up bandwidth on a satellite connection.

By combining the Type-of-Service flags with the prioritization bits, it is possible to dictate very explicit types of behavior with certain types of data. For example, you could define network filters that mark all Lotus Notes packets as medium priority and tag them with the low-latency TOS flag. This would not only provide your Notes users with preferential service over less-critical traffic, but it would also cause that traffic to be routed over faster network segments. Conversely, you could also define another set of filters that mark all streaming video traffic as lower priority and enable the high-bandwidth TOS flag, forcing that traffic to use a more appropriate route.
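As a rough illustration of how the two groups of bits combine, the Python sketch below packs a precedence value and a Type-of-Service value into a single byte. The function name `make_tos_byte` is made up for this example, and precedence 2 is only an assumed stand-in for "medium priority"; the value 8 is the low-latency setting that Table 2-10 suggests for Telnet.

```python
def make_tos_byte(precedence, tos):
    """Pack precedence (3 bits) and Type-of-Service (4 bits) into one byte.

    Layout, most significant bit first: PPP TTTT 0
    (three precedence bits, four TOS bits, one unused bit that stays 0).
    """
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must fit in three bits (0-7)")
    if not 0 <= tos <= 15:
        raise ValueError("Type-of-Service must fit in four bits (0-15)")
    return (precedence << 5) | (tos << 1)


# Assumed marking for the Notes example: precedence 2 plus low-latency value 8
notes_marking = make_tos_byte(2, 8)
print(f"{notes_marking:#04x}")
```

A filter on a router or host would write a byte like this into the header; whether anything downstream honors it is a separate question, as the next paragraph explains.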

As long as you own the end-to-end connection between the source and destination systems, you can pretty much do whatever you want with these flags, and you should be able to queue and route those datagrams according to the flags that you set. Keep in mind, however, that most ISPs will not treat these datagrams any differently from unmarked datagrams (otherwise, you'd mark all of your packets with the high-priority and minimize-latency flags). Indeed, if you need a certain type of service from an ISP, then you will most likely end up paying for a dedicated link between your site and the destination network, since you will not be able to have your datagrams prioritized over other customers' packets across the ISP's backbone.

The IP Header

IP datagrams consist of two basic components: an IP header that dictates how the datagram is treated and a body part that contains whatever data is being passed between the source and destination systems.

An IP datagram is made up of at least thirteen fields, with twelve fields being used for the IP header, and one field being used for data. In addition, there are also a variety of supplemental fields that may show up as "options" in the header. The total size of the datagram will vary according to the size of the data and the options in use.

Table 2-11 lists all of the mandatory fields in an IP header, along with their size (in bits) and some usage notes. For more detailed descriptions of these fields, refer to the individual sections throughout this chapter.

Table 2-11: The Fields in an IP Datagram

| Field | Bits | Usage Notes |
|---|---|---|
| Version | 4 | Identifies the version of IP used to create the datagram. Every device that touches this datagram must support the version shown in this field. Most TCP/IP products use IP v4. NOTE: This book only covers IP v4. |
| Header Length | 4 | Specifies the length of the IP header in 32-bit multiples. Since almost all IP headers are 20 bytes long, the value of this field is almost always 5 (5 × 32 = 160 bits, or 20 bytes). |
| Type-of-Service Flags | 8 | Provide a prioritization service to applications, hosts, and routers on the Internet. By setting the appropriate flags in this field, an application could request that the datagram be given higher priority than others waiting to be processed. |
| Total Packet Length | 16 | Specifies the length of the entire IP packet, including both the header and the body parts, in bytes. |
| Fragment Identifier | 16 | Identifies a datagram, useful for combining fragments back together when fragmentation has occurred. |
| Fragmentation Flags | 3 | Identifies certain aspects of any fragmentation that may have occurred, and also provides fragmentation control services, such as instructing a router not to fragment a packet. |
| Fragmentation Offset | 13 | Indicates the byte-range of the original IP datagram that this fragment provides, as measured in eight-byte offsets. |
| Time-to-Live | 8 | Specifies the remaining number of hops a datagram can take before it must be considered undeliverable and be destroyed. |
| Protocol Identifier | 8 | Identifies the higher-layer protocol stored within the IP datagram's body. |
| Header Checksum | 16 | Used to store a checksum of the IP header. |
| Source IP Address | 32 | Used to store the 32-bit IP address of the host that originally sent this datagram. |
| Destination IP Address | 32 | Used to store the 32-bit IP address of the final destination for this datagram. |
| Options (optional) | varies | Just as IP provides some prioritization services with the Type-of-Service flags, additional special-handling options can also be defined using the Options field. Special-handling options include Source Routing, Timestamp, and others. These options are rarely used, and are the only thing that can cause an IP header to exceed 20 bytes in length. |
| Padding (if required) | varies | An IP datagram's header must be a multiple of 32 bits long. If any options have been introduced to the header, the header must be padded so that it is divisible by 32 bits. |
| Data | varies | The data portion of the IP packet. Normally, this would contain a complete TCP or UDP message, although it could also be a fragment of another IP datagram. |

As can be seen, the minimum size of an IP header is 20 bytes. If any options are defined, then the header's size will increase (up to a maximum of 60 bytes). RFC 791 states that a header must be divisible by 32 bits, so if an option has been defined, but it only uses eight bits, then another 24 zero-bits must be added to the header using the Padding field, thereby making the header divisible by 32.
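The padding arithmetic described above can be sketched in a few lines of Python. This is only an illustration; the function name `padded_header_length` is invented for this example and is not part of any IP implementation.

```python
def padded_header_length(option_bytes):
    """Return (padding_bytes, header_length_bytes) for a given options size."""
    if not 0 <= option_bytes <= 40:
        raise ValueError("options can occupy at most 40 bytes")
    # Bytes of zero padding needed to reach the next 32-bit (4-byte) boundary
    padding = (-option_bytes) % 4
    # 20 bytes of mandatory fields, plus the options, plus the padding
    return padding, 20 + option_bytes + padding


# A single 8-bit option needs 3 bytes (24 bits) of padding
print(padded_header_length(1))
```

With a one-byte option the header grows from 20 to 24 bytes, so the Header Length field would change from 5 to 6.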

Figure 2-9 shows an IP packet containing an ICMP Echo Request Query Message, sent from Ferret to Bacteria. It does not show any advanced features whatsoever.

Figure 2-9. A simple IP packet

The following sections discuss the individual fields in detail.

Version

Identifies the version of IP that was used to create the datagram. Most TCP/IP products currently use IP v4, although IP v6 is gaining acceptance. NOTE: This book only covers IP v4.

Size

Four bits.

Notes

Since the datagram may be sent over a variety of different devices on the way to its final destination, all of the intermediary systems (as well as the destination) must support the same version of IP as the one used to create the datagram in the first place. As features are added to, removed from, or modified in IP, the datagram header structures will change. By using the Version field, these changes can be made without having to worry about how the different systems in use will react. Without the Version field, there would be no way to identify changes to the basic protocol structure, which would result in a frozen specification that could never be changed.

Almost all TCP/IP products currently use IP v4, which is the latest "standard" version. However, a new version, IP v6, is rapidly gaining supporters and acceptance in the Internet community. It should also be pointed out that IP v4 is the first "real" version of IP, since prior versions were only drafts that were not widely deployed. NOTE: This book only covers IP v4.

Capture Sample

In the capture shown in Figure 2-10, the Version field is set to 4, indicating that this packet contains an IP v4 datagram.

Figure 2-10. The Version field

Header Length

Specifies the size of the IP header, in 32-bit multiples.

Size

Four bits.

Notes

The primary purpose of this field is to inform a system where the data portion of the IP packet starts. Because the field is only four bits wide, its value is expressed in 32-bit multiples. Thus, 20 bytes is the same as 160 bits, which would be shown here as 5 (5 × 32 = 160). Since each of the header's mandatory fields is fixed in size, the smallest this value can be is 5.

If all of the bits in this field were "on," the maximum value would be 15. Thus, an IP header can be no larger than 60 bytes (15 × 32 bits = 480 bits = 60 bytes).
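Since the Version and Header Length fields are four bits each, they share the first byte of the header. A minimal Python sketch of pulling them apart follows; `split_first_byte` is a hypothetical helper, and the sample byte 0x45 simply combines the version (4) and header length (5) values used in this chapter's capture samples.

```python
def split_first_byte(first_byte):
    """Split the first header byte into (version, header_length_in_bytes)."""
    version = first_byte >> 4          # upper four bits
    ihl_words = first_byte & 0x0F      # lower four bits, counted in 32-bit words
    if ihl_words < 5:
        raise ValueError("a Header Length below 5 cannot hold the mandatory fields")
    return version, ihl_words * 4      # each 32-bit word is four bytes


print(split_first_byte(0x45))
```

The same function applied to 0x4F would report the 60-byte maximum.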

Capture Sample

In the capture shown in Figure 2-11, the Header Length field is set to 5, indicating that this packet has a 20-byte header (160 bits / 32 bits = 5), which is the default size when no options are defined.

Figure 2-11. The Header Length field

See Also

"IP Options"

"Padding"

"Total Packet Length"

Type-of-Service Flags

Provides prioritization capabilities to the IP datagrams, which are then acted upon by the applications, hosts, and routers that can take advantage of them. By setting these fields appropriately, an application could request that the datagrams it generates get preferential service over other datagrams waiting to get processed.

Size

Eight bits.

Notes

Although the Type-of-Service flags have been available since IP v4 was first published, there are only a handful of applications that actually use them today. Furthermore, only a few IP software packages and routers support them, making their use by applications somewhat moot. However, as more multimedia applications and services are being deployed across the Internet, the use of Type-of-Service flags has increased dramatically, and should continue to do so.

Effectively, the Type-of-Service field is divided into two separate groups of flags. The first three bits are used to define Precedence, while the remaining five bits hold the Type-of-Service options (four bits) and a single unused bit.

The Precedence flags are used to determine a datagram's priority over other datagrams waiting to be processed by a host or router. The Precedence flag uses three bits, allowing it to be set from 0 (normal) to 7 (highest priority). Table 2-8 earlier in this chapter shows the precedence values and their meanings, as defined in RFC 791.

The next four bits are used to indicate various other Type-of-Service options. In RFC 791, only three bits were used to define Type-of-Service handling characteristics. However, the usage and implementation of these bits has been redefined in RFC 1349, with four bits being used to represent a numeric value ranging from 0 (normal datagrams) to 15 (highly secure path requested). The currently-defined values for these flags and their meanings are listed back in Table 2-9.

The last bit from this byte is currently unused and must be zero (0). RFC 791 states that the last two bits are unused, although RFC 1349 added the Minimize Cost Type-of-Service flag, which used up one of them.
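The three-four-one split of the byte can be illustrated by decoding a received value. The sketch below is a reading of the RFC 1349 layout described above, not code from any particular IP stack, and `decode_tos_byte` is a name chosen for this example.

```python
def decode_tos_byte(tos_byte):
    """Return (precedence, type_of_service, unused_bit) from the 8-bit field."""
    precedence = tos_byte >> 5               # first three bits
    type_of_service = (tos_byte >> 1) & 0x0F  # next four bits, read as a number
    unused_bit = tos_byte & 0x01             # last bit, expected to be zero
    return precedence, type_of_service, unused_bit


# 0x10 carries precedence 0 and Type-of-Service value 8 (Telnet's suggested value)
print(decode_tos_byte(0x10))
```

A byte of all zeroes decodes to (0, 0, 0), which is what the great majority of datagrams carry.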

Capture Sample

In the capture shown in Figure 2-12, no precedence or special-handling flags have been defined. Also note that Surveyor does not show the Minimize Cost flag, and most products don't understand it.

Figure 2-12. The Type-of-Service flags

See Also

"Prioritization and Service-Based Routing"

"IP Options"

"Notes on Precedence and Type-of-Service"

Total Packet Length

Specifies the length of the entire IP packet, including both the header and data segments, in bytes.

Size

Sixteen bits.

Notes

The primary purpose of this field is to inform a system of where the packet ends. A system can also use this field to determine how long the data portion of the packet is, by subtracting the header's length in bytes (the Header Length value multiplied by four) from the Total Packet Length.

The latter service is especially useful when fragmentation has occurred. Whenever a fragment indicates that another packet is following (set with the "More Fragments" flag), the system will add the value provided in the current fragment's Fragmentation Offset field to the length of the current fragment's data segment. The resulting value will then be used to determine which fragment should be read next (discovered by examining the values stored in the Fragmentation Offset field of the remaining associated fragments). By combining the Fragmentation Offset and Total Packet Length fields from each of the fragments that are received, the recipient can determine if there are any holes in the original datagram that need to be filled before it can be processed.

The minimum size of an IP packet is 21 bytes (20 bytes for the header, and 1 byte of data). The maximum size is 65,535 bytes.
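The hole-finding logic described in these notes might be sketched as follows. This is a simplified illustration rather than a real reassembly routine: the tuple layout and the name `find_holes` are assumptions made for this example, and details such as overlapping fragments and reassembly timers are ignored.

```python
def find_holes(fragments):
    """Return (start, end) byte ranges of the original data still missing.

    Each fragment is a tuple:
        (total_length, header_words, offset_units, more_fragments)
    The final range has end=None while the last fragment
    (More Fragments = 0) has not arrived yet.
    """
    spans = []
    last_seen = False
    for total_length, header_words, offset_units, more_fragments in fragments:
        start = offset_units * 8                        # offsets count 8-byte units
        data_length = total_length - header_words * 4   # header length is in 32-bit words
        spans.append((start, start + data_length))
        if not more_fragments:
            last_seen = True

    holes = []
    expected = 0
    for start, end in sorted(spans):
        if start > expected:                 # gap before this fragment's data
            holes.append((expected, start))
        expected = max(expected, end)
    if not last_seen:                        # the tail of the datagram is unknown
        holes.append((expected, None))
    return holes


# Only the second of two 64-byte fragments has arrived (20-byte headers)
print(find_holes([(84, 5, 8, 0)]))
```

Once the list of holes comes back empty, every byte of the original datagram's data is accounted for.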

Capture Sample

In the capture shown in Figure 2-13, the Total Packet Length is set to 60 bytes. Twenty of those bytes are used by the IP header, meaning that 40 bytes are used for data.

Figure 2-13. The Total Length field

See Also

"Header Length"

"Fragmentation Offset"

"Fragmentation and Reassembly"

Fragmentation Identifier

A pseudo serial number that identifies the original IP datagram that fragments are associated with.

Size

Sixteen bits.

Notes

Every datagram that gets generated has a 16-bit "serial number" that identifies the datagram to the sending and receiving systems. Although this field is actually a "datagram identifier" of sorts, it is not guaranteed to be unique at all times (16 bits isn't very large), and is really only useful for identifying the datagram that incoming fragments belong to.

When fragmentation occurs, the various fragments are sent as separate IP packets by the fragmenting system, and treated as such until they reach their final destination. The fragments will not be reassembled until they reach their final destination. Once there, however, the destination system must reassemble the fragments into the original IP datagram, and the Fragmentation Identifier field is used for this purpose.

Since this field is only 16 bits long, it does not provide a permanently unique serial number, and over time many packets may arrive with the same Fragmentation Identifier, even though those packets have never been fragmented. For this reason, the receiving system must not use this field to determine whether or not fragmentation has occurred (the Fragmentation Flags must be used for this purpose). Instead, the system must use this field only to collect fragments together when the Fragmentation Flags indicate that fragmentation has occurred somewhere upstream.

Capture Sample

In the capture shown in Figure 2-14, the Fragmentation Identifier (or Datagram Identifier, or Packet Identifier) is shown as 15966.

Figure 2-14. The Fragmentation Identifier field

See Also

"Total Packet Length"

"Fragmentation Flags"

"Fragmentation and Reassembly"

Fragmentation Flags

Identifies certain aspects of any fragmentation that may have occurred. The flags also provide fragmentation control services, such as instructing a router not to fragment a packet.

Size

Three bits.

Notes

There are three bits available in the Fragmentation Flags field. The first bit is currently unused, and must be marked 0. The remaining two bits are used as follows:

May Fragment. The May Fragment flag is used to indicate whether or not an IP router may fragment this IP packet. An application may choose to prevent a datagram from becoming fragmented for any number of reasons. It is important to realize, however, that if an IP router cannot fragment a datagram that is too large to travel over a particular network segment, then the router will destroy the IP datagram. The May Fragment flag can be set to 0 ("may fragment," the preferred default) or 1 ("do not fragment").

More Fragments. The More Fragments flag is used to indicate whether or not there are any other fragments associated with the original datagram. The More Fragments flag can be set to 0 ("last fragment," the default) or 1 ("more fragments are coming"). If an IP datagram has not been fragmented, this flag is set to 0.
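The three flag bits share a 16-bit word with the 13-bit Fragmentation Offset field listed in Table 2-11, so a receiver typically separates them in one step. The following Python sketch shows one way to do that; `parse_flags_and_offset` is a hypothetical name, and the sample values are made up for illustration.

```python
def parse_flags_and_offset(word):
    """Split the 16-bit word holding the 3 flag bits and the 13-bit offset.

    Returns (unused, do_not_fragment, more_fragments, offset_units).
    """
    unused = (word >> 15) & 1            # first bit, must be 0
    do_not_fragment = (word >> 14) & 1   # the May Fragment flag: 1 means "do not fragment"
    more_fragments = (word >> 13) & 1    # 1 means more fragments are coming
    offset_units = word & 0x1FFF         # Fragmentation Offset, in 8-byte units
    return unused, do_not_fragment, more_fragments, offset_units


print(parse_flags_and_offset(0x0000))  # an unfragmented datagram that may be fragmented
print(parse_flags_and_offset(0x4000))  # "do not fragment" requested
print(parse_flags_and_offset(0x2000))  # first fragment, more to follow
```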

Capture Sample

In the capture shown in Figure 2-15, the More Fragments flag is set to 0, indicating that this packet has not been fragmented.

Figure 2-15. The Fragmentation Flags field

See Also

"Total Packet Length"

"Fragmentation Identifier"

"Fragmentation and Reassembly"

Fragmentation Offset

Indicates the starting byte position of the original IP datagram's data that this fragment provides, in 8-byte multiples.

Size

Thirteen bits.

Notes

The first fragment's Fragmentation Offset will always be set to 0, indicating that the fragment contains the first byte of the original datagram's data.

The Fragmentation Offset field is used by the final destination system to figure out which fragment goes where in the reassembly process. Since there are no fields that provide a "fragment sequence number," the destination system must use this field in conjunction with the Total Packet Length field and the More Fragments flag.

For example, let's assume that an IP datagram's data has been split into two 64-byte fragments. The first fragment's IP header will show a Fragmentation Offset of 0, indicating that it contains the first few bytes of the original IP datagram's data. After subtracting the value of the Header Length field from the Total Packet Length, the IP software will be able to determine that the fragment's data is 64 bytes long. In addition, the More Fragments flag will be set to 1, indicating that more fragments are coming.

The next fragment will then show a Fragmentation Offset of 64 bytes, although this will be provided in an 8-byte multiple so the Fragmentation Offset field would actually show the value of 8. After subtracting the Header Length value from the Total Packet Length value, the IP software will determine that the fragment's data is 64 bytes long. Finally, the More Fragments flag will be set to 0, indicating that this fragment is the last.

By using all of these fields and flags together, the IP software is able to reassemble datagrams in their correct order.

Note that if an IP datagram has not been fragmented, the Fragmentation Offset field should be set to 0, and the More Fragments flag should also be set to 0, indicating that this packet is both the first and the last fragment.
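The two-fragment example above can be reproduced from the sending side with a short Python sketch. It only models the offset and flag values, not the full headers, and `fragment_payload` and its `max_payload` parameter are names invented for this illustration.

```python
def fragment_payload(data, max_payload):
    """Split data into (offset_units, more_fragments, chunk) tuples."""
    if max_payload < 8:
        raise ValueError("each fragment must carry at least eight bytes")
    # Every chunk except the last must be a multiple of 8 bytes,
    # because the offset field can only count 8-byte units.
    step = max_payload - (max_payload % 8)
    fragments = []
    for start in range(0, len(data), step):
        chunk = data[start:start + step]
        more_fragments = 1 if start + step < len(data) else 0
        fragments.append((start // 8, more_fragments, chunk))
    return fragments


# 128 bytes of data split into two 64-byte fragments
for offset_units, more_fragments, chunk in fragment_payload(bytes(128), 64):
    print(offset_units, more_fragments, len(chunk))
```

Data that already fits within `max_payload` comes back as a single tuple with an offset of 0 and More Fragments set to 0, matching the unfragmented case described above.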

Capture Sample

In the capture shown in Figure 2-16, the Fragmentation Offset field is set to 0 (the first byte of data).

Figure 2-16. The Fragmentation Offset field

See Also

"Total Packet Length"

"Fragmentation Flags"

"Fragmentation and Reassembly"

Time-to-Live

Specifies the maximum number of hops that a datagram can take before it must be considered undeliverable and destroyed.

Size

Eight bits.

Notes

When a source system generates an IP datagram, it places a value between 1 and 255 in the Time-to-Live field. Every time a router forwards the packet, it decreases this value by one. If this value reaches zero before the datagram has reached its final destination, the packet is considered to be undeliverable and is immediately destroyed.

Since this is an 8-bit field, the minimum (functional) value is 1 and the maximum is 255. The value of this field varies by its usage and the specific implementation. For example, RFC 793 (the document that defines TCP) states that the Time-to-Live value should be set at 60, while some applications will set this field to values as high as 128 or 255.
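The per-router handling described above amounts to a decrement and a test. A minimal sketch, with `forward_ttl` as a made-up name and with the error reporting that a real router would perform left out:

```python
def forward_ttl(ttl):
    """Return the Time-to-Live for the forwarded packet, or None if it must be destroyed."""
    if not 0 <= ttl <= 255:
        raise ValueError("Time-to-Live is an 8-bit value")
    remaining = ttl - 1                 # each router decreases the value by one
    return remaining if remaining > 0 else None


print(forward_ttl(32))   # forwarded with the decremented value
print(forward_ttl(1))    # value reaches zero, so the datagram is destroyed
```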

Capture Sample

In the capture shown in Figure 2-17, the Time-to-Live field is set to 32 (which would mean either "32 hops" or "32 seconds").

Figure 2-17. The Time-to-Live field

See Also

"Housekeeping and Maintenance"

Protocol Identifier

Identifies the type of higher-level protocol that is embedded within the IP datagram's data.

Size

Eight bits.

Notes

Remember that IP works only to move datagrams from one host to another, one network at a time. It does not provide much in the way of services to higher-level applications, a function served by TCP and UDP. However, almost every other protocol (including these two transport protocols) uses IP for delivery services.

Normally, the entire higher-level protocol message (including the headers and data) is encapsulated within an IP datagram's data segment. Once the IP datagram reaches its final destination, the receiving system will read the data segment and pass it on to the appropriate higher-level protocol for further processing. This field provides the destination system with a way to identify the higher-layer protocol for which the embedded message is intended.

Table 2-12 lists the four most common protocols, and their numeric identifiers.

Table 2-12: The Most Common Higher-Level Protocols and Their Numeric Identifiers

| Protocol ID | Protocol Type |
|---|---|
| 1 | Internet Control Message Protocol (ICMP) |
| 2 | Internet Group Management Protocol (IGMP) |
| 6 | Transmission Control Protocol (TCP) |
| 17 | User Datagram Protocol (UDP) |

There are a number of predefined protocol numbers that are registered with the Internet Assigned Numbers Authority (IANA). For a comprehensive list of all the upper-layer Protocol Identifier numbers used by IP, refer to the IANA's online registry (accessible at http://www.isi.edu/in-notes/iana/assignments/protocol-numbers).
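A receiving system's use of this field can be pictured as a simple table lookup. The sketch below covers only the four identifiers listed in Table 2-12; `PROTOCOL_NAMES` and `protocol_name` are names chosen for this example, and the byte position follows from the field sizes in Table 2-11 (the nine bytes that precede the Protocol Identifier).

```python
# The four identifiers from Table 2-12; the IANA registry lists many more
PROTOCOL_NAMES = {1: "ICMP", 2: "IGMP", 6: "TCP", 17: "UDP"}


def protocol_name(header):
    """Look up the higher-level protocol named by an IP header."""
    protocol_id = header[9]   # tenth byte of the header
    return PROTOCOL_NAMES.get(protocol_id, f"unknown ({protocol_id})")


# Header bytes assembled from the values in this chapter's capture samples
sample = bytes.fromhex("4500003c3e5e000020010000c0a80a0ac0a81432")
print(protocol_name(sample))
```

A real stack would hand the data segment to the matching protocol handler instead of returning a name.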

Capture Sample

In the capture shown in Figure 2-18, the Protocol Type field is set to 1, indicating that the datagram contains an ICMP message.

Figure 2-18. The Protocol Type field

Header Checksum

Used to store a checksum of the IP header, allowing intermediary devices both to validate the contents of the header and to test for possible data corruption.

Size

Sixteen bits.

Notes

Since some portions of an IP datagram's header must be modified every time it is forwarded across a router, the checksum of the header will change as the datagram gets moved across the Internet (at the very least, the Time-to-Live value should change; at most, fragmentation may occur, introducing additional IP headers, flags, and values). Whenever the header changes, the local system must recalculate the checksum, which RFC 791 defines as the 16-bit ones' complement of the ones' complement sum of all the 16-bit words in the header, and store that value in the Header Checksum field. The next device to receive the IP datagram will then verify that the Header Checksum matches the values seen in the rest of the header. If the values do not agree, the datagram is assumed to have become corrupted and must be destroyed.

Note that the checksum only applies to the values of the IP header and not to the entire IP datagram. This is done for three reasons. First of all, a header is only going to be 20 to 60 bytes in length, while an entire datagram may be thousands of bytes long, so it is much faster to calculate only the header's checksum. Also, since the higher-layer protocols provide their own error-correction routines, the data portion of the datagram will be verified by those other protocols anyway, so it makes little sense to validate the entire datagram when validation will occur at a later stage. Finally, some applications can deal with partially corrupt data on their own, and so IP would be performing a disservice if it threw away corrupt data without ever giving the application a chance to do its job.
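To make the calculation concrete, here is a small Python sketch of the RFC 791 algorithm (the ones' complement of the ones' complement sum of the header's 16-bit words, computed with the checksum field itself set to zero). The sample header is assembled from the field values quoted in this chapter's capture samples; the flag bits are assumed to be all zero, and `header_checksum` is a name invented for this example.

```python
import struct


def header_checksum(header):
    """Compute the IP header checksum over a header given as bytes."""
    if len(header) % 4 != 0:
        raise ValueError("an IP header is always a multiple of 32 bits")
    words = struct.unpack(f"!{len(header) // 2}H", header)  # big-endian 16-bit words
    total = sum(words)
    while total > 0xFFFF:                        # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                       # ones' complement of the sum


# Version 4, Header Length 5, TOS 0, Total Length 60, Identifier 15966,
# flags and offset 0, TTL 32, Protocol 1, checksum field zeroed,
# source 192.168.10.10, destination 192.168.20.50
header = bytes.fromhex(
    "4500003c" "3e5e0000" "2001" "0000" "c0a80a0a" "c0a81432"
)
print(f"{header_checksum(header):04x}")   # prints bcd6
```

Running the same function over a header that already contains a correct checksum yields zero, which is how a receiving device can validate the header without zeroing the field first.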

Capture Sample

In the capture shown in Figure 2-19, the Header Checksum has been calculated as hexadecimal "bc d6", which is correct.

Figure 2-19. The Header Checksum field

See Also

"Housekeeping and Maintenance"

Source IP Address

Identifies the datagram's original sender, as referenced by the 32-bit IP address in use on that system.

Size

Thirty-two bits.

Notes

This field identifies the original creator of the datagram, but does not necessarily identify the device that sent this particular packet.

Capture Sample

In the capture shown in Figure 2-20, the Source Address field is shown here as Ferret, which is 192.168.10.10 (or hexadecimal "c0 a8 0a 0a").

Figure 2-20. The Source Address field

See Also

"Destination IP Address"

Destination IP Address

Identifies the 32-bit IP address of the final destination for the IP datagram.

Size

Thirty-two bits.

Notes

This field identifies the final destination for the datagram, but does not necessarily identify the next router that will receive this particular packet. IP's routing algorithms are used to identify the next hop, which is determined by examining the Destination IP Address and comparing this information to the local routing table on the local system. In order for a packet to be delivered to the final destination system, that system's IP address must be provided in the header and must always remain in the header.

Capture Sample

In the capture shown in Figure 2-21, the Destination Address is shown as Bacteria, which is 192.168.20.50 (or hexadecimal "c0 a8 14 32").

Figure 2-21. The Destination Address field
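The hexadecimal forms quoted in the Source and Destination capture samples are just the four address bytes written out one at a time. A short sketch using Python's standard `ipaddress` module shows the conversion; the helper name `address_as_hex` is made up for this example.

```python
import ipaddress


def address_as_hex(dotted):
    """Render a dotted-decimal IPv4 address as space-separated hex bytes."""
    packed = ipaddress.IPv4Address(dotted).packed   # four bytes, network order
    return " ".join(f"{byte:02x}" for byte in packed)


print(address_as_hex("192.168.10.10"))   # Ferret
print(address_as_hex("192.168.20.50"))   # Bacteria
```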

See Also

"Source IP Address"

"Local Versus Remote Delivery"

IP Options

Everything an IP system needs to deliver or forward a packet is provided in the default headers. However, sometimes you may need to do something special with a datagram, extending its functionality beyond those services provided by the standard header fields. IP Options provide a way to introduce special-handling services to the datagrams or packets, allowing a system to instruct a router to send the datagram through a predefined network, or to note that the path a datagram took should be recorded, among other things.

Size

Varies as needed. The default is zero bits, while the maximum is 40 bytes (a restriction imposed by the limited space that is available in the Header Length field).

Notes

Options provide special-delivery instructions to devices on the network, and can be used to dictate the route that a datagram must take, or to record the route that was taken, or to provide other network-control services. Options are not mandatory, and most IP datagrams do not have any options defined. However, all network devices should support the use of options. If a device does not recognize a specific option type, then it should ignore the option and go ahead and process the datagram as normal.

By default, no options are defined within the IP header, meaning that this field does not exist. An IP header can have as many options as will fit within the space available (up to 40 bytes), if any are required.

Each option has unique characteristics. For more information on the various options and their ramifications, refer to "Notes on IP Options" later in this chapter.

Capture Sample

In the capture shown in Figure 2-22, the packet does not have any options defined.

Figure 2-22. The IP Options area

See Also

"Header Length"

"Padding"

"Fragmentation and Reassembly"

"Notes on IP Options"

Padding

Used to make an IP datagram's header divisible by 32 bits.

Size

Varies as needed.

Notes

The length of an IP header must be divisible by 32 bits if it is to fit within the small Header Length field. Most IP headers are 160 bits long, since that's the size of a normal header when all of the mandatory fields are used. However, if any options have been defined, then the IP header may need to be padded in order to make it divisible by 32 again.

See Also

"Header Length"

"IP Options"

Notes on IP Options

There can be many options in a single IP datagram, up to the amount of free space available in the IP header. Since an IP header can only be 60 bytes long at most--and since 20 bytes are already in use by the default fields--only 40 bytes are available for options.

Options are identified using three separate fields as shown in Figure 2-23: Option-Type, Option-Length, and Option-Data. The Option-Type field is used to indicate the specific option in use, while the Option-Length field is used to indicate the size of the option (including all of the fields and Option-Data combined). Since each option has unique characteristics (including the amount of data provided in the option-data field), the Option-Length field is used to inform the IP software of where the Option-Data field ends (and thus where the next Option-Type field begins).

Figure 2-23. The IP Option-Type sub-fields

The Option-Type field is eight bits long and contains three separate flags that indicate the specific option being used: copy, class, and type.

The first bit from the Option-Type field indicates whether or not an option should be copied to the headers of any IP fragments that may be generated. Some options--particularly those that dictate routing paths--need to be copied to each of the fragments' headers. Other options do not need to be copied to every fragment's header, and will only be copied to the first fragment's header instead.

The next two bits define the option "class" (an option class is a grouping of options according to their functionality). Since there are two bits, there are four possible classes, although only two are used. Class 0 is used for network control options, while class 2 is used for debugging services. Classes 1 and 3 are reserved for future use.

The last five bits of the Option-Type field identify the specific option, according to the option class in use. Table 2-13 lists the most commonly used IP options. Each of these options is described in detail in the next sections of this chapter. For a detailed listing of all of the IP Options that are currently registered, refer to the IANA's online registry (accessible at http://www.isi.edu/in-notes/iana/assignments/ip-parameters).

Table 2-13: The Option-Type Definitions, Including Their Classes, Codes, and Lengths

| Class | Code | Bytes | Description |
|---|---|---|---|
| 0 | 0 | 1 (no Option-Length field) | End of option list |
| 0 | 1 | 1 (no Option-Length field) | No operation |
| 0 | 2 | 11 | Security options (for military uses) |
| 0 | 7 | varies | Record route |
| 0 | 3 | varies | Loose source routing |
| 0 | 9 | varies | Strict source routing |
| 0 | 20 | 4 | Router alert |
| 2 | 4 | varies | Timestamp |

The Option-Length field is used to measure bytes of data, so a value of 1 would mean "one byte." Since the Option-Length field is eight bits long, it can specify a maximum of 255 bytes, although the complete set of options cannot total more than 40 bytes. That restriction comes from the four-bit Header Length field, which caps the entire IP header at 60 bytes, 20 of which are taken up by the fixed header fields.

The following sections discuss the IP options in detail.

End of Option List

Used to mark the end of all the options in an IP header.

Class and Code

Class 0, Code 0

Size

Eight bits.

Copy to all fragments?

May be copied, added, or deleted as needed.

Defined In

RFC 791.

Status

Standard.

Notes

This option comes after all of the other options, and not at the end of every option.

The End of Option List option does not have an Option-Length or Option-Data field associated with it. It simply marks the end of the options in use with a specific IP header. If this option does not end on a 32-bit boundary, then the IP header must be padded.

No Operation

Used to internally pad options within the Options header field.

Class and Code

Class 0, Code 1

Size

Eight bits.

Copy to all fragments?

May be copied, added, or deleted as needed.

Defined In

RFC 791.

Status

Standard.

Notes

Sometimes it is desirable to have an option aligned on a certain boundary (such as having an option start at the 8th, 16th or 32nd bit off-set). If this is the case, the No Operation option can be used to internally pad the Options header field.

The No Operation option does not have an Option-Length or Option-Data field associated with it. It is used by itself to pad the IP Option field by a single byte. If more padding is required, the No Operation option can be used again, as many times as needed.
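Because End of Option List and No Operation are single bytes while every other option carries an Option-Length, a parser has to treat those two codes specially when it walks the Options field. The sketch below shows one way to do that; `walk_options` is a hypothetical helper, and it assumes the caller has already sliced the Options field out of the IP header.

```python
def walk_options(options: bytes):
    """Yield (option_type, option_data) pairs from an IP header's Options field."""
    i = 0
    while i < len(options):
        opt_type = options[i]
        if opt_type == 0:
            # End of Option List: one byte, and anything after it is padding.
            yield (opt_type, b"")
            return
        if opt_type == 1:
            # No Operation: one byte of internal padding, keep going.
            yield (opt_type, b"")
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option: Option-Length is missing")
        length = options[i + 1]  # counts the type and length bytes as well
        if length < 2 or i + length > len(options):
            raise ValueError("Option-Length does not fit in the Options field")
        yield (opt_type, options[i + 2 : i + length])
        i += length


# A No Operation byte, a Router Alert option, then End of Option List and padding.
sample = bytes([1, 148, 4, 0, 0, 0, 0, 0])
print(list(walk_options(sample)))
```

Note that the sample is eight bytes long, so the header still ends on a 32-bit boundary once the padding is counted.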

Security Options

Used to specify military security flags. This option is used only on military networks.

Class and Code

Class 0, Code 2

Size

Eighty-eight bits.

Copy to all fragments?

Yes.

Defined In

RFC 791.

Status

Standard.

Notes

Security options allow datagrams to classify their contents as being anywhere from "Unclassified" to "Top Secret," and also provide mechanisms for determining if a device is authorized to send certain types of traffic. Because of the highly vertical nature of this option, I suggest that people who are interested in using it should refer to RFC 1108, which deals with it in detail.

Record Route

Provides a facility for routers to record their IP addresses, allowing a system to see the route that an IP datagram took on its way from the original source to the final destination.

Class and Code

Class 0, Code 7

Size

Varies as needed.

Copy to all fragments?

No (first fragment only).

Defined In

RFC 791.

Status

Standard.

Notes

If a system wishes to have the route recorded, it must allocate enough space in the IP header for each device to place its IP address in the related Option-Data field.

In order to facilitate this process, the Record Route option has a separate 8-bit "pointer" field that is placed at the beginning of the Option-Data field. The pointer indicates the byte position where the IP address of the current router should be recorded. If the pointer is greater than the option length, then no more room is available. If there is sufficient space, then the router will write its four-byte IP address at the location specified by the pointer, and then increment the pointer so that it points to the next offset in the Option-Data field. (Interestingly, RFC 791 states that "if there is some room but not enough room for a full address to be inserted, the original datagram is considered to be in error and is discarded.") The process will continue until there is no more space, or until the datagram is delivered to its final destination.
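The pointer handling described above can be sketched in a few lines. This example assumes the layout given in RFC 791, where the pointer is counted in bytes from the start of the option (so its smallest legal value is 4). The helper names `new_record_route` and `record_address` are invented for illustration.

```python
import ipaddress

RECORD_ROUTE = 7  # class 0, code 7, copy bit clear


def new_record_route(slots: int) -> bytearray:
    """Build an empty Record Route option with room for `slots` addresses."""
    length = 3 + 4 * slots  # type + length + pointer, then four bytes per address
    return bytearray([RECORD_ROUTE, length, 4]) + bytearray(4 * slots)


def record_address(option: bytearray, router_ip: str) -> bool:
    """Write one router address at the pointer; return False if the option is full."""
    length, pointer = option[1], option[2]
    if pointer > length:
        return False  # no more room: forward the datagram untouched
    if pointer + 3 > length:
        # Some room, but not enough for a full address: RFC 791 treats this as an error.
        raise ValueError("not enough room for a full address")
    start = pointer - 1  # the pointer is 1-based, Python slices are 0-based
    option[start : start + 4] = ipaddress.IPv4Address(router_ip).packed
    option[2] = pointer + 4  # advance to the next free slot
    return True


option = new_record_route(slots=2)
print(record_address(option, "192.168.10.3"), option.hex())
```

With only two slots, a third router would find the pointer (12) greater than the option length (11) and would leave the option alone, which is exactly the "no more room" case described above.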

Due to the limited space available, this option is not very useful on the open Internet.

Loose Source Routing

Identifies a network path that the IP datagram should take, with variations allowed as long as all of the defined routes are taken at some point.

Class and Code

Class 0, Code 3

Size

Varies as needed.

Copy to all fragments?

Yes.

Defined In

RFC 791.

Status

Standard.

Notes

Loose Source Routing allows an originating system to list landmark routers that a datagram must visit on the way to its destination. In between these landmark routers, the datagram may be sent wherever the network tells it to go.

In order to facilitate this process, the Loose Source Route option uses an 8-bit pointer field that is placed at the beginning of the Option-Data field. The pointer indicates the byte position that contains the next landmark to be visited. Once a landmark has been visited, the pointer is moved to an offset that points to the next landmark. If the pointer exceeds the option-length value, then no more landmarks can be used, and normal routing takes over.

Each router that touches the datagram will also record its own IP address in the Option-Data field, as specified in "Record Route" in the previous section of this chapter. Due to the limited space available, this option is not very useful on the open Internet.

There are some security concerns with this option. By specifying a route that datagrams must take, it is possible for an intruder to mark external datagrams as being internal to your network. Normally, any datagrams sent in response to these datagrams would never leave your network, but by specifying a source route, the intruder can tell your systems to send the responses to him by way of his own routers. For this reason, most firewalls block incoming packets that have this option defined.

Strict Source Routing

Identifies a network path that the IP datagram must take, without exception.

Class and Code

Class 0, Code 9

Size

Varies as needed.

Copy to all fragments?

Yes.

Defined In

RFC 791.

Status

Standard.

Notes

Strict Source Routing allows an originating system to list the specific routers that a datagram must visit on the way to its destination. No deviation from this list is allowed.

In order to facilitate this process, the Strict Source Route option uses an 8-bit pointer field that is placed at the beginning of the option-data field. The pointer indicates the byte position that contains the IP address of the next router to be visited. Once a router has been visited, the pointer is moved to an offset that points to the IP address of the next router. If the pointer exceeds the option-length value, then no more routes can be used, and normal routing takes over.

Each router also records its own IP address in the moving list of landmarks, as specified in "Record Route" earlier in this chapter. Due to the limited space available, this option is not very useful on the open Internet.

As with Loose Source Routing, there are some security concerns with this option. By specifying a route that datagrams must take, it is possible for an intruder to mark external datagrams as being internal to your network. Normally, any datagrams sent in response to these datagrams would never leave your network, but by specifying a source route, the intruder can tell your systems to send the responses to him by way of his own routers. For this reason, most firewalls block incoming packets that have this option defined.

Router Alert

Used to inform a router that the current IP packet has some peculiarities that should be studied before it is forwarded on.

Class and Code

Class 0, Code 20

Size

Thirty-two bits.

Copy to all fragments?

Yes.

Defined In

RFC 2113.

Status

Proposed Standard, Elective.

Notes

Typically, routers will blindly forward datagrams that are destined for a remote host or network. They do not normally process datagrams unless those datagrams are explicitly addressed to the router (as indicated by the Destination Address field), or are broadcasts or multicasts that the router is participating in.

However, sometimes the data in a datagram is of such a nature that the router should examine it closely before simply forwarding it on. For example, an experimental form of Path MTU Discovery currently under development requires that routers return bandwidth information about the last network that the probe crossed before reaching the router. In order for this to work, the router has to process the datagram--which is actually destined for a remote host--see that it is a request for MTU information, and then return the requested data. Without this option, the router would simply pass the datagram on to the next-hop router or final destination system.

The two-byte Option-Data field used with Router Alert allows for 65,536 possible numeric codes (0 through 65,535). RFC 2113 defines only code 0, which states that routers should examine the datagram before forwarding it on, and leaves the other 65,535 codes reserved.
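Since Router Alert has a fixed size, the whole option can be assembled from the values already given: copy bit set, class 0, code 20, a length of four bytes, and a two-byte value. The sketch below uses a hypothetical `build_router_alert` helper to show the resulting bytes.

```python
import struct


def build_router_alert(value: int = 0) -> bytes:
    """Return the four bytes of a Router Alert option carrying the given code."""
    option_type = (1 << 7) | (0 << 5) | 20  # copy=1, class=0, code=20 -> 148
    option_length = 4                        # type + length + two bytes of data
    # "!" selects network byte order; B is one byte, H is an unsigned 16-bit value.
    return struct.pack("!BBH", option_type, option_length, value)


print(build_router_alert().hex())
```

The option is exactly 32 bits long, so it never needs No Operation padding to keep the header aligned.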

Timestamp

Identifies the time at which a router processed the IP datagram.

Class and Code

Class 2, Code 4

Size

Varies as needed.

Copy to all fragments?

No (first fragment only).

Defined In

RFC 791.

Status

Standard.

Notes

The Timestamp option is conceptually similar to the Record Route option, with the critical exception being that the router will also place a timestamp into the Option-Data field (actually, the source device can choose the specific information that it wants to have recorded).

In order to facilitate this process, the Timestamp option uses an 8-bit pointer field similar to the pointer found in the Source Route and Record Route options, as well as a four-bit overflow field, and a four-bit set of flags.

The overflow field provides a counter for the routers that could not register their timestamps. This allows an administrator to see how much of the network they could not record, due to lack of space. The flags are used to define the behavior that an administrator wishes the routers to adhere to. These behaviors are listed in Table 2-14.

Table 2-14: Flags Used with the Timestamp Option

Flag Value  Description
0           Timestamps only (do not record router addresses)
1           Record router addresses, followed by timestamps
3           Match timestamps with preexisting router addresses

Timestamps are recorded as 32-bit integers that represent the number of milliseconds since midnight, Universal Time.
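To make the timestamp format concrete, the following sketch converts a point in time to milliseconds since midnight Universal Time. The function name is made up for illustration, and the example assumes a timezone-aware `datetime` is passed in.

```python
from datetime import datetime, timezone


def ms_since_midnight_utc(moment: datetime) -> int:
    """Return the number of milliseconds elapsed since midnight UT for `moment`."""
    utc = moment.astimezone(timezone.utc)
    midnight = utc.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = utc - midnight
    # Integer arithmetic avoids floating-point rounding surprises.
    return elapsed.seconds * 1000 + elapsed.microseconds // 1000


stamp = ms_since_midnight_utc(datetime(2004, 1, 15, 1, 2, 3, 4000, tzinfo=timezone.utc))
print(stamp)  # 1h 2m 3.004s after midnight -> 3723004
```

The largest possible value is 86,399,999 (one millisecond before the next midnight), which fits comfortably in a 32-bit integer.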

As the datagram is passed around the Internet, the routers use the pointer to indicate the byte position where they should write their data. Once a router has been visited, the pointer is moved to an offset that points to the next 32-bit field where timestamp recording should occur. If the pointer exceeds the option-length value, then no more timestamps can be recorded. At this point, routers should begin to increment the overflow counter as the datagram moves through the network. Interestingly, RFC 791 states that "if there is some room but not enough room for a full timestamp to be inserted, or if the overflow count itself overflows, the original datagram is considered to be in error and is discarded."

Due to the limited space available, this option is not very useful on the open Internet.

IP in Action

Although IP is responsible only for getting datagrams from one host to another, one network at a time, this seemingly simple service can actually get quite complex. An IP device has to route traffic to the appropriate network whenever a datagram needs to be forwarded; it has to break large datagrams into smaller pieces whenever datagrams have to be sent across a small network; and it has to make decisions based on the priority of the data.

Notes on IP Routing

Since IP is designed as a node-centric networking protocol, every device has equal access to the network. In this model, any device can communicate with any other device directly, without requiring the services of a centralized host. Nodes do not send traffic to a central host for processing and relay services, but instead communicate directly with the destination system, if possible.

When this is not possible--such as when the two hosts are on separate networks--then the sending device has to locate another device to relay the traffic to the destination system on its behalf. Even in this situation the sending device is still self-deterministic, since it chooses which local device it will send the datagrams to for forwarding.

The process of choosing an intermediate forwarding device is called routing. Whenever a device needs to choose a forwarder, it looks at a local list of available networks and forwarders (called the "routing table"), and decides which interface and forwarder is the most appropriate for the specific datagram that needs to be sent.

As was discussed in "Local Versus Remote Delivery" earlier in this chapter, the routing table on a system can be built using several different tools. To begin with, most systems build a basic routing table that shows the available network interfaces and the networks they are attached to. This information can then be supplemented with manual entries that identify specific forwarders for specific networks and hosts, or a simple "default route" for all non-local networks.

In addition, routing protocols can be used to automatically update the routing tables on the hosts of a network that changes often. Some of the more-common routing protocols in use today on corporate networks are Routing Information Protocol (RIP), Open Shortest Path First (OSPF), and Router Discovery (RDISC). However, these protocols are not able to scale up to the quantity of routes that are found on the Internet backbone, and protocols such as the Border Gateway Protocol (BGP) are more common in those environments.

Figure 2-24 shows a Windows NT 4.0 system with a fairly typical routing table. By looking at the "Active Routes" list, we can see the routers and networks that this device knows about explicitly.

Figure 2-24. The routing table on a Windows NT 4.0 PC

The routing table shown in Figure 2-24 looks somewhat complicated, but in reality is not that difficult to understand. The first thing we can tell (from the "Interface List") is that the PC is connected to three distinct networks: the "loopback" network (which is common to all IP devices), a local Ethernet network, and a dial-up network (which is currently inactive).

The "Active Routes" list shows all of the networks and forwarders that this device knows about. The first entry shows a destination of "0.0.0.0" (the default route for this device), with a forwarding gateway address of "192.168.10.3". Any datagrams that this host does not know how to deliver will be sent to that router for delivery.

The next two entries show the local networks that are currently active on this host, including the loopback network ("127.0.0.0") and the local Ethernet network ("192.168.10.0"). In addition, the subnet masks for those networks are shown, as are the IP addresses of the local network interface points on this system for those networks. This information provides the local host with the data it needs to route datagrams from the internal TCP/IP software to the appropriate local network.

In addition, there is an explicit routing entry for the local Ethernet interface's own address, which indicates that any traffic bound for that address should be sent to the loopback address for delivery. This indicates that traffic addressed to the host itself is sent to the local loopback interface, and that the loopback adapter acts as the forwarder for that route.

The remaining entries show less-granular routes for general purpose network traffic. For example, the routing entry for "192.168.10.255" is a broadcast address for the local network, and the routing table shows that any traffic for that address should be sent to the Ethernet card for delivery. The last two entries show the all-points multicast address of "224.0.0.0" and the all-points broadcast address of "255.255.255.255", with both entries listing the local Ethernet card as the forwarder.
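The way a host picks an entry from a table like this one can be sketched with Python's standard `ipaddress` module. The destinations and the default gateway below come from the walkthrough above, but the subnet masks and the "ethernet" and "loopback" labels are assumptions made for illustration, since the figure itself is not reproduced here. The rule shown, preferring the most specific matching entry, is the usual longest-prefix-match approach.

```python
import ipaddress

# (destination network, forwarder) pairs; masks and labels are illustrative assumptions.
ROUTES = [
    ("0.0.0.0/0", "192.168.10.3"),        # default route
    ("127.0.0.0/8", "loopback"),
    ("192.168.10.0/24", "ethernet"),
    ("192.168.10.255/32", "ethernet"),    # local broadcast address
    ("255.255.255.255/32", "ethernet"),   # all-points broadcast address
]


def lookup(destination: str, routes=ROUTES):
    """Return the (network, forwarder) entry with the longest matching prefix."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for network_text, forwarder in routes:
        network = ipaddress.ip_network(network_text)
        if dest in network:
            matches.append((network, forwarder))
    if not matches:
        return None
    network, forwarder = max(matches, key=lambda match: match[0].prefixlen)
    return str(network), forwarder


print(lookup("192.168.10.20"))  # local Ethernet network
print(lookup("10.1.2.3"))       # falls through to the default route
```

Because 0.0.0.0/0 matches every destination, the default route is chosen only when nothing more specific applies.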

Most systems have similar routing tables, although they may not show as much information. For example, Figure 2-25 shows the routing table from a Solaris 7 client, which also has loopback and Ethernet interfaces. However, these entries do not show the detailed level of routing that the Windows NT 4.0 host does.

Figure 2-25. The routing table on a Solaris host

Notice also that the routing table in Figure 2-25 does not show explicit routing entries for the network interface cards like the Windows NT 4.0 host does. This is because Solaris uses a different networking kernel design than NT (the latter routes local traffic through the loopback interface, while Solaris passes it directly from the kernel to the network interface).

Most TCP/IP implementations also provide a traceroute program that can be used to see the route that datagrams are taking to get to specific destination systems. These programs typically send an ICMP or UDP message to an explicit destination system, setting the IP Time-to-Live value to a low value so that it will be rejected by routers along the path. This results in the intermediate systems returning ICMP error messages back to the sending system, which can then display the list of routers that rejected the forwarding requests. The traceroute program is described in detail in "Notes on traceroute" in Chapter 5.

Notes on Fragmentation

As discussed in "Fragmentation and Reassembly" earlier in this chapter, each network topology has its own Maximum Transmission Unit (MTU) size, which represents the maximum amount of data that can be passed in a single frame. On Ethernet networks, the MTU is typically 1500 bytes, while 16 Mb/s Token Ring has a default MTU size of 17,914 bytes. Some networks have smaller MTUs, with the minimum allowed value being just 68 bytes.

Whenever an IP datagram needs to be sent across a network to another device, the datagram must be small enough to fit within the MTU size constraints of the local network. For example, if the local network is Ethernet, then the IP datagram must be 1500 bytes or less in order for the datagram to get sent across that network. If the datagram is larger than 1500 bytes, then it must be split into multiple fragments that are each small enough to be sent across the local network.

Most of the time, datagrams do not require fragmentation. On a single local network, every device normally uses the same MTU size, so local packets are rarely fragmented. And most of the networks in use on the Internet (either as destination networks or intermediate ISP networks) are capable of handling packets that are 1500 bytes in length, which is the largest size that most dial-up clients will generate. Fragmentation typically occurs only on mixed local networks that combine Ethernet with Token Ring (or other large-frame networks), or when a host on an Ethernet network tries to send data to a dial-up user that is using a small MTU size. In either of these situations, any datagram larger than the smaller MTU will be fragmented.

In addition, fragmentation occurs if the application that is generating the datagram tries to send more data than will fit within the local network's MTU. This happens quite often with UDP-based applications such as the Network File System (NFS). This can also be forced to happen through the use of programs such as ping, simply by specifying a large datagram size as a program option.

Figure 2-26. The first fragment of a large datagram

For example, Figure 2-26 and Figure 2-27 show a large ICMP message being sent from Krill to Bacteria that was too large for the local network to handle, and so the datagram had to be fragmented into two packets. What's most interesting about this is the fact that Krill fragmented the datagram before it was ever sent, since it could not create a single IP packet that was large enough to handle the full datagram.

Figure 2-26 shows the first fragment of the original (unfragmented) datagram, and Figure 2-27 shows the second (last) fragment. Notice that the Fragmentation Identifier field is the same in both captures, and that the first fragment has the More Fragments flag enabled, while the last fragment does not.

Figure 2-27. The second fragment of a large datagram

Also, notice that Figure 2-26 shows the Fragmentation Offset as 0, which indicates that the first fragment contains the starting block of data from the original datagram, while Figure 2-27 shows the Fragmentation Offset as 1480, which indicates that the last fragment contains data starting at that byte.
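The offsets and flags seen in the two captures follow directly from the MTU arithmetic. The sketch below is a simplified model, assuming a 20-byte IP header with no options; the 2008-byte payload is a hypothetical size chosen so that the second fragment starts at byte 1480, as in the figures.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20):
    """Return (offset_in_bytes, data_len, more_fragments) for each fragment."""
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the header stores the offset in 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU is too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more_fragments = offset + size < payload_len
        fragments.append((offset, size, more_fragments))
        offset += size
    return fragments


# A hypothetical 2008-byte ICMP message crossing a 1500-byte Ethernet MTU.
for offset, size, more in fragment(2008, 1500):
    print(f"offset={offset} (field value {offset // 8}), bytes={size}, MF={int(more)}")
```

On the wire, the Fragmentation Offset field would hold 185 for the second fragment, which most analyzers display as the byte offset 1480.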

For more information on fragmentation-related issues, refer to "Fragmentation and Reassembly" earlier in this chapter.

Notes on Precedence and Type-of-Service

Applications can use the Precedence and Type-of-Service flags to dictate specific per-datagram handling instructions to the hosts and routers that forward the datagrams through a network. For example, the Precedence flags allow applications to mark the datagrams they generate as having a higher priority than normal traffic. Using this field, a database client could flag all IP datagrams with a higher priority than normal, which would inform the routers on the network to prioritize the database traffic over normal or lower-priority traffic.

Figure 2-28. An IP packet with a precedence of 7

Figure 2-28 shows an ICMP Echo Request Query Message sent from Arachnid to Bacteria, with a Precedence value of 7 in the IP header's Type-of-Service field. This IP packet would be given a higher priority over any other packets with a lower priority value, assuming the router supported this type of special handling operation (many routers do not offer this type of support).

Besides prioritization, the Type-of-Service byte also offers a variety of different special-handling flags that can also be used to dictate how a particular datagram should be treated. A Telnet client could set the "Minimize Latency" Type-of-Service flag on the datagrams that it generated, requesting that routers forward that traffic across a faster (and possibly more expensive) network than it might normally choose, for example. In addition, an FTP server could set the Maximize Throughput flag on the IP datagrams that it generated, requesting that routers choose the fastest-available link, while a Usenet News (NNTP) client could set the Minimize Cost flag, if it desired.
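The Precedence value and the special-handling flags share the same byte, so combining them is a matter of shifting and OR-ing. The sketch below assumes the bit layout from RFC 791 as extended by RFC 1349 (three Precedence bits, then the delay, throughput, reliability, and cost flags); the constant and function names are invented for illustration.

```python
# Special-handling flags within the Type-of-Service byte (RFC 791 / RFC 1349 layout).
MINIMIZE_LATENCY = 0x10
MAXIMIZE_THROUGHPUT = 0x08
MAXIMIZE_RELIABILITY = 0x04
MINIMIZE_COST = 0x02


def tos_byte(precedence: int = 0, flags: int = 0) -> int:
    """Combine a 3-bit Precedence value and handling flags into one byte."""
    if not 0 <= precedence <= 7:
        raise ValueError("Precedence must be between 0 and 7")
    if flags & ~0x1E:
        raise ValueError("unknown Type-of-Service flag bits")
    return (precedence << 5) | flags


print(hex(tos_byte(precedence=7)))            # 0xe0, as in a Precedence 7 packet
print(hex(tos_byte(flags=MINIMIZE_LATENCY)))  # 0x10, as a Telnet client might set
```

On platforms that expose it, an application can usually request such a value for its outgoing datagrams with the `IP_TOS` socket option, although whether routers honor it is another matter.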

Figure 2-29 shows a Telnet client on Bacteria setting the "Minimize Latency" Type-of-Service flag on a Telnet connection to Krill. This packet would then get routed over a faster network than any packets that were not marked with these flags, assuming the router supported this type of operation (many routers do not offer this type of support).

Figure 2-29. A Telnet connection with the "Minimize Latency" Type-of-Service flag enabled

For more information on these flags and their usage, refer to "Prioritization and Service-Based Routing" earlier in this chapter.

Troubleshooting IP

Since IP provides only simple delivery services, almost all of the problems with IP are related to delivery difficulties. Perhaps a network segment is down, or a router has been misconfigured, or a host is no longer accepting packets.

In order to effectively debug problems with IP delivery, you should rely on the ICMP protocol. It is the function of ICMP to report on problems that will keep IP datagrams from getting delivered to their destination effectively. For more information on ICMP, refer to Chapter 5.

Misconfigured Routing Tables

The most common cause of connectivity problems across a network is that the routing tables have not been properly defined. In this scenario, your datagrams are going out to the remote destination, and datagrams are being sent back to your system but are taking a bad route on the way to your network. This problem occurs when the advertised routes for your network point to the wrong router (or do not point to any router).

This is a very common problem with new or recently changed networks. It is not at all unusual for somebody to forget to define the route back to your new network. Just because the datagrams are going out does not mean that return datagrams are coming back in on the same route.

The only way to successfully diagnose this problem is to use the traceroute program from both ends of a connection, seeing where in the network path the problem occurs. If you stop getting responses after the second or third hop on outbound tests, then it is highly likely that the router at that juncture has an incorrect routing entry for your network, or doesn't have any entry at all. For more information on traceroute, refer to "Notes on traceroute" in Chapter 5.

Media-Related Issues

Since IP packets are sent inside of media-specific frames, there can be problems with some network media that will manifest when used with IP packets. For example, some network managers have reported problems with network infrastructure equipment such as Ethernet hubs and switches that have trouble dealing with full-sized (1500-byte) packets. In those situations, you will need to use ICMP to probe the network for delivery problems through equipment that is acting suspiciously.

One way to do this is to send incrementally larger ICMP Echo Request messages to other devices on those networks, testing to see where they stop working. If the hub or switch stops forwarding data to all of the attached devices after a certain point, then it is possible that the device itself could be eating the packets. However, it is also entirely possible that the problem lies with your own equipment. In order to verify your suspicions, you should test connectivity using another system with a different network adapter (since your adapter may be the true culprit). However, if only one or two devices fail to respond, then the problem is likely to be with the adapters or drivers in use with those systems.

In addition, some network managers have reported problems with wide-area networking equipment that interprets some bit patterns from the IP packet as test patterns. In those cases, the WAN equipment may eat the packets. The packets that are most problematic are those that contain long sequences of ones or zeros, although packets that contain alternating ones and zeroes have also been problematic for some users. If you have reproducible problems with some of your WAN links, you may want to look at the data inside of the IP packets to see if you have any long strings of specific bit patterns, and then use a program such as ping to verify that the test pattern is causing the problems.

Fragmentation Problems

In addition, a variety of fragmentation-related problems can crop up that will prevent datagrams from being successfully delivered. Since IP will process only a complete datagram (and more importantly, will discard an incomplete datagram), fragmentation problems will cause a substantial number of retransmissions if an error-correcting protocol is generating the IP datagrams.

Fragmentation problems can occur in a variety of cases. The most common cause is a sender attempting to detect the end-to-end MTU of a network using Path MTU Discovery while an intermediary device fails to return ICMP Error Messages to the sending system. The result is that the sender continues trying to send packets with the Don't Fragment flag enabled, even though they are too large to be forwarded without fragmentation. For a comprehensive discussion on Path MTU Discovery and the problems that can result, refer to "Notes on Path MTU Discovery" in Chapter 5.

Other fragmentation problems can occur when using infrastructure equipment that is under heavy load, or when the network itself becomes somewhat congested. In those situations, a device that is fragmenting packets for delivery across another (smaller-MTU) network may be losing some of the fragments, or the network itself may be losing the packets. These problems can be difficult to diagnose, since ping tests using small or normal-sized messages across the network may perform just fine.

The best way to diagnose these problems is to send large ICMP Echo Request messages to the remote system, forcing fragmentation to occur on the network. If some (but not all) of the ICMP query messages are responded to, then it is likely that a device or segment on the network is eating some of the fragmented packets. For a detailed discussion on using ping to test the network, refer to "Notes on ping" in Chapter 5.

Understanding IP Addressing

Every computer that communicates over the Internet is assigned an IP address that uniquely identifies the device and distinguishes it from other computers on the Internet. An IP address consists of 32 bits, often shown as 4 octets of numbers from 0-255 represented in decimal form instead of binary form. For example, the IP address

168.212.226.204

in binary form is

10101000.11010100.11100010.11001100.

But it is easier for us to remember decimals than it is to remember binary numbers, so we use decimals to represent the IP addresses when describing them. However, the binary number is important because that will determine which class of network the IP address belongs to. An IP address consists of two parts, one identifying the network and one identifying the node, or host. The Class of the address determines which part belongs to the network address and which part belongs to the node address. All nodes on a given network share the same network prefix but must have a unique host number.
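The dotted-decimal and binary forms shown above are two renderings of the same 32 bits, which a short sketch can confirm. The function name `to_binary` is hypothetical.

```python
def to_binary(dotted: str) -> str:
    """Render a dotted-decimal IPv4 address as four 8-bit binary octets."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError("expected four octets between 0 and 255")
    # The 08b format pads each octet with leading zeros to eight binary digits.
    return ".".join(f"{octet:08b}" for octet in octets)


print(to_binary("168.212.226.204"))  # 10101000.11010100.11100010.11001100
```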

Class A Network -- binary addresses start with 0, therefore the decimal number can be anywhere from 1 to 126. The first 8 bits (the first octet) identify the network and the remaining 24 bits indicate the host within the network. An example of a Class A IP address is 102.168.212.226, where "102" identifies the network and "168.212.226" identifies the host on that network.

Class B Network -- binary addresses start with 10, therefore the decimal number can be anywhere from 128 to 191. (The number 127 is reserved for loopback and is used for internal testing on the local machine.) The first 16 bits (the first two octets) identify the network and the remaining 16 bits indicate the host within the network. An example of a Class B IP address is 168.212.226.204 where "168.212" identifies the network and "226.204" identifies the host on that network.

Class C Network -- binary addresses start with 110, therefore the decimal number can be anywhere from 192 to 223. The first 24 bits (the first three octets) identify the network and the remaining 8 bits indicate the host within the network. An example of a Class C IP address is 200.168.212.226 where "200.168.212" identifies the network and "226" identifies the host on that network.

Class D Network -- binary addresses start with 1110, therefore the decimal number can be anywhere from 224 to 239. Class D networks are used to support multicasting.

Class E Network -- binary addresses start with 1111, therefore the decimal number can be anywhere from 240 to 255. Class E networks are used for experimentation. They have never been documented or utilized in a standard way.
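Since the class is determined entirely by the leading bits of the first octet, it can be read off with a few bit masks. The sketch below uses a made-up function name and looks only at those leading bits, so it reports first octets of 0 and 127 as Class A even though those values are reserved.

```python
def address_class(dotted: str) -> str:
    """Return the class letter implied by the leading bits of the first octet."""
    first_octet = int(dotted.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError("first octet must be between 0 and 255")
    if first_octet & 0x80 == 0x00:   # leading bit 0
        return "A"
    if first_octet & 0xC0 == 0x80:   # leading bits 10
        return "B"
    if first_octet & 0xE0 == 0xC0:   # leading bits 110
        return "C"
    if first_octet & 0xF0 == 0xE0:   # leading bits 1110
        return "D"
    return "E"                       # leading bits 1111


for example in ("102.168.212.226", "168.212.226.204", "200.168.212.226"):
    print(example, "is Class", address_class(example))
```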

For further information on IP addressing and subnetting, see:

IP Addressing Fundamentals

IP uses an anarchic and highly-distributed model, with every device an equal peer to every other device on the global Internet. This structure was one of IP's original design goals, as it proved to be useful with a variety of systems, did not require a centralized management system (which would never have scaled well), and provided for fault-tolerance on the network (no central management means no single point of failure).

In order for systems to locate each other in this distributed environment, nodes are given explicit addresses that uniquely identify the particular network the system is on, and uniquely identify the system to that particular network. When these two identifiers are combined, the result is a globally-unique address.

This concept is illustrated in Figure B-1. In this example, the network is numbered 192.168.10, and the two nodes are numbered 10 and 20. Taken together, the fully-qualified IP addresses for these systems would be 192.168.10.10 and 192.168.10.20.

Figure B-1. The two parts of an IP address

Subnet Masks and CIDR Networks

IP addresses are actually 32-bit binary numbers (for example, 11000000101010000000000100010100). Each 32-bit IP address consists of two subaddresses, one identifying the network and the other identifying the host to the network, with an imaginary boundary separating the two.

The location of the boundary between the network and host portions of an IP address is determined through the use of a subnet mask. A subnet mask is another 32-bit binary number, which acts like a filter when it is applied to the 32-bit IP address. By comparing a subnet mask with an IP address, systems can determine which portion of the IP address relates to the network, and which portion relates to the host. Anywhere the subnet mask has a bit set to "1", the underlying bit in the IP address is part of the network address. Anywhere the subnet mask is set to "0", the related bit in the IP address is part of the host address.

For example, assume that the IP address 11000000101010000000000100010100 has a subnet mask of 11111111111111111111111100000000. In this example, the first 24 bits of the 32-bit IP address are used to identify the network, while the last 8 bits are used to identify the host on that network.
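The masking step described above can be sketched in a few lines of Python. This is only an illustration; the function name split_address is invented here and does not come from any standard.

```python
def split_address(ip_bits, mask_bits):
    """Return (network_bits, host_bits) for two 32-character binary strings."""
    ip = int(ip_bits, 2)
    mask = int(mask_bits, 2)
    network = ip & mask                  # keep the bits where the mask is 1
    host = ip & ~mask & 0xFFFFFFFF       # keep the bits where the mask is 0
    return format(network, "032b"), format(host, "032b")


network, host = split_address(
    "11000000101010000000000100010100",
    "11111111111111111111111100000000",
)
print(network)  # 11000000101010000000000100000000
print(host)     # 00000000000000000000000000010100
```

The first 24 bits survive in the network result and the last 8 bits survive in the host result, matching the example in the text.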

The size of a network (i.e., the number of host addresses available for use on it) is a function of the number of bits used to identify the host portion of the address. If a subnet mask shows that 8 bits are used for the host portion of the address block, a maximum of 256 possible host addresses are available for that specific network. Similarly, if a subnet mask shows that 16 bits are used for the host portion of the address block, a maximum of 65,536 possible host addresses are available for use on that network.

If a network administrator needs to split a single network into multiple virtual networks, the bit-pattern in use with the subnet mask can be changed to allow as many networks as necessary. For example, assume that we want to split the 24-bit 192.168.10.0 network (which allows for 8 bits of host addressing, or a maximum of 256 host addresses) into two smaller networks. All we have to do in this situation is change the subnet mask of the devices on the network so that they use 25 bits for the network instead of 24 bits, resulting in two distinct networks with 128 possible host addresses on each network. In this case, the first network would have a range of network addresses between 192.168.10.0 -192.168.10.127, while the second network would have a range of addresses between 192.168.10.128 -192.168.10.255.
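As a rough check of the split described above, Python's standard ipaddress module can divide the 24-bit network into its two 25-bit halves. The variable names are illustrative only.

```python
import ipaddress

net = ipaddress.ip_network("192.168.10.0/24")

# prefixlen_diff=1 lengthens the mask by one bit, giving two halves
halves = list(net.subnets(prefixlen_diff=1))

for half in halves:
    # first address, last address, and size of each half
    print(half, half[0], half[-1], half.num_addresses)
```

Each half reports 128 addresses, with the ranges ending at .127 and .255 respectively.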

Networks can also be enlarged through the use of a technique known as "supernetting," which works by extending the host portion of a subnet mask to the left, into the network portion of the address. Using this technique, a pair of networks with 24-bit subnet masks can be turned into a single large network with a 23-bit subnet mask. However, this works only if you have two neighboring 24-bit network blocks, with the lower network having an even value (when the network portion of the address is shrunk, the trailing bit from the original network portion of the subnet mask will fall into the host portion of the new subnet mask, so the new network mask will consume both networks). For example, it is possible to combine the 24-bit 192.168.10.0 and 192.168.11.0 networks together since the loss of the trailing bit from each network (00001010 vs. 00001011) produces the same 23-bit subnet mask (0000101x), resulting in a consolidated 192.168.10.0 network. However, it is not possible to combine the 24-bit 192.168.11.0 and 192.168.12.0 networks, since the binary values in the seventh bit position (00001011 vs. 00001100) do not match when the trailing bit is removed.
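The "neighboring blocks with an even lower network" rule can be tested mechanically. The sketch below uses the standard ipaddress module; can_merge is a hypothetical helper name, and it simply asks whether both blocks fall under the same one-bit-shorter parent.

```python
import ipaddress


def can_merge(a, b):
    """True if two equal-size blocks combine into one block with a mask one bit shorter."""
    net_a = ipaddress.ip_network(a)
    net_b = ipaddress.ip_network(b)
    return (
        net_a != net_b
        and net_a.prefixlen == net_b.prefixlen
        and net_a.supernet() == net_b.supernet()   # same parent after dropping the trailing bit
    )


print(can_merge("192.168.10.0/24", "192.168.11.0/24"))  # True
print(can_merge("192.168.11.0/24", "192.168.12.0/24"))  # False
```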

In the modern networking environment defined by RFC 1519 [Classless Inter-Domain Routing (CIDR)], the subnet mask of a network is typically annotated in written form as a "slash prefix" that trails the network number. In the subnetting example in the previous paragraph, the original 24-bit network would be written as 192.168.10.0/24, while the two new networks would be written as 192.168.10.0/25 and 192.168.10.128/25. Likewise, when the 192.168.10.0/24 and 192.168.11.0/24 networks were joined together as a single supernet, the resulting network would be written as 192.168.10.0/23. Note that the slash prefix annotation is generally used for human benefit; infrastructure devices still use the 32-bit binary subnet mask internally to identify networks and their routes.

All networks must reserve any host addresses that are made up entirely of either ones or zeros, to be used by the networks themselves. This is so that each subnet will have a network-specific address (the all-zeroes address) and a broadcast address (the all-ones address). For example, a /24 network allows for 8 bits of host addresses, but only 254 of the 256 possible addresses are available for use. Similarly, /25 networks have a maximum of 7 bits for host addresses, with 126 of the 128 possible addresses available (the all-ones and all-zeroes addresses from each subnet must be set aside for the subnets themselves).

Table B-1 shows some of the most common subnet masks, and the number of hosts available on them after subtracting the all-zeroes and all-ones addresses.

Table B-1: Common Subnet Masks and Their Host Counts

Slash Prefix   Dotted Decimal Mask   Network Bits   Host Bits   Hosts per Net
------------   -------------------   ------------   ---------   -------------
/16            255.255.0.0           16             16          65,534
/17            255.255.128.0         17             15          32,766
/18            255.255.192.0         18             14          16,382
/19            255.255.224.0         19             13          8,190
/20            255.255.240.0         20             12          4,094
/21            255.255.248.0         21             11          2,046
/22            255.255.252.0         22             10          1,022
/23            255.255.254.0         23             9           510
/24            255.255.255.0         24             8           254
/25            255.255.255.128       25             7           126
/26            255.255.255.192       26             6           62
/27            255.255.255.224       27             5           30
/28            255.255.255.240       28             4           14
/29            255.255.255.248       29             3           6
/30            255.255.255.252       30             2           2
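The host counts in Table B-1 follow from one formula: two raised to the number of host bits, minus the all-zeroes and all-ones addresses. A minimal sketch, with helper names chosen here for illustration:

```python
import ipaddress


def usable_hosts(prefix):
    """Hosts available on a network with the given slash prefix."""
    host_bits = 32 - prefix
    return 2 ** host_bits - 2            # drop the all-zeroes and all-ones addresses


def dotted_mask(prefix):
    """Dotted-decimal form of a slash prefix."""
    return str(ipaddress.ip_network(f"0.0.0.0/{prefix}").netmask)


for prefix in range(16, 31):
    print(f"/{prefix}", dotted_mask(prefix), usable_hosts(prefix))
```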

All the systems on the same subnet must use the same subnet mask in order to communicate with each other directly. If they use different subnet masks they will think they are on different networks, and will not be able to communicate with each other without going through a router first. Hosts on different networks can use different subnet masks, although the routers will have to be aware of the subnet masks in use on each of the segments.

Subnet masks are used only by systems that need to communicate with the network directly. For example, external systems do not need to be aware of the subnet masks in use on your internal networks, since those systems will route data to your networks by way of your parent network's address block. As such, remote routers need to know only about your provider's subnet mask. For example, if you have a small network that uses only a /28 prefix that is a subset of your ISP's /20 network, remote routers need to know only about your upstream provider's /20 network, while your upstream provider needs to know your subnet mask in order to get the data to your local /28 network.

The Legacy of Network Classes

The use of variable-length subnet masks as described in the preceding section was introduced to the Internet community in 1993 by RFC 1519 as a methodology for maximizing the utilization of limited IPv4 addresses. Even though this specification is nearly a decade old--and even though it is the addressing and routing architecture required by the modern Internet--many legacy systems and documents still refer to the "class-based" addressing architecture that preceded CIDR.

Under the old class-based architecture, network addresses are assigned according to fixed subnet mask values, called "classes." These classes are listed in Table B-2. Note that classes A, B, and C are used for end-user network assignments, while classes D and E do not contain end-user addresses.

Table B-2: Class-Based Subnet Masks

Class   Slash Prefix   Dotted Decimal Mask   Usage Description
-----   ------------   -------------------   -----------------
A       /8             255.0.0.0             Very large networks, always subnetted
B       /16            255.255.0.0           Large networks, typically subnetted
C       /24            255.255.255.0         Small networks, the most common class
D       /32            255.255.255.255       Multicasting group addresses (no hosts)
E       Undefined      Undefined             Reserved for experimental purposes

The primary benefit of the class-based model is that routers do not have to be made explicitly aware of the subnet mask in use on a particular network. Whereas CIDR requires that each network route be accompanied by a subnet mask for the advertised network, under the class-based model routers only have to examine the destination IP address to determine the subnet mask associated with that address. By looking at the four leading bits from the destination address, a device can determine which class the destination address falls into, and use that information to determine its subnet mask. Once this information is gleaned, the device can determine which portion of the address refers to the network, and then look up the router for that network. This concept is illustrated in Table B-3.

Table B-3: The First Four Bits from the Major Network Classes

Class   Lead Bits   Slash Prefix   Possible Address Values[1]   Nets per Class   Hosts per Net
-----   ---------   ------------   --------------------------   --------------   -------------
A       0xxx        /8             0.0.0.0-127.255.255.255      128              16,777,214
B       10xx        /16            128.0.0.0-191.255.255.255    16,384           65,534
C       110x        /24            192.0.0.0-223.255.255.255    2,097,152        254
D       1110        /32            224.0.0.0-239.255.255.255    268,435,456      0[2]
E       1111        Undefined      240.0.0.0-255.255.255.255    Undefined        Undefined
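The lead-bit test in Table B-3 can be expressed as a short function. This is a sketch of the legacy class lookup only, not of how a CIDR-aware router behaves, and the function name address_class is made up for the example.

```python
def address_class(address):
    """Return the legacy class letter for a dotted-decimal IPv4 address."""
    first_octet = int(address.split(".")[0])
    if first_octet >> 7 == 0b0:        # leading bit  0
        return "A"
    if first_octet >> 6 == 0b10:       # leading bits 10
        return "B"
    if first_octet >> 5 == 0b110:      # leading bits 110
        return "C"
    if first_octet >> 4 == 0b1110:     # leading bits 1110
        return "D"
    return "E"                         # leading bits 1111


print(address_class("168.212.226.204"))  # B
print(address_class("200.168.212.226"))  # C
```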

The address-to-class mapping methodology is described in more detail here:

Class A networks

Class A addresses always have the first bit of their IP addresses set to "0". Since Class A networks have an 8-bit network mask, the use of a leading zero leaves only 7 bits for the network portion of the address, allowing for a maximum of 128 possible network numbers, ranging from 0.0.0.0-127.0.0.0. However, many of the address blocks from this range have been set aside for other uses over time. In short, any IP address in the range of 0.x.x.x-127.x.x.x is considered a Class A address with a subnet mask of 255.0.0.0, although many of these addresses are considered invalid by Internet routers.

Class B networks

Class B addresses have their first bit set to "1" and their second bit set to "0". Since Class B addresses have a 16-bit network mask, the use of a leading "10" bit-pattern leaves 14 bits for the network portion of the address, allowing for a maximum of 16,384 networks, ranging from 128.0.0.0 -191.255.0.0. Several network addresses from this range have also been reserved over time. Any IP address in the range of 128.0.x.x -191.255.x.x is considered a Class B address with a subnet mask of 255.255.0.0, but many of these addresses are considered invalid by Internet routers.

Class C networks

Class C addresses have their first two bits set to "1" and their third bit set to "0". Since Class C addresses have a 24-bit network mask, this leaves 21 bits for the network portion of the address, allowing for a maximum of 2,097,152 network addresses, ranging from 192.0.0.0-223.255.255.0. Many network address reservations have been made from the Class C pool, substantially reducing the number of Internet-legal Class C networks. Any IP address in the range of 192.0.0.x-223.255.255.x is considered a Class C address with a subnet mask of 255.255.255.0, but again many of these addresses are considered invalid by Internet routers.

Class D networks

Class D addresses are used for multicasting applications, as described in Chapter 4, Multicasting and the Internet Group Management Protocol. Class D addresses have their first three bits set to "1" and their fourth bit set to "0". Class D addresses are 32-bit network addresses, meaning that all the values within the range of 224.0.0.0 -239.255.255.255 are used to uniquely identify multicast groups. There are no host addresses within the Class D address space, since all the hosts within a group share the group's IP address for receiver purposes.

None of the network addresses were reserved in the original allocations, although a variety of subsequent multicast addressing schemes have resulted in reservations. Refer to the IANA pages at http://www.isi.edu/in-notes/iana/assignments/multicast-addresses for information on these reservation schemes. In short, each multicast address exists as a 32-bit network address, so any address within the range of 224.0.0.0 -239.255.255.255 is a Class D multicast address.

Class E networks

Class E addresses are defined as experimental and reserved for future testing purposes. They have never been documented or utilized in a standard way.

The number of networks available to each of the subnet classes--and the number of hosts possible on each of those networks--varies widely between the classes. As we saw in Table B-3, there are only a few Class A networks available, although each of them can have millions of possible hosts. Conversely, there are a couple of million possible Class C networks, but they can serve only 254 devices each (after subtracting for the all-ones and all-zeroes host addresses).

All told, there are around 4.3 billion IP addresses (less, if you don't consider Class D and E addresses, which cannot be used as host addresses). Unfortunately, the class-based, legacy addressing scheme places heavy restrictions on the distribution of these addresses.

Every time a Class A address is assigned to an organization, almost 17 million host addresses go with it. If all 126 Class A networks were assigned, two billion of the possible addresses would be gone. If all the available Class B networks were assigned, another billion host addresses would be gone as well. This is true regardless of whether the host addresses within those network blocks are used or not; the network address is published along with its routing information, so all host addresses within the network are reachable only through that route.

Class C addresses represent the biggest problem, however, for two reasons. First of all, there are fewer IP addresses available in all the Class C networks than there are in the other classes (only about 536 million possible host addresses from all the Class C networks combined). Second, Class C networks are the most popular, since they reflect the size of the majority of LANs in use.

Every time a Class C address is assigned, 256 addresses go with it. Organizations that have 3 segments but only 60 devices are wasting over 700 possible addresses (3 segments × 256 maximum IP addresses = 768 addresses - 60 active nodes = 708 inactive addresses). Whether all the addresses are actually put to use or not is irrelevant; they are assigned to a specific network and cannot be used by anybody else. This problem is even worse with Class B addresses, since an organization with a few hundred nodes might be given a Class B address, in which case it is wasting tens of thousands of IP addresses.

Remember, however, that TCP/IP networks are inherently router-based, and it takes much less overhead to keep track of a few networks than millions of them. If all the addresses were assigned using Class C networks, routers would have to remember and process 16 million network routes; this would quickly overwhelm even the fastest of routers, and network traffic would either slow to a crawl or fail completely. Having larger network classes allows routers to work with smaller routing tables.

Remember also that the original architecture of the Internet consisted mostly of large networks connecting to each other directly, and didn't look much like the hierarchical design used today. It was easy to give one huge address block to the military and another to Stanford University. In that model, routers had to remember only one IP address for each network, and could reach millions of hosts through each of those routes.

Today, however, things are considerably different, and organizations of all sizes are connecting to the Internet. Some networks are still quite large, requiring many thousands of network numbers, while some are quite small, consisting of only a handful of PCs. In this environment class-based routing does not scale well, although there still exists the need for bundled networks so that routers do not have to remember millions of separate routes and network paths.

This problem has been resolved through the use of variable-length subnet masks, as described in the earlier section "Subnet Masks and CIDR Networks." When variable-length subnet masks are used instead of predefined subnet masks, blocks of addresses can be assigned to organizations using a subnet mask that is appropriate for the number of devices on that network. If a network has only 8 PCs, it only needs a network block with a 28-bit subnet mask, which provides it with 16 addresses (14 of which are usable by the hosts).
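Picking a mask "appropriate for the number of devices" amounts to finding the fewest host bits that still leave enough usable addresses. A minimal sketch, assuming the two reserved addresses per network described earlier; smallest_prefix is a hypothetical name.

```python
def smallest_prefix(hosts_needed):
    """Longest slash prefix whose network still holds hosts_needed devices."""
    host_bits = 2                              # a /30 is the smallest block considered here
    while 2 ** host_bits - 2 < hosts_needed:   # usable = total minus all-zeroes and all-ones
        host_bits += 1
    return 32 - host_bits


print(smallest_prefix(8))    # 28
print(smallest_prefix(60))   # 26
```

Eight PCs do not fit in a /29 (6 usable addresses), so the search settles on the /28 mentioned in the text.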

In this context, CIDR-based addressing rules do not care about the "class" to which a network address appears to belong. Instead, CIDR-aware systems rely on the explicit presence of a subnet mask to make packet-forwarding decisions, and use the class only as a last-ditch effort in the event that the subnet mask is not explicitly defined in the network's routing tables.

This results in substantially less wasted address space, although it also results in more routing entries that must be managed. However, another key part of the CIDR architecture is that network blocks are assigned hierarchically, with top-level service providers getting big network numbers (a large ISP may get a network with a /13 prefix, allowing a maximum of 524,288 host addresses for that network assignment), and those organizations can subnet their large networks into multiple smaller networks for their downstream customers.

This allows a single routing entry for the top-level ISP to be used for all the networks underneath it. Rather than the top-level routers having to store routing information for the 32,000+ individual /28 networks beneath the ISP, they have to remember only the routes for the /13 network.
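The figures quoted for the /13 example can be verified with a little arithmetic. In the sketch below, 10.0.0.0/13 is only a placeholder block chosen for illustration, not a real ISP assignment, and child_networks is an invented helper.

```python
import ipaddress


def child_networks(parent_prefix, child_prefix):
    """Number of child-sized blocks that fit inside one parent-sized block."""
    return 2 ** (child_prefix - parent_prefix)


isp_block = ipaddress.ip_network("10.0.0.0/13")   # placeholder for an ISP's /13
print(isp_block.num_addresses)                    # 524288
print(child_networks(isp_block.prefixlen, 28))    # 32768
```

One advertised /13 route therefore stands in for 32,768 possible /28 customer networks.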

Since the modern Internet now predominately uses CIDR addressing and routing, the most important thing to remember about the historical class-based addressing scheme is that it is a legacy design. Just about all of the modern operating systems and network infrastructure devices today fully support variable-length subnet masks. However, much of the older equipment still enforces the use of class-based addressing, and many training courses still teach this historical architecture as if it were current technology. For the most part, network administrators should not be concerned with network classes unless they are suffering problems with legacy equipment.

Internet-Legal Versus Private Addressing

Although the pool of IP addresses is somewhat limited, most companies have no problems obtaining them. However, many organizations have already installed TCP/IP products on their internal networks without obtaining "legal" addresses from the proper sources. Sometimes these addresses come from example books or are simply picked at random (several firms use networks numbered 1.2.3.0, for example). Unfortunately, since they are not legal, these addresses will not be usable when these organizations attempt to connect to the Internet. These firms will eventually have to reassign Internet-legal IP addresses to all the devices on their networks, or invest in address-translation gateways that rewrite outbound IP packets so they appear to be coming from an Internet-accessible host.

Even if an address-translation gateway is installed on the network, these firms will never be able to communicate with any site that is a registered owner of the IP addresses in use on the local network. For example, if you choose to use the 36.0.0.0/8 address block on your internal network, your users will never be able to access the computers at Stanford University, the registered owner of that address block. Any attempt to connect to a host at 36.x.x.x will be interpreted by the local routers as a request for a local system, so those packets will never leave your local network.

Not all firms have the luxury of using Internet-legal addresses on their hosts, for any number of reasons. For example, there may be legacy applications that use hardcoded addresses, or there may be too many systems across the organization for a clean upgrade to be successful. If you are unable to use Internet-legal addresses, you should at least be aware that there are groups of "private" Internet addresses that can be used on internal networks by anyone. These address pools were set aside in RFC 1918, and therefore cannot be "assigned" to any organization. The Internet's backbone routers are configured explicitly not to route packets with these addresses, so they are completely useless outside of an organization's internal network. The address blocks available are listed in Table B-4.

Table B-4: Private Addresses Provided in RFC 1918

Class   Range of Addresses
-----   ------------------
A       Any addresses in 10.x.x.x
B       Addresses in the range of 172.16.x.x-172.31.x.x
C       Addresses in the range of 192.168.0.x-192.168.255.x
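To check whether an address falls in one of the Table B-4 pools, the three ranges can be written as CIDR blocks (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16) and tested with the standard ipaddress module. The helper name is_rfc1918 is an assumption made for this sketch.

```python
import ipaddress

# The three private pools from Table B-4, written as CIDR blocks
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),    # covers 172.16.x.x through 172.31.x.x
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """True if the address lies inside one of the RFC 1918 private pools."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in RFC1918_BLOCKS)


print(is_rfc1918("192.168.10.20"))  # True
print(is_rfc1918("36.1.2.3"))       # False
```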

Since these addresses cannot be routed across the Internet, you must use an address-translation gateway or a proxy server in conjunction with them. Otherwise, you will not be able to communicate with any hosts on the Internet.

An important note here is that since nobody can use these addresses on the Internet, it is safe to assume that anybody who is using these addresses is also utilizing an address-translation gateway of some sort. Therefore, while you will never see these addresses used as destinations on the Internet, if your organization establishes a private connection to a partner organization that is using the same block of addresses you are, your firms will not be able to communicate. The packets destined for your partner's network will appear to be local to your network, and will never be forwarded to the remote network.

There are many other problems that arise from using these addresses, making their general usage difficult for normal operations. For example, many application-layer protocols embed addressing information directly into the protocol stream, and in order for these protocols to work properly, the address-translation gateway has to be aware of their mechanics. In the preceding scenario, the gateway has to rewrite the private addresses (which are stored as application data inside the application protocol), rewrite the UDP/TCP and IP checksums, and possibly rewrite TCP sequence numbers as well. This is difficult to do even with simple and open protocols such as FTP, and extremely difficult with proprietary, encrypted, or dynamic applications (these are problems for many database protocols, network games, and voice-over-IP services, in particular). These gateways almost never work for all the applications in use at a specific location.

It is always best to use formally-assigned, Internet-legal addresses whenever possible, even if the hosts on your network do not necessarily require direct Internet access. In those cases in which your hosts are going through a firewall or application proxy of some sort, the use of Internet-legal addresses causes the least amount of maintenance trouble over time. If for some reason this is not possible, use one of the private address pools described in Table B-4. Do not use random, self-assigned addresses if you can possibly avoid it, as this will only cause connectivity problems for you and your users.

1. These values do not reflect the reserved network addresses.

2. Multicast addresses are shared by all group members. Members do not have explicit addresses within the group.

Generic Top-Level Domains
ON THE INTERNET

A TLD (Top-Level Domain) is a reservation of names in the Domain Name Space that have a particular membership or assignability/qualification to join, sponsored/operated by particular TLD sponsors/operators.

The .aero domain is reserved for members of the air-transport industry and is sponsored by Société Internationale de Télécommunications Aéronautiques (SITA).
The .biz domain is restricted to businesses and is operated by NeuLevel, Inc.
The .com domain is operated by VeriSign Global Registry Services.
The .coop domain is reserved for cooperative associations and is sponsored by Dot Cooperation LLC.
The .info domain is operated by Afilias Limited.
The .jobs TLD is reserved for human resource managers and is sponsored by Employ Media LLC.
The .museum domain is reserved for museums and is sponsored by the Museum Domain Management Association.
The .name domain is reserved for individuals and is operated by Global Name Registry.
The .net domain is operated by VeriSign Global Registry Services.
The .org domain is operated by Public Interest Registry. It is intended to serve the noncommercial community, but all are eligible to register within .org.
The .pro domain is restricted to credentialed professionals and related entities and is operated by RegistryPro.
The .travel domain is reserved for entities whose primary area of activity is in the travel industry and is sponsored by Tralliance Corporation.

Registrations in the domains listed above may be made through dozens of competitive registrars. For a list of the currently operating accredited registrars, go to the InterNIC site. Information about becoming an accredited registrar is available on the ICANN site.

The .gov domain is reserved exclusively for the United States Government. It is operated by the US General Services Administration.
The .edu domain is reserved for postsecondary institutions accredited by an agency on the U.S. Department of Education's list of Nationally Recognized Accrediting Agencies and is registered only through Educause.
The .mil domain is reserved exclusively for the United States Military. It is operated by the US DoD Network Information Center.
The .int domain is used only for registering organizations established by international treaties between governments. It is operated by the IANA .int Domain Registry.

IP Subnet Calculations

1. IP Addressing

At this point one should realize that IP, the Internet Protocol, is a network layer (OSI layer 3) protocol, used to route packets between hosts on different networks. To suit this purpose, IP must define an addressing scheme, so that a packet's intended destination can be indicated.

An IP address is composed of 32 bits. These 32 bits are divided into 4 octets of 8 bits each. You may have seen an IP address represented like this: 172.68.15.24. We must remember, however, that the computer understands this number only in binary, so we must often work with addresses in binary. Many people are intimidated by this initially, but soon find that it is not difficult. If you do not allow yourself to be flustered, you can master this topic.

IP addresses are assigned to organizations in blocks. Each block belongs to one of three classes: class A, class B, or class C. You can tell what class an IP address is by the value in its first octet.

Class A	1-126
Class B	128-191
Class C	192-223

An IP address consists of two fields. The first field identifies the network, and the second field identifies the node on the network. Which bits of the address are in the network field and which bits are in the host field is determined by the subnet mask.

When a class A IP license is granted, you are assigned something like this: 99.0.0.0. Only the value of the bits in the first octet is assigned. This means you are free to assign any values you wish in the second, third and fourth octets.

The default subnet mask for a class A network is 255.0.0.0. High bits, ones, indicate the bits that are part of the network field of the IP address. The default subnet mask does not create subnets. Therefore, a class A network with the default subnet mask is one network. The three octets that are unassigned and unmasked are part of the host field of the address. There is a total of 24 bits in those three octets. Each bit can be in one of two states. Therefore, 2^24 is the number of host addresses that can be assigned on that network, almost. Two addresses are reserved on every network: the one with all host bits set to zero (the network itself, x.0.0.0 here) and the one with all host bits set to one (the broadcast address, x.255.255.255 here). So the total number of hosts possible on a class A IP network is 2^24-2=16,777,214.

When a class B license is granted, the first two octets are assigned. For example, 172.198.x.x. The default subnet mask for a class B is 255.255.0.0. One network, two octets free, 16 bits for the host address field. 2^16-2=65,534 possible host addresses on a class B IP network.

When a class C license is granted, the first three octets are assigned, for example: 193.52.16.0. The default subnet mask for a class C is 255.255.255.0. One octet makes up the host address field. 2^8-2=254 host addresses possible on a class C network.

2. Reason for Subnetting

We said that the default subnet mask for a class A IP network is 255.0.0.0. With this subnet mask, only one octet of a class A network address identifies the network. This leaves three octets of 8 bits each, or 24 bits, to identify the host on that one network. 2^24=16,777,216 addresses. Two addresses are reserved, the all-zeroes host address (x.0.0.0) and the all-ones host address (x.255.255.255). 16,777,214 nodes can be assigned an IP address on this network.

It is highly unlikely that any organization would want one network of 16,777,214 nodes. They might want that many devices connected in a wide area network (WAN), thus capable of communicating when necessary, but they will want to subdivide this huge network into mostly self-contained subnetworks of nodes that communicate with each other often. This is called subnetting.

To understand why, consider what would happen in either a broadcast or a token passing network that consisted of over 16,000,000 nodes. Nothing would happen. It simply would not work. Though the problem is not as drastic, class B and class C IP networks are often subnetted, also.

The subnet mask is used to subdivide an IP network into subnets. This is a division that takes place in OSI layer 3, so it is a logical division that is created by the addressing scheme. This logical division is usually combined with a physical division. Many subnets are physically isolated from the rest of the network by a device such as a router or a switch. This aspect of subnetting is discussed in Unit 3--Data Link Layer.

3. How Subnetting Works

The bits of an address that are masked by the subnet mask are the bits that make up the network field of the address. To subnet, the default subnet mask for a network is extended to cover bits of the address that would otherwise be part of the host field. Once these bits are masked, they become part of the network field, and are used to identify subnets of the larger network.

Here is where we begin dealing with both addresses and subnet masks in binary. Get yourself a cold beverage, stretch, take a deep breath and don't worry. Once you get your brain around the concepts, it is not difficult. You just have to keep trying until the light goes on.

3.1 Translating Binary to Decimal

Both IP addresses and subnet masks are composed of 32 bits divided into 4 octets of 8 bits each. Here is how a single octet translates from binary to decimal. Consider an octet of all ones: 11111111.

128   64   32   16   8   4   2   1
---   --   --   --   -   -   -   -
 1     1    1    1   1   1   1   1 
128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255

Here's another: 10111001

128   64   32   16   8   4   2   1
---   --   --   --   -   -   -   -
 1     0    1    1   1   0   0   1
128 +  0 + 32 + 16 + 8 + 0 + 0 + 1 = 185

and 00000000

128   64   32   16   8   4   2   1
---   --   --   --   -   -   -   -
 0     0    0    0   0   0   0   0
 0  +  0 +  0 +  0 + 0 + 0 + 0 + 0 = 0
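The place-value charts above translate directly into code: add up the weight of every position that holds a 1. A minimal sketch, with octet_to_decimal as an illustrative name:

```python
def octet_to_decimal(bits):
    """Convert an 8-character binary string to its decimal value."""
    weights = [128, 64, 32, 16, 8, 4, 2, 1]
    # add the weight of each position whose bit is 1
    return sum(weight for weight, bit in zip(weights, bits) if bit == "1")


print(octet_to_decimal("11111111"))  # 255
print(octet_to_decimal("10111001"))  # 185
print(octet_to_decimal("00000000"))  # 0
```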

3.2 Converting Decimal to Binary

Converting decimal to binary is similar. Consider 175:

128   64   32   16   8   4   2   1
---   --   --   --   -   -   -   -
 1     0    1    0   1   1   1   1
128 +  0 + 32 +  0 + 8 + 4 + 2 + 1 = 175

175=10101111
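Going the other way amounts to walking the place values from left to right and subtracting each one that fits. A small sketch, assuming a hypothetical helper named decimal_to_octet:

```python
def decimal_to_octet(number: int) -> str:
    """Build the 8-bit string by subtracting place values from left to right."""
    if not 0 <= number <= 255:
        raise ValueError("an octet holds values from 0 to 255")
    bits = []
    for value in (128, 64, 32, 16, 8, 4, 2, 1):
        if number >= value:
            bits.append("1")   # this place value fits, so the bit is high
            number -= value
        else:
            bits.append("0")   # it does not fit, so the bit is low
    return "".join(bits)


if __name__ == "__main__":
    print(175, "=", decimal_to_octet(175))
```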

3.3 Simple Subnetting

The simplest way to subnet is to take the octet in the subnet mask that covers the first unassigned octet in the IP address block, and make all its bits high. Remember, a high bit, a 1, in the subnet mask indicates that the corresponding bit in the IP address is part of the network field. So, if you have a class B network 172.60.0.0, with the subnet mask 255.255.0.0, you have one network with 65,534 possible addresses. If you take that subnet mask and make all the bits in the third octet high

128   64   32   16   8   4   2   1
---   --   --   --   -   -   -   -
 1     1    1    1   1   1   1   1
128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255

you get the subnet mask 255.255.255.0.

172.60.  0. 0
255.255.255.0

Now the third octet of every address on this network is part of the network field instead of the host field. That is one octet, or eight bits, that can be manipulated to create subnets. 2^8-2=254 possible subnets now on this class B network.

One octet is left for the host field. 2^8-2=254 possible host addresses on each subnet.

3.4 Advanced Subnetting

That is the simplest way to subnet, but it may not be the most desirable. You might not want 254 subnets on your class B network. Instead, you might use a subnet mask like 255.255.224.0. How many subnets would this give you? The first step is to see how many bits are allocated to the network by this mask.

128   64   32   16   8   4   2   1
---   --   --   --   -   -   -   -
 1     1    1    0   0   0   0   0
128 + 64 + 32 +  0 + 0 + 0 + 0 + 0 = 224

3 bits are allocated. 2^3-2=6 subnets.

How many hosts on each subnet? Well, 5 bits from this octet are left for the host field, and 8 bits in the fourth octet, for a total of 13 bits in the host field. 2^13-2=8190 possible hosts on each subnet.
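The subnet and host counts follow directly from the number of borrowed bits and the number of remaining host bits. Here is a rough sketch of that arithmetic; the function name subnet_counts and its default_prefix parameter (16 for a class B network) are assumptions made for this example. It keeps the minus-two convention used in these notes for subnets, which many newer networks no longer apply.

```python
def subnet_counts(mask: str, default_prefix: int):
    """Return (subnets, hosts per subnet) using the 2^n - 2 rule from the text."""
    octets = [int(part) for part in mask.split(".")]
    mask_bits = "".join(format(octet, "08b") for octet in octets)
    prefix = mask_bits.count("1")        # bits covered by the mask
    borrowed = prefix - default_prefix   # bits taken from the host field
    host_bits = 32 - prefix              # bits left over for hosts
    return 2 ** borrowed - 2, 2 ** host_bits - 2


if __name__ == "__main__":
    print(subnet_counts("255.255.255.0", 16))
    print(subnet_counts("255.255.224.0", 16))
```

The first call corresponds to the simple case of a full third octet, the second to the 255.255.224.0 mask.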

The subnet mask is always extended by masking off the next bit in the address, from left to right. Thus, the last non-zero octet in the subnet mask will always be one of these: 128, 192, 224, 240, 248, 252, 254 or 255.
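The mask octet values listed above are what you get by setting one more bit high each time, starting from the left. A short sketch (the helper name contiguous_mask_octets is hypothetical):

```python
def contiguous_mask_octets():
    """Octet values with 1 to 8 leading bits high and the rest low."""
    # With n leading bits high, the low (8 - n) bits are zero,
    # so the value is 256 minus 2^(8 - n).
    return [256 - 2 ** (8 - n) for n in range(1, 9)]


if __name__ == "__main__":
    print(contiguous_mask_octets())
```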

Given the IP address of a host and the subnet mask for the network, you need to be able to calculate which subnet that host is on. To do this we compare the binary representation of the pertinent octet of the subnet mask with the binary representation of the corresponding octet in the IP address. Example:

IP address=172.60.50.2
subnet mask=255.255.224.0

50= 00110010
224=11100000

We perform a logical AND on these two numbers. We will be left with only the bits where there is a one in both octets.

00110010
11100000
--------
00100000=32

This host is on subnet 172.60.32.0.
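The same comparison can be done on all four octets at once with a bitwise AND. A minimal sketch, with the helper name subnet_of invented for the example:

```python
def subnet_of(address: str, mask: str) -> str:
    """AND each octet of the address with the matching octet of the mask."""
    addr_octets = [int(part) for part in address.split(".")]
    mask_octets = [int(part) for part in mask.split(".")]
    return ".".join(str(a & m) for a, m in zip(addr_octets, mask_octets))


if __name__ == "__main__":
    print(subnet_of("172.60.50.2", "255.255.224.0"))
```

Only the third octet does any real work here: the first two are kept whole by 255, and the fourth is cleared by 0.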

We also need to be able to find the range of assignable IP addresses on this subnet. To do this, we take the binary that tells us the subnet address, in this case 00100000, and compare it with the subnet mask.

00100000
11100000

The bits covered by the mask we will leave as they are. The rest of the bits we make high. So

00100000
11100000
--------
00111111=63

The range of assignable IP addresses on the subnet 172.60.32.0 is 172.60.32.1-172.60.63.254.
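Making the uncovered bits high is a bitwise OR with the inverted mask. The sketch below applies that to every octet and then steps one address in from each end to get the assignable range. The name assignable_range is hypothetical, and the code assumes a mask that leaves at least two host bits in the last octet.

```python
def assignable_range(subnet: str, mask: str):
    """Return (first, last) assignable addresses on the subnet."""
    net = [int(part) for part in subnet.split(".")]
    msk = [int(part) for part in mask.split(".")]
    # Keep the masked bits as they are and set every unmasked bit high.
    top = [n | (255 - m) for n, m in zip(net, msk)]
    first = net[:3] + [net[3] + 1]   # one above the bottom of the range
    last = top[:3] + [top[3] - 1]    # one below the top of the range
    return (".".join(map(str, first)), ".".join(map(str, last)))


if __name__ == "__main__":
    print(assignable_range("172.60.32.0", "255.255.224.0"))
```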

On every network and subnet, two addresses are reserved. At the low end of the range of addresses for the network or subnet, in this case 172.60.32.0, is the address for the network or subnet itself. The address at the high end of the range of addresses, in this case 172.60.63.255, is the broadcast address. Any message sent to the broadcast address will be received by every host on that network or subnet.
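As a cross-check on the hand calculation for the example host 172.60.50.2 with mask 255.255.224.0, Python's standard ipaddress module can report the two reserved addresses directly. This is only a sketch; the wrapper name describe_subnet is made up here.

```python
import ipaddress


def describe_subnet(host: str, mask: str) -> dict:
    """Report the reserved and assignable addresses for the host's subnet."""
    # strict=False lets us pass a host address; the host bits are masked off.
    net = ipaddress.ip_network(f"{host}/{mask}", strict=False)
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        "first_host": str(net.network_address + 1),
        "last_host": str(net.broadcast_address - 1),
    }


if __name__ == "__main__":
    for name, value in describe_subnet("172.60.50.2", "255.255.224.0").items():
        print(name, value)
```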

LAYERS

TCP/IP Lower-Layer (Interface, Internet and Transport) Protocols (OSI Layers 2, 3 and 4)

The TCP/IP protocol suite is largely defined in terms of the protocols that constitute it; several dozen are covered in this Guide. Most of the critical protocols of the suite function at the lower layers of the OSI Reference Model: layers 2, 3 and 4, which correspond to the network interface, internet and transport layers in the TCP/IP model architecture. Included here are the all-important Internet Protocol (IP) at layer 3 and Transmission Control Protocol (TCP) at layer 4, which combine to give TCP/IP its name.

Due to the importance of these and other TCP/IP protocols at the lower layers, this is the largest chapter of The TCP/IP Guide. It contains four subsections. The first describes the two TCP/IP protocols that reside at the network interface layer, layer 2 of the OSI model: PPP and SLIP. The second describes a couple of “special” protocols that reside architecturally between layers 2 and 3: ARP and RARP. The third covers the TCP/IP internet layer (OSI network layer, layer 3), including IP and several other related and support protocols. The fourth describes the TCP/IP transport layer protocols TCP and UDP.

TCP/IP Network Interface / Internet "Layer Connection" Protocols

The second layer of the OSI Reference Model is the data link layer; it corresponds to the TCP/IP network interface layer. It is there that most LAN, WAN and WLAN technologies are defined, such as Ethernet and IEEE 802.11. The third layer is the network layer, also called the internet layer in the TCP/IP model, where internetworking protocols are defined, the most notable being the Internet Protocol. These two layers are intimately related, because messages sent at the network layer must be carried over individual physical networks at the data link layer. They perform different tasks but as neighbors in the protocol stack, must cooperate with each other.

There is a set of protocols that serves the important task of linking together these two layers and allowing them to work together. The problem with them is deciding where exactly they should live! They are sort of the “black sheep” of the networking world—nobody denies their importance, but they always think they belong in “the other guy's” layer. For example, since these protocols pass data on layer two networks, the folks who deal with layer two technologies say they belong at layer three. But those who work with layer three protocols consider these “low level” protocols that provide services to layer three, and hence put them as part of layer two.

So where do they go? Well, to some extent it doesn't really matter. Even if they are “black sheep” I consider them somewhat special, so I gave them their own home. Welcome to “networking layer limbo”, also known as “OSI layer two-and-a-half”. This is where a couple of protocols are described that serve as “glue” between the data link and network layers. The main job performed here is address resolution, or providing mappings between layer two and layer three addresses. This resolution can be done in either direction, and is represented by the two TCP/IP protocols ARP and RARP (which, despite their similarities, are used for rather different purposes in practice).

Address Resolution and the TCP/IP Address Resolution Protocol (ARP)

Communication on an internetwork is accomplished by sending data at layer three using a network layer address, but the actual transmission of that data occurs at layer two using a data link layer address. This means that every device with a fully-specified networking protocol stack will have both a layer two and a layer three address. It is necessary to define some way of being able to link these addresses together. Usually, this is done by taking a network layer address and determining what data link layer address goes with it. This process is called address resolution.

In this section I look at the problem of address resolution at both a conceptual and practical level, with a focus on how it is done in the important TCP/IP protocol suite. I begin with a section that overviews address resolution in general terms and describes the issues involved in the process. I then describe the TCP/IP Address Resolution Protocol (ARP), probably the best-known and most commonly used address resolution technique. I also provide a brief overview of how address resolution is done for multicast addresses in IP, and the method used in the new IP version 6.

Reverse Address Resolution and the TCP/IP Reverse Address Resolution Protocol (RARP)

The TCP/IP Address Resolution Protocol (ARP) is used when a device needs to determine the layer two (hardware) address of some other device but has only its layer three (network, IP) address. It broadcasts a hardware layer request and the target device responds back with the hardware address matching the known IP address. In theory, it is also possible to use ARP in the exact opposite way. If we know the hardware address of a device but not its IP address, we could broadcast a request containing the hardware address and get back a response containing the IP address.
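The two directions of lookup can be illustrated with a toy model. This is a simulation only, not the real ARP or RARP packet exchange: the LAN table, the addresses in it, and the function names are all invented for the example, and the cache is an assumption added to show why a host does not need to broadcast for every datagram.

```python
from typing import Dict, Optional

# Hypothetical hosts on one LAN segment: IP address -> hardware address.
LAN: Dict[str, str] = {
    "192.168.1.10": "00:11:22:33:44:0a",
    "192.168.1.20": "00:11:22:33:44:14",
}


def broadcast_request(target_ip: str) -> Optional[str]:
    """Simulate a broadcast: every host sees it, only the owner answers."""
    for ip, hardware in LAN.items():
        if ip == target_ip:
            return hardware
    return None


def resolve(target_ip: str, cache: Dict[str, str]) -> Optional[str]:
    """Forward direction: IP address in, hardware address out."""
    if target_ip not in cache:
        hardware = broadcast_request(target_ip)
        if hardware is None:
            return None                 # nobody on the segment answered
        cache[target_ip] = hardware     # remember the answer for next time
    return cache[target_ip]


def reverse_resolve(hardware: str) -> Optional[str]:
    """Reverse direction: hardware address in, IP address out."""
    for ip, known in LAN.items():
        if known == hardware:
            return ip
    return None


if __name__ == "__main__":
    cache: Dict[str, str] = {}
    print(resolve("192.168.1.20", cache))
    print(reverse_resolve("00:11:22:33:44:0a"))
```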

The Motivation For Reverse Address Resolution

Of course the obvious question is: why would we ever need to do this? Since we are dealing with communication on an IP internetwork, we are always going to know the IP address of the destination of the datagram we need to send—it's right there in the datagram itself. We also know our own IP address as well. Or do we?

In a traditional TCP/IP network, every normal host on a network knows its IP address because it is stored somewhere on the machine. When you turn on your PC, the TCP/IP protocol software reads the IP address from a file, which allows your PC to “learn” and start using its IP address. However, there are some devices, such as diskless workstations, that don't have any means of storing an IP address where it can be easily retrieved. When these units are powered up they know their physical address only (because it's wired into the hardware) but not their IP address.

The problem we need to solve here is what is commonly called bootstrapping in the computer industry. This refers to the concept of starting something from a zero state; it is analogous to “pulling yourself up by your own bootstraps”. This is seemingly impossible, just as it seems paradoxical to use a TCP/IP protocol to configure the IP address that is needed for TCP/IP communications. However, it is indeed possible to do this, by making use of broadcasts, which allow local communication even when the target's address is not known.

TCP/IP Transport Layer Protocols

The first three layers of the OSI Reference Model—the physical layer, data link layer and network layer—are very important layers for understanding how networks function. The physical layer moves bits over wires; the data link layer moves frames on a network; the network layer moves datagrams on an internetwork. Taken as a whole, they are the parts of a protocol stack that are responsible for the actual “nuts and bolts” of getting data from one place to another.

Immediately above these we have the fourth layer of the OSI Reference Model: the transport layer, called the host-to-host transport layer in the TCP/IP model. This layer is interesting in that it resides in the very architectural center of the model. Accordingly, it represents an important transition point between the hardware-associated layers below it that do the “grunt work”, and the layers above that are more software-oriented and abstract.

Protocols running at the transport layer are charged with providing several important services to enable software applications in higher layers to work over an internetwork. They are typically responsible for allowing connections to be established and maintained between software services on possibly distant machines. Perhaps most importantly, they serve as the bridge between the needs of many higher-layer applications to send data in a reliable way without needing to worry about error correction, lost data or flow management, and network-layer protocols, which are often unreliable and unacknowledged. Transport layer protocols are often very tightly-tied to the network layer protocols directly below them, and designed specifically to take care of functions that they do not deal with.

In this section I describe transport layer protocols and related technologies used in the TCP/IP protocol suite. There are two main protocols at this layer: the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). I also discuss how transport-layer addressing is done in TCP/IP in the form of ports and sockets.

Transmission Control Protocol (TCP) and User Datagram Protocol (UDP)

TCP/IP is the most important internetworking protocol suite in the world; it is the basis for the Internet, and the “language” spoken by the vast majority of the world's networked computers. TCP/IP includes a large set of protocols that operate at the network layer and above. The suite as a whole is anchored at layer three by the Internet Protocol (IP), which many people consider the single most important protocol in the world of networking.

Of course, there's a bit of architectural distance between the network layer and the applications that run at the layers well above. While IP is the protocol that performs the bulk of the functions needed to make an internetwork, it does not include many capabilities that are needed by applications. In TCP/IP these tasks are performed by a pair of protocols that operate at the transport layer: the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP).

Of these two, TCP gets by far the most attention. It is the transport layer protocol that is most often associated with TCP/IP, and, well, its name is right there, “up in lights”. It is also the transport protocol used for many of the Internet's most popular applications, while UDP gets second billing. However, TCP and UDP are really peers that play the same role in TCP/IP. They function very differently and provide different benefits and drawbacks to the applications that use them, which makes them both important to the protocol suite as a whole. The two protocols also have certain areas of similarity, which makes it most efficient that I describe them in the same overall section, highlighting where they share characteristics and methods of operation, as well as where they diverge.

In this section I provide a detailed examination of the two TCP/IP transport layer protocols: the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). I begin with a quick overview of the role of these two protocols in the TCP/IP protocol suite, and a discussion of why they are both important. I describe the method that both protocols employ for addressing, using transport-layer ports and sockets. I then have two detailed sections for each of UDP and TCP. I conclude with a summary quick-glance comparison of the two.

Incidentally, I describe UDP before TCP for a simple reason: it is simpler. UDP operates more like a classical message-based protocol, and in fact is more similar to IP itself than is TCP. This is the same reason why the section on TCP is much larger than that covering UDP: TCP is much more complex and does a great deal more than UDP.

TCP and UDP Overview and Role In TCP/IP

The transport layer in a protocol suite is responsible for a specific set of functions. For this reason, one might expect that the TCP/IP suite would have a single main transport protocol to perform those functions, just as it has IP as its core protocol at the network layer. It is a curiosity, then, that there are two different widely-used TCP/IP transport layer protocols. This arrangement is probably one of the best examples of the power of protocol layering, and hence an illustration that it was worth all the time you spent learning to understand that pesky OSI Reference Model.

Differing Transport Layer Requirements in TCP/IP

Let's start with a look back at layer three. In my overview of the key operating characteristics of the Internet Protocol, I described several important limitations of how IP works. The most important of these are that IP is connectionless, unreliable and unacknowledged. Data is sent over an IP internetwork without first establishing a connection, using a “best effort” paradigm. Messages usually get where they need to go, but there are no guarantees, and the sender usually doesn't even know if the data got to its destination.

These characteristics present serious problems to software. Many, if not most, applications need to be able to count on the fact that the data they send will get to its destination without loss or error. They also want the connection between two devices to be automatically managed, with problems such as congestion and flow control taken care of as needed. Unless some mechanism is provided for this at lower layers, every application would need to perform these jobs, which would be a massive duplication of effort.

In fact, one might argue that establishing connections, providing reliability, and handling retransmissions, buffering and data flow is sufficiently important that it would have been best to simply build these abilities directly into the Internet Protocol. Interestingly, that was exactly the case in the early days of TCP/IP. “In the beginning” there was just a single protocol called “TCP” that combined the tasks of the Internet Protocol with the reliability and session management features just mentioned.

There's a big problem with this, however. Establishing connections, providing a mechanism for reliability, managing flow control and acknowledgments and retransmissions: these all come at a cost in time and bandwidth. Building all of these capabilities into a single protocol that spans layers three and four would mean all applications got the benefits of reliability, but also the costs. While this would be fine for many applications, there are others that both don't need the reliability, and “can't afford” the overhead required to provide it.

The Solution: Two Very Different Transport Protocols

Fixing this problem was simple: let the network layer (IP) take care of basic data movement on the internetwork, and define two protocols at the transport layer. One would provide a rich set of services for those applications that need that functionality, with the understanding that some overhead was required to accomplish it. The other would be simple, providing little in the way of classic layer-four functions, but it would be fast and easy to use. Thus, the result of two TCP/IP transport-layer protocols:

Transmission Control Protocol (TCP): A full-featured, connection-oriented, reliable transport protocol for TCP/IP applications. TCP provides transport-layer addressing to allow multiple software applications to simultaneously use a single IP address. It allows a pair of devices to establish a virtual connection and then pass data bidirectionally. Transmissions are managed using a special sliding window system, with unacknowledged transmissions detected and automatically retransmitted. Additional functionality allows the flow of data between devices to be managed, and special circumstances to be addressed.
User Datagram Protocol (UDP): A very simple transport protocol that provides transport-layer addressing like TCP, but little else. UDP is barely more than a “wrapper” protocol that provides a way for applications to access the Internet Protocol. No connection is established, transmissions are unreliable, and data can be lost.

By means of analogy, TCP is a fully-loaded luxury performance sedan with a chauffeur and a satellite tracking/navigation system. It provides lots of frills and comfort, and good performance. It virtually guarantees you will get where you need to go without any problems, and any concerns that do arise can be corrected. In contrast, UDP is a stripped-down race car. Its goal is simplicity and speed, speed, speed; everything else is secondary. You will probably get where you need to go, but hey, race cars can be finicky to keep operating.
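To make the "wrapper" nature of UDP concrete, here is a small sketch that sends one datagram over the loopback interface using Python's standard socket module. There is no connection setup at all: the sender simply addresses a datagram to an IP address and port. The function name udp_roundtrip is hypothetical, and the sketch relies on loopback delivery, which UDP itself does not guarantee.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram to a local UDP socket and return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    receiver.settimeout(5)             # do not wait forever if it is lost
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No handshake: the datagram is just addressed to (IP, port).
        sender.sendto(payload, receiver.getsockname())
        data, _source = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(udp_roundtrip(b"hello over UDP"))
```

If the datagram were dropped, nothing in UDP would notice or resend it; the timeout here is the application's own safeguard.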

TCP/IP Protocol Suite and Architecture

Just as Ethernet rules the roost when it comes to LAN technologies and IEEE 802.11 is the boss of the wireless LAN world, modern internetworking is dominated by the suite known as TCP/IP. Named for two key protocols of the many that comprise it, TCP/IP has been in continual development and use for about three decades. In that time, it has evolved from an experimental technology used to hook together a handful of research computers, to the powerhouse of the largest and most complex computer network in history: the global Internet, connecting together millions of networks and end devices.

In this section I begin our magical tour through the mystical world of TCP/IP. I begin with an overview of TCP/IP and a brief look at its very interesting history. I discuss the services provided in TCP/IP networks, and then explain the architectural model used under TCP/IP. I then provide a brief description of each of the most important TCP/IP protocols that are discussed in this Guide.

You may have noticed that this section is relatively small, even though its title seems to encompass the entire subject of this TCP/IP Guide. The reason is that this section only provides a high-level overview of TCP/IP. Most of the content of the Guide is concerned with explaining the several dozen individual protocols that comprise TCP/IP; these can be found in other sections and subsections of the Guide. For convenience, you can also find direct links to the descriptions of these protocols in the TCP/IP Protocols topic in this section.

TCP/IP Overview and History

The best place to start looking at TCP/IP is probably the name itself. TCP/IP in fact consists of dozens of different protocols, but only a few are the “main” protocols that define the core operation of the suite. Of these key protocols, two are usually considered the most important. The Internet Protocol (IP) is the primary OSI network layer (layer three) protocol that provides addressing, datagram routing and other functions in an internetwork. The Transmission Control Protocol (TCP) is the primary transport layer (layer four) protocol, and is responsible for connection establishment and management and reliable data transport between software processes on devices.

Due to the importance of these two protocols, their abbreviations have come to represent the entire suite: “TCP/IP”. (In a moment we'll discover exactly the history of that name.) IP and TCP are important because many of TCP/IP's most critical functions are implemented at layers three and four. However, there is much more to TCP/IP than just TCP and IP. The protocol suite as a whole requires the work of many different protocols and technologies to make a functional network that can properly provide users with the applications they need.

TCP/IP uses its own four-layer architecture that corresponds roughly to the OSI Reference Model and provides a framework for the various protocols that comprise the suite. It also includes numerous high-level applications, some of which are well-known by Internet users who may not realize they are part of TCP/IP, such as HTTP (which runs the World Wide Web) and FTP. In the topics on TCP/IP architecture and protocols I provide an overview of most of the important TCP/IP protocols and how they fit together.

Early TCP/IP History

As I said earlier, the Internet is a primary reason why TCP/IP is what it is today. In fact, the Internet and TCP/IP are so closely related in their history that it is difficult to discuss one without also talking about the other. They were developed together, with TCP/IP providing the mechanism for implementing the Internet. TCP/IP has over the years continued to evolve to meet the needs of the Internet and also smaller, private networks that use the technology. I will provide a brief summary of the history of TCP/IP here; of course, whole books have been written on TCP/IP and Internet history, and this is a technical Guide and not a history book, so remember that this is just a quick look for sake of interest.

The TCP/IP protocols were initially developed as part of the research network developed by the United States Defense Advanced Research Projects Agency (DARPA or ARPA). Initially, this fledgling network, called the ARPAnet, was designed to use a number of protocols that had been adapted from existing technologies. However, they all had flaws or limitations, either in concept or in practical matters such as capacity, when used on the ARPAnet. The developers of the new network recognized that trying to use these existing protocols might eventually lead to problems as the ARPAnet scaled to a larger size and was adapted for newer uses and applications.

In 1973, development of a full-fledged system of internetworking protocols for the ARPAnet began. What many people don't realize is that in early versions of this technology, there was only one core protocol: TCP. And in fact, these letters didn't even stand for what they do today; they were for the Transmission Control Program. The first version of this predecessor of modern TCP was written in 1973, then revised and formally documented in RFC 675, Specification of Internet Transmission Control Program, December 1974.

Modern TCP/IP Development and the Creation of TCP/IP Architecture

Testing and development of TCP continued for several years. In March 1977, version 2 of TCP was documented. In August 1977, a significant turning point came in TCP/IP’s development. Jon Postel, one of the most important pioneers of the Internet and TCP/IP, published a set of comments on the state of TCP. In that document (known as Internet Engineering Note number 2, or IEN 2), he provided what I consider superb evidence that reference models and layers aren't just for textbooks, and really are important to understand:

We are screwing up in our design of internet protocols by violating the principle of layering. Specifically we are trying to use TCP to do two things: serve as a host level end to end protocol, and to serve as an internet packaging and routing protocol. These two things should be provided in a layered and modular way. I suggest that a new distinct internetwork protocol is needed, and that TCP be used strictly as a host level end to end protocol.

-- Jon Postel, IEN 2, 1977

What Postel was essentially saying was that the version of TCP created in the mid-1970s was trying to do too much. Specifically, it was encompassing both layer three and layer four activities (in terms of OSI Reference Model layer numbers). His vision was prophetic, because we now know that having TCP handle all of these activities would have indeed led to problems down the road.

Postel's observation led to the creation of TCP/IP architecture, and the splitting of TCP into TCP at the transport layer and IP at the network layer; thus the name “TCP/IP”. (As an aside, it's interesting, given this history, that sometimes the entire TCP/IP suite is called just “IP”, even though TCP came first.) The process of dividing TCP into two portions began in version 3 of TCP, written in 1978. The first formal standards for the versions of IP and TCP used in modern networks (version 4) were created in 1980. This is why the first “real” version of IP is version 4 and not version 1. TCP/IP quickly became the standard protocol set for running the ARPAnet. In the 1980s, more and more machines and networks were connected to the evolving ARPAnet using TCP/IP protocols, and the TCP/IP Internet was born.

Important Factors in the Success of TCP/IP

TCP/IP was at one time just “one of many” different sets of protocols that could be used to provide network-layer and transport-layer functionality. Today there are still other options for internetworking protocol suites, but TCP/IP is the universally-accepted world-wide standard. Its growth in popularity has been due to a number of important factors. Some of these are historical, such as the fact that it is tied to the Internet as described above, while others are related to the characteristics of the protocol suite itself. Chief among these are the following:

Integrated Addressing System: TCP/IP includes within it (as part of the Internet Protocol, primarily) a system for identifying and addressing devices on both small and large networks. The addressing system is designed to allow devices to be addressed regardless of the lower-level details of how each constituent network is constructed. Over time, the mechanisms for addressing in TCP/IP have improved, to meet the needs of growing networks, especially the Internet. The addressing system also includes a centralized administration capability for the Internet, to ensure that each device has a unique address.
Design For Routing: Unlike some network-layer protocols, TCP/IP is specifically designed to facilitate the routing of information over a network of arbitrary complexity. In fact, TCP/IP is conceptually concerned more with the connection of networks, than with the connection of devices. TCP/IP routers enable data to be delivered between devices on different networks by moving it one step at a time from one network to the next. A number of support protocols are also included in TCP/IP to allow routers to exchange critical information and manage the efficient flow of information from one network to another.
Underlying Network Independence: TCP/IP operates primarily at layers three and above, and includes provisions to allow it to function on almost any lower-layer technology, including LANs, wireless LANs and WANs of various sorts. This flexibility means that one can mix and match a variety of different underlying networks and connect them all using TCP/IP.
Scalability: One of the most amazing characteristics of TCP/IP is how scalable its protocols have proven to be. Over the decades it has proven its mettle as the Internet has grown from a small network with just a few machines to a huge internetwork with millions of hosts. While some changes have been required periodically to support this growth, these changes have taken place as part of the TCP/IP development process, and the core of TCP/IP is basically the same as it was 25 years ago.
Open Standards and Development Process: The TCP/IP standards are not proprietary, but open standards freely available to the public. Furthermore, the process used to develop TCP/IP standards is also completely open. TCP/IP standards and protocols are developed and modified using the unique, democratic “RFC” process, with all interested parties invited to participate. This ensures that anyone with an interest in the TCP/IP protocols is given a chance to provide input into their development, and also ensures the world-wide acceptance of the protocol suite.
Universality: Everyone uses TCP/IP because everyone uses it!

This last point is, perhaps ironically, arguably the most important. Not only is TCP/IP the “underlying language of the Internet”, it is also used in most private networks today. Even former “competitors” to TCP/IP such as NetWare now use TCP/IP to carry traffic. The Internet continues to grow, and so do the capabilities and functions of TCP/IP. Preparation for the future continues, with the move to the new IP version 6 protocol in its early stages. It is likely that TCP/IP will remain a big part of internetworking for the foreseeable future.

TCP/IP Services and Client/Server Operation

TCP/IP is most often studied in terms of its layer-based architecture and the protocols that it provides at those different layers. And we're certainly going to do that, don't worry. These protocols, however, represent the technical details of how TCP/IP works. They are of interest to us as students of technology, but are normally hidden from users who do not need to see the “guts” of how TCP/IP works to know that it works. Before proceeding to these details, I think it might be instructive to take a “bigger picture” look at what TCP/IP does.

TCP/IP Services

In the section describing the OSI Reference Model I mentioned that the theoretical operation of the model is based on the concept of one layer providing services to the layers above it. TCP/IP covers many layers of the OSI model, and so it provides services of this sort at several levels. Conceptually, we can divide TCP/IP services into two groups: services provided to other protocols and services provided to end users directly.

Services Provided to Other Protocols

The first group of services consists of the core functions implemented by the main TCP/IP protocols such as IP, TCP and UDP. These services are designed to actually accomplish the internetworking functions of the protocol suite. For example, at the network layer, IP provides functions such as addressing, delivery, and datagram packaging, fragmentation and reassembly. At the transport layer, TCP and UDP are concerned with encapsulating user data and managing connections between devices. Other protocols provide routing and management functionality. Higher-layer protocols use these services, allowing them to concentrate on what they are intended to accomplish.

End-User Services

The second group consists of end-user services. These facilitate the operation of the applications that users run to make use of the power of the Internet and other TCP/IP networks. For example, the World Wide Web (WWW) is arguably the most important Internet application. WWW services are provided through the Hypertext Transfer Protocol (HTTP), a TCP/IP application layer protocol. HTTP in turn uses services provided by lower-level protocols. All of these details are of course hidden from the end users, which is entirely on purpose!

The TCP/IP Client/Server Structural Model

An important defining characteristic of TCP/IP services is that they primarily operate in the client/server structural model. This term refers to a system where a relatively small number of (usually powerful) server machines is dedicated to providing services to a much larger number of client hosts; I describe the concept more in the topic on network structural models in the networking fundamentals chapter. Just as client/server networking applies to hardware, this same concept can be applied to software and protocols, and this is exactly what was done in the design of TCP/IP protocols and applications.

TCP/IP protocols are not set up so that two machines that want to communicate use identical software. Instead, a conscious decision was made to make communication function using matched, complementary pairs of client and server software. The client initiates communication by sending a request to a server for data or other information. The server then responds with a reply to the client, giving the client what it requested, or else an alternative response such as an error message or information about where else it might find the data. Most (but not all) TCP/IP functions work in this manner, which is illustrated in Figure 19.

**Figure 19: TCP/IP Client/Server Operation**

Most TCP/IP protocols involve communication between two devices, but the two rarely act as peers in the communication; one acts as the *client* and the other as the *server*. This simplified illustration shows a common example—a World Wide Web transaction using the Hypertext Transfer Protocol (HTTP). The Web browser is an HTTP client and initiates the communication with a request for a file or other resource sent over the Internet to a Web site, which is an HTTP server. The server then responds to the client with the information requested. Servers will generally respond to many clients simultaneously.

There are numerous advantages to client/server operation in TCP/IP. Just as client hardware and server hardware can be tailored to their very different jobs, client software and server software can also be optimized to perform their jobs as efficiently as possible. Let's again take the WWW as an example. To get information from the Web, Web client software (usually called a browser) sends requests to a Web server. The Web server then responds with the requested content. (There's more to it than that, of course, but that's how it appears to the user.) The Web browser is created to provide the interface to the user and to talk to Web servers; the Web server software is very different, generally consisting only of high-powered software that receives and responds to requests.
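The request/reply pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real Web server works: the function names (`read_all`, `serve_once`, `request_reply`), the loopback address, and the made-up "REPLY to" response format are all hypothetical. The point is the asymmetry of the two roles: the server waits passively, the client initiates.

```python
import socket
import threading


def read_all(sock: socket.socket) -> bytes:
    """Read from the socket until the peer signals end-of-stream."""
    chunks = []
    while True:
        data = sock.recv(1024)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def serve_once(listener: socket.socket) -> None:
    """Server role: wait for one client, read its request, send a reply."""
    conn, _addr = listener.accept()
    with conn:
        request = read_all(conn)
        # Hypothetical reply format, chosen only for this illustration.
        conn.sendall(b"REPLY to " + request)


def request_reply(message: bytes) -> bytes:
    """Client role: connect to a one-shot loopback server and return its reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()

    # The client initiates the exchange; the server only ever responds.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # tell the server the request is complete
        reply = read_all(client)

    server.join()
    listener.close()
    return reply
```

Calling `request_reply(b"GET /index.html")` returns `b"REPLY to GET /index.html"`. Note that the two sides run different code, which is the "matched, complementary pairs" idea in miniature.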

TCP/IP Architecture and the TCP/IP Model

The OSI reference model consists of seven layers that represent a functional division of the tasks required to implement a network. It is a conceptual tool that I often use to show how various protocols and technologies fit together to implement networks. However, it's not the only networking model that attempts to divide tasks into layers and components. The TCP/IP protocol suite was in fact created before the OSI Reference Model; as such, its inventors didn't use the OSI model to explain TCP/IP architecture (even though the OSI model is often used in TCP/IP discussions today, as you will see throughout this Guide).

The TCP/IP Model

The developers of the TCP/IP protocol suite created their own architectural model to help describe its components and functions. This model goes by different names, including the TCP/IP model, the DARPA model (after the agency that was largely responsible for developing TCP/IP) and the DOD model (after the United States Department of Defense, the “D” in “DARPA”). I just call it the TCP/IP model since this seems the simplest designation for modern times.

Regardless of the model you use to represent the function of a network (and regardless of what you call that model!), the functions that the model represents are pretty much the same. This means that the TCP/IP and the OSI models are really quite similar in nature even if they don't carve up the network functionality pie in precisely the same way. There is a fairly natural correspondence between the TCP/IP and OSI layers; it just isn't always a “one-to-one” relationship. Since the OSI model is used so widely, it is common to explain the TCP/IP architecture both in terms of the TCP/IP layers and the corresponding OSI layers, and that's what I will now do.

TCP/IP Model Layers

The TCP/IP model uses four layers that logically span the equivalent of the top six layers of the OSI reference model; this is shown in Figure 20. (The physical layer is not covered by the TCP/IP model because the data link layer is considered the point at which the interface occurs between the TCP/IP stack and the underlying networking hardware.) The following are the TCP/IP model layers, starting from the bottom.

**Figure 20: OSI Reference Model and TCP/IP Model Layers**

The TCP/IP architectural model has four layers that approximately match six of the seven layers in the OSI Reference Model. The TCP/IP model does not address the physical layer, which is where hardware devices reside. The next three layers—*network interface*, *internet* and *(host-to-host) transport*—correspond to layers 2, 3 and 4 of the OSI model. The TCP/IP *application* layer conceptually “blurs” the top three OSI layers. It’s also worth noting that some people consider certain aspects of the OSI session layer to be arguably part of the TCP/IP host-to-host transport layer.

Network Interface Layer

As its name suggests, this layer represents the place where the actual TCP/IP protocols running at higher layers interface to the local network. This layer is somewhat “controversial” in that some people don't even consider it a “legitimate” part of TCP/IP. This is usually because none of the core IP protocols run at this layer. Despite this, the network interface layer is part of the architecture. It is equivalent to the data link layer (layer two) in the OSI Reference Model and is also sometimes called the link layer. You may also see the name network access layer.

On many TCP/IP networks, there is no TCP/IP protocol running at all on this layer, because it is simply not needed. For example, if you run TCP/IP over an Ethernet, then Ethernet handles layer two (and layer one) functions. However, the TCP/IP standards do define protocols for TCP/IP networks that do not have their own layer two implementation. These protocols, the Serial Line Internet Protocol (SLIP) and the Point-to-Point Protocol (PPP), serve to fill the gap between the network layer and the physical layer. They are commonly used to facilitate TCP/IP over direct serial line connections (such as dial-up telephone networking) and other technologies that operate directly at the physical layer.
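To make "filling the gap" a little more concrete, here is a rough sketch of the byte-stuffing scheme SLIP uses to mark datagram boundaries on a raw serial line. The four special byte values are the ones given in RFC 1055; the function names `slip_encode` and `slip_decode` are my own, and the decoder is simplified to handle a single frame with no error handling.

```python
# Special byte values defined by RFC 1055.
END = 0xC0      # marks the end of a datagram
ESC = 0xDB      # introduces a two-byte escape sequence
ESC_END = 0xDC  # ESC, ESC_END stands for a literal END byte in the data
ESC_ESC = 0xDD  # ESC, ESC_ESC stands for a literal ESC byte in the data


def slip_encode(datagram: bytes) -> bytes:
    """Wrap one IP datagram in a SLIP frame."""
    # A leading END is optional in RFC 1055; it flushes any line noise.
    frame = bytearray([END])
    for byte in datagram:
        if byte == END:
            frame += bytes([ESC, ESC_END])
        elif byte == ESC:
            frame += bytes([ESC, ESC_ESC])
        else:
            frame.append(byte)
    frame.append(END)
    return bytes(frame)


def slip_decode(frame: bytes) -> bytes:
    """Recover the datagram from a single SLIP frame (simplified sketch)."""
    datagram = bytearray()
    escaped = False
    for byte in frame:
        if escaped:
            if byte == ESC_END:
                datagram.append(END)
            elif byte == ESC_ESC:
                datagram.append(ESC)
            else:
                datagram.append(byte)  # protocol violation; passed through as-is
            escaped = False
        elif byte == ESC:
            escaped = True
        elif byte == END:
            continue  # frame delimiter, not part of the data
        else:
            datagram.append(byte)
    return bytes(datagram)
```

That is essentially all SLIP does: it has no addressing, no error detection and no way to identify the protocol being carried, which is why PPP is described as the far more capable of the two.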

Internet Layer

This layer corresponds to the network layer in the OSI Reference Model (and for that reason is sometimes called the network layer even in TCP/IP model discussions). It is responsible for typical layer three jobs, such as logical device addressing, data packaging, manipulation and delivery, and last but not least, routing. At this layer we find the Internet Protocol (IP), arguably the heart of TCP/IP, as well as support protocols such as ICMP and the routing protocols (RIP, OSPF, BGP, etc.). The new version of IP, called IP version 6, will be used for the Internet of the future and is of course also at this layer.

(Host-to-Host) Transport Layer

The primary job of this layer is to facilitate end-to-end communication over an internetwork. It is in charge of allowing logical connections to be made between devices to allow data to be sent either unreliably (with no guarantee that it gets there) or reliably (where the protocol keeps track of the data sent and received to make sure it arrives, and re-sends it if necessary). It is also here that identification of the specific source and destination application process is accomplished.

The formal name of this layer is often shortened to just the transport layer; the key TCP/IP protocols at this layer are the Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). The TCP/IP transport layer corresponds to the layer of the same name in the OSI model (layer four) but includes certain elements that are arguably part of the OSI session layer. For example, TCP establishes a connection that can persist for a long period of time, which some people say makes a TCP connection more like a session.
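The "identification of the specific source and destination application process" mentioned above is done with port numbers. A small Python sketch, using UDP on the loopback interface, shows the idea; the function name `udp_exchange` and the two-second timeout are my own choices for illustration, and both sockets let the operating system pick their ports.

```python
import socket


def udp_exchange(payload: bytes):
    """Send one datagram between two local sockets.

    Returns (data received, source address seen by the receiver,
    address the sender was actually bound to).
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Each socket gets its own (IP address, port) pair; port 0 means "any free port".
        receiver.bind(("127.0.0.1", 0))
        sender.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)  # UDP gives no delivery guarantee, so don't wait forever

        # No connection setup: the datagram is simply addressed to the receiver's port.
        sender.sendto(payload, receiver.getsockname())
        data, source = receiver.recvfrom(1024)
        return data, source, sender.getsockname()
    finally:
        receiver.close()
        sender.close()
```

The source address the receiver sees is the same address and port pair the sender was bound to, which is how a reply would find its way back to the right process. With TCP the same port mechanism applies, but a connection is established first and lost data is retransmitted.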

Application Layer

This is the highest layer in the TCP/IP model. It is a rather broad layer, encompassing layers five through seven in the OSI model. While this seems to represent a loss of detail compared to the OSI model, I think this is probably a good thing! The TCP/IP model better reflects the “blurry” nature of the divisions between the functions of the higher layers in the OSI model, which in practical terms often seem rather arbitrary. It really is hard to separate some protocols in terms of which of layers five, six or seven they encompass. (I didn't even bother to try in this Guide, which is why the higher-level protocols are all in the same chapter, while layers one through four have their protocols listed separately.)

Numerous protocols reside at the application layer. These include application protocols such as HTTP, FTP and SMTP for providing end-user services, as well as administrative protocols like SNMP, DHCP and DNS.

Note: The internet and host-to-host transport layers are usually considered the “core” of TCP/IP architecture, since they contain most of the key protocols that implement TCP/IP internetworks.

In the topic that follows I provide a brief look at each of the TCP/IP protocols covered in detail in this Guide and more detail on where they all fit into the TCP/IP architecture. There I will also cover a couple of protocols that don't really fit into the TCP/IP layer model at all.

Key Concept: The architecture of the TCP/IP protocol suite is often described in terms of a layered reference model called the TCP/IP model, DARPA model or DOD model. The TCP/IP model includes four layers: the network interface layer (responsible for interfacing the suite to the physical hardware on which it runs), the internet layer (where device addressing, basic datagram communication and routing take place), the host-to-host transport layer (where connections are managed and reliable communication is ensured) and the application layer (where end-user applications and services reside). The first three layers correspond to layers two through four of the OSI Reference Model respectively; the application layer is equivalent to OSI layers five to seven.
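The layer correspondence summarized above can be written down as a small lookup table. The following Python sketch is just a restatement of that mapping; the names `TCPIP_TO_OSI`, `OSI_NAMES` and `tcpip_layer_for` are hypothetical and chosen only for this illustration.

```python
# OSI layer numbers and names, for reference.
OSI_NAMES = {
    1: "physical",
    2: "data link",
    3: "network",
    4: "transport",
    5: "session",
    6: "presentation",
    7: "application",
}

# TCP/IP model layers, bottom to top, with the OSI layers each one covers.
TCPIP_TO_OSI = {
    "network interface": (2,),
    "internet": (3,),
    "host-to-host transport": (4,),
    "application": (5, 6, 7),
}


def tcpip_layer_for(osi_layer: int):
    """Return the TCP/IP model layer covering an OSI layer, or None if not covered."""
    for name, osi_layers in TCPIP_TO_OSI.items():
        if osi_layer in osi_layers:
            return name
    return None
```

Looking up OSI layer 1 returns `None`, reflecting the fact that the TCP/IP model does not address the physical layer, while layers 5, 6 and 7 all map to the single TCP/IP application layer.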

TCP/IP Protocols

Since TCP/IP is a protocol suite, it is most often discussed in terms of the protocols that comprise it. Each protocol “resides” in a particular layer of the TCP/IP architectural model we saw earlier in this section. Every TCP/IP protocol is charged with performing a certain subset of the total functionality required to implement a TCP/IP network or application. They work together to allow TCP/IP as a whole to operate.

First, a quick word on the word “protocol”. You will sometimes hear TCP/IP called just a “protocol” instead of a “protocol suite”. This is a simplification that, while technically incorrect, is widely used. I believe it arises in large part due to Microsoft referring to protocol suites as “protocols” in their operating systems. I discuss this issue in more detail in a topic devoted to protocols in the networking fundamentals chapter.

As I mentioned earlier in this section, there are a few TCP/IP protocols that are usually called the “core” of the suite, because they are responsible for its basic operation. Which protocols to include in this category is a matter of some conjecture, but most people would definitely include here the main protocols at the internet and transport layers: the Internet Protocol (IP), Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). These core protocols support many other protocols, to perform a variety of functions at each of the TCP/IP model layers. Still others enable user applications to function.

On the whole, there are many hundreds of TCP/IP protocols and applications, and I could not begin to cover each and every one in this Guide. I do include sections discussing several dozen of the protocols that I consider important for one reason or another. Full coverage of each of these protocols (to varying levels of detail) can be found in the other chapters of this Guide.

Below I have included a number of tables that provide a summary of each of the TCP/IP protocols discussed in this Guide. Each table covers one of the TCP/IP model layers, in order from lowest to highest, and I have provided links to the sections or topics where each is discussed. The organization of protocols in the TCP/IP suite can also be seen at a glance in Figure 21.

Network Interface Layer (OSI Layer 2) Protocols

TCP/IP includes two protocols at the network interface layer, SLIP and PPP, which are described in Table 19.

**Table 19: TCP/IP Protocols: Network Interface Layer (OSI Layer 2)**
Protocol Name	Protocol Abbr.	Description
Serial Line Internet Protocol	SLIP	Provides basic TCP/IP functionality by creating a layer-two connection between two devices over a serial line.
Point-to-Point Protocol	PPP	Provides layer-two connectivity like SLIP, but is much more sophisticated and capable. PPP is itself a suite of protocols (“sub-protocols” if you will) that allow for functions such as authentication, data encapsulation, encryption and aggregation, facilitating TCP/IP operation over WAN links.

Network Interface / Network Layer (“OSI Layer 2/3”) Protocols

Table 20 describes ARP and RARP, the “oddballs” of the TCP/IP suite. In some ways they belong in both layer two and layer three, and in other ways neither. They really serve to link together the network interface layer and the internet layer. For this reason, I really believe they belong between these two and call them “layer connection” protocols. See the section devoted to these protocols and their unique layer for more on this issue.

**Table 20: TCP/IP Protocols: Network Interface / Network Layer (“OSI Layer 2/3”)**
Protocol Name	Protocol Abbr.	Description
Address Resolution Protocol	ARP	Used to map layer three IP addresses to layer two physical network addresses.
Reverse Address Resolution Protocol	RARP	Determines the layer three address of a machine from its layer two address. Now mostly superseded by BOOTP and DHCP.

Network Layer (OSI Layer 3) Protocols

The very important network layer contains the Internet Protocol and several related and support protocols, as shown in Table 21.

**Table 21: TCP/IP Protocols: Network Layer (OSI Layer 3)**
Protocol Name	Protocol Abbr.	Description
Internet Protocol, Internet Protocol Version 6	IP, IPv6	Provides encapsulation and connectionless delivery of transport layer messages over a TCP/IP network. Also responsible for addressing and routing functions.
IP Network Address Translation	IP NAT	Allows addresses on a private network to be automatically translated to different addresses on a public network, providing address sharing and security benefits. (Note that some people don’t consider IP NAT to be a protocol in the strict sense of that word.)
IP Security	IPSec	A set of IP-related protocols that improve the security of IP transmissions.
Internet Protocol Mobility Support	Mobile IP	Resolves certain problems with IP associated with mobile devices.
Internet Control Message Protocol	ICMP/ICMPv4, ICMPv6	A “support protocol” for IP and IPv6 that provides error-reporting and information request-and-reply capabilities to hosts.
Neighbor Discovery Protocol	ND	A new “support protocol” for IPv6 that includes several functions performed by ARP and ICMP in conventional IP.
Routing Information Protocol, Open Shortest Path First, Gateway-to-Gateway Protocol, HELLO Protocol, Interior Gateway Routing Protocol, Enhanced Interior Gateway Routing Protocol, Border Gateway Protocol, Exterior Gateway Protocol	RIP, OSPF, GGP, HELLO, IGRP, EIGRP, BGP, EGP	Protocols used to support the routing of IP datagrams and the exchange of routing information.

Host-to-Host Transport Layer (OSI Layer 4) Protocols

The transport layer contains the essential protocols TCP and UDP, as shown in Table 22.

**Table 22: TCP/IP Protocols: Host-to-Host Transport Layer (OSI Layer 4)**
Protocol Name	Protocol Abbr.	Description
Transmission Control Protocol	TCP	The main transport layer protocol for TCP/IP. Establishes and manages connections between devices and ensures reliable and flow-controlled delivery of data using IP.
User Datagram Protocol	UDP	A transport protocol that can be considered a “severely stripped-down” version of TCP. It is used to send data in a simple way between application processes, without the many reliability and flow management features of TCP, but often with greater efficiency.

Application Layer (OSI Layer 5/6/7) Protocols

As discussed in the topic on the TCP/IP model, in TCP/IP the single application layer covers the equivalent of OSI layers 5, 6 and 7. The application protocols covered in this Guide are shown in Table 23.

**Table 23: TCP/IP Protocols: Application Layer (OSI Layer 5/6/7)**
Protocol Name	Protocol Abbr.	Description
Domain Name System	DNS	Provides the ability to refer to IP devices using names instead of just numerical IP addresses. Allows machines to resolve these names into their corresponding IP addresses.
Network File System	NFS	Allows files to be shared seamlessly across TCP/IP networks.
Bootstrap Protocol	BOOTP	Developed to address some of the issues with RARP and used in a similar manner: to allow the configuration of a TCP/IP device at startup. Generally superseded by DHCP.
Dynamic Host Configuration Protocol	DHCP	A complete protocol for configuring TCP/IP devices and managing IP addresses. The successor to RARP and BOOTP, it includes numerous features and capabilities.
Simple Network Management Protocol	SNMP	A full-featured protocol for remote management of networks and devices.
Remote Monitoring	RMON	A diagnostic “protocol” (really a part of SNMP) used for remote monitoring of network devices.
File Transfer Protocol, Trivial File Transfer Protocol	FTP, TFTP	Protocols designed to permit the transfer of all types of files from one device to another.
RFC 822, Multipurpose Internet Mail Extensions, Simple Mail Transfer Protocol, Post Office Protocol, Internet Message Access Protocol	RFC 822, MIME, SMTP, POP, IMAP	Protocols that define the formatting, delivery and storage of electronic mail messages on TCP/IP networks.
Network News Transfer Protocol	NNTP	Enables the operation of the Usenet online community by transferring Usenet news messages between hosts.
Hypertext Transfer Protocol	HTTP	Transfers hypertext documents between hosts; implements the World Wide Web.
Gopher Protocol	Gopher	An older document retrieval protocol, now largely replaced by the World Wide Web.
Telnet Protocol	Telnet	Allows a user on one machine to establish a remote terminal session on another.
Berkeley “r” Commands	—	Permit commands and operations on one machine to be performed on another.
Internet Relay Chat	IRC	Allows real-time chat between TCP/IP users.
Administration and Troubleshooting Utilities and Protocols	—	A collection of software tools that allows administrators to manage, configure and troubleshoot TCP/IP internetworks.

END OF VOLUME 1

NOTE: THIS IS DRAFT 5 AND SUBJECT TO REVISION
--SCOTT (scott-sh@acsa.net)

LEGAL NOTICE: No plagiarism is intended. The notes above are from various points of the web, may contain commercial references, and are included here for educational and non-commercial / non-profit purposes.