716 lines
37 KiB
Text
716 lines
37 KiB
Text
|
==Phrack Inc.==
|
||
|
|
||
|
Volume Three, Issue 28, File #3 of 12
|
||
|
|
||
|
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
|
||
|
<> <>
|
||
|
<> Introduction to the Internet Protocols <>
|
||
|
<> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <>
|
||
|
<> Chapter Eight Of The Future Transcendent Saga <>
|
||
|
<> <>
|
||
|
<> Part One of Two Files <>
|
||
|
<> <>
|
||
|
<> Presented by Knight Lightning <>
|
||
|
<> July 3, 1989 <>
|
||
|
<> <>
|
||
|
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
|
||
|
|
||
|
|
||
|
Prologue
|
||
|
~~~~~~~~
|
||
|
Much of the material in this file comes from "Introduction to the
|
||
|
Internet Protocols" by Charles L. Hedrick of Rutgers University.
|
||
|
That material is copyrighted and is used in this file by
|
||
|
permission. Time differention and changes in the wide area
|
||
|
networks have made it necessary for some details of the file to
|
||
|
updated and in some cases reworded for better understanding of
|
||
|
our readers. Also, Unix is a trademark of AT&T Technologies,
|
||
|
Inc. -- Just thought I'd let you know.
|
||
|
|
||
|
If you are not already familiar with TCP/IP, I would suggest that
|
||
|
you read "Introduction to MIDNET" (Phrack Inc., Volume Three,
|
||
|
Issue 27, File 3 of 12) for more information. That file is
|
||
|
Chapter Seven of The Future Transcendent Saga and contains
|
||
|
information about TCP/IP and how it is used within the National
|
||
|
Science Foundation Network (NSFnet).
|
||
|
|
||
|
|
||
|
Table of Contents - Part One
|
||
|
~~~~~~~~~~~~~~~~~
|
||
|
* Introduction
|
||
|
* What Is TCP/IP?
|
||
|
* General Description Of The TCP/IP Protocols
|
||
|
The TCP Level
|
||
|
The IP Level
|
||
|
The Ethernet Level
|
||
|
|
||
|
|
||
|
Introduction
|
||
|
~~~~~~~~~~~~
|
||
|
This article is a brief introduction to TCP/IP, followed by
|
||
|
suggestions on what to read for more information. This is not
|
||
|
intended to be a complete description, but it can give you a
|
||
|
reasonable idea of the capabilities of the protocols. However,
|
||
|
if you need to know any details of the technology, you will want
|
||
|
to read the standards yourself.
|
||
|
|
||
|
Throughout the article, you will find references to the
|
||
|
standards, in the form of "RFC" (Request For Comments) or "IEN"
|
||
|
(Internet Engineering Notes) numbers -- these are document
|
||
|
numbers. The final section (in Part Two) explains how you can
|
||
|
get copies of those standards.
|
||
|
|
||
|
|
||
|
What Is TCP/IP?
|
||
|
~~~~~~~~~~~~~~~
|
||
|
TCP/IP is a set of protocols developed to allow cooperating
|
||
|
computers to share resources across a network. It was developed
|
||
|
by a community of researchers centered around the ARPAnet.
|
||
|
|
||
|
First some basic definitions; The most accurate name for the set
|
||
|
of protocols I am describing is the "Internet protocol suite."
|
||
|
TCP and IP are two of the protocols in this suite (they will be
|
||
|
described below). Because TCP and IP are the best known of the
|
||
|
protocols, it has become common to use the term TCP/IP to refer
|
||
|
to the whole family.
|
||
|
|
||
|
The Internet is a collection of networks, including the Arpanet,
|
||
|
NSFnet, regional networks such as MIDnet (described in Chapter
|
||
|
Seven of the Future Transcendent Saga), local networks at a
|
||
|
number of University and research institutions, and a number of
|
||
|
military networks. The term "Internet" applies to this entire
|
||
|
set of networks.
|
||
|
|
||
|
The subset of them that is managed by the Department of Defense
|
||
|
is referred to as the "DDN" (Defense Data Network). This
|
||
|
includes some research-oriented networks, such as the ARPAnet, as
|
||
|
well as more strictly military ones (because much of the funding
|
||
|
for Internet protocol developments is done via the DDN
|
||
|
organization, the terms Internet and DDN can sometimes seem
|
||
|
equivalent).
|
||
|
|
||
|
All of these networks are connected to each other. Users can
|
||
|
send messages from any of them to any other, except where there
|
||
|
are security or other policy restrictions on access. Officially
|
||
|
speaking, the Internet protocol documents are simply standards
|
||
|
adopted by the Internet community for its own use. The
|
||
|
Department of Defense once issued a MILSPEC definition of TCP/IP
|
||
|
that was intended to be a more formal definition, appropriate for
|
||
|
use in purchasing specifications. However most of the TCP/IP
|
||
|
community continues to use the Internet standards. The MILSPEC
|
||
|
version is intended to be consistent with it.
|
||
|
|
||
|
Whatever it is called, TCP/IP is a family of protocols. A few
|
||
|
provide "low-level" functions needed for many applications.
|
||
|
These include IP, TCP, and UDP (all of which will be described in
|
||
|
a bit more detail later in this file). Others are protocols for
|
||
|
doing specific tasks, e.g. transferring files between computers,
|
||
|
sending mail, or finding out who is logged in on another
|
||
|
computer.
|
||
|
|
||
|
Initially TCP/IP was used mostly between minicomputers or
|
||
|
mainframes. These machines had their own disks, and generally
|
||
|
were self-contained. Thus the most important "traditional"
|
||
|
TCP/IP services are:
|
||
|
|
||
|
- File Transfer -- The file transfer protocol (FTP) allows a
|
||
|
user on any computer to get files from another computer, or
|
||
|
to send files to another computer. Security is handled by
|
||
|
requiring the user to specify a user name and password for
|
||
|
the other computer.
|
||
|
|
||
|
Provisions are made for handling file transfer between
|
||
|
machines with different character set, end of line
|
||
|
conventions, etc. This is not quite the same as "network
|
||
|
file system" or "netbios" protocols, which will be
|
||
|
described later. Instead, FTP is a utility that you run
|
||
|
any time you want to access a file on another system. You
|
||
|
use it to copy the file to your own system. You then can
|
||
|
work with the local copy. (See RFC 959 for specifications
|
||
|
for FTP.)
|
||
|
|
||
|
- Remote Login -- The network terminal protocol (TELNET)
|
||
|
allows a user to log in on any other computer on the
|
||
|
network. You start a remote session by specifying a
|
||
|
computer to connect to. From that time until you finish
|
||
|
the session, anything you type is sent to the other
|
||
|
computer. Note that you are really still talking to your
|
||
|
own computer, but the telnet program effectively makes your
|
||
|
computer invisible while it is running. Every character
|
||
|
you type is sent directly to the other system. Generally,
|
||
|
the connection to the remote computer behaves much like a
|
||
|
dialup connection. That is, the remote system will ask you
|
||
|
to log in and give a password, in whatever manner it would
|
||
|
normally ask a user who had just dialed it up.
|
||
|
|
||
|
When you log off of the other computer, the telnet program
|
||
|
exits, and you will find yourself talking to your own
|
||
|
computer. Microcomputer implementations of telnet
|
||
|
generally include a terminal emulator for some common type
|
||
|
of terminal. (See RFCs 854 and 855 for specifications for
|
||
|
telnet. By the way, the telnet protocol should not be
|
||
|
confused with Telenet, a vendor of commercial network
|
||
|
services.)
|
||
|
|
||
|
- Computer Mail -- This allows you to send messages to users
|
||
|
on other computers. Originally, people tended to use only
|
||
|
one or two specific computers and they would maintain "mail
|
||
|
files" on those machines. The computer mail system is
|
||
|
simply a way for you to add a message to another user's
|
||
|
mail file. There are some problems with this in an
|
||
|
environment where microcomputers are used.
|
||
|
|
||
|
The most serious is that a micro is not well suited to
|
||
|
receive computer mail. When you send mail, the mail
|
||
|
software expects to be able to open a connection to the
|
||
|
addressee's computer, in order to send the mail. If this
|
||
|
is a microcomputer, it may be turned off, or it may be
|
||
|
running an application other than the mail system. For
|
||
|
this reason, mail is normally handled by a larger system,
|
||
|
where it is practical to have a mail server running all the
|
||
|
time. Microcomputer mail software then becomes a user
|
||
|
interface that retrieves mail from the mail server. (See
|
||
|
RFC 821 and 822 for specifications for computer mail. See
|
||
|
RFC 937 for a protocol designed for microcomputers to use
|
||
|
in reading mail from a mail server.)
|
||
|
|
||
|
These services should be present in any implementation of TCP/IP,
|
||
|
except that micro-oriented implementations may not support
|
||
|
computer mail. These traditional applications still play a very
|
||
|
important role in TCP/IP-based networks. However more recently,
|
||
|
the way in which networks are used has been changing. The older
|
||
|
model of a number of large, self-sufficient computers is
|
||
|
beginning to change. Now many installations have several kinds
|
||
|
of computers, including microcomputers, workstations,
|
||
|
minicomputers, and mainframes. These computers are likely to be
|
||
|
configured to perform specialized tasks. Although people are
|
||
|
still likely to work with one specific computer, that computer
|
||
|
will call on other systems on the net for specialized services.
|
||
|
This has led to the "server/client" model of network services. A
|
||
|
server is a system that provides a specific service for the rest
|
||
|
of the network. A client is another system that uses that
|
||
|
service. Note that the server and client need not be on
|
||
|
different computers. They could be different programs running on
|
||
|
the same computer. Here are the kinds of servers typically
|
||
|
present in a modern computer setup. Also note that these
|
||
|
computer services can all be provided within the framework of
|
||
|
TCP/IP.
|
||
|
|
||
|
- Network file systems. This allows a system to access files on
|
||
|
another computer in a somewhat more closely integrated fashion
|
||
|
than FTP. A network file system provides the illusion that
|
||
|
disks or other devices from one system are directly connected
|
||
|
to other systems. There is no need to use a special network
|
||
|
utility to access a file on another system. Your computer
|
||
|
simply thinks it has some extra disk drives. These extra
|
||
|
"virtual" drives refer to the other system's disks. This
|
||
|
capability is useful for several different purposes. It lets
|
||
|
you put large disks on a few computers, but still give others
|
||
|
access to the disk space. Aside from the obvious economic
|
||
|
benefits, this allows people working on several computers to
|
||
|
share common files. It makes system maintenance and backup
|
||
|
easier, because you don't have to worry about updating and
|
||
|
backing up copies on lots of different machines. A number of
|
||
|
vendors now offer high-performance diskless computers. These
|
||
|
computers have no disk drives at all. They are entirely
|
||
|
dependent upon disks attached to common "file servers". (See
|
||
|
RFC's 1001 and 1002 for a description of PC-oriented NetBIOS
|
||
|
over TCP. In the workstation and minicomputer area, Sun's
|
||
|
Network File System is more likely to be used. Protocol
|
||
|
specifications for it are available from Sun Microsystems.) -
|
||
|
remote printing. This allows you to access printers on other
|
||
|
computers as if they were directly attached to yours. (The
|
||
|
most commonly used protocol is the remote lineprinter protocol
|
||
|
from Berkeley Unix. Unfortunately, there is no protocol
|
||
|
document for this. However the C code is easily obtained from
|
||
|
Berkeley, so implementations are common.)
|
||
|
|
||
|
- Remote execution. This allows you to request that a
|
||
|
particular program be run on a different computer. This is
|
||
|
useful when you can do most of your work on a small computer,
|
||
|
but a few tasks require the resources of a larger system.
|
||
|
There are a number of different kinds of remote execution.
|
||
|
Some operate on a command by command basis. That is, you
|
||
|
request that a specific command or set of commands should run
|
||
|
on some specific computer. (More sophisticated versions will
|
||
|
choose a system that happens to be free.) However there are
|
||
|
also "remote procedure call" systems that allow a program to
|
||
|
call a subroutine that will run on another computer. (There
|
||
|
are many protocols of this sort. Berkeley Unix contains two
|
||
|
servers to execute commands remotely: rsh and rexec. The
|
||
|
Unix "man" pages describe the protocols that they use. The
|
||
|
user-contributed software with Berkeley 4.3 contains a
|
||
|
"distributed shell" that will distribute tasks among a set of
|
||
|
systems, depending upon load.
|
||
|
|
||
|
- Name servers. In large installations, there are a number of
|
||
|
different collections of names that have to be managed. This
|
||
|
includes users and their passwords, names and network
|
||
|
addresses for computers, and accounts. It becomes very
|
||
|
tedious to keep this data up to date on all of the computers.
|
||
|
Thus the databases are kept on a small number of systems.
|
||
|
Other systems access the data over the network. (RFC 822 and
|
||
|
823 describe the name server protocol used to keep track of
|
||
|
host names and Internet addresses on the Internet. This is
|
||
|
now a required part of any TCP/IP implementation. IEN 116
|
||
|
describes an older name server protocol that is used by a few
|
||
|
terminal servers and other products to look up host names.
|
||
|
Sun's Yellow Pages system is designed as a general mechanism
|
||
|
to handle user names, file sharing groups, and other databases
|
||
|
commonly used by Unix systems. It is widely available
|
||
|
commercially. Its protocol definition is available from Sun.)
|
||
|
|
||
|
- Terminal servers. Many installations no longer connect
|
||
|
terminals directly to computers. Instead they connect them to
|
||
|
terminal servers. A terminal server is simply a small
|
||
|
computer that only knows how to run telnet (or some other
|
||
|
protocol to do remote login). If your terminal is connected
|
||
|
to one of these, you simply type the name of a computer, and
|
||
|
you are connected to it. Generally it is possible to have
|
||
|
active connections to more than one computer at the same time.
|
||
|
The terminal server will have provisions to switch between
|
||
|
connections rapidly, and to notify you when output is waiting
|
||
|
for another connection. (Terminal servers use the telnet
|
||
|
protocol, already mentioned. However any real terminal server
|
||
|
will also have to support name service and a number of other
|
||
|
protocols.)
|
||
|
|
||
|
- Network-oriented window systems. Until recently,
|
||
|
high-performance graphics programs had to execute on a
|
||
|
computer that had a bit-mapped graphics screen directly
|
||
|
attached to it. Network window systems allow a program to use
|
||
|
a display on a different computer. Full-scale network window
|
||
|
systems provide an interface that lets you distribute jobs to
|
||
|
the systems that are best suited to handle them, but still
|
||
|
give you a single graphically-based user interface. (The most
|
||
|
widely-implemented window system is X. A protocol description
|
||
|
is available from MIT's Project Athena. A reference
|
||
|
implementation is publically available from MIT. A number of
|
||
|
vendors are also supporting NeWS, a window system defined by
|
||
|
Sun. Both of these systems are designed to use TCP/IP.)
|
||
|
|
||
|
Note that some of the protocols described above were designed by
|
||
|
Berkeley, Sun, or other organizations. Thus they are not
|
||
|
officially part of the Internet protocol suite. However they are
|
||
|
implemented using TCP/IP, just as normal TCP/IP application
|
||
|
protocols are. Since the protocol definitions are not considered
|
||
|
proprietary, and since commercially-supported implementations are
|
||
|
widely available, it is reasonable to think of these protocols as
|
||
|
being effectively part of the Internet suite.
|
||
|
|
||
|
Note that the list above is simply a sample of the sort of
|
||
|
services available through TCP/IP. However it does contain the
|
||
|
majority of the "major" applications. The other commonly-used
|
||
|
protocols tend to be specialized facilities for getting
|
||
|
information of various kinds, such as who is logged in, the time
|
||
|
of day, etc. However if you need a facility that is not listed
|
||
|
here, I encourage you to look through the current edition of
|
||
|
Internet Protocols (currently RFC 1011), which lists all of the
|
||
|
available protocols, and also to look at some of the major TCP/IP
|
||
|
implementations to see what various vendors have added.
|
||
|
|
||
|
|
||
|
General Description Of The TCP/IP Protocols
|
||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
TCP/IP is a layered set of protocols. In order to understand
|
||
|
what this means, it is useful to look at an example. A typical
|
||
|
situation is sending mail. First, there is a protocol for mail.
|
||
|
This defines a set of commands which one machine sends to
|
||
|
another, e.g. commands to specify who the sender of the message
|
||
|
is, who it is being sent to, and then the text of the message.
|
||
|
However this protocol assumes that there is a way to communicate
|
||
|
reliably between the two computers. Mail, like other application
|
||
|
protocols, simply defines a set of commands and messages to be
|
||
|
sent. It is designed to be used together with TCP and IP.
|
||
|
|
||
|
TCP is responsible for making sure that the commands get through
|
||
|
to the other end. It keeps track of what is sent, and
|
||
|
retransmitts anything that did not get through. If any message
|
||
|
is too large for one datagram, e.g. the text of the mail, TCP
|
||
|
will split it up into several datagrams, and make sure that they
|
||
|
all arrive correctly. Since these functions are needed for many
|
||
|
applications, they are put together into a separate protocol,
|
||
|
rather than being part of the specifications for sending mail.
|
||
|
You can think of TCP as forming a library of routines that
|
||
|
applications can use when they need reliable network
|
||
|
communications with another computer.
|
||
|
|
||
|
Similarly, TCP calls on the services of IP. Although the
|
||
|
services that TCP supplies are needed by many applications, there
|
||
|
are still some kinds of applications that don't need them.
|
||
|
However there are some services that every application needs. So
|
||
|
these services are put together into IP. As with TCP, you can
|
||
|
think of IP as a library of routines that TCP calls on, but which
|
||
|
is also available to applications that don't use TCP. This
|
||
|
strategy of building several levels of protocol is called
|
||
|
"layering." I like to think of the applications programs such as
|
||
|
mail, TCP, and IP, as being separate "layers," each of which
|
||
|
calls on the services of the layer below it. Generally, TCP/IP
|
||
|
applications use 4 layers:
|
||
|
|
||
|
- An application protocol such as mail.
|
||
|
|
||
|
- A protocol such as TCP that provides services need by many
|
||
|
applications.
|
||
|
|
||
|
- IP, which provides the basic service of getting datagrams to
|
||
|
their destination.
|
||
|
|
||
|
- The protocols needed to manage a specific physical medium, such
|
||
|
as Ethernet or a point to point line.
|
||
|
|
||
|
TCP/IP is based on the "catenet model." (This is described in
|
||
|
more detail in IEN 48.) This model assumes that there are a
|
||
|
large number of independent networks connected together by
|
||
|
gateways. The user should be able to access computers or other
|
||
|
resources on any of these networks. Datagrams will often pass
|
||
|
through a dozen different networks before getting to their final
|
||
|
destination. The routing needed to accomplish this should be
|
||
|
completely invisible to the user. As far as the user is
|
||
|
concerned, all he needs to know in order to access another system
|
||
|
is an "Internet address." This is an address that looks like
|
||
|
128.6.4.194. It is actually a 32-bit number. However it is
|
||
|
normally written as 4 decimal numbers, each representing 8 bits
|
||
|
of the address. (The term "octet" is used by Internet
|
||
|
documentation for such 8-bit chunks. The term "byte" is not
|
||
|
used, because TCP/IP is supported by some computers that have
|
||
|
byte sizes other than 8 bits.)
|
||
|
|
||
|
Generally the structure of the address gives you some information
|
||
|
about how to get to the system. For example, 128.6 is a network
|
||
|
number assigned by a central authority to Rutgers University.
|
||
|
Rutgers uses the next octet to indicate which of the campus
|
||
|
Ethernets is involved. 128.6.4 happens to be an Ethernet used by
|
||
|
the Computer Science Department. The last octet allows for up to
|
||
|
254 systems on each Ethernet. (It is 254 because 0 and 255 are
|
||
|
not allowed, for reasons that will be discussed later.) Note
|
||
|
that 128.6.4.194 and 128.6.5.194 would be different systems. The
|
||
|
structure of an Internet address is described in a bit more
|
||
|
detail later.
|
||
|
|
||
|
Of course I normally refer to systems by name, rather than by
|
||
|
Internet address. When I specify a name, the network software
|
||
|
looks it up in a database, and comes up with the corresponding
|
||
|
Internet address. Most of the network software deals strictly in
|
||
|
terms of the address. (RFC 882 describes the name server
|
||
|
technology used to handle this lookup.)
|
||
|
|
||
|
TCP/IP is built on "connectionless" technology. Information is
|
||
|
transfered as a sequence of "datagrams." A datagram is a
|
||
|
collection of data that is sent as a single message. Each of
|
||
|
these datagrams is sent through the network individually. There
|
||
|
are provisions to open connections (i.e. to start a conversation
|
||
|
that will continue for some time). However at some level,
|
||
|
information from those connections is broken up into datagrams,
|
||
|
and those datagrams are treated by the network as completely
|
||
|
separate. For example, suppose you want to transfer a 15000
|
||
|
octet file. Most networks can't handle a 15000 octet datagram.
|
||
|
So the protocols will break this up into something like 30
|
||
|
500-octet datagrams. Each of these datagrams will be sent to the
|
||
|
other end. At that point, they will be put back together into
|
||
|
the 15000-octet file. However while those datagrams are in
|
||
|
transit, the network doesn't know that there is any connection
|
||
|
between them. It is perfectly possible that datagram 14 will
|
||
|
actually arrive before datagram 13. It is also possible that
|
||
|
somewhere in the network, an error will occur, and some datagram
|
||
|
won't get through at all. In that case, that datagram has to be
|
||
|
sent again.
|
||
|
|
||
|
Note by the way that the terms "datagram" and "packet" often seem
|
||
|
to be nearly interchangable. Technically, datagram is the right
|
||
|
word to use when describing TCP/IP. A datagram is a unit of
|
||
|
data, which is what the protocols deal with. A packet is a
|
||
|
physical thing, appearing on an Ethernet or some wire. In most
|
||
|
cases a packet simply contains a datagram, so there is very
|
||
|
little difference. However they can differ. When TCP/IP is used
|
||
|
on top of X.25, the X.25 interface breaks the datagrams up into
|
||
|
128-byte packets. This is invisible to IP, because the packets
|
||
|
are put back together into a single datagram at the other end
|
||
|
before being processed by TCP/IP. So in this case, one IP
|
||
|
datagram would be carried by several packets. However with most
|
||
|
media, there are efficiency advantages to sending one datagram
|
||
|
per packet, and so the distinction tends to vanish.
|
||
|
|
||
|
|
||
|
* The TCP level
|
||
|
|
||
|
Two separate protocols are involved in handling TCP/IP datagrams.
|
||
|
TCP (the "transmission control protocol") is responsible for
|
||
|
breaking up the message into datagrams, reassembling them at the
|
||
|
other end, resending anything that gets lost, and putting things
|
||
|
back in the right order. IP (the "internet protocol") is
|
||
|
responsible for routing individual datagrams. It may seem like
|
||
|
TCP is doing all the work. However in the Internet, simply
|
||
|
getting a datagram to its destination can be a complex job. A
|
||
|
connection may require the datagram to go through several
|
||
|
networks at Rutgers, a serial line to the John von Neuman
|
||
|
Supercomputer Center, a couple of Ethernets there, a series of
|
||
|
56Kbaud phone lines to another NSFnet site, and more Ethernets on
|
||
|
another campus. Keeping track of the routes to all of the
|
||
|
destinations and handling incompatibilities among different
|
||
|
transport media turns out to be a complex job. Note that the
|
||
|
interface between TCP and IP is fairly simple. TCP simply hands
|
||
|
IP a datagram with a destination. IP doesn't know how this
|
||
|
datagram relates to any datagram before it or after it.
|
||
|
|
||
|
It may have occurred to you that something is missing here. I
|
||
|
have talked about Internet addresses, but not about how you keep
|
||
|
track of multiple connections to a given system. Clearly it
|
||
|
isn't enough to get a datagram to the right destination. TCP has
|
||
|
to know which connection this datagram is part of. This task is
|
||
|
referred to as "demultiplexing." In fact, there are several
|
||
|
levels of demultiplexing going on in TCP/IP. The information
|
||
|
needed to do this demultiplexing is contained in a series of
|
||
|
"headers." A header is simply a few extra octets tacked onto the
|
||
|
beginning of a datagram by some protocol in order to keep track
|
||
|
of it. It's a lot like putting a letter into an envelope and
|
||
|
putting an address on the outside of the envelope. Except with
|
||
|
modern networks it happens several times. It's like you put the
|
||
|
letter into a little envelope, your secretary puts that into a
|
||
|
somewhat bigger envelope, the campus mail center puts that
|
||
|
envelope into a still bigger one, etc. Here is an overview of
|
||
|
the headers that get stuck on a message that passes through a
|
||
|
typical TCP/IP network:
|
||
|
|
||
|
It starts with a single data stream, say a file you are trying to
|
||
|
send to some other computer:
|
||
|
|
||
|
......................................................
|
||
|
|
||
|
TCP breaks it up into manageable chunks. (In order to do this,
|
||
|
TCP has to know how large a datagram your network can handle.
|
||
|
Actually, the TCP's at each end say how big a datagram they can
|
||
|
handle, and then they pick the smallest size.)
|
||
|
|
||
|
.... .... .... .... .... .... .... ....
|
||
|
|
||
|
TCP puts a header at the front of each datagram. This header
|
||
|
actually contains at least 20 octets, but the most important ones
|
||
|
are a source and destination "port number" and a "sequence
|
||
|
number." The port numbers are used to keep track of different
|
||
|
conversations. Suppose 3 different people are transferring
|
||
|
files. Your TCP might allocate port numbers 1000, 1001, and 1002
|
||
|
to these transfers. When you are sending a datagram, this
|
||
|
becomes the "source" port number, since you are the source of the
|
||
|
datagram. Of course the TCP at the other end has assigned a port
|
||
|
number of its own for the conversation. Your TCP has to know the
|
||
|
port number used by the other end as well. (It finds out when
|
||
|
the connection starts, as I will explain below.) It puts this in
|
||
|
the "destination" port field. Of course if the other end sends a
|
||
|
datagram back to you, the source and destination port numbers
|
||
|
will be reversed, since then it will be the source and you will
|
||
|
be the destination. Each datagram has a sequence number. This
|
||
|
is used so that the other end can make sure that it gets the
|
||
|
datagrams in the right order, and that it hasn't missed any.
|
||
|
(See the TCP specification for details.) TCP doesn't number the
|
||
|
datagrams, but the octets. So if there are 500 octets of data in
|
||
|
each datagram, the first datagram might be numbered 0, the second
|
||
|
500, the next 1000, the next 1500, etc. Finally, I will mention
|
||
|
the Checksum. This is a number that is computed by adding up all
|
||
|
the octets in the datagram (more or less - see the TCP spec).
|
||
|
The result is put in the header. TCP at the other end computes
|
||
|
the checksum again. If they disagree, then something bad
|
||
|
happened to the datagram in transmission, and it is thrown away.
|
||
|
So here's what the datagram looks like now.
|
||
|
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Source Port | Destination Port |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Sequence Number |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Acknowledgment Number |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Data | |U|A|P|R|S|F| |
|
||
|
| Offset| Reserved |R|C|S|S|Y|I| Window |
|
||
|
| | |G|K|H|T|N|N| |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Checksum | Urgent Pointer |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| your data ... next 500 octets |
|
||
|
| ...... |
|
||
|
|
||
|
If you abbreviate the TCP header as "T", the whole file now looks like this:
|
||
|
|
||
|
T.... T.... T.... T.... T.... T.... T....
|
||
|
|
||
|
You will note that there are items in the header that I have not
|
||
|
described above. They are generally involved with managing the
|
||
|
connection. In order to make sure the datagram has arrived at
|
||
|
its destination, the recipient has to send back an
|
||
|
"acknowledgement." This is a datagram whose "Acknowledgement
|
||
|
number" field is filled in. For example, sending a packet with
|
||
|
an acknowledgement of 1500 indicates that you have received all
|
||
|
the data up to octet number 1500. If the sender doesn't get an
|
||
|
acknowledgement within a reasonable amount of time, it sends the
|
||
|
data again. The window is used to control how much data can be
|
||
|
in transit at any one time. It is not practical to wait for each
|
||
|
datagram to be acknowledged before sending the next one. That
|
||
|
would slow things down too much. On the other hand, you can't
|
||
|
just keep sending, or a fast computer might overrun the capacity
|
||
|
of a slow one to absorb data. Thus each end indicates how much
|
||
|
new data it is currently prepared to absorb by putting the number
|
||
|
of octets in its "Window" field. As the computer receives data,
|
||
|
the amount of space left in its window decreases. When it goes
|
||
|
to zero, the sender has to stop. As the receiver processes the
|
||
|
data, it increases its window, indicating that it is ready to
|
||
|
accept more data. Often the same datagram can be used to
|
||
|
acknowledge receipt of a set of data and to give permission for
|
||
|
additional new data (by an updated window). The "Urgent" field
|
||
|
allows one end to tell the other to skip ahead in its processing
|
||
|
to a particular octet. This is often useful for handling
|
||
|
asynchronous events, for example when you type a control
|
||
|
character or other command that interrupts output. The other
|
||
|
fields are not pertinent to understanding what I am trying to
|
||
|
explain in this article.
|
||
|
|
||
|
|
||
|
* The IP Level
|
||
|
|
||
|
TCP sends each datagram to IP. Of course it has to tell IP the
|
||
|
Internet address of the computer at the other end. Note that
|
||
|
this is all IP is concerned about. It doesn't care about what is
|
||
|
in the datagram, or even in the TCP header. IP's job is simply
|
||
|
to find a route for the datagram and get it to the other end. In
|
||
|
order to allow gateways or other intermediate systems to forward
|
||
|
the datagram, it adds its own header. The main things in this
|
||
|
header are the source and destination Internet address (32-bit
|
||
|
addresses, like 128.6.4.194), the protocol number, and another
|
||
|
checksum. The source Internet address is simply the address of
|
||
|
your machine. (This is necessary so the other end knows where
|
||
|
the datagram came from.) The destination Internet address is the
|
||
|
address of the other machine. (This is necessary so any gateways
|
||
|
in the middle know where you want the datagram to go.) The
|
||
|
protocol number tells IP at the other end to send the datagram to
|
||
|
TCP.
|
||
|
|
||
|
Although most IP traffic uses TCP, there are other protocols that
|
||
|
can use IP, so you have to tell IP which protocol to send the
|
||
|
datagram to. Finally, the checksum allows IP at the other end to
|
||
|
verify that the header wasn't damaged in transit. Note that TCP
|
||
|
and IP have separate checksums. IP needs to be able to verify
|
||
|
that the header didn't get damaged in transit, or it could send a
|
||
|
message to the wrong place. It is both more efficient and safer
|
||
|
to have TCP compute a separate checksum for the TCP header and
|
||
|
data. Once IP has tacked on its header, here's what the message
|
||
|
looks like:
|
||
|
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
|Version| IHL |Type of Service| Total Length |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Identification |Flags| Fragment Offset |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Time to Live | Protocol | Header Checksum |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Source Address |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Destination Address |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| TCP header, then your data ...... |
|
||
|
| |
|
||
|
|
||
|
If you represent the IP header by an "I", your file now looks like this:
|
||
|
|
||
|
IT.... IT.... IT.... IT.... IT.... IT.... IT....
|
||
|
|
||
|
Again, the header contains some additional fields that will not
|
||
|
be discussed in this article because they are not relevent to
|
||
|
understanding the process. The flags and fragment offset are
|
||
|
used to keep track of the pieces when a datagram has to be split
|
||
|
up. This can happen when datagrams are forwarded through a
|
||
|
network for which they are too big. (This will be discussed a
|
||
|
bit more below.) The time to live is a number that is decremented
|
||
|
whenever the datagram passes through a system. When it goes to
|
||
|
zero, the datagram is discarded. This is done in case a loop
|
||
|
develops in the system somehow. Of course this should be
|
||
|
impossible, but well-designed networks are built to cope with
|
||
|
"impossible" conditions.
|
||
|
|
||
|
At this point, it's possible that no more headers are needed. If
|
||
|
your computer happens to have a direct phone line connecting it
|
||
|
to the destination computer, or to a gateway, it may simply send
|
||
|
the datagrams out on the line (though likely a synchronous
|
||
|
protocol such as HDLC would be used, and it would add at least a
|
||
|
few octets at the beginning and end).
|
||
|
|
||
|
|
||
|
* The Ethernet Level
|
||
|
|
||
|
Most networks these days use Ethernet which has its own
|
||
|
addresses. The people who designed Ethernet wanted to make sure
|
||
|
that no two machines would end up with the same Ethernet address.
|
||
|
Furthermore, they didn't want the user to have to worry about
|
||
|
assigning addresses. So each Ethernet controller comes with an
|
||
|
address built-in from the factory. In order to make sure that
|
||
|
they would never have to reuse addresses, the Ethernet designers
|
||
|
allocated 48 bits for the Ethernet address. People who make
|
||
|
Ethernet equipment have to register with a central authority, to
|
||
|
make sure that the numbers they assign don't overlap any other
|
||
|
manufacturer. Ethernet is a "broadcast medium." That is, it is
|
||
|
in effect like an old party line telephone. When you send a
|
||
|
packet out on the Ethernet, every machine on the network sees the
|
||
|
packet. So something is needed to make sure that the right
|
||
|
machine gets it. As you might guess, this involves the Ethernet
|
||
|
header.
|
||
|
|
||
|
Every Ethernet packet has a 14-octet header that includes the
|
||
|
source and destination Ethernet address, and a type code. Each
|
||
|
machine is supposed to pay attention only to packets with its own
|
||
|
Ethernet address in the destination field. (It's perfectly
|
||
|
possible to cheat, which is one reason that Ethernet
|
||
|
communications are not terribly secure.) Note that there is no
|
||
|
connection between the Ethernet address and the Internet address.
|
||
|
Each machine has to have a table of what Ethernet address
|
||
|
corresponds to what Internet address. (I will describe how this
|
||
|
table is constructed a bit later.) In addition to the addresses,
|
||
|
the header contains a type code. The type code is to allow for
|
||
|
several different protocol families to be used on the same
|
||
|
network. So you can use TCP/IP, DECnet, Xerox NS, etc. at the
|
||
|
same time. Each of them will put a different value in the type
|
||
|
field. Finally, there is a checksum. The Ethernet controller
|
||
|
computes a checksum of the entire packet. When the other end
|
||
|
receives the packet, it recomputes the checksum, and throws the
|
||
|
packet away if the answer disagrees with the original. The
|
||
|
checksum is put on the end of the packet, not in the header. The
|
||
|
final result is that your message looks like this:
|
||
|
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Ethernet destination address (first 32 bits) |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Ethernet dest (last 16 bits) |Ethernet source (first 16 bits)|
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Ethernet source address (last 32 bits) |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Type code |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| IP header, then TCP header, then your data |
|
||
|
| |
|
||
|
...
|
||
|
| |
|
||
|
| end of your data |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
| Ethernet Checksum |
|
||
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||
|
|
||
|
If you represent the Ethernet header with "E", and the Ethernet
|
||
|
checksum with "C", your file now looks like this:
|
||
|
|
||
|
EIT....C EIT....C EIT....C EIT....C EIT....C
|
||
|
|
||
|
When these packets are received by the other end, of course all
|
||
|
the headers are removed. The Ethernet interface removes the
|
||
|
Ethernet header and the checksum. It looks at the type code.
|
||
|
Since the type code is the one assigned to IP, the Ethernet
|
||
|
device driver passes the datagram up to IP. IP removes the IP
|
||
|
header. It looks at the IP protocol field. Since the protocol
|
||
|
type is TCP, it passes the datagram up to TCP. TCP now looks at
|
||
|
the sequence number. It uses the sequence numbers and other
|
||
|
information to combine all the datagrams into the original file.
|
||
|
|
||
|
This ends my initial summary of TCP/IP. There are still some
|
||
|
crucial concepts I have not gotten to, so in part two, I will go
|
||
|
back and add details in several areas. (For detailed
|
||
|
descriptions of the items discussed here see, RFC 793 for TCP,
|
||
|
RFC 791 for IP, and RFC's 894 and 826 for sending IP over
|
||
|
Ethernet.)
|
||
|
__________________________________________________________________
|