
TCP/IP

The term “TCP/IP” usually refers to a suite of protocols that specify how devices
communicate over a packet-switched network.

The term “OSI” refers to a model that specifies a layered architecture for how
devices communicate over a packet-switched network. The OSI model was an ideal model
proposed by the ISO. It has seven layers: physical, data-link,
network, transport, session, presentation and application.

In this ideal model, each layer has a well-defined function and is totally isolated from all other
layers. It is very important to understand that OSI is only an architectural model which specifies
what each layer in the model should do. It does NOT specify the implementation details (or
protocols) of each layer.

A protocol suite is based on an architectural model. So, is there a protocol suite built on the OSI
model? Yes. Quite a few protocol suites have been built on the OSI model.
Unfortunately, TCP/IP is not one of them. The most widely used protocol suite which is closest to
the OSI model is X.25. Now, you are probably wondering, “How are OSI and TCP/IP related?” Let
me answer that question by saying that TCP/IP is a protocol suite which is only loosely based on the
OSI model, as the following mapping shows.

Application  ] Left to the application.
Presentation ] Left to the application.
Session      ] Left to the application.
Transport    ] TCP or UDP
Network      ] IP
Data Link    ] Any data-link layer protocol
Physical     ] Any physical layer protocol

First of all, the TCP/IP protocol suite uses a 5-layer architectural model for implementing
networks. (This is different from the X.25 protocol suite, which strictly follows the OSI 7-layer
model.)

Second, the TCP/IP protocol suite specifies the protocols for only two layers: transport and
network. It does not care what protocols are used at the other three layers. This is the most
important feature of the TCP/IP protocol suite. It is flexible enough to be used with any
lower-layer protocols (protocols below it). For example, all of the following are valid TCP/IP
implementations:
TCP/IP over Ethernet (Ethernet over copper wires)
TCP/IP over Ethernet (Ethernet over fiber-optic cable)
TCP/IP over FDDI
TCP/IP over ATM

It is this flexibility which makes TCP/IP independent of the properties of the underlying
network. This is a BIG advantage in the real world, and it is what has made TCP/IP the de facto
standard in the industry.

Protocol Suite vs. Architectural model
OK! If you are confused by the two concepts of an “architectural model” and a “protocol suite”, all I can
say is that you are not alone. It is a little difficult to grasp these two concepts simultaneously. Let me
give you an analogy. Let us suppose we want to invent a game. We form a committee (ISO) to
give us a basic model of a game they think would be interesting.

The committee (ISO) comes up with the basic model (OSI) which says that in this game, there
would be two teams, say Team A and Team B. One person from Team A would throw the ball at
a player from Team B. The aim of the person from Team B would be to hit the ball as hard as
possible using the bat. The committee (ISO) specifies some more concepts like these.

Once we have the basic model, we give it to two organizations to develop a game
based on it. One comes up with baseball (X.25) and the other comes up with cricket
(TCP/IP). These are the protocol suites: the complete set of rules that any team playing the game
(implementing networking) should follow. I hope that makes it a little clearer.

Addresses
Each machine on a network has to have some kind of address so that other machines
can communicate with it. In an IP network, there are usually two addresses
corresponding to each machine. One is the machine address and the other is the IP address. The
question now is, “Why does a host need two addresses?” To answer that, we should first
understand the difference between the two. The difference
is that whereas the machine address is machine-specific, the IP address is not. The machine
address (also called the hardware address) is the address that comes built into your computer from
the manufacturer. Each computer is assured of a unique hardware address since the hardware address
is made up of the manufacturer code and the serial number given by the manufacturer. Needless to
say, each manufacturer has a unique manufacturer code.

Now, we know that when you buy a computer from a vendor, it has a unique machine address,
so why don’t other machines use this address to communicate with your computer? Because
using machine addresses for communication would make “routing” a nightmare. If you are wondering
what this means, just read on. We will look into this later.

IP addresses:-
An IP address is a 32-bit number allocated by a central agency. Each IP address is divided into a
“network portion” and a “host portion”. The network portion of the IP address identifies the network
this host is a member of. (Remember that the Internet is simply a network of networks of hosts.)
The host portion of the IP address identifies a particular host in a given network.

IP addresses are allocated by the central agency on a per-network basis. This means that IP
addresses are not allocated to one computer at a time but rather one network at a time. This is the
reason why IP addresses are allocated to organizations which have an in-house network or to
ISPs. There are basically four types of IP addresses a network can be allocated. The classification
of IP addresses into Class A, B, C and D is done on the basis of the number of hosts (individual
computers) in a network.

A Class A address is allocated to networks which have a very large number of hosts (---).
A Class B address is allocated to networks which have a moderately large number of hosts
(---). A Class C address is allocated to networks which have fewer than 255 hosts. Class D
addresses are special-purpose (IP multicast) addresses.

Another characteristic of IP addresses is that they are self-identifying. This means that given an
IP address, one can tell whether it is a Class A, Class B, Class C or Class D address.
The self-identifying feature is implemented using the
highest-order bits of the IP address. These bits are 0 for Class A addresses, 10 for Class
B addresses, 110 for Class C addresses and 1110 for Class D addresses.
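As a rough illustration of the self-identifying property, here is a small Python sketch that reads the class off the leading bits of the first octet. The function name address_class is made up for this example. Note that the sketch looks at up to four leading bits, so that multicast addresses (1110) can be told apart from the reserved 1111 range, which this text does not discuss.

```python
import ipaddress

def address_class(dotted):
    """Return the classful category of an IPv4 address from its leading bits."""
    first_octet = int(ipaddress.IPv4Address(dotted)) >> 24
    if first_octet & 0b10000000 == 0b00000000:   # 0xxxxxxx
        return "A"
    if first_octet & 0b11000000 == 0b10000000:   # 10xxxxxx
        return "B"
    if first_octet & 0b11100000 == 0b11000000:   # 110xxxxx
        return "C"
    if first_octet & 0b11110000 == 0b11100000:   # 1110xxxx
        return "D"
    return "E"                                   # 1111xxxx (reserved)

for addr in ("10.1.2.3", "172.16.0.1", "192.168.1.10", "224.0.0.1"):
    print(addr, "is Class", address_class(addr))
```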

Anyway, the introduction of IP addresses eases the issues of routing to a large extent and the
primary reason for this is the fact that IP addresses have a logical boundary between the network
portion and the host portion. (More about this here.)
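To make the network/host boundary concrete, here is a minimal Python sketch that splits an address into its two portions. It assumes the historical classful widths (8, 16 and 24 network bits for Class A, B and C), which are not spelled out in this text; the name split_address is hypothetical.

```python
import ipaddress

# Assumed network-portion widths (in bits) under the historical classful scheme.
NETWORK_BITS = {"A": 8, "B": 16, "C": 24}

def split_address(dotted, address_class):
    """Split an IPv4 address into (network portion, host portion) integers."""
    value = int(ipaddress.IPv4Address(dotted))
    host_bits = 32 - NETWORK_BITS[address_class]
    network = value >> host_bits               # high-order bits
    host = value & ((1 << host_bits) - 1)      # low-order bits
    return network, host

network, host = split_address("192.168.1.10", "C")
print(f"network portion: {network:#x}, host portion: {host}")
```

The point of the split is that a forwarding decision can be made on the network portion alone, without looking at the host portion.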

IP, Routing, et al :-
Finally, we are there. Now we can start talking about the actual IP. What is this protocol? What
does it do? Why do we need it? Where does it fit in a network? And, more importantly, why is
there such a hoopla around IP?

Where do I go ? – Routing :-
<TBD>

ARP (Address Resolution Protocol)


To understand ARP, let us consider a LAN of four hosts, say A, B, C and D. Now, suppose
A wants to send some data to C. It will do so by specifying the IP address of C. So far, so good.
But here is the catch: to send data to a host, you have to know its hardware (machine)
address. I repeat: to send data to a host, you HAVE to know its machine
address. There is no other way.

So, now the issue is: given the IP address of a host, how do we find its machine address?
Here is where ARP comes in. ARP is a protocol which specifies how to find the machine address
of a host in a “physical network”, given its IP address. Note the use of the words “physical network”.
By a physical network, we mean a network which is physically connected. ARP can be used to
find the machine address of the destination machine ONLY if the destination host and the source
host are on the same physically connected network.

Now, let us see how exactly ARP works. In our example, A knows the IP address of C and
wants to find its machine address, so it broadcasts a message saying “Who has this IP address?” This
message being a broadcast, all hosts on the physically connected network get it.
However, only the host which has the specified IP address (in this case C) replies to this message.
In the reply, C sends its own machine address to A. A can now use this to send data to C. In
reality, ARP is a little more complex than what is described above. There are two more
features of ARP that you should know.

ARP cache: in our example, suppose A followed ARP to get C’s machine address and sent
data to it. Once A has sent the data, suppose A wants to send some more data. Now, what should
it do? It can obviously use the same procedure to get C’s machine address, but that would be a
waste of bandwidth (in the networking world, bandwidth happens to be as costly as gold
today). So, here is what A does. The first time it gets C’s machine address, it stores it in a memory
area known as the ARP cache so that the next time it wants to send data to C, it can look up its ARP
cache and find C’s machine address. So, the ARP algorithm changes a little. Whenever a host
wants to find another host’s machine address, it first looks up its ARP cache. If it finds the address
there, well and good; otherwise it uses ARP to find the address as usual. (See “Disadvantages of ARP
cache”.)

Source address: as a rule of thumb, if A wants to send some data to C, it is usually the case that C
will soon need to send some data to A, which implies that C will soon need to
find out the machine address of A. One solution is to let C find out A’s machine address by using
ARP. But there is another way which saves bandwidth. Remember, initially A broadcasts a
message to find out the machine address of C. What if, in this broadcast message, A includes its
own IP address and its machine address? If this is done, C can then store this mapping of A’s
address in its own ARP cache. As a matter of fact, not only C but all hosts in the network can
store this mapping of A’s address for future use. This is what is done to save bandwidth.
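The exchange described above, including the ARP cache and the trick of learning the sender’s mapping from the broadcast, can be sketched as a toy in-memory simulation in Python. The Host and Lan classes, the addresses and the broadcast counter are all invented for illustration; real ARP implementations differ in details such as which hosts store the sender’s mapping.

```python
class Host:
    def __init__(self, ip, mac):
        self.ip = ip
        self.mac = mac
        self.arp_cache = {}              # IP address -> machine address

class Lan:
    """A toy 'physical network' on which every host sees every broadcast."""
    def __init__(self, hosts):
        self.hosts = hosts
        self.broadcasts = 0              # how many ARP broadcasts were sent

    def resolve(self, sender, target_ip):
        # Step 1: look in the sender's ARP cache first.
        if target_ip in sender.arp_cache:
            return sender.arp_cache[target_ip]
        # Step 2: otherwise broadcast "who has target_ip?".
        self.broadcasts += 1
        reply = None
        for host in self.hosts:
            if host is sender:
                continue
            # The request carries the sender's own mapping, so every
            # listener can store it for later use.
            host.arp_cache[sender.ip] = sender.mac
            if host.ip == target_ip:     # only the owner of the IP replies
                reply = host.mac
        if reply is not None:
            sender.arp_cache[target_ip] = reply
        return reply

a, b, c, d = (Host(f"192.168.1.{i}", f"00:00:00:00:00:0{i}") for i in range(1, 5))
lan = Lan([a, b, c, d])

first = lan.resolve(a, c.ip)     # needs a broadcast
second = lan.resolve(a, c.ip)    # answered from A's cache
reverse = lan.resolve(c, a.ip)   # C already learned A's mapping
print(first, second, reverse, "broadcasts:", lan.broadcasts)
```

In this run only one broadcast is ever sent, even though three lookups are made.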

Disadvantages of ARP cache


Although the use of an ARP cache is a very good idea, it has some disadvantages of its own.
Consider this: A has C’s IP address-to-machine address mapping in its ARP cache. Further
suppose that this entry in the cache was based on a message sent to A about an hour ago. Now,
when A wants to send some data to C, it uses this ARP cache entry to find C’s machine address. But
suppose C has crashed or its machine address has been changed (see “How to change a host’s machine
address?”). In such a situation the entry in A’s ARP cache is no longer valid, but there is no way
A can know this. So, A assumes this “stale” entry to be correct and transmits data to a wrong
machine address (or to a nonexistent machine address). This is the concept of “soft state”: an
entry which has to be continuously updated, otherwise it may not be correct anymore.

You are probably wondering, “Is there a solution to this problem?” The answer is yes and no. It is
important to understand that unlike the ideal world where there are problems and solutions, in the
real networking world there are issues and compromises. As far as the issue of soft state of the
ARP cache is concerned, the way around this issue is twofold:

In the ARP cache, along with the address mapping, store the “time this info was received” too, and
every time an entry becomes more than N seconds old, delete it.

If there is an ARP packet on the network carrying info about the mapping of ANY address, use it to
update your entry. Even if you have this mapping stored correctly in the cache, update the “time”,
so that it is not auto-deleted soon.
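The two rules above can be sketched as a small Python class. This is only an illustration: the name ArpCache, the max_age_seconds parameter (the “N seconds” above) and the injectable clock are assumptions made for this example, not part of any real ARP implementation.

```python
import time

class ArpCache:
    def __init__(self, max_age_seconds, clock=time.monotonic):
        self.max_age = max_age_seconds
        self.clock = clock               # injectable so the example can fake time
        self.entries = {}                # IP -> (machine address, time received)

    def update(self, ip, mac):
        """Rule 2: any ARP packet we see refreshes the mapping and its timestamp."""
        self.entries[ip] = (mac, self.clock())

    def lookup(self, ip):
        """Rule 1: an entry older than max_age is deleted and treated as missing."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, received = entry
        if self.clock() - received > self.max_age:
            del self.entries[ip]
            return None
        return mac

now = [0.0]                              # fake clock, in seconds
cache = ArpCache(max_age_seconds=60, clock=lambda: now[0])
cache.update("192.168.1.3", "00:00:00:00:00:03")
now[0] = 30.0
fresh = cache.lookup("192.168.1.3")      # 30 s old: still valid
cache.update("192.168.1.3", "00:00:00:00:00:03")   # refreshed at t = 30
now[0] = 80.0
refreshed = cache.lookup("192.168.1.3")  # 50 s since refresh: still valid
now[0] = 200.0
stale = cache.lookup("192.168.1.3")      # 170 s since refresh: expired
print(fresh, refreshed, stale)
```

A host that finds an expired entry simply falls back to the normal ARP broadcast.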

How to change a host’s machine address ?


The machine address of a host is the address embedded by the manufacturer in the Network
Interface Card (NIC). The NIC is the hardware interface of the host to the network. It is the card
into which the network cable gets plugged. Anyway, the bottom line is that the machine
address of a host can be changed by changing its NIC.

RARP (Reverse Address Resolution Protocol)


As you might have guessed, RARP is the reverse of ARP. It is a low-level protocol
which is used to find a host’s IP address given its machine address. OK! OK! I know you are pulling
your hair out right now. You are probably thinking, “If an application on A needs to send data to
an application on C, it sends it using C’s IP address. IP uses ARP and C’s IP address to find out
C’s machine address.”

Now, where do we see a situation wherein we have the machine address and we need the IP
address? Why would we do that anyway? Isn’t the machine address of the destination enough to
send the data to it? You are correct: if you do know the machine address of the
destination host, you can send data to it without knowing its IP address, and moreover, if you know
the machine address of the destination, the chances are that you know its IP address too. So,
where do we need RARP?

Here goes the explanation. Where do you think computers store their IP address? No, it is not the
NIC. The NIC has the machine address, not the IP address. So? Remember that the IP address has
to be stored at a place where it is not lost even when the computer switches off. So, we store it
on the hard disk. (Actually, we could store it in the boot-up code of the OS too, which sits in the
ROM, but it is usually not stored there. Here is why.)

Networks connect not only computers but a lot of other hardware too. Some of this hardware is a
one-board system with a microprocessor, some other external peripherals and an on-board NIC
interface. Usually such cards do not have a hard disk since hard disks are bulky and cannot be
added on-board. Where do you think these boards (which are on the network and which need to
have an IP address) store their IP addresses? They store their IP addresses on a server on the
network. There is usually a server devoted to this purpose and it is known as a RARP server.

Now, consider the situation when such a card boots up. It needs to find out its IP address so that it
can start communicating. But the IP address is stored on a server on the network. How can this
card communicate (with the server) on the network if it does not itself have an IP address?
Don’t all hosts need an IP address to communicate over the network? They do, UNLESS they
decide to broadcast on the network. This is what RARP does. The card uses an initial broadcast message
to send out its own machine address, requesting RARP servers to inform it about its IP
address. The RARP server will receive this request, look up its database and send back the IP
address to the machine. Note that the reply sent by the RARP server is sent using… ….
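The request/lookup/reply flow of RARP can be sketched as a toy Python example. Everything here is hypothetical: the database contents, the function names, and the list of handlers standing in for a broadcast that reaches every RARP server on the network.

```python
# Hypothetical RARP server database: machine address -> IP address.
RARP_DATABASE = {
    "00:00:00:00:00:0e": "192.168.1.50",
}

def rarp_server_handle(requester_mac):
    """Server side: look up the requester's machine address; None if unknown."""
    return RARP_DATABASE.get(requester_mac)

def boot_and_find_ip(my_mac, servers):
    """Client side: 'broadcast' our machine address and take the first answer."""
    for handle in servers:               # stands in for a broadcast to all servers
        ip = handle(my_mac)
        if ip is not None:
            return ip
    return None                          # nobody knows this machine address

ip = boot_and_find_ip("00:00:00:00:00:0e", [rarp_server_handle])
print("IP address learned at boot:", ip)
```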

Why is the IP address not stored in the boot-up code of the OS ?


The OS boot-up code should be generic enough so that it can be used to boot all machines. It
should not be machine specific otherwise the manufacturer would have to supply a different OS
boot-up floppy for each machine (which is not commercially / economically viable). Since the IP
address is machine specific, it cannot be stored in the boot-up code.

ICMP (Internet Control Message Protocol)


<TBD>

TCP
<TBD>

UDP
UDP, like TCP, is a transport-layer protocol. However, the popularity of TCP has
overshadowed UDP. UDP is basically a very thin protocol layer, which means that it does not
add much functionality to the services provided by the IP layer. However, this by no means
implies that it is not an important protocol.

UDP is characterized by the fact that it is a connectionless protocol which does not add any
reliability to the services provided by the IP layer. But isn’t that a drawback? No, not
necessarily. Consider a situation where you have an underlying network which is totally reliable.
Such a network would ensure that whatever the IP layer passes down to the data-link layer is
delivered correctly and in sequence to the destination. With such a network, adding reliability
and retransmission capabilities at the transport layer would be an unnecessary overhead which
would waste bandwidth. It is in such situations that UDP is used. Using UDP results in
minimal overhead and therefore maximizes the bandwidth available for transmitting data.

But then what does UDP do anyway? Why doesn’t IP provide functionality directly to the
application? UDP introduces the concept of “ports” in the system. Ports are what let IP have
multiple logical endpoints of communication within a machine. Simply put, this means that on
one machine, ports allow more than one application to use IP for communication over the
network. As an example, consider that an application on one host wants to send data to a particular
application on another host. The source host can specify that the data be sent to the destination
host, but how does the sender specify that the data is meant for a particular application on that
host? Here is where “ports” come in. Ports introduce a logical layer within a machine.
For more information on ports (for the technically oriented), go here.
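To see ports doing this job, here is a minimal Python sketch using the standard socket module. Two “applications” on the same machine (both simulated inside one script, over the loopback address) each listen on their own UDP port, and the sender picks the destination application purely by port number. The names app_a, app_b and open_receiver are made up for this example.

```python
import socket

def open_receiver():
    """Open a UDP socket on the loopback address with an OS-chosen port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))      # port 0 asks the OS for any free port
    sock.settimeout(2.0)             # don't hang forever if nothing arrives
    return sock

app_a = open_receiver()
app_b = open_receiver()
port_a = app_a.getsockname()[1]
port_b = app_b.getsockname()[1]

# Same destination host for both datagrams; only the port differs.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"for A", ("127.0.0.1", port_a))
sender.sendto(b"for B", ("127.0.0.1", port_b))

data_a, _ = app_a.recvfrom(1024)
data_b, _ = app_b.recvfrom(1024)
print("port", port_a, "got", data_a)
print("port", port_b, "got", data_b)

for sock in (app_a, app_b, sender):
    sock.close()
```

Note that no connection is set up before sending; each datagram simply carries the destination port with it.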

“Why aren’t process IDs or application names used for specifying a particular application
within a machine?”
As far as application names are concerned, I think most of you must have guessed that they
cannot be used for specifying the exact destination because there can be more than
one instance of an application running simultaneously on a machine. Now, for the process IDs.
The reason why we don’t use them is this: the sender application would have to find out the process ID of
an application running on a remote machine before transmitting the data. How should it do this?
Remember that process IDs are dynamically assigned and there is no way of knowing them
beforehand.

In the case of ports, some applications have fixed, globally assigned ports which they use for
communication (these ports are assigned to these applications by a global authority).
Applications which do not have a globally assigned port select an unused port number outside the
range reserved for assigned ports AND inform the other end (the application at the remote machine
which wants to communicate with this application). Since the other end is informed of the port
number selected before any data is transmitted, communication is facilitated.

What is Ethernet ? FDDI ? ATM?


Ethernet and FDDI are data-link layer protocols. The data-link layer is responsible for ensuring
reliable transfer of data across a single physical link. Note carefully that data-link layer
protocols ensure reliable transfer across a single physical link and NOT end-to-end reliable
transmission. To make this a little clearer, we should realize that when a source machine
sends out some data for the destination machine, the two machines may not necessarily be
directly connected to each other. In fact, in the real world, this is rarely the case. Usually, there
are multiple computers which lie in the path between the two machines.

For the sake of understanding, let us assume that computers A, B, C, D and E are in a network,
connected in that particular order. This means that when A has to send some data to E, that data has to
pass through B, C and D.

Now, when A transmits the data, it is the responsibility of the data-link layer protocol in A to
ensure that the data reaches B correctly. Next, when B forwards this data to C, it is the
responsibility of the data-link layer protocol in B to ensure that the data reaches C correctly.
This goes on. And finally, when D transmits the data, it is the responsibility of the data-link layer
protocol in D to ensure that the data reaches E correctly.

What is meant by Unix networking ?


TCP/IP is a protocol suite used by machines to communicate with each other. Most machines
today support TCP/IP. Now, just wait and read the previous line again. What does that
mean? What is meant when we say that a particular machine supports TCP/IP? How does a
machine support TCP/IP?

Obviously, there has to be some software which implements TCP/IP. Where is this software?
Did you load it into the machine? No. Then where did it come from? Think about it. When you
bought the machine, all it had loaded was the operating system. So? Yes! The TCP/IP stack is
built into all operating systems today, including Unix.

So, how do you use the TCP/IP stack? The answer is pretty simple. You don’t. As a user, you just
use the applications which use the TCP/IP stack. These applications hide all TCP/IP details from
the user.

Ok! So, how do the applications use the TCP/IP stack ?


Good question. Now, as I said, all operating systems today have a TCP/IP stack implemented inside
the OS. Each OS provides an API (Application Programming Interface) to application developers
to use the TCP/IP stack. Unix has a built-in TCP/IP stack too, and it too provides APIs for
applications to communicate over the network using TCP/IP. The development of these
applications, which communicate over a network using these APIs, is what is usually referred to
as network programming. In Unix terminology it is always referred to as socket programming. Why?
Read on…

What is a socket ?
Before we answer that question, let me ask you this: what is a file? If you said, “It is a place to
store data” or something along those lines, let me remind you that data is physically stored in memory,
not in files, and also that we do have empty files too.

So? What is a file? Tough question, right? Anyway, what I am trying to point out here is that,
like the concept of a file, the concept of a socket is an abstract concept. A socket is defined as an
endpoint of communication.
For a sending (source) application, it is basically the entry point of data that is to be
sent over the network. For a receiving (destination) application, it is the exit point of data that is
sent over the network to this application.
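The “entry point / exit point” idea can be seen in a few lines of Python. This sketch uses socket.socketpair() from the standard library, which hands back two already-connected sockets inside one process; the variable names are made up, and no real network is involved.

```python
import socket

# Two connected endpoints: whatever enters one comes out of the other.
source, destination = socket.socketpair()

source.sendall(b"hello")               # data enters at the sending endpoint
received = destination.recv(1024)      # and exits at the receiving endpoint
print(received)

source.close()
destination.close()
```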

Sockets and ports ?


The concepts of sockets and ports are usually confusing to most people. Remember that ports were
defined as the service points where the applications interact with the transport layer
(TCP/UDP) to send and receive data. But isn’t that what sockets are? Yes and no. A socket is an OS
concept and a port is a TCP/IP concept.

Note that there are different operating systems implementing TCP/IP. All these implementations will
have ports, but not all of them will have sockets. Similarly, Unix has other networking stacks
built in besides the TCP/IP stack. All these stacks in a Unix implementation use sockets, but not
all of them have the concept of ports. Unix provides a common API for all these stacks. In fact, it
is the network programmer who has to specify which stack he wants to use.
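In the socket API, the stack is chosen by the first argument given when the socket is created. The following Python sketch (using the standard socket module) opens one socket on the TCP/IP stack and, where the platform offers it, one on the Unix-domain stack, which has no port numbers at all. The list name opened is just for this example.

```python
import socket

opened = []

# AF_INET selects the TCP/IP (IPv4) stack.
opened.append(socket.socket(socket.AF_INET, socket.SOCK_DGRAM))

# AF_UNIX selects the local Unix-domain stack; it is not available everywhere,
# so check before using it.
if hasattr(socket, "AF_UNIX"):
    opened.append(socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM))

families = [sock.family for sock in opened]
for sock in opened:
    print("opened a socket in address family", sock.family.name)
    sock.close()
```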

Anyway, for Unix TCP/IP networking, what needs to be noted is that a socket has to be “bound” to
a port before it can be used. So, there you have it. No more confusion. Since ports and sockets
are closely related but distinct concepts, you need to bind a socket to a port to link the application
to the TCP/IP stack.
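Binding looks like this in Python’s standard socket module. This is a minimal sketch: the loopback address and the use of port 0 (which asks the OS to pick any free port) are choices made for the example only.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # the socket: an OS object
sock.bind(("127.0.0.1", 0))                                 # link it to a TCP port
address, port = sock.getsockname()                          # ask which port we got
print("socket is bound to", address, "port", port)
sock.close()
```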

Using sockets
Like files, the first thing that has to be done with a socket is to open it. When a socket is opened,
Unix assigns it a socket descriptor (analogous to a file descriptor). All system calls which
use this socket refer to it by its socket descriptor. After a socket is opened, how it is
used next depends on what the socket is being used for.
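Python exposes the underlying descriptor through the socket’s fileno() method, so the analogy with files can be checked directly. A minimal sketch:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # "open" the socket
descriptor = sock.fileno()        # the small integer the OS uses to refer to it
print("socket descriptor:", descriptor)

sock.close()
closed_descriptor = sock.fileno() # after close() there is no valid descriptor
print("after close:", closed_descriptor)
```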

It depends on conditions like whether it is being used in a connection-oriented network or a
connectionless network and whether the application is a server application or a client
application.
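As a rough sketch of the connection-oriented case, here is a tiny TCP server and client in Python, both inside one script and talking over the loopback address. The variable names and the use of port 0 (let the OS choose) are assumptions made for the example.

```python
import socket

# Server side: open, bind to a port, and listen for connections.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server.settimeout(2.0)
port = server.getsockname()[1]

# Client side: open and connect to the server's address and port.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(2.0)
client.connect(("127.0.0.1", port))

# The server accepts the pending connection and gets a new socket for it.
conn, peer = server.accept()
conn.settimeout(2.0)

client.sendall(b"hello")
data = conn.recv(1024)
print("server received", data, "from", peer)

for sock in (conn, client, server):
    sock.close()
```

A connectionless (UDP) application would skip the listen/connect/accept steps: it opens a SOCK_DGRAM socket and uses sendto() and recvfrom() directly.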
