TCP/IP Sockets in C by Donahoo and Calvert (PDF)
This guide was great for showing me the ropes. It covers a lot of ground quickly, so you may need to supplement the readings with some internet searching, but that's mostly for background information and for getting a deeper understanding of the concept being covered. The book includes enough easy-to-read explanations of actual working code that, within a few hours, you can have a client-server socket connection up and working. Both the text and the code are well written and easy to follow. The advanced material on non-blocking I/O allowed me to quickly implement background socket reads without tying up a foreground process. The annoying "address already in use" errors are explained clearly, along with how to get around them using socket options. An excellent book for the beginning programmer, especially if you are an embedded programmer.

Combine it with the precise instruction found in C Network Programming, and you'll find that building network applications is easier and quicker than ever. This book helps newcomers get started with a look at the basics of network programming as they relate to C, including the language's networking facilities. By Michael J. Donahoo and Kenneth L. Calvert. Related reading: the works of W. Richard Stevens, Computer Networks by Larry L. Peterson and Bruce S. Davie, and C Network Programming by Richard Blum.

The proper method for restarting interrupted system calls differs between UNIX variants. On some systems, restarting is the default behavior; on others, the program must restart the interrupted function itself.

We do not use any of these approaches because they are not portable, and they complicate the program with issues that we are not addressing.

Given the descriptor for a connected socket, getpeername returns a sockaddr structure containing the remote IP address and port information. A companion function, getsockname, returns the same type of information for the local IP address and port.
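As a quick illustration, here is a minimal sketch of how both calls are typically used on a connected IPv4 socket (PrintEndpoints is a hypothetical helper name, not from the book):

```c
#include <stdio.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Hypothetical helper: print both endpoints of a connected
   IPv4 TCP socket. */
void PrintEndpoints(int sock) {
    struct sockaddr_in addr;
    socklen_t addrLen = sizeof(addr);         /* in-out parameter */

    if (getsockname(sock, (struct sockaddr *) &addr, &addrLen) == 0)
        printf("Local:  %s:%d\n", inet_ntoa(addr.sin_addr),
               (int) ntohs(addr.sin_port));

    addrLen = sizeof(addr);                   /* reset before reuse */
    if (getpeername(sock, (struct sockaddr *) &addr, &addrLen) == 0)
        printf("Remote: %s:%d\n", inet_ntoa(addr.sin_addr),
               (int) ntohs(addr.sin_port));
}
```

Note that addrLen must be reset before each call, because both functions treat it as an in-out parameter.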

As with other socket calls using sockaddr, the addressLength is an in-out parameter specifying the length of the address structure in bytes. Threads decrease the cost of multitasking by allowing it within the same process: a newly created thread simply shares the same address space (code and data) with its parent, negating the need to duplicate the parent's state. The program comments here are limited to code that differs from the forking echo server. The ThreadArgs structure contains the "real" list of parameters.

In this program the thread function only needs a single argument (clntSock), so we could have simply passed a pointer to an integer; however, the ThreadArgs structure provides a more general framework for thread argument passing. When populating the thread argument structure, we pass only the client socket descriptor to the new thread.
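A minimal sketch of this pattern using POSIX threads follows; the names ThreadArgs, ThreadMain, and SpawnHandler follow the book's style but are reconstructed here, not copied from it:

```c
#include <stdlib.h>
#include <pthread.h>

/* Reconstructed sketch of the book's pattern; names are illustrative. */
struct ThreadArgs {
    int clntSock;            /* socket descriptor for the client */
};

static void *ThreadMain(void *threadArgs) {
    /* Extract the descriptor, then free the per-connection block */
    int clntSock = ((struct ThreadArgs *) threadArgs)->clntSock;
    free(threadArgs);

    pthread_detach(pthread_self());  /* reclaim resources on exit */
    /* ... handle the client on clntSock, then close() it ... */
    return NULL;
}

/* Called from the accept loop for each new connection */
void SpawnHandler(int clntSock) {
    struct ThreadArgs *args = malloc(sizeof(struct ThreadArgs));
    args->clntSock = clntSock;

    pthread_t threadID;
    pthread_create(&threadID, NULL, ThreadMain, args);
}
```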

Invocation of the new thread: because the ThreadArgs structure is allocated on a per-connection basis, the new thread can deallocate threadArgs once the parameters have been extracted. Threads do not always provide this kind of control, so additional server functionality must be provided to monitor and kill individual threads. Scheduling is also a concern: since CPU time is typically allocated per process, a threaded Web server with many threads may get the same amount of CPU time as the game Minesweeper. In addition, the scheduling and context switching among many processes or threads creates extra work for a system.

As the number of processes or threads increases, the operating system spends more and more time dealing with this overhead. Eventually, the point is reached where adding an additional process or thread actually decreases overall performance.

That is, a client might experience shorter service time if its connection request were queued until some preceding client finished, instead of creating a new process or thread to service it. We can avoid this problem by limiting the number of processes created by the server, which we call constrained-multitasking servers. We present a solution for processes, but it is directly applicable to threads.

In this solution, the server begins as the other servers do, by creating, binding, and listening to a socket. Then the server creates a set number (say, N) of processes, each of which loops forever, accepting connections from the same listening socket.

When a connection is established, the system picks one process, and the socket descriptor for that new connection is returned only in that process; the others remain blocked until the next connection is established, another lucky winner is chosen, and so on.

Prototype for "main" of forked process: line 2 Each of the N processes executes the ProcessMain function. Spawning processLimit processes: lines Execute loop processLimit times, each time forking a process that calls ProcessMain with servSock as the parameter. Parent exits after spawning children: line 30 4. ProcessMain " lines ProcessMain runs forever handling client requests.

Because only N processes are created, we save on scheduling overhead, and because each process lives forever handling client requests, we save on process-creation overhead. Of course, if we spawn too few processes, clients can still end up waiting unnecessarily for service. A different limitation arises when a single server must handle several sockets. For example, we might want to provide echo service on several ports at once. The problem with this becomes clear as soon as you consider what happens after the server creates and binds a socket to each port.

It is ready to accept connections, but which socket should it choose? A call to accept or recv on one socket may block, causing established connections on another socket to wait unnecessarily. This problem can be solved with nonblocking sockets, but then the server ends up continuously polling the sockets, which is wasteful. Fortunately, UNIX provides a way to block until activity occurs on any one of a set of descriptors: the select call. Each of its descriptor-list parameters may be NULL; for example, passing NULL for exceptionDescs causes select to completely ignore exceptions on any sockets.

Though the maximum number of descriptors can be quite large, most applications use very few. To avoid making select search all possible vector positions in all three vectors, we give it a hint by specifying in maxDescPlus1 the maximum number of descriptor values to consider in each descriptor vector. Since descriptors begin at 0, the number of descriptors to consider is always the maximum descriptor value plus one.

For example, if descriptors 0, 3, and 5 are set in the descriptor list, the number of descriptors for select to consider is 6 (0 through 5), which is also the maximum descriptor value (5) plus one. Notice that we set maxDescPlus1 considering all three descriptor lists: if the exception descriptor list has the largest descriptor value, say 7, then we set maxDescPlus1 to 8, irrespective of the descriptor values set for read and write. Don't answer yet, because select does even more!

If timeout is NULL, select has no timeout bound and waits until some descriptor becomes ready. For example, if descriptors 0, 3, and 5 are set in the read descriptor list, the write and exception descriptor lists are NULL, and descriptors 0 and 5 have data available for reading, then select returns 2, and only positions 0 and 5 are set in the returned read descriptor list. An error in select is indicated by a return value of -1. Let's reconsider the problem of running the echo service on multiple ports.

If we create a socket for each port, we can list these sockets in a read descriptor list. A call to select, given such a list, suspends the program until an echo request arrives for at least one of the descriptors. We can then handle the connection setup and echo for that particular socket, as the sketch below illustrates.
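Here is a minimal sketch of that loop; the listening sockets in servSocks are assumed to be created, bound, and listening already, and RunSelectServer is a hypothetical name rather than the book's code:

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical driver: watch several listening sockets plus stdin. */
void RunSelectServer(int servSocks[], int noPorts) {
    int maxDesc = STDIN_FILENO;          /* largest descriptor value */
    for (int i = 0; i < noPorts; i++)
        if (servSocks[i] > maxDesc)
            maxDesc = servSocks[i];

    for (;;) {
        /* select() modifies the sets, so rebuild them each time */
        fd_set sockSet;
        FD_ZERO(&sockSet);
        FD_SET(STDIN_FILENO, &sockSet);  /* nonsocket descriptor too */
        for (int i = 0; i < noPorts; i++)
            FD_SET(servSocks[i], &sockSet);

        struct timeval timeout = { .tv_sec = 10, .tv_usec = 0 };
        int ready = select(maxDesc + 1, &sockSet, NULL, NULL, &timeout);
        if (ready < 0)
            continue;                    /* error, e.g., interrupted */
        if (ready == 0) {
            printf("No echo requests for 10 secs...\n");
            continue;
        }
        if (FD_ISSET(STDIN_FILENO, &sockSet))
            break;                       /* user input: shut down */
        for (int i = 0; i < noPorts; i++)
            if (FD_ISSET(servSocks[i], &sockSet)) {
                /* accept() the new connection and echo on it here */
            }
    }
}
```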

The user can specify an arbitrary number of ports to monitor. To illustrate that select works on nonsocket descriptors as well, this server also watches for input from the standard input stream, which it interprets as a signal to terminate itself.

Set up a socket for each port. Create the list of file descriptors for select. Wrap up: close all ports and free memory.

Such one-to-one communication is called unicast because only one ("uni") copy of the data is sent ("cast"). In some cases, information is of interest to multiple recipients.

We could simply unicast a copy of the data to each recipient; however, this may be very inefficient. Consider the case where the sender connects to the Internet over a single path. Unicasting multiple copies over that single connection creates duplication, wasting bandwidth. In fact, if each unicast connection across this shared path requires a fixed amount of bandwidth, there is a hard limit to the number of receivers we can support.

For example, if a video server sends 1-Mbps streams and the server's network connection is only 3 Mbps (a healthy connection rate), it can support only three simultaneous users. Fortunately, the network provides a way to use bandwidth more efficiently. Instead of making the sender responsible for duplicating packets, we can give this job to the network.

In our video server example, we send only a single copy of the stream across the server's connection to the network, which duplicates the data only when appropriate.

With this model of duplication, the server uses only 1 Mbps across its connection to the network, irrespective of the number of clients. There are two types of network duplication: broadcast and multicast. With broadcast, all hosts on the network receive a copy of the message; a broadcast message is indiscriminately sent to everyone on the network. A multicast message is sent to some (potentially empty) subset of the hosts on the network. Obviously, broadcast is just a special case of multicast, where the subset of receivers contains all of the hosts on the network.

The main distinction between the use of broadcast and unicast is the form of the address. In practice, there are two types of broadcast address: local broadcast and directed broadcast. A local broadcast address (255.255.255.255) reaches every host on the local network; local broadcast messages are never forwarded by routers.

A host on an Ethernet LAN can send a message to all other hosts on that same LAN, but the message will not be forwarded by a router, so no hosts beyond the LAN will receive it. IP addresses have two parts: the network identifier and the host identifier.

If the network identifier is X, a directed broadcast address for that network is an IP address with the high-order bits set to X and the remaining (host) bits set to 1.

For example, the directed broadcast address for a given network is formed by appending all-ones host bits to its network identifier. With subnetting, we consider the subnet identifier part of the network identifier, so the definition of a directed broadcast address for a subnet is the same. What about a network-wide broadcast address for sending a message to all hosts on the Internet? There is no such address. To see why, consider the impact on the network of a broadcast to every host on the Internet.

Sending a single datagram would result in a very, very large number of packet duplications by the routers, and bandwidth would be consumed on each and every network. The consequences of misuse (malicious or accidental) are too great, so the designers of IP deliberately left out such an Internet-wide broadcast facility.

Even with these restrictions, network-scoped broadcast can be very useful. Often, it is used for state exchange in network games where the players are all on the same (broadcast) local network. We create a sender and a receiver to demonstrate the use of UDP broadcast, as shown in BroadcastSender.c.
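A minimal broadcast sender sketch, capturing the essential steps (this is not the book's BroadcastSender.c verbatim; BroadcastForever is an illustrative name):

```c
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Illustrative sketch: broadcast msg to bcastIP:port every 3 seconds. */
void BroadcastForever(const char *bcastIP, unsigned short port,
                      const char *msg) {
    int sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

    int bcastPermission = 1;            /* must be enabled explicitly */
    setsockopt(sock, SOL_SOCKET, SO_BROADCAST,
               &bcastPermission, sizeof(bcastPermission));

    struct sockaddr_in bcastAddr;
    memset(&bcastAddr, 0, sizeof(bcastAddr));
    bcastAddr.sin_family = AF_INET;
    bcastAddr.sin_addr.s_addr = inet_addr(bcastIP);  /* broadcast addr */
    bcastAddr.sin_port = htons(port);

    for (;;) {
        sendto(sock, msg, strlen(msg), 0,
               (struct sockaddr *) &bcastAddr, sizeof(bcastAddr));
        sleep(3);                       /* every three seconds */
    }
}
```

Note that broadcast permission must be requested explicitly with SO_BROADCAST; an ordinary UDP socket will refuse to send to a broadcast address.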

Our sender broadcasts a given string every three seconds to the specified broadcast address. Again, the main difference from unicast is the form of the address. A multicast address identifies a set of receivers for multicast messages. The designers of IP allocated a range of the address space dedicated to multicast: the class D addresses, which range from 224.0.0.0 to 239.255.255.255. With the exception of a few reserved multicast addresses, a sender can send datagrams addressed to any class D address.

Our next example, MulticastSender.c, sends multicast datagrams. Every IP packet contains a TTL (time-to-live), initialized to some default value and decremented by each router that handles the packet. When the TTL reaches 0, the packet is discarded. By setting the TTL, we limit the number of hops a multicast packet can traverse from the sender.

We can change the default TTL value by setting a socket option. The TTL may also be set for broadcast; however, since routers generally do not forward broadcast packets, it usually has no effect there. Unlike broadcast, network multicast duplicates the message only for a specific set of receivers.
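Setting the option is a one-liner; a sketch (SetMulticastTTL is an illustrative wrapper name):

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Limit how many router hops multicast datagrams may traverse.
   A TTL of 1 confines them to the local network. */
int SetMulticastTTL(int sock, unsigned char ttl) {
    return setsockopt(sock, IPPROTO_IP, IP_MULTICAST_TTL,
                      &ttl, sizeof(ttl));
}
```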

This set of receivers, called a multicast group, is identified by a shared multicast (or group) address. These receivers need some mechanism to notify the network of their interest in receiving data sent to a particular multicast address. Once notified, the network can begin forwarding the multicast messages to the receiver. This notification, called "joining a group," is accomplished with a multicast request sent through the sockets interface.
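Joining uses the IP_ADD_MEMBERSHIP option with an ip_mreq structure; a minimal sketch (JoinMulticastGroup is an illustrative name):

```c
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Join the multicast group so this host starts receiving datagrams
   addressed to multicastIP (a class D address in dotted form). */
int JoinMulticastGroup(int sock, const char *multicastIP) {
    struct ip_mreq joinRequest;
    memset(&joinRequest, 0, sizeof(joinRequest));
    joinRequest.imr_multiaddr.s_addr = inet_addr(multicastIP);
    joinRequest.imr_interface.s_addr = htonl(INADDR_ANY); /* any iface */

    return setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                      &joinRequest, sizeof(joinRequest));
}
```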

Our multicast receiver joins a specified group, receives and prints a single multicast message from that group, and exits. Broadcast versus multicast: the choice between them in an application depends on several issues, including the portion of network hosts interested in receiving the data and the knowledge the communicating parties have of one another.

Broadcast works well if a large percentage of the network hosts wish to receive the message; however, if there are many more hosts than receivers, broadcast is very inefficient. In the Internet, broadcasting would be very expensive even for a communication with many thousands of interested receivers, because the data would have to be duplicated to every host on the Internet, a vastly larger number. In such cases, multicast limits the duplication of data, delivering it only to the networks that have hosts interested in the message.

Because of the negative consequences of Internet-wide broadcast, most routers do not forward broadcast packets; thus, applications are generally limited to LAN broadcasts. The disadvantage of multicast is that IP multicast receivers must know the address of a multicast group in order to join it. Knowledge of an address is not required for broadcast. In some contexts, this makes broadcast a better mechanism for discovery than multicast: all hosts can receive broadcast by default, so it is simple to ask all hosts a question like "Where's the printer?"

State precisely the conditions under which an iterative server is preferable to a multiprocessing server. Would you ever need to implement a timeout in a client or server that uses TCP? How can you determine the minimum and maximum allowable sizes for a socket's send and receive buffers?

Determine the minimums for your system. Consider what might happen if it were ignored. This is especially true of TCP sockets.
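As a starting point for the buffer-size exercise, the current sizes can be queried with getsockopt; a sketch (PrintBufferSizes is a hypothetical helper):

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: report the current buffer sizes. Probing with
   setsockopt/getsockopt pairs can then find the allowed range. */
void PrintBufferSizes(int sock) {
    int sndBuf = 0, rcvBuf = 0;
    socklen_t len = sizeof(sndBuf);

    getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sndBuf, &len);
    len = sizeof(rcvBuf);
    getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvBuf, &len);

    printf("SO_SNDBUF = %d bytes, SO_RCVBUF = %d bytes\n", sndBuf, rcvBuf);
}
```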

This chapter describes some of what goes on "under the hood" when you create and use a socket. Please note that this description covers only the normal sequence of events and glosses over many details.

Nevertheless, we believe that even this basic level of understanding is helpful. Readers who want the full story are referred to the TCP specification [12] or to one of the more comprehensive treatises on the subject [3, 20].

The program refers to these structures via the descriptor returned by socket, which is best thought of as simply a "handle" linked to an underlying socket structure. As the figure indicates, more than one descriptor can refer to the same socket structure; in fact, descriptors in different processes can refer to the same underlying socket structure. By "socket structure" here we mean all data structures in the sockets layer and TCP implementation that contain state information relevant to this socket abstraction.

Thus, the socket structure contains send and receive queues and other information, including the local and remote Internet addresses and port numbers associated with the socket. The local Internet address (labeled "Local IP" in the figure) is one of those assigned to the local host; the local port is either set with bind or chosen arbitrarily by the implementation when the socket is first used. The remote address and port identify the remote socket, if any, to which the socket is connected.

We will see more about how they are set later in this chapter. Knowing about the existence of these data structures, and how they are affected by the underlying protocols, is useful because they control various aspects of the behavior of the sockets API functions.

For example, because TCP provides a reliable byte-stream service, a copy of any data passed in a send must be kept until it has been successfully received at the other end.

When the send returns, the program cannot know whether the data has actually been sent; it knows only that the data has been copied into the local buffer. Moreover, the nature of the byte-stream service means that message boundaries are not preserved in the receive queue.

As we saw earlier (Section 3), with a UDP socket, on the other hand, packets are not buffered for retransmission; by the time a call to sendto returns, the message has been given to the network subsystem for transmission. If the network subsystem cannot handle the message for some reason, the message is silently dropped (this is rare). However, as we discussed in Chapter 4, message boundaries are preserved in UDP's receive queue: a single call to recvfrom will never return data from more than one received message.

The sections that follow fill in the details of connection setup, data transfer, and connection teardown. In particular, data passed in a single send can be returned by multiple recvs at the other end, and a single call to recv may return data passed in multiple sends.

Consider a TCP connection transferring a stream of bytes to the receiver. The way these bytes are grouped for delivery at the receiving end of the connection depends on the timing between the calls to send and recv at the two ends, as well as on the size of the buffers provided to the recv calls. We can think of the sequence of all bytes sent in one direction on a TCP connection, up to a particular instant in time, as being divided into three FIFO "queues":

1. SendQ: Bytes buffered in the sockets layer at the sender that have not yet been successfully transmitted to the receiving host. 2. RecvQ: Bytes buffered in the sockets layer at the receiver, waiting to be delivered to the receiving program, that is, waiting to be returned via recv.

3. Delivered: Bytes already returned to the receiving program via recv. A call to send appends bytes to SendQ; bytes are then transferred from SendQ to RecvQ by the TCP protocol, and it is important to realize that this transfer cannot be directly controlled or observed by the application. (The default behavior for stream sockets is for a call send(s, buffer, n, 0) to block until all n bytes have been transferred to SendQ. This behavior can be changed by making the socket nonblocking or by using a nonblocking send call; see Section 5.) Bytes are moved from RecvQ to Delivered as a result of recv calls by the receiving program.
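For reference, a common way to put a socket into nonblocking mode on POSIX systems is via fcntl (a sketch, not the book's code):

```c
#include <fcntl.h>

/* Put a socket into nonblocking mode: send() then returns immediately
   with however many bytes fit in SendQ (or -1 with EWOULDBLOCK). */
int SetNonblocking(int sock) {
    int flags = fcntl(sock, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(sock, F_SETFL, flags | O_NONBLOCK);
}
```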

The size of the transferred chunks depends on the amount of data in RecvQ and the size of the buffer given to recv. In the book's accompanying figure, the different shadings denote bytes passed in the three different calls to send shown above. Now suppose the receiver calls recv with a buffer large enough to hold everything currently in RecvQ.

The recv call will return all of the bytes present in the waiting-for-delivery (RecvQ) queue. Note that this includes data from both the first and second calls to send.

At some later time, after TCP has completed the transfer of more data, the three partitions change accordingly. If the receiver now calls recv with a given buffer size, up to that many bytes will be moved from the waiting-for-delivery (RecvQ) queue to the already-delivered (Delivered) queue. The number of bytes returned by the next call to recv depends on the size of the buffer and on the timing of the transfer of data from the send-side queue to the receive-side queue.

The movement of data from SendQ to RecvQ has important implications for the design of application protocols. We have already encountered the need to parse messages as they are received over a TCP socket when in-band delimiters are used for framing (Section 3). In the following sections, we consider two more subtle ramifications. Although the actual amount of memory the buffers use may grow and shrink dynamically, a hard limit is necessary to prevent all of the system's memory from being gobbled up by a single TCP connection under the control of a misbehaving program.

These limits can be changed via socket options, as we saw in Section 5. The point is that these buffers are finite and, therefore, can fill up. Let's consider some of the implications of that fact. Once RecvQ is full, the TCP flow-control mechanism kicks in and prevents the transfer of any bytes from the sending host's SendQ until space becomes available in RecvQ as a result of a call to recv.
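For example, new buffer sizes can be requested with setsockopt; a sketch (the system may round or clamp the values, so verify with getsockopt, and on many systems the sizes must be set before the connection is established to take full effect):

```c
#include <sys/socket.h>

/* Request new send/receive buffer sizes; the system may round or
   clamp the values, so verify the result with getsockopt. */
int SetBufferSizes(int sock, int sndSize, int rcvSize) {
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                   &sndSize, sizeof(sndSize)) < 0)
        return -1;
    return setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                      &rcvSize, sizeof(rcvSize));
}
```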

A sending program can continue to call send until SendQ is full. Once SendQ is full, send blocks until space becomes available, that is, until some bytes are transferred to the receiving host's RecvQ. If RecvQ is also full, everything stops until the receiving program calls recv , so that some bytes can be transferred to Delivered.

If the receiving program does not call recv, a large send may never complete successfully. One way this can happen is if both programs are sending simultaneously.

As a concrete example, consider a connection between a program on Host A and a program on Host B, both of which are sending. Suppose the first portion of the data at Host A has been transferred to the other end, and more bytes have been copied into SendQ at Host A; the remaining bytes cannot be sent, and therefore send will not return, until space frees up in RecvQ at Host B. Unfortunately, the same situation holds for the program at Host B, so neither program's send call will ever complete.

The moral of the story: design the protocol carefully to avoid sending large amounts of data simultaneously in both directions. Throughput refers to the rate at which bytes of user data from the sender are made available to the receiving program. In programs that transfer a large amount of data, we want to maximize this rate. In the absence of network capacity or other limitations, bigger buffers generally result in higher throughput.

The reason for this has to do with the cost of transferring data into and out of the kernel buffers: the sockets layer fills up the SendQ buffer, waits for data to be transferred out of it by the TCP protocol, refills SendQ, waits some more, and so on.

Each time the sockets layer has to wait for data to be removed from SendQ, some time is wasted in the form of overhead.

This overhead is approximately the same as that incurred by a completely new system call. Thus, the effective size of a call to send is limited by the actual SQS (the size of SendQ). For receiving, the same principle applies: however large the buffer you give to recv, it will be copied out in chunks no larger than RQS (the size of RecvQ), with overhead incurred between chunks. Although there is always a system-imposed maximum size for each buffer, it is typically significantly larger than the default on modern systems.

Remember that these considerations apply only if your program needs to send an amount of data significantly larger than the buffer size all at once. Let us now consider how a socket gets to and from the Established state; as we'll see, the client and the server take rather different paths.

In what follows, as in all the examples of this book, we assume that connect is called by the client and that the server calls bind, listen, and accept. In this and the remaining figures of this section, the large arrows depict events that cause the socket structures to change state. Events that occur in the application program (i.e., socket calls) are distinguished from protocol events, and time proceeds left to right in these figures.

The client's Internet address is depicted as A.B.C.D, and the server's as W.X.Y.Z; the server's port number is Q. When the client creates a TCP socket, it is initially in the Closed state. When the client calls connect with Internet address W.X.Y.Z and port Q, the system fills in the four address fields of the socket structure (Local IP A.B.C.D, Local port P, Remote IP W.X.Y.Z, Remote port Q). Because the client did not previously call bind, a local port number P, not already in use by another TCP socket, is chosen and assigned to this socket by the system.

The local Internet address is also assigned; the address used is that of the network interface through which packets will be sent to the server (A.B.C.D). The TCP opening handshake is known as a "three-way" handshake because it typically involves three messages: a connection request from client to server, an acknowledgment from server to client, and another acknowledgment from the client back to the server.

The client TCP considers the connection to be established as soon as it receives the acknowledgment from the server. If the client TCP does not receive a response from the server within a reasonable period of time, it times out and gives up; the protocol retransmits the handshake messages multiple times, at increasing intervals, before doing so.

The companion volume provides tutorial-based instruction in key sockets programming techniques, focusing exclusively on Java and complemented by example code. It includes references to the relevant Java class libraries that often go beyond the "official" Java documentation in clarity and explanation.

This volume focuses on the underlying sockets classes, one of the foundations for learning about networks in any programming language. Nonetheless, many network programmers recognize that their applications could be much more robust. Skeleton code and a library of common functions allow you to write applications without having to worry about routine chores.

The networking capabilities of the Java platform have been extended considerably since the first edition of the book. This new edition covers the current version of the platform, including several new classes and capabilities introduced in its last few revisions. The example code has also been modified to take advantage of new language features such as annotations, enumerations, generics, and implicit iterators where appropriate.

This book's focused, tutorial-based approach helps the reader master the tasks and techniques essential to virtually all client-server projects using sockets in Java. Chapter 1 provides a general overview of networking concepts to allow readers to synchronize the concepts with terminology. Chapter 2 introduces the mechanics of simple clients and servers. Chapter 3 covers basic message construction and parsing. Chapter 4 then deals with techniques used to build more robust clients and servers.

Chapter 5 (NEW) introduces the scalable I/O facilities introduced in Java 1.4 (the NIO package). Chapter 6 discusses the relationship between the programming constructs and the underlying protocol implementations in more detail. Programming concepts are introduced through simple program examples accompanied by line-by-line code commentary that describes the purpose of every part of the program.

No other resource presents so concisely or so effectively the material necessary to get up and running with Java sockets programming. Focused, tutorial-based instruction in key sockets programming techniques allows the reader to quickly come up to speed on Java network applications, with concise and up-to-date coverage of the most recent platform features.

In 1994, W. Richard Stevens and Addison-Wesley published a networking classic: TCP/IP Illustrated. The model for that book was a brilliant, unfettered approach to networking concepts that has proven itself over time to be popular with readers of beginning to intermediate networking knowledge.

The Illustrated Network takes this time-honored approach and modernizes it, not only by creating a much larger and more complicated network, but also by incorporating all of the networking advancements that have taken place since the mid-1990s, which are many.

This book takes the popular Stevens approach and modernizes it, employing current equipment, operating systems, and router vendors.
