DWise1's Sockets Programming Pages

HARD HAT AREA

WATCH YOUR STEP

Purpose of these Pages

The purpose of these pages is two-fold:

To share with the community what I have been learning about sockets programming.
In the process, to organize and clarify the information for myself.
I am in the process of learning sockets programming. I discovered long ago that one of the best ways to learn something is to teach it. So in that vein, by trying to explain it to you, I am also helping myself learn the material.
These pages are not intended to provide comprehensive coverage of the subject. Many books and sites explain the concepts better, a few of which I list on my resources page. All I really intend to do here is cover the basics needed to get started, though over time I will undoubtedly expand on some topics as needed.

MAJOR CAVEAT!

I first developed these pages from 2002 through 2008. Because IPv6 has only just recently started going on-line (as of mid-2011), I concentrated on IPv4 and had not yet studied IPv6. As a result, these pages only cover IPv4. Also, some of the functions I use have been deprecated in favor of newer functions that also support IPv6, but I have not covered them yet.
Despite this deficiency, you should still be able to learn a lot from these pages and be able to get started with network programming. After I get everything back up on-line, I will revisit those issues and update these pages accordingly.

Introduction

Network programming is the writing of programs that will communicate over a network with programs running on another computer. There are several different programming models for accomplishing this. Indeed, at first almost every different operating system had its own proprietary programming model, a condition which continues to exist to some extent. But with the growth in popularity of TCP/IP as a standard networking protocol, a native Application Programming Interface (API) for TCP/IP has also become more popular: Berkeley Sockets.
The Sockets programming model, AKA "Berkeley Sockets", was first introduced in 1983 in the 4.2 BSD Unix system. That programming model consists of a set of data structures, predefined constants, and functions that perform system calls. Although it originated as a TCP/IP programming model, it appears that it can also be used for other protocol families, including UNIX domain, IPX/SPX, XEROX NS, X.25, SNA, DECnet, AppleTalk, and NetBios. The model is that flexible (we'll cover part of the reason for that in the section on addressing).
In addition, sockets programming is available in many different languages and development environments. While it is natively C (and hence also C++) and UNIX, I have also seen it in Perl, Visual Basic, Delphi, Java, Python, and LabVIEW, as well as on Windows. From what I've seen, the concepts and most of the core function names are the same -- if anything, the tendency is to expand on the core API rather than to change it appreciably. Therefore, what you learn in one development environment should be transferable to the others. By the way, my approach in these pages is in C.
Now, I have to admit that the idea of writing code that would network computers together scared me, so I approached it cautiously. This meant that I wasted a lot of time in "analysis paralysis" researching all that I could before I would try to write down the first line of code. But then I finally got started writing sockets applications by playing with the code from Donahoo and Calvert's The Pocket Guide to TCP/IP Sockets: C Version -- it is clear, concise, and a bargain at $15. I recommend it highly to beginning sockets programmers.
Sockets programming really is a lot easier to do than it seems at first.

A Few Basic Comments

All that sockets programming handles is establishing communication between the two computers and passing data between them:
- The sockets programming portion of the program just makes that communication possible and it passes the data in blissful ignorance of the actual content. The common analogy is to a telephone or telegraph system.
- It does not handle precisely how that data is formatted nor how the computers' applications conduct their session with each other. That is the job of the application protocol.
- Therefore, essentially the same sockets programming functions can be used in nearly all of your networking applications.
- On the down side, in order to write a particular type of application (eg, telnet, ftp, ntp), you not only need to know sockets programming (the easy part), but also the application protocol for that type of application (the harder part).
Sockets programming isn't very complicated.
- The basic Application Programming Interface (API) only consists of a handful of data structures and a couple/few dozen functions.
- A sockets program doesn't need to be large or elaborate.
- Just a few functions should be more than enough to provide sockets functionality. Perl can do it in fewer than a dozen lines of code.
- Sockets functions can easily be embedded in any type of application, even a "DOS" app (actually, a Win32 console application).
You should not need to obtain, buy, or install any special software, outside of a compiler/development system (which you will need anyway if you are going to program):
- If you have a recent-enough compiler/development environment (not more than about 10 years old), then sockets-programming support should already be installed.
- TCP/IP support has been included in Windows since Windows 95 and in Linux since forever. I cannot speak for Macs, though.
You do not need to actually be on a network to develop sockets programs.
- You can do almost all your development work on a single, stand-alone PC. You do not need two computers; you just need to have TCP/IP installed on one computer.
- What you are really doing is writing two programs that communicate with each other through networking. The address that you connect to can just as easily be on the same computer as on a computer half-way across the world (after all, no computer on earth is more than half-way across the world).
- Each computer has the standard localhost (loopback) IP address of 127.0.0.1. Two sockets applications running on the same computer can connect to each other, so long as they are not both trying to bind to the same port.
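Here is a minimal sketch of that single-computer setup in C. It assumes a POSIX system; Winsock needs different headers, WSAStartup() and closesocket().

- For brevity, the server side and the client side live in one function, loopback_roundtrip(), which is a hypothetical name. Normally they would be two separate programs.
- Binding to port 0 lets the OS pick a free port, which getsockname() then reports. That way the sketch cannot collide with a port another program is already using.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: a TCP "server" and "client" on 127.0.0.1 in one
 * process. The client sends msg; the server side receives it into out.
 * Returns the number of bytes received, or -1 on any error. */
ssize_t loopback_roundtrip(const char *msg, char *out, size_t cap)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int srv = -1, cli = -1, conn = -1;
    ssize_t n = -1;

    if (cap == 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* 127.0.0.1 */
    addr.sin_port = htons(0);                      /* 0 = let the OS pick a port */

    /* Server side: socket(), bind(), listen(). */
    if ((srv = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
    if (bind(srv, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if (listen(srv, 1) < 0) goto done;
    /* Ask which port the OS assigned so the client knows where to connect. */
    if (getsockname(srv, (struct sockaddr *)&addr, &alen) < 0) goto done;

    /* Client side: connect() completes once the kernel queues the
     * connection on the listening socket, so no second thread is needed. */
    if ((cli = socket(AF_INET, SOCK_STREAM, 0)) < 0) goto done;
    if (connect(cli, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if ((conn = accept(srv, NULL, NULL)) < 0) goto done;

    if (send(cli, msg, strlen(msg), 0) < 0) goto done;
    /* One recv() is enough for a short message over loopback in practice,
     * but TCP does not guarantee it; real code should loop. */
    n = recv(conn, out, cap - 1, 0);
    if (n >= 0)
        out[n] = '\0';
done:
    if (conn >= 0) close(conn);
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    return n;
}
```

Called as loopback_roundtrip("ping", buf, sizeof buf), it should return 4 with "ping" in buf, all without a network card being involved.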

Some Basic Networking Caveats

These are what I think are some of the more common "stupid mistakes" that you could make in network programming, especially when you're getting started. Remember that I drew up this list mainly from my personal experience:

Mistake #1. Trying to talk to a host on a LAN with the wrong network address.

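Even if both computers are connected together and sitting side-by-side, if their IP addresses do not place them on the same network (ie, if the network portion of their IP addresses is not the same), then they will never be able to talk with each other.
The reason for this is that TCP/IP delivers a packet in one of two entirely different ways, and which one it uses depends on whether the destination is on the same network or not. If it is, the sender uses ARP to resolve the destination's IP address to its physical MAC address and sends the frame directly. If it isn't, the sender instead looks up the MAC address of its default gateway (router) and hands the frame to it. So two side-by-side hosts with mismatched network addresses will each try to go through a router instead of talking to each other directly.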
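Here is a sketch of the same-network test in C. It assumes POSIX headers and a subnet mask that you supply (eg, 255.255.255.0 for a typical home LAN); same_network() is a hypothetical helper.

- It uses inet_pton(), one of the newer conversion functions that also handles IPv6, though this sketch only handles IPv4.
- The network portion of an address is what remains after a bitwise AND with the mask. Two hosts are on the same network only if those results match.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if IPv4 addresses a and b are on the same
 * network under the given subnet mask, 0 if not, and -1 if any of the
 * three strings is not a valid dotted-decimal address. */
int same_network(const char *a, const char *b, const char *mask)
{
    struct in_addr ia, ib, im;

    if (inet_pton(AF_INET, a, &ia) != 1 ||
        inet_pton(AF_INET, b, &ib) != 1 ||
        inet_pton(AF_INET, mask, &im) != 1)
        return -1;

    /* All three values are in network byte order. A bitwise AND works
     * byte by byte, so no conversion to host order is needed here. */
    return (ia.s_addr & im.s_addr) == (ib.s_addr & im.s_addr);
}
```

With mask 255.255.255.0, 192.168.1.10 and 192.168.1.20 are on the same network, but 192.168.1.10 and 192.168.2.10 are not. Widen the mask to 255.255.0.0 and they are.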

Mistake #2. Mixing up TCP and UDP.

Not only are TCP and UDP two different protocols, but they also use two separate sets of ports; e.g., Port 80/TCP is entirely different and separate from port 80/UDP. So a TCP client will not be able to talk with a UDP server, nor will a UDP client be able to talk with a TCP server.
Don't laugh. In preparing to answer a forum question, I compiled her client code and tried to connect it to my server. Well, duh! She was using UDP and trying to connect to a TCP server and I made the exact same mistake! The same thing happened when I wrote my first rtime server and couldn't get a known-good client to connect to it. So it's a lot easier to make this mistake than you may think.
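The separate port numbering is easy to demonstrate. This C sketch binds a TCP socket and then a UDP socket to the same port number on 127.0.0.1. It assumes POSIX headers, and the function name is hypothetical.

- If TCP and UDP shared one set of ports, the second bind() would fail with EADDRINUSE; you should find that it succeeds.
- Notice that the only difference in how the two sockets are created is the type argument to socket(): SOCK_STREAM for TCP, SOCK_DGRAM for UDP. That is the detail to double-check when a client and server refuse to talk.

```c
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical demo: bind TCP and UDP sockets to the SAME port number.
 * Returns that port (host byte order), or 0 on failure. In the unlikely
 * case another program already holds that UDP port, this also returns 0. */
unsigned short bind_tcp_and_udp_same_port(void)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    unsigned short port = 0;
    int tcp = socket(AF_INET, SOCK_STREAM, 0); /* SOCK_STREAM => TCP */
    int udp = socket(AF_INET, SOCK_DGRAM, 0);  /* SOCK_DGRAM  => UDP */

    if (tcp < 0 || udp < 0) goto done;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                  /* let the OS choose a TCP port */
    if (bind(tcp, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if (getsockname(tcp, (struct sockaddr *)&addr, &alen) < 0) goto done;

    /* Reuse the exact same address and port number for the UDP socket. */
    if (bind(udp, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    port = ntohs(addr.sin_port);
done:
    if (tcp >= 0) close(tcp);
    if (udp >= 0) close(udp);
    return port;
}
```

The corollary is the mistake itself: a datagram sent with sendto() to that port would arrive at the UDP socket and never reach anything listening on the TCP side.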

Mistake #3. Not taking care of byte order.

This one you'll encounter as you start to write your programs. When a computer stores a multi-byte value, it can either start with the higher-order byte ("big-endian") or with the lower-order byte ("little-endian"). As long as the data stays within a given computer and is only shared with computers of the same type, there's no problem and the entire issue of byte order is completely transparent to the user. But as soon as you start sharing that data with any possible computer in the world, byte order becomes an important issue.
The standard byte order on the internet is big-endian, high-order byte first. Sockets provides functions for converting host byte order into network byte order, so the byte order of the host can remain transparent to the programmer. However, the programmer must still remember to use the functions.
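To see host byte order for yourself, here is a C sketch that assumes POSIX headers; both function names are hypothetical.

- host_is_little_endian() checks which byte of a 16-bit 1 is stored at the lowest address.
- network_order_bytes() shows that after htonl() the value 0x12345678 sits in memory as the bytes 12 34 56 78, high-order byte first, whatever the host.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 on big-endian. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);  /* the byte stored at the lowest address */
    return first == 1;          /* low-order byte first => little-endian */
}

/* Hypothetical helper: copy the in-memory bytes of value, after conversion
 * to network byte order, into out[0..3]. */
void network_order_bytes(uint32_t value, unsigned char out[4])
{
    uint32_t net = htonl(value); /* no-op on big-endian hosts, byte swap on little-endian */
    memcpy(out, &net, 4);
}
```

On an Intel x86 PC, host_is_little_endian() returns 1, and htonl() reverses the bytes. On a big-endian machine it returns 0, and htonl() changes nothing. Either way, the bytes that reach the wire are the same.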
The two most common places where not using the built-in functions will cause problems are:

When loading the port number into an address structure.

If the host byte order is different from network byte order (as on little-endian Intel x86 PCs), then the port number you have entered will be entirely different from what you think it is: eg, port 23 will become port 5888 and port 80 will become port 20,480. The client will try to connect to the wrong port and will never be able to connect. Unless, that is, the same mistake was made in both the client and the server. That masks the error for now and turns it into a future debugging nightmare when a third application that was written correctly tries to connect.

When loading or reading from the data packet.

In this case, you will end up reading the data in reversed byte order or cause the destination host to read it in reversed order. In either case, a multi-byte value will be misinterpreted (unless its bytes happen to read the same both ways, as with 0), and data corruption will have occurred. Again, if both hosts make the same mistake then the error will be masked for a time and cause troubleshooting headaches down the road.
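Both cases can be sketched in C. The sketch assumes POSIX headers; swap16(), set_port(), and the put_/get_ helpers are hypothetical names, not part of the sockets API.

- The port numbers: 23 is 0x0017 and 80 is 0x0050. With their two bytes swapped, they read as 0x1700 = 5888 and 0x5000 = 20480.
- The packet fields: the sketch uses shifts rather than htonl()/ntohl() plus memcpy(). This is a swapped-in technique that writes the same big-endian wire format on any host and doesn't care how the buffer is aligned.

```c
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* --- Case 1: the port number in an address structure --- */

/* Unconditionally swap the two bytes of a 16-bit value. On a little-endian
 * host this is effectively what happens to a port stored without htons(). */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Correct: convert the port from host to network byte order.
 * WRONG would be: addr->sin_port = port;  (only works on big-endian hosts) */
void set_port(struct sockaddr_in *addr, uint16_t port)
{
    addr->sin_port = htons(port);
}

/* --- Case 2: multi-byte fields inside a data packet --- */

/* Store values high-order byte first (network byte order) on any host. */
void put_u16(unsigned char *buf, uint16_t v)
{
    buf[0] = (unsigned char)(v >> 8);
    buf[1] = (unsigned char)v;
}

void put_u32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Read them back the same way, so both hosts agree on the value. */
uint16_t get_u16(const unsigned char *buf)
{
    return (uint16_t)((buf[0] << 8) | buf[1]);
}

uint32_t get_u32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

For example, writing 0x0102 with put_u16() and then 0xDEADBEEF with put_u32() into a 6-byte buffer yields the bytes 01 02 DE AD BE EF. The get_ functions read the same values back on either kind of host.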

I'll provide links to more complete explanations as I expand this portion of my site.

Topics

The following is a table of contents providing links to the topics on this sockets programming site. This table of contents currently provides the only access to these topics.
More links will be added as the topics are written and uploaded.

Sockets Programming Home Page

Introduction
A Few Basic Comments
Some Basic Networking Caveats

Basic TCP/IP Theory

What is TCP/IP?
TCP/IP and Packets
Protocol Layers
tcp and udp Protocols
The TCP Connection

Connecting (The "Triple Handshake")
Disconnecting (Shutting Down Gracefully)

Ports and Sockets
IP Addressing
Byte Order

IP Addresses

IP Addressing Theory
So Why do They Need to be on the Same Network?
Private Addresses and Network Address Translation (NAT)

Working with Sockets

Basic Overview
Create a socket with socket()
Set up the socket address by filling in the sockaddr_in
Bind the socket to a port with bind()
Listen for a connection with listen()
Connect to a Server with connect()
Accept a Connection with accept()
Send a message (tcp) with send()
Receive a message (tcp) with recv()
Send a message (udp) with sendto()
Receive a message (udp) with recvfrom()
Close the connection with shutdown()
Close the socket with close()

Sockets Applications

Basic Application Structure

Client/Server Applications
Basic Client/Server Operation
Client/Server Sequence of Communication

Basic TCP Client/Server Sequence
Basic UDP Client/Server Sequence
Basic Broadcast Client/Server Sequence

Application Protocols

Basic Concepts
The echo Service
The daytime Service

Windows Sockets (WinSock)

A Little History
The Basic "How to Convert from UNIX to Winsock"

Linking the Winsock Library to Your Project
Absolutely Necessary Changes
Winsock Error Codes
Optional but Advisable Changes
The WSA* Extended Functions

Caveat Programmer
Resources

Books
Web Sites

Miscellaneous Topics

Address Resolution (DNS)

Accessing the Domain Name System (DNS) to convert a domain name into an IP address and vice versa.

Data Representation within a Packet

General guidelines on how to format a packet, insert data into it, and extract data back out. Also referred to as "wire format." Includes an example (SNTP message format, RFC 2030) with techniques in C for inserting and extracting data.

Survey of RFC Data Formatting Specifications

A survey of the RFCs of well-known services to learn how they format their data. Provides us with ideas for when we create our own protocols -- why re-invent the wheel?

Dealing With and Getting Around Blocking Sockets

How to get your program to remain responsive while handling multiple sockets.
Currently includes a brief discussion of server strategies.

Graceful Shutdown and Crash Detection

Server Strategies

Resources and Links

Compilers
Books
Web Sites

Sample Network Applications with Source Code

Disclaimer
Links to the Source Code

Basic TCP and UDP echo servers and clients
Broadcast time server and client
Multi-client TCP and UDP echo servers with select, nonblocking sockets, and multithreading.
UDPTimeC and UDPTimeD -- Basic time client and server. Supports NTP and time service (port 37)
Trivial FTP (tftp) client.
MiM ("Man in the Middle")


Share and enjoy!

First uploaded on 2002 November 08.
Updated on 2011 September 10.