View Full Version : some basic network theory


vonaldinjo
05-19-2003, 03:47 PM
I think this is almost like asking the a,b,c... of computing.

I have two Linux computers connected by gigabit Ethernet, sending data back and forth between the two. If I send a message of 100000000 (1e8) characters from client to server using a socket in stream mode, the server seems to receive it. The server is supposed to send it back, but it does not. Who is swallowing the message? Is it the network, I mean the Ethernet? It has enough capacity to transmit this message, doesn't it? Is it then the computing power of the server that is failing? Or what else is it?

RobSeace
05-19-2003, 07:04 PM
Well, you'll need to provide some more details on what exactly
is happening... If you're dealing with a TCP socket, then the
data shouldn't be getting dropped anywhere; or, rather, I should
say, if it is, you shouldn't notice it, since TCP will resend as
needed to get the data there, if it's at all possible for it to get
there at all... You say the server receives the data, but doesn't
send it back? But, what do you mean by that, exactly? Does
the send()/write() fail on the server? Or, does it SEEM to send
the data fine, but the data just sits in the server's send queue,
going nowhere? Or, does the data seem to clear the send queue,
but never arrive in the receive queue of the client machine? (That
last one should be a complete impossibility, BTW, with TCP...)
You should be able to see the amount of send/receive queued
data with "netstat" on each end...
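
As a rough illustration of the queue check Rob describes: on Linux the same per-socket queue counters are also exposed in /proc/net/tcp. The sketch below assumes the usual layout of that file, where the fifth column holds tx_queue:rx_queue in hex. The function name parse_proc_net_tcp and the dictionary keys are invented for this example and are not part of any tool mentioned in the thread.

```python
# Sketch only: extract send/receive queue sizes from /proc/net/tcp text (Linux).
# parse_proc_net_tcp() and its dict keys are hypothetical names for this example.

def parse_proc_net_tcp(text):
    """Parse the text of /proc/net/tcp into a list of dicts, one per socket."""
    entries = []
    for line in text.splitlines()[1:]:          # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue                            # skip blank or truncated lines
        # Addresses look like "0100007F:1F40"; the part after ':' is the port in hex.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        # Fifth column is "tx_queue:rx_queue", both in hex.
        tx_hex, rx_hex = fields[4].split(":")
        entries.append({
            "local_port": local_port,
            "remote_port": remote_port,
            "send_q": int(tx_hex, 16),
            "recv_q": int(rx_hex, 16),
        })
    return entries
```

To use it, read /proc/net/tcp on each machine while the transfer appears stuck, pass the text to the function, and look at the entry for your connection's port. A large send_q on one side together with a large recv_q on the other would suggest the receiving program is not reading.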

mlampkin
05-19-2003, 09:33 PM
If you're waiting for the receiving end to get all the data (all 100 megs) prior to it echoing it back, that's probably where the problem is occurring...

If you are using straight (system call) sends, you may be missing an error such as no buffer space left... I would double-check for that...

If you are using a higher-level language and the system / implementation's stream buffering instead of pure sends, that could also be an issue... since data may be getting cached at the app level and not automatically flushed out to the TCP stack when room becomes available in the connection buffer...

...or any number of other things... but those are the two that pop immediately to mind... also see Rob's message (and questions) above...
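
To make the point about checking each send concrete, here is a minimal sketch in Python, whose socket module wraps the same system calls. It assumes a connected stream socket; send_all is a name made up for this example. A single send() may accept only part of the data, so the return value has to be checked and the remainder sent again. In Python, errors such as a lack of buffer space surface as OSError exceptions rather than a -1 return value.

```python
import socket


def send_all(sock, data):
    """Send every byte of data, looping because send() may take only part of it."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # send() returns how many bytes the kernel actually accepted.
        sent = sock.send(view[total:])
        if sent == 0:
            raise ConnectionError("socket connection broken")
        total += sent
    return total
```

Python's built-in sock.sendall() does the same job; the loop is spelled out here only to show where a partial send would otherwise go unnoticed.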


Michael

Loco
05-19-2003, 10:53 PM
One thing you are not short of is resources.

You've got many ways to check your system for the problems. I would start by using the following:

Check the frames sent/received at the OS level
Check the data sent/received at the application level by adding debug statements to the programs
Check the system calls of the application
Check the packets traversing the network


Checking frames/packets sent/recvd
For this, the easiest way would be to use ifconfig

[15:52:27][root@localhost:~]$ ifconfig
eth0 Link encap:Ethernet HWaddr 00:90:27:4F:0A:36
inet addr:1.2.3.4 Bcast:1.2.3.X Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:105667 errors:0 dropped:0 overruns:0 frame:5096
TX packets:108241 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
Interrupt:11 Base address:0xfcc0

eth1 Link encap:Ethernet HWaddr 00:60:97:70:AC:3B
inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:101962 errors:0 dropped:0 overruns:0 frame:0
TX packets:102513 errors:0 dropped:0 overruns:0 carrier:0
collisions:2442 txqueuelen:100
Interrupt:10 Base address:0xfca0

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:6853 errors:0 dropped:0 overruns:0 frame:0
TX packets:6853 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0

Here you can see that for each interface in my machine I get simple statistics of packets sent and received. Check these values before and after sending the data on both machines, and you should see some differences (probably big ones, because of the data size).
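
The before/after comparison can also be scripted. The sketch below assumes Linux, where the packet counters shown by ifconfig are also available in /proc/net/dev (two header lines, then one line per interface with receive bytes and packets first and transmit bytes and packets in the ninth and tenth columns). The function names are invented for this example.

```python
# Sketch only: compare packet counters taken before and after a transfer.
# parse_proc_net_dev() and packet_delta() are hypothetical helper names.

def parse_proc_net_dev(text):
    """Return {interface: (rx_packets, tx_packets)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:          # skip the two header lines
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        fields = rest.split()
        if len(fields) < 10:
            continue
        # fields[1] is received packets, fields[9] is transmitted packets.
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packet_delta(before, after):
    """Per-interface (rx, tx) packet differences between two snapshots."""
    return {
        name: (after[name][0] - before[name][0], after[name][1] - before[name][1])
        for name in after
        if name in before
    }
```

Take one snapshot of /proc/net/dev before sending, one after, and pass both parsed results to packet_delta. If the counters on the interface carrying the connection barely move, the data never left the machine.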

Check that the server is up and listening by using netstat --inet -pnl. This should show you the programs and the ports they are listening on. If you cannot find your server program in there, chances are something is wrong in your server code. (I assume you are using Linux; other UNIX systems do not have the --inet or the -p parameters, so you'll have to find which ones correspond to these, if any.)

If you are using TCP sockets, after connecting, use netstat --inet -pn on both machines to see if the connection is successful. You should see an entry on both for the same connection (the same addresses and ports). Pay special attention to the Recv-Q and Send-Q fields of the output.

Adding debug lines to your code
The idea is simple: add lines of code that print informative messages, so you can follow where in your code the application is executing and what state it is in. For example, output like this from the server should help you pinpoint the error. The following is the expected output:

Server started...
Listening on port XXXX
New connection accepted from 1.2.3.4, file descriptor assigned 34
select() returned there is data available
fd 34 has data available
Read 1500 bytes of data
Echoing back to sender
send() returned 1500
select() returned there is data available
fd 34 has data available
...

However, suppose you see something like this instead:

Server started...
Listening on port XXXX
New connection accepted from 1.2.3.4, file descriptor assigned 34
select() returned there is data available
fd 34 has data available
Read 1500 bytes of data
select() returned there is data available
fd 34 has data available
...

You can see that the "echoing" and "send()" lines are now missing. Something in your code did not allow this part to run. So, now you can go and check it...
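
A minimal sketch of what such debug statements could look like, written in Python for brevity and assuming an already accepted stream connection. The function name echo_once, the 1500-byte buffer and the 5-second timeout are choices made for this example, not taken from the original poster's program.

```python
import logging
import select
import socket

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("echo")


def echo_once(conn, bufsize=1500, timeout=5.0):
    """Wait for data on conn, read one chunk and echo it back, logging each step."""
    readable, _, _ = select.select([conn], [], [], timeout)
    if not readable:
        log.debug("select() timed out, no data available")
        return 0
    log.debug("fd %d has data available", conn.fileno())
    data = conn.recv(bufsize)
    log.debug("Read %d bytes of data", len(data))
    if not data:
        log.debug("Peer closed the connection")
        return 0
    log.debug("Echoing back to sender")
    conn.sendall(data)                      # loops internally until all bytes are sent
    log.debug("sendall() completed for %d bytes", len(data))
    return len(data)
```

Calling echo_once in a loop for each connection gives a trace similar to the expected output above. If the "Echoing" line never shows up, the problem lies between the read and the send.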

I know this is pretty basic, I hope you won't take it personally!!!

Check system calls
One of the best tools I have seen in Linux is "strace". I know the other UNIX systems have tools like this, so I leave it to you to check that for your particular flavor.

Usually, you don't have access to the code that is not working. And usually, the error is not in the code but in the configuration of either the application, or the whole system.

In these cases the strace utility is very handy. It is used to check the execution of system calls from an application.
Let's suppose you run an echo daemon, and it doesn't appear to work. You cannot put debug code on it, and you just can't figure out what the problem is.

You run strace against the offending program and you get something like this in the output:
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 0
setsockopt(0, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(0, {sin_family=AF_INET, sin_port=htons(7), sin_addr=inet_addr("0.0.0.0")}}, 16) = 0
And you finally understand that this application by default binds to TCP port 7, but the client application tries to connect to port 8000...

This is just a simple example.

Network packet???
Finally, the best way to see what is going on with packets that seem to be lost is to use a packet sniffer. I use tcpdump for its flexibility and ease of use.

vonaldinjo
05-20-2003, 08:43 AM
Wow! That's a lot of info (& questions!!). Thank you guys for the advice and the enthusiasm.

vonaldinjo
05-20-2003, 01:03 PM
From basic debugging with print statements, it seems there was a delay before the receiving side got to recv() while the sending side was already at send(). It took me a while to realise that, according to TCP, the sender does not consider the data sent until the receiver recv()s all of it, so the sender seemed stuck. Am I wrong? Anyway, once the sender and receiver were co-ordinated, my message was no longer lost.

according to mlampkin,
"If you're waiting for the recving end to get all the data (all 100 megs) prior to it echoing it back, thats probably where the problem is occurring... "

This is how I am doing it now. How else can I do it?

RobSeace
05-20-2003, 01:26 PM
If the sending side was getting blocked in send(), then I would
guess you simply filled up both the receive buffer on the receiving
side AND the send buffer on the sending side... So, yes, until
the receiver started reading the data and clearing out the buffers,
no more could be sent...
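
One common way around this kind of stall is for the client to read the echoed data while it is still sending, so neither side's buffers stay full. The following is a sketch under assumptions: a connected stream socket, a peer that echoes data back as it arrives, and a helper name (send_and_receive) invented for this example. It uses select() on a non-blocking socket to interleave the two directions, and it leaves the socket in non-blocking mode.

```python
import select
import socket


def send_and_receive(sock, payload, chunk=65536):
    """Send payload and collect the same number of echoed bytes, interleaving both."""
    sock.setblocking(False)
    view = memoryview(payload)
    sent = 0
    received = bytearray()
    while len(received) < len(view):
        # Only ask about writability while there is still data left to send.
        want_write = [sock] if sent < len(view) else []
        readable, writable, _ = select.select([sock], want_write, [])
        if readable:
            try:
                data = sock.recv(chunk)
            except BlockingIOError:
                pass                        # nothing there after all, try again
            else:
                if not data:
                    raise ConnectionError("peer closed before echoing everything")
                received += data
        if writable:
            try:
                sent += sock.send(view[sent:sent + chunk])
            except BlockingIOError:
                pass                        # send buffer filled up again, retry later
    return bytes(received)
```

The server side needs the matching change: echo each chunk as it is read instead of waiting for the whole 100 MB to arrive first.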