Does a single lost UDP packet freeze the stream?
|
|
Morgaine Dinova
Active Carbon Unit
Join date: 25 Aug 2004
Posts: 968
|
11-10-2004 14:46
TCP provides a sliding and overlapped data/ACK window into a byte-oriented data stream. LL's UDP-based implementation presumably employs something very similar so that single lost UDP packets don't freeze all further streaming until the mishap is resolved.
If we assume that LL does this already, then multiple UDP data and ACK packets will be in transit concurrently --- which is great. But if this is so, then the sliding window doesn't seem to be wide enough.
Does anyone know whether LL's client-server UDP protocol is already windowed, and if so, what the window size is, i.e. the maximum number of outstanding ACKs? And, can those with poor quality links change it if needed?
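For reference, the windowing being asked about just means capping the number of unACKed packets in flight, so one loss stalls only that slot rather than all sending. A minimal sketch of the idea in Python (the class, names, and window size are all hypothetical; this is not LL's actual protocol):

```python
class SlidingWindowSender:
    """Toy sender that allows up to `window` unACKed packets in flight."""

    def __init__(self, window):
        self.window = window     # max outstanding (unACKed) packets
        self.next_seq = 0        # next sequence number to send
        self.unacked = set()     # sequence numbers awaiting an ACK

    def can_send(self):
        return len(self.unacked) < self.window

    def send(self):
        assert self.can_send(), "window full: sending stalls here"
        seq = self.next_seq
        self.unacked.add(seq)
        self.next_seq += 1
        return seq

    def on_ack(self, seq):
        self.unacked.discard(seq)   # frees a slot, sliding the window forward


s = SlidingWindowSender(window=4)
for _ in range(4):
    s.send()
print(s.can_send())   # False: 4 packets outstanding, window is full
s.on_ack(0)
print(s.can_send())   # True: one ACK arrived, one slot freed
```

With a window of 1 this degenerates to stop-and-wait, which is exactly the "single lost packet freezes everything" behavior the question worries about; a wider window keeps data flowing around one outstanding loss.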
|
|
Morgaine Dinova
Active Carbon Unit
Join date: 25 Aug 2004
Posts: 968
|
03-11-2005 08:06
<sigh> 
|
|
si Money
The nice demon.
Join date: 21 May 2003
Posts: 477
|
03-11-2005 14:52
Actually, the SL implementation of lost UDP is far worse than that.
Yes, it does cause the entire stream to freeze and return to the point where the out-of-sequence packet occurred (bear in mind, there are many streams at any one time, so it doesn't freeze *everything*).
However, they also put in a lovely (broken) system which, on receiving an out-of-sequence packet, starts throttling your bandwidth on the assumption that packet loss == bandwidth deficiency. Which is a very bad assumption.
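The throttling described here sounds like a multiplicative-decrease rule keyed on loss. A toy sketch of that *assumed* behavior (the class, constants, and rates are invented for illustration; this is not LL's code):

```python
class LossThrottle:
    """Toy bandwidth throttle: cut the cap on loss, creep back up otherwise.

    This models the *assumed* behavior described above: any out-of-sequence
    packet is treated as congestion, even when the loss is really a link
    error, so the cap drops regardless of the true cause.
    """

    def __init__(self, cap_kbps=500, floor_kbps=50):
        self.cap = cap_kbps
        self.floor = floor_kbps

    def on_packet(self, out_of_sequence):
        if out_of_sequence:
            self.cap = max(self.floor, self.cap * 0.5)   # multiplicative decrease
        else:
            self.cap = self.cap + 5                      # slow additive recovery
        return self.cap
```

On a link with a constant error rate, this rule keeps cutting the cap even though more bandwidth is genuinely available, which is the failure mode being complained about.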
I have a thread on packet loss further down in this forum, which is supposedly being looked at by a few people at LL. I haven't heard any updates on it though.
_____________________
Like a soul without a mind In a body without a heart I'm missing every part -- Progress -- Catherine Omega: Yes, but lots of stuff isn't listed. "Making UI harder to use than ever" and "removing all the necessary status icons" things.... there's nothing like that in the release notes. 
|
|
Lee Linden
llBuildMonkey();
Join date: 31 Dec 1969
Posts: 743
|
03-15-2005 10:36
I'm poking around asking the developers... but my initial understanding is that the above description isn't quite how SL works. More to come soon, hopefully.
|
|
si Money
The nice demon.
Join date: 21 May 2003
Posts: 477
|
03-15-2005 18:44
Lee, I'd love a good discussion and explanation of the network and retraining mechanics of the SL streams. It would be very helpful in tuning firewalls and systems to better handle the UDP sensitivity of SL.
Most specifically, my understanding is that an out-of-order UDP packet in a particular stream causes that stream to stop and rewind to the point of misalignment, with no grace period for packets which enter the stream late but still arrive.
Secondly, the biggest problem I run into involves the bandwidth retraining of SL based upon the above. Watching the logs, you can see definitively (and it's documented behavior) that an out-of-order packet (which is treated as packet loss) causes SL's bandwidth throttle to drop automatically.
I do understand the reason behind this, and 95% of the time it's probably a valid assumption: packet loss and out-of-sequence packets usually happen because the user is doing something else with their network connection, and the packets are arriving out of sequence due to bandwidth saturation. In those cases, throttling is a good idea and will help or solve the problem.
The problem I see, though, is in the 5% of situations where the packet loss is an error condition, not a loss based upon bandwidth availability. In an error condition the drop rate will most likely stay at a fairly consistent percentage (statistically speaking), regardless of bandwidth, right up to the point of actual congestion. What this means is that lowering the bandwidth amplifies the problem: less data is sent in a given timeframe, which slows SL, while the loss continues at the same percentage (delaying communications through errors), doing double the harm to the SL stream as a whole.
Where this leads is not a badly designed retraining system, but an inflexible one. A simple solution would be a checkbox next to the bandwidth limit slider (marked NOT RECOMMENDED!) that turns the loss-based auto-limiter on or off, so that someone troubleshooting an error state can disable it.
As a solution to the first problem listed, I'd love to see a configurable delay on the stream receiver: a queue allowing a window of out-of-order receives before a packet is considered lost. That would be pretty difficult to implement, so I understand why it hasn't been. Another good option would be implementing RED in the queue, to allow more intelligent detection of packets lost to bandwidth congestion versus packets lost to an error state.
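The configurable delay queue proposed above is essentially a reorder buffer. A rough sketch, with the window size and all names invented for illustration:

```python
class ReorderBuffer:
    """Hold out-of-order packets briefly instead of declaring them lost.

    A missing packet is only given up on when the gap ahead of it grows to
    `window` later sequence numbers; until then, a late arrival can still
    fill the hole and be delivered in order.
    """

    def __init__(self, window=8):
        self.window = window
        self.next_seq = 0    # next sequence number we expect to deliver
        self.held = {}       # out-of-order packets waiting for the gap to fill

    def receive(self, seq, payload):
        delivered = []
        if seq >= self.next_seq:
            self.held[seq] = payload
        # Deliver everything that is now contiguous.
        while self.next_seq in self.held:
            delivered.append(self.held.pop(self.next_seq))
            self.next_seq += 1
        # Gap persisted past the window: give up on the missing packet.
        if self.held and max(self.held) - self.next_seq >= self.window:
            self.next_seq += 1   # skip the hole; report a real loss upstream here
            while self.next_seq in self.held:
                delivered.append(self.held.pop(self.next_seq))
                self.next_seq += 1
        return delivered


b = ReorderBuffer(window=8)
print(b.receive(0, "a"))   # ['a'] delivered immediately
print(b.receive(2, "c"))   # [] held: waiting for 1 to fill the gap
print(b.receive(1, "b"))   # ['b', 'c'] the late packet fills the hole
```

The trade-off is latency for smoothness: a wider window tolerates more reordering before rewinding, at the cost of holding delivery back while the gap is open.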
These are my thoughts, at least. I'd love to hear responses from those who develop the network protocol for SL as to why these would be bad ideas, and suggestions they have for client-side network optimization.
|
|
si Money
The nice demon.
Join date: 21 May 2003
Posts: 477
|
03-24-2005 15:45
bump for hopeful dev response.
|
|
Rathe Underthorn
Registered User
Join date: 14 May 2003
Posts: 383
|
03-24-2005 16:39
SL allows for out-of-order packets. The window is somewhat long, but not very long; I don't know its exact size, but I'll explain my understanding of the protocol below.
Every packet is assigned a sequence number to simulate a stream over UDP. There are three subgroups of packets based on their frequency: high, medium, and low. I believe frequency has some relation to priority as well.
Important packets are assigned a reliable flag; these are the only packets that get ACKed. ACKs are sent in a separate packet or sometimes appended to outgoing packets. If the server (or client) sent a reliable packet and didn't receive an ACK for it within a given period of time, it resends that packet with the same sequence number and a resent flag. After a given period of time and a couple of resends it will eventually drop the packet entirely. However, it never stops sending or receiving other packets in order, as it would have normally had a packet not been dropped; it simply inserts the resent packet into the stream out of order, and the receiver is capable of handling out-of-order packets.
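That ACK/resend cycle could be sketched roughly like this (the timeout, resend cap, and all names are invented for illustration; this is not the actual message-system code):

```python
RESEND_TIMEOUT = 1.0   # seconds before a reliable packet is resent (invented value)
MAX_RESENDS = 3        # resends before the packet is dropped for good (invented value)

class ReliableSender:
    """Toy model of the cycle described above: only 'reliable' packets are
    tracked and retried; after MAX_RESENDS they are dropped entirely."""

    def __init__(self):
        self.pending = {}   # seq -> (last_send_time, resend_count)

    def send(self, seq, now, reliable=True):
        if reliable:                      # unreliable packets are fire-and-forget
            self.pending[seq] = (now, 0)

    def on_ack(self, seq):
        self.pending.pop(seq, None)       # ACK received: stop tracking

    def tick(self, now):
        """Return (resent_seqs, dropped_seqs) for this timer pass."""
        resent, dropped = [], []
        for seq, (sent_at, count) in list(self.pending.items()):
            if now - sent_at < RESEND_TIMEOUT:
                continue
            if count >= MAX_RESENDS:
                dropped.append(seq)       # give up on this packet entirely
                del self.pending[seq]
            else:
                resent.append(seq)        # resend with same seq + RESENT flag
                self.pending[seq] = (now, count + 1)
        return resent, dropped
```

Note that nothing here blocks the rest of the stream: other packets keep flowing while a reliable one cycles through its resends, matching the description above.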
Since the client always maintains a stream to the data server, to the sim you're currently in, and to all surrounding sims within viewing distance, it has many separate streams going on at the same time and manages each one individually. Sometimes, if a stream is experiencing too much packet loss, it will drop the stream entirely and try reconnecting (usually in the case of sims), starting over. The problem comes when too many reliable packets go unACKed in a long enough period of time. Packets not flagged reliable can be happily dropped without any real impact, other than probably laggy environmental updates. I think sometimes, though, in a bad situation the high-frequency unreliable packets can really start pounding the server and client just when the reliable ones need more priority, until eventually a timeout is raised.
There's also a layer of virtual streams over these UDP streams, I think, for file transfers: textures, sounds, objects, compiled scripts, appearance texture layers, etc. The cool thing is that it sends out parts of a file in separate packets, which can be sent out of order and are reassembled in order in the virtual file system (cache). This is particularly useful for how Second Life starts to render things in real time as fast as the textures, or parts of the textures, come in, because it utilizes Kakadu for progressive, wavelet-based JP2 (JPEG2000) streams.
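The out-of-order reassembly described here boils down to keying file parts by index and reading off the contiguous prefix for progressive rendering. A sketch under those assumptions (all names invented; not the actual VFS code):

```python
class FileAssembler:
    """Reassemble a file whose parts arrive in separate, possibly
    out-of-order packets -- a sketch of the idea, not the real VFS."""

    def __init__(self, total_parts):
        self.total_parts = total_parts
        self.parts = {}                  # part index -> bytes

    def receive(self, index, data):
        self.parts[index] = data         # order of arrival doesn't matter

    @property
    def complete(self):
        return len(self.parts) == self.total_parts

    def contiguous_prefix(self):
        """Bytes usable so far -- e.g. for progressive texture rendering."""
        out = b""
        i = 0
        while i in self.parts:
            out += self.parts[i]
            i += 1
        return out


f = FileAssembler(total_parts=3)
f.receive(2, b"!!")                  # last part arrives first
f.receive(0, b"tex")
print(f.contiguous_prefix())         # b'tex' -- a renderable prefix already
f.receive(1, b"ture")
print(f.complete)                    # True
print(f.contiguous_prefix())         # b'texture!!'
```

The contiguous-prefix idea is what makes progressive JPEG2000 streams attractive here: even a partial file decodes to a usable low-detail image.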
I could ramble on for days more, and probably more accurately, but I'm at work and should get back to doing it.
|
|
si Money
The nice demon.
Join date: 21 May 2003
Posts: 477
|
03-25-2005 08:28
Interesting -- if the streams act like this, how can we apply stream behavior to 'rubberbanding'?
If the stream is capable of handling an out-of-order packet, you shouldn't see that; instead you should see the normal "warping" effect you see in most MMOGs.
In the stream of movement updates you should, in effect, see 0, 1, 2, 4, 5, 6, 7, 8, 9 and still end up at 9, with 3 retransmitted later, causing a warp between updates 2 and 4.
In SL, we see 0, 1, 2, 4, 5, 6, 7, 8, 9 and then we jump back to 3, with all updates beyond that point dropped. The stream continues to transmit packets and updates keep flowing, but as soon as both sides realize a packet is "lost", you jump back to the original point.
Now, granted, this is only an example based upon movement updates, but the "rubberbanding" effect is visible both to the person experiencing it and to those viewing that person.
Am I wrong on this?
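For illustration, the two receiver policies contrasted above can be run over that exact arrival sequence (a toy model where a movement update's sequence number doubles as the position it moves you to):

```python
def apply_updates(arrivals, policy):
    """Toy model of the two receiver policies described above.

    arrivals: movement updates in arrival order (seq == target position).
    'warp':   apply each newer update immediately; a late retransmission
              for an already-passed gap is simply discarded (you see a
              visual jump across the gap, then carry on).
    'rewind': an old retransmitted seq yanks you back to it, discarding
              everything applied beyond that point -- rubberbanding.
    """
    position = None
    for seq in arrivals:
        if policy == "warp":
            if position is None or seq > position:
                position = seq          # late gap-filler ignored
        elif policy == "rewind":
            position = seq              # even an old seq pulls you back
    return position


arrivals = [0, 1, 2, 4, 5, 6, 7, 8, 9, 3]   # packet 3 retransmitted last
print(apply_updates(arrivals, "warp"))      # 9: warp past the gap, end at 9
print(apply_updates(arrivals, "rewind"))    # 3: snap back to the retransmit
```

Same packets, same loss; only the receiver's policy for late retransmissions differs, which is the crux of the question being asked.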
|
|
Merv Tank
Registered User
Join date: 6 Dec 2005
Posts: 5
|
My throttle observations
01-17-2006 20:56
If the client gets "too busy" trying to do too much work per frame, you will see artificially high ping times in the client. These are not based on true network RTT, but on the client not scheduling its time correctly and massively (by a factor of 10-100) overestimating the true sim ping time. This causes the client to see imaginary packet loss/delay that isn't there. It responds by (incorrectly) throttling down the connection streams. I have quite a good connection to LL and the world, and I watch it throttle up and down between 100K/sec and 900K/sec, bouncing back and forth with a period of less than a minute. This usually tracks with jumping/rubberbanding and artificially high sim ping times.
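A crude model of the suspected bug: if ping replies are only noticed once per frame, the frame stall gets counted as network RTT, which can easily produce an overestimate of the magnitude described (all numbers invented):

```python
def measured_ping(true_rtt_ms, frame_time_ms):
    """Toy model of the suspected scheduling bug: a ping reply that lands
    mid-frame isn't seen until the frame ends, so the client's stall time
    is counted as if it were network round-trip time."""
    return true_rtt_ms + frame_time_ms


# A 40 ms link looks roughly like a 40 ms link at ~60 fps...
print(measured_ping(40, 16))    # 56
# ...but like a half-second link when the client stalls for 500 ms,
# even though the network itself hasn't changed at all.
print(measured_ping(40, 500))   # 540
```

If the bandwidth throttle keys off this inflated figure, it would cut throughput in response to purely client-side stalls, matching the oscillation observed.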
My guess is that bad ping-time estimation is driving poor bandwidth estimation. How about providing a preview/beta build that runs over a SOCKS/TCP proxy at LL, to see if that fixes my grey-textures problem?
Thanks, Merv.
|
|
Introvert Petunia
over 2 billion posts
Join date: 11 Sep 2004
Posts: 2,065
|
01-17-2006 21:20
They've been beaten over the head repeatedly about this topic. This was the last quasi-official word I've seen on the subject. I don't think that anyone there actually knows how the protocol works or is supposed to work. But as an attempt at rewriting TCP from scratch, they've pretty much failed their senior project. Good luck with your pursuit.
|
|
Argent Stonecutter
Emergency Mustelid
Join date: 20 Sep 2005
Posts: 20,263
|
01-19-2006 10:55
From: si Money
Yes, it does cause the entire stream to freeze and return to the point where the out-of-sequence packet occurred (bear in mind, there are many streams at any one time, so it doesn't freeze *everything*).

They seem to have changed this... if you watch the traffic in the debug window you'll see messages about out-of-order packets after the lost packet, followed by a VERY rapid bunch of "replaying" messages when the lost packet gets resent... far too fast for them to be resending all those missing packets. They're clearly saving the out-of-order packets in a local buffer somewhere so they can be used when the lost packet shows up.
|