chat lag anomaly |
|
Winter Phoenix
Voyager of Experiences
Join date: 15 Nov 2004
Posts: 683
|
10-14-2005 19:41
I've been experiencing waves of lag which come out of nowhere and nail me for a few minutes. Not unusual in and of itself, I've been dealing with lag events for a while, but this one seems to be a new version of lag event, at least for me. What happens is I will be chatting in a room, and suddenly the lag hits, and what I type does not appear on screen. Thirty seconds later it shows up. Then there's this one: I type something, nothing shows up on screen. A little later I can chat in real time, and in the middle of the conversation the old chat I typed a minute ago finally shows up. And everybody in the room says, "What are you talking about???" And I reply, "Nooooo, I typed that line two minutes ago!!" Where the hell did that sentence go for 45 seconds? Spooled up somewhere in a buffer or something? This has been messing with me on a daily basis for the past month. What's the scoop???
_____________________
~GIVEN FREE REIGN THE SYSTEM WILL TELL YOU,
WHAT TO DO, WHEN AND HOW TO DO IT, WHAT YOU CAN READ, VIEW, OR LISTEN TO, WHAT YOU CAN SAY, WHAT YOU CAN DO WITH YOUR OWN BODY, AND SUCK ALL YOUR MONEY OUT OF YOUR POCKET WHILE IT DOES THIS! QUESTION AUTHORITY!~ W.P |
Lee Linden
llBuildMonkey();
Join date: 31 Dec 1969
Posts: 743
|
10-17-2005 13:36
Try this page for some good tips with regard to latency:
http://secondlife.com/tiki/tiki-index.php?page=TechLagLatency |
Malachi Petunia
Gentle Miscreant
Join date: 21 Sep 2003
Posts: 3,414
|
10-17-2005 14:04
From the cited article:
If Packet Loss is a non-zero number, your network or ISP may be having issues.
Also in the cited article, you mention the use of tools like WinMTR to determine where the loss is occurring. As these high-latency or high-loss scenarios are often bursty, by the time you have set up instrumentation, the customer may not be able to get any statistics on transient latency or loss. On the other hand, somewhere at the LL data center there are between one and a handful of gateway machines, which likely keep continual statistics on outbound packets intentionally dropped due to excessive outbound congestion. Do you have those numbers at your disposal? If you do, would you be willing to publish them?
I agree, there are many steps along the internet from here to there (and back), and certainly latency or loss can happen at any router in the end-to-end connection. If you were to say that there are no outbound packets that are dropped due to LL egress congestion, that would rule out one very real possibility and would require querying only a few machines. Can you, will you assert that LL egress connections are never subject to intentional packet drops due to outbound link congestion?
Thanks for any information you can provide.
_____________________
|
Lee Linden
llBuildMonkey();
Join date: 31 Dec 1969
Posts: 743
|
10-17-2005 16:34
Like I said before, I'm an easy person to misread.
No, I won't tell you it's never our fault. I will not assert that we never lose a single packet of data. Whether I first attempt to diagnose a problem on the client's end or on ours depends on a few simple factors:
1) Whether I'm experiencing problems.
2) Whether other Lindens are experiencing problems.
3) Whether the people I'm speaking to inworld are experiencing problems.
4) How tightly the current grid op is gritting their teeth.
5) Whether or not the current grid op is running, and if so, how quickly. |
Malachi Petunia
Gentle Miscreant
Join date: 21 Sep 2003
Posts: 3,414
|
new data - game packet loss and pings incommensurable
10-17-2005 18:02
Lee, your response above says everything but the specific data that would aid us both in pinpointing the problems.
Upon logging in tonight at ~1700LST, I was seeing huge packet loss spikes while at a singular merchant in sim105 with my camera trained on one 0.5m sales container. There were seven agents in the sim at that time, none in my vicinity. Whenever I see the red bar, I lower my Preferences->Network->Bandwidth, in this case from my default (and typically lossless) 900kbps to 600kbps. The loss spikes were still occurring. Here is the loss data cut and pasted from Help->About over a 5 minute period:
You are at 257375.6, 254752.9, 31.9 in Deimos located at sim105.agni.lindenlab.com (66.150.245.41:13002)
CPU: 0.13 micron Intel Pentium 4 (2000 Mhz)
Memory: 1535 MB
OS Version: Microsoft Windows XP Service Pack 1 (Build 2600)
OpenGL Vendor: NVIDIA Corporation
OpenGL Renderer: GeForce 6800/AGP/SSE2
OpenGL Version: 2.0.0
Packets Lost: 637/34398 (1.9%)
Packets Lost: 655/35298 (1.9%)
Packets Lost: 681/36579 (1.9%)
Packets Lost: 693/37494 (1.8%)
Packets Lost: 758/41006 (1.8%)
Packets Lost: 808/43244 (1.9%)
I was not able to timestamp them, but as you can see, they were continually increasing. From the Ctrl-Shift-1 stats window, the loss was not constant, but would run from 0.0% to 11% or so over the sample period with Ping Sim and Ping User relatively stable at 100ms.
Per recommendation, I fired up WinMTR in the middle of my sampling. Here is the direct output of WinMTR (with the first few hops' names obscured for "privacy"):
CODE
There are a couple of notable things. First, the router on the far side of my link does not respond to pings (but obviously passes traffic). Second, as you likely know, routers typically respond to ICMP messages with a lower priority than data messages, as they need to use the "slow-path" in the router, which means being interpreted by the router controller instead of being simply forwarded; thus, the ping times for ICMP traffic have a higher variance than the data packets.
Most importantly, the lossiest of the hops was at your "front door" (pnap.net) and then sim105 itself. However, in the WinMTR session, which was temporally contained in the SL loss data shown above, your "front door" dropped a mere 5 ICMP packets in less time than sim105 lost over 800 datagrams.
Lastly, as the attached log file shows, it really didn't matter at all what my Bandwidth preference was set to, because at various times my bandwidth was throttled to 0.18Mbps, as at:
2005-10-17T23:39:26Z INFO Sending throttle settings, total BW 180000
I'm tired of doing your detective work for you, and your brush-off above is impolite. Could we have your egress statistics? I've ponied up more than my fair share, as I don't draw a LL paycheck.
On second thought, I don't need 'em, as during the course of writing this, I let WinMTR run again and the big money loser was your front door at border1.ge1-1-bbnet1.sfo002.pnap.net with 112 out of 1953 pings dropped (the second lossiest - with half the loss rate - was one upstream from you at sl-internap-140-0.sprintlink.net). I know where the problem is.
_____________________
|
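The `Packets Lost` lines in the post above are cumulative counters, so subtracting consecutive samples gives the loss inside each sampling interval. What follows is a minimal Python sketch of that arithmetic, using only the six samples quoted above. The function names are illustrative and are not part of any Second Life tool.

```python
import re

# The six cumulative samples quoted from Help->About in the post above.
SAMPLES = """\
Packets Lost: 637/34398 (1.9%)
Packets Lost: 655/35298 (1.9%)
Packets Lost: 681/36579 (1.9%)
Packets Lost: 693/37494 (1.8%)
Packets Lost: 758/41006 (1.8%)
Packets Lost: 808/43244 (1.9%)
"""


def parse_samples(text):
    """Return (lost, total) pairs from 'Packets Lost: a/b (x%)' lines."""
    pairs = re.findall(r"Packets Lost:\s*(\d+)/(\d+)", text)
    return [(int(lost), int(total)) for lost, total in pairs]


def interval_losses(samples):
    """Difference consecutive cumulative samples into per-interval (lost, sent)."""
    return [
        (lost2 - lost1, total2 - total1)
        for (lost1, total1), (lost2, total2) in zip(samples, samples[1:])
    ]


if __name__ == "__main__":
    for lost, sent in interval_losses(parse_samples(SAMPLES)):
        print(f"{lost}/{sent} lost in interval ({100 * lost / sent:.1f}%)")
```

Because the samples were not timestamped, the intervals cannot be turned into per-second rates; the sketch only separates loss in each interval from the running total.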
Winter Phoenix
Voyager of Experiences
Join date: 15 Nov 2004
Posts: 683
|
All I have to say to that is..
10-18-2005 01:16
Yow!!!!
_____________________
~GIVEN FREE REIGN THE SYSTEM WILL TELL YOU,
WHAT TO DO, WHEN AND HOW TO DO IT, WHAT YOU CAN READ, VIEW, OR LISTEN TO, WHAT YOU CAN SAY, WHAT YOU CAN DO WITH YOUR OWN BODY, AND SUCK ALL YOUR MONEY OUT OF YOUR POCKET WHILE IT DOES THIS! QUESTION AUTHORITY!~ W.P |
Malachi Petunia
Gentle Miscreant
Join date: 21 Sep 2003
Posts: 3,414
|
not surprised at all
10-18-2005 02:20
As expected, at 0200LST, with secondlife.com reporting 1229 active users (compared to ~3500 as of 1700LST yesterday) I am again able to run at 1000kbps with virtually no loss: Packets Lost: 80/213974 (0.0%).
As also expected, essentially no loss here:
CODE
The log file is too boring (read: no faults) to bother scrubbing for posting here. Dudes, you got like major congestion. As I noted in Hotline to Linden, this could explain much:
...As the protocols used by SL above the IP layer cause more traffic generation in the instance of IP level packet loss, this is a problem which will likely amplify itself in a positive feedback loop; that is, a little bit of loss due to output overages will induce more loss which in turn causes greater loss still. Given the bandwidth clamping mechanisms in the UDP retransmission methods that SL uses, one might also see wild hysteresis resulting from this overload. This could underlie the "lag-waves" that other players have reported of late where ping sim/user times suddenly spike from ~100ms nominal to 1 second to 8 seconds, freezing clients and causing other nastiness.
_____________________
|
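The positive feedback loop described in the post above can be illustrated with a toy model. This sketch does not model Second Life's actual UDP retransmission or throttling. It assumes a link with a fixed `capacity`, a steady `base_load`, and a hypothetical `retransmit_fraction` of dropped traffic that is re-sent in the next time step.

```python
def simulate(base_load, capacity, retransmit_fraction, steps=200):
    """Iterate a toy model of a congested link.

    Each step, traffic above `capacity` is dropped, and a fraction of the
    dropped traffic is re-sent on top of `base_load` in the next step.
    Returns the amount dropped in the final step.
    """
    offered = base_load
    dropped = 0.0
    for _ in range(steps):
        dropped = max(0.0, offered - capacity)
        offered = base_load + retransmit_fraction * dropped
    return dropped


def steady_state_drop(base_load, capacity, retransmit_fraction):
    """Closed-form fixed point of the model, valid for retransmit_fraction < 1."""
    return max(0.0, base_load - capacity) / (1.0 - retransmit_fraction)


if __name__ == "__main__":
    # Hypothetical numbers: load 5% over capacity, half of the drops re-sent.
    print(simulate(105.0, 100.0, 0.5))
    print(steady_state_drop(105.0, 100.0, 0.5))
```

In this model an overage of `base_load - capacity` settles at `(base_load - capacity) / (1 - retransmit_fraction)` dropped per step, so retransmission amplifies a small overage, and with a retransmit fraction of 1 or more the drops keep growing. A load below capacity produces no loss at all.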
Nathan Stewart
Registered User
Join date: 2 Feb 2005
Posts: 1,039
|
10-18-2005 08:06
This looks pretty conclusive via another host
CODE
_____________________
|
Malachi Petunia
Gentle Miscreant
Join date: 21 Sep 2003
Posts: 3,414
|
supporting data - ping loss at pnap load and hour independent
10-18-2005 13:27
I've found a relatively constant rate of loss coming from the Eastern US to LL regardless of hour or load:
CODE
sim105.agni.lindenlab.com 18-oct-05 0830LST Inworld agents: ~900
Do I have an agenda? Yep. I'm tired of real data being rejected in favor of what someone in customer support feels with his 5 point "belly test" enumerated above.
_____________________
|
Nathan Stewart
Registered User
Join date: 2 Feb 2005
Posts: 1,039
|
10-18-2005 13:35
Well, I thought I would test this with a lightly loaded grid, so here's a trace to the preview grid. Sims online: approx. 53; agents online: 10.
CODE
_____________________
|
Malachi Petunia
Gentle Miscreant
Join date: 21 Sep 2003
Posts: 3,414
|
I am Jack's total lack of surprise
10-18-2005 22:29
CODE
sim105.agni.lindenlab.com 18-oct-05 2200LST Inworld agents: ~2200
And to rule out the possibility of a fault in WinMTR, here is another tool querying the gateway router directly:
Ping statistics for border1.ge2-1-bbnet2.sfo002.pnap.net (63.251.63.65):
Packets: Sent = 7806, Received = 7561, Lost = 245 (3% loss),
Approximate round trip times in milli-seconds:
Minimum = 94ms, Maximum = 367ms, Average = 104ms
Alas, I have no data on how hard the "grid op" is gritting his or her teeth. I do know that neither Lee nor Hotline to Linden has deigned to acknowledge this despite the distinct possibility that it is inducing or exacerbating a plethora of problems with Second Life.
Be the change®
_____________________
|
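The ping summary in the post above can be checked for internal consistency and given a rough uncertainty range. This Python sketch parses the quoted summary line and computes a normal-approximation 95% interval for the loss fraction. The interval assumes independent probes, which bursty loss and low-priority ICMP handling both violate, so it should be read as a rough guide only. The function names are illustrative.

```python
import math
import re

# Summary line quoted in the post above.
SUMMARY = "Packets: Sent = 7806, Received = 7561, Lost = 245 (3% loss)"


def parse_ping_summary(line):
    """Extract (sent, received, lost) from a Windows-style ping summary line."""
    match = re.search(r"Sent = (\d+), Received = (\d+), Lost = (\d+)", line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received, lost = (int(group) for group in match.groups())
    return sent, received, lost


def loss_interval(lost, sent, z=1.96):
    """Return (p, low, high): loss fraction with a normal-approximation interval."""
    p = lost / sent
    half_width = z * math.sqrt(p * (1.0 - p) / sent)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    sent, received, lost = parse_ping_summary(SUMMARY)
    print("counts consistent:", sent - received == lost)
    p, low, high = loss_interval(lost, sent)
    print(f"loss {100 * p:.2f}% (approx. {100 * low:.2f}% to {100 * high:.2f}%)")
```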
Malachi Petunia
Gentle Miscreant
Join date: 21 Sep 2003
Posts: 3,414
|
10-22-2005 00:06
Please forgive me for having posted analysis that may have indicated that there was router congestion at Linden Lab's Data Center. I was clearly mistaken as was indicated in a recent Hotline to Linden response:
While I can't say we never have packet loss, we have massive amounts of bandwidth. If people are experiencing severe packet loss, the more likely culprit is their ISP.
_____________________
|
Waves Lightcloud
SexBall Safety Designer
Join date: 22 May 2004
Posts: 193
|
10-22-2005 00:26
Please forgive me for having posted analysis that may have indicated that there was router congestion at Linden Lab's Data Center. I was clearly mistaken as was indicated in a recent Hotline to Linden response:
I offer my humble apologies for suggesting that there could be a fault at Linden Lab rather than at the globally dispersed ISPs of dozens of players. I withdraw my hypothesis as it was clearly refuted and I shall not assert such again in the future.
Oh please refute, refute, Lee.
Let me put it in layman's terms: Tang was already smoking a cig, and I hadn't even got my pants off yet!! You want to talk packet loss...
The chat thing is really getting bad... My punch lines come onscreen after the party was over. Some may bless you, but overall, for the last 3 weeks the losses have been staggering for fluid game play.
-Waves |
Winter Phoenix
Voyager of Experiences
Join date: 15 Nov 2004
Posts: 683
|
ohhh yah
10-23-2005 14:40
I forgot this thread I started was about my delayed chat phenomenon. All the interesting charts were nice, but jeeeez. Got lost in there. What I was really interested in seeing was if it was just me or if a number of people were experiencing a new wave of chat latency that's developed over the past month. Waves Lightcloud just told me that there is at least one more poor soul experiencing this anomaly. Not enough to rule out a singular "it must just be me" situation, but it does show me I'm not the only one giving joke punchlines after everybody has left the party for the night.
_____________________
~GIVEN FREE REIGN THE SYSTEM WILL TELL YOU,
WHAT TO DO, WHEN AND HOW TO DO IT, WHAT YOU CAN READ, VIEW, OR LISTEN TO, WHAT YOU CAN SAY, WHAT YOU CAN DO WITH YOUR OWN BODY, AND SUCK ALL YOUR MONEY OUT OF YOUR POCKET WHILE IT DOES THIS! QUESTION AUTHORITY!~ W.P |
Philip Linden
Founder, Linden Lab
Join date: 18 Nov 2002
Posts: 428
|
10-29-2005 08:31
Sorry for not seeing this thread sooner - it certainly does look like there is loss at our routers. We will take a look on Monday and see what we can find.
As was mentioned by Robin and Lee, our network connectivity to Internap is at least in theory quite good - we have (I believe - of course our growth is changing this rapidly) 2 separate GigE connections to Internap, who are themselves multi-homed. However, this data looks very compelling in suggesting packet loss at our outgoing switches or routers. Let me check into it. Again, sorry we didn't see this earlier.
The 'chat lag' mentioned above, unfortunately, probably isn't caused by low levels of loss at routers. I say this because if there were a feedback spiral in retransmission that was exacerbating switch packet loss to the point of multi-second delays, a couple of things would happen that I don't think we are seeing here:
1. The loss and delay would simultaneously affect all or most users logged onto SL.
2. The max delay would be consistent with the switch buffer capacity - which is generally only a few seconds of flow.
Seeing chat come to you a minute later suggests that something is blocking the simulator process - something like a bug or a large autosave process (every 2 hours the sims must save all their data to disk, and this can really slow things down on some sims). Since the simulator has lower aggregate traffic and can use main memory to buffer messages, a one-minute delay on chat is pretty likely something that is totally stalling the simulator. We know we have several problems in this category that happen with content on some simulators, and we are working to fix them.
_____________________
Philip Linden
Chairman & Founder, Linden Lab blog: http://secondlife.blogs.com/philip |
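The buffer-capacity argument in the post above can be put into numbers: a full first-in-first-out buffer delays a packet by at most its size divided by the rate at which the link drains it. This Python sketch assumes a single link draining at the full 1 Gbit/s GigE rate mentioned in the post. The switches' real buffer sizes are not given in the thread, so the sketch only works backwards from the observed delay.

```python
GIGE_BITS_PER_SECOND = 1_000_000_000  # nominal GigE line rate (assumed drain rate)


def queue_delay_seconds(buffer_bytes, link_bits_per_second):
    """Worst-case delay through a full FIFO buffer drained at the link rate."""
    return buffer_bytes * 8 / link_bits_per_second


def buffer_bytes_for_delay(delay_seconds, link_bits_per_second):
    """Buffer size needed to hold `delay_seconds` worth of traffic at the link rate."""
    return delay_seconds * link_bits_per_second / 8


if __name__ == "__main__":
    needed = buffer_bytes_for_delay(60, GIGE_BITS_PER_SECOND)
    print(f"{needed / 1e9:.1f} GB of buffer for a one-minute queueing delay")
```

Under these assumptions a one-minute queueing delay would need about 7.5 GB of buffered traffic on one link, which is why a delay of that length points away from switch buffers and toward something stalling the simulator process.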
ZsuZsanna Raven
~:+: Supah Kitteh :+:~
Join date: 19 Dec 2004
Posts: 2,361
|
10-29-2005 08:38
I have noticed intermittent chat lag, but the last few days have been pretty bad.
_____________________
~Mewz!~
|
Khamon Fate
fategardens.net
Join date: 21 Nov 2003
Posts: 4,177
|
11-01-2005 10:16
This seems to be happening again. I notice that when I disconnect, the web pages and forums are also unreachable, however, any other sites I connect to display no latency whatsoever.
The delay seems to be centered in the Dallas and SF areas. Maybe it's just that time of day. I wonder if they've ever considered distributing the sims and asset... OH, I just got a new TOS agreement. They've been conducting a silent rollout across The Grid. All Hail The Central Grid. Never mind.
_____________________
Visit the Fate Gardens Website @ fategardens.net
|
Malachi Petunia
Gentle Miscreant
Join date: 21 Sep 2003
Posts: 3,414
|
11-04-2005 08:27
Sorry for not seeing this thread sooner - it certainly does look like there is loss at our routers. We will take a look on Monday and see what we can find. As was mentioned by Robin and Lee, our network connectivity to Internap is at least in theory quite good - we have (I believe - of course our growth is changing this rapidly) 2 separate GigE connections to Internap, who are themselves multi-homed. However, this data looks very compelling in suggesting packet loss at our outgoing switches or routers. Let me check into it. Again, sorry we didn't see this earlier.
_____________________
|
Philip Linden
Founder, Linden Lab
Join date: 18 Nov 2002
Posts: 428
|
11-08-2005 06:20
An update: In looking into this so far, we took a look at several days of recent activity by graphing the average reported session packet loss (packets lost are reported to us at the end of each SL user session) against peak hourly concurrency. That data suggests that there is nothing different happening at heavy concurrency - packet loss levels stay about the same regardless of how many people are connected to the grid.
We are still looking at two things:
1. Trying to reproduce the ping/MTR loss that Malachi is reporting at border1.ge2-1.bbnet.sfo002.pnap.net
2. Checking logs on our internal switches (those we own as opposed to Internap, etc.) to make sure that they are not reporting any sort of full buffers/packet loss.
_____________________
Philip Linden
Chairman & Founder, Linden Lab blog: http://secondlife.blogs.com/philip |