Welcome to the Second Life Forums Archive

Sim hardware differences getting excessive

eltee Statosky
Luskie
Join date: 23 Sep 2003
Posts: 1,258
10-22-2004 22:54
I understand that as SL grows, they can't replace every box during every refresh. But some recent circumstances have brought to light just how disparate the oldest are getting from the newest now.

After the recent emergency downtime, Lusk came back up at more than double its previous framerate, from about 800 to 1600+. Essentially, the time the server was spending on processing had been cut in half. At the time we just cheered our good luck. One thing we noticed, though, was that the sim had moved from a '250'-block server (I think sim263.agni.lindenlab.com) to a '450'-block server (sim451.agni.lindenlab.com).

Wow, that's a fast sim, we thought, and it was... not only did it post a high number, it also held that number longer under heavier usage.

Well, an accidental crash bug was found... and Lusk went down.

And when it came back, it came back at under 200 fps, with flickering time dilation. We figured the thing we had been playing with when we exposed the crash bug was causing the problem, so we quickly deleted it. But the sim stayed *LOW*. The sim was in the same state as before, yet now it sat at 150 fps with time dilation; it was creaking with 3-4 people in it. Forget ever hosting an event, or even just having some friends come over. Just 5 minutes before, with 1500 fps of headroom, all that and more had been possible for us. In fact, it's not even just the overall number, it's how badly it scales...

When I looked at the actual IP, it turned out it was now sim16.agni.lindenlab.com. In these very oldest sims, EVERYTHING 'costs' more, lag-wise: every agent in the sim takes more time, every running script has a higher overhead. You get the idea.

Again, I understand not every sim can be replaced in every refresh...

But these very oldest sims have got to go. When there is a *TEN TIMES* performance delta, that's just untenable. I would say even a 2x performance gap from one box to the next is probably too much, since it can make a sim that is busy but handling itself well suddenly start chunking a bit...
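A quick way to see why the drop hurts so much: if sim FPS is read as the reciprocal of the time the server spends on one simulation frame (an assumption for illustration; the post only says processing time was "essentially" halved), the figures quoted in this post become per-frame times like these:

```latex
\begin{aligned}
t_{\text{frame}} &= \frac{1}{f_{\text{sim}}} \\
\tfrac{1}{800}\,\text{s} &= 1.25\ \text{ms}, & \tfrac{1}{1600}\,\text{s} &= 0.625\ \text{ms} \\
\tfrac{1}{1500}\,\text{s} &\approx 0.67\ \text{ms}, & \tfrac{1}{150}\,\text{s} &\approx 6.67\ \text{ms}
\end{aligned}
```

On that reading, going from roughly 1500 to 150 fps means each frame costs about ten times as long, which is the "ten times" delta being described.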

A 10x performance gap is like trying to run the Indy 500 in a Honda Civic, horsepower-wise.

That could literally break a budding SL business if its sim suddenly went from 800 fps to 80.

And I understand there is a certain 'perception' that 200 is enough, and in some ways it is: a sim that sits at 200 under load will still be mostly usable. The problem is it has *ZERO* headroom for events, new projects, etc.

*Especially* when every single agent and every single script in the sim is apparently taking 10x as long to process as it would on a newer, better box.
_____________________
wash, rinse, repeat
Chromal Brodsky
ExperimentalMetaphysicist
Join date: 24 Feb 2004
Posts: 243
10-23-2004 00:19
The original specs for the SL servers were PIII systems, as I understand it. I don't know the specifics, but here's some conjecture for a relatively top-of-the-line PIII--

PIII 1.13 GHz (512 kB L2)
133 MHz FSB
PC133 RAM


Now, a typical modern system:

P4 2.8C (512 kB L2)
800 MHz FSB
DDR400 RAM (dual-channel on an 865/875 northbridge)

This is more or less a jump from mid-level 2002 to mid-level 2004 tech. Worth noting, a P4 2.0 GHz is probably about equivalent to, or even a little *slower* than, a PIII 1.13 GHz with 512 kB L2, but beyond that we do begin to see some speed benefit from *some* of the architectural changes the P4 made relative to the PPro-derived core of the PIII.

Then we factor in things like FSB speed, which has jumped from 133 MHz to 800 MHz. That has a huge impact, as does the increase in RAM speed from PC133 to dual-channel DDR400.

You'd be hard-pressed to buy a new PIII system today, and so, unless Linden Lab is acquiring used hardware, it's pretty much inevitable that their new sim server purchases, speaking plainly, run circles around the original sim hardware.

I was present for the apparent rebooting of the Lusk sim on a machine whose performance seemed more like one of the old-school systems. Qualitatively, even with only four or five connected agents, there could be no doubt that the user experience was diminished. It would appear that when a sim crashes, it is assigned in a semi-random manner to hardware in the standby pool, because after multiple crashes, Lusk moved from a 250 Sim FPS platform to an 800 Sim FPS platform. Alas, the 1800 Sim FPS platform didn't resurface.

Pragmatically, utilizing existing hardware is an understandable goal. Unfortunately, this is also leading to some substantial inequality in user experience. While making the hardware a sim runs on a random function of first-come first-served may sound statistically fair, it is ultimately damaging to the experience of participating. It can literally make the difference between the experience being laggy IRC graphical chat and being the new virtual reality medium Linden's vision is working toward.

I guess, from there, there are a variety of ways to frame this issue. Is it a 'fairness' issue? A 'Linden needs to decommission old hardware' issue? Maybe it's more of an optimization issue, in that sims with greater aggregate dwell generally need more resources. However it's framed, I agree with Eltee: the impact of the disparity between sim hardware has become difficult to ignore. Being on a slow sim is akin to being under siege. It's not just annoying; it's actively unpleasant and damaging to those who are established there. Please address this concern soon and candidly.
Morgaine Dinova
Active Carbon Unit
Join date: 25 Aug 2004
Posts: 968
10-23-2004 02:44
From: Chromal Brodsky
Unfortunately, this is also leading to some substantial inequality in user experience. While making the hardware a sim runs on a random function of first-come first-served may sound statistically fair, it is ultimately damaging to the experience of participating. It can literally make the difference between the experience being laggy IRC graphical chat and being the new virtual reality medium Linden's vision is working toward.
Indeed, Chromal.

Sadly, this unfairness is an inherent property of the statically tiled architecture of the present grid, which requires big-bang upgrade of all machines simultaneously for fairness, and that is clearly untenable. As long as a particular machine is associated with a particular zone, it's inevitable that any obsolescence translates directly into slowdown for that zone, and any new hardware added elsewhere in the grid cannot make up for it.

Our big scalability thread is very strongly concerned with breaking this static association between servers and zones. It provides fairness as a side-effect of achieving full scalability, since individual server power is distributed across all zones. It also allows older boxes to remain in the pool and hence to continue adding to overall server capabilities, when otherwise they would have to be taken out of service under static assignment or else the unfairness would be too great. In the hard commercial world where money doesn't grow on trees, this is invaluable.
_____________________
-- General Mousebutton API, proposal for interactive gaming
-- Mouselook camera continuity, basic UI camera improvements
Michi Lumin
Sharp and Pointy
Join date: 14 Oct 2003
Posts: 1,793
10-23-2004 21:04
I don't think that we're arguing that a mass upgrade is reasonable, or expected, Morgaine. I don't think anyone expects LL to bring ALL hardware up to parity with the new sims.

But the older sims (<30.agni?) are SO much lower, SO problematic, that, well, shouldn't they at least be replaced at this point?

Looking at this, it seems (and it's certainly never been admitted to us) that quite possibly LL has NEVER replaced a sim on the grid with newer hardware! (Just added more and more.)

Dropping from 1800 sfps to *200* sfps with NO LOAD is just NOT acceptable in -today's SL-.

Back when 10-20 people would be IN WORLD at a time, maybe this was OK. Back when building and scripting weren't as prevalent...

But before the 'dynamic hardware allocation' that went in with 1.4 happened, Lusk was NO HIGHER than 200 simfps EVER, for over a year.

We couldn't hold events. We couldn't have more than three people hanging out, really. Forget building. The 'real estate' that we bought in SL was worthless.

After 1.4? Sometimes we have good weeks and get a fast sim. Other weeks we get mediocre horsepower and have to move events elsewhere. Sometimes we hit that lucky early bank of 16, and we're back to v1.0 again, with time dilation shooting off the chart.

Linden Lab: What makes me the most angry is this: I am TIRED of people coming onto our land and making smart-ass remarks to ME, disparaging comments to US, about Lusk. Since time immemorial, "Lusk is Laggy" has been a meme.

We've gotten into fights with neighbors over who gets to use more scripts, who's causing the sim FPS to drop below 30, etc. That really doesn't need to happen.

People are GOING TO START TO NOTICE the disparity between sims, LL, and you're going to see conflict over people not being able to USE the land that they pay you folks for.

A while back we were told that the older, earlier hardware would be repurposed for a test grid. All signs point to the fact that this has not happened.

I have to wonder, is LL's plan to continue running the old hardware until it simply fails?
Korg Stygian
Curmudgeon Extraordinaire
Join date: 3 Jun 2004
Posts: 1,105
10-23-2004 22:18
Since the resident consensus seems to be that the void sims are a good idea, I was wondering if there is any serious consideration of simply repurposing/reassigning the oldest hardware to these sims? Take the oldest hardware out of the reallocation loop completely and use it to run the void sims.

Since well over 90% of the time that I look, no one is in any of the void sims (which is when I head over to them), is there any reason that even the oldest hardware currently in use couldn't run the "4-void sim" setup? Note that I said "the times when I look." I could be completely wrong here.

But I think the max number of people I have ever counted in the 4 void sims (7, 8, 12 and 20) that I am thinking of was 6 total. No buildings. 6 vehicles. 6 av's. I would think that any of the hardware used even in early SL should be able to run what is essentially a blank sim with a couple of av's and a few vehicles.

Of course... I am probably wrong and I am sure someone will be happy to educate me about just how and why I am wrong here.

Please do.
eltee Statosky
Luskie
Join date: 23 Sep 2003
Posts: 1,258
10-24-2004 01:08
From: Korg Stygian
I would think that any of the hardware used even in early SL should be able to run what is essentially a blank sim with a couple of av's and a few vehicles.

Of course... I am probably wrong and I am sure someone will be happy to educate me about just how and why I am wrong here.

Please do.


Actually, that's a pretty damn good idea. Even if the old hardware can't run FOUR voids, which it MORE than likely can, just using it to fill in those gaps three or two at a time, rather than wasting whole new servers on them, would probably work great. It'd keep LL from having to throw away old hardware completely, and so long as the voids were kept on a separate crash-recovery cycle from the normal sims, it'd keep a crashed sim from coming back up on one of the dreaded originals.

Also, it might encourage LL to lay out the grid with more gaps in mind: since those could be filled rather easily from the quite ample bank of older machines, they could place more buffers and end up with fewer sims that have eight 'active' neighbors.
Moleculor Satyr
Fireflies!
Join date: 5 Jan 2004
Posts: 2,650
10-24-2004 02:06
While I agree that void sims are a generally good thing, and the older boxes should be given the job of running them, if LL ever does any such thing I hope they get permission from the neighboring sims before shoving a void sim in. Some places enjoy their "on edge" feel without actually being "on the edge".
Meiyo Sojourner
Barren Land Hater
Join date: 17 Jul 2004
Posts: 144
10-24-2004 04:00
I could be very mistaken but I SWEAR I read or heard somewhere that older machines were either already being cycled out or they are planning on cycling them out to put into the preview grid. Maybe this was in a town hall discussion... I dunno. G'nite.

-Meiyo
_____________________
I was just pondering the immortal words of Socrates when he said...
"I drank what??"
Morgaine Dinova
Active Carbon Unit
Join date: 25 Aug 2004
Posts: 968
10-24-2004 04:03
From: Michi Lumin
But the older sims (<30.agni?) are SO much lower, SO problematic, that, well, shouldn't they at least be replaced at this point?

Looking at this, it seems (and it's certainly never been admitted to us) that quite possibly LL has NEVER replaced a sim on the grid with newer hardware! (Just added more and more.)
...
I have to wonder, is LL's plan to continue running the old hardware until it simply fails?
It could be so. It's probably just a financial decision, since they have very limited options.

In a statically tiled grid, where any given zone can only be served by one machine, what alternatives are there after all? Very few. A grid server is either assigned to a zone or it isn't adding its power to the grid at all; there's no halfway house. Which means that your call for older servers to be replaced is equivalent to asking LL to make the business decision of writing off their investment in that old server right now. And that old server probably cost a lot more than the current ones.

It can be fudged a bit by hand, of course, by shuffling servers around for a spot of coarse manual load balancing, but that can never be fair. And they can stick old boxes out in the water and even have them time-share across more than one water zone, as they have been doing (it's always been assumed, I think, that the oldest servers were used for that, but Korg's question is a good one: maybe they haven't been). And that's it; there are no other options. And once all the water zones have been given old boxes, what then?

Well, none of this would be a problem in a dynamically assigned architecture like the one we've been discussing in the scalability thread. The grid then becomes virtualized, and all servers contribute whatever power they have to all zones, and only where processing is required. For example, a water zone would take zero server power out of the pool while there are no avs or scripted objects in it.

Although the fairness is just a side-effect of having a dynamic architecture, it's worth noting how LL's options would suddenly open up. Instead of coming under customer pressure to write off older kit, LL would be able to leave it in the pool until the end of its days, a sure way to recoup their investment. And new boxes could be added to the pool at any time, in step with customer numbers, without requiring any service outages, all transparently to users.

Such benefits are so colossal from a business perspective that sometimes they alone drive the change to a dynamic architecture, rather than the need for scalability. I've seen this before, where having a permanent 24/7 callout team to fix failed servers was regarded as either too draining on staff or too costly. In our case it's probably the other way around, because Philip clearly wants large customer numbers, but either way, those currently suffering from server power disparity would come out winning.
Kex Godel
Master Slacker
Join date: 14 Nov 2003
Posts: 869
10-24-2004 05:14
Suggestions
  1. Use a few older sims to make more sandboxes, make them island sandboxes so that they get the most speed possible without neighbor sims to negotiate with.

  2. Sell the older hardware as island sims, at a significantly discounted price. Say $100/month, $200 setup. The setup fee should definitely be discounted since you'll be selling a server you already got extensive use out of.

  3. Make a new residential continent from the older hardware. Set the sim-wide prim limit to the old 10,000 standard, but also give a land allocation bonus, so that for example, owning 1024m of land only costs 512m of your account's land allocation. This would create a somewhat peaceful atmosphere for a more "rural" (lower-density) development.

  4. [Michi's idea] Create a permanent preview/test grid. Slower hardware should be a better testing platform since you won't need as many people to stress it or load it down. Also, a permanent test/preview grid would encourage people to create very high quality content, since they could test texture and animation uploads there without spending money testing them on the main grid.

  5. [Korg's idea] Fill in voids with older hardware, but keep those regions Linden-owned, and no-build (like the existing void sims).

  6. [Morgaine's idea] A dynamic grid would be the optimal solution. It would almost certainly be a complex system to implement; however, it would definitely pay off in the long run to invest research and development into such a system. =)
Kurt Zidane
Just Human
Join date: 1 Apr 2004
Posts: 636
10-24-2004 16:02
Cluster processing would probably offer great performance for everyone. But if a sim has a memory leak, or starts showing an exponential increase in processing time, wouldn't it start to cause problems in other sims? Because now it could spread its problems over several machines.

If they're not going to replace the servers, perhaps they should consider having different pools of machines, i.e., a P3 sim can only use a backup P3 machine, and a P4 sim can only use a backup P4 machine. I know this would suck if your sim is a P3, but at least everyone would know their average FPS. It also means you could move away from the 'problem' if you wanted to. And it would avoid the possibility of people deliberately crashing sims, their own and others', in an attempt to get their sim running on a P4 instead of a P3.
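The tiered-failover suggestion can be sketched as a set of standby pools keyed by hardware class, where a crashed sim may only restart on a spare of its own class. This is a minimal illustration under assumptions: the class labels, the host names, and the `StandbyPools` type are all hypothetical, not anything Linden Lab is known to run.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class StandbyPools:
    """Hypothetical spare-machine pools, keyed by hardware class ('p3', 'p4', ...)."""
    pools: dict[str, list[str]] = field(default_factory=dict)

    def add_spare(self, hw_class: str, host: str) -> None:
        self.pools.setdefault(hw_class, []).append(host)

    def failover(self, hw_class: str, rng: random.Random) -> str:
        """Pick a random spare, but only from the crashed sim's own class."""
        spares = self.pools.get(hw_class, [])
        if not spares:
            raise LookupError(f"no spare {hw_class} host available")
        host = rng.choice(spares)
        spares.remove(host)  # the spare is now in service
        return host


pools = StandbyPools()
pools.add_spare("p3", "sim16")   # illustrative host names only
pools.add_spare("p4", "sim451")
pools.add_spare("p4", "sim452")
print(pools.failover("p4", random.Random(0)))  # sim451 or sim452, never sim16
```

Because the pick is restricted to one class, a crash can never move a sim onto slower hardware, which also removes any incentive to crash a sim deliberately in hopes of landing on a faster box. The flip side, as noted, is that a sim on the slow tier stays there.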
Morgaine Dinova
Active Carbon Unit
Join date: 25 Aug 2004
Posts: 968
10-24-2004 20:08
From: Kex Godel
Suggestions ...
A masterful summary! :)

Yes indeed, that's a very good set of ideas, well expressed.
Morgaine Dinova
Active Carbon Unit
Join date: 25 Aug 2004
Posts: 968
10-24-2004 20:42
From: Kurt Zidane
Cluster processing would probably offer great performance for everyone. But if a sim has a memory leak, or starts showing an exponential increase in processing time, wouldn't it start to cause problems in other sims? Because now it could spread its problems over several machines.
This is an important issue you raise, although it's not really affected much by the architecture of the platform. A gibbering server always has the potential of affecting other components in its environment, and the common method of reducing the likelihood of full-scale mayhem is to isolate clustered servers with reliable switches, and isolate different clusters by keeping them in separate VLANs. This kind of technology has been standard for a decade now, and is rock solid. One would not expect any special problems in this area.
From: someone
If they're not going to replace the servers, perhaps they should consider having different pools of machines, i.e., a P3 sim can only use a backup P3 machine, and a P4 sim can only use a backup P4 machine. I know this would suck if your sim is a P3, but at least everyone would know their average FPS.
There would be no need for that in a dynamically assigned architecture. All servers old and new can operate on the workload stream together and to the best of their abilities. There would be no slower zones and faster zones, since fast and slow operations are statistically interspersed to yield a minimally varying averaged FPS in all zones. A slower server would merely get through fewer workload units per second than a faster server, that's all.

And that's why disposal of old servers won't be a problem even after all the test clusters etc are filled up with them, because even old servers can continue working within a dynamic cluster right to the end of their days, or until LL has the money to write them off and replace them.
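The idea that "a slower server would merely get through fewer workload units per second" can be illustrated with a toy, discrete-time model: work units from every zone go into one shared queue, and each server pulls as many units per tick as its speed allows. Everything here (the function name, the unit granularity, the speeds) is a hypothetical simplification, and it ignores the data-locality and communication costs that make a real distributed simulator hard to build.

```python
from collections import Counter, deque


def run_shared_pool(work, speeds, ticks):
    """Toy model of a dynamically assigned grid (hypothetical, not LL's design).

    work   -- deque of (zone, unit_id) items from every zone, interleaved
    speeds -- work units each server can process per tick (old boxes lower)
    ticks  -- number of time steps to simulate
    Returns (units done per server, units done per zone).
    """
    per_server, per_zone = Counter(), Counter()
    for _tick in range(ticks):
        for server, rate in speeds.items():
            # Every server pulls from the same queue; none is tied to a zone.
            for _slot in range(rate):
                if not work:
                    return per_server, per_zone
                zone, _unit = work.popleft()
                per_server[server] += 1
                per_zone[zone] += 1
    return per_server, per_zone


# One old box and one box ten times faster, two zones with equal work.
work = deque((zone, i) for i in range(11) for zone in ("lusk", "water"))
per_server, per_zone = run_shared_pool(work, {"old_p3": 1, "new_p4": 10}, ticks=2)
print(dict(per_server))  # {'old_p3': 2, 'new_p4': 20}
print(dict(per_zone))    # {'lusk': 11, 'water': 11}
```

With one slow and one fast box, the slow box simply finishes 2 of the 22 units, while both zones are served evenly, because no zone is pinned to a particular machine.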
Morgaine Dinova
Active Carbon Unit
Join date: 25 Aug 2004
Posts: 968
10-30-2004 15:21
Philip's latest Town Hall is negative on the issue in this thread, btw:
From: Philip Linden (27-28 Oct Town Hall logs)
Yoshi Platini: Will you be investing in server upgrades to bring all sims to a uniform performance level?
Philip Linden: Currently all the sims are very similar hardware in terms of performance.
Philip Linden: The differences are in the content.
Philip Linden: We will keep building tools to better balance and debug content, yes.
Philip Linden: But we don't need new servers.
In the short-term then, you won't see the mentioned imbalances addressed, it seems.

In the longer term though, it's all going to be dynamic as we've been discussing, since that's the only real/viable approach to scalability:
From: someone
Beatfox Xevious: Are there any plans to distribute processing of the grid evenly across the servers, instead of having a dedicated server per sim? This would be a major boon to high-traffic sims.
Philip Linden: We will work on ways to better distribute load. This is a hard problem....
Philip Linden: we are thinking about different designs.
Philip Linden: Long term I'm sure it will be something we'll be able to do.
So, I guess that's some small comfort to you, Eltee. :)
eltee Statosky
Luskie
Join date: 23 Sep 2003
Posts: 1,258
10-31-2004 05:41
Mm, yeah, I found the negative from Phil rather troubling... either he lied directly, or (hopefully, and more likely) someone was *SUPPOSED* to have taken care of the problem already and had potentially back-burnered it.

I heard through some other channels that it may be happening now that it's been brought to someone's attention.

It's hard to fit enough detail into a town hall question. (In this particular instance we know it's not content, it's servers: three different DNS addresses for the same sim within 5 minutes showed a nearly 10x performance spread, heh.)

Overall, though, this is going to be a continuous problem with the current architecture. It's scalable, sure, but it's also always going to have some creaky, dusty old corners that will never behave as well as the new stuff, and believe me when I say it'll just be a server crash away.
Morgaine Dinova
Active Carbon Unit
Join date: 25 Aug 2004
Posts: 968
10-31-2004 08:23
Until the architecture goes dynamic (and it will, there is no other realistic way), I think your best bet might be to push for LL decoupling the rendering and networking code into separate threads in the client.

Although this doesn't really speed up your slow sim server CPU of course, it can reduce the number of context switches it has to do if it's not waiting for your ACKs, so this helps to prevent its cycles being wasted. (Generally event loops exhaust one event source before returning to the outer loop.)

And of course on a personal level, the responsiveness you perceive in your client would be improved by the above even if your slow server is still nattering away to your box, because many objects are already in your cache and are still valid yet are not currently drawn, so merely turning around can result in instant rendering instead of network-jammed rendering.
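The client-side decoupling suggested here amounts to running networking and rendering in separate threads joined by a queue, so packets are acknowledged as soon as they arrive and frames keep being drawn from the local cache whether or not new updates have landed. Below is a minimal sketch in Python; the function names, packet shape, and frame timing are hypothetical stand-ins, not the actual Second Life viewer code, and a real viewer would use native threads rather than Python's GIL-bound ones.

```python
import queue
import threading
import time


def network_loop(packets, updates, acks):
    """Hypothetical network thread: ACK each packet at once, then hand it off."""
    for seq, payload in packets:
        acks.append(seq)      # acknowledge immediately; never waits on rendering
        updates.put(payload)  # pass the object update to the render thread
    updates.put(None)         # sentinel: no more packets


def render_loop(updates, cache, frames, frame_time=0.001):
    """Hypothetical render thread: apply pending updates, then draw from the cache."""
    running = True
    while running:
        # Drain everything queued so far without blocking on the network.
        while True:
            try:
                item = updates.get_nowait()
            except queue.Empty:
                break
            if item is None:
                running = False
                break
            obj_id, state = item
            cache[obj_id] = state
        # "Draw" a frame from cached objects even if nothing new arrived.
        frames.append(len(cache))
        time.sleep(frame_time)


def run_client(packets):
    updates = queue.Queue()
    acks, cache, frames = [], {}, []
    net = threading.Thread(target=network_loop, args=(packets, updates, acks))
    ren = threading.Thread(target=render_loop, args=(updates, cache, frames))
    ren.start()
    net.start()
    net.join()
    ren.join()
    return acks, cache, frames


acks, cache, frames = run_client(
    [(1, ("avatar", "pos A")), (2, ("prim", "red")), (3, ("avatar", "pos B"))]
)
print(acks, cache)  # [1, 2, 3] {'avatar': 'pos B', 'prim': 'red'}
```

Neither loop ever waits on the other: a slow frame cannot delay an ACK, and a slow network cannot stop frames from being drawn from what is already cached.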
Alicia Eldritch
the greatest newbie ever.
Join date: 13 Nov 2004
Posts: 267
12-08-2004 13:04
From: Morgaine Dinova
Until the architecture goes dynamic (and it will, there is no other realistic way), I think your best bet might be to push for LL decoupling the rendering and networking code into separate threads in the client.


I second that. Simple and it will help a lot.
On a related point, generally, the more things they can get the clients to do, the better.

(A wire-frame mode might be nice too.)
Eggy Lippmann
Wiktator
Join date: 1 May 2003
Posts: 7,939
12-08-2004 13:26
They are already "decoupling" the client. They have been for months.