I triple dog dare you, LL!
|
|
Mark Linden
Funky Linden Monkey
Join date: 20 Nov 2002
Posts: 179
|
11-04-2004 21:34
For the techies:
[mark@sim1 ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.26GHz
stepping : 4
cpu MHz : 2259.207
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4508.87
sim1 is the oldest production machine we have. As you can see, it's a P4 2.26 GHz (with 512MB of RAM, but you can't see that from this cpuinfo).
[mark@sim400 ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 9
cpu MHz : 2795.281
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 5583.66
sim400 is one of the newer ones. As you can see, it's a P4 2.8 GHz. That's only about 540 MHz faster than the original sim1. It also has 512MB of RAM.
Some of the middle ones are 2.4 GHz or 2.6 GHz.
Some of the really new ones are actually dual Opterons (that run 2 sims each... one per CPU). They have 1GB of RAM (512MB per simulator).
The difference in hardware truly isn't very significant... however, we could be doing something "wrong". OTOH, 5ms in Run scripts is okay; any sim FPS greater than 100 or so is basically wasting server CPU...
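For anyone who wants to compare dumps like these programmatically, here is a small illustrative Python sketch. The values are copied from the two cpuinfo listings above; the parsing helper itself is just an example, not anything LL runs:

```python
def parse_cpuinfo(text):
    """Parse 'key : value' lines from a /proc/cpuinfo dump into a dict."""
    info = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
    return info

# Values taken from the sim1 and sim400 listings in this post.
sim1 = parse_cpuinfo("""model name : Intel(R) Pentium(R) 4 CPU 2.26GHz
cpu MHz : 2259.207
bogomips : 4508.87""")

sim400 = parse_cpuinfo("""model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
cpu MHz : 2795.281
bogomips : 5583.66""")

ratio = float(sim400["bogomips"]) / float(sim1["bogomips"])
print(f"sim400 vs sim1: {ratio:.2f}x the BogoMIPS")  # about a 24% difference
```

That ~24% BogoMIPS gap is the raw-speed difference the rest of the thread keeps coming back to.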
|
|
Moleculor Satyr
Fireflies!
Join date: 5 Jan 2004
Posts: 2,650
|
11-04-2004 21:41
From: Mark Linden
The difference in hardware truly isn't very significant... however, we could be doing something "wrong". OTOH, 5ms in Run scripts is okay; any sim FPS greater than 100 or so is basically wasting server CPU...

Which just further supports our statement that performance is not equal.
1. SimFPS is one thing, but when Time Dilation starts twitching, that's a serious performance issue.
2. If the difference in hardware isn't significant, and there's a significant difference in performance (and there is), then the problem is probably elsewhere.
Content is one possibility, yes, and LL has the ability to prove/disprove that in the palm of their hands. All they need to do is take a few minutes to run a couple of tests, and see if we're wrong or not. Until y'all do, though, don't blame the content when we ourselves are 99.9% certain that the content isn't changing to any significant degree between these server/performance changes. If it's not content, and it's not hardware, then it's software, and something that can be fixed. You're in bug fixing mode, are you not?
_____________________
</sarcasm>
|
|
Gwyneth Llewelyn
Winking Loudmouth
Join date: 31 Jul 2004
Posts: 1,336
|
Magical moment here...
11-05-2004 04:27
Slightly off-topic, but I really would like to thank both Mark and Philip for their time and patience in actually posting some "inner details" of Second Life's grid computers. That was a surprise for me (well, I'm still "new" in SL) since I usually regard that as somewhat "confidential" information. Thanks for sharing it - I really don't remember any company being so "free" with their "inside information", in the entertainment sector or elsewhere (and trust me, the companies I worked for would NEVER do that).

More to the topic: one of the hardest issues in measuring a distributed network is that results are "unpredictable". By this I mean that, unlike non-networked, single-user computers - where you can load all the software you want in a certain order, do your testing, and come out with the same results (computers are supposed to be "predictable") - the same does not apply to networked, multi-user systems. This is not a "lame excuse" for telling Linden Lab "sure guys, you're doing a great job, ignore our complaints, we know we can't trust our own measurements anyway". It's just stating something which may not be obvious for people who don't deal with distributed, multi-user networked environments. Measurements aren't "trivial" to do. Worse than that: similar to quantum mechanics, when you try to measure these kinds of environments, you are actually introducing changes into the environment, which you should take into account.

If you remember the "database wars", you can also see why performance benchmarks are oh-so-very-hard to get right. If I'm not mistaken, Oracle still forbids people to benchmark their databases and publish the results - unless an Oracle technician does the fine-tuning. One reason for that is that you can install a copy of MySQL, use the default settings, download MySQL's own open-source benchmarks, and see it outperform Oracle's fabulous database server every time in every test, running on the same hardware.
Fine-tuning Oracle to give better results than MySQL (on the same benchmark suite) is way too difficult except for a very talented technician (and most of those work for Oracle anyway). So how can a server with just 20% less CPU (or rather... BogoMIPS) be 10x as slow running a script, in the same sim, with the same set of objects, and apparently the same environment?

The issue here is that it's never the "same" environment. Lots of stuff can be different. For instance, when a sim reboots, often its asset cache is also cleaned. So even if there is just one single agent (avatar) connected to it at, say, 2 AM PST (low traffic on the Internet overall, at least for the US and Europe), this "new" reloaded sim will actually perform worse than the same sim running on an "older and slower" computer where the asset cache is "filled up" (even with multiple agents connected to it at a very busy hour, from the point of view of Internet traffic). Yes, the cache really makes a difference, and very often you can get completely unexpected results just by fine-tuning the cache (also a trick which is beyond the common knowledge of a computer-savvy techie - it's one of the hardest things to do in the systems administration realm - and that's why LL has top-of-the-line sysadmins working with them).

So is the difference measured by eltee or claimed by Moleculor Satyr just a question of cache? Of course not; there are literally thousands of tiny itty-bitty details that play a role, too. ALT-1 stats are a big help, but, as someone mentioned, we get "canned" statistics from a "virtual environment" (which is actually much more meaningful than just looking at the computer's load average or memory consumption). It's impossible to know what exactly causes a loss of performance just by running a script and trying to understand why it gives such different values on "almost equal hardware".
A test running on a computer with 24% more CPU power can suddenly take 2 or 10 times less time to run, and it's not obvious at all why this happens. As a rule of thumb, however, you can get an overall idea of server performance - i.e. saying "this server is slower/performs worse than that one". If you do around-the-clock testing, you would certainly be able to measure a "trend" in performance differences: say, a sim running on computer A will always be slower than running on computer B.

However, according to my experience, it will be almost impossible (some authors would claim that, due to the chaotic nature of a networked, distributed environment, it would be totally impossible) to look at the test results (and sincerely, ONE "test" is an exception and nothing more - it's just what we call anecdotal evidence, trying to extrapolate reality from one single test point) and get consistent results with an error margin that satisfies the masters of statistical analysis. This usually means asserting that "this computer is always 2x slower than this one" with a 95% confidence margin. However, statistics do not apply very well to measurements of networked, distributed environments. You usually get error margins of 100% or 1000% (instead of 5%) and have so many points to discard from the data series that suddenly your test results do not make any sense at all.

Finally, we also have a problem arising from the degree of "fine measurement" we can do. It's not realistic to say "this script runs in just 100 ms on computer A, but takes 1 second on computer B" since average ping time is around 200 ms. Measurements should be done only for a higher order of magnitude. So we all suffer when the asset server takes 30 seconds to save your inventory (< 1.5.6), but it's very hard to blame the asset server if it takes one second instead of half a second. Our ability to measure is simply not "fine-grained" enough for us to make statements at that scale.
Again, this is also not obvious, and running looped tests can usually give a better assessment of what is going on. But on the other hand, a "long loop" (running for several minutes) will be subjected to thousands of small disparities (lag, packet storms, agents coming in and out of the sim, the computer doing self-maintenance like swapping memory pages in and out, etc.) which will be "anomalies" influencing the final results.

So does this mean that Philip, Mark and the rest of the people at Linden Lab are "right" and we all (who log in to SL and can see what happens) are "wrong"? Not at all. My point is that "both" are right. LL claims that all sims are "approximately the same hardware", and that is certainly true - even when taking into account that a difference of 18 months, according to Moore's Law, is enough to double raw CPU power. Nevertheless, the difference in hardware is, for practical matters, almost irrelevant. On the other hand, we certainly experience different results when connected to "older sims" - either we "feel" it (everything seems to be slower) or we run tests and come up with the same result.

How to do benchmarks under this model? There is really just one way. Disconnect two computers from the grid - one "early model", one "latest model". Load the same sim, clear caches, reboot. Run performance tests with no connectivity to the net and no agents in-world. Compare. A die-hard "distributed skeptic" will probably quibble about things like "this computer is 0.01% faster!" (since on a multi-user environment it's almost impossible to get that level of precision). But there will certainly be a difference, and I agree it will be more than the 24% BogoMIPS difference shown by Mark's post. The point is, that difference will be for isolated hardware only - and not for hardware plugged into the grid.

- Gwyneth (yes, I did some research in measurements of distributed, graphical environments, a loooooooong while ago)
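Gwyneth's point about error margins can be sketched in a few lines of Python. The timing samples below are invented for illustration (they are not real SL measurements); the idea is that one lag spike in a run is enough to blow the margin of error past the effect being measured:

```python
import statistics

def margin_95(samples):
    """Mean and a rough 95% margin of error (normal approximation)."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / (len(samples) ** 0.5)
    return mean, 1.96 * sem

# Hypothetical script timings in ms on two sims; sim B caught one lag spike.
sim_a = [100, 105, 98, 102, 99, 101, 103, 97]
sim_b = [95, 100, 96, 940, 98, 97, 99, 101]

mean_a, err_a = margin_95(sim_a)
mean_b, err_b = margin_95(sim_b)
print(f"sim A: {mean_a:.0f} +/- {err_a:.0f} ms")
print(f"sim B: {mean_b:.0f} +/- {err_b:.0f} ms")  # one outlier blows up the margin
```

With one 940 ms outlier, sim B's margin of error is larger than sim A's entire mean - exactly the "100% or 1000% error margins" problem described above.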
|
|
eltee Statosky
Luskie
Join date: 23 Sep 2003
Posts: 1,258
|
11-05-2004 07:52
From: Mark Linden
any sim FPS greater than 100 or so is basically wasting server CPU...

well thats a statement i'll have to take issue with pretty specifically mark... first off you begin to notice time dilation twitch below about 200, not 100... and even small twitches there begin to make for big problems as people move around etc...

the other issue is yes, okay say a sim at 200 is *usable* for the people *IN IT RIGHT THEN*... it is NOT usable however in the grand scheme of the world if its at that 200 with only 1-2 people in sim.. and you wanting to have an event on yer land... yer STARTIN with 200 sim fps.. before an event... get 20 people in there and believe me yer gonna be at 20-30 sim fps with 50% time dilation the whole time... while on a new sim yer startin at 1000 sim fps.. and you won't be dipping below 200 even in the middle of an event... and thats sorta the crux... yer sayin worst case 200 is usable fine.. and i'm agreeing with you.. but on sim16... in lusk... 200 was *BEST CASE*... it was there.. while empty... and it was *ONLY* going to go down as people started arriving.

and theres more to this problem than simple hardware cycles on the cpu... theres somethin weird.. and i've noticed it for months... in that on these older sims.. that first block... each *AGENT* aka person in the sim.. seems to take *ALOT* more processing than on the newer sims... one of the things i didn't mention earlier was that on the 450 block sims (which seem more than double end user performance wise from even the 400 block ones...) (lusk under a 400 block sim is about 1ms runtasks... under a 450 block sim was 0.5ms runtasks)... each and every agent on sim16 took 0.3-0.4ms runtasks... so with THREE agents in sim, and no child agents, 'run agents' as a timeslice was over 1ms... with 10 agents in sim, the older lusk back when.. would easily be 4-5ms while the SAME EXACT 10 agents, when we all moved to perry, to test... was 0.5ms run agents.
later when lusk was on a 450 block sim, 10 agents was also about 0.5ms runtasks... so on the same 'sim' the same people, with the same scripts in the background... would take 10x longer to process... which leads me to believe theres somethin else going on... its not mere mhz.. but some jus architectural improvements that *REALLY* directly improved the server performance on the newest batches, compared with the 'generic' pool from sim 100ish to sim 450... versus the 450+ ones.. and really directly hurt the oldest sims.. in comparison to the middle block ones as well.

I don't know, is it l2 cache? aka the bytecode interpreter for scripts now fitting inside the l2 cache completely on the latest sims, whereas before in the middle sims it could fit 1/2 in and required a single cache flush, and on the oldest sims it could only fit 1/4 in and require 4 flushes per script hit etc... i dunno. i mean thats all conjecture... but there are very *VERY* distinct real world performance differences between these sims.. and it has real direct in world issues when time dilation starts going off... and even empty sims are 'slow' meaning they will be totally *unusable* when someone is actually having an event, compared to a new sim, which will run better during an event, than an old one runs empty...

please... look im asking ya take down lusk for testing then... take a mirror archive of lusk as a sim.. and run it, in a controlled environment on three machines... pick sim16... pick sim200, and pick sim460 (or as close to those as you can get) and put 4 people in it in each case.. and just study whats going on. because *IN* those exact same conditions... the performance of sim16 was 5ms total timeslice of processing... the performance of sim200 was 1.5ms total timeslice in processing.. and the performance of sim460 was a total timeslice of processing of about 0.7ms. im not sayin they ALL need to be sim 460's...
but if a sim near empty is performing at 5ms total timeslice (and is gettin 0.3-0.5ms more per player in it)... that *directly* damages the ability of *ANYONE* in that sim to host events or even invite friends over to hang out as the dilation starts flickering
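eltee's figures suggest a simple linear model - a base timeslice plus a per-agent cost. A quick illustrative sketch in Python (the per-agent numbers are his rough estimates from this thread, not measured constants):

```python
def frame_ms(base_ms, per_agent_ms, agents):
    """Total frame time: base sim timeslice plus per-agent processing cost."""
    return base_ms + per_agent_ms * agents

# sim16 (old block): ~5 ms base, ~0.35 ms per agent (eltee's rough figures)
# new opteron sim:   ~0.7 ms base, ~0.05 ms per agent (10 agents ~ 0.5 ms)
for agents in (2, 10, 20):
    old = frame_ms(5.0, 0.35, agents)
    new = frame_ms(0.7, 0.05, agents)
    print(f"{agents:2d} agents: old sim {old:.1f} ms (~{1000 / old:.0f} FPS), "
          f"new sim {new:.2f} ms (~{1000 / new:.0f} FPS)")
```

Under this toy model, a 20-person event pushes the old sim to roughly 12 ms per frame while the new one stays under 2 ms, which matches the "events tank the old sims" experience.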
_____________________
wash, rinse, repeat
|
|
eltee Statosky
Luskie
Join date: 23 Sep 2003
Posts: 1,258
|
11-05-2004 08:10
hmm... second glance okay... the latest ones are dual opterons... im assuming thats my '450 class' sim.. they are fundamentally faster, per cpu, at running a sim slice.. okay so that explains the jump between the 'bulk' of the middle sims the 100's through the middle 400's to the very latest 450+ class
and from the specs given the earliest cpu's are non-HT cpu's.. the '0.50 cpu' on the middle class versus 1.0 cpu on the very first ones seems to bear that out.
I don't remember all of the exact changes that came with that specific line of HT enabled p4's... but i think it may go a way to explaining why the pre-HT sims are taking *MUCH* longer to process a single 'frame' of the server, on the order of 3-4x longer.. (different northbridge architectures, presumably slower ram, etc as well as jus the cpu differences)
the performance per clock of the p4 has increased as the line has matured.. from the rather abysmal initial showing against the high-clock p3's back when they launched to the rather zippy cpu's of the last few generations.
so it looks like we have three basic blocks here:
Pre-HT p4's, which took 5ms to process one cycle of october 12th lusk
HT enabled p4's, which took 1.5ms to process one cycle of october 12th lusk
and opterons, which took 0.7ms to process one cycle of october 12th lusk
it sounds to me more'n anything like there is somethin really specifically against those early pre-HT p4's.. somethin holdin them back or not running well (who knows, mebbe it could be somethin as silly as their coolers on some'f them are clogged by now and they jus need to be opened up and dustoffed to stop what could potentially be thermal throttling.)
within that middle block... i will agree there is *VERY* little difference... lusk running on sim167 was not really noticeably slower at all than sim405 (mebbe an average of 1.6ms total server slice on sim167 rather than the 1.5ms now on sim405 after the 1.5.6 upgrade). but really, check out those first block servers, something is very wrong with them
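The three blocks can be compared against raw clock speed with a quick sketch. The cycle times are eltee's figures from this thread; the Opteron clock is a guess, since the exact model number was never posted:

```python
# Measured speedup vs clock-speed ratio for eltee's three hardware "blocks".
blocks = {
    "pre-HT P4 (sim16)": {"mhz": 2259, "cycle_ms": 5.0},
    "HT P4 (mid block)": {"mhz": 2795, "cycle_ms": 1.5},
    "dual Opteron":      {"mhz": 1800, "cycle_ms": 0.7},  # clock is a guess
}
base = blocks["pre-HT P4 (sim16)"]
for name, b in blocks.items():
    clock_ratio = b["mhz"] / base["mhz"]
    speedup = base["cycle_ms"] / b["cycle_ms"]
    print(f"{name}: {clock_ratio:.2f}x the clock, {speedup:.1f}x the measured speed")
```

The mid-block P4 has 1.24x the clock but over 3x the measured throughput, which supports the "its not mere mhz" argument.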
_____________________
wash, rinse, repeat
|
|
Moleculor Satyr
Fireflies!
Join date: 5 Jan 2004
Posts: 2,650
|
11-05-2004 10:37
Heh. I thought this had been posted to this thread last night. I was wrong. I then went to bed. Since this post is much more relevant over here, I've moved it.

From: Kelly Linden
In almost all cases sims that show .50 are running on a hyperthreaded CPU. In these cases the sim is using all the CPU on the server and is the only sim running on that server; the stat just shows funny.
The void sims are currently 4 sims to 1 server so they will usually show .25.

From: Philip Linden
We always assign one CPU to one sim/region for those regions in which people own land. We may put more into one in the future, but only in cases where we are trying to create large fixed areas of unowned land around the owned land, and in those cases we will come up with a way to clearly indicate the density of sims to CPUs.
Only the void sims and wholly linden-owned land run more than one sim/CPU.
Seems important to understand how much sim you are getting.

Ok. I'll say this again, in a different way. I couldn't care less if my simulator is running on a 1MHz machine with a K of RAM. I don't care if it's running on a Beowulf cluster. All I care about is the end result: the performance of the sim.

If you ignore the fact that one sim shows a 1.00 and another shows 0.50 (which it seems you're telling me to do, and it makes sense), it only makes my evidence that much more convincing. Magenta (Sim35) is running twice as many active objects and scripts as Seacliff (Sim6), and yet performance is FAR better in Magenta than Seacliff. When Seacliff was on a 300+ sim, the Time Dilation never flickered below 0.99. Hell, it hardly flickered at all. Now it easily dips down to 0.85ish, and occasionally spikes much, much lower.

Now, I don't own the whole sim. I don't control the content within the sim. I am unaware of what caused the change. The only two variables that I am aware of changing between the two times I looked at the server stats were the void sims coming up and the server being switched from a 300+ sim to a 4. It could very well be that something within the content changed to cause this performance decrease. I don't know. And I have no WAY of knowing, which is why I came to you guys. I hoped that you would be at least CURIOUS enough to want to run a few simple tests. This is, after all, your world.

You're claiming that performance is equal, despite the meager differences in hardware, and while I agree that the differences in the hardware are small, I definitely do NOT agree that performance is equal. All the evidence I've seen so far has proven otherwise. If I'm wrong, then I'm wrong. But until I see proof of that, I'm going to continue to believe that something is fundamentally wrong with the servers, the farther back in age you go. I don't care if the SimCPU number is showing 1.00, 0.5, 0.25, 0.01, or a magical 3.14.

All I care about is that Time Dilation number, the SimFPS spiking to sub-three-digit numbers, and the fact that occasionally my avatar moves through the air like it's warm butter, rather than air. This is limiting what I can do in the sim. Right now, I've stopped developing the behaviour and existence of objects that interact with the world because of the sudden and unexpected server performance degradation. I thought that LL wanted to see objects that interacted or at least responded to changes in their environment? I can't develop any such objects if I have to constantly worry about whether or not tomorrow the sim will get shoved onto Sim1.
_____________________
</sarcasm>
|
|
Ian Linden
Linden Lab Employee
Join date: 19 Nov 2002
Posts: 183
|
11-05-2004 15:44
Phew, lots of issues here.
First, to the veracity of the stats themselves: Gwyneth brings up the core point, and mentions BogoMIPS. The sim stats in the Alt-1 window are sometimes Bogus; if the SimFPS is over 100, the ms numbers are Completely Bogus. Why? Once a sim is able to do everything that is asked of it, which is often the case, it starts having idle frames, during which it does nothing. During these frames, the time spent executing the various tasks is zero, and those zeros contribute to the average, making it seem artificially low.

Worse, the extent to which this phenomenon occurs is only partially due to the speed of the sim - vagaries of the task scheduler have an enormous impact, and things tend to fragment over time. (Case in point: in another thread, people thanked us for speeding up the sims w/ 1.5.6. We didn't. What we did do, though, is restart all the sims, which cleared out lots of cobwebs.)

So, beyond a certain threshold (100 fps, 200 if you're very picky), these numbers mean nothing. Put another way, we didn't put the ms numbers in to detect problems - we put them in to find out what the problem is, IF we already know there's a problem.
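The averaging problem is easy to see in a toy example (the frame times below are invented for illustration):

```python
# Frame-time samples in ms; a sim with plenty of idle frames.
busy_frames = [4.8, 5.1, 5.0, 4.9]   # frames where the sim actually had work
idle_frames = [0.0] * 16             # idle frames: nothing to do, zero ms

samples = busy_frames + idle_frames
naive_avg = sum(samples) / len(samples)          # what an averaged stat shows
true_cost = sum(busy_frames) / len(busy_frames)  # the cost that matters under load

print(f"averaged over all frames:  {naive_avg:.2f} ms")  # looks harmless
print(f"averaged over busy frames: {true_cost:.2f} ms")  # five times worse
```

The zeros from the 16 idle frames drag a real ~5 ms working cost down to an apparent ~1 ms, which is why the ms numbers are only meaningful once a sim is actually saturated.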
Second, Moleculor has a good point about time dilation. This stat is probably the least bogus number there is - when it's 0.95-1.00, things are pretty good, but as it drops things get ugly fast. Usually drops in this number are caused by physically complex interactions, and no matter how fast your CPU is, Havok's not going to be able to give you smooth performance when these happen. To examine this issue specifically, I went to Seacliff today, and I found a physical, scripted, prim-complex object wedged in a bad interpenetration state between two rocks. I returned it, and the time dilation immediately stabilized.
Third, eltee's partially right that a sim with a 200FPS baseline is in for more pain when the people start showing up. However, I disagree with the thesis that under truly equivalent conditions, newer sims will be extremely different. I HAVE benchmarked ALL of our sim hardware back-to-back with the exact same simstates and avatar load, and I was never able to detect more than a 35% difference at the widest angle. That's not true performance parity, but it's not such a large difference that it could cause one machine to run at 200FPS and one to run at 20FPS. But, there are lots of other things that could cause such a difference: we've just found some very bad slowdowns with complex attachment sets entering a sim, there's the aforementioned fragmentation over time, moving avatars are VERY different from avatars standing still, avatars w/ empty caches are VERY different from avatars w/ full caches, and different scripts in different states cause VERY different amounts of sim load.
In conclusion, sim performance is a very complex issue and the hardware involved is a relatively minor variable in a big equation. Unfortunately the upshot of this is that none of us have as much control over sim performance as we would like. If you're very concerned about it, we'd like to hear your ideas for how to fairly implement performance tools. What would happen if you could see the highest-load physical objects and scripts, even if they belonged to other people? How about just the ones on your land?
In the meantime, we DO have fixes in the pipe which will improve sim performance across the board, although I can't give you any sort of ETA at this point.
|
|
Moleculor Satyr
Fireflies!
Join date: 5 Jan 2004
Posts: 2,650
|
11-05-2004 17:51
Grr. I reported the lag in Seacliff several weeks ago. It's taken a thread like this to get someone to look around for interpenetrating objects? It's not exactly something we can do ourselves (easily, at least). (Why are responses to bug reports and support emails so delayed, anyway? I've made a good six to a dozen reports in the past few weeks, and JUST TODAY got a response to -one- of them.)

From: Ian Linden
What would happen if you could see the highest-load physical objects and scripts, even if they belonged to other people? How about just the ones on your land?

I can recall several occasions since I joined in 1.2 that people have asked for "load views" that would allow us to see what's creating the most load within a sim. Most of those requests were made around 1.2. Is it seriously something that CAN be implemented? Can it be done without delaying Havok2, land-info LSL calls, improvements to the particle system, llDetectedCollisionPoint, or llCreateNotecard?

I'm thinking a black Tron-esque view with a polygon mesh/net replacing the ground (so we can see the ground, but see through it at the same time) and load being represented by brightness. Maybe even use those Light Glows you guys have. No sky, sun, moon, stars, or anything like that. Black background with... grey objects for normal prims, brighter for scripted/physical, maybe using different colors to represent different things. Pulses of light/particles to represent collisions. Basically a much revamped object beacon mode. But cooler looking.
_____________________
</sarcasm>
|
|
eltee Statosky
Luskie
Join date: 23 Sep 2003
Posts: 1,258
|
11-05-2004 18:43
From: Moleculor Satyr
Grr. I reported the lag in Seacliff several weeks ago. It's taken a thread like this to get someone to look around for interpenetrating objects? It's not exactly something we can do ourselves (easily, at least).

heh actually moleculor finding those kinda objects is fairly easy... i've been doin that in lusk (especially in the estates which tended to be messes for months on end) forever pretty much heh... turning on 'object beacons' can show you all of the objects with scripts (red), physics enabled (green), or both (orange). Further, physical collisions (one of the chief causes of physical lag) will get transient bright yellow crosslines.

past that most basic, objects updating the client (and potentially causing network lag) can be seen with debug, and show updates. In this case puffs of 'smoke' come out from an object every time it sends an update to your client. While avatars and such will often be sending streams as they move about, having an otherwise inert object suddenly send huge streams of updates can certainly cause alot of lag, both client and server. (things like high density color flasher dance floors etc are often guilty of this) For more advanced hunting i have some scripts for sniffing out active scripted objects that are quite good at what they do (they've found several hidden land scanners around lusk and perry)

as to the differences specific machines can have on things all i can say is lusk spent *A LONG* time on one of the oldest machines. Rather recently, since the decoupling of sims and hardware, its done a full 180 and been one of the faster heavily used sims. it now has *MORE* scripted objects and more overall density than it had 3-4 months ago and its still running *significantly* smoother... new agents in the sim generally only take 0.1ms and we haven't seen time dilation since the sims switched. *EXCEPT* for that one interval where we landed back on sim16 that night.
It was *EXACTLY* like it had been for those many months... flickering time dilation, general sluggishness, etc... with *three* people in sim. this was *especially* troubling coming off one of the latest opterons that had been running alot smoother, with 20 people just the night before, than it was running on sim16 then, with three.

for easily 6 months or more we couldn't have events... just having 10-15 *friends* hang out would occasionally drop into the teens, sim fps wise, with 0.75 or more dilation. then suddenly, put on different hardware, *everything* changed immediately. The content was exactly the same as it was before... but what we could do with luskwood, event and just general people wise, has been significantly improved (to understand how commonplace and utterly unbearable the lag in lusk would *routinely* get with jus a few more people.. when it was running the old sim.. for my birthday in august i was thrown a party.. and the most popular 'favor' everyone went home with was an 'i lagged lusk for eltee Statosky's birthday') now we routinely have that many people just hanging out.. and not only are the numbers 'good' but there isn't any server performance lag whatsoever
_____________________
wash, rinse, repeat
|
|
Steve Patel
Registered User
Join date: 4 May 2004
Posts: 39
|
11-09-2004 23:00
From: eltee Statosky
Pre-HT p4's, which took 5ms to process one cycle of october 12th lusk
HT enabled p4's, which took 1.5ms to process one cycle of october 12th lusk
and opterons, which took 0.7ms to process one cycle of october 12th lusk
it sounds to me more'n anything like there is somethin really specifically against those early pre-HT p4's.. somethin holdin them back or not running well (who knows, mebbe it could be somethin as silly as their coolers on some'f them are clogged by now and they jus need to be opened up and dustoffed to stop what could potentially be thermal throttling.)
Well, sim1 has hyperthreading. Notice the "ht" flag. And if it's the oldest, the others probably do too. (Except the dual opterons.)

From: Mark Linden
For the techies:
[mark@sim1 ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.26GHz
stepping : 4
cpu MHz : 2259.207
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4508.87
|
|
Mark Linden
Funky Linden Monkey
Join date: 20 Nov 2002
Posts: 179
|
11-10-2004 11:56
Actually, the ht flag is misleading; sim1 does not, in fact, support hyperthreading, but it indicates via the processor flags that it does.
Not sure if Linux or the P4 itself is at fault, but it's not a good indicator of hyperthreading by itself.
We do use it, though: if we see 2 CPUs and they both have the ht flag, then we're pretty sure it's actually a hyperthreaded P4, and we make provisioning decisions accordingly.
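The heuristic Mark describes - trust the ht flag only when two logical processors both report it - can be sketched like this. The cpuinfo sample is invented, and real provisioning logic would of course check more than this:

```python
def ht_logical_cpus(cpuinfo_text):
    """Count logical processors whose flags line contains the 'ht' flag."""
    count = 0
    for block in cpuinfo_text.strip().split("\n\n"):
        for line in block.splitlines():
            if line.startswith("flags"):
                flags = line.partition(":")[2].split()
                if "ht" in flags:
                    count += 1
    return count

# Invented two-processor dump; a lone 'ht' flag (as on sim1) proves nothing,
# but two logical CPUs both reporting it suggest real hyperthreading.
sample = """processor : 0
flags : fpu vme pae ht tm

processor : 1
flags : fpu vme pae ht tm"""

if ht_logical_cpus(sample) >= 2:
    print("likely a real hyperthreaded P4")
else:
    print("ht flag alone is not trustworthy")
```

On a box like sim1 the function would see only one logical CPU with the flag, which is exactly the case where the flag is misleading.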
|
|
Chromal Brodsky
ExperimentalMetaphysicist
Join date: 24 Feb 2004
Posts: 243
|
11-19-2004 15:07
From: Mark Linden For the techies:
[mark@sim1 ~]$ cat /proc/cpuinfo
model name : Intel(R) Pentium(R) 4 CPU 2.26GHz
cache size : 512 KB
bogomips : 4508.87

[mark@sim400 ~]$ cat /proc/cpuinfo
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
cache size : 512 KB
bogomips : 5583.66
sim400 is one of the newer ones. As you can see, it's a P4 2.8 Ghz. That's only 600 Mhz faster than the original sim1. It also has 512MB of RAM.
The difference in hardware truely isn't very significant... however, we could be doing something "wrong".
Some of the really new ones are actually dual Opterons (that run 2 sims each... one per CPU). They have 1GB of RAM (512MB per simulator).

Mark, this information is an excellent glimpse into the infrastructure that helps support the SecondLife sims. I just want to correct (slightly) one point you make about hardware differences and develop that thought a little.

You mention a P4-2.26, a P4-2.8, and the newest Opteron-2xx dual-CPU systems (242? 244?), and then compare their raw clock speeds (or possibly BogoMIPS, not sure which you meant). On the surface, yes, the hardware appears to be quite similar. But there's more going on here than just raw CPU speed, because we have seen a bit of a shift in the architectural implementation that affects SecondLife's performance.

The P4-2.26 was available in early 2002 as the Northwood P4-2.xA; it featured a 400MHz frontside bus (FSB) connecting it to the northbridge memory hub. A typical 2002 board would probably have used the Intel 845D northbridge chipset, providing PC2100 memory support, with maximal memory bandwidth of 2.1GB/s behind a 400MHz FSB.

The P4-2.8 we see now is either a Northwood or a Prescott, but I'll bet you're using Northwoods. It features an 800MHz FSB, probably connected to an 865 or 875 northbridge MCH running PC3200 in dual channel mode, yielding as much as 6.4GB/s.

Finally, we have an Opteron-24x system. I'm not sure whether you're running 242s, 244s, 246s, or what. Splitting the difference, let's suppose this were a dual Opteron-244 system. Each CPU has a local dual-channel memory controller, running PC2700, for up to 5.3GB/s of low latency memory bandwidth. This may have a positive impact on memory latency, as you don't need to go through the FSB to reach a memory controller. Additionally, you get a 1MB L2 cache, to speed things along even further, potentially. That and the extra x86-64 registers and 64-bit architectural advantages.
You'll give some of that up because the SSE isn't as fast as the P4s running ICC object code, but it's still quite good. I'm not sure what impact you could expect on network interface or disk controller DMA through HyperTransport, but I'm guessing it's negligible. So, we have:

P4-2.26 with 512kB of L2 and 2.1GB/s of memory bandwidth.
P4-2.8 with 512kB of L2 and 6.4GB/s of memory bandwidth.
Opteron-244(?) with 1024kB of L2 and 5.3GB/s of low-latency bandwidth (PER CPU), plus x86-64 64-bit architectural advantages.

SecondLife servers are CPU and memory bound; they are iterating and reiterating over tasks, agents, and physical tasks. Disk access speed surely impacts sim boot times, and the network controller's bus connect speed will certainly impact the sending and receiving of packets, but at the end of the day, the speed at which these sim execution loops can retrieve and process object, task, agent, and physics data for a given "frame" is going to be very much related to the memory speed of the host architecture; the ability to quickly shuffle that data into the CPU when an L2 cache hit fails is paramount.

The difference from the earlier P4s to the later ones is easier to quantify: the newer P4s on 865/875 chipsets have up to 3x the memory bandwidth, and 3x more bandwidth is not insignificant. The calculus of how latency, bandwidth, and x86-64 features mix into making the Opteron platform a significantly better (or not) host for SL sims is... well, less obvious, though it clearly is the superior dual-processor architecture choice because the CPUs don't share a memory hub.

Anyway, I hope I've shown that there's a lot more to performance than clock speed or BogoMIPS when we start running real-world software. I'm delighted to hear you guys are using Opterons, hopefully with the 8131-based Gigabit NICs.
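(As a quick sanity check on the bandwidth figures above: peak memory bandwidth is just channels times bus width times transfer rate. The sketch below is back-of-envelope arithmetic with the standard DDR transfer rates, not anything measured on LL's hardware.)

```python
# Rough sanity check of theoretical peak memory bandwidth:
# bandwidth (GB/s) = channels * bus width (8 bytes) * transfer rate (MT/s) / 1000
def peak_bandwidth_gbs(channels, mts):
    return channels * 8 * mts / 1000  # each DDR channel is 64 bits (8 bytes) wide

configs = {
    "P4-2.26 / 845D, single-channel PC2100":      (1, 266),
    "P4-2.8  / 865-875, dual-channel PC3200":     (2, 400),
    "Opteron-244, dual-channel PC2700 (per CPU)": (2, 333),
}
for name, (channels, mts) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(channels, mts):.1f} GB/s")
```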
I think the deeper look shows that the architectures running on computers today have improved in leaps and bounds over the last 24-30 months, and that the impact on sim performance is, predictably, VERY significant.

From: someone OTOH, 5ms in Run scripts is okay; any sim FPS greater than 100 or so is basically wasting server CPU...

Well, the danger here is background load. If I said I was concerned that my empty or quiet sim ran at only 200 FPS, the conventional wisdom at Linden Lab would discount this. But we must also consider headroom and what happens when we add agents, task load, etc. All of a sudden, that FPS is at 50 FPS, and things start to degrade rapidly toward deep think. Sim FPS below 500 or so are like an axe blade hanging by the slenderest of threads over our necks; will the sim be okay? Nobody knows; if 10 agents connect, and there are another 5 child agents, things may get very ugly very fast. Sim FPS should be thought of both as a sim's present load and as its potential headroom. And remember, kids, don't let the 0's impress you with the largeness of a number: 1600 Sim FPS is only 8x more headroom than 200 Sim FPS, and that headroom goes fast when agents get involved.

Yeah, so... Mark, if I'm way off here, I'm all ears. Please share more information. I'm particularly interested in trading notes about the Opteron-24x architecture, because I believe the processes you are running on yours are very similar in nature to the stuff I run on mine (though mine are running meteorological and plant pathology models, a different sort of virtual world with surprisingly similar computational requirements).

Best Regards, Chromal
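(The headroom point above can be illustrated with a toy model. The per-agent cost below is invented purely for illustration, not anything from LL: treating idle FPS as a frame-time budget shows how quickly a fixed per-agent cost eats a "high" FPS number.)

```python
# Toy headroom model: each agent adds a fixed (hypothetical) per-frame cost,
# so loaded FPS = 1000 / (idle frame time in ms + agents * cost per agent).
def loaded_fps(idle_fps, agents, ms_per_agent_per_frame=0.5):
    idle_frame_ms = 1000.0 / idle_fps
    return 1000.0 / (idle_frame_ms + agents * ms_per_agent_per_frame)

for idle in (200, 1600):
    print(f"{idle} FPS idle -> {loaded_fps(idle, 15):.1f} FPS with 15 agents")
```

With these made-up numbers, a 200 FPS sim and a 1600 FPS sim converge toward similar loaded frame rates once agents dominate the frame time, which is the sense in which big idle FPS numbers overstate headroom.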
|
|
Mark Linden
Funky Linden Monkey
Join date: 20 Nov 2002
Posts: 179
|
11-19-2004 16:12
Jesus, Chromal... next time we're going to spec hardware, I'm just going to post it here and wait for you to analyse it.  You're pretty much right about memory bandwidth and the secondlife server app; the thing that's surprised us is that the end-user perceived difference was so noticeable. It doesn't help that many of the end-user numbers are difficult to interpret, and often fairly meaningless (which causes lots of confusion). Hopefully some of the upcoming changes to stats will help everybody figure out where the problems actually are. With respect to the Opterons... the low latency is helpful. We don't run them in 64-bit mode (they don't even have an AMD64-aware kernel on them), but they are still very fast for us.
|
|
Chromal Brodsky
ExperimentalMetaphysicist
Join date: 24 Feb 2004
Posts: 243
|
11-19-2004 17:13
From: Mark Linden Jesus, Chromal... next time we're going to spec hardware, I'm just going to post it here and wait for you to analyse it.  You're pretty much right about memory bandwidth and the secondlife server app; the thing that's surprised us is that the end-user perceived difference was so noticeable. It doesn't help that many of the end-user numbers are difficult to interpret, and often fairly meaningless (which causes lots of confusion). Hopefully some of the upcoming changes to stats will help everybody figure out where the problems actually are. With respect to the Opterons... the low latency is helpful. We don't run them in 64-bit mode (they don't even have an AMD64-aware kernel on them), but they are still very fast for us.

Yeah, I think a real challenge is posed by the way the same resources are shared by so many different developers (LSL scripters and builders) and other users within SL. This ranges from folks who wrote their first LSL script and don't realize there might be a better way, to folks who wrote demanding experiments and proofs-of-concept that have been put to use in unanticipated ways by others. It's like an old UNIX server of yore, with many students and professors working together, but no /bin/ps or /bin/top. ^_^ I firmly believe anything that puts more performance-related information into our hands will be a huge aid in utilizing sim resources in a more optimized manner.

Glad to hear the Opteron system architecture is a boon, even without processor-specific optimizations. I'll be curious to see what dual-core CPU technology, EM64T compiler support in ICC, and x86-64 and/or EM64T hardware will mean for SL infrastructure in 2005!
|
|
Mark Linden
Funky Linden Monkey
Join date: 20 Nov 2002
Posts: 179
|
11-20-2004 16:01
From: Chromal Brodsky unanticipated ways by others. It's like an old UNIX server of yore, with many students and professors working together, but no /bin/ps or /bin/top. ^_^ I firmly believe anything that puts more performance-related information into our hands will be a huge aid in utilizing the sim resources in a more optimized manner.
Yep. This is actually almost exactly how we think about it. I often find myself wanting sltop in world  From: Chromal Brodsky Glad to hear the Opteron system architecture is a boon, even without processor-specific optimizations. I'll be curious to see what dual-core CPU technology, EM64T compiler support in ICC, and x86-64 and/or EM64T hardware will mean for SL infrastructure in 2005!
Unless the EM64T stuff is cheaper and drops power consumption, it's probably not going to mean anything for SL infrastructure in 2005. But we designed for heterogeneous server clusters, so we might end up with a rack or two in here somewhere. M
|
|
Eggy Lippmann
Wiktator
Join date: 1 May 2003
Posts: 7,939
|
11-20-2004 16:33
FWIW, we want that too. A number of people have complained that they would like to have something like the output of "ps -aux". Unless / until you implement some controls on the amount of textures and scripts someone can use, we need a way to control it ourselves. I worked for Briana Dawson a while ago, as a sort of "lag consultant". It's very hard. You have no easy way to figure out which idiot put up 1024x1024 textures in the middle of a huge mall, or which person is using too many scripts. Some things do puff update clouds. But what if some idiot is running 300 while(1) scripts in a tiny transparent cube? It will cause a lot of lag and be virtually impossible to detect. Of course I don't really know how things work on the SL simulators, but if it isn't too difficult, please consider allowing us to see something like a list of all scripts sorted by owner. This lets us find out who is overusing simulator resources, and if you throw in the object name and position, we can delete it. A similar list of all textures, owners, and object name/position, that we could sort by size or owner, would also be useful.
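(The "scripts sorted by owner" report Eggy asks for is basically a group-by over per-object script counts. The sketch below is purely hypothetical: no such data export exists in SL, and the object list here is made up for illustration.)

```python
# Hypothetical "sltop"-style report: scripts per owner, heaviest users first.
# sim_objects is invented sample data of (owner, object name, script count).
from collections import Counter

sim_objects = [
    ("Briana",    "mall sign",             2),
    ("SomeIdiot", "tiny transparent cube", 300),
    ("Briana",    "door",                  1),
]

script_count = Counter()
for owner, _obj_name, n_scripts in sim_objects:
    script_count[owner] += n_scripts

# most_common() sorts owners by total script usage, descending
for owner, total in script_count.most_common():
    print(f"{owner}: {total} scripts")
```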
|
|
Mark Linden
Funky Linden Monkey
Join date: 20 Nov 2002
Posts: 179
|
11-20-2004 17:16
Eggy, we know. Believe me, if you guys want it AND we also want it, there are some pretty good reasons why we haven't done it yet. It's not as easy as it first appears.  M
|
|
Strife Onizuka
Moonchild
Join date: 3 Mar 2004
Posts: 5,887
|
11-21-2004 04:43
From: Ian Linden ...and things tend to fragment over time. (Case in point: in another thread, people thanked us for speeding up the sims w/ 1.5.6. We didn't. What we did do though, is restart all the sims, which cleared out lots of cobwebs.) ... if cobwebs are an issue then why not have the sims reboot every now and then when there isn't anyone around? Early Monday & Tuesday mornings are a great time.
_____________________
Truth is a river that is always splitting up into arms that reunite. Islanded between the arms, the inhabitants argue for a lifetime as to which is the main river. - Cyril Connolly
Without the political will to find common ground, the continual friction of tactic and counter tactic, only creates suspicion and hatred and vengeance, and perpetuates the cycle of violence. - James Nachtwey
|
|
Mark Linden
Funky Linden Monkey
Join date: 20 Nov 2002
Posts: 179
|
11-22-2004 09:00
From: Strife Onizuka ... if cobwebs are an issue then why not have the sims reboot every now and then when there isn't anyone around? Early monday & tuesday mornings are a great time. We've thought about this; the problem is, though, that the sims that tend to need it the most just aren't empty for long enough to make this a viable solution.
|
|
Alicia Eldritch
the greatest newbie ever.
Join date: 13 Nov 2004
Posts: 267
|
12-29-2004 21:18
how about making Intra-Sim clusters?
like, I know the architecture and concept doesn't support sim-to-sim load distribution ... yet. (though I think it will have to at some point...)
but how about taking some of the older machines and putting them together, and running some load-balancing solution on them, within the same sim?
|
|
Strife Onizuka
Moonchild
Join date: 3 Mar 2004
Posts: 5,887
|
12-30-2004 02:46
I think this could be done by offloading asset serving for some of the avatars to the other machine: that being the objects, textures, sounds, and everything else that needs to be sent to the user. Physics and scripts would be best done by a single machine; scripts could also be offloaded, but that would add latency to script-based physics functions.
_____________________
Truth is a river that is always splitting up into arms that reunite. Islanded between the arms, the inhabitants argue for a lifetime as to which is the main river. - Cyril Connolly
Without the political will to find common ground, the continual friction of tactic and counter tactic, only creates suspicion and hatred and vengeance, and perpetuates the cycle of violence. - James Nachtwey
|