Welcome to the Second Life Forums Archive

These forums are CLOSED.

Server Release Beta: General Discussion

Abigail Merlin
Child av on the loose
Join date: 25 Mar 2007
Posts: 777
12-30-2008 21:32
Yes, I noticed lots of L3 outages when I was running my IRC network too; they have been having trouble for decades.
On another note, how is the QA of 1.25 coming along? Any idea when we can expect 1.25 to be on aditi for testing, and when the pilot might be if all goes well?
Prospero Linden
Linden Lab Employee
Join date: 6 Aug 2007
Posts: 315
01-04-2009 07:37
I expect to get 1.25 back on aditi on Monday or Tuesday. I'll post something in the 'announcements' thread of this forum when that happens.
Maggie Darwin
Matrisync Engineering
Join date: 2 Nov 2007
Posts: 186
01-05-2009 13:55
From: Prospero Linden
Yesterday's outage had nothing to do with the rolling restart. There was an outage with L3.

Didn't say it did.

The question I was asking was about a possible correlation between rolling restarts and database server load.

Intuitively it seems to me that restarting every region server on the grid, even over a few days' time, would tend to increase load on the asset servers as each region rediscovers the properties of the objects rezzed in it (unless they do a very good job of caching that data locally, in a cache that survives a restart with a different server version).

I wasn't sure if this idea referred to a well-known and well-understood phenomenon, was one that betrays my ignorance of the various architectures involved, was a reasonable unconfirmed suspicion, or was a question nobody had asked before.

If it's a brand new idea, building a timeline of the grid status reports showing RRs and asset server congestion might be interesting. Unless LL has better data than that internally, of course.
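A minimal sketch of the timeline comparison proposed above, assuming the grid status reports have already been reduced to two hourly series: the number of regions restarted in each hour, and a 0/1 flag for whether asset-server congestion was reported that hour. The function names and the hourly layout are illustrative assumptions; nothing here reflects data Linden Lab publishes. The `lag` parameter allows for congestion that shows up an hour or two after a batch of restarts.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points each")
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    # Population covariance divided by the product of population standard deviations.
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def lagged_correlation(restarts, congestion, lag=0):
    """Correlate restarts in hour t with congestion reported in hour t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(restarts, congestion)
    # Drop the last `lag` restart hours and the first `lag` congestion hours to align them.
    return pearson(restarts[:-lag], congestion[lag:])
```

Even a strong coefficient would only show that the two series move together, not that the restarts cause the congestion; it would, though, be a cheap first check before asking for internal data.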

If it turns out to be a strong correlation, one might contemplate rolling a bit slower...unless infrastructure improvements endow us with a much more robust asset service. People might not moan and groan so much that their regions were being restarted if they knew the outage would be short and not likely to be accompanied by problems fetching inventory or having logins disabled "until things get better".

That always reminds me of Beavis and Butt-Head: "Uh... we're like closed or something, go away..."
Prospero Linden
Linden Lab Employee
Join date: 6 Aug 2007
Posts: 315
01-05-2009 16:48
Rolling restarts USED to place a huge load on the database. However, with the introduction of a backend piece of software called the "region conductor", that has largely gone away. It used to be that we'd push rolling restarts as fast as we could, watching the database carefully, and we'd have to back off if the database started to grumble a little. Nowadays, with the region conductor, we run the rolling restarts 1.5-2x faster than we used to, and never get a peep out of the database.

The change is because it used to be that simulators hit the database directly to find out what regions to run. During a rolling restart, that meant a LOT of simulators all hitting the database at once trying to find regions to run. Nowadays, just one process-- the region conductor-- does that, and it caches the information. The simulators all then hit the region conductor. Yeah, there's probably some added load, due to information that gets saved when regions go down, but empirically the load on the database due to a rolling restart is a LOT less than it was before server version 1.23.
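A toy model of the change Prospero describes, not Linden Lab's code: a query-counting stand-in for the database, simulators asking it directly which regions to run (the pre-1.23 path), and a single caching layer standing in for the region conductor. The class and method names (`Database`, `RegionConductor`, `regions_for`, `all_assignments`) are hypothetical. The point is only the query count: N restarting simulators cost N database queries on the direct path and one on the cached path.

```python
from typing import Dict, List, Optional


class Database:
    """Stand-in for the central database; counts every query it receives."""

    def __init__(self, assignments: Dict[str, List[str]]):
        self._assignments = assignments
        self.queries = 0

    def regions_for(self, host: str) -> List[str]:
        """One query per simulator host: the old, direct path."""
        self.queries += 1
        return list(self._assignments.get(host, []))

    def all_assignments(self) -> Dict[str, List[str]]:
        """One bulk query for the whole table: what the caching layer issues."""
        self.queries += 1
        return {host: list(regions) for host, regions in self._assignments.items()}


class RegionConductor:
    """Hypothetical caching middle layer: loads assignments once, then serves from memory."""

    def __init__(self, db: Database):
        self._db = db
        self._cache: Optional[Dict[str, List[str]]] = None

    def regions_for(self, host: str) -> List[str]:
        if self._cache is None:  # the first request triggers the only database query
            self._cache = self._db.all_assignments()
        return list(self._cache.get(host, []))


def restart_all(hosts, source) -> Dict[str, List[str]]:
    """Every restarting simulator host asks `source` which regions it should run."""
    return {host: source.regions_for(host) for host in hosts}


if __name__ == "__main__":
    table = {f"sim{i}": [f"region{i}"] for i in range(1000)}

    direct_db = Database(table)
    restart_all(table, direct_db)

    cached_db = Database(table)
    restart_all(table, RegionConductor(cached_db))

    print(direct_db.queries, cached_db.queries)  # prints "1000 1"
```

A real conductor would also need to refresh or invalidate its cache when region assignments change, and, as noted above, some state is still written when regions go down; the sketch leaves both out.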

I still monitor the database during rolling restarts-- if the database starts to overload for any other reason, of course, we want to slow down or stop the rolling restart both to avoid contributing to database overload, and to avoid the RR having trouble because the database dropped connections from it. However, with our current server software the rolling restarts are not all that bad for the database any more.
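The pacing policy in these paragraphs (roll in batches, keep an eye on the database, slow down or stop when it starts to grumble) can be sketched as a simple loop. Everything specific here is an assumption for illustration: the load metric where 1.0 means "at capacity", the thresholds, the batch size and delay, and the callbacks `restart_batch` and `db_load` are hypothetical, not Linden Lab's tooling. The pause branch covers the case Prospero mentions where the database overloads for some other reason.

```python
import time
from typing import Callable, List, Sequence


def rolling_restart(
    hosts: Sequence[str],
    restart_batch: Callable[[Sequence[str]], None],
    db_load: Callable[[], float],
    batch_size: int = 10,
    base_delay: float = 60.0,
    slow_threshold: float = 0.7,
    pause_threshold: float = 0.9,
    max_pauses: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Restart `hosts` in batches, pacing the roll by database load.

    `db_load()` is a hypothetical metric where 1.0 means "at capacity".
    Returns the list of waits actually taken, which makes the pacing testable.
    """
    waits: List[float] = []
    consecutive_pauses = 0
    i = 0
    while i < len(hosts):
        load = db_load()
        if load >= pause_threshold:
            # The database is overloaded (for any reason): restart nothing this round.
            consecutive_pauses += 1
            if consecutive_pauses > max_pauses:
                raise RuntimeError("database load stayed high; stopping the rolling restart")
            sleep(base_delay)
            waits.append(base_delay)
            continue
        consecutive_pauses = 0
        restart_batch(hosts[i:i + batch_size])
        i += batch_size
        # Elevated but tolerable load: keep rolling, but wait twice as long.
        delay = base_delay * 2 if load >= slow_threshold else base_delay
        sleep(delay)
        waits.append(delay)
    return waits
```

Injecting `sleep` and `db_load` as parameters keeps the loop testable without real hosts; `max_pauses` stops the roll entirely instead of waiting forever on a database that never recovers.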
Maggie Darwin
Matrisync Engineering
Join date: 2 Nov 2007
Posts: 186
01-05-2009 17:23
From: Prospero Linden
...empirically the load on the database due to a rolling restart is a LOT less than it was before server version 1.23...


OK, thanks for the details.

Just goes to show you how performance superstitions start...I actually remember the rolls of servers before 1.23, and it seemed to me the hit was noticeable out on the grid back then.

Now I feel old, and I only had my first rezday back in November. :-)
Bane Darrow
Registered User
Join date: 23 Apr 2006
Posts: 21
01-05-2009 23:05
From: Prospero Linden
Rolling restarts USED to place a huge load on the database...


Maybe a dumb question, but why are there even database problems to begin with? I mean, I know the answer I'm going to get is 'SL is massively complex and huge and so many parameters, yadda yadda'... but honestly... It's not like there aren't bigger and hotter databases out there. These scaling issues have been solved many times over in many different scenarios.

It's one of my giant frustrations with SL; in the end, these problems just -aren't- that hard despite the responses I'm going to get to this. 100,000 simultaneous users is not rocket science.

Anyway, I promise not to get into a flame war over the 'You're wrong, it's really hard' answer; I've already been there. :) Shoulda hired me. ;)
Prospero Linden
Linden Lab Employee
Join date: 6 Aug 2007
Posts: 315
01-06-2009 07:51
Point me to another service that has 70,000 concurrent users interacting in real time with a virtual world where they have access to the kind of data that is represented by the assets, presence information about their friends, the ability to modify their environment in a persistent way, the ability to chat with individuals and groups, and all the other sorts of things you can do here.

Oh, and, remember, it's all a single world with everybody online in the same shared experience; no need to make sure that you log into the right shard or "server" if you want to interact with your friends and with the specific events you're looking for.

Also remember that all of these things stay consistent with each other.

Really, when you folks step forward and say, "it's not that hard!", you seem to forget that nobody else really HAS done this before.
Maggie Darwin
Matrisync Engineering
Join date: 2 Nov 2007
Posts: 186
01-06-2009 08:25
From: Prospero Linden
Really, when you folks step forward and say, "it's not that hard!", you seem to forget that nobody else really HAS done this before.
That much is true. There are some technology decisions that I think represent a legacy burden in retrospect (using C++ and MySQL, UDP traffic to the viewer), but the scope of what LL is doing really is unprecedented. Most of those decisions were probably the right ones at the time, given the rather low probability that we'd find ourselves where we are today.

I wouldn't worry much about an opinion that grossly understates the scope of the problem and closes with "you should have hired me". ;-)
Meade Paravane
Hedgehog
Join date: 21 Nov 2006
Posts: 4,845
01-06-2009 08:48
From: Prospero Linden
Rolling restarts USED to...

TY for the info, Prospero!

I always like to see Lindens talk about how stuff works - tons of stuff goes on back at the lab that we residents are just totally not clued in to.

Other posts from you (or other Lindens) about how stuff works or what's coming down the pipe or whatever would not be unwelcome. In your copious spare time, of course. :)
_____________________
Tired of shouting clubs and lucky chairs? Vote for llParcelSay!!!
- Go here: http://jira.secondlife.com/browse/SVC-1224
- If you see "if you were logged in.." on the left, click it and log in
- Click the "Vote for it" link on the left
Escort DeFarge
Together
Join date: 18 Nov 2004
Posts: 681
01-06-2009 13:59
From: Prospero Linden
Really, when you folks step forward and say, "it's not that hard!", you seem to forget that nobody else really HAS done this before.

You folks? Hmm, does that include me folks? :) Anyways...

I do think that residents often forget that:
- SL is a really hard, really complex, sophisticated and innovative system.
- LL do respond to us and visibly work hard on it.

I do think that LL sometimes forget that:
- Committed residents pay from about 200 USD to over 1000 USD *per month* for the service, which is a substantial outlay.
- When things go wrong in SL it almost always has a real-world value loss for those residents, or at the very least a significant loss of their time (and time is money).

Yours respectfully,
/esc
_____________________
http://slurl.com/secondlife/Together
Abigail Merlin
Child av on the loose
Join date: 25 Mar 2007
Posts: 777
01-25-2009 21:35
Now that we have 1.25.4 up and running across the grid, is there a 1.25.5 lined up, or will we go to 1.26 for the next testing?
Prospero Linden
Linden Lab Employee
Join date: 6 Aug 2007
Posts: 315
01-26-2009 06:10
We'll be deciding today or tomorrow if we need to do 1.25.5 (for bug fixes and for some more database load mitigation measures) first.

However, before we go to 1.26, we have to convert our servers from Sarge to Etch, and there will be QA and testing associated with that. As such, I don't expect 1.26 beta testing to begin for a week or two at least.
Rand Charleville
Helpy Helperton!
Join date: 6 Dec 2007
Posts: 3
I <3 SL!
01-26-2009 07:26
Not sure how many other product developers and scripters I speak for, as most of them may not be aware of the List slowdowns yet. However, we'd sure appreciate a 1.25.5 if it could correct the problem.

As I understand it (I'm not a scripter, but I have a small team of them helping me with a new product), there are two separate issues with Lists that have arisen since 1.25 was released: (a) the reduced temporary memory fix, which causes stack/heap errors in scripts that used to take advantage of the extra memory "gift", and (b) a general slowdown in List operations by roughly a factor of 10.

If you really can't fix (a) because of other internal reasons which have been hinted at, at least fix (b) if you can. Or, if I'm off base in my understanding of this 2-part problem, then Prospero please help us understand that we'll need to look for alternate ways of coding, not using Lists if possible.

Thanks!
Maggie Darwin
Matrisync Engineering
Join date: 2 Nov 2007
Posts: 186
01-26-2009 07:59
From: Prospero Linden
However, before we go to 1.26, we have to convert our servers from Sarge to Etch....
It should probably be mentioned at this point that "Sarge" and "Etch" are names of Debian Linux releases.
_____________________

New Grayson charter: http://tinyurl.com/3cvdpr
Maggie Darwin
Matrisync Engineering
Join date: 2 Nov 2007
Posts: 186
01-26-2009 08:06
From: Rand Charleville
(a) reduced temporary memory fix which causes stack/heap errors in scripts that used to take advantage of the extra memory gift..

If you overdraw your bank account and they don't notice it right away, when they do notice they're unlikely to be amused by your referring to it as a "gift". :-)

That said, it's tough for a scripter to accurately measure script memory usage, so "did it crash" is the best guideline we have sometimes.

Take a look at http://wiki.secondlife.com/wiki/LlGetFreeMemory to see what I mean. We need better instrumentation here.
Sindy Tsure
Will script for shoes
Join date: 18 Sep 2006
Posts: 4,103
01-26-2009 08:09
From: Maggie Darwin
It should probably be mentioned at this point that "Sarge" and "Etch" are names of Debian Linux releases.

This is the move to 64-bit sims?
Maggie Darwin
Matrisync Engineering
Join date: 2 Nov 2007
Posts: 186
01-26-2009 11:54
From: Rand Charleville
...please help us understand that we'll need to look for alternate ways of coding, not using Lists if possible.


Um, with no structs and no arrays, I don't think "not using Lists" is a viable solution to any problem.

LSL is 'way old-and-creaky... we can has new script language pleez? Now that there's a Mono engine, something with (most of) the existing API calls and a more robust architecture would be A Very Good Thing.
Darien Caldwell
Registered User
Join date: 12 Oct 2006
Posts: 3,127
01-27-2009 14:16
I don't mind LSL. They could throw in arrays, case statements, and a few other goodies while maintaining backward compatibility with old LSL. I could live with that.
Rand Charleville
Helpy Helperton!
Join date: 6 Dec 2007
Posts: 3
01-28-2009 11:21
Hey Prospero & Team. :)

It has been a couple of days and we're on the edge of our seats. Besides the reduction of memory (which is disappointing but workable), the severe slowdown of some List operations is a real killer.

Can you please give us an indication of whether or not you are going to do another 1.25 rollout to deal with this? If not, then we'll need to trash Lists and re-implement using strings or some other means. Our product just won't be as elegant as we hoped it would be.

Thanks much!
Cappy Frantisek
Open Source is the Devil!
Join date: 27 Oct 2006
Posts: 400
01-28-2009 11:33
Hey we made it a whole two weeks before another rolling retard, uhm, restart!
Caelib McCallen
Registered User
Join date: 26 May 2008
Posts: 9
01-28-2009 17:59
Any update on when the database mitigation/throttling will be diminished? Having a hard time doing business under these constraints.
Prospero Linden
Linden Lab Employee
Join date: 6 Aug 2007
Posts: 315
01-28-2009 18:00
That's *retart*

Re: etchification, this is *sort of* the move to 64-bit sims. That is, we've had 64-bit sims for a very long time -- we were using a 64-bit Sarge image on the actual sim nodes. However, we've been building the code as 32-bit binaries, and that continues through 1.25. Starting with 1.26 (which is *after* Etch), we will begin building 64-bit binaries. I suspect nobody will notice much :) But if you want to perceive that things are a lot faster when that comes, feel free :)
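The distinction here, a 64-bit OS image running 32-bit binaries, is easy to see from inside any process: pointer width is a property of the compiled binary, not of the operating system it runs on. A small Python illustration; it inspects whatever interpreter runs it, not a simulator, and the function names are made up for the example.

```python
import platform
import struct


def process_bits() -> int:
    """Pointer width of the running binary: 32 for a 32-bit build, 64 for a 64-bit build."""
    return struct.calcsize("P") * 8


def os_machine() -> str:
    """Architecture string the kernel reports, e.g. 'x86_64' (empty if it cannot be determined)."""
    return platform.machine()


if __name__ == "__main__":
    # A 32-bit interpreter on a 64-bit Linux kernel typically reports 32 here while
    # os_machine() still says 'x86_64': the same split as 32-bit sim binaries
    # running on a 64-bit OS image.
    print(f"binary: {process_bits()}-bit, OS reports: {os_machine() or 'unknown'}")
```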

Re: list processing, yes, we're going to revert the scheduler to the pre-1.25 behavior, and then for a future release we'll figure out the right way to fix the scheduler problems. There will be more communication from us about that on forums, in the JIRA, in the Beta group, etc.

The reversion of the scheduler is now deployed to the aditi "Second Life Beta Server" sims. Go over to aditi and try your list stuff out! Drop by the Beta office hours tomorrow afternoon and let us know how it went.
Prospero Linden
Linden Lab Employee
Join date: 6 Aug 2007
Posts: 315
01-28-2009 18:01
Database mitigation/throttling? What are you talking about?
Caelib McCallen
Registered User
Join date: 26 May 2008
Posts: 9
01-28-2009 19:52
From: Prospero Linden
Database mitigation/throttling? What are you talking about?


Maybe I'm mistaken, but I do believe this was explained in detail as the new strategy for "insuring the user experience" without having to block logins when the database is taxed. Pretty sure I saw it in the blog. Ever since the release of that information there has been a noticeable performance issue, especially with scripted objects within the game. I'm still having various other annoying issues, but that's not the point. The question was pointed, and I hope the information I just posted will help bring an answer.
Rand Charleville
Helpy Helperton!
Join date: 6 Dec 2007
Posts: 3
01-29-2009 09:54
Woo Hoo!!!

Thanks Prospero, you've just made several of us very relieved! We'll follow your suggestion and let you know about any issues during your Beta hours. Liandra Ceawlin will represent our team.

From: Caelib McCallen
...Ever since the release of that information there has been a noticeable performance issue, especially with scripted objects within the game...


Caelib, what you have been noticing with respect to performance issues in existing scripted content may very well be related to JIRA issue SVC-3679 (http://jira.secondlife.com/browse/SVC-3679), which many of us have been concerned about. Seems like Prospero & company are doing the right thing and taking care of us.