Welcome to the Second Life Forums Archive


Second Life Grid Update from FJ Linden

Katt Linden
Senior Member
Join date: 31 Mar 2008
Posts: 256
01-12-2009 14:00
Back with the monthly grid update. It's been a bumpy few weeks, with Level 3 outages and central database issues [http://status.secondlifegrid.net/]. The good news is that LLnet (our data center fiber network) is still ahead of schedule, and we should start migrating traffic within the next week. We've also made some headway on asset storage. Right now, the central database is our core focus; it has been at the center of most of the recent grid problems.

LLnet
The benefits of LLnet are not only to end our dependency on VPNs for inter-data-center traffic, but also to lay the foundation for diverse internet providers, which will let us handle an outage at a single provider (currently Level 3) and potentially improve latency. Most of our widespread and highest-impact outages have been network related, which is why LLnet has been my top priority since joining Linden Lab this past summer. I expect final testing to be complete by the end of January, with production traffic cutover immediately after.

Improving Asset Storage
In the meantime, we have also been working to significantly reduce load on the Isilon storage clusters. Last month I said we would discuss this further, so I want to touch on our storage strategy. We've actually been running a tiered storage environment for a number of months. The Isilons are our primary storage, for assets that are accessed on a more regular basis.

As you may imagine, however, most assets are accessed either very infrequently or not at all. To determine how often assets are used, we've been running a detailed "collection" process. This process identifies those rarely used (or dead) assets and moves them to bulk storage, off the primary Isilon hardware. This is critical to the stability of the Isilons: we have been pushing the storage limits of these clusters, and a large number of assets in the "not frequently accessed" category have been taking up critical capacity. Moving these to bulk storage will not only give us the necessary headroom and improved reliability, it will also place each asset on the right type of storage for its usage. We've also been using file compression on the Isilons as a "mid-tier" storage category, which keeps assets on the Isilons for faster access while minimizing the space they actually use.
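To make the three tiers concrete, here is a minimal sketch of how a collection pass might sort assets into primary Isilon storage, the compressed "mid-tier" on the Isilons, and bulk storage, based only on time since last access. The `Asset` record, the `Tier` names and both thresholds are hypothetical illustrations, not Linden Lab's actual values or code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    PRIMARY = "primary"        # uncompressed on the Isilon clusters
    COMPRESSED = "compressed"  # compressed but still on the Isilons ("mid-tier")
    BULK = "bulk"              # moved off the Isilons to bulk storage


@dataclass
class Asset:
    asset_id: str
    last_access: datetime


# Hypothetical thresholds, chosen for illustration only.
COMPRESS_AFTER = timedelta(days=14)
BULK_AFTER = timedelta(days=60)


def choose_tier(asset: Asset, now: datetime) -> Tier:
    """Pick a storage tier from how long ago the asset was last read."""
    idle = now - asset.last_access
    if idle > BULK_AFTER:
        return Tier.BULK
    if idle > COMPRESS_AFTER:
        return Tier.COMPRESSED
    return Tier.PRIMARY


def collection_pass(assets, now):
    """Group asset IDs by the tier each one should live on."""
    plan = {tier: [] for tier in Tier}
    for asset in assets:
        plan[choose_tier(asset, now)].append(asset.asset_id)
    return plan
```

A real collection process would also have to account for assets that are referenced but not read directly; this sketch looks at read times alone.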

HTTP Dataserver and Agent Inventory Services
A quick update on a couple of our data access layer projects: HTTP Dataserver and Agent Inventory Services. Both projects are close to completion. As you may recall from my previous posts, we are trying to simplify messaging protocols between the Simulators and back-end databases (HTTP Dataserver), as well as messaging from databases to the viewer (Agent Inventory Services). Implementation depends on a central server code release that we expect to deploy by the end of January, with these two projects following in February. Also, expect a follow-up blog post from one of our infrastructure leads, Sardonyx Linden, with more details on the direction of our data architecture.

Read about the Central Database in January post
Finally, I've purposely not addressed our database issues, as I want to spend the January update on that component of our infrastructure. Our central database has been a source of instability over the past few weeks, and we have been spending considerable time investigating root causes. Given the complicated nature of the service, none of these issues has been easy to identify, but I expect we will have answers over the next few weeks, and I'll comment on the issue in the forum thread.

Frank

(note, you can read the original post on the blog, at blog.secondlife.com) The original blog post is: http://blog.secondlife.com/2009/01/12/second-life-grid-update-from-fj-linden/
Katt Linden
Senior Member
Join date: 31 Mar 2008
Posts: 256
01-12-2009 14:03
Please post your Grid related questions for FJ Linden, who will be stopping in to review and answer them as his work day allows, today!
Meade Paravane
Hedgehog
Join date: 21 Nov 2006
Posts: 4,845
01-12-2009 14:07
From: FJ Linden via Katt Linden
...This process identifies those rarely used (or dead) assets and moves them to bulk storage

Can we get an official "we plan to never delete these rarely used assets" from LL please? I've got stuff - really, really expensive stuff - that I don't drag out very often. It'd be great to clearly hear that LL has every intention of keeping these assets around for some time that approaches 'forever'.. (edit: and I'm sure that's the plan. I'd just like to hear it said)

From: FJ Linden via Katt Linden
...as well as messaging from databases to the viewer...

/me hopes that the 3rd party viewer devs are getting a heads up on what they're going to have to do here, too!

TY, Frank! (& Katt)

edit: speaking of assets, I've sometimes wondered how much space is being taken up by cubes that are named 'Object' and have 100% default params... While keeping my "don't delete my good stuff" statement above firmly in mind, would a project to prune (or at least measure) all the SL litter be interesting or even worthwhile?
_____________________
Tired of shouting clubs and lucky chairs? Vote for llParcelSay!!!
- Go here: http://jira.secondlife.com/browse/SVC-1224
- If you see "if you were logged in.." on the left, click it and log in
- Click the "Vote for it" link on the left
Maggie Darwin
Matrisync Engineering
Join date: 2 Nov 2007
Posts: 186
01-12-2009 14:09
I'd like to ask (very roughly) how long an asset might go unreferenced before being destaged to the bulk storage, and how long (again, roughly) it takes to revive an asset that has been moved to bulk.

What would an end-user see in a circumstance like that?

By the way, as an IT pro myself, I'm glad to see FJ here, it sounds to me like he has his people working on the right stuff.
Sylvie Grizot
Lurks
Join date: 6 Dec 2006
Posts: 24
01-12-2009 14:32
From: Maggie Darwin
... how long (again, roughly) it takes to revive an asset that has been moved to bulk.

What would an end-user see in a circumstance like that?

By the way, as an IT pro myself, I'm glad to see FJ here, it sounds to me like he has his people working on the right stuff.
That was my first thought as well - how much delay or what impact/effect will we see on loading Inventory or searching it?

And, like Maggie I'm pleased to see what you're focussing on, it seems like these are the right things to be working on. In general, I think asset and transaction issues have been better in the last weeks, apart from the issues when there's been over 70k online (like every night for the last week! ;))
Kimo Junot
Registered User
Join date: 12 Feb 2006
Posts: 29
01-12-2009 14:32
I think it is pretty simple to figure out the asset server problems and other issues you all have been having.
SL runs ok during the week...but when the weekend comes (and this has been going on for several years now) SL runs like all hell has broken loose at LL.
This only seems to happen on the weekends, when I am sure LL is running a bare-bones staff and the user level is 70K+. What it all seems to point to is that whatever servers you are using can't handle the load of 70K+ people logged into SL at once.
Now..was that hard to figure out? lol
Kittrannia Cassini
Registered User
Join date: 20 Nov 2006
Posts: 18
01-12-2009 14:33
not sure if this is the right place to ask but here goes.....

With all the "Grid Status" messages lately advising people not to purchase items while the servers are being bad (Btw Katt....Is it so hard to have In-World notices on this?), isn't it about time you coded transactions to ONLY go through once the item has been delivered?

Do this and all business owners will be forever grateful :)
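The ordering being requested here can be sketched as a hold-then-commit purchase: reserve the buyer's L$, attempt delivery, and pay the seller only if delivery succeeds, otherwise release the hold. This is a minimal illustration under our own assumptions; `Ledger`, `purchase` and the in-memory balances are hypothetical and say nothing about how Linden Lab's transaction system actually works.

```python
class InsufficientFunds(Exception):
    pass


class Ledger:
    """Toy in-memory ledger; balances are in L$."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.held = {}  # buyer -> amount reserved but not yet paid out

    def hold(self, buyer, amount):
        if self.balances.get(buyer, 0) < amount:
            raise InsufficientFunds(buyer)
        self.balances[buyer] -= amount
        self.held[buyer] = self.held.get(buyer, 0) + amount

    def commit(self, buyer, seller, amount):
        # Delivery succeeded: move the held funds to the seller.
        self.held[buyer] -= amount
        self.balances[seller] = self.balances.get(seller, 0) + amount

    def release(self, buyer, amount):
        # Delivery failed: give the held funds back to the buyer.
        self.held[buyer] -= amount
        self.balances[buyer] += amount


def purchase(ledger, buyer, seller, price, deliver):
    """Charge the buyer only if deliver() returns True without raising."""
    ledger.hold(buyer, price)
    try:
        delivered = bool(deliver())
    except Exception:
        # Any delivery error (e.g. an asset server timeout) counts as a failure.
        delivered = False
    if delivered:
        ledger.commit(buyer, seller, price)
    else:
        ledger.release(buyer, price)
    return delivered
```

The point of the hold is that a failed delivery leaves both balances exactly where they started, so "don't buy anything right now" warnings would matter less.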
Sascha Vandyke
Bad Karma
Join date: 18 Jan 2007
Posts: 52
01-12-2009 14:38
I like the idea; it sounds to me like hierarchical storage management. I only hope the implementation works well and is well thought out. And of course the question is whether inventory items could be deleted if not used for a long time, or whether there is a backup solution for these items. I have some items I rezzed maybe a year ago that are still expensive. I wouldn't mind waiting a bit for them to rez if they have to be fetched from bulk storage, as long as they don't go missing again (which would mean I lose money again, too).
_____________________
If there's a bug I'll get it.
Ciaran Laval
Mostly Harmless
Join date: 11 Mar 2007
Posts: 7,951
01-12-2009 15:09
How do you identify a little used asset? Once an asset is rezzed inworld, a building for example, then it may seem to be little used, so my question really is are you only looking for items that aren't rezzed inworld?
Argent Stonecutter
Emergency Mustelid
Join date: 20 Sep 2005
Posts: 20,263
01-12-2009 15:13
Items rezzed inworld are on the sim, not the asset server. They reference items in the asset server, like textures and prim contents, and by doing so they keep those assets from being "little used".
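Argent's point can be sketched as a reference check: before an idle asset becomes a bulk-storage candidate, drop any asset that a rezzed object still points at as a texture or as contents. The `RezzedObject` model and both functions are hypothetical; the thread does not say how the collection process actually tracks references.

```python
from dataclasses import dataclass, field


@dataclass
class RezzedObject:
    """An object sitting on a region, plus the asset IDs it points at."""
    object_id: str
    texture_ids: list = field(default_factory=list)
    content_ids: list = field(default_factory=list)


def referenced_assets(objects):
    """Collect every asset ID referenced by objects currently rezzed in-world."""
    refs = set()
    for obj in objects:
        refs.update(obj.texture_ids)
        refs.update(obj.content_ids)
    return refs


def bulk_candidates(idle_asset_ids, objects):
    """Return idle assets that no rezzed object references, sorted by ID."""
    in_use = referenced_assets(objects)
    return sorted(a for a in idle_asset_ids if a not in in_use)
```

For example, a texture used only by a house that has stood untouched for a year would still be excluded, because the house references it.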
_____________________
Argent Stonecutter - http://globalcausalityviolation.blogspot.com/

"And now I'm going to show you something really cool."

Skyhook Station - http://xrl.us/skyhook23
Coonspiracy Store - http://xrl.us/coonstore
Ciaran Laval
Mostly Harmless
Join date: 11 Mar 2007
Posts: 7,951
01-12-2009 15:18
Thanks Argent, that makes sense regarding the textures having to be referenced.
FJ Linden
Linden Lab Employee
Join date: 29 Jul 2008
Posts: 8
01-12-2009 16:18
Hi. Thanks for the feedback. Couple of responses to questions:

- No plans to delete assets on bulk storage
- Assets that have not been accessed for more than 2 months are eligible for movement.
- You should see no substantial performance impact if you retrieve an asset from bulk storage; after a call to bulk storage, we'll tag that asset and cache it back to the primary storage environment.
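The policy in these three points can be sketched as a read-through store: reads check primary storage first, fall back to bulk storage on a miss, and copy the asset back to primary so later reads are fast. Plain dicts stand in for the Isilons and bulk storage, "2 months" is approximated as 60 days, and the bulk copy is never deleted; all of these are our assumptions for illustration.

```python
import time

TWO_MONTHS = 60 * 24 * 3600  # "more than 2 months", approximated as 60 days


class TieredAssetStore:
    def __init__(self):
        self.primary = {}      # asset_id -> bytes (stand-in for the Isilons)
        self.bulk = {}         # asset_id -> bytes (stand-in for bulk storage)
        self.last_access = {}  # asset_id -> unix timestamp of last read

    def put(self, asset_id, data, now=None):
        self.primary[asset_id] = data
        self.last_access[asset_id] = time.time() if now is None else now

    def get(self, asset_id, now=None):
        now = time.time() if now is None else now
        if asset_id in self.primary:
            data = self.primary[asset_id]
        else:
            # Miss on primary: read from bulk (KeyError if the asset is in
            # neither tier), then cache it back to primary. The bulk copy stays.
            data = self.bulk[asset_id]
            self.primary[asset_id] = data
        self.last_access[asset_id] = now
        return data

    def demote_idle(self, now=None):
        """Move assets not read for more than TWO_MONTHS off primary storage."""
        now = time.time() if now is None else now
        idle = [a for a in self.primary if now - self.last_access[a] > TWO_MONTHS]
        for asset_id in idle:
            self.bulk[asset_id] = self.primary.pop(asset_id)
        return idle
```

The first read after demotion pays the bulk-storage latency once; every read after that is served from primary until the asset goes idle again.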

I also wanted to quickly talk about the weekend problems we have been having. The core problem is our central database becoming overloaded with queries. As mySQL heats up, we have determined that the best course of action (meaning fastest recovery time with lowest resident impact) is to block logins. This lets the database recover more quickly than crashing and then restarting. We've got a number of immediate steps underway to address the load problems and manage our way through our new peaks (now above 80K), and I'll be watching closely to make sure these fixes are implemented as quickly as possible. I know it's been difficult to deal with, and I appreciate how patient you have been (for a very long time). This has my immediate attention, as well as the attention of the entire engineering and operations team.

But I want to reinforce that the login blocks are not a factor of concurrency load per se. Rather, it has been our triage process to address database query spikes and allow the database to recover without crashing.
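The triage described here, blocking logins when query load spikes so the database can recover, is a form of load shedding. A common way to build such a gate is with hysteresis: close logins above a high-water mark and reopen only once load drops below a lower mark, so the gate does not flap on small fluctuations. The metric (queries per second) and both thresholds below are hypothetical; the post does not say what signal Linden Lab actually uses.

```python
class LoginGate:
    """Block new logins while database load is high; reopen once it recovers."""

    def __init__(self, close_above, reopen_below):
        if reopen_below >= close_above:
            raise ValueError("reopen_below must be lower than close_above")
        self.close_above = close_above
        self.reopen_below = reopen_below
        self.open = True

    def observe(self, db_queries_per_sec):
        """Update the gate from the latest load sample; return True if logins are allowed."""
        if self.open and db_queries_per_sec > self.close_above:
            self.open = False
        elif not self.open and db_queries_per_sec < self.reopen_below:
            self.open = True
        return self.open
```

Note that the gate reacts to query load, not to the number of residents online, which matches the distinction drawn in the post above.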
Escort DeFarge
Together
Join date: 18 Nov 2004
Posts: 681
01-12-2009 16:34
From: FJ Linden
As mySQL heats up...

Well now, that answers one question I asked a good long time ago in SVC-1149. A Jira that may possibly be worth a read as it relates to "things to watch out for" with MySQL and data integrity...!

Keep up the excellent work, Frank - you always talk a good deal of sense and I strongly expect LLnet to make a very substantial difference to all of our (Second) lives!

/esc
_____________________
http://slurl.com/secondlife/Together
Ciaran Laval
Mostly Harmless
Join date: 11 Mar 2007
Posts: 7,951
01-12-2009 16:37
From: FJ Linden
But I want to reinforce that the login blocks are not a factor of concurrency load per se. Rather, it has been our triage process to address database query spikes and allow the database to recover without crashing.


What sort of queries place excessive load on the database servers and are there any steps residents can take to lessen that load?
Sindy Tsure
Will script for shoes
Join date: 18 Sep 2006
Posts: 4,103
01-12-2009 16:54
From: Ciaran Laval
What sort of queries place excessive load on the database servers and are there any steps residents can take to lessen that load?

I always want to ask that but it comes out sounding like "what can I do to screw up the grid.. I just want to know what to avoid. really. honest." and end up not asking.. :\

Can I have a bear, Frank???
Bob Bunderfeld
Builder Extraordinaire
Join date: 10 Apr 2003
Posts: 423
01-12-2009 18:16
I can understand the need to restrict logins during a time when the CORE Database is experiencing issues; but I wonder about the fairness of the blind restricting of logins.

In this day when we see thousands of bots logged in and even free logins, how is it fair to restrict a paying customer from a service they are paying to receive?

I know it's not an easy task, but considering that this issue has been on the rise for quite a while, shouldn't you be considering a plan to either remove bots or even logging out free accts., during these peak usage times, so paying customers have a better chance to receive the service they are paying for?
_____________________
Bob "The Builder" Bunderfeld

"There could be a 13 year old Genius out there smarter than I am." - Blake Rockwell
Selene Gregoire
Eyes of the Wolf
Join date: 14 Sep 2005
Posts: 681
01-12-2009 18:29
From: Bob Bunderfeld
I can understand the need to restrict logins during a time when the CORE Database is experiencing issues; but I wonder about the fairness of the blind restricting of logins.

In this day when we see thousands of bots logged in and even free logins, how is it fair to restrict a paying customer from a service they are paying to receive?

I know it's not an easy task, but considering that this issue has been on the rise for quite a while, shouldn't you be considering a plan to either remove bots or even logging out free accts., during these peak usage times, so paying customers have a better chance to receive the service they are paying for?



Have to disagree with you on this Bob. I may no longer have a premium account but I still pay tier to an individual that then turns around and pays it to LL. I buy Ls each week (directly from LL) as well. Ergo I am a paying customer even if it is indirectly.

Playing favorites is never a sound business decision. I do understand and sympathize with your thinking, but it really is wrong on several levels.
_____________________
"Half of what I say is meaningless; but I say it so that the other half may reach you."

"In the depth of my soul there is a wordless song."

Kahlil Gibran


Bob Bunderfeld
Builder Extraordinaire
Join date: 10 Apr 2003
Posts: 423
01-12-2009 18:39
I certainly never meant to imply that Free Accts. should be treated any less favorably than other Accts.

While I understand your point that Free Accts. do pay into the economy of Linden Lab in an indirect way, I fear that with Linden Lab's ineffective means of dealing with bots, the easiest way to alleviate this issue is to identify Free Accts. and log them out.

At this point I would think the best solution would be to identify alternate accts. and log out those that have been logged in the longest. This assumes that a bot, being the most important thing to its owner, would be the account logged in the longest. This assumption isn't always going to be true, but if you are running a bot, you should assume some of the risk.

If this isn't possible, the next feasible fallback is to locate the Free Accts. and gently log them out. This isn't about favoritism; it's about who's paying Linden Lab, directly or indirectly. Free Acct logout should be the last-resort plan for freeing up login space for paying customers, but if it has to be, it has to be: as a Free Acct, you must assume some risk by not being a direct paying customer of Linden Lab.

This is just what I see as a solution. It's not something I'm going to be freakishly adamant about; I'm just tossing out ideas and trying to find some way around the problem of blind login restriction.
Lyla Tunwarm
Registered User
Join date: 10 Jul 2008
Posts: 179
01-12-2009 19:12
From: Ciaran Laval
What sort of queries place excessive load on the database servers and are there any steps residents can take to lessen that load?

Any query. One is no different than another. It is just asking for data to parse. Everything we do creates them. When a bunch of people log on all at once, they will spike as people's avatars load. The more people online, the more queries stack up. I can only assume Linden Lab has optimized their indexes by now, so it may just be down to MySQL not being the best choice for what SL needs. Also, we see the problems today; what happens in 2-5 years when SL doubles in size and the asset database is 10 times as loaded?

SL needs a major restructuring plan. How it is set up now is not going to cut it if LL plans to significantly grow SL. I have a feeling that plan is already in the works, but I also bet people are not going to like it. Unlimited content creation cannot last forever.
Maggie Darwin
Matrisync Engineering
Join date: 2 Nov 2007
Posts: 186
01-12-2009 20:03
From: Sindy Tsure
Can I have a bear, Frank???

Now you want a *bear*? You already have all those *sheep*... :-)

I do concur with the point that in-world notices for red-light/green-light on "risky transactions" would be very welcome. Polling the grid status is a pain.

Wouldn't mind "logins are disabled" notices in-world too...not only so I know not to log off (unless you are hoping I will) but so I'll know that my friends may not be able to get in.

And FJ... I missed an answer about time to retrieve a bulked asset. I realize that once it's been staged back in, it will perform like other recently referenced assets. But that first reference after it's been retired to bulk...slight delay? Longer delay? It won't actually *fail*, will it?
Celierra Darling
Registered User
Join date: 11 Jun 2006
Posts: 16
01-12-2009 21:49
From: Lyla Tunwarm
Any querie. One is no different than another. It is just asking for data to parse. Everything we do creates them. ....

This isn't quite what was meant, I think. What kind of in-world *actions* create load on the *central database*, as opposed to other databases? Not all actions cause transactions on the central database, and some actions can induce more queries than others.
We might not get a detailed answer to this here (fearing DoS attacks?), but maybe some vague idea would still be helpful.
SuezanneC Baskerville
Forums Rock!
Join date: 22 Dec 2003
Posts: 14,229
01-12-2009 22:57
Most of the load on the SL end of things comes from delivering bears.
_____________________
-

So long to these forums, the vBulletin forums that used to be at forums.secondlife.com. I will miss them.

I can be found on the web by searching for "SuezanneC Baskerville", or go to

http://www.google.com/profiles/suezanne

-

http://lindenlab.tribe.net/ created on 11/19/03.

Members: Ben, Catherine, Colin, Cory, Dan, Doug, Jim, Philip, Phoenix, Richard,
Robin, and Ryan

-
Aminom Marvin
Registered User
Join date: 31 Dec 2006
Posts: 520
01-13-2009 00:09
Sculpted prims have had problems since the beginning due to how SL retrieves textures from inventory. This is because sculpted prims use lossless images, which have a larger file size per pixel. These problems continue to this day, and just making a sculpt load in a reasonable time requires detailed knowledge of how lossless JPEG 2000 compresses images, as well as hacks and workarounds.

Will we see improvement specifically for sculpt maps, which suffer from slow loading times much more than image maps? If so, what will improve it and how?

Textures in general also suffer from slow loading times; it is my opinion that many new users are discouraged when they find a world that is slow to load. Will the listed changes to SL help speed texture downloading in general as well?
Qie Niangao
Coin-operated
Join date: 24 May 2006
Posts: 7,138
01-13-2009 00:46
From: Aminom Marvin
Will we see improvement specifically for sculpt maps, which suffer from slow loading times much more than image maps?
This really is a critical question, together with the image texture download delays. Although I think I've seen some limited improvement in sculptmap downloads, rezzing times for surface textures have steadily lengthened, going back as far as I can remember. So, a couple of questions:

In practice, do sculptmaps and surface texture downloads slow each other down? Of course they share the same download bandwidth, but is that pipe often near capacity? (I see cases where sims get in trouble and show both Network and Image frame times hugely extended--but those seem to be anomalies, yet the viewer texture console shows starvation even when the sim and network appear idle.)

Would it be possible to document the detailed storage-to-viewer data path for images and sculptmaps, and for other asset classes if handled differently? It would help us understand what impact to expect from new developments, and (possibly) help residents optimize builds to avoid the bottlenecks.

This may be an impractical thing to ask, but: is the access frequency curve for assets known? That is, the "long tail" of rarely accessed assets are pushed off of the Isilons, but what about what's left on them: does 80% of access hit just 20% of assets, or is the distribution more uniform, or even more peaked? (The operations question, then, is whether Isilon is configured or self-tunes for optimal efficiency with that curve, or if there's an engineering opportunity there.)
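The 80/20 question can be answered directly from an access log: sort assets by access count and measure what share of all accesses the busiest 20% receive. A value near 0.2 means access is roughly uniform, while values near 1.0 mean it is sharply peaked. This sketch assumes a hypothetical mapping from asset ID to access count; no real Second Life figures are implied.

```python
import math


def top_share(access_counts, fraction=0.2):
    """Share of all accesses that go to the busiest `fraction` of assets."""
    counts = sorted(access_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    top_n = max(1, math.ceil(len(counts) * fraction))
    return sum(counts[:top_n]) / total
```

Computing this periodically over the assets that remain on the Isilons would show whether the curve is stable enough to tune caching around.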
Denny Clowes
Registered User
Join date: 24 May 2008
Posts: 4
01-13-2009 02:03
The latest official viewer, 1.21.6.99587, has serious issues with caching data locally, especially after it crashes and then has to re-download every item from the asset clusters to the local hard drive. It seems fair to assume that no less than 50% of all Second Life residents (excluding bots) have increased their cache limit to 1 GB of locally stored data. With that being said, isn't it understandable that these people, with this viewer, are causing massive strain on the asset servers by downloading 15,000-30,000 items/textures per logon, while also driving up bandwidth usage, which is a serious problem for users with capped download limits from their ISPs, especially in the US? Several JIRAs have already been logged on similar issues, for instance http://jira.secondlife.com/browse/VWR-9509.

Having logins locked and users kicked out (even if it's a 'quicker solution') when the in-world resident count reaches 70k+ isn't a feasible way to recruit new users and bring them into Second Life as per friendly advice, nor is it appreciated by merchants, business owners and educators who depend on reliable services to maintain their positions, just as in the real world. It would be bad if SONY headquarters suffered power outages every Sunday because too many employees came into the building and turned on the lights.

Many of us who use Second Life use it daily. What we see are thousands of alternative accounts (alts), and more often than not, these are used in-world to generate traffic for statistical purposes, inflate activity on parcels and earn higher search listings, as bots that idly stand crammed into a small piece of land only to get the numbers up.

A stronger enforcement on the use of alt/bot accounts should be implemented to take strain off the servers, and most of all, create a more stable and friendly community.

Having hundreds of users who each have 20 accounts log in only for egotistical purposes adds nothing (except increased MySQL spikes): it adds load on the sim, takes resources from the asset cluster, induces false traffic and steals listing places from others who pay dearly to improve their place in searches. Most importantly, there is no beneficial purpose in allowing users to have multiple accounts, since those accounts are not likely to purchase premium membership or even generate any income through LindeX or other in-world services, thus pushing prices up for paying members. With paying members leaving Second Life, business owners pulling out and institutions withdrawing grants, it won't be many years before the financial side of Second Life starts to crumble and fail.