Welcome to the Second Life Forums Archive

NOW what is the problem???

Foolish Frost
Grand Technomancer
Join date: 7 Mar 2005
Posts: 1,433
04-18-2006 08:25
You know, funny thing about technical knowledge:

I could tell the problem was with a database the moment I saw the symptoms, and also could tell that it was terribly slow and occasionally giving wrong data. It was not fully down, and WAS occasionally responding.

So, I knew what the problem was just from the symptoms, and had no real need for any further information.

Now, some of you want to know what is wrong: A database was having problems.

<shrugs>

Does that information TELL us anything? Does it make the situation better in some way? Do we have the knowledge and troubleshooting skills to use that data in a useful way? Can we help with that data even if we understood it?

Nope.

Sorry. But the problem here is that it ALWAYS comes down to "Something broke. Please wait while we fix it." Any further data is just them being chatty.
It's not a bad trait, but does nothing to help us, or even really inform us of anything.
Geepa Lazarno
Registered User
Join date: 7 Apr 2006
Posts: 61
04-18-2006 09:08
It's always easier to be patient if you have at least some idea of how long the wait will be.

For this reason, it is best if those who are fixing the problem give out their best estimate on how long it will take, perhaps padding it a little to compensate for Murphy's inevitable meddling, and if they find the task more difficult or time-consuming than was first thought, they can revise this.

I don't think many people are expecting a hard, fast, absolutely certain time for repair, just an idea of what to expect.

If the problem is of such a nature that the programmers don't have any confidence in how long it will take, they may let us know that. If they want, they can then suggest when to check back for an update on the situation.

That way, we have a time frame to work with, but aren't given a false expectation.
Ketra Saarinen
Whitelock 'Yena-gal
Join date: 1 Feb 2006
Posts: 676
04-18-2006 09:40
I can tell you right now that time estimates are near meaningless when it comes to computers, especially a system as large and complex as SL. Just working on a home PC, you can come up with an estimate of how long it'll take, then you double it, and hope you hit that.

The biggest problem is that in computers, EVERYTHING is interdependent. What looks like a problem with a specific piece of software could actually stem from hardware. Or vice-versa. Honestly, it isn't until you've fixed the problem that you know how long it'll take.

Couple that with the size and breadth of SL, and its dependency on databases, and providing an estimate is nearly impossible. (The following is not directed at Foolish Frost, it is just an example) You can see what appears to be a database error, but what you are actually looking at is erroneous data. Why is the data bad? It could be a problem with the database. But it could also be a problem with the hardware reading the database. It could also be the hardware between the database server and the systems used to interpret and transmit that data. It could be the hardware involved in sending the data to you. The problem could stem from anything from an imminent catastrophic system failure, to a faulty network cable.

What makes this worse is that the interdependencies can even affect each other. So as soon as you squash one issue, you realize that there is another, which then uncovers another. There is *NEVER* a clear-cut answer to a computer failure. Diagnostics are very often the bulk of the work to be done to solve the problem. And if you don't know what is needed to fix the problem, how do you estimate the time it will take?

I don't envy the guys at LL who have to take care of these issues. The pressures they are under must be bone-crushing, and they have to do their work with as little data loss as possible. Going to a back-up is the absolute last resort in a business like this because every minute of uptime has value to each and every customer. And I have seen a flow-chart of a database that's probably about the size of SL's. It took up two entire walls of a cubicle and looked like spiders had been spinning it for 50 years.

The only complaint I have about how the recent downtime was handled was the timeliness of the announcements. Someone should be posting something every 30 minutes. The moment the issue reached an emergency level, the front webpage (since the forums are tied-in and can go down at the same time) should say something like "We know there is an issue and we are on-site at this moment." Then again, 30 minutes later, even if it's "We are still working on the problem, but we have no further information, and can't provide an estimated uptime."

As was stated earlier, even non-informative communication is helpful, if only as a comfort to those who are waiting.
Jon Marlin
Builder, Coder, RL & SL
Join date: 10 Mar 2005
Posts: 297
04-18-2006 10:14
From: Ketra Saarinen
Diagnostics are very often the bulk of the work to be done to solve the problem. And if you don't know what is needed to fix the problem, how do you estimate the time it will take?


This is absolutely true. At my office, we have a system that is just going into production. The only bugs left are the hard ones. I often spend three or four days, and sometimes a week or more diagnosing a problem that takes thirty seconds to fix.

My bosses ask me for ETAs as well. Until I know what is causing the problem, there isn't any way to give an accurate estimate.

- Jon
_____________________
Come visit Marlin Engineering at Horseshoe (222, 26) to see my line of flying vehicles.
Jessica Robertson
Registered User
Join date: 3 Dec 2004
Posts: 412
04-18-2006 10:15
From: someone
they'll provide us with ... a pony


*stamped*

Pony Linden on Webcam Dancing while SL is down.

*stamped again*

Is there a Pony Linden?? Is He Cute? If not, there should be!
Introvert Petunia
over 2 billion posts
Join date: 11 Sep 2004
Posts: 2,065
04-18-2006 10:45
From: someone
Just a supposition, but this is probably continuation of what we saw over the weekend. There was no patch, and really, reasonably, Linden couldn't be expected to engineer a restructuring that fast.

They're probably either trying to save face, or not give the little moron causing the DDOS attacks the satisfaction of acknowledgement.

This'll get solved. There just may be some annoyance before it does.
That is a charitable supposition. The asset server is metaphorically coughing up blood and it is not clear that they know how to fix it, as it is running at some huge multiple of the design expectations. One need only look at the Announcements over the last month. I expect that this will continue and worsen, DoS attacks or not.
Zuleica Sartre
Registered User
Join date: 27 Sep 2005
Posts: 105
04-18-2006 13:15
Well I just logged out in disgust.

I was trying to left-click-hold and drag a selection box around a set of prims and the edit window would only say I didn't have the right to mod the object. So I couldn't link it.

Deleting groups of prims was taking phenomenally long also. Chat and IMs were coming in out of order and I was sinking into the ground continually.

I'd say the grid is pretty much screwed at this time. Live Help and Lindens weren't answering either so I imagine they're all busy.

Seems to me if SL is going to have much of a life of its own then LL is going to have to deal with problems like down time, attacks, DB integrity and horrible lag and texture load times.

Right now though it seems to be getting worse each month, not better.
Foolish Frost
Grand Technomancer
Join date: 7 Mar 2005
Posts: 1,433
04-23-2006 05:12
Gee.

It got better.

:D