Welcome to the Second Life Forums Archive

Equipment Fails..Give it a break Whiners

Alexis Starbrook
CEO - Alexis Digital
Join date: 7 Dec 2006
Posts: 135
12-28-2006 23:59
There is such a thing as equipment failure....Nothing in this world is perfect (INCLUDING YOU) ... so the grid's down, it's not like they purposely shut it down on you..

They obviously didn't expect it...

Patience is a virtue..
_____________________
Ravyn Christensen
SecondLife Addict
Join date: 30 Dec 2005
Posts: 69
12-29-2006 00:40
*Is disappointed to learn she isn't perfect and is in total shock, runs to hide her shame*
_____________________
Visit the Elven lands, ElvenGlen;wonderful market area & castle,ElfHaven;filled w/ fantastik builds, ElvenMoor;Home of the DragonFae Clan,ElvenVale;Home of House of the Dark Maiden & Serentiy Woods,ElfHabour;all things pirate, & 5+ more sims on the way. This is THE fantasy sim to be a part of! Wonderful community spirit and VERY friendly & helpful people!
Glory Takashi
You up for a DNA test?
Join date: 26 Feb 2006
Posts: 182
12-29-2006 02:25
So you felt the need to create a post just to berate people and whine yourself eh?
_____________________
I speak my mind and make no apologies for my opinion.
Geeky Wunderle
What a GEEK!
Join date: 1 Dec 2006
Posts: 122
12-29-2006 02:31
From: Glory Takashi
So you felt the need to create a post just to berate people and whine yourself eh?


Hey ppl are bored, ya gotta expect random posts...
Alexis Starbrook
CEO - Alexis Digital
Join date: 7 Dec 2006
Posts: 135
Hows this for bored!!
12-29-2006 03:01
_____________________
Geeky Wunderle
What a GEEK!
Join date: 1 Dec 2006
Posts: 122
12-29-2006 03:02
lol...
Wen Yue
Registered User
Join date: 12 Dec 2006
Posts: 2
12-29-2006 04:33
From: Alexis Starbrook
There is such a thing as equipment failure....Nothing in this world is perfect (INCLUDING YOU) ... so the grid's down, it's not like they purposely shut it down on you..

They obviously didn't expect it...

Patience is a virtue..



Other virtues include dedication, persistence, and devotion, all of which we can't practice while SL is down :P
Melissa Yeuxdoux
Registered User
Join date: 28 Aug 2006
Posts: 44
12-29-2006 05:38
From: Alexis Starbrook
There is such a thing as equipment failure....Nothing in this world is perfect (INCLUDING YOU) ... so the grid's down, it's not like they purposely shut it down on you..

They obviously didn't expect it...


There is also such a thing as redundancy, and anticipating problems before they happen. Monopolies are evil things, but one thing that can be said for the old phone system monopoly is that they did an admirable job of designing for reliability.
Markubis Brentano
Hi...YAH!!
Join date: 15 Apr 2006
Posts: 836
12-29-2006 05:57
From: Melissa Yeuxdoux
There is also such a thing as redundancy, and anticipating problems before they happen. Monopolies are evil things, but one thing that can be said for the old phone system monopoly is that they did an admirable job of designing for reliability.



Exactly!

If it was a hardware issue, I'm surprised...no, shocked! that they do not have back-up equipment for critical hardware.

Perhaps they have learned another lesson today...."ooo, this is a critical piece of equipment apparently....perhaps we should have back-ups in the future"

hehe


oh, well
Mike Westerburg
Who, What, Where?
Join date: 2 May 2004
Posts: 317
12-29-2006 06:38
From: Alexis Starbrook
There is such a thing as equipment failure....Nothing in this world is perfect (INCLUDING YOU) ... so the grid's down, it's not like they purposely shut it down on you..

They obviously didn't expect it...

Patience is a virtue..



The problem is, they keep calling this a GRID....

Usually in the realm of networking, a grid can also be referred to as a mesh network topology. In a mesh network topology, theoretically anywhere from half to seventy-five percent of the nodes (servers) can fail and the rest of the mesh (grid) will remain operable. This mesh layout should also cover the actual networking devices: when implemented correctly, 75 percent of the networking devices could fail and the grid would remain operable. Performance would be really poor, but it would be operable.

Perhaps it may be better if they told us it more or less resembled an old token ring network topology, at least then having to bring the entire grid down for 1 system failure would be more accurate.
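
To make the mesh versus ring comparison concrete, here is a rough sketch that treats each topology as a plain graph and checks whether the surviving nodes can still reach each other. The function names and the 20-node size are made up for illustration, and this only models connectivity: it says nothing about how SL is actually wired, and it does not model token passing or hardware bypass in a real token ring.

```python
from collections import deque

def full_mesh(n):
    """Every node is linked directly to every other node."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}

def ring(n):
    """Every node is linked only to its two neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def survivors_connected(adj, failed):
    """Return True if all nodes that did not fail can still reach each other."""
    alive = set(adj) - set(failed)
    if not alive:
        return False
    start = next(iter(alive))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour in alive and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == alive

mesh, loop = full_mesh(20), ring(20)
print("mesh, 15 of 20 nodes down:", survivors_connected(mesh, range(15)))
print("ring, nodes 0 and 10 down:", survivors_connected(loop, [0, 10]))
```

In this simplified model the full mesh stays connected with 75 percent of its nodes gone, while the ring splits into two islands as soon as two non-adjacent nodes fail.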

It is true that equipment fails, the thing is to be prepared to deal with a failure, have backup plans in place, redundant systems ready to spring into action. It isn't like people hide the fact that yes, that cheap Wal-Mart hard drive can fail. What needs to happen is to actually have a plan ready to deal with it when the drive does fail. To not expect system failure is the same as not expecting a thunderstorm in Florida, a very ignorant thing to do.
_____________________
"Life throws you a lemon, you make lemonade and then plant the seeds"
Argent Stonecutter
Emergency Mustelid
Join date: 20 Sep 2005
Posts: 20,263
12-29-2006 06:39
From: Alexis Starbrook
There is such a thing as equipment failure...
That's why you set up redundant servers.

Redundant front ends in front of a shared SAN aren't redundant servers.
Fatz Scheflo
Registered User
Join date: 23 Sep 2006
Posts: 16
12-29-2006 08:17
Last time we had a SAN failure where I work, I didn't leave my desk for literally 24 hours. After we brought the new hardware online, roughly 8 hours after the initial failure, security was hosed. I had to throw together an app that took ownership of every single file on the SAN and then set rights for the appropriate users. Of course that was on a Windows-based system.

My point is, I have a feeling it's going to be awhile for this problem to be fixed--longer than it takes for just the hardware replacement.

We built in redundancy at around 10,000 total users. SL has over 2 million. I'm reeaaallly surprised a single point of failure exists in a system this large.
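
A back-of-the-envelope sketch of why a single point of failure hurts: when every request has to pass through a chain of components, the chain is only as available as the product of its parts, and doubling up the weakest part helps a lot. The availability figures below are invented for illustration, not measurements of SL or of any real system, and the redundancy formula assumes the two copies fail independently.

```python
from math import prod

def series_availability(parts):
    """Availability of a chain where every part must be up."""
    return prod(parts)

def redundant_availability(a, copies=2):
    """Availability of independent copies where any one being up is enough."""
    return 1 - (1 - a) ** copies

# Hypothetical chain: front end, storage, network.
single = series_availability([0.999, 0.99, 0.999])
paired = series_availability([0.999, redundant_availability(0.99), 0.999])
print(f"single storage box: {single:.4%}")
print(f"redundant storage:  {paired:.4%}")
```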
Nea Scalia
~Vampire~
Join date: 4 Aug 2006
Posts: 27
12-29-2006 08:20
I just wanna get in =/
_____________________
"Not quite blonde, are we? More of a dirty blonde."
Peekay Semyorka
Registered User
Join date: 18 Nov 2006
Posts: 337
12-29-2006 08:23
The grid is up and operational, and many of us are in it... which means there is redundancy in the system (to some level.) However, when there is a failure like this, systems operate in "reduced capacity" mode.

For example, when part of a RAID array fails, the system is still up but the load has to be reduced because a lot of I/O and CPU is needed to rebuild that part.

Thus logins are disabled.
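
As a rough illustration of why a rebuild is expensive, here is a toy version of single-parity reconstruction in the style of RAID 5. Which RAID level (if any) the asset servers use is not known, so treat this as an assumption; the block contents and function names are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks):
    """Recover the one lost block from every surviving block plus parity."""
    return xor_blocks(surviving_blocks)

# Three data blocks striped across three disks, parity on a fourth.
data = [b"asset-01", b"asset-02", b"asset-03"]
parity = xor_blocks(data)

# Disk 1 dies; rebuilding it means reading all the other disks in full.
lost = 1
survivors = [blk for i, blk in enumerate(data) if i != lost] + [parity]
rebuilt = rebuild_missing(survivors)
print(rebuilt == data[lost])
```

Every surviving block has to be read to recover the lost one, which is the extra I/O competing with normal traffic during a rebuild.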

-peekay
Second Commerce
Registered User
Join date: 9 May 2006
Posts: 1
12-29-2006 09:02
From what I have read a hardware failure was encountered during the **emergency** maintenance on the asset servers, not the "grid"...

Maybe the emergency maintenance was to fix problems that were at the time not known to be the precursor to a storage device failure on one of the servers?

So, to keep organization and integrity of data on the asset servers they took the grid down until the server is in operational order.

Or, at least that's my personal opinion on what is going on with the data I have gathered. :-P
Leanan Mensing
Registered User
Join date: 30 Nov 2006
Posts: 6
12-29-2006 09:39
WoW has had entire clusters of 'realms' go down due to failure of one part... a router.

Backups don't help on the component(s) that actually funnel or protect your bandwidth from going out.

Servers can be backed up, but I highly doubt LL is paying for more than one T3 line going out of the building. They don't make that much money.
Markubis Brentano
Hi...YAH!!
Join date: 15 Apr 2006
Posts: 836
12-29-2006 09:39
From: Peekay Semyorka
The grid is up and operational, and many of us are in it... which means there is redundancy in the system (to some level.) However, when there is a failure like this, systems operate in "reduced capacity" mode.

For example, when part of a RAID array fails, the system is still up but the load has to be reduced because a lot of I/O and CPU is needed to rebuild that part.

Thus logins are disabled.

-peekay



So you're telling me that you are in SL right now and only logins are disabled?


hmmmm...so SL is occupied only by campers?

hehe
Dartavia Vesperia
Gorean
Join date: 19 Jul 2005
Posts: 150
12-29-2006 09:44
From: Alexis Starbrook
There is such a thing as equipment failure....Nothing in this world is perfect (INCLUDING YOU) ... so the grid's down, it's not like they purposely shut it down on you..

They obviously didn't expect it...

Patience is a virtue..

And the hypocritical award of the day goes to .....
_____________________
Turian Designs, Est Sept 2005 ~ Fine-crafted Jewelry, Submission Collars and Gorean Decor
Jacques Groshomme
Registered User
Join date: 16 Mar 2005
Posts: 355
12-29-2006 09:49
It is simply impossible to anticipate every single disaster scenario.

The Empire certainly didn't expect some kid from Tatooine and his smuggler buddies to shoot up a 4m wide thermal exhaust port...
Trionnis Stockholm
Registered User
Join date: 20 Nov 2006
Posts: 15
12-29-2006 10:15
From: Leanan Mensing
Servers can be backed up, but I highly doubt LL is paying for more than one T3 line going out of the building. They don't make that much money.


you think all of SL runs on a single DS3??

oh wow.... just..... wow....

you realize that's only 45 Mbps, right? That would be enough to sustain *maybe* 5 users with my downstream speed (10 Mbit). I assure you that while they certainly don't have the quality and quantity of bandwidth they really *do* need, they have *way* more than a single DS3.
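
The arithmetic behind that estimate, as a quick sketch. It assumes the worst case where every user pulls data at their full downstream rate the whole time; the function name is made up, and 44.736 Mbit/s is the nominal DS3 line rate that usually gets rounded to 45.

```python
DS3_MBPS = 44.736  # nominal DS3 line rate, commonly rounded to 45 Mbps

def max_saturating_users(link_mbps, per_user_mbps):
    """How many users could pull their full downstream rate at the same time."""
    return int(link_mbps // per_user_mbps)

print(max_saturating_users(DS3_MBPS, 10.0))
```

45 divided by 10 is 4.5, so four or five such users would fill the line.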

Back on topic, a few of the posters here in this thread are right on the mark. It's very apparent that this asset server is a very critical part of SL's infrastructure, and there's NO reasonable explanation as to why they don't at least have a cold spare (they really should have a proper cluster). You can put all kinds of redundant parts in a good server, but you can never have redundant RAID controllers or motherboards. There's absolutely no excuse for a server of this importance not to have *some* kind of backup ready to put into place.
Markubis Brentano
Hi...YAH!!
Join date: 15 Apr 2006
Posts: 836
12-29-2006 10:46
From: Jacques Groshomme
It is simply impossible to anticipate every single disaster scenario.

The Empire certainly didn't expect some kid from Tatooine and his smuggler buddies to shoot up a 4m wide thermal exhaust port...



Yeah, but the empire had a back-up plan...RUN AWAY!! hehe


You mean they didn't have a back-up Deathstar?
~gasps~ :-O


You always have a "spares" list in ANY mechanical assembly. The spares list should contain any device or mechanism that is not easily replaceable, and is critical to the function of the main assembly.
Especially in a system that is designed and "expected" to run 24/7.
Alexis Starbrook
CEO - Alexis Digital
Join date: 7 Dec 2006
Posts: 135
12-29-2006 21:43
From: Dartavia Vesperia
And the hypocritical award of the day goes to .....


Congratulations, and what address do we send your award to..
_____________________