Hardware Failure DownTime
|
|
Merwan Marker
Booring...
Join date: 28 Jan 2004
Posts: 4,706
|
12-29-2004 21:43
Cory Linden: Hardware Failure and Downtime Thursday AM We just blew a disk on one of database machines. No data has been lost, but performance will be degraded until we can switch over to the backup machine tomorrow AM. As a result, login may be slow or blocked during heavy load tonight (which we're mostly through). We are going to down the grid tomorrow morning around 6am PST and will be back up before noon PST. We apologize for the login problems and are working to get everything fixed ASAP. Thank you for your patience.
_____________________
Don't Worry, Be Happy - Meher Baba
|
|
Maeve Morgan
ZOMG Resmod!
Join date: 2 Apr 2004
Posts: 1,512
|
12-29-2004 21:45
/me is sleeping waaay in tomorrow
|
|
Zone Zamboni
Registered User
Join date: 3 Jun 2004
Posts: 31
|
12-29-2004 22:01
Again? This is becoming a habit.
|
|
Tanaquil Karuna
Aoi aoi kono hoshi ni
Join date: 19 Aug 2004
Posts: 279
|
12-30-2004 01:48
I guess I'll take this time as an extra opportunity to do some more housework... the beasts (read: RP friends) are coming in on 31st, so it'll be much needed. Before and after 
|
|
Hank Ramos
Lifetime Scripter
Join date: 15 Nov 2003
Posts: 2,328
|
12-30-2004 01:57
From: Tanaquil Karuna I guess I'll take this time as an extra opportunity to do some more housework... the beasts (read: RP friends) are coming in on 31st, so it'll be much needed. Before and after
Same here
|
|
Torley Linden
Enlightenment!
Join date: 15 Sep 2004
Posts: 16,530
|
12-30-2004 01:57
Stuff like this happens. Life happens. Second Life Happens.
In any case, better to know than to remain in the dark -- Cory, all the best to you and the Linden family in getting the computers back up!
*hugz*
|
|
Charlotte Gillespie
2 - 0 Lindens
Join date: 19 Nov 2004
Posts: 1,101
|
12-30-2004 06:08
I'm trying to connect to the IRC, I've got the channel up with a list of people but no one's talking, and it's rather dead. I think there's a problem with the connection somewhere.
|
|
Dee Firefly
Dreaming Dragoness
Join date: 30 Jul 2004
Posts: 315
|
12-30-2004 06:56
Downtime - ah hah, an opportunity to eat, rather than starve in front of the keyboard Then it's either open the copy of Myst IV I got for Xmas, or wash the car. Ahhhh, it's raining, second option dealt with. What is this 'First Life' thing anyway ? 
|
|
Palomma Casanova
Free Dove Owner
Join date: 5 Apr 2004
Posts: 635
|
12-30-2004 06:59
Ohhh no!! Do I have to go to sleep then?
Can't get in!
_____________________
Palomma
|
|
Nora Belvedere
Ask me about being an alt
Join date: 25 Sep 2004
Posts: 267
|
12-30-2004 07:07
I am simply blown away after reading this thread in its entirety.
_____________________
Coming soon, as RITA GROSHOMME! 
|
|
Marcos Fonzarelli
You are not Marcos
Join date: 26 Feb 2004
Posts: 748
|
12-30-2004 07:44
From: Torley Torgeson Stuff like this happens. Life happens. Second Life Happens.
The problem is that it keeps happening over and over and over. Downtimes are becoming an all too frequent occurrence.
|
|
Clint Cartier
Registered User
Join date: 16 Apr 2003
Posts: 122
|
12-30-2004 07:48
"We just blew a disk on one of database machines." I wish these people in San Francisco would just stick to humans for crying out loud!!
|
|
YadNi Monde
Junkyard Owner
Join date: 30 Mar 2004
Posts: 189
|
Grmph =c
12-30-2004 08:01
Yea out again =( that's all I see =( AGAIN !!!!!!!!!!!!!! =(
I'm building a City actually, do you think I have time to spare on the forums ?????
_____________________
---------------=If It S Not Primmy It S Not A YadNi=---------------- -=In a MMORPG u levelup a Character, in SL u levelup the User.=- ----------------=Your only limit is Your imagination.=-------------- -------------------------------------------  Yours, Friendly, Yadni =) 
|
|
Alicia Eldritch
the greatest newbie ever.
Join date: 13 Nov 2004
Posts: 267
|
12-30-2004 08:05
From: Marcos Fonzarelli The problem is that it keeps happening over and over and over. Downtimes are becoming an all too frequent occurrence.
Hmmm... I've been thinking about this a bit lately myself. Now I know that hardware dies, ok fine. Shizzle happens. But what I'd like to know is: Has all of the 8 M that LL received been earmarked in such a way that it can't be spent on hardware upgrades? I mean, 2 M could do wonders for the hardware infrastructure, I'm sure. Because if that's the case, we deserve to know. I am starting to suspect it is. I am starting to suspect that the money has been earmarked for weird spinoff projects like the teen grid, or marketing projects or something. But if not, please tell us and we can all breathe easier.
|
|
Buddha Bergman
Second Life Resident
Join date: 26 Nov 2004
Posts: 38
|
12-30-2004 08:18
From: Alicia Eldritch Now I know that hardware dies, ok fine. Shizzle happens.
Shizzle does happen, but it's important for end users to know that even when shizzle happens, it doesn't mean everything should grind to a halt. I'm not an expert, but from my medium-level knowledge of servers and redundancy I believe it should be possible for a single drive going out to cause little or no effect on the running of a database server. Drives can be redundant, and mission-critical servers can be redundant, meaning that if a drive or a whole server fails, another should be able to either automatically switch in, or be manually switched in, in a very short time. Why isn't this being done at SL?
|
|
Olympia Rebus
Muse of Chaos
Join date: 22 Feb 2004
Posts: 1,831
|
12-30-2004 08:23
From: Torley Torgeson Stuff like this happens. Life happens. Second Life Happens.
In any case, better to know than to remain in the dark -- Cory, all the best to you and the Linden family in getting the computers back up!
*hugz*
I like your attitude, Torley
|
|
Anthea Thereian
Delirious
Join date: 26 Jun 2004
Posts: 119
|
12-30-2004 08:49
Hmmm... one disk on one database server barfs and it takes 6 hours to replace?
I'd suggest using a RAID array... with a hot spare, replacing the failed disk would go unnoticed and increase uptime (i.e. customer satisfaction). Just my 2 cents...
|
|
Willow Zander
Having Blahgasms
Join date: 22 May 2004
Posts: 9,935
|
12-30-2004 08:51
How much longer do we have to wait  there is only so much RL a girl can take 
_____________________
*I'm not ready for the world outside...I keep pretending, but I just can't hide...* <3 Giddeon's <3
|
|
Merwan Marker
Booring...
Join date: 28 Jan 2004
Posts: 4,706
|
12-30-2004 08:52
Note from last nite says about noon game time...
_____________________
Don't Worry, Be Happy - Meher Baba
|
|
Matthias Zander
...me?
Join date: 2 May 2004
Posts: 109
|
12-30-2004 08:56
Ever since I first saw that message, I've been wondering the same thing as Buddha. Replacing a single drive on a single server shouldn't take 6 hours in the first place, and it shouldn't need to take down the whole grid with it for that entire time. Heck, the grid was working fine last night after the drive crashed, once you were finally able to log in!
|
|
Willow Zander
Having Blahgasms
Join date: 22 May 2004
Posts: 9,935
|
12-30-2004 09:09
From: Merwan Marker Note from last nite says about noon game time...
OMFG that's 3 hours away!!! *cries*
_____________________
*I'm not ready for the world outside...I keep pretending, but I just can't hide...* <3 Giddeon's <3
|
|
LadyMacbrat Loveless
Registered User
Join date: 15 Oct 2004
Posts: 211
|
12-30-2004 09:12
But isn't this a wonderful opportunity to catch up here in the forum??<eg>
<just had to throw that in>
|
|
Azelda Garcia
Azelda Garcia
Join date: 3 Nov 2003
Posts: 819
|
12-30-2004 09:14
> Has all of the 8 M that LL received been earmarked in such a way that it can't be spent on hardware upgrades? I mean, 2 M could do wonders for the hardware infrastructure, I'm sure.
>
> Because if that's the case, we deserve to know. I am starting to suspect it is. I am starting to suspect that the money has been earmarked for weird spinoff projects like the teen grid, or marketing projects or something.
It's surprising how quickly money like this disappears. Bear in mind that a lot of people have probably been working on the promise of money arriving in the future, so they would have taken a wodge of this.
As an aside, note that whilst drives should be RAIDed, I've seen a ton of RAID controllers die, well ok three, but still they're surprisingly unreliable, and don't entirely guarantee that your server will never die. Of course, Compaq support is generally pretty decent, so the server shouldn't be dead for very long, maybe a couple of hours.
Azelda
|
|
Psyra Extraordinaire
Corra Nacunda Chieftain
Join date: 24 Jul 2004
Posts: 1,533
|
12-30-2004 09:19
I figured last night something was wrong with a server.
I live out in Gray sim, within about 80m of the old Gray airfield, and yesterday when I logged in in the evening, I noticed that all the fences around the airfield were gone. Baleeted. Vamoose. Well, the signs that hung on the airfield were still there but the fences themselves were gone. Seemed kind of strange that a build that has been there for ages seemed to be starting to fall apart.... made me think of Gov. Linden's mansion.
Figured "Hmm, maybe there's a database server problem...."... and "Hmm.. good thing this isn't Jurassic Park."
_____________________
E-Mail Psyra at psyralbakor_at_yahoo_dot_com, Visit my Webpage at www.psyra.ca  Visit me in-world at the Avaria sims, in Grendel's Children! ^^
|
|
Simon Oz
Perpetual Noob
Join date: 26 Dec 2004
Posts: 61
|
12-30-2004 09:24
From: Buddha Bergman Shizzle does happen, but it's important for end users to know that even when shizzle happens, it doesn't mean everything should grind to a halt. I'm not an expert, but from my medium-level knowledge of servers and redundancy I believe it should be possible for a single drive going out to cause little or no effect on the running of a database server. Drives can be redundant, and mission-critical servers can be redundant, meaning that if a drive or a whole server fails, another should be able to either automatically switch in, or be manually switched in, in a very short time. Why isn't this being done at SL?
They've probably got a RAID array, and one of the disks in the array has failed. Rather than keep the machine in production they're taking it out of the pool before another drive fails, leaving the array broken and possibly unrecoverable. In the meantime they're probably prepping one of the "backup" servers to take its place so they can rebuild the array offline. /guesswork, but sound administration. Inconvenience rather than disaster.
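For anyone wondering why one dead disk is only an inconvenience while a second one can be a disaster, here is a rough Python sketch of single-parity redundancy (the RAID 5 idea). It is purely illustrative: nothing in this thread says what hardware or RAID level LL actually runs, and the names `xor_blocks` and `rebuild` and the sample data are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(stripe, failed):
    """Recover the block at index `failed` from the surviving blocks of a stripe.

    Works only when exactly one block is missing: the XOR of all the
    survivors (data plus parity) equals the lost block.
    """
    survivors = [block for i, block in enumerate(stripe) if i != failed]
    return xor_blocks(survivors)

# Hypothetical stripe: three data blocks plus one parity block, one per disk.
data = [b"grid", b"sims", b"objs"]
stripe = data + [xor_blocks(data)]

# Pretend disk 1 just blew; its contents can be recomputed from the others.
recovered = rebuild(stripe, failed=1)
print(recovered == data[1])
```

With one block gone, the survivors are enough to recompute it. With two gone there is no longer enough information, which is presumably why you would pull a degraded machine out of production and rebuild it offline.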
|