Second Life Forums Archive - It Is Official---nothing will ever be fixed

Calliope Simon

Registered User

Join date: 21 May 2006

Posts: 154

08-12-2007 12:59

Just took a look at the blog, seems they took the advice of just coming out with the truth.

Unfortunately, they probably should have just stayed quiet.

IPSEC tunneling to throw SL traffic between hubs over the INTERNET? That's not how you query databases in San Francisco from Texas. You do that kind of thing on an absolutely static route over a single backbone provider that does hard frame encryption between their routers and has nothing to do with VPNing---because that's fast and stable. Not slow and unreliable.

And a single hard drive in a RAID array capable of bringing down AUTHENTICATION? This just proves beyond any doubt that the people who designed this stuff are IDIOTS. There are NO circumstances in which it is excusable to allow a single hard drive failure to bring down a RAID array. Let's just remind ourselves that RAID stands for:

Redundant Array of Inexpensive Disks.

REDUNDANT array. REDUNDANT. That means that if one goes down, you don't lose the array.

In fact, there are three RAID hardware manufacturers that I can think of off the top of my head who will send you a new drive often before you've even realized that one went bad--less than four hours usually. Swapping shouldn't be any more than walking up to the rack, yanking out the broken drive and slapping the new one in, and then watching it rebuild itself automatically.

That a failed drive brought down an entire array can mean either or both of the following:

1. More than one drive had failed in that array, and the array lost parity, and therefore failed. It is so highly unlikely that more than two drives would fail at the same time that it's safe to assume that it didn't happen---and that it had *already had a failed drive or drives in it that hadn't been replaced when the last one failed*

2. It was set up without parity.

Both are absolutely novice moves and entirely without excuse.
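
For readers who want to see why a parity array survives exactly one dead drive, here is a minimal sketch in Python. It assumes simple XOR parity in the style of RAID 5; the block contents and the helper names `xor_blocks` and `rebuild_missing` are made up for illustration, and nothing here describes how LL's arrays were actually configured.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks):
    """Recover the single missing block from all survivors (data plus parity)."""
    return xor_blocks(surviving_blocks)

# Three hypothetical data drives and one parity block computed from them.
data = [b"AUTH", b"DATA", b"BLOB"]
parity = xor_blocks(data)

# Lose the second drive: the survivors plus parity give it back.
recovered = rebuild_missing([data[0], data[2], parity])
print(recovered)

# Lose two drives: what is left is only the XOR of the two missing
# blocks, so neither one can be recovered on its own.
leftover = xor_blocks([data[0], parity])
print(leftover == data[1], leftover == data[2])
```

With one drive gone the XOR of everything left reproduces the lost block; with two gone it does not, which is case 1 in the list above.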

Kascha Matova

Bus Bench Supermodel

Join date: 30 Mar 2007

Posts: 342

08-12-2007 13:45

From: Calliope Simon

Just took a look at the blog, seems they took the advice of just coming out with the truth.

Unfortunately, they probably should have just stayed quiet.

IPSEC tunneling to throw SL traffic between hubs over the INTERNET? That's not how you query databases in San Francisco from Texas. You do that kind of thing on an absolutely static route over a single backbone provider that does hard frame encryption between their routers and has nothing to do with VPNing---because that's fast and stable. Not slow and unreliable.

And a single hard drive in a RAID array capable of bringing down AUTHENTICATION? This just proves beyond any doubt that the people who designed this stuff are IDIOTS. There are NO circumstances in which it is excusable to allow a single hard drive failure to bring down a RAID array. Let's just remind ourselves that RAID stands for:

Redundant Array of Inexpensive Disks.

REDUNDANT array. REDUNDANT. That means that if one goes down, you don't lose the array.

In fact, there are three RAID hardware manufacturers that I can think of off the top of my head who will send you a new drive often before you've even realized that one went bad--less than four hours usually. Swapping shouldn't be any more than walking up to the rack, yanking out the broken drive and slapping the new one in, and then watching it rebuild itself automatically.

That a failed drive brought down an entire array can mean either or both of the following:

1. More than one drive had failed in that array, and the array lost parity, and therefore failed. It is so highly unlikely that more than two drives would fail at the same time that it's safe to assume that it didn't happen---and that it had *already had a failed drive or drives in it that hadn't been replaced when the last one failed*

2. It was set up without parity.

Both are absolutely novice moves and entirely without excuse.

Does this mean RAID doesn't kill roaches dead?

No really - that did seem kinda peculiar. The whole purpose of RAID is to prevent single points of failure in the storage system. Even a simple mirroring setup can prevent catastrophe with one drive. Not the most efficient way to do it but...

Point 2 is frightening. A striped set? And that's it? I know mirroring has no parity either but it would have survived a single disk failure. That only leaves RAID 0 since the rest have either dedicated or distributed parity. Egads!
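
The elimination in that last paragraph can be written down as a small lookup, sketched here in Python. The table lists how many drive failures each common RAID level tolerates; the RAID 1 entry assumes a plain two-disk mirror, and the names `RAID_FAULT_TOLERANCE` and `survives` are just illustrative.

```python
# Drive failures each common RAID level can absorb before the array is lost.
RAID_FAULT_TOLERANCE = {
    "RAID 0": 0,  # striping only, no redundancy
    "RAID 1": 1,  # mirroring, no parity (assumed: a two-disk mirror)
    "RAID 3": 1,  # dedicated parity disk, byte-level striping
    "RAID 4": 1,  # dedicated parity disk, block-level striping
    "RAID 5": 1,  # distributed parity
    "RAID 6": 2,  # double distributed parity
}

def survives(level, failed_drives):
    """True if the array is still readable after this many drive failures."""
    return failed_drives <= RAID_FAULT_TOLERANCE[level]

# Which levels are lost after a single failed drive?
fragile = [level for level in RAID_FAULT_TOLERANCE if not survives(level, 1)]
print(fragile)  # ['RAID 0']
```

Only the bare striped set drops out after one failure, which is the point being made.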

Day Oh

Registered User

Join date: 3 Feb 2007

Posts: 1,257

08-12-2007 14:04

They did say the drive didn't really fail, but half-failed in such a way that it kept working, but very very poorly.

I don't really know this hardware stuff, I really just want this info in here so it's responded to

Sindy Tsure

Will script for shoes

Join date: 18 Sep 2006

Posts: 4,103

08-12-2007 14:28

From: Day Oh

They did say the drive didn't really fail, but half-failed in such a way that it kept working, but very very poorly.

I don't really know this hardware stuff, I really just want this info in here so it's responded to

Don't go mixing facts in, Day.. This isn't about facts, it's about FUD.

Rusty Satyr

Meadow Mythfit

Join date: 19 Feb 2004

Posts: 610

08-12-2007 14:45

Wow, wish I could do IT in your universe, where problems are always crystal clear and obvious, and nothing ever partially fails in an un-trappable way.

VPN for connectivity between sites is clearly something they've adopted as necessary... if they're going to open-source the SIM side of SecondLife, they are not going to be able to set up dedicated secure lines between each SIM server and their back-end asset/auth services.

Malachi Petunia

Gentle Miscreant

Join date: 21 Sep 2003

Posts: 3,414

08-12-2007 15:59

From: someone

VPN for connectivity between sites is clearly something they've adopted as necessary... if they're going to open-source the SIM side of SecondLife, they are not going to be able to set up dedicated secure lines between each SIM server and their back-end asset/auth services.

But it isn't something that need be done *now* for California<->Texas traffic especially given its now known failure mode.

If they really are "co-locating" across the public Internet (and not just writing poorly in the blog) they should be happy that it works at all, ever. Contrariwise, if they are using IPSec over a dedicated link, they really need to buy better site-to-site transport.

I had the same reaction to the blog entry as did the OP, that this is how I'd expect "Fred's IP Hosting" system to be put together.

_____________________

Calliope Simon

Registered User

Join date: 21 May 2006

Posts: 154

08-12-2007 16:53

From: Day Oh

They did say the drive didn't really fail, but half-failed in such a way that it kept working, but very very poorly.

I don't really know this hardware stuff, I really just want this info in here so it's responded to

Any modern RAID array would have detected that sort of thing immediately---since it will always, always produce massive amounts of channel errors. And then you yank it and slap a new drive in, then go home and let it rebuild itself (while it continues to function normally through the entire process)
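
A rough sketch of the kind of check being argued about, in Python. It assumes you already have per-drive latency and error counts from somewhere; the drive names, numbers, thresholds, and the function `find_degraded` are all hypothetical and not taken from any real controller or from LL's blog post. The sample data is chosen to show the "half-failed" case: a drive that reports few errors but responds far slower than its peers.

```python
from statistics import median

def find_degraded(latencies_ms, error_counts, latency_factor=5.0, max_errors=10):
    """Return drives that are much slower than their peers or over the error limit."""
    typical = median(latencies_ms.values())
    flagged = []
    for name in sorted(latencies_ms):
        too_slow = latencies_ms[name] > latency_factor * typical
        too_many_errors = error_counts.get(name, 0) > max_errors
        if too_slow or too_many_errors:
            flagged.append(name)
    return flagged

# Hypothetical samples: disk2 still answers, with few errors, but very slowly.
latencies = {"disk0": 8.0, "disk1": 9.0, "disk2": 240.0, "disk3": 7.5}
errors = {"disk0": 0, "disk1": 1, "disk2": 3, "disk3": 0}

print(find_degraded(latencies, errors))  # ['disk2']
```

An error counter alone would not have flagged disk2 here; comparing each drive against the median of its peers does. Whether real hardware exposes the slowdown that cleanly is exactly what the two sides of this thread disagree on.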

Calliope Simon

Registered User

Join date: 21 May 2006

Posts: 154

08-12-2007 16:55

From: Rusty Satyr

Wow, wish I could do IT in your universe, where problems are always crystal clear and obvious, and nothing ever partially fails in an un-trappable way.

VPN for connectivity between sites is clearly something they've adopted as necessary... if they're going to open-source the SIM side of SecondLife, they are not going to be able to set up dedicated secure lines between each SIM server and their back-end asset/auth services.

They don't have to set them up---they're already there. Every backbone provider will lease bandwidth directly to anyone willing to pay them---and it's a lot less expensive, generally, than is assumed.

Brenda Connolly

Un United Avatar

Join date: 10 Jan 2007

Posts: 25,000

08-12-2007 17:19

Who's going to be responsible for translating all that into English?

_____________________

Don't you ever try to look behind my eyes. You don't want to know what they have seen.

http://brenda-connolly.blogspot.com

Rusty Satyr

Meadow Mythfit

Join date: 19 Feb 2004

Posts: 610

08-12-2007 17:43

From: Calliope Simon

since it will always

Really? Will it now?

I've been in IT for 20 years... The only things I'm sure of are these:

Complex systems will eventually fail in ways that even the best of people can not plan for.

And there will usually be some armchair quarterback prattling on about coulda-woulda-shoulda.

Osgeld Barmy

Registered User

Join date: 22 Mar 2005

Posts: 3,336

08-12-2007 18:08

i do agree with the RAID points, the entire point of running a raid in a situation like SL (or any other networking system) is not speed, it's redundancy

heck like i really give a crap if the disk systems are faster when i cannot log in because of a failure, this is truly a novice move, and if you need speed and redundancy use a stripe set with parity! if this is beyond you please let me know i'll show you how to do it in 1 min flat

linden labs, seriously contact me and i'll send you my (ms) networking essentials book that i got in a 101 class, there's like 3 chapters on exactly how to use this, written in a way that a housewife with no computer experience at all could understand

Osgeld Barmy

Registered User

Join date: 22 Mar 2005

Posts: 3,336

08-12-2007 18:12

From: Rusty Satyr

Really? Will it now?

I've been in IT for 20 years... The only things I'm sure of are these:

Complex systems will eventually fail in ways that even the best of people can not plan for.

And there will usually be some armchair quarterback prattling on about coulda-woulda-shoulda.

yea it will 99.999% of the time, unlike 20 years ago it's pretty ez for the system to notice that it has to do a lot of corrections for the data to be valid

and forgive me for not believing in your 20 year experience, but our 30 year veteran just last Friday tried to wire a 100base T connection using straight-through serial wire (i'd be giving him credit by saying it was cat 1 but i won't because this shit was insulated with stone), and proceeded to fuss and cuss at it for almost an hour before i dropped a cat 5 cable on his desk, so ...

and i would rather trust an armchair that knows what they're doing vs a novice charging me money any day

Tod69 Talamasca

The Human Tripod ;)

Join date: 20 Sep 2005

Posts: 4,107

08-12-2007 18:19

At least they didn't call in "Geek Squad" (that we know of!)

_____________________

really pissy & mean right now and NOT happy with Life.

Malachi Petunia

Gentle Miscreant

Join date: 21 Sep 2003

Posts: 3,414

08-12-2007 18:19

From: someone

I've been in IT for 20 years... The only things I'm sure of are these:

Complex systems will eventually fail in ways that even the best of people can not plan for.

And there will usually be some armchair quarterback prattling on about coulda-woulda-shoulda.

So with all that experience do you find yourself making the same errors today as you did 10 years ago? I assume not.

The failures - as LL reported - "woulda" been novel or interesting 15 years ago but are now so passe as to be silly. As noted above, they have a mission critical system that can't tell them when a disk is in distress? Shame on them; that's what we now call a "solved problem". Can't contact your remote site and your operations software doesn't tell you before you notice? Another solved problem - unless you are LL, it seems.
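
As an illustration of the second "solved problem", here is a minimal heartbeat check in Python. It assumes each site reports a timestamp periodically; the site labels, the 30-second timeout, and the function `unreachable_sites` are assumptions for the sketch, not anything LL has described.

```python
def unreachable_sites(last_heartbeat, now, timeout_s=30.0):
    """Return sites whose most recent heartbeat is older than the timeout."""
    return sorted(
        site for site, seen in last_heartbeat.items() if now - seen > timeout_s
    )

# Hypothetical timestamps in seconds: the Texas site has been silent for 65 s.
last_heartbeat = {"san-francisco": 1000.0, "texas": 940.0}

print(unreachable_sites(last_heartbeat, now=1005.0))  # ['texas']
```

Wire the result to a pager and operations hears about a dead link from the software rather than from the residents.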

_____________________

Draco18s Majestic

Registered User

Join date: 19 Sep 2005

Posts: 2,744

08-12-2007 22:51

Worse is Better

Edit:
oh right. BBCode is down.

Worse is Better:
http://www.jwz.org/doc/worse-is-better.html

Dnali Anabuki

Still Crazy

Join date: 17 Oct 2006

Posts: 1,633

08-13-2007 01:05

From: Draco18s Majestic

Worse is Better

Edit:
oh right. BBCode is down.

Worse is Better:
http://www.jwz.org/doc/worse-is-better.html

Wonderful article..thanks for posting it..I've been wondering about whether to cul de sac myself by learning Lisp...now to find out where Java fits...

Kascha Matova

Bus Bench Supermodel

Join date: 30 Mar 2007

Posts: 342

08-13-2007 02:35

From: Rusty Satyr

Wow, wish I could do IT in your universe, where problems are always crystal clear and obvious, and nothing ever partially fails in an un-trappable way.

VPN for connectivity between sites is clearly something they've adopted as necessary... if they're going to open-source the SIM side of SecondLife, they are not going to be able to set up dedicated secure lines between each SIM server and their back-end asset/auth services.

Are failing drives 100% efficient and reliable in your universe? To the point where there would be no noticeable increase in read/write errors or other performance factors?

Can you pick me up at 8?

Kitty Barnett

Registered User

Join date: 10 May 2006

Posts: 5,586

08-13-2007 03:06

From: Rusty Satyr

VPN for connectivity between sites is clearly something they've adopted as necessary... if they're going to open-source the SIM side of SecondLife, they are not going to be able to set up dedicated secure lines between each SIM server and their back-end asset/auth services.

VPN might simply have been a necessity that came up, rather than a conscious decision.

If they originally developed the whole architecture with the assumption that everything would always be in the same colocation they might not have seen much of a need to implement secure communication.

When it became clear that the SF colo would no longer meet their needs, they would have had the option to either hurry and implement it, or go with something that would do the job without extra coding which is to VPN the two colocations together into one virtual network.

Just a guess for a scenario where VPN would make sense.

Rusty Satyr

Meadow Mythfit

Join date: 19 Feb 2004

Posts: 610

08-13-2007 12:04

From: Osgeld Barmy

yea it will 99.999% of the time, unlike 20 years ago it's pretty ez for the system to notice that it has to do a lot of corrections for the data to be valid

and forgive me for not believing in your 20 year experience, but our 30 year veteran just last Friday tried to wire a 100base T connection using straight-through serial wire (i'd be giving him credit by saying it was cat 1 but i won't because this shit was insulated with stone), and proceeded to fuss and cuss at it for almost an hour before i dropped a cat 5 cable on his desk, so ...

and i would rather trust an armchair that knows what they're doing vs a novice charging me money any day

People make mistakes. Vendors release drivers and updates that cause strange problems. Heavy loads in a dynamic environment can cause new failure conditions that weren't tested for by vendors or implementation staff. Yes, MOST of the failures are expected and planned for.

There are still times when you have to call the vendor and beat them up for a while to get them to acknowledge that there are problems with their product not behaving according to spec.

I'd very happily retire today if it meant that hardware & software wouldn't need the likes of me anymore.

Jotheph Nemeth

Registered User

Join date: 9 Aug 2007

Posts: 142

08-13-2007 14:40

From: Rusty Satyr

Wow, wish I could do IT in your universe, where problems are always crystal clear and obvious, and nothing ever partially fails in an un-trappable way.

VPN for connectivity between sites is clearly something they've adopted as necessary... if they're going to open-source the SIM side of SecondLife, they are not going to be able to set up dedicated secure lines between each SIM server and their back-end asset/auth services.

Why does the idea of them doing this strike me as a bad idea?

It almost seems like they are trying to position themselves as only software, and in control of the money.

If someone else sets up their own servers, what's to prevent them from making counterfeit lindens? Or even a whole new money? They connect, but almost immediately they will start to diverge from the linden version.

Ok. This might not be so bad. In fact, it might mean real competition in terms of software and money. But it could also mean with real competition comes the end of the Lindens being in control.

If they insist on overseeing any other servers that connect, there might be little reason to do so.

AWM Mars

Scarey Dude :¬)

Join date: 10 Apr 2004

Posts: 3,398

08-14-2007 05:19

What's most worrying is, none of the VPN or Raid systems are new technology... I run a personal (for my business) raid setup with 2tb's of HD space over a 4 disk setup. I use 120,000 hr MTBF HD's and have a throughput of 500gb's of precious data per month through this system without a single glitch. I don't consider my supporting system to be of corporate status either.
Unless LL have setup the raid as OBG (One Big Disk), rather than mirror sets or fast access over singular disk sets, I can't see how their whole system went down..... geeez... for all my hosting services across the world supporting our business, I have never come across such flaky service, especially from a company with such a high $ value throughput. And they charge how much for basic hosting services?

_____________________

*** Politeness is priceless when received, cost nothing to own or give, yet many cannot afford -

Why do you only see typo's AFTER you have clicked submit? **
http://www.wba-advertising.com
http://www.nex-core-mm.com
http://www.eml-entertainments.com
http://www.v-innovate.com

Rusty Satyr

Meadow Mythfit

Join date: 19 Feb 2004

Posts: 610

08-14-2007 09:34

From: AWM Mars

and have a throughput of 500gb's of precious data per month

SL serves up almost that much data every minute. (Obviously, not from the same raid.)

I'd love to see the back-end architecture supporting SL and how it was laid out. I've been piecing together bits over the years as I hear specific mention of parts, but unlike sim servers (which probably fill more than 50 server racks) which are easy to estimate... asset servers, inventory servers and the like could be partitioned off in any quantity, with even more, depending on redundancy.

--
"Shouldn't" happens.

AWM Mars

Scarey Dude :¬)

Join date: 10 Apr 2004

Posts: 3,398

08-14-2007 10:02

From: Rusty Satyr

SL serves up almost that much data every minute. (Obviously, not from the same raid.)

Oops typo time.. I meant per day on average.. however, I also said that my support systems are not considered corporate standard, which I would expect from LL.

_____________________

*** Politeness is priceless when received, cost nothing to own or give, yet many cannot afford -

Why do you only see typo's AFTER you have clicked submit? **
http://www.wba-advertising.com
http://www.nex-core-mm.com
http://www.eml-entertainments.com
http://www.v-innovate.com

Rusty Satyr

Meadow Mythfit

Join date: 19 Feb 2004

Posts: 610

08-14-2007 12:01

From: AWM Mars

Oops typo time.. I meant per day on average.. however, I also said that my support systems are not considered corporate standard, which I would expect from LL.

(nod) I expect the same. I just know that in my small shop of 300+ misc servers that I, and my peers, get patched through to developer level engineers at our primary hardware & software vendors to resolve "unexpected problems", even when our deployment follows that vendor's suggested best practices.

I can only imagine how much more grief LL has with 10x as many servers.

Bobbyb30 Zohari

SL Mentor Coach

Join date: 11 Nov 2006

Posts: 466

12-06-2007 12:37

From: Draco18s Majestic

Worse is Better

Edit:
oh right. BBCode is down.

Worse is Better:
http://www.jwz.org/doc/worse-is-better.html

BBC will never get fixed...

_____________________

It Is Official---nothing will ever be fixed
Calliope Simon Registered User Join date: 21 May 2006 Posts: 154	08-12-2007 12:59 Just took a look at the blog, seems they took the advice of just coming out with the truth. Unfortunately, they probably should have just stayed quiet. IPSEC tunneling to throw SL traffic between hubs over the INTERNET? That's not how you query databases in san francisco from texas. You do that kind of thing on an absolutely static route over a single backbone provider that does hard frame encryption between their routers and has nothing to do with VPNing---because that's fast and stable. Not slow and unreliable. And a single hard drive in a RAID array capable of bringing down AUTHENTICATION? This just proves beyond any doubt that the people who designed this stuff are IDIOTS. There are NO circumstances in which it is excusable to allow a single hard drive failure to bring down a RAID array. Lets just remind ourselves that raid stands for: Redundant Array of Inexpensive Drives. REDUNDANT array. REDUNDANT. That means that if one goes down, you don't lose the array. In fact, there are three RAID hardware manufacturers that I can think of off the top of my head who will send you a new drive often before you've even realized that one went bad--less than four hours usually. Swapping should be any more than walking up to the rack, yanking out the broken drive and slapping the new one in, and then watching it rebuild itself automatically. That a failed drive brought down an entire array can mean either or both of the following: 1. More than one drive had failed in that array, and the array lost parity, and therefore failed. It is so highly unlikely that more than two drives would fail at the same time that it's safe to assume that it didn't happen---and that it had already had a failed drive or drives in it that hadn't been replaced when the last one failed 2. It was set up without parity. Both are absolutely novice moves and entirely without excuse.
Kascha Matova Bus Bench Supermodel Join date: 30 Mar 2007 Posts: 342	08-12-2007 13:45 From: Calliope Simon Just took a look at the blog, seems they took the advice of just coming out with the truth. Unfortunately, they probably should have just stayed quiet. IPSEC tunneling to throw SL traffic between hubs over the INTERNET? That's not how you query databases in san francisco from texas. You do that kind of thing on an absolutely static route over a single backbone provider that does hard frame encryption between their routers and has nothing to do with VPNing---because that's fast and stable. Not slow and unreliable. And a single hard drive in a RAID array capable of bringing down AUTHENTICATION? This just proves beyond any doubt that the people who designed this stuff are IDIOTS. There are NO circumstances in which it is excusable to allow a single hard drive failure to bring down a RAID array. Lets just remind ourselves that raid stands for: Redundant Array of Inexpensive Drives. REDUNDANT array. REDUNDANT. That means that if one goes down, you don't lose the array. In fact, there are three RAID hardware manufacturers that I can think of off the top of my head who will send you a new drive often before you've even realized that one went bad--less than four hours usually. Swapping should be any more than walking up to the rack, yanking out the broken drive and slapping the new one in, and then watching it rebuild itself automatically. That a failed drive brought down an entire array can mean either or both of the following: 1. More than one drive had failed in that array, and the array lost parity, and therefore failed. It is so highly unlikely that more than two drives would fail at the same time that it's safe to assume that it didn't happen---and that it had already had a failed drive or drives in it that hadn't been replaced when the last one failed 2. It was set up without parity. Both are absolutely novice moves and entirely without excuse. 
Does this mean RAID doesn't kill roaches dead? No really - that did seem kinda peculiar. The whole purpose of RAID is to prevent single points of failure in the storage system. Even a simple mirroring setup can prevent catastrophe with one drive. Not the most efficient way to do it but... Point 2 is frightening. A striped set? And that's it? I know mirroring has no parity either but it would have survived a single disk failure. That only leaves RAID 0 since the rest have either dedicated or distributed parity. Egads!
Day Oh Registered User Join date: 3 Feb 2007 Posts: 1,257	08-12-2007 14:04 They did say the drive didn't really fail, but half-failed in such a way that it kept working, but very very poorly. I don't really know this hardware stuff, I really just want this info in here so it's responded to
Sindy Tsure Will script for shoes Join date: 18 Sep 2006 Posts: 4,103	08-12-2007 14:28 From: Day Oh They did say the drive didn't really fail, but half-failed in such a way that it kept working, but very very poorly. I don't really know this hardware stuff, I really just want this info in here so it's responded to Don't go mixing facts in, Day.. This isn't about facts, it's about FUD.
Rusty Satyr Meadow Mythfit Join date: 19 Feb 2004 Posts: 610	08-12-2007 14:45 Wow, wish I could do IT in your universe, where problems are always crystal clear and obvious, and nothing ever partially fails in an un-trappable way. VPN for connectivity between sites is clearly something they've adopted as necessary... if they're going to open-source the SIM side of SecondLife, they are not going to be able to set up dedicated secure lines between each SIM server and their back-end asset/auth services.
Malachi Petunia Gentle Miscreant Join date: 21 Sep 2003 Posts: 3,414	08-12-2007 15:59 From: someone VPN for connectivity between sites is clearly something they've adopted as necessary... if they're going to open-source the SIM side of SecondLife, they are not going to be able to set up dedicated secure lines between each SIM server and their back-end asset/auth services. But it isn't something that need be done now for California<->Texas traffic especially given its now known failure mode. If they really are "co-locating" across the public Internet (and not just writing poorly in the blog) they should be happy that it works at all, ever. Contrariwise, if they are using IPSec over a dedicated link, they really need to buy better site-to-site transport. I had the same reaction to the blog entry as did the OP, that this is how I'd expect "Fred's IP Hosting" system to be put together. _____________________
Calliope Simon Registered User Join date: 21 May 2006 Posts: 154	08-12-2007 16:53 From: Day Oh They did say the drive didn't really fail, but half-failed in such a way that it kept working, but very very poorly. I don't really know this hardware stuff, I really just want this info in here so it's responded to Any modern RAID array would have detected that sort of thing immediately---since it will always, always produce massive amounts of channel errors. And then you yank it and slap a new drive in, then go home and let it rebuild itself (while it continues to function normally through the entire process)
Calliope Simon Registered User Join date: 21 May 2006 Posts: 154	08-12-2007 16:55 From: Rusty Satyr Wow, wish I could do IT in your universe, where problems are always crystal clear and obvious, and nothing ever partially fails in an un-trappable way. VPN for connectivity between sites is clearly something they've adopted as necessary... if they're going to open-source the SIM side of SecondLife, they are not going to be able to set up dedicated secure lines between each SIM server and their back-end asset/auth services. They don't have to set them up---they're already there. Every backbone provider will lease bandwidth directly to anyone willing to pay them---and its a lot less expensive, generally, than is assumed.
Brenda Connolly Un United Avatar Join date: 10 Jan 2007 Posts: 25,000	08-12-2007 17:19 Who's going to be responsible for translating all that into English? _____________________ Don't you ever try to look behind my eyes. You don't want to know what they have seen. http://brenda-connolly.blogspot.com
Rusty Satyr Meadow Mythfit Join date: 19 Feb 2004 Posts: 610	08-12-2007 17:43 From: Calliope Simon since it will always Really? Will it now? I've been in IT for 20 years... The only things I'm sure of are these: Complex systems will eventually fail in ways that even the best of people can not plan for. And there will usually be some armchair quarterback prattling on about coulda-woulda-shoulda.
Osgeld Barmy Registered User Join date: 22 Mar 2005 Posts: 3,336	08-12-2007 18:08 i do agree with the RAID points, the entire point of running a raid in a situation like SL (or any other networking system) is not speed, its redundancy heck like i really give a crap if the disk systems are faster when i cannot log in becuase of a failure, this is truley a novice, and if you need speed and redundancy use a stripe set with parity! if this is beyond you please let me know ill show you how to do it in 1 min flat linden labs, seriously contact me and ill send you my (ms) networking essentials book that i got in a 101 class, theres like 3 chapters on exactly how to use this, written in a way that a housewife with no computer experience at all could understand
Osgeld Barmy Registered User Join date: 22 Mar 2005 Posts: 3,336	08-12-2007 18:12 From: Rusty Satyr Really? Will it now? I've been in IT for 20 years... The only things I'm sure of are these: Complex systems will eventually fail in ways that even the best of people can not plan for. And there will usually be some armchair quarterback prattling on about coulda-woulda-shoulda. yea it will 99.999% of the time, unlike 20 years ago its pretty ez for the system to notice that it has to do alot of corrections for the data to be valid and forgive me for not believing in your 20 year experience, but our 30 year veteran just last Friday tried to wire a 100base T connection using straight tru serial wire(id be giving him credit by saying it was cat 1 but i wont because this shit was insulated with stone) , and proceeded to fuss and cuss at it for almost an hour before i dropped a cat 5 cable on his desk, so ... and i would rather trust an armchair that knows what their doing vs a novice charging me money any day
Tod69 Talamasca The Human Tripod ;) Join date: 20 Sep 2005 Posts: 4,107	08-12-2007 18:19 At least they didnt call in "Geek Squad" (that we know of!) _____________________ really pissy & mean right now and NOT happy with Life.
Malachi Petunia Gentle Miscreant Join date: 21 Sep 2003 Posts: 3,414	08-12-2007 18:19 From: someone I've been in IT for 20 years... The only things I'm sure of are these: Complex systems will eventually fail in ways that even the best of people can not plan for. And there will usually be some armchair quarterback prattling on about coulda-woulda-shoulda. So with all that experience do you find yourself making the same errors today as you did 10 years ago? I assume not. The failures - as LL reported - "woulda" been novel or interesting 15 years ago but are now so passe as to be silly. As noted above, they have a mission critical system that can't tell them when a disk is in distress? Shame on them; that's what we now call a "solved problem". Can't contact your remote site and your operations software doesn't tell you before you notice? Another solved problem - unless you are LL, it seems. _____________________
Draco18s Majestic Registered User Join date: 19 Sep 2005 Posts: 2,744	08-12-2007 22:51 Worse is Better Edit: oh right. BBCode is down. Worse is Better: http://www.jwz.org/doc/worse-is-better.html
Dnali Anabuki Still Crazy Join date: 17 Oct 2006 Posts: 1,633	08-13-2007 01:05 From: Draco18s Majestic Worse is Better Edit: oh right. BBCode is down. Worse is Better: http://www.jwz.org/doc/worse-is-better.html Wonderful article..thanks for posting it..I've been wondering about whether to cul de sac myself by learning Lisp...now to find out where Java fits...
Kascha Matova Bus Bench Supermodel Join date: 30 Mar 2007 Posts: 342	08-13-2007 02:35 From: Rusty Satyr Wow, wish I could do IT in your universe, where problems are always crystal clear and obvious, and nothing ever partially fails in an un-trappable way. VPN for connectivity between sites is clearly something they've adopted as necessary... if they're going to open-source the SIM side of SecondLife, they are not going to be able to set up dedicated secure lines between each SIM server and their back-end asset/auth services. Are failing drives 100% efficient and reliable in your universe? To the point where there would be no noticeable increase in read/write errors or other performance factors? Can you pick me up at 8?
Kitty Barnett Registered User Join date: 10 May 2006 Posts: 5,586	08-13-2007 03:06 From: Rusty Satyr VPN for connectivity between sites is clearly something they've adopted as necessary... if they're going to open-source the SIM side of SecondLife, they are not going to be able to set up dedicated secure lines between each SIM server and their back-end asset/auth services. VPN might simply have been a necessity that came up, rather than a conscious decision. If they originally developed the whole architecture with the assumption that everything would always be in the same colocation, they might not have seen much of a need to implement secure communication. When it became clear that the SF colo would no longer meet their needs, they would have had the option to either hurry and implement it, or go with something that would do the job without extra coding, which is to VPN the two colocations together into one virtual network. Just a guess for a scenario where VPN would make sense.
Rusty Satyr Meadow Mythfit Join date: 19 Feb 2004 Posts: 610	08-13-2007 12:04 From: Osgeld Barmy yea it will 99.999% of the time, unlike 20 years ago it's pretty easy for the system to notice that it has to do a lot of corrections for the data to be valid and forgive me for not believing in your 20 year experience, but our 30 year veteran just last Friday tried to wire a 100base T connection using straight thru serial wire (I'd be giving him credit by saying it was cat 1 but I won't because this shit was insulated with stone), and proceeded to fuss and cuss at it for almost an hour before I dropped a cat 5 cable on his desk, so ... and I would rather trust an armchair that knows what they're doing vs a novice charging me money any day People make mistakes. Vendors release drivers and updates that cause strange problems. Heavy loads in a dynamic environment can cause new failure conditions that weren't tested for by vendors or implementation staff. Yes, MOST of the failures are expected and planned for. There are still times when you have to call the vendor and beat them up for a while to get them to acknowledge that their product is not behaving according to spec. I'd very happily retire today if it meant that hardware & software wouldn't need the likes of me anymore.
Jotheph Nemeth Registered User Join date: 9 Aug 2007 Posts: 142	08-13-2007 14:40 From: Rusty Satyr Wow, wish I could do IT in your universe, where problems are always crystal clear and obvious, and nothing ever partially fails in an un-trappable way. VPN for connectivity between sites is clearly something they've adopted as necessary... if they're going to open-source the SIM side of SecondLife, they are not going to be able to set up dedicated secure lines between each SIM server and their back-end asset/auth services. Why does the idea of them doing this strike me as a bad idea? It almost seems like they are trying to position themselves as only software, and in control of the money. If someone else sets up their own servers, what's to prevent them from making counterfeit lindens? Or even a whole new money? They connect, but almost immediately they will start to diverge from the linden version. Ok. This might not be so bad. In fact, it might mean real competition in terms of software and money. But it could also mean with real competition comes the end of the Lindens being in control. If they insist on overseeing any other servers that connect, there might be little reason to do so.
AWM Mars Scarey Dude :¬) Join date: 10 Apr 2004 Posts: 3,398	08-14-2007 05:19 What's most worrying is, none of the VPN or RAID systems are new technology... I run a personal (for my business) RAID setup with 2tb's of HD space over a 4 disk setup. I use 120,000 hr MTBF HD's and have a throughput of 500gb's of precious data per month through this system without a single glitch. I don't consider my supporting system to be of corporate status either. Unless LL have set up the RAID as OBG (One Big Disk), rather than mirror sets or fast access over singular disk sets, I can't see how their whole system went down..... geeez... for all my hosting services across the world supporting our business, I have never come across such flaky service, especially from a company with such a high $ value throughput. And they charge how much for basic hosting services? _____________________ * Politeness is priceless when received, cost nothing to own or give, yet many cannot afford - Why do you only see typo's AFTER you have clicked submit? http://www.wba-advertising.com http://www.nex-core-mm.com http://www.eml-entertainments.com http://www.v-innovate.com
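For a rough sense of what a 120,000 hour rating means for a 4 disk setup, here is a back-of-the-envelope sketch. It assumes independent drives with exponentially distributed lifetimes (a textbook simplification, not something stated in the post); only the 120,000 hour figure and the 4 drive count come from the post, and the function name is made up.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760


def prob_any_failure(mtbf_hours: float, n_drives: int,
                     hours: float = HOURS_PER_YEAR) -> float:
    """Probability that at least one of n_drives fails within `hours`,
    assuming independent, exponentially distributed drive lifetimes."""
    return 1.0 - math.exp(-n_drives * hours / mtbf_hours)


if __name__ == "__main__":
    print(f"one drive, one year:   {prob_any_failure(120_000, 1):.3f}")
    print(f"four drives, one year: {prob_any_failure(120_000, 4):.3f}")
```

Under those assumptions a single drive has roughly a 7% chance of failing in a year and a 4 disk set has roughly a 25% chance of losing at least one drive, which is why the array has to survive a single failure and the dead drive has to be swapped promptly.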
Rusty Satyr Meadow Mythfit Join date: 19 Feb 2004 Posts: 610	08-14-2007 09:34 From: AWM Mars and have a throughput of 500gb's of precious data per month SL serves up almost that much data every minute. (Obviously, not from the same raid.) I'd love to see the back-end architecture supporting SL and how it was laid out. I've been piecing together bits over the years as I hear specific mention of parts, but unlike sim servers (which probably fill more than 50 server racks) which are easy to estimate... asset servers, inventory servers and the like could be partitioned off in any quantity, with even more, depending on redundancy. -- "Shouldn't" happens.
AWM Mars Scarey Dude :¬) Join date: 10 Apr 2004 Posts: 3,398	08-14-2007 10:02 From: Rusty Satyr SL serves up almost that much data every minute. (Obviously, not from the same raid.) Oops typo time.. I meant per day on average.. however, I also said that my support systems are not considered corporate standard, which I would expect from LL. _____________________ * Politeness is priceless when received, cost nothing to own or give, yet many cannot afford - Why do you only see typo's AFTER you have clicked submit? http://www.wba-advertising.com http://www.nex-core-mm.com http://www.eml-entertainments.com http://www.v-innovate.com
Rusty Satyr Meadow Mythfit Join date: 19 Feb 2004 Posts: 610	08-14-2007 12:01 From: AWM Mars Oops typo time.. I meant per day on average.. however, I also said that my support systems are not considered corporate standard, which I would expect from LL. (nod) I expect the same. I just know that in my small shop of 300+ misc servers, my peers and I get patched through to developer level engineers at our primary hardware & software vendors to resolve "unexpected problems", even when our deployment follows that vendor's suggested best practices. I can only imagine how much more grief LL has with 10x as many servers.
Bobbyb30 Zohari SL Mentor Coach Join date: 11 Nov 2006 Posts: 466	12-06-2007 12:37 From: Draco18s Majestic Worse is Better Edit: oh right. BBCode is down. Worse is Better: http://www.jwz.org/doc/worse-is-better.html BBC will never get fixed...
