Welcome to the Second Life Forums Archive

Another word on databases...

Trimming Hedges
Registered User
Join date: 20 Dec 2003
Posts: 34
12-30-2004 11:58
I just thought I'd start a separate thread for those who don't want to wade through the other one. To wit: losing ONE DRIVE and taking down all of Second Life for NINE HOURS because of it.... is completely incompetent database administration. And then we get this little gem:

From: someone
Originally by Cory Linden....
Down until 3pm Thursday
Say it with me "we love MySQL, we love MySQL" . . . we're taking some extra time down to make sure that everything is well tested before we come back up.


MySQL isn't bad for small installations, and the fact that it's free makes it an easy sell. It probably CAN be made robust, but it would take a great deal of administrative skill to do so. Losing all of SL from one hard drive is not, however, exactly proof of having that kind of administrative skill.

Folks, think about this: they're taking all of YOUR VALUABLE DATA and are managing it so incompetently that a SINGLE DRIVE failure has put everyone in SL out of commission for at least NINE HOURS.

For the kind of transaction rates and amount of data they're dealing with, they should be running on big Sun servers or possibly a mainframe, with serious, high-powered admins running it. Instead, they're limping along with amateurs on Linux and MySQL... using shoelaces, duct tape, and bailing wire, instead of professional-level tools.

And instead of spending the money to fix this stuff first, they're blowing a bunch of attention and effort on this Teen Life garbage. Opening another grid when you can't get the FIRST one working right strikes this observer as pretty damn stupid.

Again... guys, get this right, or someone will come along that DOES get it, and they will eat you alive.
Strife Onizuka
Moonchild
Join date: 3 Mar 2004
Posts: 5,887
12-30-2004 12:13
boooo,
no double posting.

Anyway, they took down SL to rebuild the damaged drive (my guess). Surprised they can't rebuild the drive in the array while the machine is running (hot swap, then have the RAID rebuild the new drive).
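Strife's guess is that the downtime went into rebuilding the replacement drive. The thread never says which RAID level LL runs, so this sketch assumes a parity layout like RAID 5: each block on the new drive is the XOR of the matching blocks on every surviving drive. That is why a rebuild has to read every other disk end to end. The function names are illustrative only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """One RAID-5-style stripe: the data blocks followed by their XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_missing(stripe, missing_index):
    """Regenerate the block on the replaced drive from every surviving block."""
    survivors = [block for i, block in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


stripe = make_stripe([b"ABCD", b"EFGH", b"IJKL"])
print(rebuild_missing(stripe, 1))  # b'EFGH'
```

Losing any single block, data or parity, is recoverable this way; losing two blocks of the same stripe is not.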
_____________________
Truth is a river that is always splitting up into arms that reunite. Islanded between the arms, the inhabitants argue for a lifetime as to which is the main river.
- Cyril Connolly

Without the political will to find common ground, the continual friction of tactic and counter tactic, only creates suspicion and hatred and vengeance, and perpetuates the cycle of violence.
- James Nachtwey
splat1 Edison
Registerd Nut
Join date: 6 Sep 2004
Posts: 353
12-30-2004 13:24
/me puts his rant hat on

If you would take the time to talk to LL like some of us, you may realise that they are looking into a new database for the short term, and they already have long-term plans for a rock-solid system coming soon. Give them a bit of slack; they have grown much faster than they expected and are doing their best.

No, the RAID is not hot swap, if my memory serves me right. The grid is backed up, and they are using this time as a window to do a full set of tests.

On the Linux and MySQL side of things: Linux is fine; it is run on a hell of a lot of mainframes these days, not to mention Sun kit now. As for MySQL, it's OK for small to medium-size work, but we will have to stick with it until LL can install their new system.

/me takes off the rant hat
_____________________
Splat Soft - We exist in RL too!
Gigas Bunny (Mule)
####
You see, our experts describe you as an appallingly dull fellow, unimaginative, timid, lacking in initiative, spineless, easily dominated, no sense of humour, tedious company and irrepressibly drab and awful. And whereas in most professions these would be considerable drawbacks, in chartered accountancy they are a positive boon.
blaze Spinnaker
1/2 Serious
Join date: 12 Aug 2004
Posts: 5,898
12-30-2004 13:57
RAID arrays are good for data safety, bad for performance.

The dirty little secret is that you have to rebuild a RAID array, which hurts performance so much that you might as well be down.
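To put rough numbers on blaze's point, here is a back-of-the-envelope estimate. The disk size, throughput, and the share of bandwidth left over for the rebuild are all hypothetical figures, not anything LL has published; the point is only that rebuild time scales inversely with the share of bandwidth the rebuild gets.

```python
def rebuild_hours(capacity_gb, disk_mb_per_s, share_for_rebuild):
    """Hours to rewrite a replacement disk when the rebuild only gets part of
    the disk's bandwidth (the rest goes to live traffic). Uses 1 GB = 1000 MB."""
    if not 0 < share_for_rebuild <= 1:
        raise ValueError("share_for_rebuild must be in (0, 1]")
    seconds = capacity_gb * 1000 / (disk_mb_per_s * share_for_rebuild)
    return seconds / 3600


# Hypothetical figures: a 250 GB disk that can sustain 50 MB/s.
idle = rebuild_hours(250, 50, 1.0)  # grid down: rebuild gets all the bandwidth
busy = rebuild_hours(250, 50, 0.1)  # grid up: rebuild gets a tenth of it
print(round(idle, 1), round(busy, 1))  # 1.4 13.9
```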
_____________________
Taken from the last paragraph on pg. 16 of Cory Ondrejka's paper "Changing Realities: User Creation, Communication, and Innovation in Digital Worlds":

"User-created content takes the idea of leveraging player opinions a step further by allowing them to effectively prototype new ideas and features. Developers can then measure which new concepts most improve the products and incorporate them into the game in future patches."
blaze Spinnaker
1/2 Serious
Join date: 12 Aug 2004
Posts: 5,898
12-30-2004 13:59
Also, Google doesn't use mainframes... so why should Linden Lab?
Trimming Hedges
Registered User
Join date: 20 Dec 2003
Posts: 34
12-30-2004 14:06
Actually, I have no beef with Linux, it's a fine operating system. I use it constantly and have run good-size Linux-only networks. It's not perfect, and under very heavy load it's still a bit crashy on some hardware, but on good, server-class machines, it's very robust. I've been using it personally since '94 and professionally since '98.

MySQL, on the other hand, just isn't too great, and THAT is what annoys me... that they're still running on MySQL after a YEAR of constant problems with the databases. Over and over and over and over, if there's a bug in SL, it's database related. Can't teleport. Can't detach objects. Can't attach objects. Can't upload things. Can't transfer things. Random bits of one's inventory just going away forever, completely unsalvageable.

This stuff has been broken since I GOT HERE a YEAR AGO. And even now, they're still on MySQL? And a SINGLE DRIVE FAILURE takes them offline for THIS amount of time?

Even if you grant that they couldn't easily fix the architectural problems, or afford a big DB2 installation, they don't even have hot standby drives? On a supposedly 24x7x365 system? Architectural problems or no, that's incompetent administration. Drives FAIL. As an admin, it's one's job to make sure the system stays up when they do.
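The "hot standby drives" Trimming mentions are usually called hot spares: idle disks already installed that the controller promotes the moment a member fails, so the rebuild starts without anyone walking to the machine room. A toy model follows; the `Array` class is hypothetical and does not correspond to any real RAID controller's interface.

```python
from dataclasses import dataclass, field


@dataclass
class Array:
    """Toy model of a RAID set with hot spares (not any real controller's API)."""
    members: list
    spares: list
    waiting_for_swap: bool = False
    log: list = field(default_factory=list)

    def fail(self, disk):
        self.members.remove(disk)
        if self.spares:
            spare = self.spares.pop(0)
            self.members.append(spare)  # rebuild onto the spare starts immediately
            self.log.append(f"{disk} failed; promoted {spare}, rebuilding")
        else:
            self.waiting_for_swap = True  # degraded until a human swaps a drive
            self.log.append(f"{disk} failed; no spare left, running degraded")


array = Array(members=["d0", "d1", "d2"], spares=["s0"])
array.fail("d1")
print(array.members, array.waiting_for_swap)  # ['d0', 'd2', 's0'] False
```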

And note that I'm not just spouting out my ass here, I have many years' experience at system and network administration. I'm not a database specialist, but I am quite qualified to criticize. I know what that system ought to look like, and I'm just appalled by what's actually there.

And I did try to talk to them, long conversations with a couple of Lindens I know. And I have asked two questions recently in Town Halls and got non-answers to both. Check the transcripts if you care to.

blaze: if your system can't rebuild under load, then it's too slow.

And Google is entitled to architect their system any way they want, since they never, ever go down. LL would get the same kind of pass if they were equally robust.
Strife Onizuka
Moonchild
Join date: 3 Mar 2004
Posts: 5,887
12-30-2004 14:08
The only people I know using mainframes are using them for weather modeling.

Apple's Xserves have hot-swappable drives. But speed would be an issue.

Let's move this topic over to this thread:
/111/30/31250/1.html
Trimming Hedges
Registered User
Join date: 20 Dec 2003
Posts: 34
12-30-2004 15:04
I don't think moving it there is a good idea, it's not closely related and isn't workable anyway.
Huns Valen
Don't PM me here.
Join date: 3 May 2003
Posts: 2,749
12-30-2004 15:44
I used to work for a shop that generated millions of rows every month (quite a few of which were mine), and it was all sent to a MySQL server. It worked very well and had a high level of reliability. The only big drawback to MySQL, as I see it, is that it doesn't have a procedural language. That system, and every other mission-critical system, was maintained on RAIDs, either NetApps or custom setups built by the sysadmins.

If you would like to post a detailed analysis of why MySQL is a suck database, feel free.

LL has been on MySQL from the start, and loss of content has been an extremely rare event. What they HAVE lost has been due to operator error, not MySQL deciding to vanish some rows here and there. I think it's a pretty good track record, as far as data integrity is concerned. Also, we know that LL is working to improve the database and some of the data structures, in order to fix what is currently wrong asset server-wise.

About the downtime, I agree it is quite a lot, but I am not privy to what is going on now, and none of us has seen a postmortem AFAIK.
Kurt Zidane
Just Human
Join date: 1 Apr 2004
Posts: 636
12-30-2004 15:54
Another point: what if SL gets really, really big, and whatever new database they go to isn't open source? If SL has a problem with that database, suddenly SL is going to be criticized for not using an open-source DB. That's the benefit of open source: if something doesn't work right, or needs to be added, you can do it yourself.
Middy Mysterio
Registered User
Join date: 3 Feb 2004
Posts: 53
mysql gimme a break
12-30-2004 15:54
Linden is killing themselves with MySQL. It will never have the performance, reliability, scalability, security, etc. of a database like Oracle. Oracle just came out with GRID technology. This is what Linden should be going to.
Trimming Hedges
Registered User
Join date: 20 Dec 2003
Posts: 34
12-30-2004 17:08
Loss of data is NOT an extremely rare event. If you ask around, it's not that uncommon. I lost my HAT to that godd**n database system, and I was told that it is unrecoverable. When I complained about it, several people told me about things that they had lost in various glitches. One scripter had just lost a bunch of code she'd been working on... just poof, gone. Had to be rewritten.

My sin? I changed outfits. Everything I had been wearing before was deleted. Forever, it seems; their backup system is apparently no better than the databases themselves.

I haven't used MySQL extensively for a couple of years, but having run some pretty damn intense read/write/delete stuff on it, it had a nasty tendency to just... lose data. Things just went away, never to come back. Queries didn't always return all the rows they should. The programmers complained about it all the time. It was very fast, but the data integrity was not that great.

For us, Oracle wasn't a whole lot better, but for different reasons. We were used to the MySQL fast setup and teardown cycles, and we had to re-engineer a lot of code to work around the very slow setup/connection process in Oracle. But, Oracle always returned all the rows. Data didn't just mysteriously go missing. Those complaints stopped cold. Instead, they complained about speed. Lack of speed, very unlike a lack of reliability, is something you can fix with faster hardware. A fast machine running bad software will just lose data faster.
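Trimming doesn't say how his team worked around Oracle's slow connection setup. One common approach is a connection pool: open a few connections once and reuse them, so the expensive login is paid at startup rather than per query. A minimal generic sketch; `slow_connect` is a hypothetical stand-in for a real driver's connect call.

```python
import contextlib
import queue


class ConnectionPool:
    """Open a fixed number of connections once and hand them out repeatedly,
    so the slow connect step is paid at startup instead of per request."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextlib.contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks if all are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return it, even if the caller raised


# Hypothetical stand-in for an expensive database login.
connect_calls = []


def slow_connect():
    connect_calls.append(1)
    return f"conn-{len(connect_calls)}"


pool = ConnectionPool(slow_connect, size=2)
for _ in range(100):
    with pool.connection() as conn:
        pass  # run a query with conn here
print(len(connect_calls))  # 2
```

A hundred "requests" cost only two logins; the trade-off is that the pool size caps how many queries can run at once.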

Personally, I think LL should be inviting vendors to set up test databases, and proceed to beat the hell out of them to find out which one talks best with their code. For a sale of this magnitude, likely several hundred thousand dollars at least, vendors will, in general, jump through hoops. And it's not like the Bay Area is short on the technical talent to get it implemented.

Whether or not you happen to like MySQL, Huns, is pretty much irrelevant, since it's obviously not working. Whether that is a failure of administration or code, who knows. It doesn't matter.

What NEEDS to get fixed is the fundamental attitude of the company towards its data. Once they stop trying to run on strings and bailing wire, I imagine things should improve fairly quickly. In that regard, MySQL is as much a symptom as anything... of trying to get by on the cheap, in an area that is absolutely critical to their business, something that must be a profound core competency.

Middy: Over time, the open source databases will get better and better, and move further and further up the food chain. The for-pay databases are likely to remain better, but the percentage of people who need systems that large will steadily drop. Eventually, open source systems like MySQL will probably manage a majority of data in the world.

But LL is definitely outstripping what's available today in free software. It sure looks to me like they need something bigger than anything that the open source stuff can do.... at least reliably.

Edit: fixed a sentence.
Middy Mysterio
Registered User
Join date: 3 Feb 2004
Posts: 53
12-30-2004 17:22
Hi Trim... didn't know you were a database guy! I was an Oracle DBA up until a couple of years ago. I agree open source everything is the wave of the future, and Linux is not too shabby already. We have one or two MySQL installations at my shop, but mostly Oracle and MS. But NEVER EVER would I have tried to do SL on MySQL!! I can't understand it. It is self-limiting, and represents a huge migration project now. I swear, it could be the death of SL.
CrazyMonkey Feaver
Monkey Guy
Join date: 1 Jul 2003
Posts: 201
12-30-2004 17:25
I'd always assumed they had their own system; it surprises me they use a database at all. How could a third-party DB know how to most efficiently store the info? Then again, I've never used a DB, so I guess there's a way to define structures or something... Still, something custom and integrated would seem faster. I guess writing a cache system, etc. would be a lot of work.

But I think the lindens do pretty well overall.

PS: too bad cheap (and super fast) 1 TB and bigger solid-state drives don't exist, lol.
Trimming Hedges
Registered User
Join date: 20 Dec 2003
Posts: 34
12-30-2004 17:38
Well, I'm not truly a database guy... I'm more of a systems and network guy. I can get the foundations really solid, but I know next to nothing about tuning the actual database itself. That's what the DBAs do. :-)

MySQL made some sense when they were still in pilot project phase... a year ago, it looked distinctly possible that SL could just die from lack of usage. There just weren't that many people. But as SOON as they realized that they were really and truly headed toward profitability, they should have been working on those databases first thing. Instead, they've let it go to the point of catastrophic failure. I can't even log in right now, in primetime.

And you're right, that migration project is not going to be a lot of fun... for them OR us. Learning the ins and outs of the new system will be painful for everyone.

Crazy: there has been an enormous amount of thought and engineering time put into the existing database systems. Database management has been under development, in various forms, since the dawn of computing.

Trying to singlehandedly roll your own system when you can, instead, buy tens of thousands of man-YEARS of accumulated wisdom in a box would be very foolish. The big database companies know A LOT about how to manage and store data. The chances of doing something even remotely as good as the existing codebases is nearly nil.

Fast, reliable data storage, when you get to the scale at which SL runs, is a very hard problem. It's not something you want to try to invent yourself.
Huns Valen
Don't PM me here.
Join date: 3 May 2003
Posts: 2,749
12-30-2004 17:48
How do you know that the content loss in question is due to some failure on the part of MySQL? Your hat, for instance, or the airplanes I lose going over borders sometimes. Is that because of some bug in the DB backend itself, or because of a problem elsewhere?

I don't know about the problems your company had with MySQL not returning all the rows it should, as we never had any complaints about that. I'd be interested in seeing something reproducible.
Middy Mysterio
Registered User
Join date: 3 Feb 2004
Posts: 53
losing data
12-30-2004 22:41
Losing data can be an application error too, if, for example, something wasn't saved to the database correctly. Or it could be the way they administer MySQL. Maybe they don't have something turned on. Any way you look at it, losing data is not acceptable. No way, no how. My beef with MySQL is the lack of scalability, performance and reliability. This database needs to be on a strong Unix server with Oracle grid technology and correct RAIDing. Actually, SAN technology would make sense together with Oracle 10g and maybe a Sun server. Frankly, I wouldn't use Linux either; this is your mission-critical application we are talking about. Linux is not the best. Get the best.
Middy Mysterio
Registered User
Join date: 3 Feb 2004
Posts: 53
12-30-2004 23:05
PS: actually, it might be that they CAN get your hat back, or whatever it is you lost. It's just that they can't be restoring people's inventory all the time; it would take too much manpower to commit to that.
Huns Valen
Don't PM me here.
Join date: 3 May 2003
Posts: 2,749
12-31-2004 01:23
What problems has MySQL got with scaling, performance, and reliability? And why is Linux not "the best?" Trimda at least provided an example of MySQL not returning rows it should have, although I've never had that problem myself, nor heard of anyone else having it.
Azelda Garcia
Azelda Garcia
Join date: 3 Nov 2003
Posts: 819
12-31-2004 02:55
Well, MySQL used to have issues with tables larger than 4 gig, but 4.0 or 4.1 fixes that.
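The 4 GB figure is exactly what a 32-bit (4-byte) file offset can address. Assuming that is where the old cap came from (the usual explanation given for older MyISAM tables, though the post itself doesn't say), the arithmetic is:

```python
def max_file_bytes(pointer_bytes):
    """Largest file addressable with a byte offset of the given width."""
    return 2 ** (8 * pointer_bytes)


GiB = 2 ** 30
TiB = 2 ** 40
print(max_file_bytes(4) // GiB, "GiB")  # 4 GiB
print(max_file_bytes(6) // TiB, "TiB")  # 256 TiB
```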

For this particular outage, the cause was presumably lack of redundant RAID, which is independent of whether one is using MySQL or not. Oracle would die too if you took out its disk storage.

Note that "redundant RAID" is not a tautology; there are non-redundant RAIDs available, for speed.
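To make the redundant/non-redundant distinction concrete: striping (RAID 0) spreads data across disks for speed and keeps no second copy, while mirroring (RAID 1) keeps a full copy on each disk. This toy model, with hypothetical helper names, shows what is still readable after one disk dies in each layout.

```python
def stripe(chunks, n_disks):
    """RAID 0: deal chunks round-robin across the disks; no redundancy."""
    disks = [[] for _ in range(n_disks)]
    for i, chunk in enumerate(chunks):
        disks[i % n_disks].append(chunk)
    return disks


def mirror(chunks, n_disks):
    """RAID 1: every disk holds a full copy."""
    return [list(chunks) for _ in range(n_disks)]


def readable_after_failure(disks, failed):
    """Set of chunks still readable once disk number `failed` is gone."""
    return {chunk for i, disk in enumerate(disks) if i != failed for chunk in disk}


data = ["c0", "c1", "c2", "c3"]
print(sorted(readable_after_failure(stripe(data, 2), failed=0)))  # ['c1', 'c3']
print(sorted(readable_after_failure(mirror(data, 2), failed=0)))  # ['c0', 'c1', 'c2', 'c3']
```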

The consensus for MySQL vs Postgres seems to be that MySQL is good for small, fast applications, and probably for prototyping. Postgres is considered generally more robust and complete.

Note that OSMP does use MySQL currently too. The reason being that it is easy to use, and that the databases are distributed across each sim, in OSMP, rather than centralized on a single massive asset server; so each database is relatively small. Players' inventories are stored on their own hard drives, for a few reasons, so again the server-side databases are not really huge.

Azelda
Huns Valen
Don't PM me here.
Join date: 3 May 2003
Posts: 2,749
12-31-2004 06:22
It could be a real pain for them to migrate away from MySQL, especially if they are using groovy stuff like AUTO_INCREMENT and LIMIT. I missed those things when I was working with Postgres and Oracle.
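For the LIMIT half of Huns's worry, the usual Oracle idiom of that era wraps the query in a subquery and filters on `ROWNUM`; AUTO_INCREMENT columns similarly tend to become sequences. A naive rewriter that handles only a trailing `LIMIT n` might look like this. It is an illustration, not a migration tool, and the `assets` table is made up.

```python
import re

# Matches "<query> LIMIT n" with an optional trailing semicolon.
_LIMIT = re.compile(r"^(?P<inner>.*?)\s+LIMIT\s+(?P<n>\d+)\s*;?\s*$", re.IGNORECASE | re.DOTALL)


def limit_to_rownum(sql):
    """Rewrite a MySQL-style top-N query into the classic Oracle ROWNUM form.

    Only handles a plain trailing "LIMIT n"; "LIMIT offset, n" and anything
    else is returned unchanged for a human to port.
    """
    match = _LIMIT.match(sql.strip())
    if not match:
        return sql
    # The subquery makes ORDER BY apply before ROWNUM filters the rows.
    return f"SELECT * FROM ({match.group('inner')}) WHERE ROWNUM <= {match.group('n')}"


print(limit_to_rownum("SELECT name FROM assets ORDER BY created DESC LIMIT 10"))
# SELECT * FROM (SELECT name FROM assets ORDER BY created DESC) WHERE ROWNUM <= 10
```

Anything this rewriter leaves untouched is exactly the kind of query that makes such a migration painful.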

As far as full redundancy is concerned, I see two options. They could hot-mirror their RAID with another identical one and fail over to the mirror if the primary array goes down. Minimal downtime that way, but then when the primary array is fixed, it will have to be synced with the mirror. Probably not too bad if they do it in the middle of the night.

Option two is to do what they are doing now, and just have one array. It is actually possible to have the system in production during a rebuild, but it would be much slower than usual. Under the demands of rebuilding, with thousands of people banging on the doors all at once, it could wind up thrashing all day and not getting very far. On top of that, if they happen to lose another disk before the rebuild is done, the entire array will be lost. Not very likely, but not impossible either - so it is not a bad idea to get the rebuild done as quickly as possible, and with as low an amount of external interference as can be achieved. I bet LL decided to just take the grid down and let the RAID rebuild itself in peace.
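Huns's "not very likely, but not impossible" can be put in numbers under simple assumptions. This sketch assumes independent disk failures at a constant rate (an exponential model); the disk count, MTBF, and rebuild durations are hypothetical. It shows why a faster, unloaded rebuild shrinks the risk window.

```python
import math


def p_another_failure(surviving_disks, rebuild_hours, mtbf_hours):
    """Probability that at least one surviving disk dies before the rebuild ends.

    Assumes independent failures at a constant rate; drives from the same
    batch tend to fail in a more correlated way, so this is likely an
    underestimate.
    """
    rate_per_hour = surviving_disks / mtbf_hours
    return 1 - math.exp(-rate_per_hour * rebuild_hours)


# Hypothetical: 7 surviving disks, 500,000-hour MTBF.
quiet = p_another_failure(7, 6, 500_000)    # rebuild with the grid down
loaded = p_another_failure(7, 48, 500_000)  # rebuild while serving traffic
print(f"{quiet:.5%} vs {loaded:.5%}")
```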

I think the first option is preferable.
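Huns's first option, a hot mirror with failover and a later resync, can be sketched as a toy key-value store. `MirroredStore` is a hypothetical illustration, not how any particular array or database implements mirroring.

```python
class MirroredStore:
    """Toy primary/mirror pair showing failover and the later resync."""

    def __init__(self):
        self.primary, self.mirror = {}, {}
        self.primary_up = True

    def write(self, key, value):
        if self.primary_up:
            self.primary[key] = value
        self.mirror[key] = value  # every write lands on the mirror as well

    def read(self, key):
        return (self.primary if self.primary_up else self.mirror)[key]

    def fail_primary(self):
        self.primary_up = False  # reads and writes now go to the mirror alone

    def repair_primary(self):
        # The rebuilt primary starts empty, so copy everything over
        # (ideally in the middle of the night).
        self.primary = dict(self.mirror)
        self.primary_up = True


store = MirroredStore()
store.write("hat", "worn")
store.fail_primary()
store.write("jacket", "worn")         # accepted during the outage
print(store.read("hat"))              # worn
store.repair_primary()
print(store.primary == store.mirror)  # True
```

Note that while the primary is down there is only one copy of everything, which is the same exposure as a degraded single array until the resync finishes.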
Issarlk Chatnoir
Cross L. apologist.
Join date: 3 Oct 2004
Posts: 424
12-31-2004 07:36
From: Huns Valen
What problems has MySQL got with scaling, performance, and reliability? And why is Linux not "the best?" Trimda at least provided an example of MySQL not returning rows it should have, although I've never had that problem myself, nor heard of anyone else having it.


A few years back (like, 2) MySQL had a poor record of performance under heavy load. I don't know if it is better now, but MySQL seems less mature than any other DB. I would rather use PostgreSQL; it's slower but handles load better. Or something like Oracle, Sybase, anything but MySQL.
_____________________
Vincit omnia Chaos
From: Flugelhorn McHenry
Anyway, ignore me, just listen to the cow
Malachi Petunia
Gentle Miscreant
Join date: 21 Sep 2003
Posts: 3,414
12-31-2004 09:00
One word and a number: 99.998% availability

Haven't used one in years, but their numbers were true in my experience. 300 development engineers hanging off one box, working, building a 1M+ line code base. Drive failure was noticeable only by a warning message sent to the administrator, and replacement was done live. They are also just a short drive from LL.
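For scale, here is what the quoted 99.998% availability allows per year, compared with a single nine-hour outage. Simple arithmetic over a 365-day year; the function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes(availability_pct):
    """Yearly downtime allowed by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def availability_pct(downtime_min):
    """Availability over a year that had the given minutes of downtime."""
    return 100 * (1 - downtime_min / MINUTES_PER_YEAR)


print(round(downtime_minutes(99.998), 1))  # 10.5
print(round(availability_pct(9 * 60), 3))  # 99.897
```

In other words, 99.998% leaves roughly ten and a half minutes of downtime a year, while one nine-hour outage by itself drags a year down to about 99.897%.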

I'm not nearly as up on databases as others, but MySQL has not had the time to be shaken out like older technologies. That matters when the numbers of transactions are huge.

The lossage is real. About a year ago I had a friend leave SL because things like his AV were just disappearing along with other major components of his inventory. It got to the point where he was sending in reports like "yesterday at 2349 PST I had UUID1..UUIDn in my inventory, today at 0700, they are not there". This wasn't user error and it wasn't infrequent. There also didn't seem to be anything that was done about it.

Disk arrays are one of the things that you can just throw money at to fix - permanently. Regardless of the database, having a stable store is just too easy.
Huns Valen
Don't PM me here.
Join date: 3 May 2003
Posts: 2,749
12-31-2004 17:05
NetApps are indeed awesome. Very expensive, but awesome. Now, about these problems with MySQL, can any of you show me some links? I have been working with it for about five years, both as a hobbyist and as a professional (wrangling millions of rows in a single table I might add.) It has always been very quick for me, even under heavy load, and I have not had any data loss. If it really is a terrible unreliable RDBMS, and that terribleness and unreliability wasn't caused by sysadmin or DBA error, I want to know about it for my own sake.
Alicia Eldritch
the greatest newbie ever.
Join date: 13 Nov 2004
Posts: 267
three is the magic number
12-31-2004 17:12
From: Huns Valen
As far as full redundancy is concerned, I see two options. They could hot-mirror their RAID with another identical one and fail over to the mirror if the primary array goes down. Minimal downtime that way, but then when the primary array is fixed, it will have to be synced with the mirror. Probably not too bad if they do it in the middle of the night.

Option two is to do what they are doing now, and just have one array. It is actually possible to have the system in production during a rebuild, but it would be much slower than usual. Under the demands of rebuilding, with thousands of people banging on the doors all at once, it could wind up thrashing all day and not getting very far. On top of that, if they happen to lose another disk before the rebuild is done, the entire array will be lost. Not very likely, but not impossible either - so it is not a bad idea to get the rebuild done as quickly as possible, and with as low an amount of external interference as can be achieved. I bet LL decided to just take the grid down and let the RAID rebuild itself in peace.

I think the first option is preferable.


Yes, or even have 3! Use the third one to mirror the failed one off of, while the second runs everything. Then again, there's still the issue of trying to figure out WHY it borked. You could use the third as a reference, to look for bugs and such.

Triple mirrors are really always the way to go.
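Alicia's triple-mirror idea in toy form: with one copy lost, one healthy copy serves user traffic while the other healthy copy is read end to end to rebuild the lost one, so the rebuild doesn't compete with users for the same disk. `TripleMirror` is hypothetical and simply counts reads per copy.

```python
class TripleMirror:
    """Toy three-way mirror that counts reads per copy."""

    def __init__(self, blocks):
        self.copies = {name: list(blocks) for name in "abc"}
        self.reads = {name: 0 for name in "abc"}

    def fail(self, name):
        self.copies[name] = None

    def read(self, name, index):
        self.reads[name] += 1
        return self.copies[name][index]

    def rebuild(self, failed, source):
        """Refill `failed` by reading every block from `source`."""
        if failed == source or self.copies[source] is None:
            raise ValueError("rebuild source must be a different, healthy copy")
        self.reads[source] += len(self.copies[source])
        self.copies[failed] = list(self.copies[source])


m = TripleMirror(["x", "y", "z"])
m.fail("a")
for i in range(3):
    m.read("b", i)          # "b" carries all user traffic
m.rebuild("a", source="c")  # "c" alone feeds the rebuild
print(m.reads)              # {'a': 0, 'b': 3, 'c': 3}
print(m.copies["a"])        # ['x', 'y', 'z']
```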
_____________________

<xNichG> anyone have a good way to visualize 3d vector fields and surfaces?
<Nap> LSD?


"Yeah, there's nothing like literal thirst to put metaphorical thirst into perspective"
- Get Your War On

"The political leader loves what you could become. It is only you he hates."
- Allan Thornton