How much can a database handle?
|
Ishtara Rothschild
Do not expose to sunlight
Join date: 21 Apr 2006
Posts: 569
|
11-12-2006 15:58
I'm wondering quite a bit about databases lately. Perhaps because I have a little experience with those, an experience that could be cut down to "the more you store, the more you lose performance-wise". No database system is endlessly scaleable. At some point things go awry and you have to ask yourself "How do I get back the original performance with this incredibly inflated system?". The answer seems to be easy: Only a lean database is performant, so your server has to lose some pounds.
The search for the right diet could lead to the thought: "Do we really need all the old data from 2001 on? Can't we make a clean cut and swap out everything except for the last two years or so?". You may think about additional database tables or even a second database. Or you might feel really venturous and dare to ask "Can't we just delete some of that old stuff, after a backup of course?". In that case you'll want to flag a part of your database as unneeded, if the data in question has not been touched for years. Perhaps an automated process, to avoid dealing with the same problem again in 2 or 3 years. It's quite a risk, because old Murphy turns over in his grave now and then, causing someone to feel an unquenchable desire for exactly the data that you deleted or swapped out recently.
Some users might experience odd effects whenever Murphy feels an unrest. A few statistics could look a little odd, or a customer happens to ask for an article that didn't sell for years and was flagged for deletion. In short, you traded performance for reliability. But you had no choice, you can't allow the system to grow exuberantly, or else every query will take ages some day. Of course, the person who sold you the system told you "It's scaleable". Scaleable is such a nice word. You can scale it up, it gets slow, you have to scale it down again. That's called scaleable, basically.
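The "swap out everything except the last two years" idea can be sketched in a few lines. This is only an illustration, assuming a hypothetical `orders` table with a `last_touched` date column and a matching `orders_archive` table; SQLite stands in for whatever database the company actually runs.

```python
import sqlite3

def archive_old_rows(conn, cutoff):
    """Move rows last touched before `cutoff` from orders to orders_archive."""
    with conn:  # one transaction: the copy and the delete succeed or fail together
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE last_touched < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM orders WHERE last_touched < ?", (cutoff,))
    return cur.rowcount  # number of rows removed from the live table

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, last_touched TEXT)")
conn.execute("CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, item TEXT, last_touched TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "pitchfork", "2001-03-14"),
        (2, "torch", "2004-07-01"),
        (3, "plywood box", "2006-11-12"),
    ],
)
conn.commit()

# ISO dates compare correctly as plain strings.
moved = archive_old_rows(conn, "2004-11-12")
```

Nothing is lost here, only moved, so the Murphy case becomes a slower lookup in the archive table rather than a missing record.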
The above case is true for every medium sized or larger company with 20-200 database users. Hundreds of orders each day, thousands of new data records, at some point you just have to clean it out. But now imagine a system with tens or hundreds of thousands of users, some even speak of more than a million. Designers who upload two dozen textures for a single piece of clothing, until they're finally satisfied with the look. Thousands of newbies trying out the building possibilities, creating a new data record for each new plywood box that pops into existence. An admin's nightmare, since you can't delete any of that data, even when the user himself deletes it; those two dozen textures could have been used for something, even when our designer deletes the failed attempts (try it sometime: upload a texture, copy the asset UUID, delete the texture and empty your trash; you can still use the asset ID with llSetTexture). Each temporary plywood box created to sit on it for a minute, each test script and each notecard adds another permanent pound to the hips of a horribly obese database. And the number of clients, i.e. sim servers, keeps growing too.
If you now try the above-mentioned diet methods with such a complex system, the buried Murphy will not only turn over but start to rotate at 70 rpm. Missing image, missing inventory, failed to find item in database... let's call that case 2. But what's the alternative case 1? An angry red packet-pickpocketing bar, permanently glowing in the upper right corner? The Alfalfa look (people who arrive with a single tortured torus on their hair base after a teleport) coming into vogue, or even world-wide alopecia? There's already an attachment pending for this spot, you're bald pal, live with it? Trying to zone between two sims feels like running into bulletproof glass? Performance or reliability, that's the question here.
It's a lose-lose situation. If case 1 happens (the angry villagers are frantically trying to rez pitchforks and torches, fed up with being bald and frozen to the spot), all you can do is delete or swap out some stuff. Which leads to case 2 (upset hordes of villagers searching their inventory for missing torches because they recently lost their expensive pitchforks). To deal with case 2, you have to restore everything from backups or your archive databases / tables; welcome back to case 1, let's call Houston. Every MMOG publisher would shrug at this point and simply pull an additional, independent grid out of his hat, with its own database. But SL was born for something greater, it's meant to become the WWG, the world-wide grid of the future.
What do you think? Is it possible to develop something like the ultimate database "Deep Search", able to handle the massive data needed by a true million users (should we ever get them) or even a massive world-wide grid someday, something completely unimaginable with currently available technology? And what can be done until then? Should LL (just a made-up example) identify the content creators as the root of all evil, decide that we have to be content with the current content and restrain them a little? Let's say, by moving islands out of their reach and charging for inventory overuse and higher upload fees? Could inhibiting further growth and load really be a solution? Am I simply paranoid and SL is only exhibiting normal growing pains, or is the vision behind SL a little too big for the far-from-unforeseeable technological limitations?
|
Gummi Richthofen
Fetish's Frasier Crane!
Join date: 3 Oct 2006
Posts: 605
|
11-12-2006 16:17
From: Ishtara Rothschild I'm wondering quite a bit about databases lately
a) Go and find a book called "Programming Pearls". It's from the early days of computing and it talks in comprehensible terms about inherently-fixable, inherently-difficult, and not-gonna-happen.
b) If the Sanger Institute can get by with a copy of MySQL, then the question always facing everyone else is, what's your problem? There are good solutions and there are quick solutions.
c) Generally, you don't want indexes when inputting, and you do want them when querying. Live index rebuilds are to be avoided at all costs...
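A rough sketch of the point about indexes: load the data first with no secondary index, then build the index once before querying. SQLite is used here only because it ships with Python (the thread is about MySQL), and the `assets` table and `idx_assets_uuid` index are made-up names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (uuid TEXT, kind TEXT)")

# Bulk load with no secondary index in place, so each insert only appends a row.
rows = [(f"asset-{i:06d}", "texture" if i % 2 else "script") for i in range(10_000)]
with conn:
    conn.executemany("INSERT INTO assets VALUES (?, ?)", rows)

# Build the index once, after the load, rather than maintaining it row by row.
conn.execute("CREATE INDEX idx_assets_uuid ON assets (uuid)")

# Ask the planner how it would run a lookup by uuid.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT kind FROM assets WHERE uuid = ?",
    ("asset-000042",),
).fetchall()
uses_index = any("idx_assets_uuid" in row[-1] for row in plan)

kind = conn.execute(
    "SELECT kind FROM assets WHERE uuid = ?", ("asset-000042",)
).fetchone()[0]
```

On a live system that never stops taking writes there is no clean "after the load" moment, which is where the warning about live index rebuilds comes in.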
|
Ishtara Rothschild
Do not expose to sunlight
Join date: 21 Apr 2006
Posts: 569
|
11-12-2006 16:18
Perhaps you should read the post, Gummi. Looks like LL tries to get by with MySQL too.
|
Llauren Mandelbrot
Twenty-Four Weeks Old.
Join date: 26 Apr 2006
Posts: 665
|
Thoughts on Database Scalability and Speed vs Speed Trade-Offs...
11-12-2006 16:49
- You build a database.
- It fills up, to the point that everything is slower than molasses flowing uphill in January. Many items in the database are inaccessible, but you cannot tell which ones are, and which ones are not.
- You create a second database, and replace the original database with the copy.
- Items get added to the new database.
- Items get retrieved from the new database.
- The new database is much faster.
- ...but sometimes people want information from the new database that's in the old database...
- Ok, when that happens, the new database queries the old database, copies the data from the old database to the new database, deletes it from the old database, and returns it as if it had always been in the database.
- Yes, this is slower, but only the first time a given old record is asked for. Thereafter, it is as fast as the new database.
- After a while, the old database has gaps where information that has been moved into the new database used to be. Every so often, these gaps can be compacted, making the old database smaller -- and faster, too.
- Eventually, the new database gets clogged, like the old one was. When that happens, we do it all again, creating another new database....
Would this not keep a fat database behaving lean most of the time? ...and isn't this what caches do?
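The steps above can be sketched roughly as follows. Two plain dictionaries stand in for the "new" and "old" databases, and the class and method names are made up for illustration.

```python
class TieredStore:
    """Fast 'new' store in front of a large 'old' store, migrating records on read."""

    def __init__(self, old_records=None):
        self.new = {}                        # small, fast store for current data
        self.old = dict(old_records or {})   # big, slow store of historical data

    def put(self, key, value):
        # New items only ever go into the new store.
        self.new[key] = value
        self.old.pop(key, None)              # drop any stale copy in the old store

    def get(self, key):
        if key in self.new:
            return self.new[key]             # fast path
        # Slow path, taken once per old record: move it across and return it.
        value = self.old.pop(key)            # raises KeyError if it exists nowhere
        self.new[key] = value
        return value


store = TieredStore({"box-2001": "plywood box", "hat-2003": "top hat"})
store.put("torch-2006", "torch")
first = store.get("box-2001")    # found in the old store, migrated on the way out
second = store.get("box-2001")   # now served from the new store
```

One difference from an ordinary cache: here the record is moved rather than copied, so the old store shrinks over time, whereas a cache normally leaves the original in the backing store.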
_____________________
- ninjafoo Ng Says:
November 4th, 2006 at 7:27 am We all love secondlife so much and were afraid that the magic will end, nothing this good can ever last…. can it?
|
Seola Sassoon
NCD owner
Join date: 13 Dec 2005
Posts: 1,036
|
11-12-2006 17:08
In LA a few months ago, the question of purging was brought up. LL basically said they don't purge anything, even though a resident inactive for a certain period can lose their items, or even access to their account. What I don't understand is why, when this happens, a daily purge was never done on a backup db, to be implemented 1 time a day, downtime of 10 minutes. (It would also refresh the servers nicely.)
I just don't understand why you'd even back up the backups so that certain accounts keep items stored that will never be accessed.
Hell, I'd even go as far as purging accounts that had zero transactions on them ever and never logged in, that have 60 days of no activity. There's thousands of those if we go by their numbers.
|
Kitty Barnett
Registered User
Join date: 10 May 2006
Posts: 5,586
|
11-12-2006 19:49
Sorry if you caught this when it was originally posted (September 13th) but it goes into a little more detail about the asset/database system: http://blog.secondlife.com/2006/09/13/re-assets-not-being-found-including-missing-image-issues/
I honestly wouldn't ever push LL to delete anything, because any "missing from database" error anyone experiences is the result of the process mistakenly identifying current items as orphaned. If they were deleted instead of merely moved out of the active system we would all be worse off.
|
Ishtara Rothschild
Do not expose to sunlight
Join date: 21 Apr 2006
Posts: 569
|
11-12-2006 22:13
From: Llauren Mandelbrot Would this not keep a fat database behaving lean most of the time? ...and isn't this what caches do?
In theory, this works fine. In practice only if the overall amount of really needed data doesn't grow too much. What if you reach the "slow as molasses" point, look at your database and realize: this is the information volume needed by some 100,000 users, and we aim for the world? If SL is meant to become an internet alternative in 3D, shouldn't the "slow as molasses" point be reached at a far later time, with 10 million users perhaps? If it happens with 100k users already, I'd really start to worry about the scaleability of my system.
From: Seola Sassoon Hell, I'd even go as far as purging accounts that had zero transactions on them ever and never logged in, that have 60 days of no activity. There's thousands of those if we go by their numbers.
I can see the need to do that, and this need shows that a single central storage solution for an unlimited number of users seems to be pretty much impossible. If I think of our current 2D internet, I could spend months in a hospital for example, come back and find out that the good old 2D websites still know me, since my locally stored cookies are still valid. Even more important, every bit of digital data that I purchased long ago will still be there. Downloaded software, music etc. If I purchase something, I want it to stay for years to come. I could imagine a world wide 3D web, but only with a working DRM solution at the client end. It could perhaps be done, years after Windows Vista is released, when every hardware manufacturer complies with DRM standards. Every purchase could be a peer-to-peer transfer from the merchant's harddisks to the end user, only temporarily uploaded as long as the user stays logged in.
Thanks, I really missed that link. I should start to read the blog more often I guess. It confirms what I thought when I read the words "garbage collection" here on the forums, and that comes down to: the world may be scaleable, but the asset server is not.
LL has created a really nice commercial MMOG which I like a lot in its current form, aside from the problems created by the quest for the magic million. I wonder why it has to be sold as The Vision™ at any cost? There's nothing really new and special about the underlying technology, it has all been done before. Persistent online worlds, scaleable to a certain degree for the 250k users that can be expected in the long run. If it squawks like a duck and behaves duckish in almost every regard, why try to sell it as an elephant? That only causes problems. There are enough people who would be content with a duck, enough to create revenue.
|
Jopsy Pendragon
Perpetual Outsider
Join date: 15 Jan 2004
Posts: 1,906
|
11-12-2006 22:39
Keep in mind that assets are stored and found by asset uuid key, and once found are cached on the sims that requested them for a while. If there are assets from 2001 that are still floating around and in use.. rarely, it would actually be rather easy to split the asset server up into several databases... when a sim says "Hey! Gimme asset 555-5555-555" the first asset server could fairly quickly check and say "ain't got it.. try older_asset_server.secondlife.com" .. and so on. It might take a few moments longer to find the asset, but only the first time a sim needs it.
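A minimal sketch of that lookup chain, with a per-sim cache in front of it. The class names, the hop counter and the sample keys are all invented for illustration; this is not a description of how LL's asset servers actually work.

```python
class AssetServer:
    """One asset database in a chain, newest first."""

    def __init__(self, name, assets, older=None):
        self.name = name
        self.assets = assets   # dict: asset uuid -> asset data
        self.older = older     # next server to try on a miss, or None

    def lookup(self, uuid):
        """Return (data, hops), where hops counts the servers asked."""
        server, hops = self, 1
        while server is not None:
            if uuid in server.assets:
                return server.assets[uuid], hops
            server, hops = server.older, hops + 1   # "ain't got it, try the older one"
        raise KeyError(uuid)


class Sim:
    """A sim that caches every asset it has already fetched."""

    def __init__(self, first_server):
        self.first_server = first_server
        self.cache = {}

    def fetch(self, uuid):
        if uuid in self.cache:
            return self.cache[uuid], 0   # served locally, no asset server asked
        data, hops = self.first_server.lookup(uuid)
        self.cache[uuid] = data
        return data, hops


older = AssetServer("older_asset_server", {"555-5555-555": b"texture from 2001"})
current = AssetServer("asset_server", {"777-7777-777": b"new texture"}, older=older)
sim = Sim(current)

first = sim.fetch("555-5555-555")    # walks the chain to the older server
second = sim.fetch("555-5555-555")   # answered from the sim's own cache
```

The extra hop is paid only on the first request from a given sim; after that the sim's cache answers until the entry expires (expiry is left out of this sketch).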
_____________________
* The Particle Laboratory * - One of SecondLife's Oldest Learning Resources. Free particle, control and targetting scripts. Numerous in-depth visual demonstrations, and multiple sandbox areas. - Stop by and try out Jopsy's new "Porgan 1800" an advanced steampunk styled 'particle organ' and the new particle texture store!
|
John Horner
Registered User
Join date: 27 Jun 2006
Posts: 626
|
11-13-2006 04:15
Creating or using a live database in 2D to illustrate the "now" is fairly easy; I am looking at one at the moment (stock market live feed). It is even easier to do a static one. But to create one that (as well as illustrating the "now") also illustrates the past must increase the amount of data needed by at least the square or cube (power of 2, power of 3) for each previous ref point. In addition SL is also a "live" feed, that is, it updates dynamically from the viewpoint of each and every user. There must surely be an infinite number of data subsets, as from a data viewpoint there are no predefined node points. I do not believe it is easily possible to create such a database outside of a 3D virtual reality environment, as you would have no way of knowing which previous data subset any individual would wish to review.
Just my limited personal knowledge from someone who has a very basic idea of what a primary and secondary key is in MS Access.
|
Lewis Nerd
Nerd by name and nature!
Join date: 9 Oct 2005
Posts: 3,431
|
11-13-2006 06:22
There's one big thing in all this talk about SL being 'the next generation of the internet'.
The internet *works* because there are millions, billions whatever, of individual servers all over the world, sharing the data load, routing traffic, and when one thing goes down, everything else generally carries on. Huge net-critical and big business places will have mirror servers that take over - sometimes even automatically - within seconds should there be a system failure somewhere.
However, with SL, it's all routed to one data centre. When there's a problem with any part, everything suffers because it's all going to the same place.
Linden Lab need to start decentralising, and having sims hosted in different 'colo' facilities around the US, and even then we still have the problem of the centralised asset server and login server.
The easiest, and best, solution of course is to simply stop the deluge of freeloaders, so that the proper investment can be put into expanding at a suitable speed for reasonable growth, paid for by the people who are making the cost exist.
But sadly that's too simple and obvious, from what it appears, for the powers that be to grasp, even though many thousands of us 'mere peon' residents have been telling them that for months.
Lewis
|
nimrod Yaffle
Cavemen are people too...
Join date: 15 Nov 2004
Posts: 3,146
|
11-13-2006 06:24
From: Lewis Nerd There's one big thing in all this talk about SL being 'the next generation of the internet'.
The internet *works* because there are millions, billions whatever, of individual servers all over the world, sharing the data load, routing traffic, and when one thing goes down, everything else generally carries on. Huge net-critical and big business places will have mirror servers that take over - sometimes even automatically - within seconds should there be a system failure somewhere.
However, with SL, it's all routed to one data centre. When there's a problem with any part, everything suffers because it's all going to the same place.
Linden Lab need to start decentralising, and having sims hosted in different 'colo' facilities around the US, and even then we still have the problem of the centralised asset server and login server.
The easiest, and best, solution of course is to simply stop the deluge of freeloaders, so that the proper investment can be put into expanding at a suitable speed for reasonable growth, paid for by the people who are making the cost exist.
But sadly that's too simple and obvious, from what it appears, for the powers that be to grasp, even though many thousands of us 'mere peon' residents have been telling them that for months.
Lewis
Wait until you hear more about a certain program from libSL. The economy will go whack with that and tier changes. Should be interesting!
_____________________
"People can cry much easier than they can change." -James Baldwin
|
Jopsy Pendragon
Perpetual Outsider
Join date: 15 Jan 2004
Posts: 1,906
|
11-13-2006 09:50
From: Lewis Nerd However, with SL, it's all routed to one data centre. When there's a problem with any part, everything suffers because it's all going to the same place. Linden Lab need to start decentralising, and having sims hosted in different 'colo' facilities around the US, and even then we still have the problem of the centralised asset server and login server.
Decentralization is no simple thing. If not done perfectly, redundant asset servers will result in duplicated UUIDs (very bad), far more "Not Founds" if assets get out of sync, and possibly even worse performance as the geographically separate asset servers try to stay up to date and do proper garbage collection. Yes, Secondlife is more of a "monolithic-verse", not a metaverse as touted, but it seems to be architected so that some of the 'core' parts will be decentralize-able eventually. (But merely putting sim servers in other colos isn't going to be that big a win; if anything, more distance/delay between infrastructure servers like presence, asset, inventory may worsen performance.)
From: Lewis Nerd The easiest, and best, solution of course is to simply stop the deluge of freeloaders, so that the proper investment can be put into expanding at a suitable speed for reasonable growth, paid for by the people who are making the cost exist. But sadly that's too simple and obvious, from what it appears, for the powers that be to grasp, even though many thousands of us 'mere peon' residents have been telling them that for months.
Maybe if you repeat yourself a few dozen more times it will come true. Keep trying, I'm sure you're on the verge of getting through to LL with your compelling long-term strategic vision.
|