
OpenClient: Cache operation discussion

Scalar Tardis
SL Scientist/Engineer
Join date: 5 Nov 2005
Posts: 249
01-09-2007 14:11
The client cache is one of my big interests, and there's a ton of questions that arise from looking at the cache folder and the client source.

Unfortunately, at the moment there is nothing in the wiki describing how the cache works, so we're going to have to pick it apart bit by bit by examining the code.


It looks like there are two different cache systems being used. One of them is the "Virtual File System" and is contained in the data.db2.x.3837 and index.db2.x.3837 files.

The other half is the various files scattered about that folder, with names such as objects_992_1016.slc. Anyone know the reason for the two different cache types?


Also, why is the virtual file system used at all? Why put another layer of abstraction on top of the base OS file system?

At this point I am assuming this was done so that SL can be mostly OS-independent. By using its own private storage mechanism within what seems to be a private virtual hard drive, the client is freed from the restrictions of one OS supporting a feature where another does not.

Though, there are some possible limitations, the big one being file size limits of the local file system. By now Windows 98 SHOULD be thoroughly dead, so I don't think we need to worry about the 2 gig file size limit anymore. But that could be one of the defining reasons for why the cache has never been permitted to reach/exceed 2 gigs.


I see these procedures in the file slviewer-src\linden\indra\llmessage\llhttpassetstorage.cpp ... how do they relate to the object cache? Are they part of the object cache, or are they part of the integrated web browser?

void LLHTTPAssetStorage::removeTempAssetData(const LLUUID& asset_id)
void LLHTTPAssetStorage::removeTempAssetDataByAgentID(const LLUUID& agent_id)
void LLHTTPAssetStorage::clearTempAssetData()

.
Feynt Mistral
Registered User
Join date: 24 Sep 2005
Posts: 551
01-09-2007 15:46
Last I heard, SL couldn't run on 98. That's why I "upgraded" to XP. Linux has no hard drive size restrictions that I'm aware of. Mac I'm not so sure about, but the ones here at school are barely (OS version) capable of running SL and they have hard drives far in excess of 2 gigs, so I'm doubting that too.

Also, I'm aware of virtual memory (or swap drives, same thing, different name) on all three systems. Unless SL was going to eventually be ported to the Amiga (does it have virtual memory? Laughable anyway, given the scarcity of the Amiga system in North America AND its general age) or some other non-*nix-compatible OS (PalmOS? >D ) that doesn't have virtual memory support, it really shouldn't be in there.
_____________________
I dream of a better tomorrow in SL!
You should too. Visit, vote, voice opinions.
Support CSG! Tell LL how much it would mean to subtract one prim from another!
Prim Animation! Stop by and say something about it, show your support!
Scalar Tardis
SL Scientist/Engineer
Join date: 5 Nov 2005
Posts: 249
01-11-2007 12:14
This is my attempt at parsing the virtual file system. It is unknown to me if this is worth sticking in the open source wiki, or if most coders would say this is all plainly obvious to anyone experienced with programming. :)

llvfs.cpp Virtual File System structure:


Procedure:

CODE
LLVFS::LLVFS(const char *index_filename, const char *data_filename,
             const BOOL read_only, const U32 presize,
             const BOOL remove_after_crash)
    : mRemoveAfterCrash(remove_after_crash)
- Handles the initial startup of the virtual file system when the SL Client starts up



snippets from inside this procedure:

CODE
llwarns << "Can't find " << mDataFilename
<< " to open read-only VFS" << llendl;
- In what situations does the client use a read-only VFS?

CODE
llwarns << "Can't open VFS data file " << mDataFilename
<< " attempting to use alternate" << llendl;
- So if your SL cache is stuck open and the client can't get read access, it'll attempt to create others numbered (vfsname).0 to (vfsname).256 before giving up with an error
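Here's my rough sketch of what that fallback loop might look like; this is my own illustration, not the viewer's actual code, and the function name is made up:

CODE
// My illustration of the fallback, not the viewer's actual code:
// try (vfsname).0 through (vfsname).256 until one of them opens.
#include <cstdio>
#include <string>

FILE *openDataFileWithFallback(const std::string &base_name)
{
    FILE *fp = std::fopen(base_name.c_str(), "r+b");
    for (int i = 0; !fp && i <= 256; i++)
    {
        char suffix[16];
        std::snprintf(suffix, sizeof(suffix), ".%d", i);
        fp = std::fopen((base_name + suffix).c_str(), "r+b");
    }
    return fp;  // NULL means every candidate was locked or unopenable
}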

CODE
// Did we leave this file open for writing last time?
// If so, close it and start over.
if (!mReadOnly && mRemoveAfterCrash)
- When the client crashes, the VFS may be left open and the data/structures may be corrupt. If mRemoveAfterCrash is True, then the client destroys the VFS and starts over fresh. (I don't know if that's normally true or not yet.)

CODE
// read the index file
// make sure there's at least one file in it too
// if not, we'll treat this as a new vfs
- So the smaller index file is effectively the File Allocation Table for the bigger file

CODE
// Do sanity check on this block.
// Note that this skips zero size blocks, which helps VFS
// to heal after some errors. JC
- During the startup process it effectively runs "Scandisk" on the existing VFS looking for problems.....

CODE
// this is corrupt, not empty
- deleting/zeroing-out any bad files....

CODE
// this is a null or bad entry, skip it
S32 index_loc = (S32)(tmp_ptr - buffer);
mIndexHoles.push_back(index_loc);
delete block;
- but it tolerates bad directory entries: the bad entry's slot in the index is remembered as a hole for later reuse, and the in-memory block object for it is discarded.

CODE
// discover all the free blocks
LLVFSFileBlock *last_file_block = (LLVFSFileBlock*)files_by_loc.getFirstData();
- This is the next phase of VFS startup.

CODE
// Duplicate entries.  Nuke them both for safety.
mFileBlocks.erase(*cur_file_block); // remove ID/type entry
- Delete crosslinked data blocks

CODE
if (length < 0 || loc < 0 || loc > data_size)
{
// Invalid VFS
unlockAndClose( mIndexFP );
mIndexFP = NULL;
LLFile::remove( mIndexFilename );
unlockAndClose( mDataFP );
mDataFP = NULL;
LLFile::remove( mDataFilename );
llwarns << "VFS: overlapping entries"
- If a data block's length is < 0, or its location is < 0 or past the end of the data file, then there's a big VFS corruption problem and the whole VFS is deleted. No attempt is made to salvage it.

VFS consistency-check startup sub-procedure in LLVFS::LLVFS ends here


CODE
// no index file, start from scratch w/ 1GB allocation
LLVFSBlock *first_block = new LLVFSBlock(0, data_size ? data_size : 0x40000000);
- If there is no index to check, it makes a new one, with a single free block covering the whole data file. So the default data allocation is hard-coded to 1 gigabyte?

I note that for my system's 1 gig Data cache, the Index file is only 558 kbytes, so presumably it is capable of handling much more than 1 gig of cache.

CODE
// Open marker file to look for bad shutdowns
if (!mReadOnly && mRemoveAfterCrash)
- Not sure what this is for. I assume the ";(VFS).open" files help the client keep track of open files in case the system crashes with files left open, similar to the zero-length "blah.lock" files that some programs create.

.
Scalar Tardis
SL Scientist/Engineer
Join date: 5 Nov 2005
Posts: 249
01-11-2007 12:17
(Merging split post)
Eddy Stryker
libsecondlife Developer
Join date: 6 Jun 2004
Posts: 353
01-11-2007 12:47
http://wiki.secondlife.com/
_____________________
http://www.libsecondlife.org

From: someone
Evidently in the future our political skirmishes will be fought with push weapons and dancing pantless men. -- Artemis Fate
Scalar Tardis
SL Scientist/Engineer
Join date: 5 Nov 2005
Posts: 249
01-11-2007 17:50
Heh, I fully accept I am probably wading in way too deep for my skill level. Not too many people go about trying to implement their own filesystem on top of someone else's filesystem. :)

I am still looking for a table or structure that defines the filesystem itself, how the various bits of data are ordered, and so forth.
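In the meantime, here is my guess at the shape of a single index entry, pieced together from the member names used in llvfs.h/llvfs.cpp. The field order and exact sizes are assumptions on my part, not verified against the code:

CODE
// Speculative layout of one index entry, inferred from member names in
// llvfs.h. Field order and byte sizes are my assumption, not verified.
struct VFSIndexEntry
{
    S32    mLocation;    // byte offset of the data within data.db2.*
    S32    mLength;      // bytes allocated to this file (the max size)
    U32    mAccessTime;  // last-use timestamp, drives the LRU cleanup
    LLUUID mFileID;      // 16-byte asset UUID
    S32    mFileType;    // LLAssetType::EType (texture, sound, etc.)
    S32    mSize;        // actual data length within the allocation
};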

Since this is a full FAT-style filesystem, it appears to be subject to problems of fragmentation. I assume it can write data across scattered free blocks, rather than only into consecutive blocks.

[.....]
===EDIT===
After working through the code as shown below, this is incorrect. Data blocks in the VFS cannot become fragmented.

If a file is to grow bigger and there are no free blocks immediately trailing its data, then the VFS must find a new, bigger block elsewhere in the VFS and copy the entire file to that new location, freeing the old location in the process.

Free space can become fragmented, however. If no free block is big enough for an incoming file, the system does not move files around to defragment free space. Instead it builds a table of the least recently used files, and starts deleting the oldest files until enough room is available for the incoming file.
===EDIT===
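To make that grow-or-relocate rule concrete, here is a self-contained toy model in plain C++. This is my paraphrase of the behaviour described above, not the viewer's code:

CODE
// Toy model of the grow-or-relocate rule; my paraphrase, not viewer
// code. Free regions live in a location -> length map, and a file is
// a single extent inside a flat arena standing in for data.db2.
#include <cstring>
#include <map>

struct Extent { int loc, len; };

bool growFile(char *arena, std::map<int, int> &free_by_loc, Extent &file,
              int new_len)
{
    // 1. Free space immediately trailing the file? Extend in place.
    auto after = free_by_loc.find(file.loc + file.len);
    if (after != free_by_loc.end() && after->second >= new_len - file.len)
    {
        int rest = after->second - (new_len - file.len);
        free_by_loc.erase(after);
        if (rest > 0) free_by_loc[file.loc + new_len] = rest;
        file.len = new_len;
        return true;
    }
    // 2. Otherwise find any free region big enough, move the whole file.
    for (auto it = free_by_loc.begin(); it != free_by_loc.end(); ++it)
    {
        if (it->second < new_len) continue;
        int dst = it->first, rest = it->second - new_len;
        std::memmove(arena + dst, arena + file.loc, file.len);
        free_by_loc[file.loc] = file.len;  // old extent becomes free space
                                           // (real code merges neighbours)
        free_by_loc.erase(dst);
        if (rest > 0) free_by_loc[dst + new_len] = rest;
        file.loc = dst;
        file.len = new_len;
        return true;  // the file moved whole; its data never fragments
    }
    return false;  // no block big enough: the real VFS starts LRU eviction
}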



An interesting question to me, is if it'd be possible to literally gut out the VFS and have SL just directly write data to the hard drive using the base OS filesystem. Would that increase or decrease performance?

I am still not clear on why the VFS exists at all. What does it provide over the regular OS filesystem? What would be lost by using direct OS filesystem access?

Is the SL-rolled filesystem to be considered more reliable or less reliable than the base OS? Is the SL filesystem more robust in some manner?


Or is the VFS merely an old throwback to "security through obscurity", where hiding all the cache files in an obscure, hard-to-access data structure makes it harder for outsiders to hack?

If this is the reason, then the VFS no longer serves any purpose since with the raw source it will now be possible to write a tool to completely dump and extract the full contents of the VFS into normal data files.
Feynt Mistral
Registered User
Join date: 24 Sep 2005
Posts: 551
01-11-2007 19:05
Agreed, the only purpose I can glean from the VFS is hiding the stuff the client gets from the servers. But now that we actually have the code, or even BEFORE that thanks to LibSL, we can see everything if we so choose. Cutting it out and using the default file system might be for the best; it would make it easier to sort caches into individual sim names by subdirectory.

Actually I can see one problem with removing the VFS. It's currently easy to say "this much space will be used, no more." If you remove it, you have to count file sizes manually to keep track of the cache size.

Also, for this code:
CODE

// no index file, start from scratch w/ 1GB allocation
LLVFSBlock *first_block = new LLVFSBlock(0, data_size ? data_size : 0x40000000);

note that data_size ? data_size : 0x40000000 is a ternary operation, essentially an if statement built into a compact expression with a single result: either data_size, if data_size is not 0 (non-zero is true), or 0x40000000, if data_size is 0 (zero is false). So in most cases it uses data_size, unless the size isn't set, in which case it uses the 0x40000000 value.
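If it helps, here's the same thing written out long-hand:

CODE
// The same expression written out long-hand:
S32 size;
if (data_size != 0)       // non-zero counts as "true"
    size = data_size;     // a size was specified, so use it
else
    size = 0x40000000;    // fall back to the 1 gigabyte default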
_____________________
I dream of a better tomorrow in SL!
You should too. Visit, vote, voice opinions.
Support CSG! Tell LL how much it would mean to subtract one prim from another!
Prim Animation! Stop by and say something about it, show your support!
Scalar Tardis
SL Scientist/Engineer
Join date: 5 Nov 2005
Posts: 249
01-11-2007 20:42
VFS Functions, Part I (splitting long post in half)

CODE
LLVFS::getExists(const LLUUID &file_id,
const LLAssetType::EType file_type)
Purpose: Directory lookup, check to see if this file already exists in the VFS
Returns: Boolean (true or false)


CODE
LLVFS::getSize(const LLUUID &file_id,
const LLAssetType::EType file_type)
Purpose: Get the size of specified UUID/type
Returns: S32 (signed 32-bit integer)


CODE
LLVFS::getMaxSize(const LLUUID &file_id,
const LLAssetType::EType file_type)
Purpose: Reports the total space allocated for this file, not the actual data length within that allocation.
Returns: S32 (signed 32-bit integer)

It looks to me like the client pre-allocates space in the VFS for an asset to be downloaded. Since textures take time to load and are partially rendered as they load, it is likely allocating the full length of the data and then progressively filling that as the data arrives. This may allow for resuming of partial downloads later.
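If that reading is right, a downloader would use these methods roughly like this. The method names are the real ones documented in this post; the two callback functions wrapping them are invented for illustration:

CODE
// Invented call sites wrapping the real LLVFS methods described here:
// reserve the full length up front, then fill it as packets arrive.
void onAssetDownloadStart(LLVFS *vfs, const LLUUID &id,
                          LLAssetType::EType type, S32 total_bytes)
{
    vfs->setMaxSize(id, type, total_bytes);  // preallocate the whole asset
}

void onAssetPacket(LLVFS *vfs, const LLUUID &id, LLAssetType::EType type,
                   const U8 *data, S32 offset, S32 bytes)
{
    vfs->storeData(id, type, data, offset, bytes);  // progressive fill
}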


CODE
LLVFS::checkAvailable(S32 max_size)
Purpose: Checks to see if there's room in the VFS for a new file of size max_size
Returns: Boolean (true or false)


CODE
LLVFS::setMaxSize(const LLUUID &file_id,
const LLAssetType::EType file_type,
S32 max_size)
Purpose:
- If the file does not exist, this command allocates blocks in the VFS for the new file.
- If the file exists, this either attempts to allocate more space in the VFS for the existing asset, changing the allocation to max_size, or releases blocks back to the VFS if it is to become shorter.
Returns: Boolean (true or false pass/fail)

It seems that this virtual filesystem does not permit fragmentation of a file's data: all of a file's data must be stored consecutively. Free space can become fragmented, but the VFS attempts to defragment it when files are deleted (see addFreeBlock in the next post).

With this function, if the file is to grow larger and there are no free blocks immediately following it to use, then it searches the cache for a new free block large enough to contain the new, bigger file. It then moves the entire file into the new, larger block of space and frees the original block. If no larger block exists, this function returns an error.


CODE
LLVFS::renameFile(const LLUUID &file_id,
const LLAssetType::EType file_type,
const LLUUID &new_id,
const LLAssetType::EType &new_type)
Purpose: Rename an asset in the directory.. I think. Not sure what the big deal is here. From source:
- WARNING: HERE BE DRAGONS! rename is the weirdest VFS op, because the file moves but the locks don't!
Returns: void


CODE
// mDataMutex must be LOCKED before calling this
void LLVFS::removeFileBlock(LLVFSFileBlock *fileblock)
Purpose: Delete file data but leave filename and locks in place (I think)
Returns: void


CODE
LLVFS::removeFile(const LLUUID &file_id,
const LLAssetType::EType file_type)
Purpose: Remove directory entry for file
Returns: void


CODE
LLVFS::getData(const LLUUID &file_id,
const LLAssetType::EType file_type,
U8 *buffer,
S32 location,
S32 length)
Purpose: Read in data stored in VFS
Returns: S32 (signed 32-bit integer) number of bytes read, or zero if error


CODE
LLVFS::storeData(const LLUUID &file_id,
const LLAssetType::EType file_type,
const U8 *buffer,
S32 location, S32 length)
Purpose: Write data to the VFS
Returns: S32 (signed 32-bit integer) number of bytes written

Error handling for this one is weird. If the file is too long for the allocated blocks it is truncated. If write location is invalid it seems to just give up without an error being reported back to the calling procedure.


CODE
LLVFS::incLock(const LLUUID &file_id,
const LLAssetType::EType file_type,
EVFSLock lock)
Purpose: (Not sure about this one. I believe it creates an access lock for the file; if the file is already locked, it increments the existing lock value.)
Returns: void


CODE
LLVFS::decLock(const LLUUID &file_id,
const LLAssetType::EType file_type,
EVFSLock lock)
Purpose: Decrement existing lock value if > 1, or removes lock if the last one.
Returns: void


CODE
LLVFS::isLocked(const LLUUID &file_id,
const LLAssetType::EType file_type,
EVFSLock lock)
Purpose: Checks to see if a file is locked
Returns: Boolean (true or false)


(continued...)
Scalar Tardis
SL Scientist/Engineer
Join date: 5 Nov 2005
Posts: 249
01-11-2007 20:43
VFS Functions, Part II (splitting long post in half)

//==============================================
// protected
//================================================

(Don't ask me what this means. It's in the source. :)

CODE
LLVFS::eraseBlockLength(LLVFSBlock *block)
Purpose: (from code) find the corresponding map entry in the length map and erase it
Returns: void


CODE
LLVFS::eraseBlock(LLVFSBlock *block)
Purpose: (from code) Remove block from both free lists (by location and by length).
Returns: void


CODE
LLVFS::addFreeBlock(LLVFSBlock *block)
Purpose: Deletes data in this block and then defragments available free space by merging this block with any blocks immediately before or after it. From comments:
- Add the region specified by block location and length to the free lists.
- Also incrementally defragment by merging with previous and next free blocks. Get a pointer to the next free block (by location). We can merge with previous if it ends at our requested location. We can (also) merge with next if our block ends at the next block's location.
Returns: void
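Here's a self-contained sketch of that merge logic, with a std::map standing in for the by-location free list. This is my paraphrase of the comments quoted above, not the source itself:

CODE
// Sketch of the incremental defragment: a freed region is coalesced
// with free neighbours on either side (map: location -> length).
#include <iterator>
#include <map>

void addFreeRegion(std::map<int, int> &free_by_loc, int loc, int len)
{
    // merge with the previous free region if it ends exactly at loc
    auto next = free_by_loc.lower_bound(loc);
    if (next != free_by_loc.begin())
    {
        auto prev = std::prev(next);
        if (prev->first + prev->second == loc)
        {
            loc = prev->first;
            len += prev->second;
            free_by_loc.erase(prev);
        }
    }
    // merge with the next free region if we end exactly where it starts
    auto follow = free_by_loc.find(loc + len);
    if (follow != free_by_loc.end())
    {
        len += follow->second;
        free_by_loc.erase(follow);
    }
    free_by_loc[loc] = len;  // one contiguous free region remains
}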


CODE
LLVFS::useFreeSpace(LLVFSBlock *free_block,
S32 length)
Purpose: ?? Don't know what's happening here
Returns: void


CODE
LLVFS::sync(LLVFSFileBlock *block, BOOL remove)
Purpose: Sync in-memory data caches with the hard drive filesystem. From source:
// NOTE! mDataMutex must be LOCKED before calling this
// sync this index entry out to the index file
// we need to do this constantly to avoid corruption on viewer crash

Returns: void


CODE
LLVFSBlock *LLVFS::findFreeBlock(S32 size, LLVFSFileBlock *immune)
Purpose: Looks for a free file block, and attempts to delete the oldest files to make more space, if necessary. Parsed from comments:
- mDataMutex must be LOCKED before calling this. This can initiate 'Least Recently Used'-based file removal to make space. The immune file block will not be removed.
- Looks for a suitable free block of space in the VFS. If there aren't any large enough free blocks, time to clean out some junk. Create a list of files sorted by usage time; this is far faster than sorting a linked list
- if Least Recently Used list = 0, then give up
- is the oldest file big enough? (Should be about half the time) If so, ditch this file and look again for a free block - should find it. TODO: it'll be faster just to assign the free block and break
- Now it's time to aggressively make more space. Delete the oldest 5MB of the vfs or enough to hold the file, which ever is larger. This may yield too much free space, but we'll use it up soon enough. TODO: it would be great to be able to batch all these sync() calls
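Boiled down, the eviction strategy reads something like the following. This is a simplified, self-contained sketch of the idea, not the viewer's code:

CODE
// Simplified sketch of the LRU eviction idea above; my paraphrase.
// Free space is modelled as a list of region sizes, and each cached
// file carries a last-access time.
#include <algorithm>
#include <vector>

struct CachedFile { unsigned access_time; int size; };

// Evict least-recently-used files until a region of at least 'needed'
// bytes exists (or nothing is left to evict). Returns success.
bool makeRoom(std::vector<int> &free_sizes, std::vector<CachedFile> &files,
              int needed)
{
    // oldest first, like the sorted table the source comments describe
    std::sort(files.begin(), files.end(),
              [](const CachedFile &a, const CachedFile &b)
              { return a.access_time < b.access_time; });
    while (true)
    {
        for (int sz : free_sizes)
            if (sz >= needed)
                return true;            // found a big-enough free region
        if (files.empty())
            return false;               // give up: nothing left to evict
        // ditch the oldest file; in the real VFS the freed space would
        // also coalesce with neighbouring free regions via addFreeBlock
        free_sizes.push_back(files.front().size);
        files.erase(files.begin());
    }
}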


//===============================================
// public
//=================================================


CODE
LLVFS::pokeFiles()
Purpose: Elsewhere it says in llstartup.cpp:
// Poke the VFS, which could potentially block for a while if
// Windows XP is acting up
gVFS->pokeFiles();
Returns: void


And from llvfs.h:
// Used to trigger evil WinXP behavior of "preloading" entire file into memory.
void pokeFiles();



CODE
void LLVFS::dumpMap()
Purpose: Diagnostic
- Diagnostic tool that prints out the location and length of all files stored in the VFS
- Also prints out the location and length of all free blocks in the VFS
Returns: void


CODE
LLVFS::audit()
Purpose: Diagnostic (from source)
- Verify that the index file contents match the in-memory file structure
- Very slow, do not call routinely. JC
Returns: void


CODE
LLVFS::checkMem()
Purpose: Diagnostic (from source)
- quick check for uninitialized blocks
- Slow, do not call in release.
Returns: void


CODE
LLVFS::dumpLockCounts()
Purpose: ?? Diagnostic, seems to be printing a list of all active VFS locks
Returns: void


CODE
LLVFS::dumpStatistics()
Purpose: Diagnostic
- Examines file blocks. Reports blocks with invalid lengths.
- Examines free blocks. Reports bad free blocks with invalid lengths.
- // Dump histogram of free block sizes
- // Look for potential merges (??)
Returns: void


CODE
LLVFS::dumpFiles()
Purpose: (From source)
- Debug Only!
- extention = ".jp2"; // ".j2c"; // IrfanView recognizes .jp2 -sjb
Returns: void

Heh, there is NO NEED to write a tool to dump the VFS to raw files. Here is a procedure that does exactly that. As written, it detects only texture datatypes and gives them a .jp2 extension, though it could probably easily be extended to detect sounds and append .ogg to them, and .txt to notecards and scripts.


//==============================================
// protected
//==============================================


CODE
FILE *LLVFS::openAndLock(const char *filename,
const char *mode,
BOOL read_lock)
Purpose: ?? Appears to be for direct OS filesystem access, not VFS
Returns: FILE * (open file handle)


CODE
LLVFS::unlockAndClose(FILE *fp)
Purpose: ?? Appears to be for direct OS filesystem access, not VFS
Returns: void
Scalar Tardis
SL Scientist/Engineer
Join date: 5 Nov 2005
Posts: 249
01-11-2007 21:43
If you've read this thread earlier I suggest you start from the top and reread it. I've been going back and re-re-re-re-re-re-editing these posts multiple times, so you may have missed something new. Sorry, I'm not sure how to handle this. :)
SuezanneC Baskerville
Forums Rock!
Join date: 22 Dec 2003
Posts: 14,229
01-11-2007 22:16
This thread looks to me like it needs a home in an OpenSL forum.

In addition to the open source portal in the new Linden wiki, there's also a new non-Linden URL that might be of interest: http://opensecondlife.org , where there's a nicely done modified version designed to be easily compilable using the free Visual C++ Express, among other things, and there's an IRC channel #opensl on EFnet.
_____________________
-

So long to these forums, the vBulletin forums that used to be at forums.secondlife.com. I will miss them.

I can be found on the web by searching for "SuezanneC Baskerville", or go to

http://www.google.com/profiles/suezanne

-

http://lindenlab.tribe.net/ created on 11/19/03.

Members: Ben, Catherine, Colin, Cory, Dan, Doug, Jim, Philip, Phoenix, Richard,
Robin, and Ryan

-
Haravikk Mistral
Registered User
Join date: 8 Oct 2005
Posts: 2,482
01-12-2007 04:03
From: Feynt Mistral
Actually I can see one problem with removing the VFS. It's currently easy to say "this much space will be used, no more." If you remove it, you have to count file sizes manually to keep track of the cache size.

I'm not sure that's a huge issue really, as you can have an index file somewhere that notes how big the cache is; every time you download a new file, you add its file size to that count and see where you are in terms of your maximum size limit. To counteract users going in and deleting stuff themselves from the cache, you can simply check the size on start-up to update the count.
I dunno if all OSes have it, but some have modification dates on folders, so you could even use these to see which ones need to be recounted. I therefore don't think tracking the size of the cache is going to be that bad, especially if it's organised relatively sensibly, e.g. splitting individual simulator caches into even smaller chunks so that if you only ever visit a small portion of a sim, any other chunks of area you may have cached can be removed when they get outdated.
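A minimal sketch of that running-total bookkeeping, with every name here invented for illustration:

CODE
// Minimal sketch of the running-total idea (all names are mine):
class CacheSizeTracker
{
public:
    // recount on start-up to catch files the user deleted by hand
    void onStartup(long long recounted_bytes) { mUsedBytes = recounted_bytes; }
    void onFileAdded(long long bytes)         { mUsedBytes += bytes; }
    void onFileRemoved(long long bytes)       { mUsedBytes -= bytes; }
    bool wouldExceedLimit(long long incoming) const
    {
        return mUsedBytes + incoming > mMaxBytes;
    }
private:
    long long mUsedBytes = 0;
    long long mMaxBytes  = 1LL << 30;   // e.g. a 1 GB cap
};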
_____________________
Computer (Mac Pro):
2 x Quad Core 3.2ghz Xeon
10gb DDR2 800mhz FB-DIMMS
4 x 750gb, 32mb cache hard-drives (RAID-0/striped)
NVidia GeForce 8800GT (512mb)
Morgaine Dinova
Active Carbon Unit
Join date: 25 Aug 2004
Posts: 968
01-12-2007 08:05
From: Scalar Tardis
Also, why is the virtual file system used at all? Why put another layer of abstraction on top of the base OS file system?

At this point I am assuming this was done so that SL can be mostly OS-independent.
You're almost certainly right, Scalar.

It's an odd way to do it though, as it doesn't leverage the thousands of man-years of hard work done by the developers of the underlying OSes.

The way I'd have achieved OS independence is by managing only the metadata functions ourselves (mainly mapping to portable filenames, partitioning into subdirectories, managing fast indexed access tables, dealing with crash exits, etc), and letting each OS do the actual file storage functions within a least common denominator directory tree.

That way our code wouldn't have needed to handle fragmentation and the numerous other issues that make programming a good FS very hard.
_____________________
-- General Mousebutton API, proposal for interactive gaming
-- Mouselook camera continuity, basic UI camera improvements
Scalar Tardis
SL Scientist/Engineer
Join date: 5 Nov 2005
Posts: 249
01-12-2007 16:37
Since the VFS cannot span data across free but small fragmented blocks, the cache cleanup process to free up space doesn't seem very efficient.

The problem is that if the LRU files are smaller than the needed size, and they don't happen to be adjacent to other free space in the cache, then deleting them adds nothing to the largest available free block. The effort is wasted. If the next file is also not big enough and not adjacent to existing free space, then it too adds nothing, continuing to create tiny useless holes in this big block of cache cheese.

This seemingly random deleting of old files in a cache with highly fragmented free blocks may lead to a whole pile of deletions that just so happen never to be adjacent, and it must go on deleting and deleting until a sufficiently large free block finally happens to appear as deleted blocks merge with other nearby free blocks.

Is this really more efficient than a direct OS-based filesystem that has clusters and can write across fragmented data blocks?
Haravikk Mistral
Registered User
Join date: 8 Oct 2005
Posts: 2,482
01-12-2007 18:32
Jesus, any open source developers working on this? It ought to be fairly easy to implement the basic file-system caching, the tricky bit is keeping an index of it all so that old (not recently viewed) files can be removed as needed.
The only reason /not/ to use a file-system is there's a slight memory overhead to having lots of individual files, but by the sounds of it the virtual file-system does little better. Since things can be extracted from the cache anyway, we've really just got a case of storing them each in individual files; making them a little fiddly to extract doesn't make them impossible to extract, and some basic scrambling would discourage the casual user from trying. No need for full-on encryption, since the files are decrypted at some stage in SL's operation anyway, so all it'd take is a custom client and suddenly you have all the assets you want.
_____________________
Computer (Mac Pro):
2 x Quad Core 3.2ghz Xeon
10gb DDR2 800mhz FB-DIMMS
4 x 750gb, 32mb cache hard-drives (RAID-0/striped)
NVidia GeForce 8800GT (512mb)
Scalar Tardis
SL Scientist/Engineer
Join date: 5 Nov 2005
Posts: 249
01-12-2007 22:49
Let me just repeat again that I am not an expert in C/C++. I doubt I'd be able to write a C case statement straight off on my own without help from a guide.

But usually you do not need to be an expert in a particular language to understand it and hack it. I can likely successfully modify the client just by examining what is already in there and duplicating the structures in my own new code, even if I don't understand the specifics and nuances of overloading and so forth.

The general gist of what's going on in the client is generally understandable from whatever other programming background you come from, whether that is Java, Pascal, or... QBASIC. Just this morning I whipped up an 8-line program in 6502 assembly on an Apple IIgs to show someone what raw CPU opcodes look like and how they work. :D


Therefore many of the questions I am asking about the VFS are from the point of view of someone who genuinely does not know what is better or worse in program design. IS this a more efficient way of storing files vs the core OS filesystem? I quite honestly do not know.

It would be real nice if the Lindens who wrote this code could step in and say "Yes, this is more efficient, and here's why." or "No, it isn't more efficient, but here are the tradeoffs vs a raw OS filesystem."

So I do not mean to be accusatory in the way I ask questions. I genuinely don't know and am looking for knowledgeable experience from those who do. Perhaps this VFS really is more efficient than the raw OS. :)


I am vaguely contemplating putting this procedural documentation of what I've done here into the LL open source wiki. I suppose if it does not meet the standards of what LL is looking for, then someone else is always free to come by and modify it or just delete the whole mess. ;)

.
Haravikk Mistral
Registered User
Join date: 8 Oct 2005
Posts: 2,482
01-13-2007 05:15
What you've written Scalar is good, though hopefully it'll be redundant soon, heh.
If I had more experience with C/C++ and more time then I would give such a project a go.

The only reason I can think of for using a VFS as opposed to an OS one is that it's much more difficult to pull files out of unless you know how it's organised, but that is pointless now as people can do it already. It's also slightly more space efficient, since you use exactly the space you set out to, whereas with OS file-systems you have overheads. For example, if the block size on your hard-drive is 16k and you store a file for a 1k primitive, it takes up the full 16k of that block even though 15k is wasted. That's potentially for every file you store.
But with hard-drive sizes increasing so rapidly (I now have 320gb, which is very little compared to some) I wouldn't care if my cache takes up 8gb to store 2gb of cached files. Besides, since the VFS doesn't cope with fragmentation, it is going to have the same trouble anyway as it starts leaving chunks of free space around that nothing fits into.
OS file-systems then have some overhead associated with each file opened (including hard-drive seek time for the file), whereas the VFS has to open far fewer files. However, that is fairly negligible in the long run if it allows for a larger cache.

I think in the end, while the VFS may have some superior performance, the real limiting factor for the cache is how you organise the data (so it can be looked up quickly) as you want look-up and clean-up (removing old files) to be as quick as possible, otherwise as the cache grows the performance could go down rapidly.
But we're talking about local access times compared with remote access times, so long as opening and managing the file locally is faster than downloading it again, then the cache is a success.

The simplest way of implementing the cache as I see it is something like storing files by their IDs, this can probably be done in one gigantic folder as the OS/file-system will handle the data-structure for finding the file when requested. Splitting off into sub-folders may be neater but I'm not sure if there'd be much of a performance gain.
Whenever you add a file you stick it in a sorted list kept as an index file (some kind of tree structure perhaps, as they're good performance-wise), sorted by the time it was added. When it comes time to delete something you just start deleting the first items in the list, as they are the oldest. Make it one step better by updating a file's time whenever it is requested, and you now know how recently a file was viewed; thus when files are culled, it is the files you haven't seen in a while that go first.
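A toy version of that time-sorted index, using a multimap keyed on access time so the oldest entries come first (all the names here are mine, just for illustration):

CODE
// Toy time-sorted index: a multimap keyed on access time gives you
// the oldest entries first (names are illustrative only).
#include <map>
#include <string>

std::multimap<unsigned, std::string> index_by_time;  // time -> asset id

void touchAsset(const std::string &id, unsigned now)
{
    // drop any stale entry for this asset, then re-add with a fresh time
    // (a real index would keep a reverse map to avoid this linear scan)
    for (auto it = index_by_time.begin(); it != index_by_time.end(); ++it)
        if (it->second == id) { index_by_time.erase(it); break; }
    index_by_time.insert(std::make_pair(now, id));
}

std::string pickEvictionVictim()
{
    // begin() has the smallest (oldest) access time: cull that file first
    return index_by_time.empty() ? "" : index_by_time.begin()->second;
}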
On OS X you don't even need to implement the index file, as Spotlight (database indexed file-system) ought to do everything you need. Just add the file and it'll do the indexing, whenever the file is requested, update the modified time (or last opened time perhaps is better), then when you want to remove something just query Spotlight for a list of 'oldest' files.
_____________________
Computer (Mac Pro):
2 x Quad Core 3.2ghz Xeon
10gb DDR2 800mhz FB-DIMMS
4 x 750gb, 32mb cache hard-drives (RAID-0/striped)
NVidia GeForce 8800GT (512mb)
Draco18s Majestic
Registered User
Join date: 19 Sep 2005
Posts: 2,744
01-13-2007 13:44
Hmm....I know little on this subject except what I've read here and here's a thought:

Use the VFS for anything smaller than 1 MB. Have a big old mess of the thing and store the PRIMS in it--a prim is like what, 1 to 2 kb of data? Then store the textures using the OS file system, organized either as one folder where each texture is stored by UUID, or keeping an index of UUIDs to filenames and storing them numerically or something. Another solution would be .\cache\[sim name]\[texture] by either method above. This would allow for a smaller VFS for each sim to store that sim's objects.
Scalar Tardis
SL Scientist/Engineer
Join date: 5 Nov 2005
Posts: 249
01-13-2007 15:53
That by-sim cache method can be more compact than you might think.

The unix file system creates what are known as links to data in the file system, and the links are the file names you see in the directory. Normally a file only has one link to it, but you can also create additional links to the data in other folders (known as hard-linking). Any of these directory entries will all point to the same data. A file is deleted only when the last of these links is removed.

Microsoft's NTFS also supports hard-linking of directory entries to the same block of data, though not many people use it. (There is an example program demonstrating this on the SysInternals website, now a part of Microsoft.)

So even if you split up caching into separate folders per sim, and there's some texture UUID that is used over and over across hundreds of sims, with hard linking that texture only uses one datafile on the hard drive, and your overall cache is much more compact.
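A quick illustration of the trick on a POSIX system; the paths here are invented, and on Windows the Win32 CreateHardLink() call does the same job:

CODE
// POSIX illustration; the file paths are invented.
#include <unistd.h>

int main()
{
    // the texture was written once, into a shared store:
    link("cache/store/3c59f1b2.jp2", "cache/SimA/3c59f1b2.jp2");
    link("cache/store/3c59f1b2.jp2", "cache/SimB/3c59f1b2.jp2");
    // both sim folders now list the texture, but the data exists once
    // on disk; it is freed only when the last link is unlink()ed.
    return 0;
}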
Scalar Tardis
SL Scientist/Engineer
Join date: 5 Nov 2005
Posts: 249
01-13-2007 16:07
Let's pretend for a moment that 25 TB of user data does not scare you, and you're not too concerned about having to cache all of that someday. ;)

If you have potentially gigs of drive space that you don't mind devoting to SL and you want fast efficient searching for objects, then you may benefit by having a separate cache folder for each type of cached object: a texture cache, a sound cache, a notecard cache, and so forth.

If you allow for potentially millions of objects to stay in your cache, then just simply splitting off each object type into its own cache category can significantly cut down the amount of searching needed to find a cached object.

How many notecards are in SL? How many textures? If you permit a 5 gig cache for notecards and a 5 gig cache for textures, how much time can you save looking for a texture if you completely isolate all notecards off into their own cache and simply don't scan through them when looking for a texture?
Scalar Tardis
SL Scientist/Engineer
Join date: 5 Nov 2005
Posts: 249
01-13-2007 16:28
Also, I am no expert in databases. But from a very fundamental viewpoint a database can be seen as a file storage system that sits on top of the base OS file storage system.

Why do people use MySQL or Oracle, vs just billions of tiny data files spread across a hard drive? What advantages do these databases offer that the base file system does not?

Presumably the storage used by MySQL and Oracle can also become fragmented as data is modified and deleted, though there are likely cleanup procedures to deal with that.

The SL VFS can be seen as similar in operation to MySQL or Oracle. Is the SL VFS faster or slower than MySQL?


So there is another direction to be explored with the client: Is it better to store cached object data in this VFS, or as raw OS files, or would it be faster and more efficient to switch to an open-source database engine optimized for speed of data access and use that for storing the gigs of SL cache data?

.
grumble Loudon
A Little bit a lion
Join date: 30 Nov 2005
Posts: 612
01-13-2007 23:02
Given how it is getting harder to get Windows to truly compact free space, the VFS is a better system if it sorts the data (in the background) so that the most recently accessed/viewed items are all together.

Storing the items by sim would also do the same thing.

The key is that keeping everything together will reduce head seek times.

There is also the issue of controlling the "read ahead" feature in the OS, where even if you ask for one object, Windows will read several KB.

I would propose using multiple VFS or MySQL databases with each one holding the data that is seen for the sim you are in. This will cause duplication, but the cache system could prevent requesting the same item multiple times by checking the master key mapping table.

1. Check the local sim cache
2. Check the global key to sim table. (if found copy from that other sim cache)
3. Ask the sim for it.
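As a sketch, with std::map standing in for the per-sim caches and the master key table (all names here are hypothetical):

CODE
// Sketch of the proposed three-step lookup order (names hypothetical).
#include <map>
#include <string>

typedef std::map<std::string, std::string> SimCache;      // key -> data
std::map<std::string, SimCache>    sim_caches;            // sim -> cache
std::map<std::string, std::string> key_to_sim;            // key -> sim

std::string fetchAsset(const std::string &key, const std::string &sim)
{
    SimCache &local = sim_caches[sim];
    if (local.count(key))                    // 1. check the local sim cache
        return local[key];
    if (key_to_sim.count(key))               // 2. check the global key table
    {
        local[key] = sim_caches[key_to_sim[key]][key];  // copy across
        return local[key];
    }
    return "<requested from server>";        // 3. ask the sim for it
}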
ed44 Gupte
Explorer (Retired)
Join date: 7 Oct 2005
Posts: 638
01-14-2007 04:57
I seem to remember that DBs use indexed/sequential methods for finding stuff. They do calculations on the search key of the wanted item, and that gives a location of where that item should be. If more than one item calculates to the same result, those items are stored sequentially. I think it works better when the DB is less than half full.

Still not sure if that is quicker than using the OS file system.
Argent Stonecutter
Emergency Mustelid
Join date: 20 Sep 2005
Posts: 20,263
01-14-2007 11:27
From: Haravikk Mistral
The simplest way of implementing the cache as I see it is something like storing files by their IDs, this can probably be done in one gigantic folder as the OS/file-system will handle the data-structure for finding the file when requested. Splitting off into sub-folders may be neater but I'm not sure if there'd be much of a performance gain.
There would be a HUGE performance gain. Using the first two characters of the file name for the first level, and the next two for the next level, ignoring "-", would give you no more than 256 entries at each level and be trivial to parse. Three characters would give you no more than 4096. You could even tune it for the file system... directory entries tend to be cheaper in UNIX, larger directories cheaper in Windows.

This is a standard "first, easy" optimization for any kind of file system based object store.
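For illustration, mapping a UUID to such a sharded path is only a couple of lines. This is a sketch, not anyone's shipping code:

CODE
// Sketch only: first two hex characters pick the top directory,
// the next two the subdirectory, ignoring the dashes.
#include <string>

std::string shardedPath(const std::string &uuid)
{
    std::string hex;
    for (char c : uuid)
        if (c != '-') hex += c;   // ignore "-", as described above
    // e.g. "3c59f1b2..." maps to "cache/3c/59/3c59f1b2...":
    // no more than 256 entries at each directory level
    return "cache/" + hex.substr(0, 2) + "/" + hex.substr(2, 2) + "/" + hex;
}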
From: someone
On OS X you don't even need to implement the index file, as Spotlight (database indexed file-system) ought to do everything you need.
Using a general purpose database instead of a special purpose one, when you have already implemented the special purpose one and optimized it, is always a bad idea. Using a special purpose database optimised for a different goal is worse.
Argent Stonecutter
Emergency Mustelid
Join date: 20 Sep 2005
Posts: 20,263
01-14-2007 11:33
From: Draco18s Majestic
Use the VFS for anything smaller than 1 MB. Have a big old mess of the thing and store the PRIMS in it--a prim is like what, 1 to 2 kb of data? Then store the textures using the OS file system,
That's basically what they do, already. Look at your cache some time.