Second Life Forums Archive - A decent hash function for avatar keys?

Ziggy Puff

Registered User

Join date: 15 Jul 2005

Posts: 1,143

01-27-2006 09:59

I'm working on something that will require storing data, and one of the fields in each entry is a user key. I think I'll be running low on memory, so I'm thinking of using a set of helper scripts (or maybe even separate objects, though I'm not sure if that gains me anything, since AFAIK there's no limit to the number of scripts in an object) to store this information. I thought I would use a hash function to pick which script gets a certain data entry, to try and keep the memory usage uniform, and also to keep multiple entries for the same user (which are valid in my situation) in the same script, which should help with retrieving them.

So... does anyone know if avatar keys are randomly generated, or sequentially, or anything else? Having some idea of the pattern would help in coming up with a decent hash function.

Thanks,
Ziggy

Masakazu Kojima

ケロ

Join date: 23 Apr 2004

Posts: 232

01-27-2006 10:21

http://secondlife.com/badgeo/wakka.php?wakka=UUID&show_comments=1#comments

From: Psyke Phaeton

Zero Linden: okay - the UUIDs are ... 128 random string
Zero Linden: the are likely to be unique for all time and space (!)
Zero Linden: there is no internal format to them, like one part being the location and one the time of birth or some such

Almost all avatar keys generated after late 2003/early 2004 (?) have 4 as the first digit of the third field. This is probably just the UUID version; version 4 UUIDs are randomly generated. http://www.ietf.org/rfc/rfc4122.txt

Organizing them by the first digit or two is probably your best bet.

Ziggy Puff

Registered User

Join date: 15 Jul 2005

Posts: 1,143

01-27-2006 10:32

Thanks.

Introvert Petunia

over 2 billion posts

Join date: 11 Sep 2004

Posts: 2,065

01-27-2006 11:10

There are AV UUID databases out there, which would give you the data that you need. You could then use a frequency analysis to a static Huffman coding of them.

It is not immediately apparent to me if the code size of or time complexity of Huffman in LSL would offset your data savings especially if UUIDs are decently random in which case their compressability should be close to nil.

Yeah, on second thought, your bucketing would probably be more efficient, and you might not even need to hash - just bucket off the first character.

Ziggy Puff

Registered User

Join date: 15 Jul 2005

Posts: 1,143

01-27-2006 11:19

Yeah, I'm not trying to compress data to save memory, just get a decent distribution across the scripts. Though I could set up some elaborate scheme where the back-end scripts report how many entries they're holding, and the front-end script picks the one with the most free memory. But I'd like to start with just distributing them out there and see if I can get a good mix.

While on the topic of saving memory... my DB will be a 'list of lists', where each entry is a smallish string, and each inner list is a list of such strings. I'm thinking that I'll save quite a bit if I maintain this as a 'list of strings', and use my own delimiter inside each 'long string' to separate the individual 'little strings' within it. Thought/comments? I think the list overhead to maintain each 'little string' as an individual list item will be a lot more than a 2-3 character delimiter. And when it's time to parse/add/remove, I can make a temporary list out of the 'long string', perform the operations needed, and then dump it back into a 'long string' to put back in the DB.

I hope that made some sense

Introvert Petunia

over 2 billion posts

Join date: 11 Sep 2004

Posts: 2,065

01-27-2006 11:24

Lists do appear to be far more memory intensive than they ought be. There was a recent post that strings may be too, but probably less so than lists.

With LSL, we're stuck doing a lot of "empirical programming". To paraphrase the mythical Ezhar "what makes LSL interesting is not so much what you can do with it, but in tricking it into doing what you want". Good luck.

Ziggy Puff

Registered User

Join date: 15 Jul 2005

Posts: 1,143

01-27-2006 11:33

Heh... thanks. I'm still pretty much at the prototyping stage, so maybe I'll try it both ways and see how it looks.

Fenrir Reitveld

Crazy? Don't mind if I do

Join date: 20 Apr 2005

Posts: 459

01-27-2006 16:41

I've used "list-like strings" (strings with data seperation delimiter), in fact I posted some code demonstrating this. It works by treating the string as a stack, popping up the most recent "line" and parsing it into a temporary list via llCSV2List*, then adding it back to the end of the string. Just recurse until all N entries are examined/manipulated.

It wasn't the most optimal code (I was doing a llGetSubString twice on what could potentially be a very large string), but even so there's not much room for optimization. You're trading execution speed for higher in-script data capacity... ^_^

It's less that strings are memory intensive, but that you have to be careful to remember strings are just pointers. (I was making the assumption that string passing via functions was by-value and saw what I thought was leaky strings.) There is also various issues, such the VM doesn't release memory when you expect it (ie: deallocated when going out of scope, instead of cleared by hand, etc).

I'm having similiar growing pains with my remote vendor system. It's a beast of a system, with six prims and nearly a dozen scripts, all sharing the data and workload. I'm thinking of just scrapping it and just doing the database-related actions externally instead of keeping it all in SL.

*If commas are an issue, you can also use llParseString2List/llDumpList2String instead of the CSV based manipulators.

Ziggy Puff

Registered User

Join date: 15 Jul 2005

Posts: 1,143

01-27-2006 17:18

Sometimes I wonder how many times different SL scripters have tried to solve the same problems

My project isn't a remote vendor, but it looks like some of the problems are pretty similar. I considered going off-world too, but that has its own set of issues.

Thanks for the tip, I'll go dig up your 'list-like strings' code and take a look through it.

Strife Onizuka

Moonchild

Join date: 3 Mar 2004

Posts: 5,887

01-27-2006 18:16

You could do something like

CODE


distribute(key user, string message)
{
    integer a = (integer)("0x"+llGetSubString(user,0,0));
    llMessageLinked(LINK_SET, a + 3000, message, user);
}

_____________________

Truth is a river that is always splitting up into arms that reunite. Islanded between the arms, the inhabitants argue for a lifetime as to which is the main river.
- Cyril Connolly

Without the political will to find common ground, the continual friction of tactic and counter tactic, only creates suspicion and hatred and vengeance, and perpetuates the cycle of violence.
- James Nachtwey

Ziggy Puff

Registered User

Join date: 15 Jul 2005

Posts: 1,143

01-27-2006 19:16

The 3000 just being an offset to identify this message type, and then with 16 scripts, each with an internal ID of 0 - 15, and those pick up and save the user/message tuple if the ID matches. Or am I missing something? If that's what you meant, then yes, I was thinking of something like this, but it would totally break if 90% of user keys had the same first character, hence my question about the distribution.

And thanks for that code snippet... I was wondering if I could use 0x somehow to get straight from hex to decimal, I remembered you'd posted something along these lines in the past. Much better than setting up a list of 0 - f and finding the index in there, which is probably what I would have done

Masakazu Kojima

ケロ

Join date: 23 Apr 2004

Posts: 232

01-27-2006 21:06

From 15960 avatar keys (F = first digit in key):

CODE

+---+-------+------+
| F | TOTAL | %    |
+---+-------+------+
| 0 |  1018 | 6.38 |
| 1 |   979 | 6.13 |
| 2 |   986 | 6.18 |
| 3 |  1032 | 6.47 |
| 4 |  1028 | 6.44 |
| 5 |   947 | 5.93 |
| 6 |   974 | 6.10 |
| 7 |  1038 | 6.50 |
| 8 |  1017 | 6.37 |
| 9 |   961 | 6.02 |
| a |   991 | 6.21 |
| b |   978 | 6.13 |
| c |   967 | 6.06 |
| d |   997 | 6.25 |
| e |  1013 | 6.35 |
| f |  1034 | 6.48 |
+---+-------+------+

And just for fun (V = first digit of 3rd set):

CODE

+---+-------+-------+
| V | Total | %     |
+---+-------+-------+
| 0 |     3 |  0.02 |
| 1 |     2 |  0.01 |
| 2 |     1 |  0.01 |
| 3 |     1 |  0.01 |
| 4 | 15916 | 99.72 |
| 5 |     2 |  0.01 |
| 6 |     5 |  0.03 |
| 7 |     1 |  0.01 |
| 8 |     5 |  0.03 |
| 9 |     4 |  0.03 |
| a |     3 |  0.02 |
| b |     3 |  0.02 |
| c |     4 |  0.03 |
| d |     2 |  0.01 |
| e |     3 |  0.02 |
| f |     5 |  0.03 |
+---+-------+-------+

Ziggy Puff

Registered User

Join date: 15 Jul 2005

Posts: 1,143

01-28-2006 08:35

Perfect. Thank you so much

A decent hash function for avatar keys?
Ziggy Puff Registered User Join date: 15 Jul 2005 Posts: 1,143	01-27-2006 09:59 I'm working on something that will require storing data, and one of the fields in each entry is a user key. I think I'll be running low on memory, so I'm thinking of using a set of helper scripts (or maybe even separate objects, though I'm not sure if that gains me anything, since AFAIK there's no limit to the number of scripts in an object) to store this information. I thought I would use a hash function to pick which script gets a certain data entry, to try and keep the memory usage uniform, and also to keep multiple entries for the same user (which are valid in my situation) in the same script, which should help with retrieving them. So... does anyone know if avatar keys are randomly generated, or sequentially, or anything else? Having some idea of the pattern would help in coming up with a decent hash function. Thanks, Ziggy
Masakazu Kojima ケロ Join date: 23 Apr 2004 Posts: 232	01-27-2006 10:21 http://secondlife.com/badgeo/wakka.php?wakka=UUID&show_comments=1#comments From: Psyke Phaeton Zero Linden: okay - the UUIDs are ... 128 random string Zero Linden: the are likely to be unique for all time and space (!) Zero Linden: there is no internal format to them, like one part being the location and one the time of birth or some such Almost all avatar keys generated after late 2003/early 2004 (?) have 4 as the first digit of the third field. This is probably just the UUID version; version 4 UUIDs are randomly generated. http://www.ietf.org/rfc/rfc4122.txt Organizing them by the first digit or two is probably your best bet.
Ziggy Puff Registered User Join date: 15 Jul 2005 Posts: 1,143	01-27-2006 10:32 Thanks.
Introvert Petunia over 2 billion posts Join date: 11 Sep 2004 Posts: 2,065	01-27-2006 11:10 There are AV UUID databases out there, which would give you the data that you need. You could then use a frequency analysis to a static Huffman coding of them. It is not immediately apparent to me if the code size of or time complexity of Huffman in LSL would offset your data savings especially if UUIDs are decently random in which case their compressability should be close to nil. Yeah, on second thought, your bucketing would probably be more efficient, and you might not even need to hash - just bucket off the first character.
Ziggy Puff Registered User Join date: 15 Jul 2005 Posts: 1,143	01-27-2006 11:19 Yeah, I'm not trying to compress data to save memory, just get a decent distribution across the scripts. Though I could set up some elaborate scheme where the back-end scripts report how many entries they're holding, and the front-end script picks the one with the most free memory. But I'd like to start with just distributing them out there and see if I can get a good mix. While on the topic of saving memory... my DB will be a 'list of lists', where each entry is a smallish string, and each inner list is a list of such strings. I'm thinking that I'll save quite a bit if I maintain this as a 'list of strings', and use my own delimiter inside each 'long string' to separate the individual 'little strings' within it. Thought/comments? I think the list overhead to maintain each 'little string' as an individual list item will be a lot more than a 2-3 character delimiter. And when it's time to parse/add/remove, I can make a temporary list out of the 'long string', perform the operations needed, and then dump it back into a 'long string' to put back in the DB. I hope that made some sense
Introvert Petunia over 2 billion posts Join date: 11 Sep 2004 Posts: 2,065	01-27-2006 11:24 Lists do appear to be far more memory intensive than they ought be. There was a recent post that strings may be too, but probably less so than lists. With LSL, we're stuck doing a lot of "empirical programming". To paraphrase the mythical Ezhar "what makes LSL interesting is not so much what you can do with it, but in tricking it into doing what you want". Good luck.
Ziggy Puff Registered User Join date: 15 Jul 2005 Posts: 1,143	01-27-2006 11:33 Heh... thanks. I'm still pretty much at the prototyping stage, so maybe I'll try it both ways and see how it looks.
Fenrir Reitveld Crazy? Don't mind if I do Join date: 20 Apr 2005 Posts: 459	01-27-2006 16:41 I've used "list-like strings" (strings with data seperation delimiter), in fact I posted some code demonstrating this. It works by treating the string as a stack, popping up the most recent "line" and parsing it into a temporary list via llCSV2List, then adding it back to the end of the string. Just recurse until all N entries are examined/manipulated. It wasn't the most optimal code (I was doing a llGetSubString twice on what could potentially be a very large string), but even so there's not much room for optimization. You're trading execution speed for higher in-script data capacity... ^_^ It's less that strings are memory intensive, but that you have to be careful to remember strings are just pointers. (I was making the assumption that string passing via functions was by-value and saw what I thought was leaky strings.) There is also various issues, such the VM doesn't release memory when you expect it (ie: deallocated when going out of scope, instead of cleared by hand, etc). I'm having similiar growing pains with my remote vendor system. It's a beast of a system, with six prims and nearly a dozen scripts, all sharing the data and workload. I'm thinking of just scrapping it and just doing the database-related actions externally instead of keeping it all in SL. If commas are an issue, you can also use llParseString2List/llDumpList2String instead of the CSV based manipulators.
Ziggy Puff Registered User Join date: 15 Jul 2005 Posts: 1,143	01-27-2006 17:18 Sometimes I wonder how many times different SL scripters have tried to solve the same problems My project isn't a remote vendor, but it looks like some of the problems are pretty similar. I considered going off-world too, but that has its own set of issues. Thanks for the tip, I'll go dig up your 'list-like strings' code and take a look through it.
Strife Onizuka Moonchild Join date: 3 Mar 2004 Posts: 5,887	01-27-2006 18:16 You could do something like CODE distribute(key user, string message) { integer a = (integer)("0x"+llGetSubString(user,0,0)); llMessageLinked(LINK_SET, a + 3000, message, user); } _____________________ Truth is a river that is always splitting up into arms that reunite. Islanded between the arms, the inhabitants argue for a lifetime as to which is the main river. - Cyril Connolly Without the political will to find common ground, the continual friction of tactic and counter tactic, only creates suspicion and hatred and vengeance, and perpetuates the cycle of violence. - James Nachtwey
Ziggy Puff Registered User Join date: 15 Jul 2005 Posts: 1,143	01-27-2006 19:16 The 3000 just being an offset to identify this message type, and then with 16 scripts, each with an internal ID of 0 - 15, and those pick up and save the user/message tuple if the ID matches. Or am I missing something? If that's what you meant, then yes, I was thinking of something like this, but it would totally break if 90% of user keys had the same first character, hence my question about the distribution. And thanks for that code snippet... I was wondering if I could use 0x somehow to get straight from hex to decimal, I remembered you'd posted something along these lines in the past. Much better than setting up a list of 0 - f and finding the index in there, which is probably what I would have done
Masakazu Kojima ケロ Join date: 23 Apr 2004 Posts: 232	01-27-2006 21:06 From 15960 avatar keys (F = first digit in key): CODE +---+-------+------+ \| F \| TOTAL \| % \| +---+-------+------+ \| 0 \| 1018 \| 6.38 \| \| 1 \| 979 \| 6.13 \| \| 2 \| 986 \| 6.18 \| \| 3 \| 1032 \| 6.47 \| \| 4 \| 1028 \| 6.44 \| \| 5 \| 947 \| 5.93 \| \| 6 \| 974 \| 6.10 \| \| 7 \| 1038 \| 6.50 \| \| 8 \| 1017 \| 6.37 \| \| 9 \| 961 \| 6.02 \| \| a \| 991 \| 6.21 \| \| b \| 978 \| 6.13 \| \| c \| 967 \| 6.06 \| \| d \| 997 \| 6.25 \| \| e \| 1013 \| 6.35 \| \| f \| 1034 \| 6.48 \| +---+-------+------+ And just for fun (V = first digit of 3rd set): CODE +---+-------+-------+ \| V \| Total \| % \| +---+-------+-------+ \| 0 \| 3 \| 0.02 \| \| 1 \| 2 \| 0.01 \| \| 2 \| 1 \| 0.01 \| \| 3 \| 1 \| 0.01 \| \| 4 \| 15916 \| 99.72 \| \| 5 \| 2 \| 0.01 \| \| 6 \| 5 \| 0.03 \| \| 7 \| 1 \| 0.01 \| \| 8 \| 5 \| 0.03 \| \| 9 \| 4 \| 0.03 \| \| a \| 3 \| 0.02 \| \| b \| 3 \| 0.02 \| \| c \| 4 \| 0.03 \| \| d \| 2 \| 0.01 \| \| e \| 3 \| 0.02 \| \| f \| 5 \| 0.03 \| +---+-------+-------+
Ziggy Puff Registered User Join date: 15 Jul 2005 Posts: 1,143	01-28-2006 08:35 Perfect. Thank you so much

Welcome to the Second Life Forums Archive

A decent hash function for avatar keys?