Memory Efficient String Operations?
|
|
Lila Pixie
Registered User
Join date: 7 Jun 2006
Posts: 20
|
07-05-2007 18:49
Hi,
I am writing a script that involves quite a bit of string manipulation. I am reading text from notecards and replacing $variables$ with actual values. Due to the amount of text my script processes, I am running out of script memory and I was wondering what is the most memory efficient method for cutting and replacing text strings?
Right now, I am doing a character by character parsing. Would converting strings into lists, do my replacement and re-concatenate them be a better option? Should I use a series of llGetSubString, llDeleteSubString, llInsertString calls?
|
|
Osgeld Barmy
Registered User
Join date: 22 Mar 2005
Posts: 3,336
|
07-05-2007 22:17
list's use more memory, its data + 40 bytes per entry (if i recall) so thats not going to be an option
try looking at llSubStringIndex, since you know how long the phrase your wanting to replace is you should be able to just use that to find the instance, and replace it
|
|
Qie Niangao
Coin-operated
Join date: 24 May 2006
Posts: 7,138
|
07-06-2007 01:18
One thing to try, if you haven't already been using this little trick: myString = (myString=""  + myString + "new_item"; The lslwiki entry on strings claims that, as for lists, this improves storage efficiency compared to simple += concatenation. (I'm guessing it reduces heap fragmentation, but that's just a guess.)
|
|
Lila Pixie
Registered User
Join date: 7 Jun 2006
Posts: 20
|
07-06-2007 02:42
@Osgeld Barmy: I'm using character by character parsing because I was also doing string formatting, breaking long sentences up into several lines, aligning each line to left, right or center, etc.. After doing some tests, I realize this is the problematic code that sucks up the most resources and processsing time. So for now, I'm abandoning this feature till I can come up with something more efficient. @Qie Niangao: (Cutting the new year cake???) myString = (myString=""  + myString + "new_item"; How does this work? It seems like a bug to me if it actually does what you claim. *boggles* Assuming the usual left to right precedence order applies for the + symbols, the code is first setting myString to an empty string, concatenates the now empty myString with another empty myString before finally concatenating "new_item". Its end result should be equivalent to: myString = "new_item"; and not: myString += "new_item"; Or am I misunderstanding something?
|
|
Deanna Trollop
BZ Enterprises
Join date: 30 Jan 2006
Posts: 671
|
07-06-2007 02:47
From: Lila Pixie Assuming the usual left to right precedence order applies for the + symbols It doesn't. Concatenation operations are evaluated right-to-left. Therefore new_item is added to the existing value of myString, then myString is cleared, so you don't have two copies of the original myString data taking up memory at the same time. (apparently)
|
|
Lila Pixie
Registered User
Join date: 7 Jun 2006
Posts: 20
|
07-06-2007 02:55
Okay, I did a quick test using this code : integer i; string myString; llOwnerSay((string)llGetFreeMemory()); myString = ""; for (i = 0; i < 100; i++) { myString = (myString=""  + myString + (string)i; } llOwnerSay(myString); llOwnerSay((string)llGetFreeMemory()); myString = ""; for (i = 0; i < 100; i++) { myString += (string)i; } llOwnerSay(myString); llOwnerSay((string)llGetFreeMemory()); The output is... interesting. [2:47] Object say, 15729 [2:47] Object say, 0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899 [2:47] Object say, 15530 [2:47] Object say, 0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899 [2:47] Object say, 15175 So it seems I use LESS memory with += than the wierd code. I tried running with a i < 200 for loop and received even more memory savings. Initial free memory = 15729 After loop 1, free memory = 15232 After loop 2, free memory = 14266 What's happening?
|
|
Deanna Trollop
BZ Enterprises
Join date: 30 Jan 2006
Posts: 671
|
07-06-2007 03:00
Don't rely on llGetFreeMemory to give accurate results. The only real way to test something like this is to see how many such operations you can do before the stack and heap collide.
Edit: For example, insert another llOwnerSay((string)llGetFreeMemory()); just after the second time you clear myString, and I can almost guarantee you won't see an increase in free memory.
|
|
Trevor Langdon
Second Life Resident
Join date: 20 Oct 2004
Posts: 149
|
07-06-2007 08:32
Lila-- As Deanna mentions, llGetFreeMemory is not reliable in indicating how much memory is currently available, since it only shows the low water mark of available memory and will not reflect any recouped memory that may have been freed during the processing. However, in your example, I believe you misread your results. From your results it shows that using the "+=" method leaves less memory available, rather than a savings over the other method. Also, try reversing the order of the test (put the "+=" method first). In this case, my guess is you will not see a difference in the llGetFreeMemory value between Loops 1 & 2. So, the results of your test support the lslwiki explanation that using the (mystring=""  method decreases the low water mark for memory used. For long string manipulation, this can be significant. p.s. I have used this method for list handling; however, didn't realize it could also be used for string handling to save precious memory.  Makes sense. Deanna explained the reasoning well. Good stuff.
|
|
Lila Pixie
Registered User
Join date: 7 Jun 2006
Posts: 20
|
07-06-2007 10:46
Yes, I was rushing out to watch a movie with friends when I typed the last post. When I came home and re-read what I had said, I realized all the mistakes I've made! lol
So yes, the concatenation trick does indeed work!
|
|
Kidd Krasner
Registered User
Join date: 1 Jan 2007
Posts: 1,938
|
07-06-2007 13:53
From: Deanna Trollop It doesn't. Concatenation operations are evaluated right-to-left. Therefore new_item is added to the existing value of myString, then myString is cleared, so you don't have two copies of the original myString data taking up memory at the same time. (apparently) I don't see anything on the Operators page for the wiki that guarantees that the operations are evaluated right to left. Even if that were the case, it's not the relevant requirement. What matters is that the operands be evaluated in the correct order. Consider x = a * b + (c * 2); We expect, reasonably, that the product a*b will be evaluated first, and then 2c added to the result. But if instead we write: x = a * (c = c + 3) + (c * 2) then we know the increment of c must take place before the multiplication, but there's no mathematical requirement that it take place before the doubling. If the order of evaluation were: Compute c*2, save the result on the stack Compute c + 3, push the result onto the stack, save a copy in c Multiply the top of stack by a Add the two topmost items on the stack the result is something that obeys the rule that requires the multiplication to be done first. Why would a compiler do it this way? An optimizing compiler might decide it's the best way to do it. Language specifications have been known to leave this issue unspecified, giving the optimizer freedom at the expense of the users shooting themselves in the foot. The bottom line is that side effects within expressions are evil, they've always been evil, and they always will be. Well, maybe not in LISP. In the absence of a rigorous language specification and test suite (comparable to Plum Hall for C, or DoD's for Ada), there's no way I would trust such an expression.
|
|
Qie Niangao
Coin-operated
Join date: 24 May 2006
Posts: 7,138
|
07-06-2007 14:13
From: Kidd Krasner ...The bottom line is that side effects within expressions are evil, they've always been evil, and they always will be. Well, maybe not in LISP. In the absence of a rigorous language specification and test suite (comparable to Plum Hall for C, or DoD's for Ada), there's no way I would trust such an expression. Oh, indeed, all this could conceivably fail with any update. But on the other hand, a script has 16KB of memory to work with and no discernible memory management, so LSL is all rather like programming a 70's vintage "portable computer" in some proprietary subset of BASIC--and is almost surely just as ephemeral. So, when confronted with a script that almost fits, it seems a kind of arbitrary choice of smelly potions, whether to use the voodoo concatenation trick or break up the script and access memory by link messages.
|
|
Boss Spectre
Registered User
Join date: 5 Sep 2005
Posts: 229
|
07-06-2007 15:05
Unlike numerical addition, string and list concatenation are not commutative, so they always evaluate right to left.
|
|
Kidd Krasner
Registered User
Join date: 1 Jan 2007
Posts: 1,938
|
07-06-2007 16:30
From: Boss Spectre Unlike numerical addition, string and list concatenation are not commutative, so they always evaluate right to left. That doesn't follow. Whether not the operator is commutative has no bearing on whether the operands are evaluated right to left, left to right, or some mixed order. Consider, assuming these are all integers: x = (a = b / c) - (d = e - f) It doesn't really matter whether a is assigned to first or d is. The value stored in x will be (b/c) - (e -f), or (b/c) + f - e, regardless. And all of these operators are non-commutative. But if you change it to x = (a = b/c) - (c = e - f) then it's not clear whether x gets (b/c.original) - (e -f) or b/(e-f) - (e - f). The important factor is the side effect, not the commutativity.
|
|
Kidd Krasner
Registered User
Join date: 1 Jan 2007
Posts: 1,938
|
07-06-2007 16:38
From: Qie Niangao Oh, indeed, all this could conceivably fail with any update. But on the other hand, a script has 16KB of memory to work with and no discernible memory management, so LSL is all rather like programming a 70's vintage "portable computer" in some proprietary subset of BASIC--and is almost surely just as ephemeral. So, when confronted with a script that almost fits, it seems a kind of arbitrary choice of smelly potions, whether to use the voodoo concatenation trick or break up the script and access memory by link messages. A 70's portable computer was a pocket calcultor. The Osborne came out in 81. But I don't remember having to stand on my head like this when working on an Apple II, though I don't remember how much memory it had. I just didn't try to get it to process more data than it could handle. It strikes me that this trick only buys a little. If the script is going to evolve at all, you'll be forced into linked messages, so why not just bite the bullet?
|
|
Deanna Trollop
BZ Enterprises
Join date: 30 Jan 2006
Posts: 671
|
07-06-2007 18:06
From: Kidd Krasner That doesn't follow. Whether not the operator is commutative has no bearing on whether the operands are evaluated right to left, left to right, or some mixed order. The point was, even though string/list concatenation uses the "+" symbol as an operator, it is not a mathematical operation, and therefore doesn't follow the rules of such.
|
|
Deanna Trollop
BZ Enterprises
Join date: 30 Jan 2006
Posts: 671
|
07-06-2007 18:08
From: Kidd Krasner It strikes me that this trick only buys a little. So long as it's enough to keep a few bytes between stack and heap, IMHO, it's worth it.
|
|
Strife Onizuka
Moonchild
Join date: 3 Mar 2004
Posts: 5,887
|
07-06-2007 21:38
From: Kidd Krasner x = (a = b/c) - (c = e - f) That executes as: (written in a style that can accurately describe LSO) push f push e subtract int,int copyto c push c push b divide int,int copyto a subtract int,int copyto x LSL executes right to left, except blocks of code separated by commas are executed left to right (but inside the block are right to left).
_____________________
Truth is a river that is always splitting up into arms that reunite. Islanded between the arms, the inhabitants argue for a lifetime as to which is the main river. - Cyril Connolly
Without the political will to find common ground, the continual friction of tactic and counter tactic, only creates suspicion and hatred and vengeance, and perpetuates the cycle of violence. - James Nachtwey
|
|
Boss Spectre
Registered User
Join date: 5 Sep 2005
Posts: 229
|
07-09-2007 20:10
From: Kidd Krasner <snip> If the script is going to evolve at all, you'll be forced into linked messages, so why not just bite the bullet? Coz h4x0rz r 1337!!! string strcat( string s1, string s2 ) { return s1 + s2; }
integer add( integer x1, integer x2 ) { return x1 + x2; }
default { state_entry() { string a = "Hello"; llOwnerSay( a = (a = "") + a + "Goodbye" ); llOwnerSay( a = strcat( a = "", strcat( a, "Goodbye" ) ) ); integer x = 1; llOwnerSay( (string)( x = (x = 0) + x + 10 ) ); llOwnerSay( (string)(x = add( x = 0, add( x, 100 ) ) ) ); } }
|
|
Feldspar Millgrove
Registered User
Join date: 16 Nov 2006
Posts: 372
|
07-11-2007 01:35
From: Kidd Krasner A 70's portable computer was a pocket calcultor. The Osborne came out in 81. There were portable computers that were more powerful than the Osborne in the 1970s. They were very expensive, though. The IBM 5100, for example. (Not to be confused with a later PC with a similar model number.)
|
|
Kidd Krasner
Registered User
Join date: 1 Jan 2007
Posts: 1,938
|
07-12-2007 08:08
From: Strife Onizuka LSL executes right to left, except blocks of code separated by commas are executed left to right (but inside the block are right to left).
But the question is whether that's an artifact of the current compiler or part of the (non-existent) formal specification of the language? For example, quoting from Stroustrup, The C++ Progamming Language, 2nd ed., p. 492: "Except where noted, the order of evaluation of operands of individual operators is undefined. In particular, if a value is modified twice in an expression, the result of the expression is undefined except where an ordering is guaranteed by the operators involved." In the examples given, i = v[i++] is undefined, while i=7,i++,i++ is defined, because the C++ comma operator explicitly demands an order. So, assuming this hasn't changed in more recent C++ standards, two different compilers can evaluate i = v[i++] in two different ways, but both compilers are correct. There may be only one source of LSL compiler now, but what is to prevent a second? And what is to prevent LSL from adding an optimizer that changes the order?
|
|
Kidd Krasner
Registered User
Join date: 1 Jan 2007
Posts: 1,938
|
07-12-2007 08:09
From: Feldspar Millgrove There were portable computers that were more powerful than the Osborne in the 1970s. They were very expensive, though. The IBM 5100, for example. (Not to be confused with a later PC with a similar model number.) I stand corrected. (Not that anyone would consider the 5100 to be portable by today's standards.)
|
|
Kidd Krasner
Registered User
Join date: 1 Jan 2007
Posts: 1,938
|
07-12-2007 08:11
From: Boss Spectre Coz h4x0rz r 1337!!!
Oh, go rewrite all of SL in APL 
|