Parse String Function

Aakanaar LaSalle
Registered User
Join date: 1 Sep 2006
Posts: 132
10-27-2006 19:43
Hi, I'm working on a project where I will be reading lines from a notecard.

The script will get a random line, which I then need to parse into a list. Most of the elements in my lines will be separated by a single character, most likely the asterisk '*'.

However, there may be times when I want to use the asterisk inside an element, and so need a different character to separate the elements.

For this reason I created the following function, which takes the first character of the line and uses it to split the rest of the line.
CODE

list ufParseString (string strSource)
{
    string strSeperator = llGetSubString(strSource, 0, 0);
    strSource = llDeleteSubString(strSource, 0, 0);

    return llParseString2List(strSource, [strSeperator], []);
}
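
A quick way to see the per-line separator in action is to feed the function a few sample lines. The sketch below is only an illustration: the sample strings and the touch_start trigger are made up for the example and are not part of the original project.

```lsl
list ufParseString(string strSource)
{
    string strSeperator = llGetSubString(strSource, 0, 0);
    strSource = llDeleteSubString(strSource, 0, 0);
    return llParseString2List(strSource, [strSeperator], []);
}

default
{
    touch_start(integer total_number)
    {
        // Hypothetical sample lines; each one names its own separator up front.
        list lstLines = ["*red*green*blue", "|5 * 3|equals|15"];
        integer i;
        for (i = 0; i < llGetListLength(lstLines); ++i)
        {
            list lstParts = ufParseString(llList2String(lstLines, i));
            // Join with " / " so the element boundaries are visible in chat.
            llOwnerSay(llDumpList2String(lstParts, " / "));
        }
    }
}
```

The second sample line keeps its asterisk as data because that line chose the pipe as its separator. Note that llParseString2List drops empty elements; llParseStringKeepNulls would be the call to use if empty fields need to survive.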


Now, this works fine for me. But I got to thinking, "This is a useful function. Why not make it available for everyone?" The only problem is that some people may want to use more than one character to separate their elements.

So I decided to remake it so that if the first character is a digit (0-9), it is discarded and the next set of characters is used as the separator. The number of characters is given by that first digit. Here is what I have:
CODE

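list ufParseString (string strSource)
{
    string strSeperator = llGetSubString(strSource, 0, 0);

    if (llSubStringIndex("0123456789", strSeperator) != -1)
    {
        // Save the length first: strSeperator is overwritten on the next line,
        // so casting it again afterwards would no longer give the digit.
        // Note: a leading "0" is accepted by this test but is not a meaningful length.
        integer intLength = (integer)strSeperator;
        strSeperator = llGetSubString(strSource, 1, intLength);
        strSource = llDeleteSubString(strSource, 0, intLength);
        return llParseString2List(strSource, [strSeperator], []);
    }

    strSource = llDeleteSubString(strSource, 0, 0);
    return llParseString2List(strSource, [strSeperator], []);
}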


It's not the best idea, I am sure, but it's what I came up with. But then I thought about quotation marks. What if someone wanted to surround the elements with quotation marks? What about adding this to it?
CODE

if (strSeperator == "\"")
{
    list lstSeperator = ["\"\"", "\" \"", "\",\"", "\", \"", "\" ,\"", "\" , \""];
}


Now it's getting confusing.

So here is my question. I'll use the first one for what I want. Do the others sound like something people might want made available?

Is it simple enough that most people should be able to figure it out on their own?

Is there a better way to write a function that lets people define, at the beginning of the line, how the line is to be separated?

Could something like this be added to or used to spruce up List Conversion or Safe List Conversion?
grumble Loudon
A Little bit a lion
Join date: 30 Nov 2005
Posts: 612
10-29-2006 04:22
Note: You have to use two backslashes to get a backslash in the text.
Using quotes is also confusing.

I personally like to use the pipe symbol; however, the object editor currently does not allow you to enter that in a description field.

So many characters to choose from. :confused:

You can parse for longer sequences of characters like ],[ but this is slower.

Ideally a separator is not used anywhere else and does not appear in normal use.

I know of one application that uses the tilde ~ character.

Edit: sorry, I now see what you are saying.

Cool.

But it still sounds more complex than it needs to be, in that most applications can find a character that is not used anywhere.
Aakanaar LaSalle
Registered User
Join date: 1 Sep 2006
Posts: 132
10-29-2006 22:46
Yeah, I think most people who might find this useful will want the short, single-character version, so that every line can have a different character to split on, just by putting that character at the front of the line.

As for the backslashes, I meant no backslashes to go into that. That last bit is all quotation marks.
CODE

if (strSeperator == "\"")
{
    list lstSeperator = ["\"\"", "\" \"", "\",\"", "\", \"", "\" ,\"", "\" , \""];
}


"\"" looks for a single " at the front of the line (one double quote).
"\"\"" translates to a string containing "" (two double quotes).
"\" \"" translates to " " (quote, space, quote).
"\",\"" to "," (quote, comma, quote).
"\", \"" to ", " (quote, comma, space, quote).
"\" ,\"" to " ," (quote, space, comma, quote).
And finally,
"\" , \"" to " , " (quote, space, comma, space, quote).

I'm trying to cover the standard possibilities. Of course, the first and last characters of the line would each be assumed to be a double quotation mark (the first is already verified, the last is only assumed) and would be removed.
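
Putting the pieces described above together, the quoted variant might look roughly like the sketch below. The function name ufParseQuoted is hypothetical, and the sketch assumes every line both starts and ends with a double quote; only the leading quote is actually checked.

```lsl
// Hypothetical helper; the name ufParseQuoted is not from the original post.
list ufParseQuoted(string strSource)
{
    // Only the leading quote is verified; the trailing one is assumed.
    if (llGetSubString(strSource, 0, 0) != "\"")
        return [];

    // Strip the opening and closing quotes.
    strSource = llDeleteSubString(strSource, 0, 0);
    strSource = llDeleteSubString(strSource, -1, -1);

    // The six quote-to-quote variants listed above.
    list lstSeperator = ["\"\"", "\" \"", "\",\"", "\", \"", "\" ,\"", "\" , \""];
    return llParseString2List(strSource, lstSeperator, []);
}

default
{
    touch_start(integer total_number)
    {
        // Made-up sample line mixing two of the separator styles.
        list lstParts = ufParseQuoted("\"one\", \"two\" , \"three\"");
        llOwnerSay(llDumpList2String(lstParts, " / "));
    }
}
```

Six separators fit within the limit of eight that llParseString2List accepts. An element that itself contains one of these quote sequences would still be split in the wrong place, which is part of why this variant gets confusing.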

Again, it's probably much more complex than it needs to be. Perhaps I'll just make that first simple one available for those who would like that flexibility.
Strife Onizuka
Moonchild
Join date: 3 Mar 2004
Posts: 5,887
10-29-2006 23:21
Your initial parsing script is essentially the same as a set of functions I wrote ages ago; TightList is what I call mine (except mine's optimized). Anyway, here are a couple of dump functions I wrote: the first uses an internal list of separators; the second uses the same list, but the user can seed it with their own separators as well. I also wrote a typed version of TightList that can regenerate the types as well.

Using multi-character separators is problematic: while the characters don't have to be unique to the string, the arrangement of them does. I found that making sure it doesn't cause conflicts of that type was very costly.
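
To make the multi-character problem concrete, here is a small sketch with made-up data. Neither element contains the two-character separator on its own, yet joining them creates an earlier match than intended.

```lsl
default
{
    state_entry()
    {
        // Hypothetical data: neither element contains "--" by itself.
        list lstIn = ["x-", "y"];
        string strJoined = llDumpList2String(lstIn, "--");   // "x---y"
        list lstOut = llParseStringKeepNulls(strJoined, ["--"], []);

        // The first "--" is found one character too early, so the
        // round trip yields "x" and "-y" instead of "x-" and "y".
        llOwnerSay(llList2CSV(lstOut));
    }
}
```

Checking that the separator does not occur in any single element is therefore not enough; it also has to be checked against the joined string, which is where the extra cost comes from.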

Since LSL uses UTF-8, I figured that there was no shortage of single-character separators (2 ^ 31, though they may be multi-byte). Somewhere I have a TL version that can generate UTF-8 separators on demand.

On a similar topic, I have also written a number of other functions:
Unescape: C-style escapes for strings.
Float2Hex: Quickly and safely store floats in strings without need for a custom parser.
Float2Sci: Safely store a float in a string in scientific notation; much slower than Float2Hex, but the output is easily readable (no parsers needed).

CODE

list TightListParse(string a)
{
    string b = llGetSubString(a,0,0);//save memory
    return llParseStringKeepNulls(llDeleteSubString(a,0,0), [a=b],[]);
}

string TightListDump(list a)
{//TLD(simple) makes a string from a list using a separator that is supposed to be unique to the string
    string b = "|/?!@#$%^&*()_=:;~`'<>{}[],.\n\" qQxXzZ\\";//Good character set of rarely used letters.
    string c = (string)a;//dump the list without a separator
    integer d = -39;
    do; while(~llSubStringIndex(c,llGetSubString(b,d,d)) && (d= -~d));//search for unique separator
    c = llGetSubString(b,d,d);//extract the separator, saves memory (by releasing c)
    return c + llDumpList2String(a, c);//pray we don't have a collision.
} //This function is by no means flawless, it is better to use the complex version if you fear collisions.

//LSL has no function overloading, so a script can contain this version or the simple one above, not both.
string TightListDump(list a, string b)
{//TLD(complex) makes a string from a list using a separator that is supposed to be unique to the string
    string c = (string)a;//dump the list without a separator
    integer d = -39 - llStringLength(b);
    if(d == -40)
        if(!~llSubStringIndex(c,b))
            jump end;//woot, we were given a unique separator
    b += "|/?!@#$%^&*()_=:;~`'<>{}[],.\n\" qQxXzZ\\";//Good character set of rarely used letters.
    do; while(~llSubStringIndex(c,llGetSubString(b,d,d)) && (d=-~d));//search for unique separator
    b = llGetSubString(b,d,d);
    @end;
    c = "";//save memory
    return b + llDumpList2String((a = []) + a, b);
}
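
One practical difference between TightListParse and the ufParseString function earlier in the thread is the parsing call: llParseStringKeepNulls keeps empty elements, while llParseString2List drops them. A minimal sketch with a made-up line, assuming the same "first character is the separator" layout:

```lsl
default
{
    state_entry()
    {
        // Hypothetical line whose middle element is empty.
        string strLine = "*a**c";
        string strSep = llGetSubString(strLine, 0, 0);
        string strBody = llDeleteSubString(strLine, 0, 0);

        list lstDropped = llParseString2List(strBody, [strSep], []);
        list lstKept = llParseStringKeepNulls(strBody, [strSep], []);

        // lstDropped holds "a" and "c"; lstKept holds "a", "" and "c".
        llOwnerSay((string)llGetListLength(lstDropped) + " vs " + (string)llGetListLength(lstKept));
    }
}
```

If element positions matter, for example when a list is dumped and later rebuilt, the KeepNulls form is the safer choice.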
_____________________
Truth is a river that is always splitting up into arms that reunite. Islanded between the arms, the inhabitants argue for a lifetime as to which is the main river.
- Cyril Connolly

Without the political will to find common ground, the continual friction of tactic and counter tactic, only creates suspicion and hatred and vengeance, and perpetuates the cycle of violence.
- James Nachtwey
Aakanaar LaSalle
Registered User
Join date: 1 Sep 2006
Posts: 132
10-29-2006 23:36
kewl.. I'm sure those dump functions might come in handy some day.. For mine, since it's reading from a notecard, I can't use them.

Basically, I'll have a notecard with lots of lines.. start off with a few, and add to them over time. I originally was thinking of using the first line of the notecard to tell the script which character to parse by, but got to thinking, that is too limiting.

Using this method, if I get 75 lines of text using an asterisk as a separator and suddenly find myself needing to add an asterisk to a line as data, I wouldn't have to go back through all 75 lines and redo every single one, nor would I have to update the script at all.

The same goes for any other character that I may think is safe now, but may end up needing to be used some day in the future.
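
For reference, the notecard workflow described here could be wired up roughly as follows. This is only a sketch under assumptions: the notecard name "lines", the touch trigger, and the variable names are all invented for the example, and ufParseString is the simple single-character version from the first post.

```lsl
// Assumed notecard name; change it to match the card in the object's inventory.
string NOTECARD = "lines";

key kCountQuery;   // pending llGetNumberOfNotecardLines request
key kLineQuery;    // pending llGetNotecardLine request

list ufParseString(string strSource)
{
    string strSeperator = llGetSubString(strSource, 0, 0);
    strSource = llDeleteSubString(strSource, 0, 0);
    return llParseString2List(strSource, [strSeperator], []);
}

default
{
    touch_start(integer total_number)
    {
        // Ask how many lines the notecard has; the answer arrives in dataserver.
        kCountQuery = llGetNumberOfNotecardLines(NOTECARD);
    }

    dataserver(key kQuery, string strData)
    {
        if (kQuery == kCountQuery)
        {
            integer intCount = (integer)strData;
            if (intCount > 0)
            {
                // llFrand(n) returns a float in [0, n); the cast truncates it
                // to a line index between 0 and intCount - 1.
                kLineQuery = llGetNotecardLine(NOTECARD, (integer)llFrand(intCount));
            }
        }
        else if (kQuery == kLineQuery)
        {
            if (strData != EOF)
            {
                // Each line carries its own separator as its first character.
                list lstParts = ufParseString(strData);
                llOwnerSay(llDumpList2String(lstParts, " / "));
            }
        }
    }
}
```

Because the separator travels with each line, adding a line that needs a different separator only means editing the notecard, not the script.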
Strife Onizuka
Moonchild
Join date: 3 Mar 2004
Posts: 5,887
10-30-2006 00:34
From: Aakanaar LaSalle
kewl.. I'm sure those dump functions might come in handy some day.. For mine, since it's reading from a notecard, I can't use them.

Basically, I'll have a notecard with lots of lines.. start off with a few, and add to them over time. I originally was thinking of using the first line of the notecard to tell the script which character to parse by, but got to thinking, that is too limiting.

Using this method, if I get 75 lines of text using an asterisk as a separator and suddenly find myself needing to add an asterisk to a line as data, I wouldn't have to go back through all 75 lines and redo every single one, nor would I have to update the script at all.

The same goes for any other character that I may think is safe now, but may end up needing to be used some day in the future.


Absolutely, I've used it in notecards too, but I also use it for moving data between scripts (it's relatively fast and I don't have to worry about separators 99% of the time).