Welcome to the Second Life Forums Archive

Text Input & Command Parsing

Tyken Hightower
Automagical
Join date: 15 Feb 2006
Posts: 472
12-01-2006 00:32
Just for reference, I'm interested in figuring out what the most efficient method of parsing user chat commands and comparing them is. Specifically, in instances where the command could take any of the following forms:
  1. Single character or word
  2. Single character or word followed by additional parameters, like a couple of separate integers
  3. Single command and/or list-type set of rule parameters that can be input in any order


Although the last isn't so much of a concern.

The case I'm most interested in is the second. There's some common choices here, and I'm not sure which will get the job done fastest:

if (msg == "<insert command here>")
This is nice for case one, but unhelpful otherwise. In a situation where you have a heterogeneous mix of single and multi-parameter commands, you could at least save by putting all the singles first with this, of course.


llGetSubString()
Fairly quick, but in a situation where you have a lot of potential commands of varying lengths (such that you can't just create a single string variable and keep comparing it, you'd need to keep calling the function to test every potential match) it can get inefficient, I'd suppose.


llSubStringIndex()
I'm assuming this by its nature is automatically slower than the previous and clearly has even less desirable worst-case scenarios. Truth?


llParseString2List() / Related
While this way of doing things is extremely helpful for case 3 and easy to use coding-wise in general, the potential for stack/heap explosion is off the charts, especially with the latest update. It also seems intuitively like it would be the most costly memory and time-wise. Also, since no-mod scripts can't be reset by anyone unless it's the creator resetting something he owns, avoiding potential crashes is a must, primarily on no-trans products. There's just nothing to be done if it breaks.


Suggestions, clarifications to any misconceptions, or otherwise are much appreciated! :o
_____________________
DoteDote Edison
Thinks Too Much
Join date: 6 Jun 2004
Posts: 790
12-01-2006 01:54
A) Use a dialog for the command, then accept further input variables via chat prompts.

B) Prompt, via chat, for every single variable.

C) Require the input be formatted to match your script... for example, request CSV entries, so that you can use a simple llCSV2List() function. Maybe have the script say a simple formatted text, which the user can then copy/paste back to your script, replacing default data with their own data.

D) http the string to your server to be processed by an offline script, with only useful data returned to the LSL script.
Tyken Hightower
Automagical
Join date: 15 Feb 2006
Posts: 472
12-01-2006 05:53
From: DoteDote Edison
A) Use a dialog for the command, then accept further input variables via chat prompts.

B) Prompt, via chat, for every single variable.

C) Require the input be formatted to match your script... for example, request CSV entries, so that you can use a simple llCSV2List() function. Maybe have the script say a simple formatted text, which the user can then copy/paste back to your script, replacing default data with their own data.

D) http the string to your server to be processed by an offline script, with only useful data returned to the LSL script.

I appreciate the input, but I'm really looking for methods that stick to using direct chat parsing. This is for when dialog is unreasonable.. prompting actually makes coding more difficult and uses relatively more effort. CSV is fine, it's really no different from a space-separated llParseString2List, and while it's the easiest to use, it requires the most memory (potentially) and probably the most cycles to process. Making a server backend is sort of out of the question, and potentially unreliable.
_____________________
Joannah Cramer
Registered User
Join date: 12 Apr 2006
Posts: 1,539
12-01-2006 06:55
From: Tyken Hightower
Just for reference, I'm interested in figuring out what the most efficient method of parsing user chat commands and comparing them is.

Direct operation on strings and their chunks (substring etc) is incredibly slow, while breaking things into list (parsestring) can be noticeably faster but eat tons of memory.

As such there's no single answer really, it'll depend on your idea of what efficient *is* and that's going to depend on specific cases to boot ^^;;

I'd personally lean towards the lists when it can be afforded memory-wise, and strings when it cannot, but even here there's exceptions -- substringindex() can be very handy for pattern matching, e.g.
Talarus Luan
Ancient Archaean Dragon
Join date: 18 Mar 2006
Posts: 4,831
12-01-2006 07:44
I've written some programmable command parser scripts, which tend to be both slow and memory-intensive (especially if you want to do input validation and bounds-checking). However, like anything else, the more complex your input is, the more intensive the parse process will become; it's just the nature of the beast.

That said, I have found that llParseString2List works pretty well under most controlled circumstances for complex input processing. I am aware of the issue with it being memory hungry, and also with the issue of long griefer chat lines ("a a a a a a ....." to 1023 chars) causing problems with the Parse function. The latter problem can be ameliorated quite effectively by limiting the length of input lines before you call the Parse function (ie laList = llParseString2List(llGetSubString(lsString,0,63),[" "],[]); ). Still, someone with a grief line will cause the usage of at most 440 bytes of memory in the resultant list (which doubles as soon as you try to access it).
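
A minimal sketch of the clamp-then-parse idea described above, assuming a 64-character cap and a made-up `move` command that takes two integers. `MAX_INPUT` and the command name are illustrative only and do not come from any script in this thread:

```lsl
integer MAX_INPUT = 64; // hypothetical cap on how much of a chat line is parsed

default
{
    state_entry()
    {
        // Listen to the owner on public chat; channel choice is arbitrary here.
        llListen(0, "", llGetOwner(), "");
    }

    listen(integer channel, string name, key id, string message)
    {
        // Clamp first, so the parser never sees more than MAX_INPUT characters.
        string clipped = llGetSubString(message, 0, MAX_INPUT - 1);
        list tokens = llParseString2List(clipped, [" "], []);

        string command = llToLower(llList2String(tokens, 0));
        if (command == "move")
        {
            // Missing or non-numeric parameters come back as 0.
            integer x = llList2Integer(tokens, 1);
            integer y = llList2Integer(tokens, 2);
            llOwnerSay("move " + (string)x + " " + (string)y);
        }
    }
}
```

Anything past the cap is dropped before the list is built, so the size of the resulting list is bounded by the cap rather than by whatever was typed.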

Unfortunately, there's no "Magic Bullet" solution to all your parsing needs. It would be nice if there was a good, programmable parser engine accessible as a set of LSL functions, but it's probably beyond the scope of what they want to do with LSL functions (given that they didn't even bother implementing min/max and string type-checking functions).
Tyken Hightower
Automagical
Join date: 15 Feb 2006
Posts: 472
12-01-2006 07:45
From: Joannah Cramer
Direct operation on strings and their chunks (substring etc) is incredibly slow, while breaking things into list (parsestring) can be noticeably faster but eat tons of memory.

As such there's no single answer really, it'll depend on your idea of what efficient *is* and that's going to depend on specific cases to boot ^^;;

I'd personally lean towards the lists when it can be afforded memory-wise, and strings when it cannot, but even here there's exceptions -- substringindex() can be very handy for pattern matching, e.g.

I'm with you here, but I'm not totally sure on whether llGetSubString is actually slower than handling with the parse built-ins, so I was hoping that someone could eventually give an exact answer in terms of the performance with regards to the stack, etc..

I'm a fan of llSubStringIndex(), as I mentioned, but in the case that your string is actually 1000+ characters, it's got to check the whole thing for matches, which I assume is slower than other possibilities. There's always worst-case scenarios and overall average performance to consider. The memory risks with parse functions aren't always so bad, especially when you know you'll never be running low, but I'm looking for general speed.
_____________________
Tyken Hightower
Automagical
Join date: 15 Feb 2006
Posts: 472
12-01-2006 07:51
From: Talarus Luan
I've written some programmable command parser scripts, which tend to be both slow and memory-intensive (especially if you want to do input validation and bounds-checking). However, like anything else, the more complex your input is, the more intensive the parse process will become; it's just the nature of the beast.

That said, I have found that llParseString2List works pretty well under most controlled circumstances for complex input processing. I am aware of the issue with it being memory hungry, and also with the issue of long griefer chat lines ("a a a a a a ....." to 1023 chars) causing problems with the Parse function. The latter problem can be ameliorated quite effectively by limiting the length of input lines before you call the Parse function (ie laList = llParseString2List(llGetSubString(lsString,0,63),[" "],[]); ). Still, someone with a grief line will cause the usage of at most 440 bytes of memory in the resultant list (which doubles as soon as you try to access it).

Unfortunately, there's no "Magic Bullet" solution to all your parsing needs. It would be nice if there was a good, programmable parser engine accessible as a set of LSL functions, but it's probably beyond the scope of what they want to do with LSL functions (given that they didn't even bother implementing min/max and string type-checking functions).

440 bytes is nothing compared to 5-6k, though, and there's ways to optimize and save yourself space elsewhere if you're strapped enough to worry about that 880 bytes. I generally have at least 3k free to operate, or in the case of my recent whiteboards (nothing but string and command parsing), 7-8k free (sadly, this isn't enough anymore, have to update and break apart more script functions to maintain full functionality in all cases). :)
_____________________
Talarus Luan
Ancient Archaean Dragon
Join date: 18 Mar 2006
Posts: 4,831
12-01-2006 08:14
Oh, parsing a string raw with loops and string-handling functions is probably on the order of at least 10 times slower than using llParseString2List. I don't have any benchmarks on hand at the moment, but it is not hard to understand why it would be so hugely disparate.

Like I said, though, it really depends on the complexity of your input. If you are just handling a one-word command with one or two simple numeric parameters, you can probably get away with raw string processing. Anything significantly more complex than that will require the use of the Parse functions (probably in addition to lots of string handling).
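
For the simple case mentioned above (one word plus one or two numbers), raw string handling might look something like the sketch below. `nextWord`, `gRest` and the `size` command are hypothetical names made up for illustration, and the sketch assumes a runtime that provides `llStringTrim`:

```lsl
string gRest; // remainder of the text after the most recent nextWord() call

// Returns the first space-delimited word of text and stores what follows in gRest.
string nextWord(string text)
{
    text = llStringTrim(text, STRING_TRIM);
    integer space = llSubStringIndex(text, " ");
    if (space == -1)
    {
        // No space left: the whole (possibly empty) text is the word.
        gRest = "";
        return text;
    }
    // After trimming, space is at least 1 and is never the last character,
    // so neither range below can wrap around.
    gRest = llGetSubString(text, space + 1, -1);
    return llGetSubString(text, 0, space - 1);
}

default
{
    state_entry()
    {
        llListen(0, "", llGetOwner(), "");
    }

    listen(integer channel, string name, key id, string message)
    {
        string command = nextWord(message);
        if (command == "size")
        {
            integer width = (integer)nextWord(gRest);
            integer height = (integer)nextWord(gRest);
            llOwnerSay("size " + (string)width + " x " + (string)height);
        }
    }
}
```

No list is ever built, which keeps memory use low, at the cost of one search and two substring copies per word.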
Dimentox Travanti
DCS Coder
Join date: 10 Sep 2006
Posts: 228
12-01-2006 11:20
When I first started coding I wanted to just have a thing that parsed the first command before the space..

Well now I just hard code the llGetSubString()
It's much easier and I was just being lazy
_____________________
LSL Scripting Database - http://lsl.dimentox.com
Newgate Ludd
Out of Chesse Error
Join date: 8 Apr 2005
Posts: 2,103
12-01-2006 13:37
I generally use llParseString2List purely for its simplicity and ease of use.
It's certainly faster than multiple llGetSubString type comparisons and doesn't require that you know the length of the commands being passed in (Lazy of me)

I usually use the same code to parse the chat / dialog input as I do any notecard parameters, at least for 'simple' command value type inputs.
Hewee Zetkin
Registered User
Join date: 20 Jul 2006
Posts: 2,702
12-01-2006 15:17
My commands almost always have a prefix that is specific to the script/object/system that they are dealing with. This allows multiple systems to coexist on the same channel, in case of user preferences, happenstance, etc. Thus all inputs to my train system might use '/345 train...', and all inputs to a custom security system might use '/843 trsec...'. Almost always this is followed by one of a set of commands, and then possibly a list of arguments (specific to the particular command). Rarely (usually when I have randomly selected a large negative channel), I will leave off the prefix.

That being said, usually I use a CSV format for at least the commands and the parameters. The prefix I usually either separate from the rest with a space (preferred; I tend to test the beginning of the string and return if the prefix isn't matched, parsing the rest of the string as little as possible in this case) or another comma (ease of development when parsing is always likely to happen). So, my listens tend to look like either:
CODE
string CMD_PREFIX = "myprefix ";
integer CMD_PREFIX_LEN;
...
CMD_PREFIX_LEN = llStringLength(CMD_PREFIX);
...
listen(integer channel, string name, key id, string message)
{
    ...
    // Ignore anything that does not start with the prefix,
    // or that has nothing after the prefix.
    if (llSubStringIndex(message, CMD_PREFIX) != 0 ||
        llStringLength(message) <= CMD_PREFIX_LEN)
    {
        return;
    }

    string paramStr = llGetSubString(message, CMD_PREFIX_LEN, -1);
    list params = llCSV2List(paramStr);
    integer nParams = llGetListLength(params);

    string command = llList2String(params, 0);
    ...
}

or:
CODE
string CMD_PREFIX = "myprefix";
...
listen(integer channel, string name, key id, string message)
{
    ...
    list params = llCSV2List(message);
    integer nParams = llGetListLength(params);

    // Need at least the prefix and a command, and the first field must be the prefix.
    if (nParams < 2 || llList2String(params, 0) != CMD_PREFIX)
    {
        return;
    }

    string command = llList2String(params, 1);
    ...
}

The reason I use 'llSubStringIndex()' in the first one instead of testing length and comparing a sub-string is that theoretically this does not need to copy any part of either string. I figure a string search is less intensive, both in terms of memory and execution time, than memory allocation. However, I admit that I have assumed this rather than actually testing it. It is also POSSIBLE that getting a substring doesn't require dynamically allocating memory, but I HIGHLY doubt they've been so optimal in their string implementation.
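
To make the two prefix tests being compared concrete, here is a rough sketch of each as a helper function. The function names are hypothetical, and nothing here measures which one is actually cheaper:

```lsl
string CMD_PREFIX = "myprefix ";

// Variant 1: search for the prefix; an index of 0 means the message starts with it.
// If the prefix is absent, the search may have to scan the whole message.
integer hasPrefixByIndex(string message)
{
    return (llSubStringIndex(message, CMD_PREFIX) == 0);
}

// Variant 2: check the length, then copy the leading characters and compare.
// Only the first few characters are ever examined, but a new string is created.
integer hasPrefixBySubString(string message)
{
    integer len = llStringLength(CMD_PREFIX);
    if (llStringLength(message) < len)
    {
        return FALSE;
    }
    return (llGetSubString(message, 0, len - 1) == CMD_PREFIX);
}

default
{
    state_entry()
    {
        string sample = "myprefix go,1,2";
        llOwnerSay("index test: " + (string)hasPrefixByIndex(sample));
        llOwnerSay("substring test: " + (string)hasPrefixBySubString(sample));
    }
}
```

Both helpers should agree on every input; the open question in the thread is only their relative cost, especially on long messages that do not contain the prefix.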
carol Wombat
Registered User
Join date: 29 Jan 2006
Posts: 16
12-01-2006 15:57
Put things in perspective! Mono is coming! I know it's been coming for a long time, but at this stage I understand the stumbling block is the serialization in sim to sim transportability. Hopefully we will see it sometime next year!

Having looked at the big .net video demo, it seems we will gain speed, 50 to 150 times as fast, but not memory space, 16 K limit is still there. Given that nothing else changes (????) surely we should be space optimising and forget about speed. String manipulators, here I come!!!
Hewee Zetkin
Registered User
Join date: 20 Jul 2006
Posts: 2,702
12-01-2006 16:22
What we REALLY need here, as I believe I just said in another thread as well, is built-in regular expression functions. Comparing, searching, replacing. Something like the basic functionality implemented in Perl and the Java API libraries.
Tyken Hightower
Automagical
Join date: 15 Feb 2006
Posts: 472
12-01-2006 17:03
Dreaming about things LL has promised to do or will virtually never do (yay Havok) is all well and good, but not the subject at hand.
_____________________
Nynthan Folsom
Registered User
Join date: 29 Aug 2006
Posts: 70
12-01-2006 17:37
Handle the parsing in a separate script to distribute the memory foot-print, but I would definitely use the list based parsing functions. MUCH faster and easier for larger scale parsing jobs.
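
One way the separate-script idea could be sketched: a small parser script turns chat into a numeric command code and hands it to the other scripts in the prim via a link message. `COMMAND_BASE`, the `COMMANDS` list and channel 7 are assumptions made for this example:

```lsl
integer COMMAND_BASE = 9000;                 // hypothetical offset for link-message numbers
list COMMANDS = ["start", "stop", "speed"];  // hypothetical command words

default
{
    state_entry()
    {
        llListen(7, "", llGetOwner(), "");
    }

    listen(integer channel, string name, key id, string message)
    {
        // Clamp the input, then split it into words.
        list tokens = llParseString2List(llGetSubString(message, 0, 63), [" "], []);

        // Map the first word to its position in COMMANDS; -1 means unknown (or empty input).
        integer code = llListFindList(COMMANDS, [llToLower(llList2String(tokens, 0))]);
        if (code == -1)
        {
            return;
        }

        // Pass the command code as the number and the first parameter (if any) as the string.
        llMessageLinked(LINK_THIS, COMMAND_BASE + code, llList2String(tokens, 1), id);
    }
}
```

A worker script in the same prim would then handle `link_message` and branch on `num - COMMAND_BASE`, so the token list only ever lives in the parser script's memory.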
Tyken Hightower
Automagical
Join date: 15 Feb 2006
Posts: 472
12-01-2006 17:44
From: Nynthan Folsom
Handle the parsing in a separate script to distribute the memory foot-print, but I would definitely use the list based parsing functions. MUCH faster and easier for larger scale parsing jobs.

I tend to agree here, especially for ease of use. I'm still hoping someone has some hard data to back up which functions will run faster.. Maybe I'll try writing some test scripts later, since it'll be more important later for some code re-evaluation I'm doing.
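
A test script along those lines might start from something like the sketch below, which times a list-based split against a search-plus-substring split on one sample string. `ITERATIONS` and `SAMPLE` are arbitrary, `llGetTime()` reports elapsed script time that varies with sim load, and the two loops do not do identical work, so the numbers would only be a rough guide:

```lsl
integer ITERATIONS = 1000;      // arbitrary loop count
string SAMPLE = "move 10 20";   // arbitrary sample command

default
{
    touch_start(integer total_number)
    {
        integer i;
        integer space;
        list tokens;
        string word;

        // Time the list-based approach.
        llResetTime();
        for (i = 0; i < ITERATIONS; ++i)
        {
            tokens = llParseString2List(SAMPLE, [" "], []);
        }
        float parseTime = llGetTime();

        // Time the raw-string approach: find the first space, copy the first word.
        llResetTime();
        for (i = 0; i < ITERATIONS; ++i)
        {
            space = llSubStringIndex(SAMPLE, " ");
            word = llGetSubString(SAMPLE, 0, space - 1);
        }
        float stringTime = llGetTime();

        llOwnerSay("llParseString2List: " + (string)parseTime + " s");
        llOwnerSay("llSubStringIndex + llGetSubString: " + (string)stringTime + " s");
    }
}
```

Running it several times, and with longer sample strings, would give a better picture than any single pass.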
_____________________