Second Life Forums Archive - RFC: adding memoization to functions in LSL

ThornnAdenine Maladay

Registered User

Join date: 21 Jan 2006

Posts: 5

01-27-2006 08:51

Abstract: I propose that three new keywords be added to the LSL:

1. memoset: would set a value in memory keyed to the values of the currently encapsulated function, or replace previously set value if one is already set.
Syntax: memoset <value>;
Errors: if no memory remains, raise runtime error. If not correct type (<value> must be type of function return) or used outside of a function, raise compile error.

2. memoget: would return a value set by a memoset, or a null value if not set previously (0,"",<0,0,0>,<0,0,0,0>,[],NULL_KEY as appropriate)

3. ismemod: would return 1 if a value is memod in the function, 0 if not.

4. (thanks Argent Stonecutter!) function: would attribute a function name, forcing that function to return the same value that it returned the first time it was called with the same parameters.

Inspiration:
One of the more daunting tasks in SL when building a complex object is the amount of legwork you have to go through to manage a large amount or memory, as well as the lack of a pointer functionality to allow the representation of internal structure. This leads to many runtime errors, slow programs, and inevitably unmanagable code if the structure gets too big.

Much of this headache is due to the fact that LSL is a call-by-value language. While it's a very wise choice from a security standpoint to make LSL call-by-value (no pointer atttacks, buffer overflows, etc), it makes basic list operations extremely painful. Not only that, it renders recursion pointless because the lack of memoization causes exponential time for most useful recursive algorithms.

Concept and examples of usage:
To aleviate these issues while at the same time preserving the CbV nature of LSL, I propose that any function be allowed to be memoized if memory allows. The keywords would allow a memo value to be set (via memoset), a boolean check to see if a value is already there (via ismemod), and a call that would allow the value to be returned (memoget). Here's an example of a fibonacci algorithm:
integer fib(integer n){
if(ismemod){
return memoget;
}
if(n<2){
memoset 1;
return 1;
}
return fib(n-1)+fib(n-2);
}

Using the function keyword, this can be simplified as such:
function integer fib(integer n){
if(n<2)return 1;
return fib(n-1)+fib(n-2)
}

Using memo keywords would allow this to execute in O(n) steps instead of O(2^n) steps, without the use of a list or array. Using global variables, here's a possible array of strings implementation:

string val;
integer set=0;

setA(integer at, string v){
set=1;
val=v;
arrayA(at);
set=0;
}

string getA(int at){
if(set){
memoset val;
}else{
return memoget;
}
}

If you wanted extra dimensions, you could use extra parameters in the array call. If you wanted a more complicated stored value, make val of type list.

Implementation:
The LSL engine could make a constant size hashtable of the function address and all parameters hashed into the key, and the memoget as the value. When memoset is called, it enters the value into the hashtable, along with a list of the function and parameters. If there's a collision, you can use the next value mod the size of the table. If no next values remain, raise a memory runtime error. Lookup (via memoget) is a simple hash and value grab returning first valid key. ismemod works similarly to memoget.

Security:
The functional nature of the simulated array assures that CbV is preserved, and the memo keywords allow a simple transit into a call-by-reference world by necessity only. Since the simulator is in complete control of the hashtable, there's no way for values to escape. Also, since the hashtable can be easily locked when it's being accessed (like a synchronized Hashtable in Java), there's no race conditions that can occur with native implementations of large internal storage.

Argent Stonecutter

Emergency Mustelid

Join date: 20 Sep 2005

Posts: 20,263

01-27-2006 11:50

1. References do not need to be subject to buffer overflows or pointer attacks. There are meny examples of languages that support references without the dangers of raw pointers. The lack of references seems to be more a matter of simplifying memory management.

2. Adding hashes to the language, even if they were global and couldn't be passed by reference, would provide the functionality you want and more, and wouldn't be any harder than adding implicit hashes for memoization.

3. If you're going to explicitly extend LSL with just this functionality, why not give it a sane syntax, like function fib(integer n);, where function declares that fib has no state and will always return the same results for the same input, and hide all the bookkeeping?

ThornnAdenine Maladay

Registered User

Join date: 21 Jan 2006

Posts: 5

Responses to implementing references suggestion.

01-27-2006 14:03

From: Argent Stonecutter

1. References do not need to be subject to buffer overflows or pointer attacks. There are meny examples of languages that support references without the dangers of raw pointers. The lack of references seems to be more a matter of simplifying memory management.

2. Adding hashes to the language, even if they were global and couldn't be passed by reference, would provide the functionality you want and more, and wouldn't be any harder than adding implicit hashes for memoization.

3. If you're going to explicitly extend LSL with just this functionality, why not give it a sane syntax, like function fib(integer n);, where function declares that fib has no state and will always return the same results for the same input, and hide all the bookkeeping?

To 1: I can totally grant you that references are safer from the perspective of buffer overflow and pointer attack. However, references are not necessarily safe in the world of multithreading. It is so much easier to create a race condition when one variable change screws up another part of code through the use of a reference. A call-by-value language has better protection against race condidions.
Furthermore, to implement references you have to have to change the dynamics of the language. Specifically, you have to allow the L-Value of an assignment to become a moving target. This is much harder to implement than you would think. Imagine the number of scripts that would break if call-by-reference was allowed implicitly, when these functions are always expected to make a copy? I was looking for a method that would be much easier for the developers to throw in that wouldn't take too much time away from their Mono development, and would break as few scripts out there as possible.

To 2: To implement hashes, again you'd either need a variable L-value again (else how'd you implement things like securitylevel[customer_name]="superuser"?) or use functions to manipulate the hashes, which would have the same problems as before, because being a call-by value language, you'd have to pass the hash in and back out, again an O(N) implementation.

To 3: The function keyword idea is a very good one, and should be added to the request. The reason the syntax is the way it is because I wanted function writers to have full control of the memoization. You see, I'm trying to solve two problems at once. The first problem to solve is "How do I speed up recursion?". In this situation, you're entirely correct that a simple keyword tag to a function is sufficient.
However, I'm also simultaneously trying to solve the problem of creating a large manipulable variable that can modify in O(1) time in a call-by-value language. This means that you have to use functions to manipulate the variable, because you can't pass the variable in by reference without changing the dynamics of the language. You could force the variable to be global, but the cleaner method is to encapsulate the variable into the function, and these memoization commands are the simplest way of doing just that without messing any scripts up.

Personally, I don't mind the language being call-by-value. I just want to get a method of using more efficient data structures so that script writers can write more computationally-significant code without taking down the sim that they code it in.

I just have this feeling that changing to a call-by-reference language at the tip of a hat will cause more trouble than it's worth. This is a way to dodge that bullet.

Argent Stonecutter

Emergency Mustelid

Join date: 20 Sep 2005

Posts: 20,263

01-27-2006 16:13

From: ThornnAdenine Maladay

references are not necessarily safe in the world of multithreading.

1. LSL isn't multithreaded. 2. Global variables have more problems, and if you don't have references you have to use more globals.

From: someone

Specifically, you have to allow the L-Value of an assignment to become a moving target. This is much harder to implement than you would think.

I implemented my first parser for a programming language in 1979. I have an idea of what's involved, and it's not a big deal. I'm not going to duplicate my comments in the other thread, we can talk about references there. I'll just say:

From: someone

Imagine the number of scripts that would break if call-by-reference was allowed implicitly, when these functions are always expected to make a copy?

That's why you wouldn't do that. Extending a language so that you don't break existing code is not a new problem, and there's all kinds of ways to make it work.

From: someone

To implement hashes, again you'd either need a variable L-value again (else how'd you implement things like securitylevel[customer_name]="superuser"?)

Picking a completely arbitrary syntax for a stack-based implementation:

CODE


PUSH REF security_level
PUSH REF customer_name
CALL internalRef2String
CALL internalArrayFindRef
PUSH STRING "superuser"
CALL internalStoreStringViaRef

From: someone

or use functions to manipulate the hashes, which would have the same problems as before, because being a call-by value language, you'd have to pass the hash in and back out, again an O(N) implementation.

Since a hash is a new type, making a hash and only a hash call-by-reference will not break any existing scripts. It would also allow the use of hashes instead of globals, making new code MORE serially-reusable or even reentrant.

From: someone

However, I'm also simultaneously trying to solve the problem of creating a large manipulable variable that can modify in O(1) time in a call-by-value language.

That's not a problem that needs to be solved, because there's no inherent restrictions in the syntax or semantics of LSL that would make treating a hash as a reference any harder than adding a new type of variable would be in the first place. And it would be a lot easier to understand, and a lot more generally useful.

From: someone

I just have this feeling that changing to a call-by-reference language at the tip of a hat will cause more trouble than it's worth. This is a way to dodge that bullet.

I think you're too worried about references.

Strife Onizuka

Moonchild

Join date: 3 Mar 2004

Posts: 5,887

01-27-2006 18:43

Screw this, your thinking too hard. Sometime in the near future, we will get Mono, which is a .Net VM. Assuming LL makes their LSL library available for us to link(is that the correct term?), we will be able to use any language with a CIL compiler. Basicly, it means no more screwing around with anitiquated LSL and it's flaws.

Anyway, LL has stated that until mono is released they aren't going to make any major changes to LSL.

_____________________

Truth is a river that is always splitting up into arms that reunite. Islanded between the arms, the inhabitants argue for a lifetime as to which is the main river.
- Cyril Connolly

Without the political will to find common ground, the continual friction of tactic and counter tactic, only creates suspicion and hatred and vengeance, and perpetuates the cycle of violence.
- James Nachtwey

Satchmo Prototype

eSheep

Join date: 26 Aug 2004

Posts: 1,323

01-27-2006 20:18

From: Strife Onizuka

Screw this, your thinking too hard. Sometime in the near future, we will get Mono, which is a .Net VM. Assuming LL makes their LSL library available for us to link(is that the correct term?), we will be able to use any language with a CIL compiler.

Aye Aye... although the first implementation of mono will only run LSL2, which from what I can tell is LSL1 ported to mono. But the rest of the CIL languages will follow

Babbage has some info on The Creation Engine, including the blazing new speed of LSL2.

_____________________

----------------------------------------------------------------------------------------------------------------
The Electric Sheep Company
Satchmo Blogs: The Daily Graze
Satchmo del.icio.us

Argent Stonecutter

Emergency Mustelid

Join date: 20 Sep 2005

Posts: 20,263

01-28-2006 13:46

From: Strife Onizuka

Screw this, your thinking too hard. Sometime in the near future, we will get Mono, which is a .Net VM.

Don't hold your breath.

RFC: adding memoization to functions in LSL
ThornnAdenine Maladay Registered User Join date: 21 Jan 2006 Posts: 5	01-27-2006 08:51 Abstract: I propose that three new keywords be added to the LSL: 1. memoset: would set a value in memory keyed to the values of the currently encapsulated function, or replace previously set value if one is already set. Syntax: memoset <value>; Errors: if no memory remains, raise runtime error. If not correct type (<value> must be type of function return) or used outside of a function, raise compile error. 2. memoget: would return a value set by a memoset, or a null value if not set previously (0,"",<0,0,0>,<0,0,0,0>,[],NULL_KEY as appropriate) 3. ismemod: would return 1 if a value is memod in the function, 0 if not. 4. (thanks Argent Stonecutter!) function: would attribute a function name, forcing that function to return the same value that it returned the first time it was called with the same parameters. Inspiration: One of the more daunting tasks in SL when building a complex object is the amount of legwork you have to go through to manage a large amount or memory, as well as the lack of a pointer functionality to allow the representation of internal structure. This leads to many runtime errors, slow programs, and inevitably unmanagable code if the structure gets too big. Much of this headache is due to the fact that LSL is a call-by-value language. While it's a very wise choice from a security standpoint to make LSL call-by-value (no pointer atttacks, buffer overflows, etc), it makes basic list operations extremely painful. Not only that, it renders recursion pointless because the lack of memoization causes exponential time for most useful recursive algorithms. Concept and examples of usage: To aleviate these issues while at the same time preserving the CbV nature of LSL, I propose that any function be allowed to be memoized if memory allows. The keywords would allow a memo value to be set (via memoset), a boolean check to see if a value is already there (via ismemod), and a call that would allow the value to be returned (memoget). Here's an example of a fibonacci algorithm: integer fib(integer n){ if(ismemod){ return memoget; } if(n<2){ memoset 1; return 1; } return fib(n-1)+fib(n-2); } Using the function keyword, this can be simplified as such: function integer fib(integer n){ if(n<2)return 1; return fib(n-1)+fib(n-2) } Using memo keywords would allow this to execute in O(n) steps instead of O(2^n) steps, without the use of a list or array. Using global variables, here's a possible array of strings implementation: string val; integer set=0; setA(integer at, string v){ set=1; val=v; arrayA(at); set=0; } string getA(int at){ if(set){ memoset val; }else{ return memoget; } } If you wanted extra dimensions, you could use extra parameters in the array call. If you wanted a more complicated stored value, make val of type list. Implementation: The LSL engine could make a constant size hashtable of the function address and all parameters hashed into the key, and the memoget as the value. When memoset is called, it enters the value into the hashtable, along with a list of the function and parameters. If there's a collision, you can use the next value mod the size of the table. If no next values remain, raise a memory runtime error. Lookup (via memoget) is a simple hash and value grab returning first valid key. ismemod works similarly to memoget. Security: The functional nature of the simulated array assures that CbV is preserved, and the memo keywords allow a simple transit into a call-by-reference world by necessity only. Since the simulator is in complete control of the hashtable, there's no way for values to escape. Also, since the hashtable can be easily locked when it's being accessed (like a synchronized Hashtable in Java), there's no race conditions that can occur with native implementations of large internal storage.
Argent Stonecutter Emergency Mustelid Join date: 20 Sep 2005 Posts: 20,263	01-27-2006 11:50 1. References do not need to be subject to buffer overflows or pointer attacks. There are meny examples of languages that support references without the dangers of raw pointers. The lack of references seems to be more a matter of simplifying memory management. 2. Adding hashes to the language, even if they were global and couldn't be passed by reference, would provide the functionality you want and more, and wouldn't be any harder than adding implicit hashes for memoization. 3. If you're going to explicitly extend LSL with just this functionality, why not give it a sane syntax, like function fib(integer n);, where function declares that fib has no state and will always return the same results for the same input, and hide all the bookkeeping?
ThornnAdenine Maladay Registered User Join date: 21 Jan 2006 Posts: 5	Responses to implementing references suggestion. 01-27-2006 14:03 From: Argent Stonecutter 1. References do not need to be subject to buffer overflows or pointer attacks. There are meny examples of languages that support references without the dangers of raw pointers. The lack of references seems to be more a matter of simplifying memory management. 2. Adding hashes to the language, even if they were global and couldn't be passed by reference, would provide the functionality you want and more, and wouldn't be any harder than adding implicit hashes for memoization. 3. If you're going to explicitly extend LSL with just this functionality, why not give it a sane syntax, like function fib(integer n);, where function declares that fib has no state and will always return the same results for the same input, and hide all the bookkeeping? To 1: I can totally grant you that references are safer from the perspective of buffer overflow and pointer attack. However, references are not necessarily safe in the world of multithreading. It is so much easier to create a race condition when one variable change screws up another part of code through the use of a reference. A call-by-value language has better protection against race condidions. Furthermore, to implement references you have to have to change the dynamics of the language. Specifically, you have to allow the L-Value of an assignment to become a moving target. This is much harder to implement than you would think. Imagine the number of scripts that would break if call-by-reference was allowed implicitly, when these functions are always expected to make a copy? I was looking for a method that would be much easier for the developers to throw in that wouldn't take too much time away from their Mono development, and would break as few scripts out there as possible. To 2: To implement hashes, again you'd either need a variable L-value again (else how'd you implement things like securitylevel[customer_name]="superuser"?) or use functions to manipulate the hashes, which would have the same problems as before, because being a call-by value language, you'd have to pass the hash in and back out, again an O(N) implementation. To 3: The function keyword idea is a very good one, and should be added to the request. The reason the syntax is the way it is because I wanted function writers to have full control of the memoization. You see, I'm trying to solve two problems at once. The first problem to solve is "How do I speed up recursion?". In this situation, you're entirely correct that a simple keyword tag to a function is sufficient. However, I'm also simultaneously trying to solve the problem of creating a large manipulable variable that can modify in O(1) time in a call-by-value language. This means that you have to use functions to manipulate the variable, because you can't pass the variable in by reference without changing the dynamics of the language. You could force the variable to be global, but the cleaner method is to encapsulate the variable into the function, and these memoization commands are the simplest way of doing just that without messing any scripts up. Personally, I don't mind the language being call-by-value. I just want to get a method of using more efficient data structures so that script writers can write more computationally-significant code without taking down the sim that they code it in. I just have this feeling that changing to a call-by-reference language at the tip of a hat will cause more trouble than it's worth. This is a way to dodge that bullet.
Argent Stonecutter Emergency Mustelid Join date: 20 Sep 2005 Posts: 20,263	01-27-2006 16:13 From: ThornnAdenine Maladay references are not necessarily safe in the world of multithreading. 1. LSL isn't multithreaded. 2. Global variables have more problems, and if you don't have references you have to use more globals. From: someone Specifically, you have to allow the L-Value of an assignment to become a moving target. This is much harder to implement than you would think. I implemented my first parser for a programming language in 1979. I have an idea of what's involved, and it's not a big deal. I'm not going to duplicate my comments in the other thread, we can talk about references there. I'll just say: From: someone Imagine the number of scripts that would break if call-by-reference was allowed implicitly, when these functions are always expected to make a copy? That's why you wouldn't do that. Extending a language so that you don't break existing code is not a new problem, and there's all kinds of ways to make it work. From: someone To implement hashes, again you'd either need a variable L-value again (else how'd you implement things like securitylevel[customer_name]="superuser"?) Picking a completely arbitrary syntax for a stack-based implementation: CODE PUSH REF security_level PUSH REF customer_name CALL internalRef2String CALL internalArrayFindRef PUSH STRING "superuser" CALL internalStoreStringViaRef From: someone or use functions to manipulate the hashes, which would have the same problems as before, because being a call-by value language, you'd have to pass the hash in and back out, again an O(N) implementation. Since a hash is a new type, making a hash and only a hash call-by-reference will not break any existing scripts. It would also allow the use of hashes instead of globals, making new code MORE serially-reusable or even reentrant. From: someone However, I'm also simultaneously trying to solve the problem of creating a large manipulable variable that can modify in O(1) time in a call-by-value language. That's not a problem that needs to be solved, because there's no inherent restrictions in the syntax or semantics of LSL that would make treating a hash as a reference any harder than adding a new type of variable would be in the first place. And it would be a lot easier to understand, and a lot more generally useful. From: someone I just have this feeling that changing to a call-by-reference language at the tip of a hat will cause more trouble than it's worth. This is a way to dodge that bullet. I think you're too worried about references.
Strife Onizuka Moonchild Join date: 3 Mar 2004 Posts: 5,887	01-27-2006 18:43 Screw this, your thinking too hard. Sometime in the near future, we will get Mono, which is a .Net VM. Assuming LL makes their LSL library available for us to link(is that the correct term?), we will be able to use any language with a CIL compiler. Basicly, it means no more screwing around with anitiquated LSL and it's flaws. Anyway, LL has stated that until mono is released they aren't going to make any major changes to LSL. _____________________ Truth is a river that is always splitting up into arms that reunite. Islanded between the arms, the inhabitants argue for a lifetime as to which is the main river. - Cyril Connolly Without the political will to find common ground, the continual friction of tactic and counter tactic, only creates suspicion and hatred and vengeance, and perpetuates the cycle of violence. - James Nachtwey
Satchmo Prototype eSheep Join date: 26 Aug 2004 Posts: 1,323	01-27-2006 20:18 From: Strife Onizuka Screw this, your thinking too hard. Sometime in the near future, we will get Mono, which is a .Net VM. Assuming LL makes their LSL library available for us to link(is that the correct term?), we will be able to use any language with a CIL compiler. Aye Aye... although the first implementation of mono will only run LSL2, which from what I can tell is LSL1 ported to mono. But the rest of the CIL languages will follow Babbage has some info on The Creation Engine, including the blazing new speed of LSL2. _____________________ ---------------------------------------------------------------------------------------------------------------- The Electric Sheep Company Satchmo Blogs: The Daily Graze Satchmo del.icio.us
Argent Stonecutter Emergency Mustelid Join date: 20 Sep 2005 Posts: 20,263	01-28-2006 13:46 From: Strife Onizuka Screw this, your thinking too hard. Sometime in the near future, we will get Mono, which is a .Net VM. Don't hold your breath.

Welcome to the Second Life Forums Archive

RFC: adding memoization to functions in LSL