Welcome to the Second Life Forums Archive

a quality proposal

Lotka Zagoskin
Registered User
Join date: 30 Sep 2006
Posts: 40
10-09-2006 09:56
The technical troubles plaguing SL recently have to do with growing pains, malicious scripts, and buggy scripts. No doubt many of these short term problems will be overcome. But in my opinion, long term, the factors which enable them could plague SL further, even imperil it.

Much of the problem is technical, using that term in its widest possible sense. SL isn't much fun without scripts to give its Actors life and character. Script writers currently labor under resource constraints which suggest something like C or assembly as an appropriate means of programming. Instead, we have the Linden Scripting Language ("LSL"), advertised to be like object-oriented ("OO") computational expressions. But clearly it is not: It has no ability to define classes. It has no direct means of polymorphic behavior. It has no inheritance. While not strictly an OO thing, it has no exception handling, even in crude try ... catch ... form, and that would be useful in many SL contexts.

But LSL is what it is, and it's what we have to work with. We can. We do that by recognizing we need to change how we as developers do things.

What I want to propose is that the scripting community embrace professional principles of quality assurance and testing for its scripts.

Now, I don't pretend muttering the words "quality assurance" or setting up some organization of testing folks, formal or informal, will do what needs to be done. I have specific ideas about how this can and should be done. I will get to them. Some of my ideas, if pursued, may mean scripts pose a greater in-place resource burden upon the scripting engine and that darn 16 KB memory limit. It is up to you to decide if the benefits of using these ideas are worth it. I think they are, and I'm dedicating some time to figure out an approach and set of methods that will achieve what I want to do here within present SL constraints. When the day comes that the 16 KB ceiling is raised and when the LSL is richer, the challenges of delivering great, robust scripts to the SL community will be eased. But no matter where that ceiling is, there will always be a requirement, a need for a script that brushes it, and there will always be a misguided argument that the kinds of built-in safeguards I am proposing are too expensive.

This is not primarily a gaming community. People don't want to "reset" their investments in time and money in the SL landscape because of a script or engine bug. People don't like it when they get spammed, nor do they enjoy mischief not under their control. When they have an unpleasant experience, what some people naturally do is reconsider what role SL as a world and community serves in their lives. Since SL is the new kid in town, it's really easy for it to lose on such reevaluations. Newcomers to SL are brave. They start out curious, interested, excited. They invest money, time, and effort in SL. They sometimes risk ridicule by friends and family for taking "just a game, just a computer fiction" seriously. So, if we really care about SL, I think we owe it to the newcomers and to the residents to do our jobs as script developers well. These are not just game mods. We can't expect users to know what they are getting into, or to have any attitude towards scripts other than expecting them to work. They expect them to work as close to their real-world analogues as possible, much as they do with user interfaces. (See Joel Spolsky, User Interface Design for Programmers.) Sure, this is a new playing field. Sure, mistakes are natural in the fiendishly difficult engineering of event-driven programs. But we need to marshal the best techniques and resources we can to control these problems.

What I propose is that each package of scripts be accompanied by two or more scripts which provide a self-test capability. At least one of the scripts ought to be directly invokable from an object's menu, and it should call each of the others in turn in some reasonable sequence. One of the scripts ought to test all preconditions about the state of the world that the package requires to execute successfully. The rest of the scripts ought to exercise and test each and every external interface the package has. This requires the developer to identify these interfaces, organize them, and devise tests for them. These should be presented in a logical manner within the self-test portions of the package. The effort to devise these tests should be a substantial part of the effort to build the package.
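
A minimal sketch of what such a runner script might look like, assuming the test scripts sit in the same prim and talk to the runner by link message. The command numbers, the suite names, and the "PASS"/"FAIL" reply convention are all hypothetical, chosen only for illustration.

```lsl
// Self-test runner sketch. All numbers and suite names are hypothetical.
integer RUN_SUITE  = 9000;   // assumed command: "run the suite named in str"
integer SUITE_DONE = 9001;   // assumed reply: str is "PASS" or "FAIL: reason"

// The preconditions suite goes first, then one suite per external interface.
list gSuites = ["preconditions", "interface-touch", "interface-listen"];
integer gNext;

run_next()
{
    if (gNext < llGetListLength(gSuites))
    {
        // Each test script is assumed to act only on its own suite name.
        llMessageLinked(LINK_THIS, RUN_SUITE, llList2String(gSuites, gNext), NULL_KEY);
    }
    else
    {
        llOwnerSay("self-test finished");
    }
}

default
{
    touch_start(integer total_number)
    {
        // Only the owner may start the self-test.
        if (llDetectedKey(0) != llGetOwner()) return;
        gNext = 0;
        run_next();
    }

    link_message(integer sender_num, integer num, string str, key id)
    {
        // Ignore everything except replies to a suite that is still pending.
        if (num != SUITE_DONE || gNext >= llGetListLength(gSuites)) return;
        llOwnerSay(llList2String(gSuites, gNext) + ": " + str);
        if (gNext == 0 && str != "PASS")
        {
            llOwnerSay("preconditions failed; skipping interface tests");
            return;
        }
        ++gNext;
        run_next();
    }
}
```

Here a touch by the owner stands in for the menu entry. Running the suites strictly one after another keeps the reports in a predictable order, at the cost of a stalled run if a test script never replies.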

This is not a new idea. Similar techniques are used in much embedded software, notably software in avionics and nuclear controls and safety applications. Moreover, there's a well developed discipline pertaining to such test-focussed programming, a discipline argued by people like Kent Beck (see here for more), David Agans, and Ralph Johnson. Apart from developers adopting the technique, work has to be done to figure out ways of achieving scripts' effect in a manner detectable by their accompanying self-tests.

Some terminology is useful. The self-test code in a package is a test fixture. An individual check of an interface is called a test case. Cases are aggregated into test suites. Annunciations and checks sprinkled throughout executable scripts which enable their states to be ascertained are called instrumentation.

In LSL, there need to be conventions about how this is done. For instance, the scripter ought to designate two channels. The first carries all debug information: the scripter should use llSay to write intermediate results or other checks to that channel. Test fixtures should establish listeners for these annunciations. They should contain documented expectations about what the results of each test case should be. When such a case is exercised, the fixture should announce on the second channel a report of its success or failure, along with a case identifier. It should be possible to determine what that second channel is externally. I do not know yet whether the SL resident or user should be able to hear the chatter.
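
As a rough sketch of the two-channel convention, here is what a fixture for a single test case could look like. The channel numbers, the case identifier, and the message format are assumptions, as is the script under test, which is imagined to call llSay(INSTRUMENT_CHAN, "door.open=1") when its door opens.

```lsl
// Test fixture sketch for one case. Channels, case id and message
// format are hypothetical conventions, not anything LSL prescribes.
integer INSTRUMENT_CHAN = -7701;   // first channel: instrumentation chatter
integer REPORT_CHAN     = -7702;   // second channel: pass/fail reports

string CASE_ID  = "door-01";
string EXPECTED = "door.open=1";   // documented expectation for this case

default
{
    state_entry()
    {
        // Hear every annunciation made on the instrumentation channel.
        llListen(INSTRUMENT_CHAN, "", NULL_KEY, "");
    }

    listen(integer channel, string name, key id, string message)
    {
        // Report success or failure, tagged with the case identifier.
        if (message == EXPECTED)
            llSay(REPORT_CHAN, CASE_ID + " PASS");
        else
            llSay(REPORT_CHAN, CASE_ID + " FAIL got=" + message);
    }
}
```

Note that a prim does not hear its own chat, so the fixture is assumed to live in a different prim or object from the script it checks. Negative channels keep the chatter out of residents' chat windows, which is one possible answer to the question of who should hear it.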

Some states and results are directly discernable. Other conditions require interrogating the status of objects or avatars.

Developers should keep formal release notes or records of changes made to a script, perhaps in the text of the notes of the package. The fix of any bug they encounter should involve adding at least one test case to its test suite which verifies that bug has been repaired.

I'd like to build in certain checks to prevent scripts from being improperly modified or co-opted. We might do the equivalent of an MD5 hash on the script text and test that. Scripts might include other defenses against misuse in their self-tests. The preconditions check might be exercised at the start of normal execution and stop the latter should the check fail. Preconditions might include some tests of appropriate use, such as an attempt to use a weapon in a non-weapon area.
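
The preconditions idea might be sketched like this. The notecard name and the memory margin are hypothetical, and treating "parcel allows damage" as the meaning of "weapon area" is only an assumption about how that test could be approximated with llGetParcelFlags.

```lsl
// Preconditions check run before normal execution. Names and limits
// below are hypothetical examples.
string  REQUIRED_NOTECARD = "config";   // assumed inventory item
integer MIN_FREE_BYTES    = 2048;       // assumed memory margin

// Returns "" when all preconditions hold, otherwise the reason for failure.
string failed_precondition()
{
    if (llGetInventoryType(REQUIRED_NOTECARD) != INVENTORY_NOTECARD)
        return "missing notecard '" + REQUIRED_NOTECARD + "'";
    if (llGetFreeMemory() < MIN_FREE_BYTES)
        return "too little free script memory";
    if (!(llGetParcelFlags(llGetPos()) & PARCEL_FLAG_ALLOW_DAMAGE))
        return "parcel does not allow damage";
    return "";
}

default
{
    state_entry()
    {
        string why = failed_precondition();
        if (why != "")
        {
            llOwnerSay("precondition failed: " + why);
            state disabled;
        }
        // Normal execution would continue from here.
    }
}

state disabled
{
    on_rez(integer start_param)
    {
        // Re-check the preconditions the next time the object is rezzed.
        llResetScript();
    }
}
```

Parking the script in a separate state means none of its normal event handlers can fire after a failed check.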

At the moment, I am skeptical something like a test runner facility is needed. However, it sure would be nice to have an environment outside of SL where LSL could be executed, a kind of offline sandbox. A test runner would be useful there.

There remains a lot of work to be done to establish these conventions. I see no way of doing that other than trying to rewrite existing scripts under this rubric, and write new ones. I hope to report on my efforts doing this here and in other places, perhaps a permanent home somewhere in world.

I would very much like your comments and thoughts on this, as well as your efforts towards realizing it.

My hope is that we might achieve a consensus that such practice is essential. We could encourage users to buy only products with scripts having self-checking features. Maybe there's a role for a "good primkeeping seal of approval" here. Maybe Linden Lab might someday alter SL's code so publishing a script or object causes its self-test to be exercised. Should the test fail, the publication is rejected.

Why is this better than, say, a formal Quality Assurance approach?

Linden Lab itself, quite properly, uses a battery of quality assurance tests to check that basic features of SL haven't broken in a release. One can imagine having a group of volunteer testers who receive scripts for testing and check them out, returning comments to the developers. This is unattractive for a few reasons.

One, the expectations and requirements of any package or script need to be communicated in writing to the testing group. There are many levels at which such expectations can be conveyed. Testers want precision here, nothing left to be defined. Developers may have difficulty describing what they expect and how they expect something to work. Natural language is likely to be the medium used. That's awkward and hard to maintain, harder than code sometimes.

Two, there will inevitably be a lag between submission of a package and its completed testing. The QA testers will need a system and tools to help them manage the flow of scripts in, and reports out. This structure would need to be built and maintained. If the lag is too large, developers will be discouraged from using the system.

Three, establishing an independent testing group is poor sociology. Gatekeepers are powerful. There's lots of opportunity for resentments that need to be handled. There's a tendency for these social structures to turn unhealthily competitive. Such things detract from the primary purpose of the entire system.

It is much better to put script developers in the business of doing their own systematic testing. They themselves (ought to) know exactly what interfaces their scripts and packages have. They themselves (ought to) know what the result of using any interface should be. The test-first approach asks them to be systematic about this, and to express it in a manner which they already know well how to do, by writing good code. Test suites are commented, but their existence itself provides other developers critical self-documentation of the package or scripts. Test suites are meant to be read, like all code should be.

Hewee Zetkin
Registered User
Join date: 20 Jul 2006
Posts: 2,702
10-09-2006 11:38
Dang. Someday I might find the time and inclination to read all that. In the meantime, I got as far as the suggestion that scripts (packages) be tested in a manner similar to that of professional software packages. I think that is a grand idea for non-trivial scripts! I actually thought of an approach to a sort of testing "framework" a while ago. Here's the idea:
  1. Put NO CODE in actual event handlers except for a function call. The only exception is 'link_message()', which should test for a well-known testing command number. If this testing call is made, route to a helper function which interprets the string/key arguments as a serialized event and calls other functions just as if the event were generated by the system.
  2. For each state/event-handler pair, create a function with the same arguments that is called directly (e.g. 'default_state_entry()', 'default_sensor(integer numDetected)', 'mystate_listen(integer channel, string name, key id, string message)', etc.). The true event handlers call these functions directly due to system events, and the testing functions (see above) call them with de-serialized (script-name?/)events/parameters when appropriate.
  3. It is POSSIBLE that in addition to the simple function call done by the true event handlers, a test-capture mode might be introduced in which the events are serialized and logged for use in unit/regression tests.
  4. Unit and regression tests take the form of scripts which simply poke serialized events at other scripts through link messages. Ideally there would be some way to test output as well (see below).
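
Points 1 and 2 might look roughly like this in a script. The command number 9100 and the "event|arg|arg" serialization are invented for the sketch; the key argument of the link message is assumed to carry the event's key.

```lsl
// Hypothetical "well-known testing command number".
integer TEST_EVENT = 9100;

// One function per state/event pair, with the same arguments as the event.
default_touch_start(integer numDetected)
{
    llOwnerSay("touched by " + (string)numDetected + " agent(s)");
}

default_listen(integer channel, string name, key id, string message)
{
    llOwnerSay(name + " said: " + message);
}

// Interpret str as "event|arg|arg..." (an assumed format) and call the
// matching function just as if the system had raised the event.
dispatch_test_event(string str, key id)
{
    list parts = llParseString2List(str, ["|"], []);
    string evName = llList2String(parts, 0);
    if (evName == "touch_start")
        default_touch_start(llList2Integer(parts, 1));
    else if (evName == "listen")
        default_listen(llList2Integer(parts, 1), llList2String(parts, 2),
                       id, llList2String(parts, 3));
}

default
{
    // True event handlers contain nothing but the function call.
    touch_start(integer numDetected)
    {
        default_touch_start(numDetected);
    }

    listen(integer channel, string name, key id, string message)
    {
        default_listen(channel, name, id, message);
    }

    // The one exception: route test commands to the dispatcher.
    link_message(integer sender_num, integer num, string str, key id)
    {
        if (num == TEST_EVENT) dispatch_test_event(str, id);
    }
}
```

A test script in the same prim could then poke it with something like llMessageLinked(LINK_THIS, 9100, "listen|0|Tester|hello", llGetOwner()). One caveat of this particular format: llParseString2List drops empty fields, so an empty argument would shift the ones after it.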

That should take care of the execution part. The results wouldn't be quite so easy to compare. Ideally it would be done in an automated fashion. This would entail creating a layer on top of most/all 'll...' functions, and having this layer either call the actual LSL implementation or log the results (and/or create further events?) according to whether or not we are executing a test.
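
A sketch of such a layer for a single function, llSay. The flag, the link-message numbers, and the log format are assumptions made up for the example.

```lsl
integer gTesting      = FALSE;  // hypothetical flag: are we executing a test?
integer TEST_LOG      = 9200;   // hypothetical number for captured output
integer SET_TEST_MODE = 9201;   // hypothetical command: str is "on" or "off"

// Layer on top of llSay: the real call in normal use, a logged call under test.
say(integer channel, string message)
{
    if (gTesting)
        llMessageLinked(LINK_THIS, TEST_LOG,
                        "llSay|" + (string)channel + "|" + message, NULL_KEY);
    else
        llSay(channel, message);
}

default
{
    touch_start(integer numDetected)
    {
        // Script code calls the wrapper, never llSay directly.
        say(0, "Hello");
    }

    link_message(integer sender_num, integer num, string str, key id)
    {
        if (num == SET_TEST_MODE) gTesting = (str == "on");
    }
}
```

A test script listening for TEST_LOG messages could compare the captured calls against what it expects. Every wrapped function costs some of the 16 KB, so it may make sense to wrap only the calls whose output matters to the tests.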

If fully automated result comparison is not feasible, it may be possible to do SOME automation and some manual verification, but this would vary a lot from package to package.

This was just an initial thought. One more project I've set on the back burner. Heh.