Bot count, the sequel: 52%
|
|
Madhu Maruti
aka Carter Denja
Join date: 6 Dec 2007
Posts: 749
|
11-17-2008 09:53
Anya's studies are interesting within the limits of sample size and bot-determining criteria, which limitations she fully acknowledged in both of her posts. And she has not made exaggerated or inappropriate claims as to what they show about grid-wide activity. She's made some hypotheses, and provided reasoning to support the tentative extrapolations she's offered in those hypotheses.
The difference between Anya and all of the folks in these threads complaining about how Anya's studies don't show this and Anya's studies don't show that is: Anya is actually out there conducting studies.
Anya's first study was criticized because it was done at a time of high concurrency. So she went out and did another study at a time of low concurrency. She didn't change her sample because any smart researcher knows you don't change two variables at once.
Anya is doing interesting and responsible studies. From her posts it is amply clear that she understands the limitations of what can and cannot be concluded from her work. I for one am very interested and grateful to her for this hard work. Anyone who questions her methods is free to devise their own bot-identifying criteria and sampling methodology and get to it. I look forward to seeing your results as well, and seeing how they differ from or are similar to Anya's.
But until there's data to compare to Anya's, all the protestations about how "the results would be different if ..." are just conjecture and hypothesis at best. And an untested hypothesis is not worth very much compared to a hypothesis supported by and reshaped in the face of actual data.
_____________________
 Visit Madhu's Cafe - relax with your friends in our lush gardens, dance with someone special, enjoy the sounds of classic Bollywood and Monday Night World Music parties - http://slurl.com/secondlife/Milyang/39/16/701/
|
|
Kidd Krasner
Registered User
Join date: 1 Jan 2007
Posts: 1,938
|
11-17-2008 09:54
From: Bee Mizser Actually it does matter. People could be labelled a bot when they are not. What happens if bots are banned? these people could get ARd. This is a red herring. There's nothing in this data or methodology that suggests it would be used to identify whether individual avs should be banned. That would be like using a RL traffic survey, ostensibly for planning or identifying safety issues, to prosecute individuals.
|
|
Phil Deakins
Prim Savers = low prims
Join date: 17 Jan 2007
Posts: 9,537
|
11-17-2008 09:56
From: Argent Stonecutter Sure it does. It may not give an accurate indication, but the only way to determine how accurate it is or isn't is to gather more data. This figure may be high or low, it may only apply to mainland, it may only apply to that continent, but complaining that it's too small a sample size is not a strong counter to it. I'll rephrase what I wrote:- ... but this sort of sweep on its own isn't going to give any such indication.
|
|
Argent Stonecutter
Emergency Mustelid
Join date: 20 Sep 2005
Posts: 20,263
|
11-17-2008 09:57
From: Madhu Maruti The difference between Anya and all of the folks in these threads complaining about how Anya's studies don't show this and Anya's studies don't show that is: Anya is actually out there conducting studies. Indeed. Phil, and the rest of you, if you don't like the results, if you think her criteria are wrong, do your own studies.
|
|
Curtis Dresler
Registered User
Join date: 6 Apr 2008
Posts: 155
|
11-17-2008 10:02
From: Kidd Krasner At a minimum, I view this data as being useful in the same way that a single average trade price on the Lindex is useful. In other words, by itself it may not be much, but when combined with additional data, it can be very helpful.
Statistical work often goes like this. You formulate a survey, you get some initial results. You study the results for consistency, you study the methodology for flaws, you allow others to review the methodology for flaws. However, flaws in the methodology don't invalidate the raw data, they just limit the conclusions that can be drawn. The raw data remains a factual data point that may contribute directly to future analyses or may contribute to an improved methodology.
... Why? We aren't reinventing the wheel here. I've been involved in statistical surveys (DC Jail, for the DSS, various studies for traffic for bicycling, various lab studies as part of what one of our divisions did routinely) and the requirements for a minimally acceptable survey weren't met here. We documented the survey points and for anyone with the necessary access or that were doing a survey of surveys, they could review our raw data and reassess that data against the method used in other surveys and compilations of raw data. Most of the surveys collected the data and compared it against the other collected data to see if there was conformity (you rarely get uniformity). How was that done here? It's nice to play the 'gotcha games' in SL and on the forums, but people, anyone who does this for a living (and I don't - I have just been involved, including sitting through the sessions on stat results that put me to sleep) could explain why this is no step at all, except a bit beyond opinion. Not that I have a problem with it really. I don't think the issue merits the work that would be required in RL for a real survey. But let's cap the talk about this being a real survey or a real study of any significance. It's not. FWIW, every survey started with a well defined analysis of the scope and area of the survey and a specific description of the expected range of possible results, with an explanation of how that did or did not extrapolate to the general population (absent that last item, generally it was considered for most of our purposes to be useless and not worth bothering to try for grants - occasionally it was still useful). Of course I am not going to do one myself. I don't think it is worth the effort. Unless you actually were able to do all the above and get the kind of waivers that I think LL would have to give you to just do the survey groundwork, the results would go no further than this one did - everyone would still be left with their favorite arguments.
To get the range and waivers necessary, you would probably have to sign a protocol with LL, and you probably would never be discussing the results on this forum... Most of the above is just my opinion, but I've had better surveys shot to pieces like a stop sign in hunting season by the real experts.
|
|
Phil Deakins
Prim Savers = low prims
Join date: 17 Jan 2007
Posts: 9,537
|
11-17-2008 10:02
From: Argent Stonecutter Indeed.
Phil, and the rest of you, if you don't like the results, if you think her criteria are wrong, do your own studies. Why?
|
|
Lear Cale
wordy bugger
Join date: 22 Aug 2007
Posts: 3,569
|
11-17-2008 10:09
From: Rhaorth Antonelli as I mentioned in another post, I would not do such a survey or whatever it would be called, as I have no way to ensure my interpretations would be honest and true
I have no sure fire way to know that an avatar is a person or a bot, or if I am getting a good cross of the "population"
Nor do I have the interest, or the desire to run such an unverifiable conglomeration of data that in my opinion would be more misleading than it would be worth. Rha, that's an argument for never trying to learn anything that isn't already known. Feel free to choose to remain ignorant, but don't attack a reasonable attempt to get some data. These results may turn out to be skewed. At this point, we can't tell. But with more input, we can. And you don't need a "sure fire" way. You only need a way with a reasonable level of confidence. Have you ever actually seen a bot farm? They're pretty obvious.
|
|
Argent Stonecutter
Emergency Mustelid
Join date: 20 Sep 2005
Posts: 20,263
|
11-17-2008 10:17
From: Phil Deakins From: Argent Stonecutter From: Madhu Maruti The difference between Anya and all of the folks in these threads complaining about how Anya's studies don't show this and Anya's studies don't show that is: Anya is actually out there conducting studies.
Indeed. Phil, and the rest of you, if you don't like the results, if you think her criteria are wrong, do your own studies. Why? That depends, I guess, on what your motivation is in dumping on Anya's studies. But, regardless, if the only data that's available is low quality, but it's the only data available, it's still the most credible data around.
|
|
Phil Deakins
Prim Savers = low prims
Join date: 17 Jan 2007
Posts: 9,537
|
11-17-2008 10:17
If people get together to gather more data, and decide on what is and isn't counted as a bot, I suggest leaving campers completely out of it because there is no way of knowing whether or not an av that doesn't respond is a bot. It could be an individual who has turned in for the night, or is watching TV, or is out for the evening, etc., or it could be a camping bot. I don't count individuals as bots, regardless of what they are doing.
I'd also suggest leaving mannequins and models out of it because they are functional bots.
I'd suggest simply counting those that are obviously bots - groups of avs in boxes, in lakes and places like that.
|
|
Phil Deakins
Prim Savers = low prims
Join date: 17 Jan 2007
Posts: 9,537
|
11-17-2008 10:22
From: Argent Stonecutter That depends, I guess, on what your motivation is in dumping on Anya's studies. But, regardless, if the only data that's available is low quality, but it's the only data available, it's still the most credible data around. I wouldn't describe my posts as "dumping". I've discussed Anya's posts rationally. My motivation is to show that the data *on its own* isn't anything to go by. I don't have a pro-bot motivation as some people who don't know my views probably think I do. I'd like to know how many traffic bots there are on average, but Anya's data on its own doesn't give a reasonable indication of that and I've shown why, that's all. I did say that it would have been much better if the second sweep had been through a different set of sims, but it's just a repetition of the very limited first sweep.
|
|
HoneyBear Lilliehook
Owner, The Mall at Cherry
Join date: 18 Jun 2007
Posts: 4,500
|
11-17-2008 10:23
From: Argent Stonecutter Sure it does. It may not give an accurate indication, but the only way to determine how accurate it is or isn't is to gather more data. This figure may be high or low, it may only apply to mainland, it may only apply to that continent, but complaining that it's too small a sample size is not a strong counter to it. Why don't you do your own survey? If you're not interested in doing it yourself, why don't you script a bot to do it? Argent, the only thing worse than NO data...is incorrect data. This raw data is no more or less accurate than what the Lindens give us. None of the people here need to run their own surveys, because none of us decided to bring bad information to the table. Edit: I'm going to rephrase my own post. Anya's information isn't BAD...I think it's just incomplete.
_____________________
Virtual Freebies now has its own domain! http://virtualfreebiesblog.com The Mall at Cherry Park - new vendors, new look!
|
|
Feldspar Millgrove
Registered User
Join date: 16 Nov 2006
Posts: 372
|
11-17-2008 10:25
From: Phil Deakins I'd suggest simply counting those that are obviously bots - groups of avs in boxes, in lakes and places like that. Hey, some of the best parties are in boxes, lakes, and places like that! I don't think you can base the bot criterion on the location of the avi, any more than you can suppose that someone who doesn't answer is a bot.
|
|
Meade Paravane
Hedgehog
Join date: 21 Nov 2006
Posts: 4,845
|
11-17-2008 10:28
From: Meade Paravane It matters if the OP wants any credibility. /me didn't mean to be rude here, Anya. Though I don't agree with your numbers, you're one of the few who are trying to get them. /me sends Anya a cookie.
_____________________
Tired of shouting clubs and lucky chairs? Vote for llParcelSay!!! - Go here: http://jira.secondlife.com/browse/SVC-1224 - If you see "if you were logged in.." on the left, click it and log in - Click the "Vote for it" link on the left
|
|
Kidd Krasner
Registered User
Join date: 1 Jan 2007
Posts: 1,938
|
11-17-2008 10:31
From: Curtis Dresler they could review our raw data and reassess that data against the method used in other surveys and compilations of raw data. Most of the surveys collected the data and compared it against the other collected data to see if there was conformity (you rarely get uniformity). How was that done here? I haven't followed the links to see if the data posted actually qualifies as a complete set of raw data, but presumably that would be easy enough to do for future surveys. As for the comparison, what was the first set of data compared against? Obviously there had to be a first, with no previous data to compare. You didn't throw it out, you waited until more data was collected. You can't expect methodologies that are used in established survey areas to apply exactly in areas where new ground is being broken. From: someone Its not. FWIW, every survey started with a well defined analysis of the scope and area of the survey and a specific description of the expected range of possible results, with an explanation of how that did or did not extrapolate to the general population (absent that last item, generally it was considered for most of our purposes to be useless and not worth bothering to try for grants - occasionally it was still useful).
These are good points, but I don't think they're that far out of reach. The scope was limited to one particular continent. I'm not sure what percentage of sims on that continent were surveyed, but I'm sure that's easily obtainable. The timing of the survey is well-defined, though not necessarily well chosen. I think it's fair to conclude that the survey looked at 100% of the avs on the selected sims within the given timeframe. I haven't looked at the raw data to know whether the time is included, but it could certainly be included the next time the survey is performed. Since this survey is exploratory, the range of possible results doesn't seem an appropriate or necessary question. One could give the trivial response "between 0 and 100% satisfying the bot criteria", but that's not useful. How does it extrapolate? It doesn't. But rather than invalidating the results so far, this merely identifies one of the open issues that must be addressed before applying this survey to the entire grid, or even just the mainland subset of the grid. I disagree with your assertion that this is no step at all. When the methodology itself is breaking new ground, it doesn't seem at all unreasonable to begin with a preliminary survey before addressing or solving the issues you raise. Look at it this way. When Dewey beat Truman, it didn't mean the survey was useless. It was extremely useful for learning how to do a better job of surveying next time.
|
|
Argent Stonecutter
Emergency Mustelid
Join date: 20 Sep 2005
Posts: 20,263
|
11-17-2008 10:31
From: Phil Deakins If people get together to gather more data, and decide on what is and isn't counted as a bot, I suggest leaving campers completely out of it because there is no way of knowing whether or not an av that doesn't respond is a bot. Count them separately, perhaps, if you can't tell if someone's a non-responsive camper or a bot. Count the decorative bots and campers separately from the bot farms and camping holes, too. But count them all. If the result is:
48% active avatars
5% inactive decorative avatars (models, dummies, gardeners, etc)
47% inactive clumped avatars (camping chairs, bot farms, etc)
that tells a different story than if the result is:
48% active avatars
37% inactive decorative avatars
15% inactive clumped avatars
This is what's called a "follow-on study". Since this is open-source research, "submit a patch" by doing the study yourself if it matters to you. If it doesn't matter enough for you to "submit a patch", you're probably already spending more time on it than it should be worth to you just posting about it.
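For anyone attempting such a follow-on study, the tallying itself is trivial once every avatar has been put in exactly one category. A minimal Python sketch, assuming each observation is recorded as a plain label; the labels "active", "decorative" and "clumped" are shorthand invented here for the three groups above, and the sweep data is made up to match the first illustrative breakdown, not taken from any real survey:

```python
from collections import Counter

def category_shares(observations):
    """Return each category's share of all observed avatars, in percent."""
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100.0 * n / total for cat, n in counts.items()}

# Hypothetical sweep of 100 avatars (illustrative numbers only).
sweep = ["active"] * 48 + ["decorative"] * 5 + ["clumped"] * 47
shares = category_shares(sweep)

for cat in ("active", "decorative", "clumped"):
    print(f"{cat:>10}: {shares[cat]:.0f}%")
```

Keeping the raw per-avatar labels, rather than only the final percentages, would also let someone else regroup the categories later without repeating the sweep.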
|
|
Drongle McMahon
Older than he looks
Join date: 22 Jun 2007
Posts: 494
|
11-17-2008 10:32
From: Phil Deakins I'd suggest simply counting those that are obviously bots - groups of avs in boxes, in lakes and places like that. Don't forget those hidden inside invisiprim spheres and randomly distributed in all three dimensions (look for their shoes).
|
|
Argent Stonecutter
Emergency Mustelid
Join date: 20 Sep 2005
Posts: 20,263
|
11-17-2008 10:37
From: HoneyBear Lilliehook Argent, the only thing worse than NO data...is incorrect data. I haven't seen any indication in this thread that this is incorrect data, or even that it's too small a sample to be useful. It's based on a statistically significant sample size, over a population of sims that should be representative of the mainland. It may not be representative of estates, but that's also a useful follow-on study.
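To put a rough number on the sample-size part of this argument only: a sketch in Python of the usual normal-approximation confidence interval for a proportion, using the 52% from the thread title and the figure of about 600 avatars mentioned elsewhere in the thread. It assumes the avatars were a simple random sample, which a contiguous band of sims is not, so it says nothing about whether the band is representative and should be read as a best-case margin.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)  # standard error of p_hat
    return p_hat - z * se, p_hat + z * se

# Assumed inputs: 52% flagged out of roughly 600 avatars observed.
low, high = proportion_ci(0.52, 600)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

Under those assumptions the margin is about four percentage points either way. Any clumping of bots by location would widen it.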
|
|
Vlad Bjornson
Virtual Gardener
Join date: 11 Nov 2005
Posts: 650
|
11-17-2008 10:47
From: Bee Mizser They have access to the one piece of factual data that could prove an AV a bot or not.
The server logs. This would list the IP addresses of all logins, the dates and times of login, activity in regions, you name it, it would be there.
LL has already analyzed this data. Lindens have mentioned on several occasions that bots make up about 10 - 15% of the total user hours and that the ratio of bots to human users has stayed steady for many months. They've also mentioned that this is backed up by the rate of spending per user hour and other related stats. Seems like as good a guesstimate as any other. The grid is vast and varied. I don't think that sampling a slice of the mainland is going to produce any stats that are accurate for the grid as a whole. Phil is right - What we really need to do is get rid of the reasons for having traffic bots and the problem will be eliminated or at least greatly reduced. I'm in favor of just eliminating the Traffic counts altogether, but removing Traffic's effect on the search results might be enough.
_____________________
I heart shiny ! http://www.shiny-life.com
|
|
Phil Deakins
Prim Savers = low prims
Join date: 17 Jan 2007
Posts: 9,537
|
11-17-2008 10:49
From: Argent Stonecutter I haven't seen any indication in this thread that this is incorrect data, or even that it's too small a sample to be useful. It's based on a statistically significant sample size, over a population of sims that should be representative of the mainland. It may not be representative of estates, but that's also a useful follow-on study. You also haven't seen how the particular band of sims was selected - none of us have. It's not a statistically significant sample at all. You could say that the population of Arizona is a statistically significant sample of the U.S. population but, if you'd polled them all before the election, you would have concluded that McCain would now be the President-elect - easily. That's because support for the two candidates was unevenly spread through the U.S. For a statistically significant sample, you would have needed to poll some from each state. It's the same with SL's population - it is unevenly spread through the grid and a band of sims doesn't cover it on its own. Not only that, but we don't know how the band of sims was selected. It may have been selected because it had plenty of green dots, some or many of which gave the appearance on the map of likely being bots. That's a possibility - we just don't know. added: I think Arizona is McCain's home state - I meant his home state, anyway.
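The Arizona point can be illustrated with a toy simulation. This is a sketch in Python of a deliberately contrived grid, not a model of the real mainland: the grid size, the avatars per sim and the bot-heavy northern third are all invented, and the band is placed where it does the most damage. It only shows that if bots are unevenly spread, a contiguous band can mislead where a scattered sample of the same size does not.

```python
import random

COLS, ROWS, AVATARS_PER_SIM = 40, 30, 10

def bots_in_sim(row):
    """Toy assumption: the northern third of the grid is bot-heavy."""
    return 8 if row < 10 else 2

# Every sim on the made-up continent, keyed by (row, col).
grid = {(r, c): bots_in_sim(r) for r in range(ROWS) for c in range(COLS)}

def bot_rate(sims):
    """Fraction of avatars in the given sims that are bots."""
    sims = list(sims)
    return sum(grid[s] for s in sims) / (len(sims) * AVATARS_PER_SIM)

true_rate = bot_rate(grid)
band = [(r, c) for r in range(4) for c in range(COLS)]  # 4-sim-wide strip
rng = random.Random(2008)                               # fixed seed
scattered = rng.sample(sorted(grid), len(band))         # same sample size

print(f"whole grid: {true_rate:.2f}")
print(f"band:       {bot_rate(band):.2f}")
print(f"scattered:  {bot_rate(scattered):.2f}")
```

Whether the real grid is anywhere near this clumpy is exactly the open question; a second band through different sims would be one cheap way to check.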
|
|
Curtis Dresler
Registered User
Join date: 6 Apr 2008
Posts: 155
|
11-17-2008 10:52
From: Kidd Krasner I haven't followed the links to see if the data posted actually qualifies as a complete set of raw data, but presumably that would be easy enough to do for future surveys. As for the comparison, what was the first set of data compared against? Obviously there had to be a first, with no previous data to compare. You didn't throw it out, you waited until more data was collected.
You can't expect methodologies that are used in established survey areas to apply exactly in areas where new ground is being broken.
... With all due respect, I think there is a complete break between that and what constitutes a real study (comment: the OP never claimed it to be a scientific study AFAIK, and only went as far as using it to extrapolate it to the grid, which is entirely their right). Having seen several people working on their master's or doctorates keep coming back after discussing their survey structure with the consulting prof, the groundwork issue IS HUGE. You don't just wander off, collect raw data and call it a survey. You don't do it in business (one of the major steps in consulting projects is defining the context to which the project applies), you don't do it in lab work (efficacy for one purpose and context is useless for another), and you don't do it in real surveys. Absent that first necessary step, the data is no more than darts thrown against a board. And I never heard that either the raw data or the survey could be done poorly, just because it was the first. Again, I think we are comparing real surveys against one that was done casually, so I don't mean that to be overly harsh. That said, generally no one discussed the data or survey or project results without having read the prep work (which is why I never had an opinion at the meetings). You couldn't discuss the data or the results unless you knew the methodology, and that only makes sense (in most cases) within a context. Test results for a tendency for a certain type of cancer are different from the same test for a person having that cancer, as an example. So how many giving opinions here feel that they have enough information that they could go out and do exactly the same survey with a conformity (again, not uniformity) of results? If so, do you believe that there is enough definition and data records of the process that it could be used and compared against future studies? IMO, no, not exactly, and no. So also IMO, no progress.
It was probably a fun little thing to do and IMO was probably seen through to the end because it was producing results in line with what they expected. I still see it as nothing beyond another set of arguing points and it only shows how few common assumptions we have to start a marginally biased 'neutral' survey.
|
|
Briana Dawson
Attach to Mouth
Join date: 23 Sep 2003
Posts: 5,855
|
11-17-2008 10:57
Thanks again for your efforts Anya.
Keep up the good work.
|
|
Kidd Krasner
Registered User
Join date: 1 Jan 2007
Posts: 1,938
|
11-17-2008 11:01
From: Phil Deakins If people get together to gather more data, and decide on what is and isn't counted as a bot, I suggest leaving campers completely out of it because there is no way of knowing whether or not an av that doesn't respond is a bot. It could be an individual who has turned in for the night, or is watching TV, or is out for the evening, etc., or it could be a camping bot. I don't count individuals as bots, regardless of what they are doing.
I'd also suggest leaving mannequins and models out of it because they are functional bots.
I'd suggest simply counting those that are obviously bots - groups of avs in boxes, in lakes and places like that. I'd approach the problem differently. Leave out all mention of terms like "bot" and "camper". They're loaded terms that provoke debate. At least for bot, it's unlikely that there's a set of criteria that's 100% accurate based on data available to everyone. Instead, simply define categories by letter. Perhaps Category A is a group of one or more avs in an empty skybox or platform. Category B is the same av sitting on an object for over an hour with no open chat. Category C is an av in a club for over an hour with no open chat, but not sitting on an object. We might still argue over the significance of the categories, but at least there won't be any debate over what the numbers are measuring. These are just sample definitions. I'm sure there's a better way to define them.
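To show how mechanical such lettered categories could be made, here is a rough Python sketch of the three sample definitions. The record fields (in_empty_skybox, sitting, in_club, minutes_observed, chat_lines) are hypothetical names invented for illustration, and checking Category A first is an arbitrary choice, since the definitions above don't say what to do when an av fits more than one.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Observation:
    """One avatar as noted by an observer (all fields are hypothetical)."""
    in_empty_skybox: bool   # in an otherwise empty skybox or platform
    sitting: bool           # seated on an object
    in_club: bool
    minutes_observed: int
    chat_lines: int         # lines of open chat seen in that time

def categorize(obs: Observation) -> Optional[str]:
    """Return 'A', 'B' or 'C' per the sample definitions, else None."""
    if obs.in_empty_skybox:
        return "A"
    idle_hour = obs.minutes_observed > 60 and obs.chat_lines == 0
    if idle_hour and obs.sitting:
        return "B"
    if idle_hour and obs.in_club:
        return "C"
    return None

# Example: seated and silent for 90 minutes, not in a skybox.
print(categorize(Observation(False, True, False, 90, 0)))
```

Two observers using the same rules on the same sims should then report the same letters, whatever they think the letters mean.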
|
|
Argent Stonecutter
Emergency Mustelid
Join date: 20 Sep 2005
Posts: 20,263
|
11-17-2008 11:07
From: Phil Deakins You also haven't seen how the particular band of sims was selected - none of us have. Given the size of the continent, "a four-sim-wide path, coast to coast" doesn't leave much room for variation. If you want to do a quick check on the data, you could simply scan all 1-sim-wide chunks on the continent, look for a contiguous set that has about 600 avatars and 219 sims, and you know the band. From: someone It's not a statistically significant sample at all. You could say that the population of Arizona is a statistically significant sample of the U.S. population If the distribution of parcels across the mainland tended towards any kind of clumpiness that meant that a roughly rectangular collection of 200+ sims had a significantly different distribution of land use than any other, that would itself be interesting... because sims really don't have THAT many differences between them, like the difference in terrain and climate between New York and Arizona. The biggest discrepancies would be the FIC sims, the snow sims, and the coral sims around the edges of the north continent. Corsica is typical of most of the mainland, with a mixture of flat green and hilly green and stone sims, and coastline... and I don't think there would be any swath across Corsica that would be all that different from a similar swath across Nautilus or the Southern continent or most of the main continent.
|
|
Rhaorth Antonelli
Registered User
Join date: 15 Apr 2006
Posts: 7,425
|
11-17-2008 11:09
From: Lear Cale Rha, that's an argument for never trying to learn anything that isn't already known. Feel free to choose to remain ignorant, but don't attack a reasonable attempt to get some data.
These results may turn out to be skewed. At this point, we can't tell. But with more input, we can.
And you don't need a "sure fire" way. You only need a way with a reasonable level of confidence.
Have you ever actually seen a bot farm? They're pretty obvious. I am not remaining ignorant, because I choose to not do a survey that I know would not be valid and just and I sure as heck would not post it on a forum, without some proof to back it up. This is what gets me, people seem to be taking her word as gospel, that she can do no wrong, and that her numbers are fact, then attack those of us who feel the opposite that her numbers are flawed, and most likely the flaw is human judgment and error, and because we choose to not do the same thing it makes us ignorant... ok... if that is the way you see it, then so be it. I stand by my opinion, her 2 surveys or whatever they are called, are in my opinion, not worth the time or effort of putting them out there (even if the time and effort were to make up bogus numbers). I am not saying she made it up, but there is always that possibility, which is why I do not trust it, or put any emphasis on it *shrug* obviously others will disagree, and therefore we will be on opposite sides of the fence (for the record, I do not know the OP, never spoke to them in my recollection, and do not even recall seeing any posts of theirs before this bot survey thing) and thank you for insulting my intelligence asking if I have ever seen a bot farm... I have been in SL nearly 3 years, yes I have seen bot farms, I have also worked for a guy that set up massive camping dance things, which some were bots some were people, part of my job was to determine which were ppl and eject the bots... on a small scale (2 sims max 100 campers each sim) it was still not easy to determine without a doubt if it was a bot or person.. I did my best, and sometimes was in error... however I am not stupid... thank you
_____________________
From: someone Morpheus Linden: But then I change avs pretty often too, so often, I look nothing like my avatar.  They are taking away the forums... it could be worse, they could be taking away the forums AND Second Life...
|
|
Curtis Dresler
Registered User
Join date: 6 Apr 2008
Posts: 155
|
11-17-2008 11:18
From: Argent Stonecutter Given the size of the continent, "a four-sim-wide path, coast to coast" doesn't leave much room for variation. If you want to do a quick check on the data, you could simply scan all 1-sim-wide chunks on the continent, look for a contiguous set that has about 600 avatars and 219 sims, and you know the band.
If the distribution of parcels across the mainland tended towards any kind of clumpiness that meant that a roughly rectangular collection of 200+ sims had a significantly different distribution of land use than any other, that would itself be interesting... because sims really don't have THAT many differences between them, like the difference in terrain and climate between New York and Arizona. The biggest discrepancies would be the FIC sims, the snow sims, and the coral sims around the edges of the north continent. Corsica is typical of most of the mainland, with a mixture of flat green and hilly green and stone sims, and coastline... and I don't think there would be any swath across Corsica that would be all that different from a similar swath across Nautilus or the Southern continent or most of the main continent. You could be correct, but those are assumptions, not supported reasoning. There could be a significant clumpiness if all but eight sims were bordering open water or included disproportionately some of the features you mentioned. The age of the sims could create significant differences. We don't know and until we do, it is a potential source of error in survey and/or extrapolation to the whole and is nothing more than opinion, one way or the other. Phil says it does, you apparently think it doesn't, and I think unless we actually know, it doesn't make any difference, it's all just arguing points. Certainly that it is mainland and not including any theme structured islands means that it only holds well for the mainland (IMO).
|