I've filed a bug report about this, and I'm sure others have too over the past few years, but what I'm really curious about is why it works this way.
llVolumeDetect has problems that make it nearly useless for many applications, so a sensor ends up being used instead. The trade-offs:
A sensor will detect agents that are sitting, unlike llVD.
A sensor won't fire off all the events if an agent clicks on something, unlike llVD.
A sensor is also only a sphere and thus less flexible than VolumeDetect for sensing agents.
Sensors are also considered to be bad for performance.
Given the problems with server lag and script performance, why hasn't more effort gone into fixing VolumeDetect, or at least into explaining somewhere what the issues are?
To get llVD's functionality from a sensor, you need the sensor running at a decent rate plus at least one list, if not two, and then the sensor results have to be compared against the list, all to mimic collision_start and collision_end. Even then it won't "know" an agent is gone until the next sensor run. Setting the sensor to an interval I'm told is sim friendly means a long gap between updates, whereas VolumeDetect fires its event nearly instantly. A sensor also restricts you to one shape: a sphere or a section of a sphere. That isn't always suitable, because depending on your application you can end up NOT detecting agents inside the desired area while ALSO detecting ones outside it.
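For anyone who hasn't had to write it, here is roughly what that sensor workaround looks like. This is only a minimal sketch using a single list: RANGE, RATE, gPresent and update() are names and values I made up for illustration, and the llOwnerSay calls are stand-ins for whatever you would do in collision_start and collision_end.

```lsl
// Sketch: mimic collision_start / collision_end with a repeating sensor.
// RANGE and RATE are assumed values; tune them for your own build.
float RANGE = 10.0;   // metres (assumed)
float RATE  = 2.0;    // seconds between scans (assumed)

list gPresent;        // keys of the agents seen on the previous scan

// Compare the previous scan with the current one and report the differences.
update(list now)
{
    integer i;
    integer n = llGetListLength(gPresent);
    for (i = 0; i < n; ++i)
    {
        key id = llList2Key(gPresent, i);
        if (llListFindList(now, [id]) == -1)
            llOwnerSay("left: " + (string)id);      // stand-in for collision_end
    }
    n = llGetListLength(now);
    for (i = 0; i < n; ++i)
    {
        key id = llList2Key(now, i);
        if (llListFindList(gPresent, [id]) == -1)
            llOwnerSay("entered: " + (string)id);   // stand-in for collision_start
    }
    gPresent = now;
}

default
{
    state_entry()
    {
        // Full-sphere scan for agents, repeated every RATE seconds.
        llSensorRepeat("", NULL_KEY, AGENT, RANGE, PI, RATE);
    }

    sensor(integer num)
    {
        list now;
        integer i;
        for (i = 0; i < num; ++i)
            now += [llDetectedKey(i)];
        update(now);
    }

    no_sensor()
    {
        // Nobody in range, so everyone on the old list has left.
        update([]);
    }
}
```

Note that a departure is only reported on the scan after the agent leaves, so the delay can be as long as RATE seconds, and the detection area is still a sphere no matter what shape the prim is.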
EDIT: It seems VolumeDetect thinks the avatar has 'left' the volume when they click, and once the clicking is done, it detects them again.
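If anyone wants to watch this happen, a bare-bones logging script along these lines should do it. It is just a sketch of how I would test it, not a fix: drop it in a prim, stand inside the prim, click on something, and watch the messages and their timestamps.

```lsl
// Sketch: log VolumeDetect enter/leave events to the owner for testing.
default
{
    state_entry()
    {
        // Avatars can pass through the prim; overlaps raise collision events.
        llVolumeDetect(TRUE);
    }

    collision_start(integer num)
    {
        integer i;
        for (i = 0; i < num; ++i)
            llOwnerSay(llGetTimestamp() + " start: " + (string)llDetectedKey(i));
    }

    collision_end(integer num)
    {
        integer i;
        for (i = 0; i < num; ++i)
            llOwnerSay(llGetTimestamp() + " end: " + (string)llDetectedKey(i));
    }
}
```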