1.10.1 Released!
|
Major Senior
Registered User
Join date: 12 Apr 2006
Posts: 104
|
05-31-2006 21:05
Both backtraces show failures in the same object, though the path to get to said object is different in each. Can anyone out there with access to this code look at this? I will wager the compiler had some warnings on this object as well. No one added -w to the CFLAGS, I hope.
|
Major Senior
Registered User
Join date: 12 Apr 2006
Posts: 104
|
Starting to see a pattern here
05-31-2006 21:22
Yet another backtrace ..
#0 0x09634921 in LLSpatialGroup::updateInGroup ()
#1 0x0963dc8b in LLSpatialPartition::move ()
#2 0x087c9613 in LLDrawable::moveUpdatePipeline ()
#3 0x087c9d6c in LLDrawable::updateMoveUndamped ()
#4 0x087c975e in LLDrawable::updateMove ()
#5 0x09f1261a in LLPipeline::updateMove ()
#6 0x09f5802a in idle ()
#7 0x09f49705 in main_loop ()
#8 0x09f42b95 in main ()
And while I was at it I got bored and told gdb to disassemble this at the crash point:
Dump of assembler code for function _ZN14LLSpatialGroup13updateInGroupEP10LLDrawable:
0x09634900 <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+0>: push %ebp
0x09634901 <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+1>: mov %esp,%ebp
0x09634903 <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+3>: sub $0x68,%esp
0x09634906 <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+6>: mov %ebx,0xfffffff4(%ebp)
0x09634909 <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+9>: mov 0x8(%ebp),%ecx
0x0963490c <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+12>: mov 0xc(%ebp),%ebx
0x0963490f <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+15>: mov %esi,0xfffffff8(%ebp)
0x09634912 <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+18>: xor %esi,%esi
0x09634914 <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+20>: mov %edi,0xfffffffc(%ebp)
0x09634917 <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+23>: mov %ecx,0xffffffb4(%ebp)
0x0963491a <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+26>: mov 0x14(%ecx),%edx
0x0963491d <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+29>: test %edx,%edx
0x0963491f <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+31>: je 0x9634950 <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+80>
0x09634921 <_ZN14LLSpatialGroup13updateInGroupEP10LLDrawable+33>: mov (%edx),%esi
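Notice that it tests the pointer at 0x963491d, then jumps based on the test result. The pointer is not null, so execution falls through to 0x09634921, the crash address in frame #0, where the pointer is dereferenced. It looks like someone freed a pointer and did not set it to 0 when they were done. Shame shame.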
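To illustrate the point about the null test: the disassembly loads a pointer from offset 0x14 of the object, tests it against zero, and then dereferences it. A test like that only catches pointers that were reset to null; a pointer that was freed but never reset passes the test and is then dereferenced anyway. Below is a minimal C++ sketch of the "free it, then zero it" habit. Every name in it (`Group`, `Payload`, `mData`, `readOr`) is hypothetical and is not taken from the viewer source, which none of us can see.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these names come from the viewer source.
struct Payload {
    int value;
};

struct Group {
    Payload* mData = nullptr;

    Group() = default;
    Group(const Group&) = delete;             // raw owning pointer: forbid copies
    Group& operator=(const Group&) = delete;
    ~Group() { release(); }

    // Allocate (or replace) the owned payload.
    void attach(int v) {
        release();
        mData = new Payload{v};
    }

    // Free the payload AND reset the pointer, so a later null test means something.
    void release() {
        delete mData;      // deleting a null pointer is a no-op
        mData = nullptr;   // without this line mData would dangle
    }

    // Same shape as the disassembly: test the pointer, then dereference it.
    // The test protects against null only, never against a dangling pointer.
    int readOr(int fallback) const {
        if (mData == nullptr) {
            return fallback;
        }
        return mData->value;
    }
};
```

With `release()` resetting the pointer, `readOr(-1)` returns -1 after the payload is freed instead of reading freed memory. If the reset line were dropped, the same call would be undefined behaviour, which is one way to end up with a crash like the one in frame #0.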
|
Darkside Eldrich
Registered User
Join date: 10 Feb 2006
Posts: 200
|
05-31-2006 22:02
Man, Major, you have a lot of free time.  Kudos. (Granted, I'm currently posting to a forum about a game I could be playing, but... shut up!) Hopefully, a Linden will read this thread and see those.
|
Darkside Eldrich
Registered User
Join date: 10 Feb 2006
Posts: 200
|
05-31-2006 22:18
Strange, gdb reports program exited normally, and I get the remove_marker_file() call in the program output. So it's not throwing a system-level exception, unless I'm doing something wrong. I'll experiment more.
|
Major Senior
Registered User
Join date: 12 Apr 2006
Posts: 104
|
Free time?
05-31-2006 22:18
Not a lot else to do all things considered, so plenty of free time. On the other hand, I am rather curious as to why the problem is so easy to trigger near a sim boundary and quite a bit more difficult when not near one. That is, if you are in the middle of a sim with your network bandwidth set really low, it is almost impossible to trip this thing up, though it still occasionally happens. But often just having objects across a sim boundary rez while looking at it triggers it.

I have also learned some neat things though. The viewer is definitely multithreaded: 5 threads, to be exact. That still leaves the possibility of a mutex problem being a bit of a culprit, though the invalid variable being handed off to the method is still pretty relevant. At the same time, the variable could be a bunk lock pointer. And if my memory of the SUSv3 specs serves me correctly, signals are always delivered to the parent thread. So it isn't any wonder that the gdb backtrace is always pointing at the first thread as the culprit. Debugging multithreaded code is a total pain sometimes. *bleh* I suppose it might be useful to see the state of the other threads at this point and see if there are any more similarities escaping view.

I really wish I knew what the chunk of memory being pointed at actually was, honestly, and why it was being tested. Is it testing to see if the object is locked (yah .. storing the lock inside the object would be .. really funny, if the object was deleted), or is it testing to see if the pointer to some object/structure/whatever is even valid? Having any sort of idea gives a bit more clarity as to what would likely cause it to be invalid in the first place, and what sort of watches to set up and where.
|
Major Senior
Registered User
Join date: 12 Apr 2006
Posts: 104
|
remove marker file
05-31-2006 22:23
Yah, I was looking at those earlier before I decided to follow Merrick's lead and whip out gdb. It appeared to me from the logs that occasionally some of the "crashes" looked like they originated from the server as a circuit disconnect order. I haven't a clue what that entails, or why it would happen, but was pretty curious about it as they always ended in the remove marker file log message.
|
Darkside Eldrich
Registered User
Join date: 10 Feb 2006
Posts: 200
|
05-31-2006 22:27
Okay, I figured out why I crashed without the segfault. I was copying over MALLOC_CHECK_=0 without really thinking about it. It crashes "elegantly" with that, and without it I get the same backtrace as you, with the same disassembly dump. So, same problem all around, just to confirm your findings.
|
Major Senior
Registered User
Join date: 12 Apr 2006
Posts: 104
|
05-31-2006 22:32
Well that takes some of the fun out of it .. kinda hoping there was something new to play with. 
|
Merrick Moose
Registered User
Join date: 20 Oct 2005
Posts: 191
|
05-31-2006 23:20
From: Major Senior
but was pretty curious about it as they always ended in the remove marker file log message.

That is just a marker to tell if SL is currently running; if it crashes badly, it also leaves that behind. It's a 0 byte file in the secondlife/log directory. Harmless file, not a bug or problem.
|
Major Senior
Registered User
Join date: 12 Apr 2006
Posts: 104
|
05-31-2006 23:24
From: Merrick Moose
That is just a marker to tell if SL is currently running, also if it crashes badly it leaves that. It's a 0 byte file in the secondlife/log directory. Harmless file, not a bug or problem.

Yah, I wasn't thinking the function was the bug, but more wondering why I only saw the log during certain forms of crashes/shutdowns. I suspected that SL is trapping various signals and that this was the end of the shutdown sequence.
|
Merrick Moose
Registered User
Join date: 20 Oct 2005
Posts: 191
|
05-31-2006 23:52
From: Major Senior
Yah, I wasn't thinking the function was the bug, but more wondering why I only saw the log during certain forms of crashes/shutdowns. I suspected that SL is trapping various signals and that was the end of the shutdown sequence.

Anything less than a forced/OS level kill should result in the removal of that file. It might possibly have something to do with crash reporting or such. Don't know for sure, but I have seen similar file locking measures done in other programs.
|
Major Senior
Registered User
Join date: 12 Apr 2006
Posts: 104
|
06-01-2006 00:05
From: Merrick Moose
Anything less than a forced/OS level kill should result in the removal of that file. Might possibly have something to do with crash reporting or such. Don't know for sure, but I have seen similar file locking measures done in other programs.

Yah .. presuming sanity still functions in the universe, and the cleanup routine is only in the parent thread, then unless it receives a KILL it should always be run, as the only other signal that can't be caught is STOP .. and that doesn't exactly terminate execution .. Hmm....
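The point about KILL and STOP is easy to check on a POSIX system: `sigaction` refuses to install a handler for SIGKILL or SIGSTOP, while it accepts one for SIGTERM. Here is a minimal C++ sketch, assuming Linux or another POSIX platform. The flag and function names (`g_shutdown_requested`, `on_terminate`, `install_handler`) are made up for illustration and say nothing about how the viewer actually handles its marker file.

```cpp
#include <cassert>
#include <csignal>
#include <signal.h>

// Hypothetical flag that a main loop could poll to run its normal cleanup path
// (for example, removing a marker file) before exiting.
static volatile std::sig_atomic_t g_shutdown_requested = 0;

extern "C" void on_terminate(int) {
    g_shutdown_requested = 1;   // keep handlers to async-signal-safe work only
}

// Try to install the handler for signo; returns true if the system accepted it.
// POSIX does not allow SIGKILL or SIGSTOP to be caught, so those return false.
bool install_handler(int signo) {
    struct sigaction sa {};
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, nullptr) == 0;
}
```

Calling `install_handler(SIGTERM)` succeeds, and a later SIGTERM only sets the flag, so the cleanup still runs. `install_handler(SIGKILL)` and `install_handler(SIGSTOP)` fail, which is why a KILL is the one termination that can leave the marker file behind.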
|
Hello Toonie
Registered User
Join date: 25 Jul 2005
Posts: 212
|
06-01-2006 01:02
From: Darkside Eldrich
I already offered to do it freelance, and I was more than half-serious. It's definitely a possibility I think LL should consider.

Yeah, you're not the only one. I didn't hear a whisper of a reply though.
|
Hello Toonie
Registered User
Join date: 25 Jul 2005
Posts: 212
|
06-01-2006 01:07
From: Drake Bacon
Brent, we're more of a sysadmin mentality

Looking at some of the posts around here lately, do you specifically mean passive-aggressive? XD
|
Major Senior
Registered User
Join date: 12 Apr 2006
Posts: 104
|
LLSpatialGroup
06-01-2006 08:51
So .. yah .. I am at the point now where I only run SL under gdb, hoping to run across a new backtrace. So far all roads lead to LLSpatialGroup and what appears to be a pointer pointing at bunk memory.
|
Zepp Zaftig
Unregistered Abuser
Join date: 20 Mar 2005
Posts: 470
|
06-01-2006 11:15
Backtrace from crash.
This part gets repeated about 20000 times:
#20126 0x0964b8d0 in LLOctreeBranch<LLDrawable, 16, 128, 1>::insert ()
#20127 0x09647dc6 in LLTreeNode<LLDrawable>::insert ()

#20126 0x0964b8d0 in LLOctreeBranch<LLDrawable, 16, 128, 1>::insert ()
#20127 0x09647dc6 in LLTreeNode<LLDrawable>::insert ()
#20128 0x0964893f in LLOctreeRoot<LLDrawable, 16, 128, 1>::insert ()
#20129 0x09647dc6 in LLTreeNode<LLDrawable>::insert ()
#20130 0x0963d968 in LLSpatialPartition::put ()
#20131 0x09f10945 in LLPipeline::addObject ()
#20132 0x09b1d6aa in LLViewerObjectList::processUpdateCore ()
#20133 0x09b1db6f in LLViewerObjectList::processObjectUpdate ()
#20134 0x09b1ee95 in LLViewerObjectList::processCompressedObjectUpdate ()
|
Major Senior
Registered User
Join date: 12 Apr 2006
Posts: 104
|
06-01-2006 11:42
From: Zepp Zaftig
Backtrace from crash.
This part gets repeated about 20000 times:
#20126 0x0964b8d0 in LLOctreeBranch<LLDrawable, 16, 128, 1>::insert ()
#20127 0x09647dc6 in LLTreeNode<LLDrawable>::insert ()

Yah .. something different. Do you have any of the corrective settings enabled in the settings.ini?
|
Zepp Zaftig
Unregistered Abuser
Join date: 20 Mar 2005
Posts: 470
|
06-01-2006 15:08
From: Major Senior
Yah .. something different. You have any of the corrective settings enabled in the settings.ini?

No, this has only happened once though. Most of the times I've tried to run it, I've just gotten the same LLSpatialGroup crash that everyone else seems to get.
|
Rizzermon Sopor
Registered User
Join date: 15 Mar 2006
Posts: 43
|
06-01-2006 20:06
Well, I have Debug Permissions set to TRUE, Use Occlusion set to FALSE, and Render Far Clip set to 4 and, um, well, I'm not crashing, but then I can't see much either. I even think I have bumped into at least one person when walking around the Public Sandbox, so I decided to just wander around Creative Commons land on Kula. It's so sparse there that I never crash and can even set my Render Far Clip to 64. Of course, I'm not saying this is great performance, as it's not; it's very frustrating. My experience with the Wine setup is half and half. So I now mainly visit SL to see if I have any IMs waiting; otherwise there is no reason for me to log in. Too much frustration.

That said, I know this is an alpha client, and I have used alpha software before and it too was frustrating. So I'm not complaining, just stating what the current state of things is for me, and it seems to be similar for most linux client users. I am patient with the progress here, as I have seen new software take a long time, as in almost a year, to become reasonably usable (as well as going through smooth and rocky performance issues). The group here seems to be great at reporting bugs and also very interested in supporting, and even willing to work on developing and debugging, the linux client. Way to go everyone!

Thinking out loud here, there is an open source application in its infancy called Solipsis that was attempting to lay the groundwork for a decentralized chat world that could eventually include 3d interaction. I don't recommend it, as it is not there yet or even close, and development seems pretty dead, but it sort of illustrates my view: if SL doesn't work out, people have a lot of ingenuity. People find a way. While SL is neat, and I imagine the linux client will improve in time, I'm also not so taken by it that I cannot walk away and go elsewhere if it doesn't suit me.
Maybe the problems linux client users have point the way to cracks in using the closed source model long term for Second Life. Or maybe not, just throwing ideas around.
|
Jinsar Eponym
Registered User
Join date: 13 Feb 2006
Posts: 127
|
06-01-2006 21:20
From: Drake Bacon
Same here. It's definitely the Occlusion features crashing the client.
Put this in settings.ini (use a tab to separate them, very important! Midnight Commander's editor will work):
UseOcclusion	FALSE
DebugPermissions	TRUE

I was getting crashes and made this change, works great! So far so good *fingers crossed*
|
Major Senior
Registered User
Join date: 12 Apr 2006
Posts: 104
|
Managed to reproduce this one..
06-01-2006 21:31
From: Zepp Zaftig
Backtrace from crash.
This part gets repeated about 20000 times:
#20126 0x0964b8d0 in LLOctreeBranch<LLDrawable, 16, 128, 1>::insert ()
#20127 0x09647dc6 in LLTreeNode<LLDrawable>::insert ()

#20126 0x0964b8d0 in LLOctreeBranch<LLDrawable, 16, 128, 1>::insert ()
#20127 0x09647dc6 in LLTreeNode<LLDrawable>::insert ()
#20128 0x0964893f in LLOctreeRoot<LLDrawable, 16, 128, 1>::insert ()
#20129 0x09647dc6 in LLTreeNode<LLDrawable>::insert ()
#20130 0x0963d968 in LLSpatialPartition::put ()
#20131 0x09f10945 in LLPipeline::addObject ()
#20132 0x09b1d6aa in LLViewerObjectList::processUpdateCore ()
#20133 0x09b1db6f in LLViewerObjectList::processObjectUpdate ()
#20134 0x09b1ee95 in LLViewerObjectList::processCompressedObjectUpdate ()

This took a while, and a lot of crashes, before I found something like it. For sheer sanity I have taken the liberty of cropping the first 20000 entries in the gdb backtrace and left only these lower ones. I am not even certain how feasible it is to disassemble this one. It is apparently an infinite recursion in the Octree code. I wonder if this is the random one that plagues the other clients as well.
#20126 0x0964b8d0 in LLOctreeBranch<LLDrawable, 16, 128, 1>::insert ()
#20127 0x09647dc6 in LLTreeNode<LLDrawable>::insert ()
#20128 0x0964893f in LLOctreeRoot<LLDrawable, 16, 128, 1>::insert ()
#20129 0x09647dc6 in LLTreeNode<LLDrawable>::insert ()
#20130 0x0964b8d0 in LLOctreeBranch<LLDrawable, 16, 128, 1>::insert ()
#20131 0x09647dc6 in LLTreeNode<LLDrawable>::insert ()
#20132 0x0964b8d0 in LLOctreeBranch<LLDrawable, 16, 128, 1>::insert ()
#20133 0x09647dc6 in LLTreeNode<LLDrawable>::insert ()
#20134 0x0964893f in LLOctreeRoot<LLDrawable, 16, 128, 1>::insert ()
#20135 0x09647dc6 in LLTreeNode<LLDrawable>::insert ()
#20136 0x0963d968 in LLSpatialPartition::put ()
#20137 0x09f10945 in LLPipeline::addObject ()
#20138 0x09b1d6aa in LLViewerObjectList::processUpdateCore ()
#20139 0x09b1db6f in LLViewerObjectList::processObjectUpdate ()
#20140 0x09b1ee95 in LLViewerObjectList::processCompressedObjectUpdate ()
#20141 0x09ab6e12 in process_compressed_object_update ()
#20142 0x082fa8cc in LLMessageSystem::decodeData ()
#20143 0x082e4e28 in LLMessageSystem::checkMessages ()
#20144 0x09f5603d in idle_network ()
#20145 0x09f57329 in idle ()
#20146 0x09f49705 in main_loop ()
#20147 0x09f42b95 in main ()
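For anyone wondering how a tree insert ends up 20000 frames deep: one common way (and only a guess here, since the backtrace alone cannot confirm it) is a node that keeps subdividing to separate its contents when two objects can never be separated, for example because they sit at the same position. The C++ sketch below uses a one-dimensional stand-in for an octree and stops subdividing below a minimum node size. The class name `Node`, the capacity of 1, and the minimum size of 1/128 are all assumptions chosen for illustration, not values read out of the viewer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// One-dimensional stand-in for an octree node covering [lo, hi).
// All names and constants are hypothetical.
class Node {
public:
    Node(double lo, double hi) : mLo(lo), mHi(hi) {}

    // Inserts x and returns the depth at which it was finally stored.
    int insert(double x, int depth = 0) {
        if (isLeaf()) {
            // Store here if there is room, OR if the node is already too small
            // to split. Without the size check, two identical points would make
            // this function recurse until the stack runs out.
            if (mPoints.size() < kCapacity || (mHi - mLo) <= kMinSize) {
                mPoints.push_back(x);
                return depth;
            }
            split();
        }
        return childFor(x).insert(x, depth + 1);
    }

    // Total number of points stored in this subtree.
    std::size_t count() const {
        std::size_t n = mPoints.size();
        if (!isLeaf()) {
            n += mLeft->count() + mRight->count();
        }
        return n;
    }

private:
    static constexpr std::size_t kCapacity = 1;
    static constexpr double kMinSize = 1.0 / 128.0;

    bool isLeaf() const { return !mLeft; }
    double mid() const { return 0.5 * (mLo + mHi); }
    Node& childFor(double x) { return x < mid() ? *mLeft : *mRight; }

    // Create two children and push the points held here down into them.
    void split() {
        mLeft = std::make_unique<Node>(mLo, mid());
        mRight = std::make_unique<Node>(mid(), mHi);
        std::vector<double> old;
        old.swap(mPoints);
        for (double p : old) {
            childFor(p).insert(p);
        }
    }

    double mLo;
    double mHi;
    std::vector<double> mPoints;
    std::unique_ptr<Node> mLeft;
    std::unique_ptr<Node> mRight;
};
```

Inserting the same coordinate twice into `Node root(0.0, 16.0)` returns after a bounded number of levels because of the `kMinSize` check. Take that check out and the second insert never returns, which would look a lot like the alternating insert frames above.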
|