Second Life Forums Archive - glibc double free or memory corruption

KittyFox Mistral

Registered User

Join date: 17 Oct 2005

Posts: 51

02-26-2006 23:34

Some of you know this error quite well, and now I've recently started getting it too. It's odd in this case, though, because I've never heard of anyone playing fine since the day after the first alpha release and then, one day, having it crash every time. I haven't updated anything on my system between now and the last time I played. I didn't tweak any SL settings or any system settings.
Using gdb, I get this backtrace:

CODE

(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb76dd001 in raise () from /lib/tls/libc.so.6
#2  0xb76de71d in abort () from /lib/tls/libc.so.6
#3  0xb770fed7 in __fsetlocking () from /lib/tls/libc.so.6
#4  0xb7715bf7 in malloc_usable_size () from /lib/tls/libc.so.6
#5  0xb7716639 in free () from /lib/tls/libc.so.6
#6  0xb77073f9 in fclose () from /lib/tls/libc.so.6
#7  0x09dd5509 in main ()

Using Insight's handy disassembler window, I can see that the call it's crashing on is immediately preceded by a call to LLMD5::raw_digest.

CODE

-	0x9dd54ea	<main+11546>:		mov    $0xa3c65f0,%edx
-	0x9dd54ef	<main+11551>:		lea    0xfffffcd8(%ebp),%edi
-	0x9dd54f5	<main+11557>:		mov    %edx,0x4(%esp)
-	0x9dd54f9	<main+11561>:		mov    %edi,(%esp)
-	0x9dd54fc	<main+11564>:		call   0x819fc30 <_ZN5LLMD510raw_digestEPh>
-	0x9dd5501	<main+11569>:		mov    %ebx,(%esp)
-	0x9dd5504	<main+11572>:		call   0x8053894 <_ZNSiD0Ev+352>

That last call is where it crashes in main(). According to c++filt, _ZNSiD0Ev is:
std::basic_istream<char, std::char_traits<char> >::~basic_istream().
Attempts to get any more information have been fruitless so far.

One thing I'm curious about though, is the program dies here:

CODE

2006-02-27T06:07:27Z INFO: GL_VENDOR      NVIDIA Corporation
2006-02-27T06:07:27Z GL_RENDERER    GeForce FX 5500/AGP/3DNOW!
2006-02-27T06:07:27Z GL_VERSION     2.0.1 NVIDIA 81.78
2006-02-27T06:07:27Z
*** glibc detected *** double free or corruption (!prev): 0x0af88b38 ***

It looks like it's about to print another line when it crashes, and I'm interested in what's supposed to print after GL_VERSION, since that could help pinpoint where the problem is.

In the meantime, I can always hope they get this fixed for the next version, whenever that is. I know I can set MALLOC_CHECK_=0 to disable this check (I explicitly removed that setting from the secondlife startup script), but memory corruption is not something I want to bypass on a whim, since it can be indicative of a bigger problem.

KittyFox Mistral

Registered User

Join date: 17 Oct 2005

Posts: 51

03-02-2006 20:23

Well, both 1.8.4.6 and 1.8.4.7 worked without needing MALLOC_CHECK_=0. Perhaps the bug has been fixed? Or perhaps it's just gone back into hiding. That's another reason the "fix" shouldn't have been officially implemented: now we can't know when the bug is actually fixed.

If some kind souls that used to have that problem could try running SL without the memory-check-disable hack, it'd be great.

More to the point of the thread, though, here's the portion after the GL_VERSION string.

CODE

2006-03-03T04:09:09Z
2006-03-03T04:09:09Z INFO: Viewer Digest: 00000000-0000-0000-0000-000000000000
2006-03-03T04:09:09Z INFO: Couldn't open pilot.txt, aborting agentpilot load!

Interesting that the first line is empty after the timestamp, and that the digest comes out to all 0's. As well, that line about being unable to open pilot.txt is a point worth checking, since the glibc corruption crash occurs in fclose. Perhaps SL is trying to close a file handle that didn't properly open? If people still have this glibc corruption crash, what happens if you make an empty pilot.txt file and try to run SL?

Mack Echegaray

Registered Snoozer

Join date: 15 Dec 2005

Posts: 145

03-03-2006 09:41

Heap arena corruption is a pain to track down. Remember that the point at which it crashes often has no relation to the point at which the corruption occurred. Tools like valgrind can help track such issues down, but they are fairly worthless without access to the source code.

Hello Toonie

Registered User

Join date: 25 Jul 2005

Posts: 212

03-03-2006 11:47

From: Mack Echegaray

Heap arena corruption is a pain to track down. Remember that the point at which it crashes often has no relation to the point at which the corruption occurred.

Ah, but the whole point of MALLOC_CHECK_ is to force warnings or aborts at the point where the error occurred, instead of blindly corrupting the heap (with the mild exception of the app scribbling over its own pointers, where the best glibc can say is 'this isn't a malloc()d pointer' when you later try to free()/realloc() it).

MALLOC_CHECK_=0 removes this sanity completely and you're back in potential mysterious heap corruption world. Distros are defaulting to MALLOC_CHECK_=2 which causes abort()s. IMHO the better short-term workaround from Linden Lab would be to use MALLOC_CHECK_=1 which still does sanity-checking to protect the heap manager from getting trashed but prints a warning instead of abort()ing. (Interestingly it doesn't seem to notice anything wrong lately here anyway, so the bug may have gone back into hiding.)
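For anyone who wants to try the MALLOC_CHECK_=1 idea from a wrapper instead of editing the startup script, here is a minimal sketch in C. The function names are made up for illustration and this is not how the Linden Lab script does it; it only assumes a POSIX system with setenv() and execv(). The default is applied only when the user has not already chosen a value, so an explicit MALLOC_CHECK_=2 (or 0) still wins.

```c
#include <stdlib.h>   /* setenv */
#include <unistd.h>   /* execv */

/* Hypothetical helper: set MALLOC_CHECK_ to `level` unless the user has
   already set it. Returns 0 on success, -1 if setenv() fails. */
int default_malloc_check(const char *level)
{
    /* Third argument 0 means "do not overwrite an existing value". */
    return setenv("MALLOC_CHECK_", level, 0);
}

/* Hypothetical launcher: apply the default, then replace this process
   with the program named by argv[0]. Returns only if something failed. */
int launch_with_malloc_check(char *const argv[])
{
    if (default_malloc_check("1") != 0) {
        return -1;
    }
    execv(argv[0], argv);   /* does not return on success */
    return -1;
}
```

With level 1 the program keeps running after glibc prints its diagnostic, so the warning shows up in the log near the offending free() without killing the session.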

Hello Toonie

Registered User

Join date: 25 Jul 2005

Posts: 212

03-03-2006 11:52

From: KittyFox Mistral

As well, that line about being unable to open pilot.txt is a point worth checking, since the glibc corruption crash occurs in fclose.

I've seen fclose() cause a heap error when a file was closed twice or (less likely) an invalid file handle was closed. (That, too, was in a codebase ported from win32!)
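To make the double-close case concrete, here is a small defensive sketch in C. It is not taken from the SL source (nobody in this thread has seen it); close_once and load_optional_file are hypothetical names. The idea is simply to never call fclose() on a handle that failed to open, and to clear the pointer after closing so that a second close becomes a no-op instead of a second free() of the same FILE.

```c
#include <stdio.h>

/* Hypothetical helper: close *fp at most once and clear the caller's
   pointer. Returns 0 if there was nothing to close or the close
   succeeded, EOF if fclose() reported an error. */
int close_once(FILE **fp)
{
    if (fp == NULL || *fp == NULL) {
        return 0;               /* never opened, or already closed */
    }
    int rc = fclose(*fp);
    *fp = NULL;                 /* a repeated call is now harmless */
    return rc;
}

/* Hypothetical loader for an optional file such as pilot.txt.
   Returns the number of bytes read, or -1 if the file could not be opened. */
long load_optional_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;              /* nothing was opened, so nothing to close */
    }

    long count = 0;
    while (fgetc(fp) != EOF) {
        count++;
    }

    close_once(&fp);
    close_once(&fp);            /* deliberately repeated: safe, fp is NULL */
    return count;
}
```

If the crash really is a stray fclose() on an error path, this is the sort of guard that would hide the symptom, which is also why an empty pilot.txt making the crash go away would be a useful data point.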
