segmentation fault when calling pkg_malloc

segmentation fault when calling pkg_malloc

Chevio
Hi, All,

I am getting a segmentation fault when calling pkg_malloc from a custom module in opensips 1.4.4 notls.

Here is the offending code:

int tncgw_ip_lookup(struct sip_msg* msg, char* _para1, char* _para2)
{

        str  _user_ip;
        db_res_t* db_res = NULL;
        char* strsql=NULL;
        char* to_prefix=NULL;
        char* new_uri=NULL;
        char* new_to=NULL;
        str strsqlstr;
        str techprefix;
        str to;
        int retval=-1;
        int dbg=1;

        if(dbg)LM_INFO("TNCGW ip_address_lookup 2009.03.11.a beta\n");

        techprefix.s=NULL;
        _user_ip.s=NULL;

        if(dbg) LM_INFO("---Memory allocation\n");

        strsql=pkg_malloc(100);
        new_to=pkg_malloc(MAX_URI_SIZE); /* this is line 97 */
        new_uri=pkg_malloc(MAX_URI_SIZE);
        to_prefix=pkg_malloc(8);


And here is the backtrace:

Program terminated with signal 11, Segmentation fault.
[New process 32735]
#0  fm_malloc (qm=0x8183b00, size=<value optimized out>) at mem/f_malloc.c:125
125                             if (frag->size <= (*f)->size) break;
(gdb) bt
#0  fm_malloc (qm=0x8183b00, size=<value optimized out>) at mem/f_malloc.c:125
#1  0xb7ab0873 in tncgw_ip_lookup (msg=0x8196ed8, _para1=0x0, _para2=0x0) at tncgw.c:97
#2  0x08055275 in do_action (a=0x818d278, msg=0x8196ed8) at action.c:845
#3  0x08054172 in run_action_list (a=0x818d278, msg=0x8196ed8) at action.c:138
#4  0x080a0516 in eval_expr (e=0x818d2e0, msg=0x8196ed8, val=0xbf8d4ef8) at route.c:1133
#5  0x08053e2f in do_assign (msg=0x8196ed8, a=0x818d308) at action.c:207
#6  0x080549f5 in do_action (a=0x818d308, msg=0x8196ed8) at action.c:951
#7  0x08054172 in run_action_list (a=0x818d308, msg=0x8196ed8) at action.c:138
#8  0x08056845 in do_action (a=0x818d8c0, msg=0x8196ed8) at action.c:717
#9  0x08054172 in run_action_list (a=0x8189fd0, msg=0x8196ed8) at action.c:138
#10 0x080577f4 in run_top_route (a=0x8189fd0, msg=0x8196ed8) at action.c:118

Does this mean I ran out of memory? If that is the case, pkg_malloc should return 0 instead of crashing.

I would appreciate any help.

Chevio

Re: segmentation fault when calling pkg_malloc

Sergio Gutierrez

Hello Chevio.

Just to be sure, perform the corresponding casts when assigning. For example:

...

       strsql=(char*)pkg_malloc(100);

After calling pkg_malloc, you should evaluate the return value; it returns NULL if memory allocation could not be performed. That would be a way to determine whether you exhausted memory.
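
A minimal sketch of that check, for illustration only. The pkg_malloc/pkg_free macros and the MAX_URI_SIZE value below are stand-ins (assumptions) so the snippet builds outside the OpenSIPS tree; in a module they come from the OpenSIPS headers. The lookup_bufs structure and both helper functions are hypothetical names, not part of the original module.

```c
#include <stdlib.h>

/* Assumption: stand-ins so the sketch builds outside OpenSIPS. */
#define pkg_malloc(sz) malloc(sz)
#define pkg_free(p)    free(p)
#define MAX_URI_SIZE   1024   /* assumed value, for illustration only */

/* Hypothetical holder for the four buffers the function needs. */
struct lookup_bufs {
    char *strsql;
    char *new_to;
    char *new_uri;
    char *to_prefix;
};

/* Releases whatever was allocated. Safe on a partially filled
 * structure because members that were never set are NULL. */
static void lookup_bufs_free(struct lookup_bufs *b)
{
    if (b->strsql)    pkg_free(b->strsql);
    if (b->new_to)    pkg_free(b->new_to);
    if (b->new_uri)   pkg_free(b->new_uri);
    if (b->to_prefix) pkg_free(b->to_prefix);
    b->strsql = b->new_to = b->new_uri = b->to_prefix = NULL;
}

/* Returns 0 on success, -1 if any allocation returned NULL. */
static int lookup_bufs_alloc(struct lookup_bufs *b)
{
    b->strsql    = (char *)pkg_malloc(100);
    b->new_to    = (char *)pkg_malloc(MAX_URI_SIZE);
    b->new_uri   = (char *)pkg_malloc(MAX_URI_SIZE);
    b->to_prefix = (char *)pkg_malloc(8);

    if (!b->strsql || !b->new_to || !b->new_uri || !b->to_prefix) {
        lookup_bufs_free(b);   /* undo the allocations that did succeed */
        return -1;
    }
    return 0;
}
```

With this shape, tncgw_ip_lookup would call lookup_bufs_alloc() once and return an error if it fails. Note that this only reports exhaustion; it cannot help with a crash that happens inside the allocator itself.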

Best regards.

Sergio.


--
Sergio Gutiérrez

_______________________________________________
Users mailing list
[hidden email]
http://lists.opensips.org/cgi-bin/mailman/listinfo/users

Re: segmentation fault when calling pkg_malloc

Iñaki Baz Castillo
On Wednesday, 11 March 2009, Sergio Gutierrez wrote:
> After calling pkg_malloc, you should evaluate the return value; it returns
> NULL if memory allocation could not be performed. That would be a way to
> determine whether you exhausted memory.

True, but I still see no reason for a segmentation fault even if pkg_malloc
did not succeed. Note that the segmentation fault occurs at line 97
(where 'pkg_malloc' is invoked), not when trying to use the result later
(which would make sense).

Regards.

--
Iñaki Baz Castillo


Re: segmentation fault when calling pkg_malloc

Sergio Gutierrez

Hi Iñaki.

You are right. I do not see the reason for the crash either; that is why I said "just to be sure" (I have seen strange things in compilers) :-)

Chevio, please, could you print the value of size at frame 0?

Regards.

Sergio


--
Sergio Gutiérrez


Re: segmentation fault when calling pkg_malloc

Vasil Kolev
In reply to this post by Chevio
At 14:10 -0700 on 11.03.2009 (Wed), Chevio wrote:

> does it mean I ran out of memory? if that is the case pkg_malloc should
> return a 0 instead of crashing.


This doesn't look like running out of memory, more like memory
corruption. The first thing to check is whether either 'frag' or 'f' is
NULL or invalid (e.g. in gdb do "print f" and "print frag" and see what
they show). After that, try dereferencing them and working out how they
could have ended up with these values.
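
To make "invalid" concrete, here is a rough sketch of the kind of sanity check one applies by eye in gdb: is the pointer non-NULL, aligned, and inside the pool? The struct frag layout and both function names are hypothetical simplifications, not the real definitions from mem/f_malloc.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified fragment header. The real structure
 * differs; only the idea of a linked free list matters here. */
struct frag {
    size_t size;
    struct frag *next_free;
};

/* Returns 1 if p could point at a fragment header inside the pool
 * [pool, pool + len), 0 if it is NULL, misaligned or out of range. */
static int frag_ptr_plausible(const void *p, const void *pool, size_t len)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)pool;

    if (p == NULL || len < sizeof(struct frag))
        return 0;
    if (addr % sizeof(void *) != 0)
        return 0;
    if (addr < base || addr - base > len - sizeof(struct frag))
        return 0;
    return 1;
}

/* Walks a free list without dereferencing implausible links.
 * Returns the number of fragments visited, or -1 on a bad link
 * (including a list longer than the pool could possibly hold). */
static int walk_free_list(const struct frag *head, const void *pool, size_t len)
{
    size_t max_frags = len / sizeof(struct frag);
    size_t n = 0;
    const struct frag *f;

    for (f = head; f != NULL; f = f->next_free) {
        if (n >= max_frags || !frag_ptr_plausible(f, pool, len))
            return -1;
        n++;
    }
    return (int)n;
}
```

A link that fails this test was almost certainly written by something other than the allocator, which points at corruption rather than exhaustion.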

If this isn't reproducible every time, it might also mean a race
condition, that something else is fucking up the situation, in which
case _probably_ valgrind can help (although I never had a lot of luck
using it on opensips).

--
Regards,
Vasil Kolev
Attractel NV
dCAP #1324, LPIC2



Re: segmentation fault when calling pkg_malloc

Bogdan-Andrei Iancu
Hi Chevio, Hi Kolev,

What Kolev says is true: most probably you have a memory overwrite
somewhere and you are corrupting the data structures of the memory
manager. It cannot be a race, as the pkg memory is per process, so it is
not shared.
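
A minimal sketch of how such an overwrite can be caught, assuming a debug wrapper that stores the requested size in front of each buffer and a guard pattern right after it. This is a generic technique shown on top of the system malloc; the dbg_* names, the layout and the guard bytes are all hypothetical, and this is not how the OpenSIPS allocators are actually implemented.

```c
#include <stdlib.h>
#include <string.h>

#define HDR_LEN   16   /* room for the stored size, keeps user data aligned */
#define GUARD_LEN 8

static const unsigned char GUARD[GUARD_LEN] = {
    0xDE, 0xAD, 0xBE, 0xEF, 0xDE, 0xAD, 0xBE, 0xEF
};

/* Block layout: [header: size][user bytes][guard pattern] */
static void *dbg_alloc(size_t size)
{
    unsigned char *raw;

    if (size > (size_t)-1 - HDR_LEN - GUARD_LEN)
        return NULL;
    raw = malloc(HDR_LEN + size + GUARD_LEN);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof(size));
    memcpy(raw + HDR_LEN + size, GUARD, GUARD_LEN);
    return raw + HDR_LEN;
}

/* Returns 1 if the guard after the user bytes is intact, 0 otherwise. */
static int dbg_check(const void *user)
{
    const unsigned char *raw = (const unsigned char *)user - HDR_LEN;
    size_t size;

    memcpy(&size, raw, sizeof(size));
    return memcmp(raw + HDR_LEN + size, GUARD, GUARD_LEN) == 0;
}

/* Frees the block. Returns 0 if the guard was intact, -1 on overflow. */
static int dbg_free(void *user)
{
    int ok = dbg_check(user);

    free((unsigned char *)user - HDR_LEN);
    return ok ? 0 : -1;
}
```

Writing even one byte past the requested size damages the guard, so the overflow is reported when the buffer is checked or freed instead of surfacing much later as a crash inside the allocator.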

To try to catch the bug, enable the memory debugger. It will try to detect
and report memory overflows, double frees, etc. See the "how to handle it"
chapter at:
http://www.opensips.org/pmwiki.php?n=Resources.DocsTsMem

Regards,
Bogdan


Re: segmentation fault when calling pkg_malloc

Chevio
In reply to this post by Chevio
Thank you all for looking at the problem.

Chevio


Re: segmentation fault when calling pkg_malloc

Dan Pascu
In reply to this post by Sergio Gutierrez
On Thursday 12 March 2009, Sergio Gutierrez wrote:
> Hi Iñaki.
>
> You are right. I neither see the reason of crashing; that is why I said
> "just to be sure" (I have seen strange things in compilers)  :-)

The backtrace clearly indicates that the segfault happens in
mem/f_malloc.c, which is inside the memory allocator. It has nothing to
do with his module code, other than the fact that it seems to be the
trigger.

--
Dan


Re: segmentation fault when calling pkg_malloc

Dan Pascu
In reply to this post by Vasil Kolev
On Thursday 12 March 2009, Vasil Kolev wrote:
> This doesn't seem like running out of memory, more like a memory
> corruption. The first thing to check is if either 'frag' or 'f' are
> NULL or invalid (e.g. in gdb do "print f", "print frag" and see what
> does that say). After that try dereferencing them, seeing how could
> they get these values, etc.

I've seen this before. I'm willing to bet that the pointers are not
NULL. They are most likely composed of ASCII byte codes from some SIP
message header (as a result of the allocator memory being overwritten by
part of the processed message).
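
A quick way to test that bet on a pointer value printed by gdb is to read its bytes as characters. The sketch below assumes a 32-bit little-endian machine (the addresses in the backtrace are 32 bits wide); the function name is made up for illustration.

```c
#include <stdint.h>

/* Decodes a 32-bit value in the order its bytes sit in memory on a
 * little-endian machine (lowest byte first). Writes four characters
 * plus a terminating NUL into out; non-printable bytes become '.'.
 * Returns 1 if all four bytes are printable ASCII, 0 otherwise. */
static int ptr_as_ascii(uint32_t value, char out[5])
{
    int printable = 1;
    int i;

    for (i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)((value >> (8 * i)) & 0xFF);

        if (c < 0x20 || c > 0x7E) {
            printable = 0;
            c = '.';
        }
        out[i] = (char)c;
    }
    out[4] = '\0';
    return printable;
}
```

For example, a "pointer" of 0x3a706973 decodes to the text "sip:", while a genuine pool address such as 0x08183b00 contains non-printable bytes and is rejected.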

Unfortunately I have no idea what causes this, and the only solution to a
similar problem I was experiencing was to replace the opensips private
memory allocator with the system memory allocator when building opensips.

Maybe the thread author can try using the system memory allocator to see
if the problem goes away. If so, we definitely have a problem with the
memory allocator. If not, he may have a different issue somewhere else in
his code.

--
Dan


Re: segmentation fault when calling pkg_malloc

Dan Pascu
In reply to this post by Bogdan-Andrei Iancu
On Thursday 12 March 2009, Bogdan-Andrei Iancu wrote:

> Hi Chevio, Hi Kolev,
>
> That is true what Kolev says - most probably you have a memory
> overwrite somewhere and you are messing up the data structure of the
> memory manager. It cannot be a race as the pkg memory is per process,
> so it is not shared.
>
> To try to catch the bug, enable memory debugger - it will try to detect
> and report mem overflow, double free, etc...See:
> http://www.opensips.org/pmwiki.php?n=Resources.DocsTsMem
> "how to handle it" chapter

Bogdan,

If you remember, we have seen this before. I had a similar issue with
segfaults in the memory allocator when using pkg_malloc. It happened in
various cases: when building a stateless reply to a REGISTER, when
processing a reply belonging to a transaction. It didn't seem to
originate from one particular part of the code, but it always ended in
f_malloc.c giving a segfault. If you remember, we tried to trace it but
came up empty-handed. It still remains a mystery where the memory
allocator's internal structures were overwritten, but the end result was
always the same: some internal pkg_malloc pointers contained remnants of
ASCII bytes from the processed SIP message, and trying to dereference
them resulted in a segfault. As I said, this didn't happen in some custom
module, but all over the standard opensips code.

The only way to solve it was to switch to using the system memory
allocator for private memory. This leads me to believe that we have a
subtle bug in the memory allocator that may have been dormant until now,
but recent changes in some other part of the code may have taken it out
of its slumber.

--
Dan


Re: segmentation fault when calling pkg_malloc

Vasil Kolev
In reply to this post by Dan Pascu
At 05:43 +0200 on 15.03.2009 (Sun), Dan Pascu wrote:

> Maybe the thread author can try using the system memory allocator to see
> if the problem goes away. If so, we definitely have a problem with the
> memory allocator. If not, he may have a different issue somewhere else in
> his code.
>

"Oh shit" seems like the appropriate response :)
(I'm not the thread author, but this would seem to affect me too)

Anyway, there seem to be three different memory allocators in opensips:
vqm_malloc, fm_malloc and qm_malloc (this is from looking in mem/mem.h).
Is it possible that one has an issue and the others do not? It might
also be that the system memory allocator is more resilient to some
strange type of corruption (for example, a single-byte overflow of
strings) and so doesn't show this weird issue...

I'll probably have to look up a way to track opensips with valgrind (as
when using its allocator, it won't find the overflows) because I've also
seen such issues some time before (ultimately they turned out to be the
fault of the db_postgres module).

--
Regards,
Vasil Kolev
Attractel NV
dCAP #1324, LPIC2



Re: segmentation fault when calling pkg_malloc

Dan Pascu
On Sunday 15 March 2009, Vasil Kolev wrote:

> Anyway, there seem to be three different memory allocators in opensips
> - vqm_malloc, fm_malloc and qm_malloc (this is from looking in
> mem/mem.h), is it possible that one has an issue and the rest not? Now
> the issue might be that the system memory allocator is more resilient
> to some strange type of corruption (for example, single byte overflow
> of strings) and doesn't have this weird issue...

The private memory pool used by the internal memory allocator is a 1 MB
area right after the area where static variables are stored. It may be
possible that there is a buffer overflow when writing to some static
variable that spills into this pool, overwriting the allocator's
internal structures. The system memory doesn't have this problem as it is
stored somewhere else. Such a buffer overflow will even go unnoticed by
memory tracing tools, as they see that the write is done in the 1 MB that
the program allocated, having no clue that the area is actually managed
by another memory allocator that can get corrupted.
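
As a toy model of that scenario (not a reproduction of the real memory layout), the sketch below puts a small "static" buffer directly in front of a "pool" inside one struct and copies into the buffer without a bounds check. The struct, the sizes and the function name are all assumptions made for illustration; the copy is clamped to the struct so the example itself stays within defined behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a static buffer followed by the pool. */
struct image {
    char static_buf[16];      /* stands in for a static variable */
    unsigned char pool[64];   /* stands in for the start of the pkg pool */
};

/* Copies len bytes to where static_buf starts, ignoring the size of
 * static_buf (the bug being modelled). Returns how many bytes of the
 * pool were overwritten as a result. */
static size_t unchecked_copy(struct image *img, const char *src, size_t len)
{
    size_t buf_off  = offsetof(struct image, static_buf);
    size_t pool_off = offsetof(struct image, pool);
    size_t room     = sizeof(*img) - buf_off;

    if (len > room)
        len = room;   /* keep the demo inside the struct */
    memcpy((unsigned char *)img + buf_off, src, len);

    return (len > pool_off - buf_off) ? len - (pool_off - buf_off) : 0;
}
```

Copying 20 bytes of a SIP header into the 16-byte buffer leaves four bytes of header text at the very start of the pool, which is where an allocator would typically keep its own bookkeeping.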

At the same time, I am not implying that this is what happens. The way the
memory is corrupted doesn't suggest this pattern, as the private memory is
not overwritten from the beginning. It looks more like specific areas are
overwritten, as if the memory allocator thinks that a memory area was
released and gives it to someone else on the next call to pkg_malloc,
resulting in two different pointers competing for the same area.
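
One well-known way to end up with two owners of the same area is a double free. The toy fixed-slot allocator below is purely hypothetical (it is not the fm_malloc algorithm) and only shows the mechanism: freeing a slot twice puts it on the free stack twice, so the next two allocations return the same address.

```c
#include <stddef.h>

#define NSLOTS    4
#define SLOT_SIZE 32

/* Hypothetical toy allocator: fixed-size slots and a stack of free
 * slot indices. The stack is oversized so a double free fits in it. */
struct toy_pool {
    unsigned char mem[NSLOTS][SLOT_SIZE];
    int free_stack[NSLOTS * 2];
    int top;
};

static void toy_init(struct toy_pool *p)
{
    int i;

    p->top = 0;
    for (i = NSLOTS - 1; i >= 0; i--)
        p->free_stack[p->top++] = i;
}

static void *toy_alloc(struct toy_pool *p)
{
    if (p->top == 0)
        return NULL;
    return p->mem[p->free_stack[--p->top]];
}

/* Deliberately does not check whether the slot is already free. */
static void toy_free(struct toy_pool *p, void *ptr)
{
    size_t off = (size_t)((unsigned char *)ptr - &p->mem[0][0]);

    if (p->top < NSLOTS * 2)
        p->free_stack[p->top++] = (int)(off / SLOT_SIZE);
}
```

After toy_free() is called twice on the same pointer, two later callers of toy_alloc() share one slot, and whatever the second one writes (for instance part of a SIP message) silently replaces the first one's data.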

--
Dan


Re: segmentation fault when calling pkg_malloc

Vasil Kolev
At 19:32 +0200 on 16.03.2009 (Mon), Dan Pascu wrote:

> The private memory pool used by the internal memory allocator is a 1 MB
> area right after the area where static variables are stored. It may be
> possible that there is a buffer overflow when writing to some static
> variables that gets to write in this pool overwriting the allocator
> internal structure. The system memory doesn't have this problem as it is
> stored somewhere else. Such a buffer overflow will even go unnoticed by
> memory tracing tools as it sees that the write is done in the 1MB that
> the program allocated, having no clue that the area is actually managed
> by another memory allocator that can get corrupted.
>

If a static variable is overwriting the memory buffer, this should be
easily visible with valgrind - they're both separate allocations (the
pool is a static one too). Now, if there's something that does overflow
in the pool, that will go unnoticed by default, but I think there should
be some way to make valgrind understand other mallocs (probably via an
API to say "I'm an allocator, here's what I did").
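
Valgrind does ship such an API in the form of client request macros: VALGRIND_MALLOCLIKE_BLOCK and VALGRIND_FREELIKE_BLOCK in valgrind/valgrind.h, and VALGRIND_MAKE_MEM_NOACCESS in valgrind/memcheck.h. The sketch below applies them to a hypothetical bump allocator over a static pool; it is not OpenSIPS code, and the pool size, red zone size and function names are assumptions. If the Valgrind headers are not installed, the macros are compiled away.

```c
#include <stddef.h>

#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>   /* also pulls in valgrind.h */
#    define HAVE_VALGRIND 1
#  endif
#endif
#ifndef HAVE_VALGRIND
/* Headers not installed: compile the annotations away. */
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len)             (0)
#  define VALGRIND_MALLOCLIKE_BLOCK(addr, size, rz, zeroed) do { } while (0)
#  define VALGRIND_FREELIKE_BLOCK(addr, rz)                 do { } while (0)
#endif

#define POOL_SIZE 4096
#define REDZONE   16

static unsigned char pool[POOL_SIZE];
static size_t pool_used;

/* Marks the whole pool as off limits until blocks are handed out. */
static void bump_init(void)
{
    pool_used = 0;
    (void)VALGRIND_MAKE_MEM_NOACCESS(pool, POOL_SIZE);
}

/* Hands out size bytes preceded by a red zone and followed by at
 * least REDZONE unused bytes. Returns NULL when the pool is full. */
static void *bump_alloc(size_t size)
{
    size_t rounded, need;
    unsigned char *p;

    if (size == 0 || size > POOL_SIZE)
        return NULL;
    rounded = (size + 15) & ~(size_t)15;
    need = REDZONE + rounded;
    if (need + REDZONE > POOL_SIZE - pool_used)
        return NULL;
    p = pool + pool_used + REDZONE;
    pool_used += need;
    VALGRIND_MALLOCLIKE_BLOCK(p, size, REDZONE, 0);
    return p;
}

/* A bump allocator never reuses memory; this only tells Valgrind
 * that the block must not be touched any more. */
static void bump_free(void *p)
{
    if (p != NULL) {
        VALGRIND_FREELIKE_BLOCK(p, REDZONE);
    }
}
```

When run under Memcheck, a write past the end of a block returned by bump_alloc(), or a use after bump_free(), should then be reported much as it would be for the system malloc. Outside Valgrind the client requests do nothing.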

> At the same time I do not imply that this is what happens. The way the
> memory is corrupted doesn't suggest this pattern as the private memory is
> not overwritten from the beginning. It looks more like specific areas are
> overwritten as if the memory allocator thinks that the memory area was
> released and gives it to someone else on the next call to pkg_malloc,
> resulting in 2 different pointers to the area competing for the memory.
>

Another question is, is there a bug at all, e.g. are there any
definitive reports? All the memory corruption problems I've seen can be
attributed either to bugs I've introduced or the issues with the
non-null terminated BLOBs from Postgres and the zero-length results
(again from postgres).
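
For what it's worth, a rough sketch of the defensive copy that avoids both problems: treat the database value as a pointer plus a length, never as a C string. The function name and signature are hypothetical and not taken from db_postgres.

```c
#include <stddef.h>
#include <string.h>

/* Copies a length-delimited value, which may lack a trailing NUL and
 * may have length 0, into dst as a NUL-terminated string.
 * Returns 0 on success, -1 if dst is too small or arguments are bad. */
static int blob_to_cstr(char *dst, size_t dst_size,
                        const char *src, size_t src_len)
{
    if (dst == NULL || dst_size == 0)
        return -1;
    if (src_len >= dst_size)
        return -1;            /* no room for the terminator */
    if (src_len > 0) {
        if (src == NULL)
            return -1;
        memcpy(dst, src, src_len);
    }
    dst[src_len] = '\0';      /* terminate explicitly, never assume */
    return 0;
}
```

A zero-length result becomes an empty string instead of a read through a possibly NULL pointer, and an unterminated BLOB can no longer make a later strlen or strcpy run off the end of the buffer.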

--
Regards,
Vasil Kolev
Attractel NV
dCAP #1324, LPIC2



Re: segmentation fault when calling pkg_malloc

Dan Pascu
On Monday 16 March 2009, Vasil Kolev wrote:
> Another question is, is there a bug at all, e.g. are there any
> definitive reports? All the memory corruption problems I've seen can be

If you count the fact that opensips 1.4.x was crashing on me every
couple of days until I switched to the system malloc, after which I've
never seen the crashes again, then yes, I think there are reports. Also
consider the report of the thread initiator (which may or may not be the
same thing, since he mentions he uses a custom module developed in house,
which may have its own problems), but it looks suspiciously similar.

> attributed either to bugs I've introduced or the issues with the
> non-null terminated BLOBs from Postgres and the zero-length results
> (again from postgres).

I never used postgres. Besides, the segfaults were triggered in the most
unexpected places, doing the simplest of things that never involved any
database access (like during a stateless 401 reply to a REGISTER).

--
Dan


Re: segmentation fault when calling pkg_malloc

Dan Pascu
On Wednesday 18 March 2009, Dan Pascu wrote:
> On Monday 16 March 2009, Vasil Kolev wrote:
> > Another question is, is there a bug at all, e.g. are there any
> > definitive reports? All the memory corruption problems I've seen can
> > be
>
> If you can count the fact that opensips 1.4.x was crashing on me every
> couple of days until I switched to the system malloc, after which I've
> never seen them again, then yes I think there are reports.

One more thing I forgot to mention. I've never seen this issue with
openser-1.3. It first appeared when I tried to upgrade to opensips-1.4. I
used the same configuration, so the only thing that changed in this
equation is the SIP proxy itself.

Something changed between 1.3 and 1.4 that triggered this, but it's not
clear what and where. It seems to be a side effect of another change,
as the memory allocator code hasn't been touched since.

--
Dan
