Re: Opensips 1.11.3 crash (Federico Edorna)


Kneeoh
Federico, we are seeing similar crashes with both 1.11.3 and 1.11.4 when the system is under load and we run back-to-back fifo get_statistics all commands (about 10 seconds apart): OpenSIPS segfaults. I'm going to try to get some backtraces, but for the time being I'm going back to a 1.10 branch to see if it happens there too. My suspicion is that something is broken in the MI in 1.11. To prove it we'll have to set up a lab and run SIPp against it; I can't afford to experiment on production.
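
For anyone who wants to try the same thing in a lab, here is a minimal sketch of the back-to-back run described above. The command line is the one from this report; the function name, the runs/interval parameters and the injectable runner/sleep hooks are illustrative assumptions, not part of OpenSIPS. It only issues the MI command on a timer; generating load (for example with SIPp) is a separate step.

```python
import subprocess
import time

# Command taken from the report above.
STATS_CMD = ("opensipsctl", "fifo", "get_statistics", "all")


def run_back_to_back(cmd=STATS_CMD, runs=2, interval=10.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd `runs` times, pausing `interval` seconds between runs.

    Returns one exit code per run. `runner` and `sleep` are hooks
    (hypothetical, for testing) that default to the real implementations.
    """
    codes = []
    for i in range(runs):
        if i > 0:
            sleep(interval)  # wait between consecutive runs only
        proc = runner(list(cmd), capture_output=True, text=True)
        codes.append(proc.returncode)
    return codes
```

Calling run_back_to_back() with the defaults issues the command twice, 10 seconds apart, which is the pattern that triggers the crash here. Do not point it at production.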


_______________________________________________
Users mailing list
[hidden email]
http://lists.opensips.org/cgi-bin/mailman/listinfo/users

Re: Opensips 1.11.3 crash (Federico Edorna)

Kneeoh
OK, I wasn't able to get a core file generated; I'm looking into that now. However, at debug level 6, here is what I get in my log file when I run opensipsctl fifo get_statistics all | grep udp. The first time it works; the second time I get this and a crash:

Apr  7 14:42:45 osipslb03 /usr/local/sbin/opensips[17299]: DBG:tm:insert_timer_unsafe: [6]: 0x7fee1b3f6ac8 (99000000)
Apr  7 14:42:45 osipslb03 /usr/local/sbin/opensips[17299]: DBG:tm:retransmission_handler: retransmission_handler : done
Apr  7 14:42:45 osipslb03 /usr/local/sbin/opensips[17299]: DBG:tm:utimer_routine: timer routine:5,tl=0x7fee1a9fdb38 next=0x7fee1a9f9d18, timeout=97000000




Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17277]: DBG:mi_fifo:mi_parse_tree: adding node <> ; val <all>
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17277]: DBG:mi_fifo:mi_parse_node: end of input tree
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17277]: DBG:mi_fifo:mi_fifo_server: done parsing the mi tree
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17311]: CRITICAL:core:receive_fd: EOF on 16
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17311]: DBG:core:handle_ser_child: dead child 10, pid 17286 (shutting down?)
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17311]: DBG:core:io_watch_del: io_watch_del op on index -1 16 (0x812d00, 16, -1, 0x0,0x1) fd_no=44 called
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17275]: DBG:core:handle_sigs: status = 139
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17275]: INFO:core:handle_sigs: child process 17286 exited by a signal 11
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17275]: INFO:core:handle_sigs: core was generated
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17275]: INFO:core:handle_sigs: terminating due to SIGCHLD
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17310]: INFO:core:sig_usr: signal 15 received
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17309]: INFO:core:sig_usr: signal 15 received
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17308]: INFO:core:sig_usr: signal 15 received
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17307]: INFO:core:sig_usr: signal 15 received
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17306]: INFO:core:sig_usr: signal 15 received
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17305]: INFO:core:sig_usr: signal 15 received
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17304]: INFO:core:sig_usr: signal 15 received
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17303]: INFO:core:sig_usr: signal 15 received
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17302]: INFO:core:sig_usr: signal 15 received
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17301]: INFO:core:sig_usr: signal 15 received
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17300]: INFO:core:sig_usr: signal 15 received
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17277]: INFO:core:sig_usr: signal 15 received
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17298]: INFO:core:sig_usr: signal 15 received
Apr  7 14:42:49 osipslb03 /usr/local/sbin/opensips[17311]: INFO:core:sig_usr: signal 15 received
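
For reading the "status = 139" line: a small sketch that decodes that number, assuming it is the raw wait status reported for the child and assuming the Linux bit layout (low 7 bits are the terminating signal, bit 7 is the core-dump flag). The function name is made up for illustration, and stopped/continued states are ignored.

```python
import signal


def decode_wait_status(status):
    """Split a raw wait status using the Linux bit layout (assumed)."""
    termsig = status & 0x7F            # low 7 bits: terminating signal, 0 if none
    core_dumped = bool(status & 0x80)  # bit 7: core-dump flag
    exit_code = (status >> 8) & 0xFF   # only meaningful when termsig == 0
    return {"termsig": termsig,
            "core_dumped": core_dumped,
            "exit_code": exit_code}


info = decode_wait_status(139)
is_segfault = info["termsig"] == signal.SIGSEGV
```

With 139 this gives signal 11 (SIGSEGV) with the core-dump bit set, which matches the "exited by a signal 11" and "core was generated" lines in the log.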





Re: Opensips 1.11.3 crash (Federico Edorna)

Kneeoh
I've got a backtrace for this issue. Who can I send it to?


