Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Submitted By: John Riordan (john_riordan)
Assigned to: Bogdan-Andrei Iancu (bogdan_iancu)
Summary: extend tm uac functions to support tmcb release function
Currently there does not appear to be a reliable way to cleanup callback parameters passed to the tm uac functions - t_request, t_request_within, etc.. This patch extends the tm uac functions to support the tmcb release function for tm callback parameter cleanup. This is based on work done earlier which added the infrastructure necessary to do proper cleanup and here we are extending the existing uac functions to use that earlier work done in...
Author: dan_pascu <dan_pascu@689a6050-402a-0410-94f2-e92a70836424>
Date: Thu Nov 20 18:55:23 2008 +0000
Added infrastructure for tm callback parameter cleanup. Fixed memory corruption caused by invalid freeing of dialogs
Currently, callback parameters cannot be reliably cleaned up. For example, when using t_request to send an INVITE and the uas responds with a 200 but then the ACK(s) gets lost and the uas continues sending 200s the callback is called repeatedly with an indication that the request was completed so one is faced with either not freeing the callback parameters (leaking memory) or freeing them repeatedly (ah, umm).
Regardless, proper cleanup can best be done by utilizing the release function infrastructure.
trying to follow the numbering you did for my comments :)
1) you do not need lock as some of the callback are already synced by TM -
so you cannot have 2 callbacks executed in the same time....but this is not
true for all callbacks, so I agree this is a reason.
2) true - exactly what I mentioned in 1)
3) I agree with your arguments
4) good example, indeed
5) right, the TMCB_TRANS_DELETED callback is the best place to cleanup
stuff, but I think you want cleanup per callback and not callback per
transaction. This is a new callback I added 2 years ago for the dialog
module (this module is also using the callback for getting events for the
dialog state machine). But your final point is good - you cannot use this
callback with local transactions as you cannot install it :(
6) make sense to me - what can I say...you convince me, so I will start
overviewing the patch for integrating on SVN.
7 - dlg_dridge() ) not sure what this function does, but I guess somebody
without a good understanding of how the dialog and TM modules work just
pushed it there..
Comment By: John Riordan (john_riordan)
Date: 2009-05-10 07:02
Here is another example in evidence how getting this right is hard for
people. While I haven't tested it, a call to dlg_bridge() in the linked
code looks like it will cause a double free the first time an ack is lost
and the uas sends another 200 (and then triple free and so on as the 200s
come in). I suspect the author of this function would not have fallen into
this trap if t_request() presented an interface that took a release
Comment By: John Riordan (john_riordan)
Date: 2009-05-10 06:07
Well, I'll do my best as passionate patch proponent, but first I want to
mention that I don't have my head anywhere near completely around the tm
module so please forgive me if I end up making statements that turn out to
be patently false. And secondly, this is going to be a ramble and I'm
afraid I don't have the time to edit it, proof read it, or even spell check
it so please forgive me in advance for slaughtering the art of writing.
1) Cleanup on first callback could lead to double free unless lock is used
local_reply() in t_reply.c releases the REPLY_LOCK prior to calling the
running the callbacks for TMCB_LOCAL_COMPLETED. So without locking in the
callback, the series of 200s (from a badly behaving uas for example) could
still trigger a double free from the race condition (ie. proc1 check NULL,
proc 1 free, proc2 check NULL, proc 2 free, proc 1 set param NULL, proc 2
set param NULL). This would obviously be bad. While I know that under
normal circumstances this senario would likely not occur, but it would not
be hard for a determined uas to bring this scenario to life and I'd rather
sleep well knowing that double free simply will not happen.
2) CPL proxy example not a good comparison
The example you referenced (cpl_proxy.h , reply_callback()) doesn't make
for an apples to apples comparision since in that case the callback is
trigger cleanup on TMCB_RESPONSE_OUT which relay_reply() (t_reply.c)
guarantees will only be sent once (sets local var totag_retr inside
REPLY_LOCK and uses it as a test for triggering the callback).
3) Using cleanup callback (release function) makes for a good api
The general approach of using a release function has merit in this case.
When an api is defined so that a callback is called once, the clean up of a
callback parameter is straight forward. But when an api is defined so that
a callback can be registered for multiple events and called repeatedly in
different ways depending on external circumstances, a release function
allows for straight forward clean up by the api consumer. The api consumer
can simply trust the api will do the right thing and call the release
function once whenever callback cleanup needs to be done (transaction ends,
system shutdown, whatever, etc). Furthermore it make it straight forward
for the api maintainer to make future changes and updates to the api
without worrying about who the consumers might be depending on event
sequences for cleanup.
4) Alternative makes for a bad api
The alternative of defining the api along the lines of "register your
callback and if you are doing such and such you can cleanup when this
happens, but if you are doing this then you should do you cleanup like..."
leads, in my opinion, to confusion and bugs and problems making changes in
the future. It is certainly harder for the api consumer and leads to the
following kind of thing "ok, if my callback can be called multiple times
then I need to figure out when to cleanup ... is the api expecting the
parameter pointer to by constant? ... I could check the source code ... can
multiple processes be running my callback at the same time ... is the api
locking to prevent that? ... I ccould check the source code ... if I do it
this way will the api change in the future so that my approach doesn't
work? ... well I certainly don't want to double free because then that will
cause memory corruption and bring down the whole server and I don't want my
module doing that on people and those kinds of problems are really hard to
track down so I best start reading the source code to see what is is doing
so I don't mess it up ... who this tm module is complicated ... timers and
mutexes ... this sucks ... sigh ... maybe I'll just call free and hope for
the best ... no maybe I just won't free at all and leak memory which at
least won't crash it right away".
For example, a current comment at the top of uac_cb() in seas_action.c
seas module (a callback passed to t_request_within)...
/ * ...
* TODO WARNING !!! there's a clear MEMORY LEAK here, see exit: at the
* the function... it should free ev_info !!!!!!!!
* I have disabled the free() because It may be that we receive a
retransmitted 200 OK
* if the ACK gets lost, that 200 OK will make SER invoke this callback a
second,third, etc time...
5) Trying to make the single callback approch work leads ugly land
Update! It looks like the seas code is now registering for
TMCB_TRANS_DELETED which is an undocumented thing that does appear to do
right thing (based on what can tell by pouring over the source code) and
callback at a desirable time for cleanup (expect during shutdown - maybe a
bug and should it should be moved from free_cell() into empty_tmcb_list()).
Of course the api consumer needs to go trolling through the source code to
ever realize that it exists and then has to wonder if it will exist in the
future because, well, it is undocumented and buried and only appears to be
used by this seas module thing which is, well, ah. umm. And lets see how
using this TMCB_DELETED thing is being done - first call t_request_within
and then when the first TMCB_LOCAL_COMPLETED shows up register for
TMCB_TRANS_DELETED and pass my cb param to that instead of cleaning it up
now... ummm, but wait isn't there a 200s race condition here and we could
end up registering two callbacks? ahh.. and boy isn't this ugly register
one callback and then register another and it still doesn't do...
6) Cleanup callback (release function) plain and simple just works well
Wouldn't it be great if the api just provided the ability to pass in
cleanup callback? Simple and elegant. And if it did, I the api consumer
could just use it and not have to read tm source code for hours and
everything would just work - now and in the future - because I can trust
that the api maintainer will called exactly once and from exactly the right
time/place. (Note that is exactly what the prior commited by dan_pascu did
- the release function infrastructure is already in place and gets called
in empty_tmcb_list() exactly once when the callbacks are getting cleaned
up). And there are no race conditions to worry about and no double frees.
And the api maintainer doesn't have to worry about messing any consumers up
in the future if internals need to change etc. And the api consumers (motly
crew that they are) will not go off doing ugly, dumb, and buggy stuff which
ends up in modules that end users include thinking it will give then this
great new feature they want but just ends up causing random crashes.
So, in conclusion, having the api consumer provide the api with a release
function is a good solution which works well for the api maintainer, the
api consumer, and for the health of the entire project (I'm really spinning
And finally, I do need to sleep at night, so can you please confirm that
at a minimum I will need to lock before I free if I attempt to go the "free
and set to NULL on first LOCAL_COMPLETED" route to make sure I never double
I'm not 100% sure this is needed.
Currently all local UAC transaction do trigger a TMCB_LOCAL_COMPLETED
callback when the transaction is completed. So you can use this callback to
cleanup whatever params you attached to the transaction.
In regards to the multiple 200 OK (retransmissions), the registered
callback function should first check if the received param is NULL and if
so, to exit. If not NULL, it should free it and set it to NULL to avoid
As an example, see in the CPL module, the cpl_proxy.h , reply_callback()
function, how the param (inter var) is freed.