Rest Client Async operation


Rest Client Async operation

Ben Newlin

Hello all,

 

During testing I have found behavior that I cannot explain based on our current understanding of the operation of the async and rest_client functionality. I have reproduced this behavior with a fairly simple script configuration, the relevant piece of which is below:

 

children=8

 

loadmodule "rest_client.so"

modparam("rest_client", "connection_timeout", 5)

modparam("rest_client", "connect_poll_interval", 1)

modparam("rest_client", "curl_timeout", 5)

 

route {

  xlog("L_ALERT", "before t_newtran()\r\n");

  t_newtran();

  xlog("L_ALERT", "after t_newtran()\r\n");

  async(rest_post("http://0.0.0.0", "{}", "application/json", "$var(body)", "$var(ctype)", "$var(rcode)"), test_resume);

  xlog("L_ALERT", "after async\r\n");

}

route[test_resume] {

  xlog("L_ALERT", "test_resume: $retcode\r\n");

  t_reply("503", "Service Unavailable");

}

 

The expected operation is to immediately trigger t_newtran(), which sends a 100 response, then time out on the rest_post call after 5 seconds and respond with a 503. For individual calls this does occur, although the timeout consistently takes several seconds too long, usually ~8s, as can be seen in the logs from the above code:

 

Jun 24 19:10:56 [337] before t_newtran()

Jun 24 19:10:56 [337] after t_newtran()

Jun 24 19:11:04 [337] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 19:11:04 [337] test_resume: -2

 

The extra time is troubling, but the bigger concern is that when I run multiple concurrent calls, I find that all 8 UDP children are still blocking on the rest_post call. Here are the logs for that test:

 

Jun 24 21:19:51 [337] before t_newtran()

Jun 24 21:19:51 [337] after t_newtran()

Jun 24 21:19:51 [338] before t_newtran()

Jun 24 21:19:51 [338] after t_newtran()

Jun 24 21:19:51 [335] before t_newtran()

Jun 24 21:19:51 [335] after t_newtran()

Jun 24 21:19:51 [333] before t_newtran()

Jun 24 21:19:51 [333] after t_newtran()

Jun 24 21:19:52 [331] before t_newtran()

Jun 24 21:19:52 [331] after t_newtran()

Jun 24 21:19:52 [336] before t_newtran()

Jun 24 21:19:52 [336] after t_newtran()

Jun 24 21:19:52 [332] before t_newtran()

Jun 24 21:19:52 [332] after t_newtran()

Jun 24 21:19:52 [334] before t_newtran()

Jun 24 21:19:52 [334] after t_newtran()

Jun 24 21:19:59 [337] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:19:59 [337] test_resume: -2

Jun 24 21:19:59 [337] before t_newtran()

Jun 24 21:19:59 [337] after t_newtran()

Jun 24 21:19:59 [338] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:19:59 [338] test_resume: -2

Jun 24 21:19:59 [338] before t_newtran()

Jun 24 21:19:59 [338] after t_newtran()

Jun 24 21:19:59 [335] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:19:59 [335] test_resume: -2

Jun 24 21:19:59 [335] before t_newtran()

Jun 24 21:19:59 [335] after t_newtran()

Jun 24 21:19:59 [333] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:19:59 [333] test_resume: -2

Jun 24 21:19:59 [333] before t_newtran()

Jun 24 21:19:59 [333] after t_newtran()

Jun 24 21:19:59 [331] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:19:59 [331] test_resume: -2

Jun 24 21:19:59 [331] before t_newtran()

Jun 24 21:19:59 [331] after t_newtran()

Jun 24 21:19:59 [336] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:19:59 [336] test_resume: -2

Jun 24 21:19:59 [336] before t_newtran()

Jun 24 21:19:59 [336] after t_newtran()

Jun 24 21:19:59 [332] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:19:59 [332] test_resume: -2

Jun 24 21:19:59 [332] before t_newtran()

Jun 24 21:19:59 [332] after t_newtran()

Jun 24 21:20:00 [334] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:20:00 [334] test_resume: -2

Jun 24 21:20:00 [334] before t_newtran()

Jun 24 21:20:00 [334] after t_newtran()

Jun 24 21:20:07 [337] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:20:07 [337] test_resume: -2

Jun 24 21:20:07 [338] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:20:07 [338] test_resume: -2

Jun 24 21:20:07 [335] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:20:07 [335] test_resume: -2

Jun 24 21:20:07 [333] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:20:07 [333] test_resume: -2

Jun 24 21:20:07 [331] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:20:07 [331] test_resume: -2

Jun 24 21:20:07 [336] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:20:07 [336] test_resume: -2

Jun 24 21:20:07 [332] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:20:07 [332] test_resume: -2

Jun 24 21:20:07 [334] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)

Jun 24 21:20:07 [334] test_resume: -2

 

You can clearly see that the first 8 INVITEs are processed immediately, but then no new messages are processed until the REST queries begin to time out. As each query times out, freeing a child process, a new inbound request is processed. Clearly message processing is being blocked on the REST queries, which is not supposed to happen when using async. Am I missing something in my configuration, or am I misunderstanding how async is supposed to work?

 

Ben Newlin


_______________________________________________
Users mailing list
[hidden email]
http://lists.opensips.org/cgi-bin/mailman/listinfo/users

Re: Rest Client Async operation

Liviu Chircu

Hi Ben,

Regarding the "blocking rest_get()" problem: due to the specifics of the libcurl API, there is
no direct access to the socket file descriptors during the initial TCP connect handshake.  Thus,
this phase will be synchronous (blocking), while all transfers (send HTTP req, receive reply) are
done asynchronously, regardless of how the data is sent or arrives (all at once, in chunks,
byte-by-byte, etc.).

So, indeed: in the specific case where the remote TCP endpoint is either unresponsive or blocked
by a firewall, the throughput of your OpenSIPS may be significantly affected when talking HTTP
to such peers, even if you perform async HTTP requests.  This may just be my opinion, but I believe
the average case is that the HTTP server/API is down and the TCP connect attempt gets rejected
immediately.  See [1] for more on this topic.

Regarding the 5s vs. 8s timeouts, I will have to confirm the behavior first and get back to you.
Async is heavily reliant on tm, so I wouldn't exclude the possibility that the transaction timeout
overrides, in some cases, the async logic's timeout.

Best regards,

[1]: https://serverfault.com/questions/521359/why-do-some-connections-time-out-and-others-get-refused

Liviu Chircu
OpenSIPS Developer
http://www.opensips-solutions.com
On 25.06.2019 00:33, Ben Newlin wrote:


Re: Rest Client Async operation

Ben Newlin

Liviu,

 

Thank you for the response and explanation, and for investigating the timing discrepancies. I must say, though, that this answer is very disappointing. We are not transferring large files or data sets during realtime call processing (I hope no one is!), so the transfer time is not really a concern; it is the connect time that can cause severe backlogs when the far end is not available, as in my test case. So, as it stands, the async functionality for rest_client is mostly useless to us, as it doesn’t prevent the most common cause of congestion.

 

I can appreciate that libcurl does not provide access during the connect, but I guess my question would be why isn’t the entire operation run in async? Why must the connect be performed in the current process and only the transfer be in another process? My expectation when using any async function was that all execution of the underlying function called would be performed in a separate process, not that the function could pick and choose which parts would block and which would not.

 

Are other async function implementations similar to this? For example, does an async_db_query establish the DB connection in the current thread in blocking mode and only perform the actual query as async?

 

I would also recommend updating the documentation for both async and rest_client specifically to make this limitation clear, as it was not at all clear to us.

 

Ben Newlin

 

From: Users <[hidden email]> on behalf of Liviu Chircu <[hidden email]>
Reply-To: OpenSIPS users mailing list <[hidden email]>
Date: Tuesday, June 25, 2019 at 8:52 AM
To: "[hidden email]" <[hidden email]>
Subject: Re: [OpenSIPS-Users] Rest Client Async operation

 


Re: Rest Client Async operation

Liviu Chircu

It's the same process doing both the connect and the transfer.  The problem is that libcurl, as it stands now,
is not able to give me a file descriptor to poll on, until it connects [1].  See this section:

"When libcurl returns -1 in max_fd, it is because libcurl currently does something that isn't possible
for your application to monitor with a socket and unfortunately you can then not know exactly when the
current action is completed using select(). You then need to wait a while before you proceed and call
curl_multi_perform anyway. How long to wait? Unless curl_multi_timeout gives you a lower number, we
suggest 100 milliseconds or so, but you may want to test it out in your own particular conditions to
find a suitable value."

Regarding your issue: I would investigate further the reason why the connect is hanging, and not getting
rejected immediately when your server is down.  That would solve all your blocking issues.

The same applies to MySQL connections when the server goes down: only after the connection is up can we obtain
its file descriptor to asynchronously poll on.  So if the connect to DB_HOST:3306 hangs, so will OpenSIPS.

Regards,

[1]: https://curl.haxx.se/libcurl/c/curl_multi_fdset.html

Liviu Chircu
OpenSIPS Developer
http://www.opensips-solutions.com
On 25.06.2019 18:41, Ben Newlin wrote:
but I guess my question would be why isn’t the entire operation run in async? Why must the connect be performed in the current process and only the transfer be in another process?


Re: Rest Client Async operation

SamyGo
Hi Liviu,

Is it possible to suspend the transaction and resume it once we somehow get the event from libcurl as soon as the connect is done?
I'm imagining the way APNS is usually done these days. The only thing missing here is the event from the rest_client!

Can this mechanism help with the problem of backlog on the OpenSIPS thread?

Best Regards,
Sammy


On Wed., Jun. 26, 2019, 9:23 a.m. Liviu Chircu, <[hidden email]> wrote:


Re: Rest Client Async operation

Liviu Chircu
That is exactly why we need to extract the connection/transfer fd in the
first place: so we can throw it into epoll_wait() and have it act as
an event generator :)

Liviu Chircu
OpenSIPS Developer
http://www.opensips-solutions.com

On 26.06.2019 16:38, SamyGo wrote:
> once we somehow get the event from libcurl


Re: Rest Client Async operation

Ben Newlin
In reply to this post by Liviu Chircu

Liviu,

 

Thanks again for the info. I think what you are saying is that the async operation is not launching a new process to handle the called function, but is performing the function in the original worker thread and only taking advantage of any suspend/resume or polling functionality already exposed by the underlying function itself.

 

This is very different from the way I had hoped async would work. The documentation does not speak to the specific implementation of async, so it was all assumptions on my part. But to me when you say async it means a separate process or thread is being created to perform the function. This would allow that process to block on the connect or any other aspect of the called function as necessary without blocking call processing, which is the desired outcome of any async operation.

 

I understand that the practicalities of the implementation in OpenSIPS may have required this design, but I must re-iterate that these limitations need to be documented very carefully as they are very important to understand when designing OpenSIPS scripts with async functionality and are not described anywhere. I could not find anywhere in the documentation that indicates that async operations can potentially still block the original worker thread and block call processing. The closest is:

 

“The current OpenSIPS worker will launch the asynchronous operation, after which it will continue to process other pending tasks”

 

But this provides no elaboration on what it means for the worker to “launch” the operation, and more importantly it does not indicate that the launching itself can block, which is the key issue here.

 

As I said, this unfortunately makes async processing mostly useless for us. For both DB and REST queries if only the data transfer is async then it is only useful when the data being transferred is extremely large or prone to delays/jitter. Such transfers should be avoided during realtime processing whether async or not, as they will still delay the individual call even if not others. For small payloads, like the single JSON object common in REST responses, it is the connection itself that is the concern. Once connected, running the data transfer in async mode represents no real gain.

 

As far as investigating why the server is not responding, of course we will always do this. But we cannot anticipate or fix every cause of remote system failure. In order to design a resilient system we cannot assume the remote server will always be there. We had believed async would allow call processing to proceed in failure cases like this without blocking.

 

Ben Newlin

 


Re: Rest Client Async operation

Liviu Chircu

On 26.06.2019 17:48, Ben Newlin wrote:

Thanks again for the info. I think what you are saying is that the async operation is not launching a new process to handle the called function, but is performing the function in the original worker thread and only taking advantage of any suspend/resume or polling functionality already exposed by the underlying function itself.

Just to clear this up:  the underlying functions themselves need not offer any polling functionality,
they just need to meet two criteria: (1) be non-blocking; (2) provide a valid fd for the async engine to poll on.

Any blocking I/O functionality, be it within libcurl, MySQL, etc. that meets the above can be adapted to
work with the async engine available in OpenSIPS 2.1+.

I understand that the practicalities of the implementation in OpenSIPS may have required this design, but I must reiterate that these limitations need to be documented very carefully. They are very important to understand when designing OpenSIPS scripts with async functionality, and they are not described anywhere. I could not find anything in the documentation indicating that async operations can still block the original worker and stall call processing. The closest is:

 

“The current OpenSIPS worker will launch the asynchronous operation, after which it will continue to process other pending tasks”

 

But this provides no elaboration on what it means for the worker to “launch” the operation, and more importantly it does not indicate that the launching itself can block, which is the key issue here.

Agreed - will try to find a way to integrate this corner-case into the docs, somehow.

As I said, this unfortunately makes async processing mostly useless for us. For both DB and REST queries if only the data transfer is async then it is only useful when the data being transferred is extremely large or prone to delays/jitter. Such transfers should be avoided during realtime processing whether async or not, as they will still delay the individual call even if not others. For small payloads, like the single JSON object common in REST responses, it is the connection itself that is the concern. Once connected, running the data transfer in async mode represents no real gain.

Then I recommend you stop using rest_client, which currently optimizes system resource
usage for setups where the TCP connect cannot possibly ever hang, and resort to forking
a process for each HTTP request, using a construct such as:

    async(exec("curl your_connect_hanging_http_endpoint"), resume_route);

Although forking a process for each request is a costly operation which will
eat more system resources during normal operation, at least this solution optimizes
for the worst case, when the HTTP server is down.  In this latter case, the
throughput of your SIP server won't be hindered that much, as the hanging connect
will be done asynchronously.

Best regards,

Liviu Chircu
OpenSIPS Developer
http://www.opensips-solutions.com

Re: Rest Client Async operation

Liviu Chircu
In reply to this post by Ben Newlin
I have updated the async docs [1], so everyone can read more about this
limitation,
along with a mitigation solution for it.

Cheers,

[1]: https://www.opensips.org/Documentation/Script-Async-3-0#toc8

Liviu Chircu
OpenSIPS Developer
http://www.opensips-solutions.com


Re: Rest Client Async operation

Ben Newlin
In reply to this post by Liviu Chircu

Liviu,

 

Thank you. This makes complete sense now and the documentation updates are excellent. Thanks as well for the workaround suggestion; we will investigate whether it is necessary. You do make a good point in the docs about TCP connection reuse in the workers, which shows my example as a bit of an extreme case. In practical use it is more likely the connection would exist and the transfer would timeout instead, which would be async.

 

Even though the example may be extreme, in the current world of cloud computing having the remote server “disappear” without closing TCP connections is becoming a much more common concern. Cloud instances can and do just disappear and the connections will hang; cloud networks have issues and can leave connections hung as well. We have been wrestling with these issues with the db_postgres and db_virtual modules for some time and have an open ticket for it now. While it is impossible to avoid some service impairment when a necessary remote component is no longer available, having the entire call processing application block on the resource is also not a very good response. We are always looking for better ways to gracefully handle the unexpected.

 

Thanks for the time and quick responses!

 

Ben Newlin

 
