Scale out with concurrent HTTP calls and retries, based on Guzzle 6

Imagine the following situations as a developer:

  • Your task is to import bulk data via REST interfaces
  • Your task is to transfer existing images to AWS S3
  • Your task is to fetch bulk data from remote servers

What can be done to speed up such processes?

  • Chunk the workload into smaller parts
  • Use HTTP child requests to spread the workload over a cluster of servers
  • Control everything from one process

Guzzle is well known as a powerful and stable HTTP client for PHP.

It offers a lot of functionality, (mostly) built on top of cURL, which gives us the opportunity to scale out any massive workload over HTTP.

You can send multiple requests concurrently using promises and asynchronous requests.
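As a minimal sketch of what this can look like with Guzzle 6: the function below starts one asynchronous GET per URL and then waits for all of them. The names `fetchSizes` and `summarizeSettled` are hypothetical helpers made up for this illustration, and the code assumes that Composer's autoloader has already been loaded and that the function-style `GuzzleHttp\Promise\settle()` shipped with Guzzle 6 is available.

```php
<?php
// Sketch for Guzzle 6. Assumes Composer's autoloader has already been loaded,
// e.g. via: require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Promise;

/**
 * Turn the result of Promise\settle() into "key => body size, or null on failure".
 * Hypothetical helper, not part of Guzzle.
 */
function summarizeSettled(array $settled)
{
    $sizes = [];
    foreach ($settled as $key => $result) {
        $sizes[$key] = $result['state'] === 'fulfilled'
            ? strlen((string) $result['value']->getBody())
            : null;
    }
    return $sizes;
}

/**
 * Start one asynchronous GET per URL, then wait for all of them.
 */
function fetchSizes(Client $client, array $urls)
{
    $promises = [];
    foreach ($urls as $key => $url) {
        // getAsync() returns a promise right away instead of blocking.
        $promises[$key] = $client->getAsync($url);
    }
    // settle() waits for every promise, whether it was fulfilled or rejected.
    return summarizeSettled(Promise\settle($promises)->wait());
}

// Example call (needs network access):
// print_r(fetchSizes(new Client(['timeout' => 5]), ['google' => 'http://www.google.com/']));
```

Using `settle()` rather than failing on the first error means that one broken URL does not abort the whole batch.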

You can use the GuzzleHttp\Pool object when you have an indeterminate number of requests you wish to send.
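A pool could be wired up roughly as follows. This is a sketch under assumptions, not the code from the repository: `requestGenerator`, `poolCallbacks` and `runPool` are hypothetical names, the log lines only imitate the wording of the output shown further down, and Composer's autoloader is assumed to be loaded already.

```php
<?php
// Sketch for Guzzle 6. Assumes Composer's autoloader has already been loaded.

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

/**
 * Lazily yield one GET request per URL; the array keys become the pool indexes.
 */
function requestGenerator(array $urls)
{
    foreach ($urls as $index => $url) {
        yield $index => new Request('GET', $url);
    }
}

/**
 * Build the 'fulfilled' and 'rejected' callbacks; both update $stats by reference.
 */
function poolCallbacks(array $urls, array &$stats)
{
    return [
        'fulfilled' => function ($response, $index) use ($urls, &$stats) {
            $stats['success']++;
            printf("Got a successful response for url %s with index %s with size %d bytes\n",
                $urls[$index], $index, strlen((string) $response->getBody()));
        },
        'rejected' => function ($reason, $index) use ($urls, &$stats) {
            $stats['failed']++;
            $message = $reason instanceof \Exception ? $reason->getMessage() : 'unknown error';
            printf("Got a NOT successful response for url %s with reason %s\n",
                $urls[$index], $message);
        },
    ];
}

/**
 * Send all requests with at most $concurrency of them in flight at any time.
 */
function runPool(Client $client, array $urls, $concurrency)
{
    $stats = ['success' => 0, 'failed' => 0];
    $options = ['concurrency' => $concurrency] + poolCallbacks($urls, $stats);

    $pool = new Pool($client, requestGenerator($urls), $options);
    $pool->promise()->wait();   // blocks until the queue is empty

    return $stats;
}

// Example call (needs network access):
// print_r(runPool(new Client(), ['http://www.google.com/', 'http://www.msn.com/'], 20));
```

Because the requests come from a generator, the pool only creates a new request when a slot becomes free, so the full list never has to be built up front.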

Guzzle clients use a handler and middleware system to send HTTP requests.

See the Guzzle documentation for details.

Let's put everything together:

  • Concurrent HTTP calls as asynchronous child requests
    • Use closures to provide the callables
    • Use Guzzle promises to control the child requests
    • The pool concept lets you set a maximum number of concurrent child requests
      • Once a child request has finished, the next one starts, until the queue is empty
  • Retry middleware, which retries failed HTTP requests
  • Set timeouts correctly to avoid hanging processes
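The retry and timeout parts of this list could be sketched like this with Guzzle 6's `Middleware::retry()`. The helper names (`retryDecider`, `retryDelay`, `createRetryingClient`), the limit of two retries, the delay and the timeout values are all assumptions chosen for illustration; they are not taken from the repository. Composer's autoloader is assumed to be loaded already.

```php
<?php
// Sketch for Guzzle 6. Assumes Composer's autoloader has already been loaded.

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;

const MAX_RETRIES = 2;   // assumed limit, not taken from the original code

/**
 * Decide whether a request should be retried. Guzzle passes the number of
 * retries so far, the request, the response (if any) and the exception (if any).
 */
function retryDecider($retries, $request, $response = null, $exception = null)
{
    if ($retries >= MAX_RETRIES) {
        return false;
    }
    // Retry connection problems such as DNS failures or timeouts...
    if ($exception !== null) {
        return true;
    }
    // ...and server-side errors.
    return $response !== null && $response->getStatusCode() >= 500;
}

/**
 * Wait one second longer before each further retry; the result is in milliseconds.
 */
function retryDelay($retries)
{
    return 1000 * $retries;
}

/**
 * Build a client whose handler stack retries failed requests and enforces timeouts.
 */
function createRetryingClient()
{
    $stack = HandlerStack::create();   // default handler (cURL where available)
    $stack->push(Middleware::retry('retryDecider', 'retryDelay'));

    return new Client([
        'handler'         => $stack,
        'connect_timeout' => 2,   // seconds to wait for the connection
        'timeout'         => 5,   // seconds allowed for the whole request
    ]);
}

// Example call (needs network access):
// $response = createRetryingClient()->get('http://www.google.com/');
```

Keeping the decider as a plain function makes the retry policy easy to check in isolation, without sending any real requests.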

You will find the code in guzzle-examples; it produces output like this:

2016-05-18T11:14:17+00:00 Retrying GET http://www.rapidshare.com/ 1/2, cURL error 7: Failed to connect to www.rapidshare.com port 80: Connection refused (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)
2016-05-18T11:14:17+00:00 Retrying GET http://www.notexistingweirdstuff.com 1/2, cURL error 6: Could not resolve host: www.notexistingweirdstuff.com (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)
2016-05-18T11:14:18+00:00 Retrying GET http://www.notexistingweirdstuff.com 2/2, cURL error 6: Could not resolve host: www.notexistingweirdstuff.com (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)
2016-05-18T11:14:18+00:00 Got a NOT successful response for url http://www.notexistingweirdstuff.com with reason cURL error 6: Could not resolve host: www.notexistingweirdstuff.com (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)
2016-05-18T11:14:18+00:00 Got a successful response for url http://www.google.co.in/ with index 14 with size 12598 bytes
2016-05-18T11:14:18+00:00 Got a successful response for url http://www.google.de/ with index 16 with size 10495 bytes
2016-05-18T11:14:18+00:00 Got a successful response for url http://www.google.com.hk/ with index 17 with size 19635 bytes
2016-05-18T11:14:18+00:00 Got a successful response for url http://www.google.com/ with index 1 with size 10531 bytes
2016-05-18T11:14:18+00:00 Got a successful response for url http://www.msn.com/ with index 10 with size 40455 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.google.co.uk/ with index 21 with size 10498 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.google.fr/ with index 24 with size 10570 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.sina.com.cn/ with index 19 with size 558363 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.wikipedia.org/ with index 7 with size 74229 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.facebook.com/ with index 2 with size 57744 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.microsoft.com/ with index 22 with size 72105 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.google.co.jp/ with index 28 with size 10648 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.google.com.br/ with index 30 with size 10939 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.bing.com/ with index 25 with size 87091 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.yahoo.co.jp/ with index 13 with size 19144 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.amazon.com/ with index 20 with size 229168 bytes
2016-05-18T11:14:19+00:00 Retrying GET http://www.rapidshare.com/ 2/2, cURL error 28: Connection timed out after 2000 milliseconds (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.youtube.com/ with index 3 with size 404726 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.wordpress.com/ with index 18 with size 11761 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.google.it/ with index 36 with size 10565 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.yandex.ru/ with index 27 with size 55527 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.live.com/ with index 6 with size 9955 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.linkedin.com/ with index 29 with size 41503 bytes
2016-05-18T11:14:19+00:00 Got a successful response for url http://www.google.es/ with index 39 with size 11536 bytes
2016-05-18T11:14:20+00:00 Got a successful response for url http://www.twitter.com/ with index 12 with size 254032 bytes
2016-05-18T11:14:20+00:00 Got a successful response for url http://www.ebay.com/ with index 26 with size 183760 bytes
2016-05-18T11:14:20+00:00 Got a successful response for url http://www.blogger.com/ with index 9 with size 60952 bytes
2016-05-18T11:14:20+00:00 Got a successful response for url http://www.163.com/ with index 31 with size 748204 bytes
2016-05-18T11:14:20+00:00 Got a successful response for url http://www.conduit.com/ with index 37 with size 11666 bytes
2016-05-18T11:14:20+00:00 Got a successful response for url http://www.baidu.com/ with index 8 with size 99045 bytes
2016-05-18T11:14:20+00:00 Got a successful response for url http://www.flickr.com/ with index 33 with size 216031 bytes
2016-05-18T11:14:20+00:00 Got a successful response for url http://www.vkontakte.ru/ with index 38 with size 6112 bytes
2016-05-18T11:14:20+00:00 Got a successful response for url http://www.qq.com/ with index 11 with size 627410 bytes
2016-05-18T11:14:20+00:00 Got a successful response for url http://www.fc2.com/ with index 35 with size 35332 bytes
2016-05-18T11:14:20+00:00 Got a successful response for url http://www.myspace.com/ with index 23 with size 269004 bytes
2016-05-18T11:14:21+00:00 Got a successful response for url http://www.yahoo.com/ with index 5 with size 457526 bytes
2016-05-18T11:14:21+00:00 Got a successful response for url http://www.craigslist.org/ with index 34 with size 39282 bytes
2016-05-18T11:14:21+00:00 Retrying GET https://world.taobao.com 1/2, cURL error 28: Operation timed out after 0 milliseconds with 0 out of 0 bytes received (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)
2016-05-18T11:14:21+00:00 Got a successful response for url http://www.mail.ru/ with index 32 with size 238892 bytes
2016-05-18T11:14:21+00:00 Got a NOT successful response for url http://www.rapidshare.com/ with reason cURL error 28: Connection timed out after 2195 milliseconds (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)
2016-05-18T11:14:23+00:00 Retrying GET https://world.taobao.com 2/2, cURL error 28: Operation timed out after 0 milliseconds with 0 out of 0 bytes received (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)
2016-05-18T11:14:24+00:00 Got a successful response for url http://www.taobao.com/ with index 15 with size 81011 bytes
40 urls done in 6.9037058353424s
success: 38
failed: 2
concurrency: 20
speed per page: 0.17259264588356s

Note that the index numbers appear out of order: the calls run concurrently, and each one takes a different amount of time to complete.

Feel free to use the code if you like :)
