The client code works just fine and the API is very easy to work with.
This is what I wanted to achieve:
I wanted to use a CDN to host a few PDF files. The files were created locally on my PC by a PHP command-line script executing Wkhtmltopdf for a few web pages. When the script had generated the PDF files successfully I picked up the PDF files in a loop and pushed them up to Cloudfiles. This was real simple to achieve and worked a treat.
Now to the problem. I wanted to keep a fairly long TTL, so the 72 hour default that Rackspace gives you seemed fine. Having a 72 hour TTL means that if I upload one version of a file and someone accesses that version of the PDF over HTTP, Rackspace (or actually Akamai, who Rackspace has partnered with to build this CDN feature) caches that version of the file on the edge server for 72 hours. No matter how many new versions of the file you upload to Cloudfiles, it will keep serving the version located on the edge server.
To solve this problem Rackspace has implemented something called edge purge. This is a facility that makes it possible to invalidate a file or a whole container (read: folder), so that the next time someone accesses the PDF they will be served the latest version uploaded to Cloudfiles. It sounded like a great solution for our little problem. One thing to keep in mind is that an edge purge is not immediate; it can take up to 15 minutes before it takes effect.
So the next step was to use the API to perform the edge purge. Once again this was very easy using the PHP API. I executed my script a few times to make sure that everything worked.
This is what the script does (a rough sketch in code follows the list):
- Use wkhtmltopdf to create the PDFs
- Use the Cloudfiles API to upload the new versions of the files
- Perform an edge purge on each PDF to make sure the end users get a new fresh version of the files
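Put together, the core of the script looked roughly like this. Treat it as a sketch rather than the actual code: the container name, URLs and file names are made up, and it assumes the usual php-cloudfiles classes (CF_Authentication, CF_Connection and so on) that ship with the bindings:

    <?php
    require_once 'cloudfiles.php'; // the php-cloudfiles bindings

    // Pages to render and the names the resulting PDFs get in the container (made up)
    $pages = array(
        'price-list.pdf' => 'http://example.com/price-list',
        'user-guide.pdf' => 'http://example.com/user-guide',
    );
    $pdfDir = sys_get_temp_dir();

    // 1. Use wkhtmltopdf to create the PDFs
    foreach ($pages as $pdfName => $url) {
        $pdfPath = $pdfDir . DIRECTORY_SEPARATOR . $pdfName;
        exec('wkhtmltopdf ' . escapeshellarg($url) . ' ' . escapeshellarg($pdfPath));
    }

    // 2. Upload the new versions to Cloudfiles
    $auth = new CF_Authentication('myusername', 'myapikey');
    $auth->authenticate();
    $conn = new CF_Connection($auth);
    $container = $conn->get_container('pdfs'); // already created and CDN-enabled

    foreach ($pages as $pdfName => $url) {
        $obj = $container->create_object($pdfName);
        $obj->content_type = 'application/pdf';
        $obj->load_from_filename($pdfDir . DIRECTORY_SEPARATOR . $pdfName);

        // 3. Edge purge the PDF so end users get the fresh version
        $obj->purge_from_cdn();
    }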
But after running the script a few more times it started failing with this exception:

    InvalidResponseException: Invalid response (400): in C:\Users\johne\workspace\serverconf\3pp\cloudfiles\cloudfiles.php on line 2475
I contacted Rackspace Cloud support via chat, and after quite a lengthy investigation they told me that I had gone over the limit for the number of allowed edge purge requests in one day.
It turns out that they only allow API users to edge purge 25 objects (containers or files) every 24 hours. My test script was hitting this limit in minutes. The PHP API doesn't give you any clear text error message; you need to run curl directly to understand what is really going on.
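For the record, this is roughly what checking with curl looks like (a sketch; the account details and object name are made up, and the auth endpoint and headers are the ones the Cloudfiles API used at the time). You authenticate once to get a token plus the CDN management URL, then issue the purge yourself as a DELETE, which is more or less what purge_from_cdn does under the hood, and the raw response body should tell you why the request was rejected:

    # Authenticate: the response headers include X-Auth-Token and X-CDN-Management-Url
    curl -i -H "X-Auth-User: myusername" -H "X-Auth-Key: myapikey" \
         https://auth.api.rackspacecloud.com/v1.0

    # Purge one object from the edge
    curl -i -X DELETE -H "X-Auth-Token: <token from above>" \
         "<CDN management URL from above>/pdfs/price-list.pdf"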
How to solve the problem
I can't understand how a big site would cope with this. Imagine that you run a video site where all the videos are stored in the CDN. You are now starting to get copyright complaints and need to take a few of the videos off the site. You remove them from Cloudfiles and make sure to edge purge so nobody can get to the files. With this system you could only do this 25 times/day and that doesn't scale very well...
The edge purge solution started to look like a bit of a joke for what I wanted to use it for, so I had to take another approach.
The solution is to lower the TTL. You can actually set it as low as 15 minutes. This means that you will always have a lead time of up to 15 minutes until changes uploaded to Cloudfiles take effect from the end user's point of view. I guess that is fine, but it is not very elegant, and it means that much more data per day has to be fetched from the Cloudfiles storage out to the edge servers.
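Setting the lower TTL is a one-liner with the PHP bindings; make_public takes the TTL in seconds, so 15 minutes is 900 (again a sketch, with the same made-up container name as above):

    $container = $conn->get_container('pdfs');
    $container->make_public(900); // TTL in seconds: 900 = 15 minutes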
Something to consider:
The first time you have this problem and turn down the TTL, it will most likely not work unless you do an initial edge purge on the container. The new lower value will not take effect for content already cached under the previous 72 hour TTL until that longer TTL expires, unless you force a purge on it.
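In code that one-time switch-over is just the container-level purge call (same sketch assumptions as above):

    // One-time purge of the whole container so copies cached under the old
    // 72 hour TTL stop being served; after this the new 15 minute TTL applies.
    $container->purge_from_cdn();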
If you have a better solution for this feel free to leave a comment here or on Google+.
Someone has a similar problem here https://github.com/rackspace/php-cloudfiles/issues/37#issuecomment-3544894