Some of you might have noticed some instability on https://gathering.org tonight (Wednesday April 5th, 2017).
Ok, that was me. My bad.
It all started innocently enough. "Why don't you set up SSL on gathering.org? It'll only take 20 minutes!". Ah, well, no, as it turns out, it didn't take 20 minutes. As I knew it wouldn't.
We put up SSL weeks (months?) ago, using Let's Encrypt and whatnot. It was reasonably straightforward, but it revealed a whole lot of issues that have taken us a great deal of time to find and fix. At its core, most of the issues are simple: hard-coded links to paths using http. But finding these hard-coded references hasn't always been easy. Gathering.org isn't just a plain CMS; it is a Django (Python) site that also has static content hosted by Apache, a PHP component to control the front page (hosted on a different domain), a node.js component to control locking on said PHP component (...) and god knows what. And it integrates with wannabe for authentication.
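Hunting down those hard-coded references is mostly a matter of grepping through everything. A minimal Python sketch of that kind of scan, assuming the templates and code all live under one directory tree and that the interesting files are the ones in the hypothetical `suffixes` default (neither detail is from this post):

```python
import re
from pathlib import Path

# Matches absolute http:// URLs (but not https://). It stops at whitespace,
# quotes and angle brackets: good enough for a rough scan of templates and
# HTML, but not a real URL parser.
HTTP_URL = re.compile(r"""http://[^\s"'<>]+""")


def find_http_links(text):
    """Return every hard-coded http:// URL found in a string."""
    return HTTP_URL.findall(text)


def scan_tree(root, suffixes=(".html", ".py", ".php", ".js", ".css")):
    """Yield (path, line_number, url) for each http:// URL under root.

    The suffix list is a hypothetical default; adjust it to whatever the
    site actually contains.
    """
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for url in find_http_links(line):
                yield path, lineno, url
```

Usage would be something like `for path, lineno, url in scan_tree("templates"): print(f"{path}:{lineno}: {url}")`. The hard part, of course, is the stuff it won't find: URLs built at runtime, stored in a database, or living on another domain.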
But that was weeks ago. So what happened tonight?
In front of gathering.org there's a proxy, Varnish Cache, that caches content and makes sure that you spamming F5 doesn't bring the site down. Yours truly happens to be a Varnish developer, but Varnish was in use long before I arrived.
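For anyone wondering why a cache keeps F5-spamming from hurting: the first request for a page goes to the backend, and every request after that is answered from the cache until the cached copy expires. A toy Python illustration of that idea; `TTLCache` and its 120-second default are made up for this example, and real Varnish is a lot more involved than a dictionary:

```python
import time


class TTLCache:
    """Toy in-memory cache: a response is reused until its TTL expires.

    Illustrates the idea behind a caching proxy; it is not how Varnish
    is implemented.
    """

    def __init__(self, backend, ttl=120.0, clock=time.monotonic):
        self.backend = backend  # callable: path -> response body
        self.ttl = ttl          # seconds a cached object stays fresh
        self.clock = clock      # injectable so expiry can be tested
        self.store = {}         # path -> (expires_at, body)

    def get(self, path):
        now = self.clock()
        hit = self.store.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh copy: the backend is not touched
        body = self.backend(path)  # miss or expired: fetch once, then store
        self.store[path] = (now + self.ttl, body)
        return body


# Someone holding down F5 on the front page.
calls = []


def backend(path):
    calls.append(path)
    return f"rendered {path}"


cache = TTLCache(backend, ttl=120)
for _ in range(1000):
    cache.get("/")
print(len(calls))  # 1: only the first request reached the backend
```

Take the cache out of the chain and all 1000 requests land on the backend, which is fine right up until traffic shows up.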
Varnish, however, does not deal with SSL, it just does HTTP caching. So to get SSL we do:
Client --[ssl]--> Apache --[http]--> Varnish --> Apache --> (gunicorn/files/etc)
But I managed to mess it up months ago, and we ended up with:
Client --[ssl]--> Apache --> (gunicorn/files/etc)
Which works. Until you get traffic. And we predict some traffic spikes next week. So I went about fixing it.
But alas, the gathering.org site is a steaming pile of legacy shit (this is a technical term), and I met resistance every step along the way. So what I ended up doing was quite literally saying "fuck it" out loud, then deleting the entire Varnish configuration, rebuilding it from scratch, bypassing Apache where possible, deleting most of the Apache config, and establishing an archive site for old content. This was not how I had planned it, so it meant some quick improvising. Normally, this is a process you plan out for weeks. I did it in ... err, a couple of hours. Hooray?
So now we have:
Client --[ssl]--> Apache --[http]--> Varnish --> Gunicorn
Client --[ssl]--> Apache --[http]--> Varnish --> Apache (Static files, etc)
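Splitting traffic between the two backends boils down to a per-request decision, which in Varnish would typically live in `vcl_recv` as a rule setting `req.backend_hint`. The same decision, sketched in Python; the backend addresses and the `/static/` and `/media/` prefixes are hypothetical, since I haven't listed which paths Apache actually serves:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    host: str
    port: int


# Hypothetical addresses, for illustration only.
GUNICORN = Backend("gunicorn", "127.0.0.1", 8000)
APACHE_STATIC = Backend("apache", "127.0.0.1", 8080)

# Hypothetical path prefixes that Apache serves directly from disk.
STATIC_PREFIXES = ("/static/", "/media/")


def pick_backend(path: str) -> Backend:
    """Choose which backend the cache forwards a request to."""
    if path.startswith(STATIC_PREFIXES):
        return APACHE_STATIC  # plain files: no need to wake up Django
    return GUNICORN  # everything else is rendered by the Django app
```

For example, `pick_backend("/static/css/site.css")` returns `APACHE_STATIC`, while `pick_backend("/news/")` returns `GUNICORN`. Keeping the rule this dumb is the point: the fewer special cases in the cache layer, the fewer places for legacy weirdness to hide.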
And archive.gathering.org, which I also had to do some quick fixes on.
This meant fixing stuff in Apache, in Varnish and in DNS (BIND, setting up archive.gathering.org), debugging cross-site request forgery (CSRF) modules in Django and cache invalidation issues for editorial staff, running a regular expression over 10GB of archived websites reaching back to 1996, etc., etc. Probably lots more too.
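I'm not going to claim this is the exact regular expression that ran over the archive. As a hypothetical example of that kind of bulk rewrite, here is a Python sketch that points absolute `http://gathering.org/` (and `www.`) links at `https://archive.gathering.org/`. It works on raw bytes on the assumption that two decades of archived pages won't share one text encoding, and it only writes files that actually changed:

```python
import re
from pathlib import Path

# Hypothetical rewrite rule: absolute links to the old site become links
# to the archive host over HTTPS. Bytes patterns avoid decoding each file.
OLD_LINK = re.compile(rb"http://(?:www\.)?gathering\.org/", re.IGNORECASE)
NEW_LINK = b"https://archive.gathering.org/"


def rewrite_bytes(data):
    """Return (rewritten_content, number_of_replacements)."""
    return OLD_LINK.subn(NEW_LINK, data)


def rewrite_tree(root, suffixes=(".html", ".htm")):
    """Rewrite matching files under root in place; return total replacements."""
    total = 0
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        data = path.read_bytes()
        new, n = rewrite_bytes(data)
        if n:  # skip the write entirely when nothing matched
            path.write_bytes(new)
            total += n
    return total
```

Running something like this against a copy first (or with the archive under version control) seems wise; a mass in-place rewrite over 10GB is not something you want to undo by hand.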
By the time I was done, someone was ready to put an awesome "work in progress" graphic on the site.