Wannabe is an eternal mystery. A big dinosaur with huge, misshapen arms and legs, an extra tail and some large, abnormal lumps. In short, Wannabe is TG's “management system” for volunteers.
Most people probably know Wannabe as the system where you apply to join the crew at TG. You fill out page after page of incredibly polished and carefully worked text, hoping that you might finally get into the crew, or that you get to stay on. (Don't get me wrong here, we need lots of people in the crew.) The system then makes you wait; some wait a little, others wait a long time. And suddenly one day you get one of those welcome emails from Wannabe saying “woohoo, come and join the crew!”. At that point Wannabe has done its job for you, you think. From now on, all Wannabe is to you is a simple list of who your crew buddies are.
But what ELSE is Wannabe actually used for?
As you probably read at the top, Wannabe is a dinosaur. What that means is that at some point it was decided that everything should be managed from Wannabe. There are functions that naturally belong in Wannabe, such as handling crew and applications, sending SMS, mailing list administration, attendance, dietary and medical information, and the like. But then you also have things like TG's logistics system, lost and found, the accreditation module, the sleeping map and control of the info screens.
That means: a participant has lost something, so what do the crew members who go looking for it do? They use Wannabe. When you need to hand out accreditation to a journalist, what do you have to do? Log in to Wannabe. Want to update the sleeping map? Log in to Wannabe. Get information out to the participants via the many info screens? Log in to Wannabe.
A lovely chaos..
You really mustn't misunderstand me here, because Wannabe is a very good system, with its own little charm. It is just a dinosaur. And like all other dinosaurs, it is huge, hard to deal with and troublesome to keep track of. But it works! Wannabe works, does what it is supposed to and just trudges on.
Systems' role in all this
We are Systems, after all, and we are supposed to have control over this. We administer all the modules, update them, maintain them and give them the long-awaited love they need. We also do other things, such as granting access and approving profile pictures.
What's “in” these days is microservices: let the frontend scream at the backend through a RESTful API, which is naturally stateless and fumbling in the dark. That is probably also where Wannabe is headed, in a few years. Segmenting the services Wannabe offers is probably one of the best ways to go. A little town of Docker containers that talk to each other and cooperate in harmony to give the user whatever they might want. Ahh, what a lovely future..
And at the end of the day, we're just trying to have a system..
Get it? Systems, system? Eh….
See you at TG!
Last year we gathered as many as 40 people from the LAN community around Norway. The initiative got good feedback and seemed like a sensible thing to do.
This year, as last year, we want to gather technicians from the LAN/data party community all over Norway. This is low-threshold: the only thing we ask is that those who attend have been involved on the technical side of an event before. It could be one with 10 participants or 10,000. It doesn't matter.
The agenda will be fairly loose, like last year. If there is something on your mind that you want to talk about, nothing is stopping you.
This will most likely happen on Friday, and we will provide a room and probably something to chew on.
Does this sound interesting? You can sign up here: https://goo.gl/sLCRpY
If you are only coming for the meetup, we may be able to arrange some day tickets, but we can't guarantee anything yet, as we need to see how much interest there is. Get in touch and we'll figure it out!
If you have any further questions or comments, feel free to contact us at email@example.com
The Gathering has almost come to an end and it’s about time we posted some details about our network design.
Our network is designed in the traditional three-layered hierarchical model with the core, distribution and access layer where L3 is terminated at our distribution.
The core L3 switch consists of 2x Juniper QFX5100 in a virtual chassis, which is Juniper’s stacking technology. In addition to providing our distribution with uplinks, the core switch also connects our 80Gig backbone ring with our stand and border router.
Between our border router and the internet we have an inline Juniper SRX5800 which is capable of pushing 2Tbps worth of firewall throughput(!). This is where we terminate our BGP peering with Telenor and do route redistribution to OSPF, making the SRX our OSPF ASBR.
The L3 distribution switches consist of 3x Juniper EX3300 in a virtual chassis per distribution. Each one connects to the core using 2x 10Gbps single-mode transceivers patched into our MPO cassettes pulled from the ceiling. The distro redistributes its connected routes into the OSPF area and advertises them to the core.
The L2 access switches consist of 144+ Juniper EX2200, each with a 3x 1Gbps connection to our distribution. To protect our network at the edge, we run a series of security features collectively called first-hop security. This takes care of a lot of potential issues such as loops, spoofing and ARP-poisoning.
One of the design choices this year was to turn our backbone ring, which traverses the entire arena, into a virtual chassis instead of separate routers. This effectively means that it becomes a distribution switch for our crew network. This makes it easy for us to provision edge/access switches to our sponsors and crew areas. As a result we have for the first time ever provisioned our entire access network. Not a single access switch has been configured manually this year!
TL;DR – 40Gbps…
At TG16 we suffered several DDoS attacks towards our network and even our website (gathering.org). In order to be able to handle a potential DDoS attack this year we decided to upgrade our internet capacity from 40Gbps to 40Gbps + 10Gbps, where the newly added 10Gbps link would be reserved for our production environment. Instead of dedicating a single physical interface, we decided to include the interface in our aggregated interface and rate-limit our participants’ network to 40Gbps. This way we keep our production network alive when our participants’ network gets lit up.
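The arithmetic behind that choice can be sketched in a few lines of Python. This is only an illustration of the reasoning: the capacity figures come from the post, while the function and constant names are made up for the example and have nothing to do with the actual router configuration.

```python
# Illustrative only: names are hypothetical, figures are from the post.
AGGREGATE_GBPS = 40 + 10       # all links bundled into one aggregated interface
PARTICIPANT_CAP_GBPS = 40      # rate limit applied to the participants' network


def production_headroom_gbps(participant_demand_gbps: float) -> float:
    """Capacity left for production when participant traffic is capped."""
    participant_usage = min(participant_demand_gbps, PARTICIPANT_CAP_GBPS)
    return AGGREGATE_GBPS - participant_usage


if __name__ == "__main__":
    # Quiet night, saturated floor, and a flood well above the cap.
    for demand in (5, 40, 100):
        print(f"demand {demand:>3} Gbps -> "
              f"{production_headroom_gbps(demand)} Gbps left for production")
```

However much traffic hits the participants' network, the cap means production never has less than the extra 10Gbps available, and it can borrow unused capacity when the floor is quiet.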
The party is well underway, and I was dumb enough to say aloud the phrase “We should probably blog something?”. Everyone agreed, and thus told me to do it. Damn it.
Anyway, things are going disturbingly well. We were done with our setup 24 hours ahead of time, more or less. I’ve had the special honor of being the first to get a valid DHCP lease in the NOC and the first to get a proper DHCP lease “on the floor”. And I’ve zeroized the entire west side of the ship (i.e. reset the switches to get them to request proper configuration, which involved physically walking to each switch with a console cable and laptop).
But we have had some minor issues.
First, which you might have picked up, we have to tickle the edge switches a bit to get them to request configuration. This cost us a couple of hours of delay during the setup. And it means that whenever we get a power failure, our edge switches boot up in a useless state and we have to poke them with a console cable. We’ve been trying to improve this situation, but it’s not really a disaster.
We’ve also had some CPU issues on our distribution switches, mainly whenever we power on all the edge switches. To reduce the load, we disabled LACP – the protocol used to control how the three uplinks to each edge switch are combined into a single link. This worked great, until we ran into the next problem.
The next problem was a crash on one of the EX3300 switches that make up a distribution switch (each distribution switch has 3 EX3300 switches in a virtual chassis). We’re working with Juniper on the root cause of these crashes (we’ve had at least 4 so far as I am aware). A single member in a VC crashing shouldn’t be a big deal. At worst, we could get about a minute or two of down-time on that single distribution switch before the two remaining members take over the functionality.
However, since we had disabled LACP earlier, that caused some trouble: the link between the core router and the distribution switch didn’t come back up again, because that’s a job for LACP. This happened to distro7 on Wednesday. We were able to bring distro7 up again quite fast regardless, even with a member missing.
After that, we re-enabled LACP on all distribution switches, which was the cause of the (very short) network outage on Wednesday across the entire site.
Other than that, there is little to report. On my side, being in charge of monitoring and tooling (e.g.: Gondul), the biggest challenge is the ring now being a single virtual chassis making it trickier to measure the individual members. And the fact that graphite-api has completely broken down.
Oh, and we’ve had to move our SRX firewall, because it was getting far too hot… more on that later?
This time, there were no funny pictures though!
This year our wireless seems to be a lot more stable than ever before.
Unfortunately I’m not much of a wireless guy myself, so I can’t go into all the details, but I just felt like writing a small post about our wireless anyway.
This year we have Fortinet onboard to provide us with equipment for the wireless network.
The equipment we use consists of a total of 162 FortiAP U421EV (deployed) and 2 FortiWLC-3000D.
As of right now we have 1517 clients, where 1487 of them are on 5GHz and 30 on 2.4GHz.
The 2.4GHz SSID is hidden and mostly used for equipment that doesn’t support 5GHz, like the PS4 for example.
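To put those client numbers in perspective, here is a small Python sketch that works out the band split and the average load per access point. The figures are the ones quoted in this post; the variable and function names are just for illustration, and the per-AP number is a plain average that assumes nothing about how clients are actually spread.

```python
# Figures quoted in the post; names are illustrative.
CLIENTS_5GHZ = 1487
CLIENTS_24GHZ = 30
DEPLOYED_APS = 162


def share_percent(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part / total


total_clients = CLIENTS_5GHZ + CLIENTS_24GHZ
share_5ghz = share_percent(CLIENTS_5GHZ, total_clients)
share_24ghz = share_percent(CLIENTS_24GHZ, total_clients)
clients_per_ap = total_clients / DEPLOYED_APS

if __name__ == "__main__":
    print(f"Total clients:  {total_clients}")
    print(f"5GHz share:     {share_5ghz:.1f}%")
    print(f"2.4GHz share:   {share_24ghz:.1f}%")
    print(f"Clients per AP: {clients_per_ap:.1f} (average)")
```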
We have performed some testing by roaming around the Viking Ship and got an average of 20ms latency and 35-ish Mbps. Our wireless guy almost managed to watch an entire episode on Netflix without major disruption while walking around!
The access points are spread evenly across the Viking Ship.
We have one access point at the end of every second row on each side of the Viking Ship, as well as one access point per row on each side.
Well.. it’s kinda hard to explain in words, so have a low-quality picture from my phone of our drawing board in the NOC:
The red dots across the tables represent the AP placements in the main area of the Viking Ship.
Below you can see a map with the AP placement from our wireless controller.
Well.. as I said, I’m not much of a wireless guy, but I hope you got some interesting information about how our wireless is deployed this year.
If you have any questions regarding Wireless at TG17, feel free to contact us on the official discord server for The Gathering, channel #tech 🙂
And meanwhile, we are on the lookout for trouble.
Our lights in front of the NOC have turned green!
All edge switches are now up and running.
We will perform some tweaks and do a DHCP-run shortly.
See you soon!
Glenn is amazed by the servers. Or lights. Or both. Too much coffee?
Thank you Nextron, for blazing fast servers!
Photo: Joachim Tingvold
It’s Saturday. On Wednesday 5000 people storm the fort. Or in our case, the Viking Ship.
And the good news is, I can now sit in the NOC and enjoy moving pictures of cute animals.
On a slightly more serious note:
We’ve got the internet uplink up (a week ago).
We’ve got net in the NOC.
The ring is up.
We’re setting up a distro-switch to test a potential issue we might have to work around.
Our server rack is mostly up.
And we’ve delivered the first few access nets to the various crews that need it to get work done.
We still have quite a bit of stuff to do, but we’re (slightly) ahead of schedule. To get an idea of what’s left, here are some (!) of our access switches, still in their boxes.
A few meters worth of fiber cable.
And with one last image, I’m signing off and going back to my cat picture. I mean back to work.
See you soon!
Some of you might have noticed some instability on https://gathering.org tonight (Wednesday April 5th, 2017).
Ok, that was me. My bad.
It all started innocently enough. “Why don’t you set up SSL on gathering.org? It’ll only take 20 minutes!”. Ah, well, no, as it turns out, it didn’t take 20 minutes. As I knew it wouldn’t.
We put up SSL weeks (months?) ago, using Let’s Encrypt and whatnot. It was reasonably straightforward, but it revealed a whole lot of issues that have taken us a great deal of time to find and fix. At its core most of the issues are simple: hard-coded links to paths using http. But finding these hard-coded references hasn’t always been easy. Gathering.org isn’t just a plain CMS, it is a Django (Python) site that also has static content hosted by Apache, a PHP component to control the front page (hosted on a different domain), a node.js component to control locking on said PHP component (…) and god knows what. And it integrates with Wannabe for authentication.
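To give an idea of what hunting for those hard-coded references looks like, here is a minimal Python sketch. It is not the script we actually used; the function names are made up, and it assumes that only links pointing at gathering.org itself (with or without www) should be upgraded, leaving third-party http links alone.

```python
import re

# Assumption: only links to this host are upgraded to https.
HOST = "gathering.org"

# Match http://gathering.org or http://www.gathering.org, but not
# look-alike hosts such as http://gathering.org.example.com.
HTTP_LINK = re.compile(
    r"http://((?:www\.)?" + re.escape(HOST) + r")(?![\w-]|\.\w)"
)


def find_http_links(text: str) -> list[str]:
    """Return every hard-coded http:// reference to HOST found in text."""
    return [match.group(0) for match in HTTP_LINK.finditer(text)]


def upgrade_links(text: str) -> str:
    """Rewrite hard-coded http:// references to HOST so they use https://."""
    return HTTP_LINK.sub(r"https://\1", text)


if __name__ == "__main__":
    sample = (
        '<a href="http://gathering.org/tg17/">TG17</a> '
        '<img src="http://www.gathering.org/logo.png"> '
        '<a href="http://example.com/">elsewhere</a>'
    )
    print(find_http_links(sample))
    print(upgrade_links(sample))
```

In practice the painful part is not the regular expression but the number of places such links can hide: templates, database content, static files and the other components mentioned above.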
But that was weeks ago. So what happened tonight?
In front of gathering.org there’s a proxy, Varnish Cache, that caches content and makes sure that you spamming F5 doesn’t bring the site down. Yours truly happens to be a Varnish developer, but Varnish was in use long before I arrived.
Varnish, however, does not deal with SSL, it just does HTTP caching. So to get SSL we do:
Client –[ssl]–> Apache –[http]–> Varnish –> Apache –> (gunicorn/files/etc)
But I managed to mess it up months ago, and we ended up with:
Client –[ssl]–> Apache –> (gunicorn/files/etc)
Which works. Until you get traffic. And we predict some traffic spikes next week. So I went about fixing it.
But alas, the gathering.org site is a steaming pile of legacy shit (this is a technical term). And I met resistance every step along the way. So what I ended up doing was quite literally saying “fuck it” out loud, then deleting the entire Varnish configuration, rebuilding it from scratch, bypassing Apache where possible, deleting most of the Apache config, and establishing an archive site for old content. This was not how I had planned it, but it meant some quick improvising. Normally, this is a process you plan out for weeks. I did it in … err. a couple of hours. Hooray?
So now we have:
Client –[ssl]–> Apache –[http]–> Varnish –> Gunicorn
Client –[ssl]–> Apache –[http]–> Varnish –> Apache (Static files, etc)
And archive.gathering.org – which I also had to do some quick fixes on.
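The routing decision in that setup boils down to something like the Python sketch below. The real logic lives in the Varnish configuration, which is not shown here; the path prefixes, the backend names and the function name are all assumptions made up for the illustration.

```python
# Hypothetical values: the real site layout may differ.
STATIC_PREFIXES = ("/static/", "/media/")
ARCHIVE_HOST = "archive.gathering.org"


def pick_backend(host: str, path: str) -> str:
    """Decide which backend should serve a request that reached Varnish."""
    if host == ARCHIVE_HOST:
        return "archive"    # old content, served from the archive site
    if path.startswith(STATIC_PREFIXES):
        return "apache"     # static files, etc.
    return "gunicorn"       # everything else goes to the Django app


if __name__ == "__main__":
    for host, path in [
        ("gathering.org", "/tg17/"),
        ("gathering.org", "/static/css/site.css"),
        ("archive.gathering.org", "/tg96/"),
    ]:
        print(f"{host}{path} -> {pick_backend(host, path)}")
```

The point of the new layout is that Apache is no longer in the path for dynamic pages at all, so Varnish can talk to Gunicorn directly.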
This meant fixing stuff in Apache, in Varnish, in DNS (bind – setting up archive.gathering.org), debugging cross-site-request-forgery modules in django, cache invalidation issues for editorial staff, running a regular expression over 10GB of archived websites reaching back to 1996, etc etc. Probably lots more too.
By the time I was done, someone was ready to put an awesome “work in progress” graphic on the site.