Part two of this series has been published and is available here.
TG12 is fast approaching, and the tech:net crew is hard at work on the network design for this year's event.
As we cannot yet go into details on the TG12 network, today we will have a closer look at the network design from TG11.
This is part one in a series of articles leading up to TG12.
Last year's network was, in pure numbers, the best-performing network we've ever had the pleasure of implementing at The Gathering. I'm sure everyone remembers the 100G internet connection, and our lovely partnership with Altibox.
However, while numbers are cool and big fat pipes are fun, there is still a non-trivial amount of work involved in designing and configuring a network on this scale.
Below are some examples of that work - please keep in mind that this can get a bit technical. Feel free to ask questions if anything is unclear.
For reference, our configurations are located here: ftp://ftp.gathering.org/TG/2011/Tech/netconfig.tar.gz
1. Brainstorming

When the tech:net crew has been chosen, we usually get together to brainstorm and come up with ideas for the network. This is where the outline for the high-level design is drawn.
At this point, we have some general idea of what we wish to do, and what devices we will have available. This allows us to create the big picture, so we have something to work with.
2. Testing and implementing
Before every TG, there is something that either went wrong last year, or is new functionality we wish to put in. Sometimes we test this in a lab before heading off to Hamar, sometimes we don't.
Here are a few examples of things we have thought about (and sometimes tested!) over the past years:
2.1 Network hardening
To make sure participants cannot bring down the entire network by either inadvertently doing something wrong, or by malicious behavior, some security has to be in place. We have identified some scenarios that would potentially be harmful.
Most of these are scenarios where a participant brings their own switch and cables it incorrectly in one way or another. We try to mitigate these things at the core switch level, to keep the configuration on the edge devices as clean and simple as possible.
Situations like these should not cause harm to the network, as Spanning Tree is running on the D-Link switches: the switch sees its own BPDUs and blocks the port. We do, however, not like anyone trying anything like this, as it could potentially create operational instabilities on your local switch, depending on timers and how dumb your switch actually is.
If, however, we should for some reason have a loop on the edge switch, it is imperative that this does not bring down the core switches. There are industry-standard features designed to mitigate these problems, and here are a few:
* Port-security (limiting the number of MAC addresses learned)
* Storm-control (limiting the amount of broadcast and multicast traffic heading into the switch)
* BPDU guard (automatically shutting down the port when BPDUs are received)
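As a rough sketch, the three features above could be combined on a Cisco IOS access port like this. Note that the interface name, MAC limit and storm-control thresholds here are illustrative, not taken from our config pack:

```
interface GigabitEthernet1/0/1
 description participant-facing access port (illustrative)
 switchport mode access
 ! port-security: limit the number of MAC addresses learned
 switchport port-security
 switchport port-security maximum 10
 switchport port-security violation restrict
 ! storm-control: drop broadcast/multicast above 1% of line rate
 storm-control broadcast level 1.00
 storm-control multicast level 1.00
 ! BPDU guard: err-disable the port if a BPDU is received
 spanning-tree bpduguard enable
```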
Our solution is a little bit more elegant though.
We use Layer 3 ports towards every edge switch. No VLANs are tagged towards the edge:
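A routed downlink towards an edge switch looks roughly like this. The interface name and addresses are made up for illustration; the real addressing is in the config pack:

```
interface GigabitEthernet2/1
 description downlink to edge switch (illustrative)
 ! make this a routed (Layer 3) port - no VLAN tagging, no spanning-tree
 no switchport
 ip address 10.10.1.1 255.255.255.0
 ipv6 address 2001:db8:10:1::1/64
```

Because the port is routed, any loop or spanning-tree mishap on the edge switch stays inside that edge switch's broadcast domain and cannot propagate through the core.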
This effectively eliminates the need for spanning-tree to protect us, and we are only left with the problem of a broadcast storm. This means that we have to protect the CPU of our core switch:
Our control-plane policer (CoPP) makes sure our CPU does not get overloaded with bad traffic. In essence this is a QoS policer that works internally in our box. For the full configuration, refer to the config pack - look at CoreN or CoreS, for example.
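As a simplified sketch of how such a policer is built - the class name, ACL and rate below are illustrative, see CoreN or CoreS in the config pack for the real policy - CoPP is an ordinary MQC policy attached to the control-plane:

```
! classify one kind of traffic destined for the CPU
ip access-list extended COPP-ICMP-ACL
 permit icmp any any
!
class-map match-all COPP-ICMP
 match access-group name COPP-ICMP-ACL
!
! rate-limit that class; everything above the rate is dropped
policy-map COPP
 class COPP-ICMP
  police 64000 conform-action transmit exceed-action drop
!
! attach the policy to traffic punted to the CPU
control-plane
 service-policy input COPP
```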
Some protocols aren't handled very well by the CoPP - these have to be dealt with at the line-card level on the 6500/7600. ARP is one such protocol:
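On the 6500/7600 this is done with a global hardware rate-limiter rather than the CoPP policy. A sketch, with an illustrative rate:

```
! enable QoS globally on the chassis
mls qos
! rate-limit ARP packets in hardware on the line cards
! (packets-per-second value is illustrative)
mls qos protocol arp police 32000
```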
In the upcoming parts, we will have a look at more scenarios, as well as the Layer 3 routing design for IPv4 and IPv6. Stay tuned!
In the meantime, please leave your questions and comments in the comment fields below :)