Written by Joe Jefford, Senior Network Engineer at LINX
We have been working hard over the last few months to refresh the LON1 infrastructure. As you can imagine, moving almost 800 member ports across 10 datacentres is no small task, so this has been a huge effort from everyone on the team involved!
Following on from our first blog on the LON1 member migration project, which explained why we needed to move from the MX960 platform to a multi-vendor Juniper and Nokia MPLS EVPN solution, we are taking a dive into the work conducted on site for a project of this size!
Initial Audits
Once the team had completed the relevant proof of concept and lab testing and produced a network design, it was time to complete initial site audits. These started way back at the end of 2023 and consisted of visits to all LON1 datacentres to audit each site in several areas. This included ensuring the racks would be suitable to take the new SR-2 chassis. These are quite deep chassis, so at a couple of sites (Equinix LD9, Customs House West) it was clear early on that the legacy racks were not deep enough, and work began with the datacentres to secure new, deeper racks to house them. The audits also looked at power, both spare ports and the power draw available in the racks, with some sites needing additional power installed to allow the new and old devices to dual run for a period.
We also took this opportunity to look at where we could improve rack layouts and designs to clean up legacy issues, as well as future-proofing the racks for growth. A number of sites needed quite substantial moves to relocate equipment, so full new rack elevations were completed and install plans drawn up for every site.
Once these plans were done we were able to look at things such as patch panel and management capacity to ensure we had everything we needed to move forward, and to order anything we didn’t. Doing these initial audits so early gave us plenty of time to get in contact with the various partners and suppliers and get things ready.
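As a rough illustration of the power part of such an audit, the sketch below checks whether a rack could dual run the old and new devices. It is not LINX tooling: the `RackPower` fields, the `dual_run_issues` helper and the 80% safety margin are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RackPower:
    """Hypothetical per-rack power figures gathered during a site audit."""
    site: str
    feed_capacity_w: float  # usable capacity of the rack power feed
    existing_draw_w: float  # draw of the equipment already installed
    new_draw_w: float       # expected draw of the new chassis
    spare_outlets: int      # free power outlets in the rack
    outlets_needed: int     # outlets the new chassis requires


def dual_run_issues(rack: RackPower, margin: float = 0.8) -> list[str]:
    """Return the reasons a rack cannot dual run; an empty list means it can.

    The margin (an assumed 80%) keeps the combined draw below the feed capacity.
    """
    issues = []
    budget = rack.feed_capacity_w * margin
    total = rack.existing_draw_w + rack.new_draw_w
    if total > budget:
        issues.append(f"draw {total:.0f} W exceeds budget {budget:.0f} W")
    if rack.spare_outlets < rack.outlets_needed:
        issues.append(
            f"needs {rack.outlets_needed} outlets, {rack.spare_outlets} spare"
        )
    return issues
```

A rack that returns a non-empty list would be a candidate for additional power before the new chassis arrives.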
Installations
We started receiving the equipment from Nokia early in 2024, with the first SR-2 being installed in March. As the core of the network was remaining as it was (with some ISL capacity changes) rather than being completely refreshed, we took a ‘site by site’ approach to this work instead of a complete dual build. This approach was also driven by the power draw required to run both the MX960s and SR-2s concurrently, as well as the amount of pre-cabling required for the member moves.
The installations were relatively simple following the planning. Nokia ship these devices completely modular, so you also get the fun of fully building each device yourself, like very expensive adult Lego.
If the site was one that required other equipment to be moved or racks to be re-designed, this was of course done before the new chassis arrived on site. An example of this is Equinix LD6, where the whole of the management required relocating before the chassis and breakout devices could be installed.
Once the devices were installed and reachable via management, the team worked on configuring them via automation and passing them through the various “readiness for service” tests that each device has to go through before being brought onto the network.
More Audits!
With the devices installed it was time to audit and plan the member moves. With over 700 members to move this seemed like quite a daunting task; however, broken down to site level and then into smaller chunks from there, it was much more manageable.
We created a standardised audit template for member moves at each site and then got to work auditing the cables. All our cables at LINX are referenced at each end, and we have recently done a lot of work improving the internal systems that record them. So we were able to export the data for all these cables into the templates, with member port and various patch panel information. It was then a case of verifying the cables on site against the data. Sitting in datacentres reading out cable references for hours on end is not the most exciting or interesting task. However, with these moves a huge amount of the work is in the preparation, and no one wants to be tracing rogue cables at 3am, so it was worth the effort.
Once these audits were complete, full port maps and plans could be drawn up for the member moves themselves.
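To make the comparison step concrete, here is a minimal sketch of checking exported cable records against what is read out on site. The `CableRecord` shape and the `compare_audit` helper are hypothetical and only illustrate the idea; the real templates and internal systems are not described in this post.

```python
from typing import NamedTuple


class CableRecord(NamedTuple):
    """Hypothetical shape of one exported cable record."""
    cable_ref: str
    member_port: str
    patch_panel: str


def compare_audit(
    exported: list[CableRecord], seen_on_site: dict[str, str]
) -> dict[str, list[str]]:
    """Compare exported records with {cable_ref: patch_panel} read on site."""
    recorded = {rec.cable_ref: rec for rec in exported}
    # Cables the records say exist but nobody found in the rack.
    missing = sorted(ref for ref in recorded if ref not in seen_on_site)
    # Cables found in the rack that have no record at all.
    unrecorded = sorted(ref for ref in seen_on_site if ref not in recorded)
    # Cables found, but on a different patch panel position than recorded.
    mismatched = sorted(
        ref
        for ref, panel in seen_on_site.items()
        if ref in recorded and recorded[ref].patch_panel != panel
    )
    return {"missing": missing, "unrecorded": unrecorded, "mismatched": mismatched}
```

Anything that lands in one of the three lists is a discrepancy to resolve in daylight rather than during the maintenance window.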
Pre-cabling
Depending on the site, some moves were “in rack”, where the new chassis was in the same rack as the existing one. For these moves we simply pre-cabled from the new chassis and left the cables hanging at the relevant patch panel ready for the moves.
In a lot of our suite sites we have dedicated “ODF” patch racks where the cabling comes in. Here the patching from the chassis to the patch panels in the racks can be fully completed, with the cables then run in the ODF from the ‘switch side’ to the ‘member side’, where the cross connects come in, and left hanging there ready to be moved.
These cables were often run in shortly after completing the member cabling audits. We had to time this quite tightly with the maintenances themselves so as not to have vast amounts of cabling left hanging in our racks or patch racks for too long.
These new cable references were then all recorded and added to the member moves pages, creating simple tables to follow for the cable swaps on the night. As mentioned before, these moves were all completed overnight, so making them as well prepared and simple to action as possible was key.
It wasn’t just member cables that needed moving here; during the work a large number of LINX servers were also moved. Some of the non-member-facing ones could be moved as soon as the switch was ready, but other devices such as route servers required planning and pre-cabling so they could be moved on the night.
Configuration and Maintenance
Once all the audits, planning and pre-cabling works are complete we can prepare the configuration for the member moves. As all network configuration on the LINX peering LANs is pushed via our in-house automation, we also have tools available to pre-prepare these configurations. We can upload the port moves to our “bulk uploader” tool, which stores all of the member port changes, ready to be pushed on the night of the maintenance.
During these maintenances we generally break the moves up into sections, depending on the number of member ports to be moved, so members are not down for too long while the cabling is moved. The remote team will then push BCP-214 configurations to gracefully shut down peering sessions to affected members, followed by the port move configuration.
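As an illustration of the kind of sanity check that is useful before staging a list of port moves, the sketch below looks for ports that appear more than once. The `PortMove` record and `find_conflicts` helper are hypothetical; they are not the interface of the “bulk uploader” tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PortMove:
    """Hypothetical record of one member port move."""
    member: str
    old_port: str
    new_port: str


def find_conflicts(moves: list[PortMove]) -> dict[str, list[str]]:
    """Find ports that appear more than once in a list of planned moves."""
    old_counts = Counter(move.old_port for move in moves)
    new_counts = Counter(move.new_port for move in moves)
    return {
        # The same existing port scheduled to move twice.
        "old_ports_listed_twice": sorted(p for p, n in old_counts.items() if n > 1),
        # Two moves landing on the same new port.
        "new_ports_assigned_twice": sorted(p for p, n in new_counts.items() if n > 1),
    }
```

Both lists being empty does not prove the plan is right, but a non-empty list is a cheap early warning that two rows of the port map disagree.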
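The batching described here can be sketched as follows. The function names and the fixed batch size are assumptions for illustration; in practice the split depends on the site and the number of ports involved.

```python
def plan_batches(member_ports: list[str], batch_size: int) -> list[list[str]]:
    """Split the ports to be moved into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        member_ports[i:i + batch_size]
        for i in range(0, len(member_ports), batch_size)
    ]


def remote_steps(batches: list[list[str]]) -> list[tuple[int, str, list[str]]]:
    """List the remote team's pushes for each batch, in the order described above."""
    steps = []
    for number, batch in enumerate(batches, start=1):
        # Sessions are shut down gracefully first, then the ports are re-pointed.
        steps.append((number, "push BCP-214 configuration", batch))
        steps.append((number, "push port move configuration", batch))
    return steps
```

Keeping each batch small limits how many members are affected at once, at the cost of more push-and-swap cycles across the night.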
Once this is pushed the on-site team moves all of the cables as per the plan, with both teams testing and confirming the ports are up, stable and error free.
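One way to express “up, stable and error free” is to compare two readings of a port taken some time apart. The sketch below does that; the `PortSnapshot` fields are assumed for the example and do not correspond to any particular vendor's counters.

```python
from dataclasses import dataclass


@dataclass
class PortSnapshot:
    """Hypothetical reading of a port's state and counters at one moment."""
    oper_up: bool
    flap_count: int     # number of link state changes seen so far
    input_errors: int
    output_errors: int


def port_is_healthy(before: PortSnapshot, after: PortSnapshot) -> bool:
    """Judge a moved port from two snapshots taken some time apart."""
    up = before.oper_up and after.oper_up
    # Stable: the link did not change state between the two readings.
    stable = after.flap_count == before.flap_count
    # Error free: neither error counter increased between the readings.
    error_free = (
        after.input_errors == before.input_errors
        and after.output_errors == before.output_errors
    )
    return up and stable and error_free
```

Comparing counter deltas rather than absolute values matters here, because a freshly moved port may carry old counter values that say nothing about the new cabling.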
For some of the maintenances, depending on the topology and traffic levels, we would also move or increase ISL capacity during or after the member moves, using reclaimed capacity from the vacated switches.
Once complete, final tests and checks are carried out by the team to ensure there are no outstanding issues before closing the maintenance off.
Clean Up
After the moves are complete there is still some final clean-up work to be done. The first part is documentation: all our cabling is recorded in internal systems, and these need to be updated to reflect the moves. A lot of this can be pre-staged ready to be imported after the maintenance, but it often needs some adjusting against the plan for any cable or port changes made due to issues on the night.
There will also be a large amount of hanging cabling left where the ‘hot cuts’ have been completed, which requires further site visits to remove. We get this done as soon as possible after the moves to avoid having such cabling in the racks for too long.
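A minimal sketch of that adjustment step, assuming the pre-staged data is a simple mapping from cable reference to new port (the real record format is not described here, and `apply_night_changes` is a hypothetical helper):

```python
def apply_night_changes(
    planned: dict[str, str], changes: dict[str, str]
) -> dict[str, str]:
    """Merge on-the-night port changes into the pre-staged {cable_ref: port} plan."""
    # A change for a cable that was never in the plan is more likely a typo
    # than a real move, so refuse it rather than import it silently.
    unknown = sorted(set(changes) - set(planned))
    if unknown:
        raise KeyError(f"changes reference cables not in the plan: {unknown}")
    final = dict(planned)  # copy so the original plan is kept for comparison
    final.update(changes)
    return final
```

The result is what would be imported, while the untouched `planned` mapping remains available to show exactly what differed from the plan.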