A few years back I had the opportunity to replace one N5K switch in a vPC pair. Some of the physical connectivity was already in place, and what follows is what I did from the start. Before the change I also consulted the Cisco documentation, but I found several of the documented steps to be wrong.
So I decided to write down the steps I actually performed while replacing one of the N5K switches and then bringing vPC back up for all the FEXs connected to the pair.
In modern data centers, replacing Nexus 5K switches in a vPC (Virtual Port Channel) setup is a critical task that ensures network continuity and performance.
This article explores the process and considerations involved in replacing a Nexus 5K switch within a vPC domain, highlighting best practices for maintaining redundancy and minimizing downtime.
These are the steps we followed to bring the switch into production:
1. The replacement switch, “rio-n5k-core-s01” (core-01), was already running the same NX-OS image as its peer and already had a configuration. Some of its interfaces were shut down, including the peer-link, and the port facing core-01's management interface was shut down on the connected router. Everything was cabled to the replacement switch.
2. The existing configuration had ‘auto-recovery’ enabled and core-01 had ‘role priority 1’. We removed auto-recovery from both switches and changed the role priority on core-01 to 40000, so that core-01 could not take over the vPC primary role and there would be no traffic impact on rio-n5k-core-s02.
3. The pre-provisioning configuration was added to core-01. When we brought up the peer keepalive interface on mgmt0 and the vPC peer-link, we noticed that hosts were down. The logs on core-02 showed VLANs being suspended:
rio-n5k-core-s02# show logging last 20
2015 Jun 10 02:06:01.895 rio-n5k-core-s02 %ETHPORT-3-IF_ERROR_VLANS_SUSPENDED: VLANs 400 on Interface Ethernet199/1/9 are being suspended. (Reason: Vlan is not allowed on Peer-link)
2015 Jun 10 02:06:01.897 rio-n5k-core-s02 %ETHPORT-3-IF_ERROR_VLANS_SUSPENDED: VLANs 82 on Interface Ethernet198/1/9 are being suspended. (Reason: Vlan is not allowed on Peer-link)
2015 Jun 10 02:06:01.899 rio-n5k-core-s02 %ETHPORT-3-IF_ERROR_VLANS_SUSPENDED: VLANs 483 on Interface Ethernet197/1/48 are being suspended. (Reason: Vlan is not allowed on Peer-link)
We checked core-01 and found that it was missing VLAN configuration: the only VLANs allowed on port-channel 100 (the peer-link) were 1 and 600. (Note that interface VLAN 600 carried the IP address we were using to reach core-01 while its mgmt0 interface was shut down.)
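A check along the following lines, run before the peer-link is brought up, would likely have caught the mismatch. This is only a sketch: the VLAN list is limited to the three VLANs that appear in the log above, whereas the real list should mirror whatever core-02 allows on port-channel 100.

```
! Compare the VLANs allowed on the peer-link on both switches
rio-n5k-core-s01# show interface port-channel 100 trunk
rio-n5k-core-s02# show interface port-channel 100 trunk

! On core-01, create any missing VLANs and add them to the peer-link
! ("add" keeps the existing allowed list of 1 and 600 intact)
rio-n5k-core-s01# configure terminal
rio-n5k-core-s01(config)# vlan 82,400,483
rio-n5k-core-s01(config-vlan)# exit
rio-n5k-core-s01(config)# interface port-channel 100
rio-n5k-core-s01(config-if)# switchport trunk allowed vlan add 82,400,483
```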
We added all the VLANs back, but the host interfaces still showed DOWN (inactive) because we had not configured the host interfaces on core-01. Once the old core-01 configuration was restored, including the host interface configuration, the hosts started to recover.
Port-channels 403 to 410 did not come up on core-01. Their host interfaces were missing the ‘channel-group’ statements, so they showed DOWN (inactive) and the port-channel interfaces listed no operational members. To resolve this, we had to use the ‘force’ option when adding the configuration back, for example:
interface Ethernet198/1/1
channel-group 403 force mode active
Core-01 was the VTP server, core-02 was the VTP client, and the VTP domain name was already configured on both switches. In VTP, the switch with the highest configuration revision distributes its VLAN configuration.
However, the VTP password was not configured on core-01, so the VLANs did not sync. In hindsight we were fortunate: had the configuration revision on core-01 been higher and the password matched, core-01 would have wiped out the existing VLAN configuration on core-02!
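Before the peer-link comes up, it is worth comparing the VTP state on both switches. The commands below are a sketch of that comparison; ‘show vtp password’ is assumed to be available on the NX-OS release in use.

```
! Compare domain name, operating mode and configuration revision
rio-n5k-core-s01# show vtp status
rio-n5k-core-s02# show vtp status

! Compare the configured VTP password (assumed available on this release)
rio-n5k-core-s01# show vtp password
rio-n5k-core-s02# show vtp password
```

If the replacement switch shows a higher configuration revision than the running switch, resolve that before the two are allowed to exchange VTP advertisements.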
This also explains why servers went down when only one side of the host interfaces (HIF) was missing configuration: a server will select any path that is UP.
Physically, the servers were connected to both FEXs and the interfaces were UP, so whenever a server selected the path towards the switch with the missing configuration, its traffic was blackholed.
We hit several issues during this procedure in which traffic was blackholed. To avoid them, take the lessons from point 3 above into account and follow these steps:
1. Make sure that the NX-OS version on both switches is exactly the same. If it is not, upgrade or downgrade the replacement switch to match the existing switch.
2. Shut down all the links on the replacement switch, including the peer keepalive link and the peer-link port-channel. If you need remote access to the switch, configure a separate uplink for that purpose.
3. In particular, confirm that mgmt0 (the peer keepalive) and the peer-link port-channel are shut down.
4. Verify the VTP and VLAN configuration with ‘show vtp status’, ‘show vlan summary’ and ‘show vlan brief’, and make sure it is in sync with the other switch.
Pre-provision FEX by doing the below:
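The original commands are not reproduced here, so the following is only a sketch of pre-provisioning a single FEX on an N5K. FEX number 198 and port-channel 403 are taken from the example earlier in this article; the model N2K-C2248T is an assumption and must match the FEX that is actually installed.

```
rio-n5k-core-s01# configure terminal
! Pre-provision the FEX slot so that host interface configuration
! is accepted while the FEX is still offline
rio-n5k-core-s01(config)# slot 198
rio-n5k-core-s01(config-slot)# provision model N2K-C2248T
rio-n5k-core-s01(config-slot)# exit

! The host interfaces can then be configured to match the running peer
rio-n5k-core-s01(config)# interface Ethernet198/1/1
rio-n5k-core-s01(config-if)# channel-group 403 force mode active
```

Repeat this for every FEX and host interface, then compare the result against the running configuration of the peer switch.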
If vPC auto-recovery is enabled, disable it on both vPC peers using the “no auto-recovery” command under the vPC domain. This is to ensure that there is no vPC role change when the replacement switch is brought up.
Ensure that the vPC role priority of the currently running switch is better (numerically lower) than that of the replacement switch. The switch with the lower priority value is elected vPC primary; the default role priority is 32667.
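The two vPC domain changes above could look like the sketch below. The domain ID 10 is a placeholder, since the real domain ID is not given in this article; 40000 is the value we used on the replacement switch.

```
! On both vPC peers (domain ID 10 is a placeholder)
rio-n5k-core-s02(config)# vpc domain 10
rio-n5k-core-s02(config-vpc-domain)# no auto-recovery

! On the replacement switch: a value above the running switch's priority
rio-n5k-core-s01(config)# vpc domain 10
rio-n5k-core-s01(config-vpc-domain)# no auto-recovery
rio-n5k-core-s01(config-vpc-domain)# role priority 40000

! Check the configured priority on the running switch
rio-n5k-core-s02# show vpc role
```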
Once the configurations on both switches match, bring up the peer keepalive link first. When the keepalive is up, bring up the peer-link.
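As a sketch of that ordering, assuming the keepalive runs over mgmt0 and the peer-link is port-channel 100 as in our setup (if mgmt0 was shut on the upstream device instead, re-enable it there):

```
! 1. Peer keepalive first
rio-n5k-core-s01# configure terminal
rio-n5k-core-s01(config)# interface mgmt0
rio-n5k-core-s01(config-if)# no shutdown
rio-n5k-core-s01(config-if)# end
rio-n5k-core-s01# show vpc peer-keepalive

! 2. Only once the keepalive reports the peer as alive, the peer-link
rio-n5k-core-s01# configure terminal
rio-n5k-core-s01(config)# interface port-channel 100
rio-n5k-core-s01(config-if)# no shutdown
```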
Verify vPC with “show vpc” on both switches, and verify that the FEXs are online with “show fex”.
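A possible verification pass is sketched below. The consistency-parameter and port-channel checks go beyond the two commands named above and are suggestions, not part of the original procedure.

```
! Run on both switches
rio-n5k-core-s01# show vpc
rio-n5k-core-s01# show vpc consistency-parameters global
rio-n5k-core-s01# show fex
rio-n5k-core-s01# show port-channel summary
```

In the ‘show vpc’ output, look for a healthy peer status and keepalive status and for no suspended VLANs on the individual vPCs, which is the symptom we saw in the log earlier.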
Verify if all services are running as expected.
Replacing Nexus 5K switches within a vPC (Virtual Port Channel) domain requires careful planning and execution to ensure a seamless transition. By following best practices and adhering to the recommended steps, network administrators can successfully replace aging Nexus 5K switches while maintaining network redundancy and minimizing downtime.
He is a senior solution network architect currently working with one of the largest financial companies. He has an impressive academic and training background. He has completed his B.Tech and MBA, which makes him proficient both technically and managerially. He has also completed more than 450 online and offline training courses, both in India and ...