www.neil.blog memo to myself. do the dumb things i gotta do. touch the puppet head.

December 12, 2018

Cisco Catalyst not passing traffic after upgrade

Filed under: Uncategorized — Tags: — npd @ 8:37 am

I typically go onsite for switch software updates. They’re just about the only thing that I don’t have a good fallback mechanism for in most of the networking stacks that I support. If a host server update fails, I can reset it through iLO or iDRAC. If a firewall update fails, I mostly have High Availability configurations so a single failure won’t ruin my night. However, I am always present for Cisco Catalyst updates. The failure scenarios are too many, and my recovery options too few.

This past Friday I was doing a simple update, from 15.1 to 15.2(4)E6, on a pair of non-stacked Catalyst 2960-Xs. I’d done two previous updates on this environment without issue, and after my onsite maintenance windows had been delayed a few times, I had to just schedule it to be done remotely. What could go wrong?

I backed up all my configurations, downloaded the latest Cisco-recommended software to the switch, and set the upgrade to /overwrite and /reload. I watched the upgrade proceed normally, remembering that there is often a long period during upgrades where the switch looks unresponsive because the console can’t display the extraction output. Then I saw it start to reboot. And I waited.

After 20 minutes my remote session still hadn’t come back up. I connected to the VPN and found that I could ping and ssh to the switch, but couldn’t ping any of the devices connected to it. I logged in to the switch, ran terminal monitor, and started looking for the problem. show ver showed that the upgrade was successful. I could ping other switches and servers from inside the switch. So what was wrong?
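The symptom here (switch answers, nothing behind it does) is easy to check for automatically after an upgrade. Below is a minimal sketch, not something I ran that night. The function names are made up for illustration, and the ping helper assumes Linux-style flags (-c for count, -W for timeout in seconds).

```python
import subprocess


def ping(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo with the system ping (Linux-style flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def classify(switch_up: bool, hosts_up: dict[str, bool]) -> str:
    """Turn raw reachability results into a rough diagnosis."""
    if not switch_up:
        return "switch unreachable"
    if not hosts_up:
        return "switch up, no hosts checked"
    if not any(hosts_up.values()):
        # Management answers but nothing behind the switch does:
        # the situation described in this post.
        return "switch up, not forwarding"
    if all(hosts_up.values()):
        return "healthy"
    return "partial"


def check(switch: str, hosts: list[str], probe=ping) -> str:
    """Probe the switch and the hosts behind it, then classify the result."""
    return classify(probe(switch), {h: probe(h) for h in hosts})
```

Calling check() with the switch's management address and a few hosts that hang off it (addresses are yours to fill in) returns one of the strings above. The probe argument exists so the logic can be exercised without sending real pings.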

After a few minutes, the following message came up in the terminal:

%ILET-1-DEVICE_AUTHENTICATION_FAIL: The FlexStack Module inserted in 
this switch may not have been manufactured by Cisco or with Cisco's
authorization. If your use of this product is the cause of a support
issue, Cisco may deny operation of the product, support under your
warranty or under a Cisco technical support program such as
Smartnet. Please contact Cisco's Technical Assistance Center for
more information.

But I’m not using any FlexStack modules, and all my hardware is legitimate. What’s going on? I searched for this message in the Cisco support forums and found a link to Bug ID CSCur56395, which states:

If this issue is seen AFTER UPGRADE, then hard power-cycle is required

Great.

You can try a reload, but this won’t work. You can try a downgrade back to the previous version, but I don’t know if this will work (let me know if it does). It seemed too risky to me, and I’ve never done it; I hope to try it in the lab if I can recreate the issue. In my case I had to call a coworker who lives nearby to go onsite and power the switch down.
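Since the only clue was a single console message, it may be worth scanning the post-upgrade log for it rather than waiting for it to scroll past. IOS system messages start with %FACILITY-SEVERITY-MNEMONIC:, so a small parser can pick them out. This is a rough sketch; the function name and the watch list are my own invention, and the regex is deliberately simple.

```python
import re

# IOS system messages start with %FACILITY-SEVERITY-MNEMONIC:
IOS_MSG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):"
)


def find_messages(log_text, watch=("ILET-1-DEVICE_AUTHENTICATION_FAIL",)):
    """Return the watched message keys found in log_text, in order of appearance."""
    hits = []
    for line in log_text.splitlines():
        match = IOS_MSG.search(line)
        if match is None:
            continue
        key = f"{match['facility']}-{match['severity']}-{match['mnemonic']}"
        if key in watch:
            hits.append(key)
    return hits
```

Feed it whatever you captured from the session after the reboot. A non-empty result for the ILET message right after an upgrade is the cue to start arranging a hard power-cycle.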

Sorry if you read this far hoping for a quick solution to this problem. Time to call your datacenter smart hands, or lace up your boots and head onsite yourself. If you are onsite already, laptop balanced on top of the KVM, reading this post, you are very lucky! Just unplug the switch for 5 minutes, do some stretches, plug it back in, and all will be well again.

Postmortem notes for next time:

  • My hosts should be balanced between switches. Fix that next time I’m onsite. This outage wouldn’t have required repair at 11pm on a Friday if the host had just failed over to the other switch.
  • The UPS should have had a network card in it. I’m not sure I would have used it in this scenario, but in some cases it would be helpful to be able to reset one of the power banks in the UPS using telnet from inside the failed switch. In this case there was no management card in the UPS, and I would rather not risk a dirty shutdown of Exchange. But had I been prepared for this, I could have arranged servers and switches into each of the APC’s power banks to minimize unsafe shutdowns while still allowing remote reboots.
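The first note is something that can be audited from a desk before the next maintenance window. Here is a small sketch, assuming you keep (or can export) a map of which switches each host is cabled to. The host and switch names in the usage example are hypothetical.

```python
def single_homed(uplinks: dict[str, set[str]], switches: set[str]) -> list[str]:
    """Return hosts that are not connected to every switch in `switches`.

    These are the hosts that go dark if one switch stops passing traffic.
    """
    return sorted(
        host for host, connected in uplinks.items()
        if not switches <= connected  # subset test: every switch must be present
    )


# Hypothetical inventory for illustration only.
uplinks = {
    "esx1": {"sw-a", "sw-b"},
    "esx2": {"sw-a"},
    "backup": {"sw-b"},
}
at_risk = single_homed(uplinks, {"sw-a", "sw-b"})
```

Anything that shows up in at_risk gets a second cable next time I’m onsite.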

February 28, 2017

Cisco Catalyst switch software upgrade hangs at Extracting

Filed under: Uncategorized — Tags: , — npd @ 11:03 am

While upgrading the software from 12 to 15 last week on a Catalyst 2960-X over TFTP, I noticed that the upgrade hung at “Extracting” for a very long time. I had just finished a similar update on a 2960-S that was a few years older and had not noticed a similar delay. 5 minutes passed and I started to get nervous. 10 minutes later I started Googling to see if this was normal. I didn’t find anything and wondered if something had gone wrong.

Of course, the prior action on the screen was this:

Old image for switch 2: flash:/c2960x-universalk9-mz.120-2.EX5
  Old image will be deleted before download.

Deleting `flash:/c2960x-universalk9-mz.120-2.EX5' to create required space
Extracting images from archive into flash...

So I was very nervous that any attempt to kick this thing back into action would leave me with no image on the switch. It was 9pm on a Friday night and I wanted to go home! After about 15 minutes of waiting, the process finally resumed and showed this message on screen. It would have been helpful if it had come up when it was actually relevant!

Warning: Unable to allocate memory to display the tar extraction of files, however upgrade process is still continuing. If you would like to see the tar extraction output, try upgrading one switch at a time.
Installing (renaming): `flash:update/c2960x-universalk9-mz.152-2.E6' ->
                                       `flash:/c2960x-universalk9-mz.152-2.E6'
New software image installed in flash:/c2960x-universalk9-mz.152-2.E6

In conclusion: don’t sweat it! It is normal for the Catalyst upgrade process to hang at Extracting.
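If you log your console session to a file, you can tell “still extracting” apart from “done” by looking for the two lines shown above. The following is a minimal sketch built only on the strings that appeared in my own output; other platforms or versions may word them differently, and the function name is made up.

```python
import re

# Matches the final line of a successful install, capturing the image path.
INSTALLED = re.compile(r"New software image installed in (\S+)")


def upgrade_stage(output: str):
    """Return (stage, image_path) based on captured upgrade output."""
    match = INSTALLED.search(output)
    if match:
        return ("installed", match.group(1))
    if "Extracting images from archive into flash" in output:
        # Silence after this line is expected; keep waiting.
        return ("extracting", None)
    return ("unknown", None)
```

As long as it reports "extracting", the advice is the same: leave it alone.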
