Why the lone wolf mentality is a sysadmin mistake

Lone wolf sysadmins cause short- and long-term problems in team environments. Here's an example of where things went wrong, and one of where things were done right.

If you have worked in system administration for a while, you’ve probably run into a system administrator who doesn’t write anything down and keeps their work a closely guarded secret. When I’ve run into administrators like this, I often ask why they do it, and the response is usually a joking, "Job security." That answer may not be entirely a joke.

Don’t be that person. I’ve worked in several shops, and I have yet to see someone "work themselves out of a job." What I have seen, however, is someone who can’t take a week off without being called by the team repeatedly. I have also seen a team struggle, after such a person left, to untangle the mystery of what that person was doing and how they were managing the systems under their control.

The best shops I’ve seen work by having a standard methodology, including a standard way of administering systems, and standard procedures for common team tasks. Furthermore, the best shops implement copious cross-training between team members, so that—like your systems—there is no single point of failure.

I was once working on a staff augmentation engagement for a large media company. One day, I connected to a system that was throwing monitoring alerts about disk usage. I found that, while the team used a standardized Kickstart with a common partitioning scheme, this box’s disks looked nothing like that standard. So, I asked a co-worker about it, thinking this system may have just been installed prior to these standards being introduced.

He looked at a log of activity for the system and said, “Oh, this was Jason’s box, I’m not surprised.” Why was that? “Because Jason was a lone wolf and just did stuff the way he thought best.”

Here I was, three years after Jason’s departure from the team, looking at a box that was almost out of disk space, with no way to apply the standard mechanisms the team used for this type of issue. Instead, it took me a couple of days of analysis, preparation, and data copying to separate some directories into their own filesystems and free up enough space on the system. I also had to schedule downtime and make post-configuration changes to the system. Had the initial deployment followed the team’s established standards, the whole fix could have been done live, with no downtime, in about 20 minutes.

Three years after his departure, this team was still finding problems and random stuff done by this one system administrator who had to do things his own way. When I asked my co-worker why Jason felt the need to ignore the standards of the team, you know what he said? “I don’t know, maybe he felt it gave him job security.”

In contrast, on my last team, when a team member was going on vacation for a week or more, we’d hold a "Surviving without [name]" meeting about two weeks in advance. The person going on vacation would write down a list of their duties, the systems they managed, and the issues they commonly, or perhaps uniquely, handled for the team.

Then we would spend about 30-60 minutes in a meeting reviewing this document to determine who would handle those items while that person was away. If we found a task or duty that no one else knew how to do or had ever worked on, we still had two weeks to either cross-train or have someone shadow that person as they did the task, in order to learn it. Additionally, we would identify any tasks or items that would go without coverage during the vacation period.

After that 30-60 minute meeting, we’d have our list of tasks, our plans to cross-train whoever was covering an item (if needed), and any uncovered items to review with our manager. The purpose of this review was to verify that the things we thought we could survive without covering were things she also thought we could survive without covering. In other words, if a request came in for an item on that list, it could wait until the sysadmin's return.

Collaborate. Standardize. Share. Doing so will make you a better system administrator as you cross-train someone else, collaborate on a new project, and share ideas with your teammates. As an added bonus, you can take a vacation without being tethered to your phone or laptop, dealing with work issues while you are supposed to be away from work!

Topics:   Sysadmin culture  

Scott McBrien

Scott McBrien has worked as a systems administrator, consultant, instructor, content author, and general geek off and on for Red Hat since July 2001.
