Building Organizational Resilience

Apr 25, 2025 · 7 min read

Leaders love talking about resilience.

But here's the thing. When we talk about resilience, we usually focus on the individual. The hero developer who stays up all night fixing bugs. The founder who refuses to give up. The manager who somehow keeps everything running.

This causes many organizations to design themselves around heroes. Heroes are seductive. They solve immediate problems. They create impressive results. But organizations built around heroes are fundamentally fragile. When your hero gets sick, burns out, or leaves for a competitor, their specialized knowledge, relationships, and capabilities disappear with them.

Even worse, hero-dependent organizations create dangerous incentives. Knowledge becomes power to hoard rather than share. Problems become opportunities to showcase heroic problem-solving rather than systematic prevention. At the extreme, creating problems in order to solve them becomes the thing to strive for.

What you want is not (only) resilient individuals, but a resilient organization.

Redundancy

Efficiency demands eliminating redundancy. Yet Southwest Airlines keeps extra planes in rotation. Toyota qualifies multiple suppliers for critical components. Netflix runs parallel systems that can take over when others fail.

How is it that these high-performing organizations keep so much deliberate “waste”? The answer is that they know the cost of a single point of failure. The “waste” created by having backups is far cheaper than having one hyper-efficient link break and take the whole organizational chain with it.

Redundancy creates resilience. This isn't about duplicating everything or everyone. It's about identifying your critical systems and ensuring you have appropriate buffers and backups for them. For a software system, this could mean running multiple servers in parallel. In an organization, it takes the form of having teams, rather than individuals, own the critical parts of the system.

Let’s illustrate this point by imagining two companies. They are competitors in last-mile logistics, and each depends on its own in-house route calculation software. The two companies are identical in every respect except one: how the team that builds the software is organized.

In the first company, you have the Hero Developer. The Hero is the only one who has ever been allowed to touch the route calculation engine. The Hero built it and knows how it works, and teaching others would just take too much of the Hero’s time. Every time there is a bug in the engine, the Hero jumps in and saves the day.

In the second company there is no Hero. Only a team of average developers. The average developers take turns fixing bugs in the engine and help each other out when needed. Updating the engine is part of everyone’s normal work, so fixing a bug in it is not really heroic. Since they all take turns fixing issues, help each other and document fixes so other people know what has been done, development takes a little bit longer.

In the first company, it takes two days on average for the hero to fix a bug. In the second company, it takes the team two and a half days to fix a bug and document the changes.

Now ask yourself this: Which company performs better? If you only look at the time to fix bugs, the company with the Hero looks better. Less waste means quicker turnaround, so surely the first company performs better?

Now imagine our hero developer gets hit by a bus and is unable to work for six months. Which company do you think will perform better in that scenario?

Resilient Culture

I’ve heard more stories than you would think about companies that end up depending on a single Hero, and they always end in one of a handful of ways. The Hero quits, gets injured, or starts holding a grudge against the owner and holds the company ransom (true story). Whatever the cause, the company ends up in a crisis. Sometimes it manages to pull through, and sometimes it goes under.

It’s not difficult to guess why this happens. There are a lot of moving pieces to keep track of when building a startup. An early developer building most of the core of the system takes the weight off the shoulders of the founder, so it’s often a relief. A quick win to keep the boat afloat.

But resilient systems need resilient cultures, and building one means sacrificing speed today for speed tomorrow. Writing documentation doesn't help your customers right now. Neither does building processes. But when your whole team can find answers without tracking down that one person who knows how everything works? That's when you start building momentum.

Courage and culture

It takes courage to build a resilient culture. As a founder and leader, you need to get comfortable with pushing as much of the decision-making as possible to the people doing the work. This is scary because it feels like you are giving up control. But if you don’t, you become the single point of failure.

It also requires psychological safety. People must feel safe reporting problems, admitting mistakes, and expressing concerns without fear of punishment. Problems fixed in the shadows remain problems, because nobody else learns from them. A problem that is brought into the open, on the other hand, becomes a learning opportunity for the whole organization.

Every organization will see important things break. This is nothing to be afraid of. In fact, if nothing ever breaks, you’re probably not moving fast enough. The important thing to realize is that recovery isn't just about returning to normal. It's about returning stronger. If you have a learning mindset, every incident becomes a chance for the system to improve. To become more resilient.

The most resilient organizations don't wait for crises to discover areas that need to improve. They deliberately introduce controlled chaos to identify vulnerabilities before real disasters strike. For example, Netflix's Chaos Monkey randomly terminates instances in production to ensure systems can recover automatically.

Killing production services is probably not wise before you have found product-market fit. However, regardless of context, you can create safe-to-fail experiments that reveal how different parts of your systems actually respond under stress. It’s not uncommon for developers to run spike tests to see how a system performs under a sudden surge in load. If you’re a leader, this could mean turning off your phone and going on vacation for two weeks to see how your team manages when you’re not there to give direction.

Organizational resilience isn't about preventing crises. It's about designing systems that can absorb shocks, adapt to changing conditions, and emerge better after disruption inevitably arrives.

Wrap up

Building a resilient organization doesn’t happen overnight. But you can start today by mapping your vulnerabilities to identify single points of failure in people, processes, or systems. Find the low-hanging fruit, such as ensuring knowledge is documented and shared across multiple people. Run a small experiment to find unknown weak spots. And perhaps most importantly, create the psychological safety needed for people to share concerns and report problems early.
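If you want a concrete starting point for mapping single points of failure in people, a minimal Python sketch might look like the following. It assumes you have already collected, for example through a short team survey, which people can confidently work on each critical component. The component and person names are made up, and `single_points_of_failure` and `orphaned_if_absent` are hypothetical helpers written for illustration, not part of any existing tool.

```python
# Hypothetical ownership map: component -> people who can confidently work on it.
# The names are invented for illustration; replace them with your own survey data.
ownership = {
    "route-engine": ["alice"],
    "billing": ["bob", "carol"],
    "mobile-app": ["carol", "dave", "erin"],
}


def single_points_of_failure(ownership, min_owners=2):
    """Return components that fewer than `min_owners` distinct people can work on."""
    return sorted(
        component
        for component, people in ownership.items()
        if len(set(people)) < min_owners
    )


def orphaned_if_absent(ownership, person):
    """Return components nobody else could work on if `person` were unavailable."""
    return sorted(
        component
        for component, people in ownership.items()
        # An empty set after removing `person` means nobody is left.
        if not set(people) - {person}
    )


if __name__ == "__main__":
    print(single_points_of_failure(ownership))     # ['route-engine']
    print(orphaned_if_absent(ownership, "alice"))  # ['route-engine']
    print(orphaned_if_absent(ownership, "carol"))  # []
```

Calling `orphaned_if_absent` for each person is a spreadsheet-sized version of the "hit by a bus" question: anything it returns is a candidate for pairing, documentation, or shared ownership.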

The cost of all this might seem high. But the longer you wait, the more painful it will be when disruption eventually hits.

If you still don’t know where to start and would like someone to guide you through building a resilient organization (without losing too much speed), I’d be happy to help. Simply send me an email to get the conversation started.

Because the question isn't whether you can afford resilience. It's whether you can afford its absence.


© 2024 Viktor Nyblom