Recovering from ransomware: a conversation with Veritas CEO Greg Hughes

As CEO of data protection and availability company Veritas, Greg Hughes stands on the frontlines of the ransomware battles, which are growing in number and sophistication. From his vantage point of serving close to 90 percent of Fortune Global 500 companies, Hughes has seen firsthand how attackers use tactics that are increasingly difficult to detect and defend against. The threat has grown markedly over the past year and a half for several reasons, rising from a concern that was the domain of IT leaders to one that executives and boards must understand thoroughly.

Hughes spoke with McKinsey’s Paul Roche, a senior partner who leads the firm’s software practice, and Jim Boehm, a partner specializing in digital risk and cybersecurity, about the growing threat of ransomware and data leakage; why it now needs to be a C-suite and board-level priority; and how organizations can best develop reliable backups and prepare to recover from an attack. Their edited conversation appears below.

McKinsey: Ransomware has become an increasingly important topic in the context of cybersecurity over the last 18 months. Given your company’s visibility across the industry, what are you hearing from customers, especially recently? I imagine there is a bunch of anxiety and lots of conversation about it.

Greg Hughes: I spend a lot of my time talking to CIOs as part of my job, and ransomware has risen very quickly over the past six months to become the number one issue that I’m hearing about from them. The Colonial Pipeline cyberattack in May of this year really is only the tip of the iceberg. There are a number of companies that have had near-death experiences related to ransomware, so it’s a critical topic.

I’m aware of one 350-year-old company that was almost bankrupted by a ransomware attack because it had to stop its whole business operation. This company is older than the United States. It’s been through multiple wars and survived. But ransomware nearly brought it to bankruptcy, which gives you a sense of what a threat it is. There’s just a ton of top-down pressure around it. CEOs don’t want to wake up and find out that their whole business operation just shut down.

McKinsey: The threat has really evolved from targeting big businesses to also targeting small and medium-sized businesses. I have a family member, for instance, who runs a medical practice that was hit with a ransomware attack via an IT provider. Can you talk about the types of solutions you have to configure to serve all these types of companies?

Greg Hughes: The threat extends up and down the corporate ladder, and targets small-to-medium businesses, county agencies, even city agencies. When you think of the threat in healthcare—small healthcare providers or hospitals—it’s scary stuff. In many cases, those kinds of organizations are the least prepared and the most vulnerable.

Countering the dynamic threat

McKinsey: The nature of the threat is evolving, with data exfiltration and poisoning of backups. Negotiations with attackers now really center on preventing data that’s already been exfiltrated from being released. And of course, there are multi-spectrum attacks where you’re being taken offline. What kind of conversations are you having with your customers about the full spectrum of the threat and their concerns?

Greg Hughes: In our position as a backup provider, we take primary data and move a copy to secondary storage. That’s fundamentally what backup does, both in and out of the public cloud. Companies will ask us what they should do around their primary data, and it really comes down to two important things:

First, all sensitive data should be encrypted. There’s just no reason to have anything in clear text anymore. Second, there are data loss prevention (DLP) tools that will check the perimeter and indicate whether the most sensitive data, such as confidential or regulated information, is leaving your site. Those are two minimum components of a proper data leakage plan.

McKinsey: What are some of the best backup practices that you really emphasize?

Greg Hughes: The most important point, which may be kind of obvious, is that your backup is your insurance policy. Backup data is what allows you to restore your primary system, so it plays a key role in responding to a ransomware attack.

The first step is to make sure that your backup application, like all your other applications, is upgraded to the latest version. This almost shouldn’t have to be said, but don’t try to fight today’s ransomware issues with yesterday’s technology.

Second, redundancy is good insurance and it’s inexpensive. There’s a concept known as “three, two, one”—three different copies of your data on two different media, one stored offline. That’s really a minimum standard. Storage is very cheap these days, so make sure you have enough redundancy.

The third component is immutable storage—you need to have a backup copy in immutable storage, meaning that it’s tamper-proof. It can’t be altered. There are a lot of different immutable storage options now. It can be on disk or in the cloud, and there’s always the old standby of tape, which is immutable, and can be taken offline too.

The final principle is to make sure your whole backup solution is tightly secured end-to-end with zero trust access, intrusion detection, intrusion prevention, two-factor authentication, and role-based access control. It’s very important to make sure that you have segregation of duties and responsibilities. The folks who touch primary systems shouldn’t be able to touch your backup systems. Advanced anomaly detection is another layer of security.
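As an illustration only, the redundancy and immutability practices described above can be expressed as a simple checklist. The Python sketch below is not a Veritas tool or API; the `BackupCopy` record, its field names, and the sample plan are all hypothetical, and it assumes an inventory of backup copies is already available from somewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """Hypothetical record describing one copy of the data."""
    name: str        # illustrative label for the copy
    medium: str      # e.g. "disk", "tape", "cloud"
    offline: bool    # True if the copy is disconnected from the network
    immutable: bool  # True if the copy cannot be altered once written


def check_backup_plan(copies):
    """Return the gaps against "three, two, one" plus one immutable copy."""
    gaps = []
    if len(copies) < 3:
        gaps.append("fewer than three copies")
    if len({c.medium for c in copies}) < 2:
        gaps.append("fewer than two media types")
    if not any(c.offline for c in copies):
        gaps.append("no offline copy")
    if not any(c.immutable for c in copies):
        gaps.append("no immutable copy")
    return gaps


# Illustrative plan; the names and media are made up.
plan = [
    BackupCopy("primary-site-disk", "disk", offline=False, immutable=False),
    BackupCopy("cloud-locked-bucket", "cloud", offline=False, immutable=True),
    BackupCopy("vaulted-tape", "tape", offline=True, immutable=True),
]

print(check_backup_plan(plan))      # prints []
print(check_backup_plan(plan[:1]))  # a single online disk copy fails every check
```

An empty list means the plan meets the minimum standard described here. It says nothing about access control or segregation of duties, which would need their own checks.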

The cloud and other keys to backup

McKinsey: Cloud-based backup makes the “three-two-one” backup principle you mentioned accessible to many more companies. Why does ransomware make it so important to have a cloud-enabled backup as a part of your overall backup and recovery strategy?

Greg Hughes: The cloud has been a major area of innovation in backup, to the point that you can now think of the cloud as another target for backup. There are many different storage types in the cloud, including cheap storage that can be used for archives, or immutable storage. Also, in the case of disaster recovery, instead of having a devoted and expensive data center as a secondary backup, you can effectively spin up a data center on demand using the cloud. That’s a powerful concept.

And then, of course, you need to back up your data in the cloud. One common misconception is that the cloud provider will take care of ransomware. The cloud service providers are very clear to say that backing up your data in the cloud is your responsibility, so you must use the same techniques that you’d use on-premises to protect your data in the cloud.

McKinsey: In addition to technology solutions, what else do companies need to work on to build a good backup strategy, especially when it comes to accessing backups after a ransomware attack?

Greg Hughes: An area where we often get drawn into a conversation with our customers focuses on operational resiliency rather than just perimeter security. The reason you really need operational resiliency is that the primary threat vector—which is spear phishing—works. It preys on human vulnerability, so the bad guys are going to get in. The malware’s going to get in.

Most advanced enterprises are trying to figure out how you handle a worst-case event—what I would call a “cyber wildfire”—that just wipes out your data center or your data that’s in the cloud. The key to resiliency is a multilayered plan, with no single point of failure.

The National Institute of Standards and Technology (NIST) cybersecurity framework is very good. It’s a five-point framework, but three of the elements have to do with resiliency: protect, detect, recover. How do you protect your data and your systems? How do you detect when an attack is happening as quickly as possible? And how do you recover from that attack as quickly as possible? It’s not just about backup. It’s about that whole process.

McKinsey: What do organizations need to do from a process point of view to make sure that the backups they are doing are actually helpful and alleviate the problem?

Greg Hughes: The “recover” component is where so much of the planning is essential, because restarting applications and business services requires so much coordination across so many different stakeholders. You’ve got compute, storage, network, applications. You need a digital run book, really, a ransomware run book that coordinates all those pieces. And the first step is to make sure you’re recovering from a known, good copy of your backup. That’s where we use anomaly and malware detection to help us determine that last known, good copy.

You also want to scan the data using good malware scanners before using it for recovery. You want to have that in an isolated recovery environment so it’s not touching your primary systems.
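To make the idea of a last known, good copy concrete, here is a minimal Python sketch. It assumes each backup already carries the results of anomaly detection and a malware scan. The `BackupRecord` type, its fields, and the sample dates are hypothetical and do not reflect any vendor's data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class BackupRecord:
    """Hypothetical metadata for one backup."""
    taken_at: datetime
    anomaly_flagged: bool  # assumed output of anomaly detection
    malware_found: bool    # assumed output of a malware scan


def last_known_good(backups: Iterable[BackupRecord]) -> Optional[BackupRecord]:
    """Return the most recent backup with no anomaly flag and no malware hit."""
    clean = [b for b in backups if not b.anomaly_flagged and not b.malware_found]
    # default=None covers the case where every backup is suspect
    return max(clean, key=lambda b: b.taken_at, default=None)


# Illustrative history: the two newest backups are suspect.
history = [
    BackupRecord(datetime(2021, 6, 1), anomaly_flagged=False, malware_found=False),
    BackupRecord(datetime(2021, 6, 2), anomaly_flagged=False, malware_found=False),
    BackupRecord(datetime(2021, 6, 3), anomaly_flagged=True, malware_found=False),
    BackupRecord(datetime(2021, 6, 4), anomaly_flagged=False, malware_found=True),
]

best = last_known_good(history)
print(best.taken_at.date())  # prints 2021-06-02
```

In practice the selected copy would still be restored into an isolated recovery environment and scanned again before it touches primary systems, as described above.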

Preparing for a full recovery

McKinsey: We encounter a lot of companies that are confident about their preparation, because they run tabletops and war game situations. However, if you ask, “Well, how long does it take you to restore from backup,” they say, “I don’t know.” And we’re hearing that recovery is often incomplete for companies that get attacked by ransomware. Sometimes critical systems are left out or become corrupted during the recovery process, and it just takes a really long time. What do you see in your work when it comes to recovery? How can companies make sure they stay ahead of those kinds of problems?

Greg Hughes: The main thing we believe about recovery is that you’ve got to test your recovery plan. A plan is only as good as your ability to test it, and how frequently you test it. Also, the volume of applications that need recovering is going up a lot.

The other thing that’s happening is that boards are starting to ask, “Do we have everything protected?” They’re reading the news and they’re thinking, “Do we have all of our applications and data protected?” That’s a surprisingly challenging question to answer. One problem we see is very low visibility. Make sure you have good reporting that lets you see that all your applications, all your virtual/physical machines, and all your data are protected.

McKinsey: I’ve seen a board ask that exact question. It’s a question that boards should be asking. And they don’t want to hear an answer other than “Yes.”

Greg Hughes: Exactly, there’s only one right answer to the question they’re asking. But we have also seen cases where, unfortunately, the enterprise finds out that they have only backed up, say, 20 percent of their data after they’ve been hit by a ransomware attack. That’s the worst time to find out.
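The visibility question the board is asking can be framed as a comparison between two lists: everything the organization runs, and everything the backup system actually protects. The following Python sketch assumes both lists can be exported as asset names. The function name and the sample assets are hypothetical.

```python
def protection_coverage(inventory, protected):
    """Return (share of inventory that is protected, sorted unprotected assets).

    An empty inventory is reported as 0.0 coverage, since nothing can be
    confirmed as protected.
    """
    inventory = set(inventory)
    protected = set(protected)
    unprotected = sorted(inventory - protected)
    if not inventory:
        return 0.0, unprotected
    covered = len(inventory & protected)
    return covered / len(inventory), unprotected


# Illustrative asset names only.
all_assets = ["erp-db", "hr-app", "file-share", "web-vm-01", "web-vm-02"]
backed_up = ["erp-db"]

ratio, missing = protection_coverage(all_assets, backed_up)
print(f"{ratio:.0%} protected")  # prints 20% protected
print(missing)                   # the four assets with no backup
```

A report like this is only as good as the inventory behind it, so assets missing from the inventory remain invisible to the check.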

Innovation in cyber risk solutions

McKinsey: Let’s talk about innovation going on in this area. What do you see coming down the road six, 18, or 60 months from now? What potential changes are generating collective excitement in the industry?

Greg Hughes: First of all, the cloud is a major area of innovation. The cloud offers scalability and elasticity, which is significant because backup and recovery by their very nature scale up and down over time. Also, we need to optimize for the variety of workloads in the cloud, across containers and different databases, while also working across multiple clouds to make it easier for enterprises to protect their data with a single policy across any cloud provider.

A second area of innovation is applying artificial intelligence (AI), machine learning (ML), and data analytics techniques in and around this space. One of the big questions there is: how do you identify the last known, good copy of backup as quickly as possible? The last good copy is the most current backup without malware. There is a lot of innovation going on now in the backup and recovery space.

McKinsey: Two things that we’re also hearing from clients are that when ransomware spreads, it spreads very, very quickly. And it impacts multiple systems, multiple stacks of technology. How capable do you think today’s recovery solutions are of handling that level of complexity?

Greg Hughes: This is a big and ongoing area of investment for us, as is making sure that we’re backing up all the different workloads optimally. A large enterprise may have dozens of technology stacks. Some of them go back years, some of them are the most modern container-based cloud technology stacks, and everything in between. And it’s not like they’re going to rip out all the old stuff, so a backup provider needs to support all these technology stacks. It’s a significant investment to keep up and requires compatibility with hundreds and hundreds of different workload types.

Choosing priorities for a recovery operation

McKinsey: The business risk problem, especially when it comes to ransomware, is the operational component of getting systems back online. What is the typical timeline for getting fully back online? I’ve heard everything from days to weeks or months. And what are some best-practice timelines that companies serious about testing recovery operations should be shooting for?

Greg Hughes: This really comes down to the classic model of thinking about recovery, which is that you want to tier your business services—tier zero, tier one, tier two, and tier three—in terms of prioritization. Then, you want to tie that to the applications and infrastructure that support those business services. This way, you know what’s the highest priority to lowest priority. And then you assign service level objectives with recovery time objectives (RTOs) and recovery point objectives (RPOs) to each of those tiers.

That conversation has been led primarily by IT, but given the threat of ransomware, it’s important to bring that conversation to the business. It’s got to be a top management, CEO- and board-level conversation, so that when there’s a malware or ransomware attack, people are prepared for the length of time it will take for the services to respond.
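The tiering model described above can be written down as a small data structure that both IT and the business can review. The Python sketch below is illustrative only: the tier targets, service names, and helper functions are assumptions, not recommended values or an established tool.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class TierObjectives:
    rto: timedelta  # recovery time objective: longest acceptable outage
    rpo: timedelta  # recovery point objective: longest acceptable window of lost data


# Hypothetical targets per tier; real values come out of the business dialogue.
TIERS = {
    0: TierObjectives(rto=timedelta(hours=1), rpo=timedelta(minutes=15)),
    1: TierObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    2: TierObjectives(rto=timedelta(hours=24), rpo=timedelta(hours=12)),
    3: TierObjectives(rto=timedelta(days=3), rpo=timedelta(days=1)),
}

# Hypothetical mapping of business services to tiers.
SERVICES = {"intranet-wiki": 3, "payments": 0, "order-management": 1}


def recovery_order(services):
    """Order services by tier, highest priority (tier zero) first."""
    return sorted(services, key=lambda name: (services[name], name))


def missed_rto(service, measured_restore):
    """Compare a restore time measured in a test against the tier's RTO."""
    return measured_restore > TIERS[SERVICES[service]].rto


print(recovery_order(SERVICES))
# prints ['payments', 'order-management', 'intranet-wiki']
print(missed_rto("payments", timedelta(hours=2)))  # prints True
```

Feeding measured restore times from recovery tests into a check like `missed_rto` gives management a concrete view of where expectations and actual recovery capability diverge.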

McKinsey: That’s a good perspective. Instead of asking, “What’s my target?”, companies should ask themselves, “What should our target be?” It should almost be a question that you pose back to the business. And then, what’s your willingness to achieve that target: how much time and money are you willing to invest in solutions, in preparation, in testing?

So, the goal for recovery operations really should be a dialogue as with any other security or business resilience solution. You want to focus on the highest priorities first.

Greg Hughes: That’s a good point. It’s a dialogue between IT and the business, between the CEO and the board. It’s a dialogue to look at what the competitive standards are if you can figure that out. And then, finally, for a lot of regulated industries, it’s a dialogue with the regulators as well.

The critical role of vendors before and during a crisis

McKinsey: Once an attack does happen, what are some of the things that are key to managing the crisis well? What should CEOs be thinking about that will help them get back online and operating as efficiently as possible?

Greg Hughes: There are several elements to the answer for that question. The first is to remember that your vendors want to help, so quickly getting them on the phone and explaining what’s going on in a regular cadence is important.

The second point is that anybody who’s gone through an attack will probably say it’s the most challenging professional experience they’ve ever faced. It’s going to be a 24-hours-a-day, seven-days-a-week kind of thing. You need to be prepared for that.

McKinsey: You mentioned communicating with your vendors, and that is a big oversight we see often. Too many companies fail to bring their vendors to the table, especially when they’re doing testing, running their playbooks, doing tabletop exercises. Vendors are partners with you, not just to provide a product or a service but to make sure your business is successful, so they need a seat at the table.

Also, an attack is going to constrain your ability to make decisions, potentially even some of the decisions that a CEO or business leader would make, and you won’t fully understand those constraints unless you have key stakeholders in that conversation. If you don’t have your vendors there during testing to understand how they will be able to help you, you might be practicing making decisions that might not be yours to make or might not be possible.

Greg Hughes: That’s true. One way to look at it is that your vendors should know your ransomware run book—read it, provide feedback, advise where it could be improved, and keep it updated.

McKinsey: Are there good resources you recommend for people interested in learning more?

Greg Hughes: There are a lot of good resources out there to help (see sidebar). One place I’d point to is the US Cybersecurity and Infrastructure Security Agency (CISA), which publishes guides about ransomware resiliency and how to recover from ransomware. It’s a terrific resource, very comprehensive, and covers the whole spectrum of activities you want to launch very quickly if you’ve been attacked.
