As Southwest, FAA probes begin, fallout could shape flying for years
The Washington Post February 7, 2023
As technicians scrambled throughout the night last month to reboot a malfunctioning pilot notification system, airline officials crowded onto a makeshift Federal Aviation Administration phone line looking for information. How long had the system been out? Could flights already over the Atlantic Ocean safely land?
The outage was the second of two major disruptions for U.S. air passengers in a three-week span. During the chaotic call, with hundreds of people on the line, an FAA representative had few answers as he battled sounds coming from unmuted phones. Whether to keep flying during the safety system’s outage, the official said, was up to each airline.
“There is no ground stop currently out there in the U.S.A.,” he said.
About an hour later, the federal agency in charge of maintaining safety in the nation’s skies took the almost unprecedented step of halting all domestic departures, an action last taken on Sept. 11, 2001. The action followed decades of underinvestment in technology and warnings about the possible consequences.
The FAA outage hit as Southwest Airlines and its customers were still handling fallout from the carrier’s decision to cancel more than 16,700 flights, a move that left more than 1 million passengers scrambling to rework holiday plans.
The two incidents have triggered multiple federal investigations while revealing the fragile underpinnings of a domestic system that routinely carries 2 million people daily. They also are putting pressure on Transportation Secretary Pete Buttigieg, a rising Democratic Party star, to own long-festering problems at an agency he oversees. The aftermath will shape U.S. aviation priorities for years and could usher in new protections aimed at a public increasingly wary of unreliable flights.
The federal government and one of the nation’s largest air carriers learned long ago that their systems depended on rickety foundations but didn’t do enough to update the technology, a step that could have averted the recent breakdowns. Leaders of the House Transportation and Infrastructure Committee will have their first opportunity to ask questions at a hearing Tuesday on aviation safety.
“There’s a lot of concern in Congress,” said Rep. Chris Pappas (D-N.H.), a member of the committee. “We hear it from our constituents. The step that FAA took was extraordinary.”
The hearing is likely to be the first of several to spotlight struggles in the industry, coming alongside deliberations this year on a funding measure for the FAA. In the Senate, the Commerce, Science and Transportation Committee also plans to convene hearings into recent flight disruptions, beginning Thursday, when Southwest’s chief operating officer is among those scheduled to testify.
The Department of Transportation is conducting its own probe into Southwest’s breakdown, examining whether the carrier may have deliberately misled customers by selling tickets on flights it knew it couldn’t operate, which would be considered a deceptive practice that could subject the carrier to fines. Southwest denies that was the case.
Decades-old tech, years of warnings
The airline industry was on the defensive long before the two recent breakdowns.
Despite receiving more than $50 billion in federal pandemic relief money, airlines were caught unprepared for a surge in passenger demand when coronavirus vaccines became widely available in 2021. As cancellations and delays swelled, so did scrutiny of their operations from regulators, lawmakers and the public. Days before Christmas, Southwest’s troubles brought renewed criticism that intensified weeks later with the outage of a key FAA pilot information system.
Glitches in the Notice to Air Missions (NOTAM) system, which flags potential hazards for pilots before departure, began causing trouble the afternoon of Jan. 10.
Backup databases proved unreliable, and the FAA tried to fix the problems overnight. It launched a hotline to help coordinate among agency officials, airports and airlines. In the early morning hours, officials decided to reset the system and ordered a halt to domestic departures.
The ground stop lasted about two hours, but its effects rippled throughout the day, with nearly 11,000 flights delayed and more than 1,300 canceled.
FAA officials quickly ruled out a cyberattack, concluding the outage was caused by a contractor who mistakenly deleted files in the system database. Spatial Front, the contractor, said it is cooperating with the investigation. The FAA barred company employees who were directly involved from accessing its buildings and systems.
Acting FAA administrator Billy Nolen briefed lawmakers on the incident, and in a Jan. 27 letter to the House Transportation and Infrastructure Committee, the agency outlined steps taken to prevent a repeat. Those include a one-hour delay in synchronizing the database with backups to stop any errors from spreading and a requirement that at least two people, including a federal manager, be present during system maintenance.
The failure of the NOTAM system has renewed scrutiny of the agency’s efforts to modernize its technology, which has components dating to the 1970s. The FAA has long been aware the technology it uses to manage the nation’s airspace is outdated.
Its 2023 budget request to Congress outlined multiple aging systems in need of attention, including airport radars, radios and emergency generators up to 50 years old. But those efforts, including one to upgrade the system at fault last month, have been slowed, in part, because of inconsistent funding and complexities in managing a sprawling program in need of upgrades, according to industry and agency officials.
Michael Huerta, who served as FAA administrator during the Obama and Trump administrations, said work to upgrade the industry’s technology infrastructure must be performed regularly.
“These kinds of systems are the plumbing in the old bathroom,” he said. “You need to maintain them and keep them up to date.”
FAA employees warned more than a dozen years ago of potential dangers in how NOTAMs were created, collected and distributed to pilots who needed clear data to avoid hazards.
The agency dedicated millions of dollars to replacing the antiquated system, known as the U.S. NOTAM system, with a consolidated system based on newer, more reliable technology. The higher-tech version would be known as the Federal NOTAM system.
Around the same time, a 2008 outage similar to the one that occurred last month underscored the urgency of ongoing modernization efforts, said Brett Brunk, an FAA technology specialist involved at the time. He said parts of that work stalled because of struggles for adequate funding and an “organizational inertia” that hindered the agency’s ability to advance the project.
“At some point, you can’t just keep putting new tires on an old car,” Brunk said. “Eventually, it rusts out.”
Eight years later, a five-year plan issued in November 2016 indicated a goal to phase out the older system by the end of that year. In later editions of the document, that date slipped to 2019, then 2023. The current target date is 2025, although the agency says it’s looking for ways to accelerate the work.
Despite those setbacks, the FAA cites advancements that include making the notices searchable and offering pilots multiple streams of NOTAM data. Even so, the older U.S. NOTAM system that was supposed to be replaced remains a pillar of the FAA’s warning system.
The January outage renewed efforts to address its risks. The House last month passed a bill to study improvements to the NOTAM system, hours after a bipartisan group of senators introduced a similar measure.
In response to growing technological needs, a joint office representing the FAA and other agencies set out plans in 2004 for a major infrastructure program known as NextGen. Members of an advisory committee formed to coordinate the multibillion-dollar FAA program expressed frustration last year that NextGen’s efforts remained too limited, according to agency meeting minutes. Airline representatives on the committee said it could become hard to justify the industry’s spending on major technology advancements when the FAA isn’t doing enough on its end.
Timothy L. Arel, chief operating officer of the FAA’s Air Traffic Organization, responded by saying the agency has absorbed more than $270 million in pandemic-related expenses, even as budgets remain flat.
There are signs the industry might be willing to lobby Congress for money to make upgrades.
“This ought to be a wake-up call for all of us in aviation, something that many of us in aviation have been saying for a long time, that the FAA needs more resources,” United Airlines chief executive Scott Kirby said last month.
Flight delays as a political issue
Buttigieg has vowed accountability for both the FAA and Southwest but has increasingly become a target of Republican criticism as the two problems unfolded.
On the morning of Jan. 11, as thousands of passengers remained stranded during the nationwide ground stop, a top GOP leader placed the blame on Buttigieg.
“Can we just go a week in America without a major debacle out of the secretary of transportation’s purview?” asked House Majority Leader Steve Scalise (R-La.).
Buttigieg has also taken criticism from the left, with Rep. Ro Khanna (D-Calif.) saying the secretary didn’t heed calls from him and Sen. Bernie Sanders (I-Vt.) last year to take a harder line with airlines.
Buttigieg’s office has cited successes, including securing pledges from airlines to cover meal and hotel costs when cancellations are their fault and pressuring airlines to refund more than $1 billion.
In an interview, Buttigieg said the two episodes offer significantly different lessons.
The aviation sector had made “all of these strides toward reliability over the course of what had been a very rough year,” Buttigieg said. “Then one airline melted down completely. And it goes to show that the system requires all of the different players to be at their best.”
Buttigieg said the challenge for the FAA is accelerating decades-long modernization efforts while maintaining a safety record that, in most years, has no commercial airline passenger fatalities. He said the system was stitched together over decades and is managed conservatively because of that safety success.
“These older, creakier systems are going to be increasingly vulnerable as demand picks up,” he said.
‘It just overwhelmed the technology’
For Southwest, problems sparked by a winter storm spiraled out of control after the carrier’s staffing system became overwhelmed and lost the ability to keep track of pilots and flight attendants. Two memos obtained by The Washington Post showed executives also were concerned about a shortage of ground workers at two of its busiest airports, in Denver and Baltimore, although the carrier maintains it was fully staffed.
Employee unions had warned for years that the software used for scheduling crews needed an overhaul, but airline executives didn’t heed those calls. While other carriers quickly resumed flying after the storm passed, Southwest struggled for days to regain control of its operations.
The airline thought it had a good plan in place: It proactively canceled flights ahead of a winter storm that was slated to hit its largest base in Denver, on Dec. 21, then move across the country. Members of the carrier’s pilots union who met with airline officials a day earlier said they were assured by management the carrier was prepared and could handle cancellations from weather.
By Dec. 22, according to a post-incident analysis by the pilots union, it was starting to become clear that management had been too optimistic.
Lyn Montgomery, president of TWU Local 556, which represents Southwest flight attendants, said her first inkling the carrier was facing massive problems also came Dec. 22, when calls from union members began pouring in: Flight attendants were stuck on hold for hours as they tried to reach the scheduling desk, and crew members who had flown to cities in the storm’s path were staying at hotels without heat and water.
Southwest chief executive Bob Jordan said in an interview with The Post that employees volunteered by the hundreds to help the airline dig out as it scrambled to locate crew members and reposition planes, flying hundreds of empty aircraft in a bid to resume operations.
As those decisions were made in Texas, the ramifications were being felt nationwide.
Doug Kotlove, 53, arrived at Baltimore-Washington International Marshall Airport on Christmas Eve with his wife and two daughters expecting a smooth trip to Florida.
At the gate, a Southwest employee was unable to reach anyone who could track down a flight attendant so the plane could depart. The flight was soon canceled. Kotlove and his family were told to return to the service counters outside of security to rebook, only to find a long line of others trying to do the same.
As they left BWI and headed back home to Bethesda, Kotlove sensed a crisis was brewing.
Over the next three days, the carrier canceled more than 60 percent of its flights. The Southwest Airlines Pilots Association estimated the carrier operated more than 500 flights with no paying customers between Dec. 22 and Dec. 29 to rebuild its network.
“[This] caused a historic level of cancellations that turned into an historic level of aircraft rerouting that led to an historic level of crew rerouting or rescheduling,” Jordan said last month. “And it just overwhelmed the technology and the processes.”
Southwest is paying the price for its poor performance, swinging to a net loss last quarter because of expenses attributed to the collapse. Jordan characterized the 11-day period as a singular episode, but it’s not the carrier’s first technological breakdown.
In 2016, a partial router failure at Southwest’s Dallas data center grounded planes and canceled more than 2,000 flights, an early warning of the need to look more closely at technology.
The carrier upgraded its reservation system in 2017 and maintenance system last year, but the software it uses to schedule staffing on flights remained problematic, according to union officials and industry analysts.
“This was not the first time they experienced this problem, but it was the first time they experienced it at this scale — the sum of long-term overreliance on overtime, plus extreme weather, plus marginal IT competence influences, which collectively triggered a total meltdown,” said Robert W. Mann, who runs R.W. Mann & Co., an airline industry analysis and consulting firm.
Avoiding another $1 billion mistake
In all, Southwest’s estimated $1 billion in lost ticket revenue, overtime and goodwill gestures for customers and employees is roughly the same amount the carrier spent last year to upgrade its IT systems.
Jordan said Southwest will spend $1.3 billion on upgrades this year, an amount that could rise depending on findings of a consulting firm the airline hired to examine the root causes of the problem and offer recommendations on preventing a repeat.
Jordan said an update to Southwest’s scheduling software could be in place within weeks. In the meantime, the carrier has a backup plan for any similar situation: an army of about 100 volunteer employees trained for manual scheduling.
As for the FAA, on Jan. 11 — the same day the nation’s aircraft were grounded for nearly two hours — the agency signed a $3 million agreement with Reston, Va.-based Concept Solutions for “modernization support” for the newer Federal NOTAM system.
A representative of the company didn’t respond to a request for comment. The firm says it’s hunting for new — and nimble — workers who can juggle the antiquated workings of old FAA databases and the newer demands of modern systems, according to company job postings.
A key requirement, one read, is the technical ability “to understand legacy systems integrated with new technologies.”