Scared of flying? Good news! Software glitches keep aircraft on the ground

NATS explains last week's UK chaos as bug delays 211 United Airlines flights

It has been a bad few days for anyone with a fear of flying or, perhaps more accurately, a fear of getting to an airport only to find that flying is the last thing that will happen.

United Airlines was the latest to suffer problems after a software issue resulted in all its aircraft being briefly held at their origin airports yesterday before a fix allowed flights to resume approximately an hour later.

A United Airlines spokesperson told The Register that "a software update caused a widespread slowdown in United's technology systems."

Operations are now back to normal. According to the airline, the cause is still being investigated but was not a cybersecurity issue.

United's woes come as a preliminary report into a "major incident" filed with the UK's Civil Aviation Authority (CAA) was published. The incident was a failure at National Air Traffic Services (NATS) that sparked chaos in UK airspace on August 28.

More than 1,500 flights were cancelled or delayed due to the failure, which NATS blamed on "an extremely rare set of circumstances" where a flight plan had two identically named but separate waypoint markers outside of UK airspace.

These waypoints are used to describe the route an aircraft will take. For a flight going through UK airspace, the route includes waypoints outside the UK required for its onward journey. This, and other information, is bundled into a file defined by the ADEXP specification and uploaded into the systems.

However, in this instance, a flight plan included two waypoints, which, although 4,000 nautical miles apart, had the same name. A duplicated identifier, if you will.

The system searched for the UK airspace entry point in the file and then went hunting for the exit point. Not finding one – which is perfectly acceptable – it carried on searching, eventually came across the duplicate, and threw a critical exception.
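
As a rough illustration of how a name-only lookup can trip over a duplicate, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the waypoint names, the route, and the `find_exit_point` function are hypothetical, and the real ADEXP processing at NATS is certainly more involved.

```python
class AmbiguousWaypointError(Exception):
    """Raised when a waypoint name matches more than one point on the route."""


def find_exit_point(route, entry_name, uk_waypoints):
    """Return the name of the last UK waypoint after the entry point, or None.

    Hypothetical logic: waypoints are matched by name only, so a name
    that appears twice on the route cannot be told apart.
    """
    entry_index = route.index(entry_name)
    exit_name = None
    for name in route[entry_index + 1:]:
        if route.count(name) > 1:
            # Two different places share one name: give up loudly.
            raise AmbiguousWaypointError(f"duplicate waypoint name: {name}")
        if name in uk_waypoints:
            exit_name = name
    return exit_name


# Illustrative route: "AAA" appears twice, far apart, with UK entry "UKIN".
route = ["AAA", "UKIN", "BBB", "AAA"]
try:
    find_exit_point(route, "UKIN", {"UKIN", "UKOUT"})
except AmbiguousWaypointError as exc:
    print(f"critical exception: {exc}")
```

A route with no explicit exit point and no duplicates simply returns `None` here, which matches the "perfectly acceptable" case described above.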

Realizing it was broken, the primary system wrote a log to that effect and handed over duties to the secondary system, which hit the same exception on the same flight plan and shut itself down too.

With both systems now down, flight plans could not be processed, and manual intervention was required. Chaos ensued.
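
Why the handover did not help can be sketched in a few lines of Python. The assumption here, and it is only an assumption, is that both systems run identical logic on the same input; the names `parse_plan` and `process_with_failover` are hypothetical.

```python
def parse_plan(plan):
    """Hypothetical parser that rejects plans with duplicate waypoint names."""
    if len(set(plan)) != len(plan):
        raise ValueError("duplicate waypoint name")
    return list(plan)


def process_with_failover(plan, systems):
    """Try each system in turn; return (result, log of failures)."""
    log = []
    for name, handler in systems:
        try:
            return handler(plan), log
        except ValueError as exc:
            # Record the failure and hand over to the next system.
            log.append(f"{name} failed: {exc}")
    # Every system rejected the same input: nothing left to fail over to.
    return None, log


# Both systems share the same code, so the same plan breaks both.
systems = [("primary", parse_plan), ("secondary", parse_plan)]
result, log = process_with_failover(["AAA", "BBB", "AAA"], systems)
```

Redundancy of this kind protects against a machine dying, not against an input that the shared code cannot handle.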

A source opined: "It's an astonishing software design screw-up combined with cost-saving silliness. Root cause was the flight plan interpretation module thingy being written to shut itself down instead of throwing the flight plan out for human analysis and carrying on."

We're sure the software engineers among our readership will also be scratching their heads at how such a problem could happen, and at how it was deemed acceptable for the system to write to a log and fall into maintenance mode rather than simply record and flag the failure and move on to the next file.
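
The alternative being suggested, setting the bad plan aside and carrying on, might look something like the following Python sketch. The function names and the parsing rule are hypothetical, and whether carrying on is the right call in a safety-critical system is a design judgment the sketch does not settle.

```python
def strict_parse(plan):
    """Hypothetical parser that rejects plans with duplicate waypoint names."""
    if len(set(plan)) != len(plan):
        raise ValueError("duplicate waypoint name")
    return list(plan)


def process_queue(plans, parse):
    """Process every plan; set aside the ones that fail instead of halting."""
    accepted, quarantined = [], []
    for plan in plans:
        try:
            accepted.append(parse(plan))
        except Exception as exc:
            # Flag the plan for human analysis and keep going.
            quarantined.append((plan, str(exc)))
    return accepted, quarantined


plans = [["A", "B"], ["A", "B", "A"], ["C", "D"]]
accepted, quarantined = process_queue(plans, strict_parse)
print(f"{len(accepted)} accepted, {len(quarantined)} flagged for review")
```

Only the offending plan ends up in the quarantine list; the plans queued behind it are still processed.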

Assertions that the system had processed 15 million flight plans without issue over the five years preceding the incident will cut little mustard with passengers left stranded and airlines forced to scramble to deal with chaos not of their making.

One pilot, who preferred to remain anonymous, told The Register that he had been forced to de-plane several checked-in passengers 30 minutes before departure due to a missing crew member.

"It almost got me lynched," he said, "...not much fun."

Martin Rolfe, CEO of NATS, said: "Keeping the sky safe is what guides every action we take, and that was our priority during last week's incident. I would like to reiterate my apology for the effects it had on so many people, including our airline and airport customers. Incidents like this are extremely rare and we have put measures in place to ensure it does not happen again."

The Register approached NATS for comment on how the software was purchased and validated. We were told that those areas would form part of the onward investigation, as noted at the end of the report. ®

 
