Like Microsoft, massive AT&T outage also happened because of a bad update
If you’re wondering whether the recent global IT outage – caused by a bad update to cybersecurity company CrowdStrike’s software, which brought down millions of Windows PCs – is an isolated case, we can assure you that it is not.
In fact, something similar already happened earlier this year. A government investigation into a nationwide AT&T outage in February has shown that the cause was a bad network update, Ars Technica reported on Tuesday.
According to an FCC report, the outage “affected users in all 50 states,” and “all voice and 5G data services for AT&T wireless customers were unavailable, affecting more than 125 million devices, blocking more than 92 million voice calls, and preventing more than 25,000 calls to 911 call centers.” The FCC also noted that it took AT&T “at least 12 hours to fully restore service.”
We covered the outage as it happened, noting that it also caused disruptions for users on other, unaffected networks, who were unable to call AT&T customers. AT&T offered its customers a $5 credit as an apology.
The incident, according to the report, began “after AT&T implemented a network change with an equipment configuration error.” But it wasn’t just this one, isolated issue that made this outage so serious.
The FCC Public Safety and Homeland Security Bureau analyzed the incident and found that the outage “was the result of several factors, all attributable to AT&T Mobility, including a configuration error, a lack of adherence to AT&T Mobility’s internal procedures, a lack of peer review, a failure to adequately test after installation, inadequate laboratory testing, insufficient safeguards and controls to ensure approval of changes affecting the core network, a lack of controls to mitigate the effects of the outage once it began, and a variety of system issues that prolonged the outage once the configuration error had been remedied” (per Ars Technica).
The story might not end there for AT&T, which is potentially facing a large fine. But it’s another reminder that the global IT networks we rely on are often more fragile than we think, and that the safety procedures for critical systems need, in many cases, some serious scrutiny.