There Is Good News and Bad News in the Atlassian Outage

On April 6th, news began to surface of an outage at Atlassian affecting customers using Jira, Confluence, and other products. The outage had begun the day before, on the 5th, and set off rumors of everything from a ransomware attack to a potential breach. Atlassian quickly dispelled the rumors, stating that a routine maintenance script had accidentally “disabled” a small number of customer sites. Its status page listed the affected products as “active Incident”, and messages to customers indicated that restoring the sites would take “several days”. Now, a week later, some customers are still reporting that their sites are unavailable.

New information from Atlassian now indicates that the original statements were not entirely accurate. While there was indeed no breach, ransomware, or outside penetration of their network, the failed maintenance script did considerably more than disable a small number of sites. A communication problem between the teams performing the work resulted in the deletion of sites for around 400 customers. The cascade of failures included running the script in the wrong mode (permanent deletion) and with the wrong list of IDs. The list was supposed to contain deactivation IDs; instead, the team running the script received a list of the sites where the app was installed. What should have been a simple removal of a legacy service turned into a massive impact for around 400 organizations.
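To make the failure modes described above concrete, here is a minimal sketch of the kind of guardrails a bulk-deletion script could enforce before touching anything. It is not Atlassian's tooling; every name in it (`IdKind`, `Mode`, `TargetId`, `plan_deletion`, the `max_targets` limit) is hypothetical and chosen only for illustration. The idea is to reject a list containing the wrong kind of ID, default to a recoverable mode, and cap how many targets one run may affect.

```python
from dataclasses import dataclass
from enum import Enum


class IdKind(Enum):
    APP = "app"    # hypothetical: ID of an installed app
    SITE = "site"  # hypothetical: ID of an entire customer site


class Mode(Enum):
    SOFT_DELETE = "soft_delete"            # recoverable
    PERMANENT_DELETE = "permanent_delete"  # irreversible


@dataclass(frozen=True)
class TargetId:
    kind: IdKind
    value: str


def plan_deletion(targets, expected_kind, mode=Mode.SOFT_DELETE,
                  confirm_permanent=False, max_targets=50):
    """Validate a bulk request and return the ID values that would be acted on.

    Raises ValueError instead of proceeding when any guardrail trips.
    Nothing is deleted here; this only produces a checked plan.
    """
    # Guardrail 1: every ID must be of the kind the caller says it expects.
    wrong = [t.value for t in targets if t.kind is not expected_kind]
    if wrong:
        raise ValueError(f"expected {expected_kind.value} IDs, got others: {wrong}")

    # Guardrail 2: permanent deletion needs an explicit, separate confirmation.
    if mode is Mode.PERMANENT_DELETE and not confirm_permanent:
        raise ValueError("permanent deletion requires confirm_permanent=True")

    # Guardrail 3: limit how many targets a single run can affect.
    if len(targets) > max_targets:
        raise ValueError(f"{len(targets)} targets exceeds limit of {max_targets}")

    return [t.value for t in targets]
```

With checks like these, a list of site IDs handed to a run that expects app IDs would fail immediately, and a permanent delete could not happen by default. Whether Atlassian's process lacked equivalent checks is not something the public statements confirm.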

As of today, Atlassian’s status page shows that 55% of the impacted customers have been restored. The company still expects full restoration to take a bit longer, but states that it has automated the process. This automation should increase the pace of restoration, though it is probably not much consolation to the clients who are paying for Atlassian services.

The blunder comes after Atlassian announced that it will no longer sell new licenses for on-prem installations and will discontinue support for on-prem in 2024. The move effectively forces the use of its cloud service. Seeing a mistake of this magnitude does not instill much confidence in customers. We hope that Atlassian puts some additional steps in place to ensure that future changes carry a much lower risk of massive impact. For now, affected customers will have to wait for their services to be restored and then make their own decisions on what to do next.
