The healthcare industry seems to be the most impacted, with healthcare groups reporting that they have not been able to pay holiday and overtime pay to their staff due to the outage. UKG is reportedly connecting recovery specialists with the remaining organizations that were hit. That has not stopped people from being understandably angry about the situation, and employee groups have already filed lawsuits against UKG.
The real question is, how did this happen in the first place? Kronos is going to be tight on details (understandably) unless legally compelled to disclose them, so we do not know much about how the attack unfolded. We do know that ransomware got into their systems and encrypted several databases in their private cloud. Kronos went the recovery route and did not pay the ransom, as many victims have. However, there seems to be a glitch in that plan: we are now more than a month past the original incident and there is still no recovery.
There are many theories about why this is happening, including the chance that both the local backups and the offsite backups were compromised. There are several ways this could have happened. If the local or first-line backups were not properly configured or protected, the ransomware would have been able to encrypt them. This is common when a disk-based backup runs over SMB or another ordinary file-sharing protocol; those configurations leave the backup exposed to malware and should be avoided. Kronos should also have had an offsite backup, but that copy might have been impacted too if the attackers were able to push the malware across the replication connection and overwrite those files as well.
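One simple way to think about that exposure is as a writability test: if the host running the workload can write to the backup target over the network, so can ransomware running on that host. The sketch below is a minimal illustration in Python; the share paths are hypothetical placeholders, not anything from the Kronos environment.

```python
# Minimal sketch (Python): flag backup targets that are reachable as ordinary
# writable file paths, e.g. SMB mounts. The paths below are hypothetical
# examples. A target that a compromised account can write to is a target
# ransomware can encrypt.
import os
import tempfile

BACKUP_TARGETS = [
    r"\\backup-srv\nightly",   # hypothetical SMB share
    "/mnt/backup/offsite",     # hypothetical mounted NAS path
]

def is_writable(path: str) -> bool:
    """Return True if we can create (and clean up) a file under the path."""
    if not os.path.isdir(path):
        return False
    try:
        # tempfile cleans up after itself; success means the share is writable.
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False

for target in BACKUP_TARGETS:
    if is_writable(target):
        print(f"EXPOSED: {target} is writable from this host; "
              "ransomware running here could encrypt these backups.")
    else:
        print(f"OK: {target} is not writable (or not mounted) from this host.")
```

Immutable or offline copies fail this writability test by design, which is exactly the point of having them.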
Outside of the backups being encrypted, there are other potential challenges to recovery. Kronos could be looking at more than just the data that was impacted. If the infrastructure was damaged and servers are non-recoverable, they might need to build the whole house again. I have seen this happen in a few instances, and it is not pretty. If they must rebuild the environment and then try to restore data, that could take a couple of months rather than the few weeks that were originally talked about. The reasons behind this could be almost anything if the attackers had enough time and access.
Outside of attacker activity, there could also be more routine issues with the backups. Depending on what they are using as a database service, they might not even be able to restore directly to the database servers due to missing or improperly backed-up transaction log files. They could also simply have had bad backup policies that did not include periodic testing of the backed-up data; upon going to restore, they could have found nothing, or plain disk corruption. Realistically, the possible reasons behind the delay are endless.
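The fix for that second failure mode is routine test restores rather than trusting that the backup job "succeeded". As a rough illustration, the Python sketch below checks a PostgreSQL custom-format dump in two stages: a cheap archive listing, then a real restore into a throwaway database. The file name, database name, and environment are hypothetical.

```python
# Minimal sketch (Python): verify a PostgreSQL backup by actually restoring it.
# The dump path and scratch database name are hypothetical placeholders.
import subprocess
import sys

DUMP_FILE = "/backups/payroll_2021-12-10.dump"  # hypothetical custom-format dump
SCRATCH_DB = "restore_test"                      # throwaway database for the test

def run(cmd: list[str]) -> None:
    """Run a command and abort loudly if it fails."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        sys.exit(f"FAILED: {' '.join(cmd)}\n{result.stderr}")

# Stage 1: cheap sanity check -- can pg_restore even read the archive's table
# of contents? This catches truncated or corrupted dump files early.
run(["pg_restore", "--list", DUMP_FILE])

# Stage 2: the real test -- restore into a scratch database. Missing objects
# or unreadable data show up here, not during a 3 a.m. emergency recovery.
run(["dropdb", "--if-exists", SCRATCH_DB])
run(["createdb", SCRATCH_DB])
run(["pg_restore", "--dbname", SCRATCH_DB, "--exit-on-error", DUMP_FILE])

print(f"Backup {DUMP_FILE} restored cleanly into {SCRATCH_DB}.")
```

Even a basic check like this, run on a schedule, catches an empty or corrupted backup weeks before anyone actually needs it.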
Even if there are valid reasons for the delay, that does not remove Kronos's liability. The facts around the incident need to be disclosed and understood, and I am sure this will be part of the lawsuits (which will likely become class actions). Cloud service providers, especially HR and payroll services, are targets. They know this and should have proactive, preventative measures that mitigate the impact of an event like this. They also should have backups, many, many backups. Those backups should be integrity tested on a routine and random basis to ensure that a full bare-metal restore can be completed if required. The databases and servers themselves should be hardened against attack using modern antimalware solutions on top of other OS hardening and DLP solutions. This is what should be done; it is not, at least at this stage, what appears to have been done.
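The "routine and random" part matters: checking only the newest backup tells you nothing about last month's. A rough sketch of that idea, assuming a simple manifest of SHA-256 checksums written at backup time (the paths and manifest format here are hypothetical):

```python
# Minimal sketch (Python): randomly sample older backups and re-verify their
# checksums against a manifest written at backup time. Paths and the manifest
# format (one "sha256  filename" pair per line) are hypothetical.
import hashlib
import random
from pathlib import Path

BACKUP_ROOT = Path("/backups/archive")     # hypothetical archive location
MANIFEST = BACKUP_ROOT / "manifest.txt"    # hypothetical checksum manifest
SAMPLE_SIZE = 3                            # how many old backups to spot-check

def sha256_of(path: Path) -> str:
    """Stream the file so large backup sets do not blow up memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Load the expected checksums recorded when each backup was taken.
expected = {}
for line in MANIFEST.read_text().splitlines():
    checksum, name = line.split(maxsplit=1)
    expected[name] = checksum

# Spot-check a random sample rather than only the most recent backup.
for name in random.sample(sorted(expected), min(SAMPLE_SIZE, len(expected))):
    actual = sha256_of(BACKUP_ROOT / name)
    status = "OK" if actual == expected[name] else "CORRUPT OR TAMPERED"
    print(f"{status}: {name}")
```

A checksum match does not prove the backup is restorable, which is why the test restore above still matters, but it does catch silent corruption and files that were rewritten after the fact.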
For the companies that were using Kronos, it is not acceptable to push the blame for failing to pay employees onto the Kronos incident. Backup methods for payroll should also be part of any continuity plan. It is not enough to buy a cloud service and wash your hands of the responsibility. No, at the end of the day the organization is responsible for the fallout. In this case these groups dropped ownership, and now their people are paying the price.