We detected a failure in our infrastructure at 13:57. Because we had announced maintenance work starting at 12:00, we did not initially assume that the failure was serious. We asked our partner in the data centre whether they would start the necessary work; the answer, which only arrived much later, was no. While we waited for that answer, we analysed the incident ourselves, among other things to prevent damage.
We could not find an immediate cause, so we assumed something more serious. A restart did not help either, so we continued to analyse the system. At around 16:10 we were able to narrow the error down to the GRUB boot loader, which had evidently been damaged by ZFS. We had not expected this, as the affected host system was only a few days old.
After a few more hours, our partner carried out the necessary work at around 18:30; the work itself took only about 15 minutes. We continued our analysis in order to fix the problem, but after a few more hours of work with various partners a fix looked difficult. At around 20:30 we decided to look for a new host system, which we had originally planned to use earlier, and made enquiries. The host system was provisioned at around 22:00. We then took the ZFS apart to look for data remnants and compare them with our PBS backups. As all the data turned out to be intact on the hard drives, all that remained was to transfer it, which was done quite quickly.
However, as the configuration files of the individual virtual servers no longer existed, we had to recreate some of them using our PBS. This went quickly and without problems. At around 02:05, all services were running smoothly again.
We are taking further internal measures to minimise such cases in the future. Every affected customer will receive compensation from us: for pre-paid products, a credit of four days on the service term; contract and business customers will each receive a €30.00 credit on their next bill.
As this incident took place between the public holidays and in the middle of the holiday season, we were understaffed and could not act as quickly as we normally do. We would like to apologise wholeheartedly for this.
We will continue to monitor the host systems closely for a further 48 hours to prevent further incidents.