While Aidilfitri is a time for forgiveness, we are not in the mood to forgive MYNIC for their latest blunder, given the sheer arrogance of their press release, issued almost 24 hours after they literally wiped out all .my domains with a Thanos-level finger snap of incompetence.
In case you were not aware (thanks in part to DNS caching, a long public holiday weekend, and the fact that some Malaysian ISPs run DNS resolvers that do not check DNSSEC records), eight MYNIC-controlled TLDs ending with the .my suffix stopped resolving across the internet from around 4.30pm on Friday.
While this was not immediately visible to a casual internet user, many started experiencing issues with online transactions, and those browsing websites on .my domains started facing intermittent disruptions. DNS records are cached at many levels, including in individual browsers, because it is not necessary to query remote DNS servers every time a domain name is looked up. By around 10pm, as cached DNS records started to expire, the problem escalated and it became apparent that something was seriously wrong.
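To illustrate why the outage crept up on users rather than hitting everyone at once, here is a minimal Python sketch of a cache that honours a time-to-live (TTL). It is not how any particular resolver or browser is implemented; the class names, the domain `example.com.my`, the address and the one-hour TTL are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedRecord:
    address: str
    expires_at: float  # absolute time, in seconds, when the entry goes stale


class TtlCache:
    """Toy DNS-style cache: answers are reused until their TTL runs out."""

    def __init__(self) -> None:
        self._records: Dict[str, CachedRecord] = {}

    def put(self, name: str, address: str, ttl: float, now: float) -> None:
        # Store the answer together with the moment it expires.
        self._records[name] = CachedRecord(address, now + ttl)

    def get(self, name: str, now: float) -> Optional[str]:
        record = self._records.get(name)
        if record is None or now >= record.expires_at:
            # Expired or never cached: a real resolver would now have to
            # query upstream, which is where a broken zone becomes visible.
            self._records.pop(name, None)
            return None
        return record.address


if __name__ == "__main__":
    cache = TtlCache()
    # Hypothetical answer cached at t=0 with a one-hour TTL.
    cache.put("example.com.my", "203.0.113.10", ttl=3600, now=0)
    print(cache.get("example.com.my", now=1800))  # still served from cache
    print(cache.get("example.com.my", now=3600))  # TTL elapsed, cache miss
```

As long as an entry is fresh, the user never notices that the upstream servers are failing. Once it expires, the next lookup has to go back to the source, and users whose caches expire at different times run into the failure at different times.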
We published our report on the outage at around 11.30pm, and it took MYNIC another 5 hours to acknowledge the issue via a tweet from their official Twitter account.
We are currently experiencing some Technical Issue related to DNSSEC chain with IANA. We feel sorry for any inconvenience to the customers. We are working hard to resolve this issue as soon as possible. More details will be issued from time to time.
— MYNIC Berhad (@mynicberhad) June 15, 2018
That is about 12 hours after the time we roughly estimate the issues with their DNSSEC-IANA chain started. Now that they were aware of the issue and had acknowledged it, we expected that an organization as important as MYNIC would be able to resolve it swiftly.
We were wrong again. As we entered day two, .my domains were still not resolving, and people started getting frustrated. Social media was full of complaints from users frantically switching DNS servers to get their .my links working. Media outlets picked up on our story, as well as our warnings against conducting online transactions on .my domains until the issue was resolved. MYNIC has had a number of DNS hacking and poisoning incidents in the past, and with the lack of updates from MYNIC on the current issue, there was a real concern that the outage could lead to a similar scenario.
The issue was soon escalated to the Communications and Multimedia Minister, YB Gobind Singh Deo, who confirmed via a tweet at 11.12am that he was aware of the situation and had instructed MYNIC and the Malaysian Communications and Multimedia Commission (MCMC) to get it sorted.
I was informed about this matter this morning. Action is being taken by MYNIC and MCMC (SKMM) to resolve it. I hope it can be resolved promptly. Further information will follow after a detailed report is received. https://t.co/nRML3MhLXt
— GobindSinghDeo (@GobindSinghDeo) June 16, 2018
By around 2.30pm, .my domains started resolving again, and service was mostly restored by around 5.30pm. At 5.56pm, MYNIC tweeted an official statement regarding the .my domain name outage.
Press Release – .MY Domain Outage pic.twitter.com/MmzRSitI7m
— MYNIC Berhad (@mynicberhad) June 16, 2018
This is where we get utterly disgusted at the sheer arrogance of MYNIC. We will break the statement down to show why anybody reading it should feel aggrieved by MYNIC’s attitude on this issue.
MYNIC received a report of .my domain services intermittent outages late last night.
MYNIC ‘received’ a report? Are you, MYNIC, a top-level domain administrator, trying to say that until you received a report, you were not aware that your DNSSEC keys were not validating, and that you were oblivious to the issues affecting all .my domains? With all the resources at your disposal, you do not have teams monitoring your own DNS servers for issues and actually have to rely on ‘reports’? This was not a small outage affecting a small number of domains. It affected every single one of the 340,638 .my domains that are under your care and control.
Our technical team acted upon the report and are working together
with Internet Assigned Numbers Authority (IANA) and Verisign to
resolve the problem. As part of the resolution to the problem,
MYNIC’s DNSSEC key was refreshed and pushed to IANA servers.
The problem was acknowledged at 4.35am, and it took another 10 hours to ‘refresh your DNSSEC keys’? We are no DNS experts, but based on our basic knowledge of how DNS generally works, it should not take this long to resolve, since the affected servers are directly under MYNIC’s care and control.
While DNSSEC is indeed a much-needed security improvement for DNS, and we commend MYNIC on its early implementation, it is also a double-edged sword. An oversight as simple as not renewing the cryptographic keys could lead to a complete failure of all DNS services tied to the TLD, in this case all .my domains.
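The kind of monitoring that could catch such an oversight early does not need to be elaborate. The Python sketch below checks how close a DNSSEC signature is to expiring, using the `YYYYMMDDHHmmSS` UTC timestamp format in which RRSIG expiration times are written in zone files. It is an illustration only, not a description of MYNIC's systems or of what actually failed: the function names, the seven-day warning window and the timestamps are our own assumptions.

```python
from datetime import datetime, timedelta, timezone


def parse_rrsig_time(stamp: str) -> datetime:
    """Parse an RRSIG-style YYYYMMDDHHmmSS timestamp as UTC."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def signature_status(
    expiration: str,
    now: datetime,
    warn_before: timedelta = timedelta(days=7),  # assumed alert window
) -> str:
    """Classify a signature as 'ok', 'expiring soon' or 'expired'."""
    expires = parse_rrsig_time(expiration)
    if now >= expires:
        return "expired"
    if expires - now <= warn_before:
        return "expiring soon"
    return "ok"


if __name__ == "__main__":
    # Hypothetical expiration timestamp, not taken from the real .my zone.
    expiration = "20180615083000"
    for day in (1, 10, 16):
        now = datetime(2018, 6, day, tzinfo=timezone.utc)
        print(now.date(), signature_status(expiration, now))
```

A check like this, run on a schedule and wired to an alert, would turn a silent expiry into a warning days ahead of time instead of a report received after the fact.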
Failing to Plan is a Plan to Fail
The blame for this outage lies solely with MYNIC. No organization in the world, not even giants like Google and Facebook, is immune to outages. That is where contingency plans, disaster planning, mitigation and recovery planning come in. The fact that something as seemingly trivial as a DNSSEC failure brought down the entire .my domain name service is scary, to say the least. The impact this time was softened by the sheer luck that it happened on a public holiday, during a very long weekend.
E-commerce sites like Lazada, 11street and Lelong would have experienced a dip in transactions during the outage. The majority of payment gateways also rely on .my domain names to get transactions done. The cinema chains TGV and GSC are also hosted on .com.my domains, and would have experienced a dip in sales. Had this happened on a business day, it would have hit the KLSE as well, considering that most transactions these days are conducted online, on sites and banking channels that sit on .my domain names.
And yes, when you make a mistake of this magnitude, the first thing you say in your press release after the incident is a massive SORRY to everyone affected. At the very least, we expected MYNIC to admit to the oversight and take responsibility for the outage.