BlackBerry outage: what went wrong, and what lessons should RIM learn from this incident?
What went wrong?
RIM said the service disruption was caused by the failure of a core hardware switch in its BlackBerry Internet Service (BIS) infrastructure; the BlackBerry Enterprise Server (BES) appears not to have been affected. Not only did the switch fail, but the failover system, which is supposed to switch traffic to a backup switch and restore service, did not work as expected, even though it had been tested before deployment. If the problem really was a hardware failure, the disruption should not recur, and the problem should now be contained for good.
However, if the problem runs deeper and was caused by as-yet-unidentified software faults, a full recovery could take days or even weeks, particularly if the system had been exposed to malware or a virus attack.
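To illustrate the failover behaviour described above, here is a minimal, hypothetical sketch in Python of an active/standby switch pair with health checks. This article does not describe RIM's actual failover design; the `Switch` and `FailoverController` names and the boolean health flag are illustrative assumptions. What the sketch shows is that failover only helps if the standby passes its own check at the moment it is needed: if the backup is also unavailable, the "redundant" system still ends in a full outage.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    """Hypothetical network switch with a simple health flag (an assumption)."""
    name: str
    healthy: bool = True

    def health_check(self) -> bool:
        # A real check would probe the device (heartbeats, link state, etc.);
        # this sketch simply reports the flag.
        return self.healthy


class FailoverController:
    """Hypothetical active/standby controller: route via the active switch and
    fail over to the other switch only if that switch passes its own check."""

    def __init__(self, primary: Switch, standby: Switch) -> None:
        self.primary = primary
        self.standby = standby
        self.active = primary

    def route(self) -> str:
        # Keep using the active switch while it is healthy.
        if self.active.health_check():
            return self.active.name
        # Active switch failed: try the other switch, but verify it first.
        backup = self.standby if self.active is self.primary else self.primary
        if backup.health_check():
            self.active = backup
            return self.active.name
        # Both switches are down: nothing left to fail over to.
        raise RuntimeError("primary and standby switches both unavailable")


if __name__ == "__main__":
    core = Switch("core-A")
    spare = Switch("core-B")
    controller = FailoverController(core, spare)
    print(controller.route())  # core-A

    core.healthy = False       # simulate the core switch failing
    print(controller.route())  # core-B: failover succeeds

    spare.healthy = False      # a standby that also fails leaves no path
    try:
        controller.route()
    except RuntimeError as err:
        print(err)
```

In a design like this, periodically exercising the standby path (not just testing it once before deployment) is what gives confidence that the failover will actually work when the primary fails.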
Another possible cause is system overload, resulting from the growing number of BlackBerry users combined with new features and services such as media-content sharing, BlackBerry Messenger (BBM) music downloads and online interactive gaming. These services not only generate large volumes of data that cross the BIS but also expose the system to malware attacks.
If this is the case, RIM will need to scale up its infrastructure considerably to cope with the traffic crossing BIS, which could demand significant investment and time. In the meantime, to ease the traffic burden, the company might be forced to switch off some data-heavy features and services. If this is what is happening, the company could further damage its credibility with its partners, who would start to question their reliance on RIM's roadmap.
The situation could have been much worse had the BES been affected, because this server handles enterprise services that carry time-sensitive information, including e-mail. Loss of this content would have exposed RIM to serious legal liabilities and pushed enterprise customers to look for alternatives immediately. That said, a number of enterprise customers do rely on BIS for e-mail. In fact, RIM recently launched a "Lite" version of BES, which enterprises can download for free and which lets their employees access the service via BIS. These employees get the same functionality as on BES while connecting through a BIS plan, so they too were exposed to the outage.
How has RIM handled communication about the problem?
In short, very badly. RIM's poor communication has done it a great deal of harm. Google or Apple would have handled a similar problem far more sensibly. For example, when Apple had problems with the antenna system in the iPhone 4, it communicated the problem well and managed to calm the resulting media furor within a single day. This was largely thanks to the intervention of Steve Jobs, Apple's CEO at the time, who addressed the problem publicly and promised affected customers either compensation or a free-of-charge solution in the shape of the "bumper" case.
In contrast, RIM has shown hesitant leadership. Its joint chief executives, Mike Lazaridis and Jim Balsillie, apologised to customers only under pressure, and only on the third day of the crisis. True, both looked humble and sorry for what had happened. However, they struggled to identify the main causes of the outage and did not say what plans they were putting in place to compensate customers. At this stage, RIM has given the impression that it is struggling to contain the problem and, worse still, that it probably does not even understand the root cause of the outage.
What impact will this have on RIM's customers?
Since BIS was the main service affected, consumers were the most exposed to the outage. As these customers do not usually handle sensitive data, it will take more than a couple of days of poor user experience to persuade them to look for alternatives. Consumers often suffer poor data service because of weak cellular coverage or insufficient network capacity, so they are likely to see this incident as just another failure of the mobile system. That said, a repeat of the problem could be disastrous for RIM, because customers would start abandoning BlackBerry.
Although the BES was not directly affected, some businesses may see this as a good reason to re-evaluate their reliance on centralised infrastructure and instead invest in servers under their own control. Not only would this let IT departments reduce the risk of unforeseen outages, but it could also give employees more flexibility to use their own devices.
What will be the cost of compensating customers?
Customers pay an average of about US$5 per month for BlackBerry services. If all 70 million BlackBerry subscribers were compensated for the loss of service, RIM would have to refund about US$12 million per day (70 million × US$5 = US$350 million per month, or roughly US$11.7 million per day over a 30-day month). This figure excludes liabilities for lost data and any related legal costs. In total, RIM could end up paying out more than US$100 million, and the amount would grow with every additional day the service remains out of action.
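The refund estimate above can be reproduced with a few lines of Python. This is a minimal sketch using the figures from the text (70 million subscribers, an average fee of US$5 per month); the 30-day month and the simple pro-rata refund model are assumptions, and, like the estimate in the text, it ignores legal and data-loss liabilities.

```python
# Figures from the text; the 30-day month is an assumption.
SUBSCRIBERS = 70_000_000
MONTHLY_FEE_USD = 5.0
DAYS_PER_MONTH = 30


def daily_refund(subscribers: int, monthly_fee: float,
                 days_per_month: int = DAYS_PER_MONTH) -> float:
    """Refund owed per day of outage if every subscriber gets a pro-rata refund."""
    return subscribers * monthly_fee / days_per_month


def outage_refund(days_out: float) -> float:
    """Pro-rata refund for an outage of `days_out` days (excludes legal liabilities)."""
    return daily_refund(SUBSCRIBERS, MONTHLY_FEE_USD) * days_out


if __name__ == "__main__":
    per_day = daily_refund(SUBSCRIBERS, MONTHLY_FEE_USD)
    print(f"per day: ${per_day / 1e6:.1f} million")  # per day: $11.7 million
    for days in (1, 3, 5):
        print(f"{days} day(s): ${outage_refund(days) / 1e6:.1f} million")
```

Because the model is linear, every extra day of outage adds the same amount, which is why the total escalates with each day the service is down.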
Several mobile operators in the Middle East and Europe are already absorbing the cost and putting plans in place to compensate BlackBerry customers affected by the outage. These operators will almost certainly ask RIM to reimburse them for damages incurred during the outage.
What about the BlackBerry brand?
Although the company makes almost 80% of its revenues from selling devices, its brand relies on the services it offers, including secure and efficient e-mail and messaging, plus fast, data-optimized Internet browsing. These qualities are made possible by its Network Operations Center (NOC), which underpins both BIS and BES. Many users are attached to BlackBerry because they enjoy the services rather than the handsets themselves. If the recent problems are not resolved for good, both consumers and business customers will start looking for alternatives.
What lessons should RIM take from this experience?
There are many lessons RIM should take from this experience, but the three most crucial are:
- Decentralize equipment and services, and build more reliable backups. RIM should change the core architecture of its service infrastructure by building more regionalised server infrastructure. That way, the company would have more flexibility to deal with outages in specific regions without affecting the global reach of its services.
This change will obviously require considerable investment, but it is a fundamental step if the company is to maintain the reliability and quality of its services.
- Moderate its ambitions and focus on specific market segments and services in line with its core expertise and brand. RIM has expanded its market considerably in a very short time. For example, while most BlackBerry users were business customers before 2009, today most of them are consumers. The company more than doubled its user base in a single year, from 32 million users in August 2010 to 70 million users in August 2011. In addition, the company has massively upgraded a number of its services, including turning its popular BBM service from a basic text platform into a full media center.
The rapid growth in the number of BlackBerry users, combined with the major expansion of BBM, is putting a heavy burden on BIS, and it is now questionable whether this infrastructure can cope with the resulting increase in traffic.
- Review the way it communicates with the industry and put contingency plans in place to address potential issues in times of crisis. Service disruptions and malfunctions are common in the telecommunications industry, but they are usually handled transparently and efficiently. RIM handled this situation poorly: its chief executives appeared reluctant to apologise and did not reassure users that plans were in place to compensate them.
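The first lesson, regionalised infrastructure, can be illustrated with a minimal routing sketch in Python. Everything here is a hypothetical assumption rather than RIM's design: the region names, the up/down status map and the fallback order. Each user is served from a home regional cluster; if that cluster is down, traffic moves to the next healthy region, so an outage in one region does not take the whole service offline. A fallback region also needs spare capacity to absorb the extra load, otherwise an outage in one region can turn into an overload in another.

```python
from typing import Dict, List

# Hypothetical regional clusters and fallback order (assumptions for illustration).
DEFAULT_FALLBACKS: Dict[str, List[str]] = {
    "americas": ["emea", "apac"],
    "emea": ["americas", "apac"],
    "apac": ["americas", "emea"],
}


def pick_cluster(home: str, clusters_up: Dict[str, bool],
                 fallbacks: Dict[str, List[str]] = DEFAULT_FALLBACKS) -> str:
    """Serve a user from their home region, falling back to other regions in order."""
    for region in [home] + fallbacks[home]:
        if clusters_up.get(region, False):
            return region
    raise RuntimeError("no regional cluster available")


if __name__ == "__main__":
    status = {"americas": True, "emea": True, "apac": True}
    status["emea"] = False  # simulate an outage confined to one region
    for home in ("americas", "emea", "apac"):
        print(home, "->", pick_cluster(home, status))
```

With the EMEA cluster down, only EMEA users are redirected (to the Americas cluster in this sketch); users in the other regions keep their normal service.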