Disaster Recovery and Business Continuity

By Computerworld Philippines Staff

Four IT executives reveal the IT adjustments they made after the recent calamities and share strategies for preparing for disasters, as well as the challenges to expect along the way.

The onslaught of two typhoons that wreaked havoc on the Philippines towards the end of 2009 reminded companies of the importance of IT disaster recovery and business continuity plans. Though the ordeal was damaging and paralyzing, it pushed the IT capabilities of businesses to their limits and taught valuable lessons. One of them is to treat IT disaster recovery as one of the top priorities, if not the very top priority.

During Computerworld Philippines' monthly CIO Roundtable last November, four IT executives revealed some of the IT adjustments they made as a result of the recent calamities. They also shared strategies for preparing for disasters, as well as the challenges to expect along the way.

For Rommel Frias, division head of IT/ISD at Eastern Communications, disaster recovery and business continuity are not just for giant companies like telecommunications networks but for all types of businesses.

In the case of PJ Lhuillier Group, a popular chain of pawnshops in the Philippines, advancements in business continuity are of high importance. "Our objective in building the system is to improve customer services in case of interruptions like disasters, because they will surely affect our customer services. And since we deal with high volumes of transactions every day, if disaster strikes and we fail, then there would be tremendous financial loss," says Ergie Ong, chief information officer of PJ Lhuillier.

Similarly, Larry Delos Santos, assistant vice president of the technical services department at Insular Assurance Company, reports that the company has undergone several disaster recovery tests and is looking to implement 24/7 web-hosted applications where downtime should be minimal or nonexistent.

Meanwhile, Jose Raymundo Vergara, chief technology officer of Metropolitan Bank and Trust Company (MBTC), Inc., explains that disaster recovery is important because the function of banks has gone beyond basic business processing like keeping ATM services available; he describes it now as a new form of public service. "We need to keep in mind that our responsibility to the community is always assured by planning for disasters," Vergara notes.

During the forum, former roundtable participant Jose Maria Valdes, director for operations at Encash, an independent ATM deployer firm, served as the roundtable's guest moderator. Julius Suarez of Sophos was also present to represent the firm that sponsored the event.

EXCERPTS OF THE ROUNDTABLE DISCUSSION FOLLOW:

Computerworld: Why is disaster recovery and business continuity planning important to your company, and how is it important?

Frias: Disaster recovery and business continuity should be one of the top priorities, I guess not just for telcos but for all companies that conduct any business. If something untoward happens, at least your core function as a business is there and your management and administration have a fallback position. The problem with companies that are not ready is that when something untoward happens, they don't know what to do and it will affect their customers; and I guess that's the view being shared not just by telcos.

Delos Santos: Disaster recovery and business continuity ensure the continued availability of information and the ability to process online transactions across the organization. In Insular Life, we have online transactions for customer payments and enrollment of policies. When systems are unavailable, a disaster will mean manual transactions at the branches and/or delays in policy issuance for new applications.

Vergara: I have to supplement what the first two panel members have said about business interruption. Particularly for banks, it's their responsibility to keep the assets of other parties, our customers, which is why we need to keep our information very secure. The entirety has to be assured and we need to be open for business to service their needs. In recent years, banking has gone beyond lending and deposits; now it has gone into remittance, into electronic channels, and those things need to be up always. Even in times of disaster, people need to pay their bills. In Marikina, right after typhoon Ondoy, we were one of the first to actually restore ATMs, and the queue was really long because many people needed cash. You see, it has gone beyond just basic business processing like keeping those services available; it is now also public service. We need to keep in mind that our responsibility to the community is always assured by planning for disasters.

Ong: We have lots of retail outlets and millions of class C or D customers, and one of our objectives in building the system is to improve customer services in case of interruptions like disasters, because they will surely affect our customer services. And since we deal with high volumes of transactions every day, if disaster strikes and we fail, then there would be tremendous financial loss.

Computerworld: Did Ondoy and Pepeng spell disaster for your company?

Delos Santos: The recent typhoons had no direct impact on Insular except that two sites were closed. One office in Dagupan was flooded, but we are situated on the second floor, so there was no direct damage to resources.

Vergara: We have several branches in Marikina, so obviously those were under water and we had to stop operations. But we were prepared for something like this, so after three or four days all of these branches were restored. In the case of Pepeng, we only had a couple of branches in Carmen that were under water. We were ready to operate except the power was not yet up, so we shifted to a generator.

Ong: For us, 26 branches were affected and, of course, we needed to replace the computers and the security system.
But we were able to restore operations a day after because, as I mentioned earlier, we developed a system that is ready for disasters like that. So even if there's no power and no line connections, we were able to do transactions.

Sophos: How about small disruptions, like one branch going offline? Do you also consider that a disaster? And what's the most frequent disaster you encounter?

Ong: For retail outlets like us, it's very critical, and the approach is very elementary. This means you need to go to the branch level. So when we designed the system, we made sure that the branch is highly available in terms of servicing the customers. We actually made a list of possible disaster cases and then made sure that we have solutions to process customers' transactions. The most common are very light problems, second is power failure, so at least with those two common problems we can actually operate. There should be no reason why we cannot process a customer transaction right now. We should minimize or achieve zero rejections for our customers.

Sophos: What if cellphone sites are down and there is no signal?

Ong: Actually they have backups. And we have three telecom networks in the country, so back up your phone with a backup.

Computerworld: When a disaster occurs, to what extent are the formal security measures that are normally practiced still followed? To what extent are they compromised when you go into disaster mode?

Frias: In my past experience, they still apply. It still boils down to the people who have to do the access. For example, when a disaster strikes, we need to do a backup, not just an incremental one; we need to know how many hours we can pull. If the person to do that is not available, there is documentation and a backup person. We should always have that.

Delos Santos: In handling security during a disaster, we have to consider two items: physical security and systems security. In physical security, there are predefined people who have access to certain places in the organization; for example, the data center engineers have access to the data center with proximity cards. Not everybody can access the data center or the disaster recovery site; physical access depends on an individual's role. Issues arise when the assigned staff member is not available and a new one has to be assigned to handle the role. Systems security should be assigned properly. IT staff who are assigned to the disaster recovery site should have the necessary access rights and skills when disaster recovery is invoked by the authorized people.

Vergara: In our case, since the backup facility is essentially mirrored double action, the same security measures would be appropriate there. Admittedly, though, in a disaster situation you do a lot of things that you don't normally do, so there will be some shortcuts. It then becomes inevitable to grant some extra access just to make things work faster. What's important is to try to normalize operations very fast so that you can implement the normal SOPs.
And then, whatever accommodations or extra power was given to the disaster recovery team would be withdrawn gradually afterwards.

Ong: We consider security very important. Since we communicate a lot by SMS, we have security measures in place for that. Mobile phone numbers are pre-registered in our system.

Computerworld: Do you actually do a full company-wide test or a small-scale representative test? How do you go about testing DR?

Vergara: Yes, we do. We do it bank-wide at least once a year, and conduct quarterly technical tests. We make sure all the redundant systems run technically, and then we make everybody come in. We make all employees rehearse at least once a year.

Delos Santos: What we did was identify the critical components of the business and then identify the critical system components needed to support the business. Based on this assessment, a disaster recovery plan needs to be put in place. We have identified the critical systems in Insular and created a replica of these systems, applications and databases in the disaster recovery site. Once a year, we simulate a disaster recovery in which a few sets of representatives from identified departments/users test the DR integrity.

Frias: In this regard, I cannot speak for Eastern. But in the past, like when I was still in the BPO, we really needed to test because that's part of the requirement. The problem is that when we test, all the people want to know about it first. And so when we did a surprise test, the gap was very wide. That's why we really need to go back to basics and manage people's expectations, like who is supposed to react.

Ong: With EPCI, we did quarterly tests on the technical side, and when we involved users, we normally did it during off-peak, at least once a year, to make sure that our systems were ready. But now, here in Lhuillier, since our system is newly developed, we included in the script some conditions about disasters or what-if scenarios. We are actually doing some testing; for instance, if our data center goes down, where can we operate? So we already tested that. For future implementations, probably next year, I will do what we did in EPCI.

Vergara: One thing about testing: since it's a big effort to do it company-wide, you can only test one scenario a year. But how can you predict so many disaster scenarios? So you definitely have to test once a year, but you also have to go beyond the rehearsal. You also need a good communication plan for disasters that are unexpected, like the Ondoy time when purely our Marikina and San Mateo branches were affected and the rest were operating. What is probably more important is a good communication plan so you can immediately mobilize resources and address that particular unique disaster situation.
Then again, even the communication plan can run into problems. During Milenyo there was an extended power outage, and even the cellphones could not function anymore because nobody could charge them. Even that needs to be a factor in the communication plan.

Computerworld: Are there any other huge challenges going into a disaster recovery plan?

Ong: I think the usual challenge for a DRP solution is cost. If you create a replica of your data center, that's already expensive. Since I had the opportunity to develop our data center from scratch, I made sure that we will help concentrate the DRP solution by actually implementing it on a per-branch basis. So in case of a disaster, our branch can operate. When I was in EPCI, the cost was too high because having another mainframe is very expensive. But I needed to invest in disaster recovery for banking, since there will be a very big issue if you encounter problems in passing transactions.

Vergara: Another challenge is getting the business units to be more disaster-conscious. Getting the business units more aware of what could potentially happen if there's a disaster, how they can react to it, and what kind of infrastructure they need to recover within a desired recovery time for a particular area of operations: that is where it all starts. They become more aware, so they are more willing to invest, not just money but also time, in preparing for the disaster.

Delos Santos: Insular is turning 100 years old next year; we have undergone several disaster recovery tests and are actually embracing it. The real challenge we have now is the plan to implement 24/7 web-hosted applications where downtime needs to be very minimal. We all know that when we host an Internet service, it should be available online and downtime needs to be transparent to the customers. This will require a disaster recovery plan that allows us to divert traffic to an alternate site when the primary site is not available or is undergoing maintenance. Our challenge today is setting up an exact replica of the primary site at the DR site, and this may require real-time replication of data between the sites.

Frias: For us, for every IT project we implement, we make sure there is a driver that meets all the requirements. As a technologically savvy person, I want everything to be high-tech and new, but of course we need to justify the costs first. In order to do that, we need to find the business requirements and assess the impact of our investments. One of the challenges is convincing management of the type of BCP or DRP strategy we would like to pursue. The next challenge would be changing people's mindset, especially the Filipino "bahala na" attitude. We should change that, especially if we are running a very critical business that services a lot of users. We need to ensure that they are equipped and critical of the continuation of our business.

Computerworld: Prior to the recent calamity, how difficult was it to convince management on disaster recovery spending? Rate it on a scale of one to ten, one being the easiest.

Frias: Well, for telcos, the problem is we need to change the mindset from having a structured plan, wherein people will just contribute, into something where we look at the people who are contributing, not just the structure that you have.

Delos Santos: On our side, there is really not much difference between now and before, because what we do is review the current architecture, identify the risks associated with possible downtime, and then create a plan to address the identified issues independent of all other issues surrounding our environment.

Vergara: For banks, there is a requirement imposed with regard to the maximum out time. That has always been there, so the mindset is there already, and management has really provisioned our infrastructure to achieve that high availability. Admittedly, the more disasters there are, the more awareness there is. I will not discount that they become more participative in business continuity planning, but I don't see any difference between the pre-Ondoy and post-Ondoy times in terms of acceptance of the need to do BCP.

Ong: In terms of justifying the cost of business continuity, I think it's not that difficult anymore, because what I learned from my previous company, EPCI, is that we just need to make them realize what the lost opportunity and the cost involved will be if we are down for a day. So I normally quantify when justifying the DR solution. I will often tell them, "If you are down for one day, this will be our loss."

Delos Santos: The key really is to quantify losses in cases of disaster. Once we are able to show equations on the impact of disaster on our business, decisions can be made easily. Once management realizes the impact and the risks of downtime to the business, they themselves will support the IT team in creating a disaster recovery plan and back it with a budget.

Computerworld: Can you give a percentage of how much of your IT budget actually goes to these technologies?

Vergara: If the principle is full availability of your critical systems, then I would say that purely for the infrastructure, because that's where the redundancy needs to be built, it's a one-to-one correspondence with the backup. You don't have the same one-to-one correspondence with software because you don't need to pay for the software licenses in the backup site. If you are not using it in a warm mode anyway, you also need to take IT manpower costs out of the equation, because these are the same people anyway.

Ong: I think I can grade it probably on the data center side, around 20% to 30%, because we are expecting a degraded mode once we shift to the backup site, but at least we get faster transactions.

Delos Santos: It varies. Initially for this year, the budget is significant as we are still starting to put up new infrastructure for disaster recovery, maybe around 15% to 20%. In the following years, the cost of maintaining it will be around 5% to 7%.

Frias: Sad to say, there's very minimal budget for DRP and BCP. Actually, it's not identified as a concrete item, so that is something I have to do by early next year. I need to justify unbudgeted projects just to keep it going.

Computerworld: What's your company's requirement for DR technologies? What's your ideal DR technology, or your advice in choosing the right DR technology?

Delos Santos: The implementation of DR technologies depends on the company's direction. Terms like RPO (Recovery Point Objective) and RTO (Recovery Time Objective) need to be considered.
If we need a setup that requires a very quick RTO, such as recovery within 10-30 minutes, with an RPO such as the last transaction, real-time replication may be required. This may be an expensive setup. A slower recovery (RTO), such as within 5-10 hours, with a recovery point of the last 12 hours, will cost significantly less to implement.

Frias: I have to justify that now, and I guess it all boils down to the person who is handling that project to identify the different loose ends and then have management approve the level they desire. That's the only time you'll look into the technology side.

Ong: In terms of an ideal DRP, we need to protect the main asset, which is the data. So I'd want to make sure that our data is replicated and that we will not lose our data during a disaster. Of course, it should be cost-effective, because in Lhuillier I don't want white elephants. I want to use it for other purposes.

Vergara: For me, one of the nice things is having real-time backup, and that's one of the things we have in terms of disaster recovery. It also requires a lot of bandwidth. With our volumes, we needed a dark fiber link between our primary and disaster recovery sites, so that's almost unlimited bandwidth already. But that's a requirement if you need real-time replication. Aside from cloud computing, I'm also happy about other technologies such as virtualization. If you want to limit the expense of disaster recovery sites, you can use virtualization technologies, which really help in setting up the secondary site because you don't need to provision a full server.
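The RTO/RPO trade-off that Delos Santos describes can be made concrete with a small sketch. This is an illustration only, not any panelist's actual method: the names `RecoveryTarget` and `suggest_replication` are hypothetical, and the 30-minute cut-off is an assumption borrowed from the "10-30 minutes" figure quoted in the discussion rather than an industry rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTarget:
    """Recovery objectives for one system (hypothetical structure)."""
    rto_minutes: float  # Recovery Time Objective: how quickly service must return
    rpo_minutes: float  # Recovery Point Objective: how much recent data may be lost


def suggest_replication(target: RecoveryTarget) -> str:
    """Return an illustrative replication approach for the given objectives.

    Assumption: the 30-minute RTO threshold mirrors the example in the
    discussion; a real plan would set it from business requirements.
    """
    # An RPO of zero stands for "recover up to the last transaction".
    if target.rpo_minutes == 0 or target.rto_minutes <= 30:
        return "real-time replication"
    # Looser objectives can be met with a cheaper, scheduled approach.
    return "periodic backup"


if __name__ == "__main__":
    strict = RecoveryTarget(rto_minutes=30, rpo_minutes=0)
    relaxed = RecoveryTarget(rto_minutes=10 * 60, rpo_minutes=12 * 60)
    print(suggest_replication(strict))
    print(suggest_replication(relaxed))
```

With the two sets of figures quoted above, the strict target maps to real-time replication and the relaxed one (recovery within 10 hours, last 12 hours of data) maps to a periodic backup, which matches the cost ordering described in the discussion.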
