In this post, I want to share my thoughts on recoverability, or more precisely the approach I use in Recoverability Assessments. These assessments are comprehensive: they cover DR solutions, training, recovery processes and continuous availability.
I start with readiness, in three different areas: People, Process and Technology. I review and rank the readiness of the key areas that enable availability, resiliency and recoverability by assessing the customer's current IT capabilities.
Once I generate the readiness documents, I do my analysis, determine the gaps and then present my recommendations. I support the customer in execution as well (if they need me there).
So it is pretty straightforward, but it still requires a lot of effort :)
The assessment starts with information gathering. I gather detailed information and analyze a number of attributes in the following areas:
Operational Staff, Response Plans, Recovery Testing, Program Maintenance, Business Expectations, Production & DR Facilities, Application Infrastructure, Data Restoration and Recovery Network.
During this first phase, we usually meet with the customer. I write down the people, process and technology findings. Then, we populate tool-based discovery reports (DB, server and SAN health checks, server grabs & logs, etc.).
In the second phase, I create a recommendation list. Next, I build the remediation roadmap, finalize the recoverability assessment document and lastly give the final recoverability assessment presentation (an executive presentation, actually).
While analyzing the people and process findings, I check whether there are any gaps in the following areas: business expectations, production & DR facilities, application infrastructure, data restoration, recovery network, operational staff, response plans, recovery testing, program maintenance, etc.
Following is an example of the gaps that may be found in the Recovery Network area:
No formal DR program
DR requirements unknown
Lack of formal documented policies or processes
Following is another example of the gaps that may be found in the Data Restoration area:
Lack of service levels with the business
No formal tiering structure
Recovery RTOs / RPOs have not been defined
Lack of recovery expectations
These are big gaps :) They are here just to give you some examples, but I guess you understand the scope of the work already.
In the technology analysis phase, I analyze the following layers against the following criteria:
Presentation layer, Login/application, Database, Compute, Storage, Network --> Production HA, DR, Backup, Archiving.
Some examples of the technology findings in this phase:
A single point of failure (SPOF) exists which would cause a complete outage for the application.
Server configuration is not aligned with the intended high-availability design (cluster is misconfigured).
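To make the layers-against-criteria idea a bit more concrete, here is a minimal Python sketch of a review grid. The layer and criteria names come from the list above; the function name, the dictionary format and the sample finding are my own assumptions for illustration, not part of any assessment tool.

```python
# Layers and criteria as listed in the post.
LAYERS = ["Presentation", "Login/application", "Database",
          "Compute", "Storage", "Network"]
CRITERIA = ["Production HA", "DR", "Backup", "Archiving"]


def unreviewed_cells(findings):
    """Return every (layer, criterion) pair that has no finding recorded yet.

    `findings` is a hypothetical dict keyed by (layer, criterion) tuples,
    with a free-text note as the value.
    """
    return [
        (layer, criterion)
        for layer in LAYERS
        for criterion in CRITERIA
        if (layer, criterion) not in findings
    ]


# Illustrative finding only; not taken from a real assessment.
findings = {
    ("Database", "Production HA"): "Cluster is misconfigured",
}
remaining = unreviewed_cells(findings)
print(f"{len(remaining)} of {len(LAYERS) * len(CRITERIA)} cells still to review")
```

A grid like this is just a checklist helper: it makes it harder to forget, say, the Archiving review of the Network layer.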
Well, after the findings, I create a recommendation matrix and summarize these recommendations.
I analyze each recommendation from the implementation effort and business impact perspectives, and then create a matrix that shows the risk level / business impact and the implementation effort of each recommendation.
Business impact goes from low to high as you go up the y axis, and effort goes from high to low as you go right along the x axis. So, the action items/recommendations in the top right quadrant are given high implementation priority, because they combine low effort with high impact. You get the idea.
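The quadrant logic above can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: the 1-to-5 scoring scale, the midpoint of 3, and the sample recommendations are hypothetical, and in a real assessment the scores come from discussion with the customer.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    name: str
    impact: int  # business impact, 1 (low) to 5 (high); assumed scale
    effort: int  # implementation effort, 1 (low) to 5 (high); assumed scale


def quadrant(rec, midpoint=3):
    """Place a recommendation in the matrix.

    y axis: impact, low (bottom) to high (top).
    x axis: effort, high (left) to low (right).
    """
    vertical = "top" if rec.impact >= midpoint else "bottom"
    horizontal = "right" if rec.effort < midpoint else "left"
    return f"{vertical}-{horizontal}"


def prioritize(recs):
    """Order by highest impact first, then lowest effort."""
    return sorted(recs, key=lambda r: (-r.impact, r.effort))


# Hypothetical sample data for illustration.
recs = [
    Recommendation("Build a second data center", impact=5, effort=5),
    Recommendation("Define RTO/RPO per tier", impact=5, effort=2),
    Recommendation("Rename backup jobs", impact=1, effort=1),
]
for rec in prioritize(recs):
    print(f"{rec.name}: {quadrant(rec)}")
```

Anything that lands in "top-right" is a candidate for the near-term part of the roadmap.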
As for the remediation options, I document the as-is architecture, then propose target solutions by analyzing the gaps. More than one solution may be proposed as part of the gap analysis against the recoverability business requirements.
Finally, I create the recoverability roadmap and that's it :)
In the recoverability roadmap, I start with the areas of opportunity and build an 18-month plan (maybe longer). I list the actions that should be done in the near term, in 6-12 months and in 12-18 months to reach the target state, where we usually have the following:
Increased ROI
Standardized Environment
Ensured Recoverability
Recoverability and continuous availability services aligned with the business needs
Operational Excellence
Organizational stability
Culture of Ensured availability & DR Services.
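The roadmap horizons mentioned earlier (near term, 6-12 months, 12-18 months) can be sketched as a simple bucketing function. Note the assumptions: I treat "near term" as the first 6 months, the function names are made up for this example, and the sample actions and their target months are hypothetical.

```python
def horizon(target_month):
    """Map a target month (counted from the roadmap start) to a horizon.

    Assumption: "near term" means the first 6 months.
    """
    if target_month < 0:
        raise ValueError("target_month must be non-negative")
    if target_month <= 6:
        return "near term"
    if target_month <= 12:
        return "6-12 months"
    if target_month <= 18:
        return "12-18 months"
    return "beyond 18 months"


def build_roadmap(actions):
    """Group (action name, target month) pairs by roadmap horizon."""
    roadmap = {}
    for name, target_month in actions:
        roadmap.setdefault(horizon(target_month), []).append(name)
    return roadmap


# Hypothetical actions for illustration.
actions = [
    ("Define RTOs / RPOs with the business", 2),
    ("Remove the single point of failure", 9),
    ("Run a full DR test", 15),
]
for bucket, names in build_roadmap(actions).items():
    print(f"{bucket}: {', '.join(names)}")
```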
That's the end of this post. I hope you find it useful.
If you need any advice or consultancy, feel free to contact me.