Introduction
Companies increasingly are looking to the cloud to help deliver IT services. Many have already moved email, sales force management, and other applications to Software-as-a-Service (SaaS) cloud service providers. And now there is growing interest in infrastructure services to cut costs and to offer companies the agility to be more responsive when business conditions change.
However, companies have been hesitant to move production and business-critical applications to public clouds due to concerns about performance, reliability, and security. Recent outages of some cloud providers’ services, which knocked out major sites such as Instagram, Netflix, and Pinterest for hours or longer, confirmed their greatest fears. A most telling comment about the recent outages came from the Wall Street Journal. It noted [1] that “[what we’re seeing] is the rapid expansion of a big new industry that is still in its shakedown phase—finding and fixing problems.”
This should be a cause for concern for any company that wants to run production applications in the cloud or use cloud services for other vital business operations such as backup or disaster recovery.
However, the point to keep in mind is that not all cloud services are created equal. To run production applications, organizations must select a provider that offers high availability and uptime, backed up with service level agreements.
The Need for High Availability and Uptime
The Need for High Availability/Uptime
The use of public cloud services for SaaS is widely embraced, and for infrastructure needs it is slowly gaining momentum. A survey [2] of 600 global enterprise and mid-market companies published in early 2012 found that 27 percent were using public cloud Infrastructure-as-a-Service (IaaS) solutions. That was 10 percent higher than what was found in a similar survey published in early 2011.
To date, the main uses of SaaS and IaaS have been for everything but mission-critical applications.
Organizations are typically using SaaS for email, calendaring, and internal collaboration. They use IaaS for test and development, to build and deploy new Web-based applications, and to move non-critical applications that are not subject to regulatory requirements off expensive-to-maintain on-premises data center equipment.
Another area where both SaaS and IaaS are finding acceptance is when a business unit needs to respond to a new opportunity and doesn’t want to wait for IT to provision internal resources.
However, most organizations are taking a cautious approach when it comes to using the cloud for key business applications.
The reason: businesses run 24/7. Employees want to check mail and schedules and share information around the clock. The global nature of organizations, their supply chains, and their customer bases means systems must be available at all times.
While downtime might be acceptable to the general web surfer, businesses cannot afford that luxury. If a customer tries to check his account or place an order and finds a site or application down, he can easily take his business to a competitor. Workers who cannot access key applications lose valuable time and the company incurs mounting lost productivity for each lost minute or hour.
The costs can quickly add up.
Regardless of the source of an interruption in availability, the end result is the same: business suffers. One recent uptime study [3] found that businesses lose an average of about $5,000 per minute ($300,000 per hour) in an outage. In several industries, downtime can be even more costly. Past studies have pegged the cost of an hour of downtime at $1.1 million for a retailer and up to $6.48 million per hour for a brokerage firm.
Making these numbers particularly worrisome is that many businesses routinely experience outages. In 2009, Dun & Bradstreet found [4] that 49 percent of Fortune 500 companies experience at least 1.6 hours of downtime per week. That translates into more than 80 hours annually.
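The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and multiplying the average per-minute cost from one study by the weekly downtime from another is an assumption made here for the sake of the example, not a result reported by either study.

```python
# Figures quoted in the text. Combining them is an illustrative assumption.
AVG_COST_PER_MINUTE_USD = 5_000   # average outage cost per minute
WEEKLY_DOWNTIME_HOURS = 1.6       # weekly downtime cited for Fortune 500 companies


def hourly_cost(cost_per_minute: float) -> float:
    """Convert a per-minute outage cost into a per-hour cost."""
    return cost_per_minute * 60


def annual_downtime_hours(weekly_hours: float, weeks_per_year: int = 52) -> float:
    """Scale weekly downtime to a full year."""
    return weekly_hours * weeks_per_year


def annual_downtime_cost(weekly_hours: float, cost_per_minute: float,
                         weeks_per_year: int = 52) -> float:
    """Estimate yearly outage cost from weekly downtime and per-minute cost."""
    hours = annual_downtime_hours(weekly_hours, weeks_per_year)
    return hours * hourly_cost(cost_per_minute)


if __name__ == "__main__":
    print(f"Cost per hour: ${hourly_cost(AVG_COST_PER_MINUTE_USD):,.0f}")
    print(f"Downtime per year: {annual_downtime_hours(WEEKLY_DOWNTIME_HOURS):.1f} hours")
    cost = annual_downtime_cost(WEEKLY_DOWNTIME_HOURS, AVG_COST_PER_MINUTE_USD)
    print(f"Illustrative annual cost: ${cost:,.0f}")
```

Under these assumptions, 1.6 hours a week comes to about 83 hours a year, and at the average rate that would be roughly $25 million annually. A real estimate would substitute an organization's own downtime history and cost per minute.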
Businesses obviously need to take steps to avoid costly downtime, lost productivity, and regulatory problems. And if an outage does occur, service recovery must be rapid. All of these points are amplified when production and mission-critical applications are involved.
Evaluating a Provider
For production applications used to run a business, not all clouds are created equal. Organizations need to select a cloud service that offers availability and uptime characteristics that match their applications’ tolerances for downtime.
Immediately, organizations will find that there are not only great technology differences between providers, but there are also variations in operational procedures, responses to problems, and the way security is handled.
So what are the key characteristics to look for in order to run production applications in the cloud with the assurance they will be highly available?
Essential infrastructure elements: First, look at the cloud provider’s architecture. Does the provider employ an enterprise architecture with built-in resiliency and security designed for production workflows? Are there multiple connectivity paths to the provider’s site? If a site goes down due to a power failure or a natural or man-made disaster, how is it restored and how fast?
Is there more than one site? If one site becomes inaccessible or its services are unavailable, do workloads automatically fail over to a second site without disruption? Can you load balance between sites?
All of these points are important. A properly architected provider infrastructure can help ensure minimal service disruptions.
Dig a layer deeper in examining the infrastructure. Are the services based on open source solutions or best-in-class solutions? Who are the provider’s technology partners? These issues can become important when problems arise. If the provider has strung together services based on open source solutions, its technical staff will have to go it alone when trying to troubleshoot problems and restore services. On the other hand, a provider that partners with technology leaders will have the expertise and help of those companies when trouble occurs.
Next, find out about the provider’s operational procedures. What happens if something fails? Suppose a blade server crashes. Is restoration automatic? How long does it take? Are the OS, application, drivers, and data restored instantly or does the server have to be rebuilt from scratch?
What if a site goes down? How does the provider handle routine problems like cut cables or power outages? Are there redundant line feeds from different telecom providers into the data center?
Major provider sites would naturally have onsite backup power generators. But how often is that generator and its ability to automatically kick into action tested? This may seem trivial; one might assume providers routinely test their backup power solutions in real-world scenarios. Don’t assume. A recent widespread and prolonged outage of a major provider’s services was due to the failure of a data center to switch over to its backup generators [5]. This eventually drained the center’s uninterruptible power supplies, resulting in the shutdown of all site hardware.
How are the data centers staffed? Is there someone on site 24/7? What are the provisions in place to bring staff in when problems arise?
Hosting key business applications and their associated data in the cloud means organizations also must take a deep look at the provider’s approach to security. How does the provider handle system and physical security?
On the system security front, does the provider follow best practices for protecting systems from malware and cyber attacks? Many government and industry groups (such as ISO and NIST) have developed security guidelines and frameworks. Which ones does the provider follow?
How does the provider handle physical security? How is the data center secured? Who is allowed access? What mechanisms are in place to ensure only authorized staff have access to the server room? Are the server racks locked down?
Procedural aspects: Many organizations need to abide by regulations on availability, data protection, and data privacy. How does the provider handle these issues?
Look at the provider’s governance, risk, and compliance strategies. What security practices are followed to protect production workflows?
For organizations that operate in regulated industries, how does the provider address and assist with compliance with regulations such as PCI-DSS (Payment Card Industry Data Security Standard), HIPAA (Health Insurance Portability and Accountability Act), Sarbanes-Oxley, and many others? And to help ensure compliance, what auditing practices does the provider follow?
Do the provider and its staff have the needed certifications to carry out the security procedures and guarantee that regulatory standards are met? Increasingly, one measure of a provider’s security and regulatory fitness is having SSAE (Statement on Standards for Attestation Engagements) 16 Type II certification. This certification, related to auditing, is typically requested if a service provider’s services affect the financial statements of another company. Most publicly traded companies require that their service providers have this certification.
Availability issues: Production applications have little tolerance for downtime. To run such applications on a public cloud service, organizations must have contractual guarantees on availability and uptime.
When evaluating cloud service providers, check that their application and infrastructure service level agreements (SLAs) match the characteristics of the production applications that will be using the service. And make sure you understand exactly what the service level agreements cover and what points you are responsible for.
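One way to compare an SLA against an application's tolerance is to convert the availability percentage into the downtime it permits. The sketch below assumes a 30-day billing period and uses hypothetical function names; actual SLAs define their own measurement periods, exclusions, and remedies, so the contract text always governs.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime an availability target permits over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


def sla_meets_tolerance(availability_percent: float, tolerated_minutes: float,
                        period_days: int = 30) -> bool:
    """True if the SLA permits no more downtime than the application tolerates."""
    return allowed_downtime_minutes(availability_percent, period_days) <= tolerated_minutes


if __name__ == "__main__":
    # Example availability tiers chosen for illustration, not taken from any provider.
    for tier in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(tier)
        print(f"{tier}% availability allows about {minutes:.1f} minutes per 30 days")
```

For instance, a 99.9 percent target over 30 days still permits roughly 43 minutes of downtime, which may or may not be acceptable for a given production application.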
Take a deeper look at the provider’s infrastructure and procedures. How does the provider address resiliency in its infrastructure? Are there multi-site options for high uptime of mission-critical applications? Can the sites be configured so both are primary, load-balancing when all systems are fine and failing over with no downtime if services at one site are lost?
For applications that can accommodate some risk and downtime, what are the availability options based on the recovery time objectives (RTO) and recovery point objectives (RPO)?
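Matching applications to availability options by recovery objectives can be sketched as a simple filter. Everything named here is hypothetical: the option names, the recovery times, and the application requirements are invented for illustration and do not describe any provider's actual offerings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    rto_minutes: float  # time the option needs to restore service
    rpo_minutes: float  # maximum window of data loss the option allows


@dataclass(frozen=True)
class Application:
    name: str
    max_rto_minutes: float  # longest outage the business can tolerate
    max_rpo_minutes: float  # most data loss the business can tolerate


def options_meeting(app: Application, options: list[RecoveryOption]) -> list[RecoveryOption]:
    """Return the options whose RTO and RPO both fit within the application's limits."""
    return [
        o for o in options
        if o.rto_minutes <= app.max_rto_minutes and o.rpo_minutes <= app.max_rpo_minutes
    ]


# Hypothetical option tiers, ordered from most to least protective.
OPTIONS = [
    RecoveryOption("multi-site failover", rto_minutes=0, rpo_minutes=0),
    RecoveryOption("replicated recovery", rto_minutes=240, rpo_minutes=15),
    RecoveryOption("backup restore", rto_minutes=1440, rpo_minutes=1440),
]

if __name__ == "__main__":
    order_entry = Application("order entry", max_rto_minutes=60, max_rpo_minutes=5)
    print([o.name for o in options_meeting(order_entry, OPTIONS)])
```

An application with tight objectives qualifies only for the most protective tier, while one that tolerates a day or more of downtime can use any of them, usually at lower cost.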
Ultimately, providers need to offer multiple layers of data and application protection and availability to run production applications.
Sungard as Your Technology Partner
Many companies today are challenged by a lack of IT resources and technology infrastructure to support their critical business applications. Having reaped the benefits of public cloud services for many of their other applications, they are increasingly interested in leveraging the cloud for production applications.
But because these applications are so vital to the health of the business, the cloud service must have enterprise-class availability and uptime. That’s where Sungard can help.
Sungard’s Enterprise Cloud Services is fully managed and built to meet the requirements of an organization’s most critical production applications. To deliver the services, Sungard uses a highly secure, enterprise-grade virtual data center infrastructure that delivers the high availability and scalability needed by businesses today. These services are backed with the security and experience Sungard is known for. To that end, Sungard offers multiple layers of data protection, with a wide range of recovery options mapped to the needs of each application.
Sungard’s Enterprise Cloud Services is built on best-in-class Vblock architecture, which is composed of Cisco, EMC, and VMware technology. Availability is backed by service level agreements for both production and recovery environments.
For applications that require the highest level of uptime, Sungard offers Managed Multi-site Availability. This includes automated failover to a replicated cloud environment or fast recovery of a cloud infrastructure.
Sungard’s Enterprise Cloud Services is delivered in data centers built to the ITIL v3 framework, audited under SSAE 16 Type II, and certified to the ISO 20000-1 standard. Its cloud platform is built to support the regulatory requirements of PCI-DSS and HIPAA.
Organizations exploring the idea of moving their production applications to the cloud can use optional Sungard consulting services, including a Cloud Readiness Assessment and Cloud Migration Plan, to help assess and migrate an environment to the cloud and ensure a strong return on investment.
Additionally, Sungard’s Infrastructure-as-a-Service (IaaS) solution offers the expertise and services needed to move key applications such as SAP, Citrix, MS SQL, MS Exchange, Active Directory, intrusion detection systems, and geographic load balancing to the cloud.
For more information about Sungard’s high availability cloud services, visit:
www.sungardas.com