Sunday, August 7, 2011

Building a Cloud Factory


Few areas of human endeavor can match the pace of change in IT. Even by IT standards, the change being driven by cloud computing sometimes seems surprising. To refer to a virtual environment that has only recently been deployed as "legacy," as some organizations are now doing, underscores the fact that the only thing constant in the data center is change. To deal with change of this magnitude, which can involve transforming the workload hosting model of an entire organization, some industrial-strength thinking is required.

In order to tackle this challenge, it's important to properly frame the cloud transformation problem. Many associate cloud with agility, flexibility, cost transparency and other end-user-oriented benefits. But many of these attributes are primarily associated with new infrastructure requests, and specifically, the use of self-service portals to "spin up" infrastructure to host new applications or host transient processing demands. 

When it comes to migrating hundreds or thousands of existing workloads into cloud infrastructure, agility is not a benefit that is typically experienced. In fact the opposite is often the case: because clouds require a higher degree of standardization (i.e., a finite catalog of sizes and software options), migrating existing physical and virtual servers into cloud models can actually be quite difficult. In other words, the very features that make clouds agile for new workload deployments can actually make them less agile from a transformation perspective.

This is where the notion of a factory comes in. In industrial processes, factories are the epitome of scalability, repeatability and productivity. Although they may take some effort to "tool up," once they are up and running they can handle a higher flow of activity, efficiently processing inputs to provide consistent output. This notion is also key to large-scale transformation. By applying a common approach that has been properly engineered to give repeatable results, organizations can greatly reduce the time and effort required to migrate to cloud infrastructure.

Within this concept, it is important to expand on what is meant by "properly engineered." Many organizations tackle these kinds of problems from a grassroots perspective, using spreadsheets and smart people to determine action. The problem with this approach is it rarely evolves to the point where it can generate truly accurate answers, mainly because the problem is too complex. Migrating workloads into clouds requires processing volumes of historical data, analyzing configuration information on the servers and applications being migrated, modeling target instance sizes and software stacks, enforcing corporate and regulatory requirements, honoring SLA and data protection rules, etc. Spreadsheets are not well suited to this, in much the same way that they are ill suited for use as corporate accounting platforms. Even if they can be coaxed into giving a decent answer for simple environments, they will not generate the reports needed to satisfy stakeholders, management, engineering, operations, etc., all of whom need significant detail surrounding the decisions being made in order to ensure benefits are achieved and risk is minimized.

Buried in the list of migration analysis requirements is a key concept linking them all together. This is the notion of policy, which represents the ground rules on how workloads should be hosted, where they should and should not go, how much resources they should be allocated, etc. Without properly modeled policies, hosting decisions are left to the practitioner performing the migration, and it can be hit-or-miss whether they do the right thing (or even follow the same policy twice in a row). Planning and managing cloud infrastructure without proper policies is like trying to fill out a tax return without instructions - there are just too many variables to get it right.

With all of these concepts in mind, the exact nature of the cloud factory becomes clearer. It divides the problem into a series of logical steps that combine data, target models and cloud planning and management policies in order to automate the process of deciding exactly where things go and how big to make them. These steps that make up the factory are:

Candidate Qualification: This process determines whether a given set of workloads are suitable to be hosted in a given cloud environment. This is both qualitative and quantitative in nature and designed to separate true candidates from the workloads that are better suited to go elsewhere (more on this later in step Examples of quantitative criteria include maximum I/O rates, context switching limitations, maximum CPU and memory sizes, etc. Qualitative criteria include data sensitivity, SLA requirements, backup strategy and other considerations. By applying a policy capturing all of these factors, a rapid and accurate assessment can be made.

Sizing: This takes the qualified candidates and determines what cloud instances are best suited to host them given their historical levels and patterns of utilization. This again is subject to policy, which governs how much history is considered, target utilization levels, etc. The result is a detailed specification of the instance sizes needed and the projected utilization levels in the "to be" environment. Note the use of benchmarks is critical in this step, as the translation of CPU utilization from the current environment to the cloud depends on the relative speeds of the CPU employed in each.

Load Balancing: Also a sizing step, this is focused on the load balancers and clusters being migrated. Because cloud environments offer different sizing options, and can even offer more advanced "elasticity" features, it is not always desirable to do a straight one-to-one translation of these servers into cloud capacity. For example, an 8-way IIS cluster might translate onto 12 smalls, 6 mediums and 3 large instances. Of these options, the one that meets the policy criteria (e.g., size for yearly peak activity, allow for N+1 resiliency) at the lowest cost will be the winner. This result is combined with the general sizing results from the previous step to provide a complete sizing plan.

Software Stack Mapping: This step considers the OS and software configurations of the source servers and maps them onto the "closest" configuration available in the cloud. Because cloud catalogs only offer a finite set of software options, this is effectively a standardization analysis. For Infrastructure-as-a-Service (IaaS), this step is typically limited to the OS-level configuration and matches the OS attributes of the existing servers and VMs to the operating systems that are on offer in the cloud (which is typically a much shorter list). For Platform-as-a-Service this step also includes scrutiny of the actual software inventory and applications installed. The result may say "server X looks the most like an IIS v6 server, but differs from the standard image in the following ways..." This not only provides the optimal stack to deploy, but also generates a remediation list that is critical for reducing risk during implementation.

Placement: Once the final specification is arrived at (through sizing, balancing and software mapping), the next step for internal cloud environments is determining exactly where the workloads should be placed in the infrastructure actually hosting the cloud environment. Because most clouds are based on virtual environments, the key is to fit the new VMs into the environment in a way that optimally leverages server resources. This step looks somewhat similar to placement of workloads in virtual environments (which tends to resemble placing Tetris blocks in available server capacity), but the policy regarding overcommit has a large influence on the resulting placements. If the policy is to strictly reserve the capacity for each cloud instance, then the environment will be very safe but relatively inefficient, as the workload density will be quite low (think of playing Tetris with the blocks wrapped in bubbles). If the policy is to fully overcommit resources, then the end customer may have a higher risk of contention if they place unanticipated demands on the environment, but the higher density that results can result in significantly lower costs (think Tetris blocks packed tightly together, requiring far less capacity).

Exception Handling: Going back to step 1, there are typically components of an application or business service that may not be suitable for hosting in the cloud. For these systems, it is necessary to evaluate other hosting options in order to determine what to do with them. Because there is often an order of precedence with respect to the hosting options, this step involves the systematic qualification of the rejected workloads against an ordered set of hosting strategies. These strategies can include using cloud instances with customized allocations, using dedicated cloud servers, hosting in a virtual environment, using dedicated blades, using dedicated rack mount servers or leaving the workloads alone (a last resort). By passing the rejected candidates through this gauntlet of options, each will arrive at a viable outcome.

The result of applying these steps is a methodical, exhaustive and rapid process for planning cloud migrations. By taking a data-centric, policy-driven approach, fewer mistakes are made, less rework is required, and application owners and other stakeholders will have much higher confidence they will arrive on the other end unscathed. This transparency, combined with the detailed specifications and implementation details that emerge, can rapidly accelerate cloud initiatives. This not only reduces time-to-value, but also enables IT organizations to keep up with the pace of technology innovation, which shows no sign of letting up.




Monday, August 1, 2011

Cloud Computing: The Importance of Community Clouds


Community Cloud is the cloud infrastructure that is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organizations or a third party and may exist on premise or off premise.
With that in mind, we are yet to see larger community clouds across industry verticals.  However Community Cloud adoption will be a key success factor for many industries which are at cross roads worldwide.

Some of the reasons mentioned below will prove why Community Clouds are important in the current volatile global economy.

Cooperation Among Competitors in Specific Areas Is Not New
Automotive Composites Consortium (ACC) (a Partnership of DaimlerChrysler[formerly Chrysler], Ford and General Motors) is started with a Mission to conduct joint research programs on structural and semi-structural polymer composites in pre-competitive areas that leverage existing resources and enhance competitiveness.

The HR-XML Consortium isthe independent, non-profit, volunteer-ledorganization dedicated to the development and promotion of a standard suite of XML specifications to enable e-business and the automation of human resources-related data exchanges.

BEA Systems Inc., IBM Corp., Microsoft Corp. and TIBCO Software Inc. worked together on Web Services ReliableMessaging (WS-RM) specification, comprising both protocol and policy assertions, to the Organization for the Advancement of Structured Information Standards (OASIS) for further refinement and finalization as a Web services standard.

Health Level Seven International (HL7) is a not-for-profit, ANSI-accredited standards developing organization dedicated to providing a comprehensive framework and related standards for the exchange, integration, sharing, and retrieval of electronic health information that supports clinical practice and the management, delivery and evaluation of health services. HL7's 2,300+ members include approximately 500 corporate members who represent more than 90% of the information systems vendors serving healthcare.

Community Clouds in no way will hinder the healthy competition between various vendors, but rather facilitate better customer service, and innovation towards new products and services.

Global Issues Affect All the Competing Organizations and Not Just One

In a globalized economy certain natural disasters and other events like war affect every organization in a particular market segment and not just one specific organization. This emphasizes the need of community clouds so that these issues like disaster recovery, business continuity can be tackled together.

Recently all of the major global automobile industry OEMs, such as GM, Ford, and Toyota, have to slow down on their manufacturing operations due to earthquake in Japan.
Stringent Federal, EU and State Compliance
The USA's Environmental Protection agency and more specifically California Air Resources Board (CARB) have put very stringent vehicle emission standards which all the major automobile manufacturers have to comply with.

The GHS is a common and consistent approach to defining and classifying hazards, and communicating hazard information on labels and safety data sheets (SDS). The European Union has implemented the United Nations' GHS into EU law as the CLP Regulation.

HIPAA (Health Insurance Portability and Accountability Act)  regulates the availability and breadth of group health plans and certain individual health insurance policies.

The above are few examples of compliance standards enforced by several Union, Federal and State agencies on the various industry verticals.