Monday, June 15, 2009

When Lightning Strikes

Wed June 10th, 6:30 PM PST: A lightning strike damages Power Distribution Units serving a set of racks hosting Amazon’s EC2 service.

6:30:05 PM PST: Your business transactions start failing.

7 PM PST: Your iPhone rings.

You thought that since your engineering teams were moving to "THE CLOUD," your systems were finally going to be more reliable, more trustworthy. Finally, the much-needed relief in your already over-extended workday!

But the reality is that no matter where you run your business systems, what underlying technology you use or what controls you put in place to ensure reliable business, there will always be incidents and unforeseen events that are out of your control.

Moving pieces of your application and infrastructure to a third-party hosting environment or leveraging third-party services directly within your business applications will mean even less control. A quick glance at http://status.aws.amazon.com tells you that even the best cloud providers are only human. Service disruptions remain commonplace, whether they result from freak weather conditions or good old-fashioned configuration errors.

As your application evolves and as your data center turns into an amorphous cloud (no pun intended), you need to be prepared for damage control.

From a transactions standpoint, you need to watch every single transaction and make sure that your iPhone rings within seconds of a disruption, not minutes or hours. In the real-time economy, every second lost equates to lost revenue.

You’ve got to be able to immediately identify which transactions failed, how many transactions failed, which consumers were affected and more, so that corrective procedures can be put into action. It will no longer suffice to simply let the business know that their transactions were disrupted!
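To make that concrete, here is a minimal sketch of the kind of report you'd want in hand when the phone rings. It assumes every transaction is recorded with an ID, a consumer and a success flag; the `Transaction` record, its field names and the sample data are hypothetical, not taken from any particular monitoring product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record shape; the field names are assumptions for illustration.
    txn_id: str
    consumer: str
    kind: str
    succeeded: bool


def summarize_failures(transactions):
    """Report which transactions failed, how many, and which consumers were affected."""
    failed = [t for t in transactions if not t.succeeded]
    return {
        "failed_ids": [t.txn_id for t in failed],
        "failed_count": len(failed),
        "affected_consumers": sorted({t.consumer for t in failed}),
        "failures_by_kind": dict(Counter(t.kind for t in failed)),
    }


# Made-up sample data, purely for illustration.
sample = [
    Transaction("t1", "acme", "order", True),
    Transaction("t2", "acme", "payment", False),
    Transaction("t3", "globex", "order", False),
    Transaction("t4", "initech", "order", True),
]
report = summarize_failures(sample)
print(report)
```

In practice the records would come from whatever captures your transaction traffic. The point is that the answer to "who was affected?" should be a lookup, not a forensic exercise.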

Finally, it would help your case to negotiate strict SLAs with your service provider and establish a strategy for monitoring and documenting real-time compliance. In the event of a disruption--even if it’s too minor to be counted as a disruption by your provider--be prepared to furnish evidence and hold them accountable for the losses your business incurs.
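Documenting compliance can start very simply. The sketch below computes measured availability for a reporting period from a list of outage windows and compares it with an SLA target. The one-hour outage and the 99.95% target are hypothetical numbers chosen for illustration, not figures from any provider's actual SLA or incident report.

```python
from datetime import datetime


def availability(period_start, period_end, outages):
    """Fraction of the period during which the service was up.

    `outages` is a list of (start, end) datetimes, assumed not to overlap.
    """
    total = (period_end - period_start).total_seconds()
    down = 0.0
    for start, end in outages:
        # Clip each outage to the reporting period before counting it.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 1.0 - down / total


def meets_sla(measured, target):
    """True when measured availability is at or above the agreed target."""
    return measured >= target


# Hypothetical reporting period (June, 720 hours) with one assumed one-hour outage.
start = datetime(2009, 6, 1)
end = datetime(2009, 7, 1)
outages = [(datetime(2009, 6, 10, 18, 30), datetime(2009, 6, 10, 19, 30))]

measured = availability(start, end, outages)
print(f"measured availability: {measured:.4%}")
print("meets 99.95% target:", meets_sla(measured, 0.9995))
```

Keeping the outage windows from your own monitoring, rather than relying on the provider's status page, is what gives you evidence to put on the table.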

Thursday, June 11, 2009

Plugging Gaps Instead of Creating New Ones

Joe McKendrick cites interesting observations in his latest post. Ovum reports that enterprises may be creating governance silos within their organizations – silos that handle SOA as a special case rather than managing the overarching software development lifecycle.

Most organizations already have well-established SDLC processes. Instead of expanding and tweaking existing processes to encompass the development and delivery of the new breed of services-based applications (SOA, Cloud, BPM, whatever), they sometimes buy into the vendor hype and purchase a "SOA Governance" technology just because they think they need it to be successful.

True, new technologies may be useful in expanding existing processes, but governance-in-a-box seldom works. If the tools you buy don’t play nice with the established behavior of your IT staff, all you will get is resistance. Everyone is way too busy to adapt and change. They can adopt, however, if the cost of adoption is small and the rewards are large. Before buying yet another governance tool, try thinking about application governance, not just SOA governance.

You’d expect application lifecycle management vendors to broaden their suites to encompass the requirements of SOA governance. However, most of their efforts so far appear to be marketing and rebranding exercises instead of true product expansions.

If you believe in governance, you already have a bit of it – your software engineering lifecycle is fully capable of handling the new needs of distributed computing. Of course, there are gaps to be plugged. So let’s focus on plugging those instead of creating a whole new gap!