Has the SLA had its day?

Article first published 16th July 2017

I was first introduced to IT Service Management back in 2002. The concept of the SLA and availability targets was fairly straightforward: each service component had a supplier-given availability figure, and the overall service target was the result of multiplying those figures together. A recent assignment has significantly challenged that view. If you are interested in my viewpoint, please read on.
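Expressed as a formula, the traditional approach treats the n service components as a chain in series, each with a supplier-given availability A_i, so the service target is simply their product. The numeric instance uses example figures of 99.5% (server), 99.75% (network) and 99.1% (application support):

```latex
A_{\text{service}} = \prod_{i=1}^{n} A_i,
\qquad
0.995 \times 0.9975 \times 0.991 \approx 0.9836 \approx 98.4\%
```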

For me, this was probably a late-to-the-game “eureka” moment, but when you are elbows deep in the day-to-day delivery of service it is sometimes difficult to step back and see the changes around you. My assignment involved a request to create a new set of SLAs, as a new IT Director wanted to quickly understand the portfolio of services.

The scope was agreed as a full end-to-end study: taking the commercial agreement with the end client, mapping the internal operational systems and identifying the service elements, then carrying out the traditional mapping of support and support hours against the operational hours. All of that was straightforward, but the first step, reviewing the client contracts, raised an interesting observation. In the cold light of day it was obvious, yet it posed a basic question for me: is the system-availability-driven SLA still valid?

What was the new variable that challenged the foundation of my ITSM compass? It was quite simple: most of the commercial contracts made little reference to systems, system uptime or availability. Instead, they referenced “outcomes”. Two examples are as follows:

  • All orders received into the supplier's system before 17:00 and flagged as next-day delivery must be processed in time for the final planned transport pick-up, in order to meet the next-day delivery criteria
  • Order confirmation and order dispatch messages must be received back into the customer's EDI gateway no later than 15 minutes after the corresponding action is taken in the supplier's WMS

So what has changed? The basic premise of the old availability approach to the SLA was simply that the system was “available”. I am sure that when the concept was drawn up it was “good enough”: it gave a level of reassurance that IT was taking the internal customer seriously and was prepared to “nail its colours to the mast”, and it gave a point of reference for conducting a service review. But technology has moved forward, and in sectors such as retail and logistics, service delivery to the end user has become close to real time. (Who would have thought, at the turn of the millennium, that the likes of Amazon would soon offer a proposition where you pay a fixed fee, order from a wide range of products by 5pm and have them delivered the next day at no extra cost?) In that world, a basic availability target is no longer sufficient.

Why is that? Quite simply, a standard availability figure does not allow for the constraints of time-bound activities. Taking the first example above, we can show why it is no longer suitable:

  1. An operation works Monday to Saturday, 06:00 – 23:00, but the final transport pick-up is at 21:00. The service window up to that pick-up is therefore 60 x 15 x 6 = 5400 minutes per week
  2. The system availability target is calculated as 98.4%, the product of example component targets: server 99.5%, network 99.75% and application support 99.1% (based on a 90-minute P1 fix measured across 24 x 7)
  3. The 1.6% of allowed unplanned downtime against 5400 minutes gives 86 minutes
  4. With an order cut-off of 17:00 and a final pick-up at 21:00, there is a 4-hour window to pick, pack and dispatch. If the downtime allowance falls within that window, it is reduced to just over 2 1/2 hours, and the risk and potential penalty move from IT to the operation. Unless a sliding application fix SLA reduces the P1 fix time to 30 minutes during 17:00 – 21:00, the availability-driven SLA no longer supports the outcome-driven contract clause
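The arithmetic in the steps above can be reproduced in a few lines. This is a minimal Python sketch using only the article's figures; the helper names `composite_availability` and `allowed_downtime_minutes` are illustrative, and step 4 assumes the worst case where the whole downtime allowance lands in the 17:00 – 21:00 dispatch window.

```python
# Reproduces the worked example: service window, composite availability target,
# downtime allowance, and the worst-case dispatch window that remains.


def composite_availability(*components: float) -> float:
    """Availability of components in series: multiply the individual figures."""
    result = 1.0
    for a in components:
        result *= a
    return result


def allowed_downtime_minutes(window_minutes: int, availability: float) -> float:
    """Unplanned downtime permitted by an availability target over a window."""
    return window_minutes * (1.0 - availability)


# Step 1: Monday to Saturday, 06:00 until the 21:00 final pick-up = 15 h/day.
service_window = 60 * 15 * 6  # 5400 minutes per week

# Step 2: server, network and application-support targets in series.
target = round(composite_availability(0.995, 0.9975, 0.991), 3)  # 0.984

# Step 3: downtime allowance against the weekly service window.
downtime = allowed_downtime_minutes(service_window, target)  # about 86.4 minutes

# Step 4 (assumed worst case): the whole allowance falls in 17:00-21:00.
dispatch_window = 4 * 60
remaining = dispatch_window - downtime  # about 153.6 minutes

print(f"window={service_window} min, target={target:.1%}, "
      f"downtime={downtime:.0f} min, remaining={remaining / 60:.2f} h")
```

The remaining 2.56 hours is the "just over 2 1/2 hours" in step 4; nothing in the availability figure itself prevents that outcome.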

Similarly, if we take the second contract criterion of messages being delivered back to the customer system within 15 minutes, the traditional availability-target SLA, which allows 86 minutes of unplanned downtime, clearly does not support that requirement.
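To make the point concrete, here is a minimal Python sketch with hypothetical data: a single 30-minute interface outage, well inside the 86-minute weekly allowance, still pushes several messages past the 15-minute clause. The names `breached_messages`, `MAX_LATENCY` and `WEEKLY_ALLOWANCE`, and the message timings, are illustrative assumptions rather than anything taken from a real EDI gateway or WMS.

```python
from datetime import datetime, timedelta

# Hypothetical sketch: the contract clause is judged per message, while the
# availability target is judged on total downtime across the week.
MAX_LATENCY = timedelta(minutes=15)       # contract: message within 15 minutes
WEEKLY_ALLOWANCE = timedelta(minutes=86)  # downtime permitted by a 98.4% target


def breached_messages(events):
    """Return the (action_time, received_time) pairs slower than MAX_LATENCY."""
    return [(acted, received) for acted, received in events
            if received - acted > MAX_LATENCY]


# Hypothetical data: the interface is down from 14:00 to 14:30. Actions taken
# every 5 minutes during the outage are only delivered at 14:31, after recovery.
outage_start = datetime(2017, 7, 17, 14, 0)
outage = timedelta(minutes=30)
delivered_at = outage_start + outage + timedelta(minutes=1)
events = [(outage_start + timedelta(minutes=m), delivered_at)
          for m in range(0, 30, 5)]

late = breached_messages(events)
print(f"outage within allowance: {outage <= WEEKLY_ALLOWANCE}")
print(f"messages breaching the 15-minute clause: {len(late)} of {len(events)}")
```

The availability target is met, yet four of the six messages (delayed by 31, 26, 21 and 16 minutes) breach the clause.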

If you accept the observations above as a basic principle, the challenge then becomes: what are the alternatives?

The obvious move is to realign the IT SLAs to the business outcomes, but in doing so a number of factors, such as those below, may need to be considered:

  • IT need a change of mindset that aligns them more closely with the operational contract. In doing so there is an inevitable risk that differences in operational culture could create natural obstacles
  • To prevent future challenges, IT need to be engaged as early as possible in any client contract negotiations
  • Traditional supplier SLAs may need to be aligned to deliver the outcome-based approach
  • The concept of a P1 fix time has to be removed, as the measure is no longer a fixed time but the success of “outcomes”
  • Systems may need to be designed with a greater level of resilience and availability, considering true high availability during change activities, to support business outcomes
  • Service processes such as Major Incident Management and service reporting will need to be aligned
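An outcome-based measure reports against the contract clause itself rather than against system uptime. The sketch below scores the first example clause: the share of next-day orders received before 17:00 that were dispatched by the 21:00 final pick-up. The `Order` record, its field names, the same-day dispatch rule and the sample orders are all hypothetical assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Optional

CUT_OFF = time(17, 0)       # orders received before this are in scope
FINAL_PICKUP = time(21, 0)  # last planned transport pick-up


@dataclass
class Order:
    order_id: str
    next_day: bool                  # flagged as next-day delivery
    received: datetime
    dispatched: Optional[datetime]  # None if the order was never dispatched


def in_scope(order: Order) -> bool:
    """Next-day orders received into the system before the 17:00 cut-off."""
    return order.next_day and order.received.time() < CUT_OFF


def met_outcome(order: Order) -> bool:
    """Assumption: dispatch must be on the same day, no later than the pick-up."""
    if order.dispatched is None:
        return False
    return (order.dispatched.date() == order.received.date()
            and order.dispatched.time() <= FINAL_PICKUP)


def outcome_attainment(orders: List[Order]) -> float:
    """Fraction of in-scope orders that met the outcome (1.0 if none in scope)."""
    scoped = [o for o in orders if in_scope(o)]
    if not scoped:
        return 1.0
    return sum(met_outcome(o) for o in scoped) / len(scoped)


# Hypothetical sample day.
day = datetime(2017, 7, 17)
orders = [
    Order("A1", True, day.replace(hour=9), day.replace(hour=15)),
    Order("A2", True, day.replace(hour=16, minute=55), day.replace(hour=20, minute=45)),
    Order("A3", True, day.replace(hour=16, minute=50), day.replace(hour=21, minute=20)),  # missed pick-up
    Order("A4", True, day.replace(hour=17, minute=30), None),  # after cut-off: out of scope
]
print(f"next-day outcome attainment: {outcome_attainment(orders):.1%}")
```

A service review built on this figure discusses the missed order directly, regardless of whether any system was technically "available" at the time.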

Whilst the SLA may not have had its day, I would suggest that, to keep up with increasing demands, the traditional service measurements may need to be revisited and replaced with outcome-based expectations.