A month in IT Service (our new offering)

March has been quite an interesting month and, with so much going on, it felt right to put it all down on a page and get people’s thoughts. We will start with some website issues we have been involved with, then move on to the development of our new service line and our tool evaluation.

So what’s been going on in the world of websites, and why is it relevant to this blog? Let me give you a bit of back story to bring you up to speed. As part of my personal development I have always wanted to understand website design, and having done rudimentary coding in Pascal in the 1980s as part of my A Level Computer Studies, I have always felt confident “giving it a go”. Don’t get me wrong: as a man in business myself I acknowledge the need for skilled people doing skilled jobs, and if I ever needed a highly technical website I would send it out to the specialists. At present, though, my website is a simple marketplace, and I have also developed one for a local motorcycle club I am a member of. So where is this going, I hear you ask? Well, a couple of things have happened this month that spin into ITSM and are typical of what can be found in small businesses.

Firstly, the club website showed a drop in traffic. We use Google Analytics to monitor visits, and this week a single day showed no traffic from midnight to 12:00. A review of the normal pattern of visits showed this to be exceptional, so the support brain kicked in and a ticket was logged with the hosting company. In all fairness the service was very good, and within 3 hours I had a reply saying that they could not see any server or network issues on my hosting instance, advising I contact Google to see if they had any issues with Analytics collection on that day. A few points from this. Firstly, I can’t fully validate my hosting company’s reply as I have no monitoring tools on the site (something I am now exploring), but a check with some users indicated that they had been able to get on the website, which supported what they were saying. Secondly, they had no reason not to disclose an issue: the hosting package has a declared uptime (which would have been broken by a 12-hour outage), but with no detrimental service credits to be applied they were not at risk of any significant penalty. The point of reflection was that the supplier’s response (both its timeliness and its detail) gave me confidence and reassurance, and the fact that I could initiate the ticket quickly because I had the service contact details to hand proved that, in the event of a more significant issue, I could mobilise quickly and the supplier would respond.
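Since that incident I have been exploring basic monitoring. As a minimal sketch (the URL is a placeholder and this is an illustration, not a recommendation of any particular tool), a few lines of Python run on a schedule would give an independent record of whether the site was answering:

```python
import datetime
import urllib.request
import urllib.error

SITE = "https://www.example-club.org.uk/"  # placeholder URL, not the real club site

def check_site(url=SITE, timeout=10):
    """Run a single availability probe and return (timestamp, ok, detail)."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # A 200 response means the site answered normally
            return (stamp, resp.status == 200, f"HTTP {resp.status}")
    except (urllib.error.URLError, OSError) as exc:
        # DNS failures, timeouts and connection errors all land here
        return (stamp, False, str(exc))
```

Run from a scheduler (e.g. cron) every few minutes and append the results to a log file; the next time the analytics show a gap, there is an independent record to check the hosting company’s answer against.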

My second journey into the world of website ITSM came from another local motorcycle club, who were having different issues with their website. They had commissioned a rewrite but unfortunately did not have the contact information or authority to deal with the outgoing domain’s hosting company. This left them in a state of limbo: the established website could not be updated or taken down, yet it held out-of-date information and was the primary website being ranked by Google. In this case a trace back via Whois allowed us to identify the hosting company and the nominated domain holder, and through an open engagement we are aiming to move things forward a bit.

So back to the first club website (the one I maintain) and off in a new direction. You see, there is a limit to self-taught skill. It involves books and the internet and generally takes time when you want to do something different or hit an issue. That’s OK when nothing about the site is critical. We don’t sell anything; it’s a place where people go to see where the next competition is, pick up the results and read an event report. Add in a couple of pages about venues and rebuild projects and that’s about it. It’s built using Bootstrap, is mobile friendly and uses Google Analytics for some visit stats. The problem was we had a big event coming up where we were hosting a National round, and I wanted to put some video on the site. Simple, I thought: I have some video, I can use Google to find the code, easy! Nope, not easy. Why not? Well, for a start I found 4 different versions of the recommended code and different views on whether the video should be MP4 or MPEG-4. The result was videos that would work on desktops but not Android; I tried something else and they would work on desktop and Android but not play on iOS. A nightmare! So was there a workaround? I know, upload to YouTube and use the embedded YouTube code. But then during testing (another story), one club member said it would not run on his iPad. Was this a single-user issue or a bug with one of the iOS versions, I wondered. So what did I do? Well, after 3 evenings of frustration I reached out to my network of developers, and within a few hours I had the code snippet, the right video format and a link to a conversion tool, and after a bit of testing the issue was fixed.
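For anyone hitting the same wall, the pattern we ended up with was along these lines (a sketch with placeholder file names, not the exact snippet I was sent): an H.264/AAC-encoded MP4 served through the standard HTML5 video element, with a plain download link as a fallback:

```html
<!-- H.264/AAC in an MP4 container currently has the widest desktop, Android and iOS support -->
<video controls width="640" preload="metadata">
  <source src="videos/national-round.mp4" type="video/mp4">
  <!-- Shown only by browsers that cannot play the video element at all -->
  <p>Your browser cannot play this video.
     <a href="videos/national-round.mp4">Download the MP4 instead.</a></p>
</video>
```

Most of the incompatibilities we saw came down to the codec inside the file rather than the markup, which is where the conversion tool earned its keep.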

Why is this relevant? Well, the final part of this blog moves into a service offering we have been developing. Each of these website issues (and in fact all of the tickets for our own business) has been logged in an ITSM tool. Within that we hold a basic service overview and information relating to support contracts (e.g. the hosting company’s email address and support number, and a copy of the contract). When we had the problem with Analytics suggesting the site may have been down, it was easy to get straight onto them because the information was available, and a single point of contact managed the issue. Now in our case none of these issues were critical, or impacted trade, profit or customer service, but by managing them through an incident lifecycle we were able to control them, get good visibility of where we were and hopefully prevent them happening again.

Just imagine that in your own small business: a single support desk where you can log your issues and they take ownership of the problem, manage the supplier, keep you updated on progress and report on how often these things happen.

As part of this process we have been assessing two ITSM tools which, based on the needs of the organisation, keep costs down and, beyond the incident management outlined above, allow us to develop other offerings such as controlling system changes and releases, collating your repeating issues into problem statements and then driving out service improvements to remove them.

In order to refine this we are actively trying to identify two small businesses to assist us. We are specifically looking for the following:

1) A small business of between 10 – 50 IT dependent users. Ideally these will be spread over up to 3 sites (but they may be remote based users) and a range of 2 – 5 applications

2) An independent software developer selling their product via online marketplaces such as Google Play, who would benefit from a single point of contact for their users

So what is the deal? Well from us you get the following:

  • Set up of the Service Desk and all associated information to map your services into a support model
  • 3 months of Mon – Fri (09:00 – 17:00) Service Desk Incident Management
  • Calls logged via email or dedicated portal
  • Incidents managed to a target 4 hour response
  • Weekly service reporting

As this is an opportunity for us to develop our service, we are working to the following pricing model:

  • 3 month commitment from both parties. No lock in after that point.
  • Clear banded pricing model starting at 0 – 50 incidents per week
  • Significantly discounted against our expected price to market as a thank you for assisting in us developing our offering

If you have enjoyed reading this article and would like to discuss it in more detail, or you would like to know more about our service offering, please get in touch; we would welcome your thoughts.

What can ITSM learn from Hurricane Irma?

Article first published 21st September 2017

I recently had the opportunity to take a short break between contracts and booked a holiday in Cuba. Unknown to me, this was about to coincide with Hurricane Irma (September 2017) moving from the mid-Atlantic across the range of islands. Out of every life experience I try to draw some learning, especially in the ITSM space, and this once-in-a-lifetime event certainly taught me a lot. So, let’s look at what happened and draw the comparison.

We started our break in the Cayo Coco region, which quickly became known to be in the hurricane’s path. Once this was confirmed we were moved west, 9 hours across the island, to an area outside the known path with the hotel capacity to take the 1,200 people. Once established there, the hurricane changed course again; our hotel was put on hurricane watch and then full-blown emergency procedures were implemented as the eye was due to pass close by.

Whilst this is predominantly an ITSM reflection, it would be wrong not to acknowledge both the Melia Hotel managers at Cayo Coco and Varadero and our Thomas Cook holiday rep, people who demonstrated incredible professionalism in their operational ranks.

So, once it was evident that our evacuation location was now in the direct path, all residents were briefed and taken to a safe area under the hotel, a large space which actually housed the kitchen prep and inbound delivery area. The Hotel Manager assumed the crisis manager role and introduced his deputy and the crisis operating model (in three languages), and then the carrier reps provided specific client updates throughout the duration.

Putting the crisis management element to one side, part of the focus of this article is on service. We were in the throes of a natural disaster with over 500 “paying” guests. It would have been easy to drop the level of service and yet throughout the incident food was continually provided, the toilets were cleaned and the holiday makers endured over 18 hours in basically an access corridor without any cross words and no alcohol! In fact, the ratio of staff to guests was around 1:5 (an observation I am planning to cover in a later article).

So, what compelled me to write this article? Well, it’s quite simple: in ITSM we regularly talk about three things (service design, DR and service levels / SLAs), but too regularly organisations do not take these seriously or do them well, and I challenged myself to ask why. What is the underlying factor?

If I reflect on the example of Irma, a number of things were clear:

1) There was a plan. This was not created a few days before. The level of service maintained and the level of execution could only be put down to strong planning and regular modelling / simulation

2) There was a clear mandate – protect life and continue to deliver outstanding service

3) The outcome (for point 2) was never in doubt. There was no hesitancy, management crisis or shortfall in delivery. Everything “just worked”

On reflection, I suspect two things ensured this happened.

Firstly, the event was about protecting life. Both the hotel chain and the holiday carrier had a duty of care. That was clear in the approach, and a 99.95% outcome in this area would just not be acceptable. I would suggest that the service aspect was a secondary consideration, but in our ITSM terms, all they decided was that the Vital Business Functions were set with a very high bar.

Secondly, they were used to the natural phenomenon. Whilst this was a Cat 4-5 hurricane, lower categories are not uncommon, as are tropical storms, so at some point there would always be an expectation to initiate the plan. I suspect this was not a procedure given lip service in a management meeting or taken out of scope due to a cost or time over-run.

So why the reflection?

Each year the UK media seems to report significant IT failures, and I am sure on a global scale they occur just as regularly, yet in most cases the true root cause never gets reported. We as ITSM professionals can speculate on it, and I am sure our experiences of working in different organisations can cite system failures that should never have happened, or should have been recovered much more quickly with a better level of service. In simple terms, feedback regularly tells us that we let our customers or users down.

Watching the recovery process during Hurricane Irma piqued my interest in a number of areas:

1) Do we approach any service with the mindset that our common “natural disasters” happen? In our world this could be a CryptoLocker-type attack, a significant database corruption or a major datacentre / infrastructure outage. As a service function we have a right to be the voice of doom and approach service with the mindset that our “Irma” is always out there

2) As part of service design, do we challenge the architects to put the potential points of failure on the table and then have clear options to mitigate them? More than this, do we actually turn these scenarios into service metrics?

3) Once those are clearly identified and solutioned, we then put our service hat on and take up our position of “is the contractual SLA good enough, or as a collective is a higher-level outcome our goal?”

4) Regardless of the answer to the above question, once that is agreed it becomes the recovery objective. In essence we reset the bar: if a 4-hour recovery is our agreed outcome, then failure to achieve this should be treated with the same mindset as the mandate demonstrated by the crisis team in Cuba (i.e. not an option).

Acceptance of point 4 above is not uncommon, and I am sure that a lot of organisations go through the 4 steps above during major projects and implementations; the problem is they fail to turn them into an executable outcome that only has a guaranteed conclusion of success. By that I mean truly exploring the “what ifs”, getting the recovery processes clearly documented, testing them and then throwing in a curve ball at the last minute! (On a complete side note, I recently watched the film Sully and it was interesting to see the “simulated outcome” change significantly once the “human factor” was added in, but that’s for another day.)

Maybe it needs a change of perspective? Maybe if a life depended on it or if service was truly king then the focus on service design, recovery, continuity and ensuring that the customer continues to receive the same level of service in the face of adversity regardless of the cost would be more prevalent in the ITSM service cycle.

In closing, I suspect the real benefits of V3 (if you hang your hat on this model) are still not being realised, and as an industry we are sitting in a hybrid world: acknowledging V3 as a framework but working in a comfortable V2 manner. Otherwise, the elements of Service Strategy and Service Design in their new guise would be ensuring that services designed and implemented (certainly in the 6 years since the 2011 refresh of V3) would be closer to the service excellence I recently experienced.

If you have enjoyed reading this article and would like to discuss in more detail, I would welcome your thoughts on this.

Has the SLA had its day?

Article first published 16th July 2017

I was first introduced to IT Service Management back in 2002. The concept of the SLA and availability targets was fairly straightforward, with the service components having a supplier-given availability figure and the overall service target being the result of multiplying the service elements together. A recent assignment has significantly challenged my view of this. If you are interested in my viewpoint, please read on.

Now for me this was probably a late-to-the-game “eureka” moment, but when you are elbows deep in the day-to-day delivery of service it is sometimes difficult to step back and see the changes around you. My assignment involved a request to create a new set of SLAs, as a new IT Director wanted to quickly understand the portfolio of services.

The scope was agreed as a full end-to-end study: taking the commercial agreement with the end client, mapping the internal operational systems, identifying the service elements, then carrying out the traditional mapping of support and support hours against the operational hours. All of that was straightforward, but the first step of reviewing the client contracts raised an interesting observation which, to be honest, when reviewed in the cold light of day was obvious, and it posed to me a basic question about the validity of the system-availability-driven SLA.

What was the new variable that challenged the foundation of my ITSM compass? It was quite simple. The majority of the commercial contracts had limited reference to systems, system uptime or availability. Quite simply the majority of the contracts now referenced “outcomes”. Two examples are as follows:

  • All orders received into the supplier’s system prior to 17:00 and flagged as next-day delivery to be processed in time for the final planned transport pick up, in order to fulfil the next-day delivery criteria
  • Order confirmation and order dispatch messages to be received back into the customer’s EDI gateway no longer than 15 minutes after the corresponding action is taken in the supplier’s WMS

So what has changed? The basic premise of the old availability approach to the SLA was purely that the system was “available”. I am sure when the concept was drawn up it was “good enough”, both to give a level of reassurance that IT was taking the internal customer seriously and could “nail its colours to the mast”, and to give a point of reference for conducting a service review. But technology has moved forward. In the retail and logistics world, for example, service delivery to the end user has become close to real time (who would have thought, at the changeover to the millennium, that the likes of Amazon would soon be offering a proposition whereby you pay a fixed fee and a wide range of products ordered by 5pm can be delivered to you the next day at no extra cost), and a basic availability target is no longer sufficient.

Why is that? Quite simply, a standard availability figure does not allow for the constraints of time-bound activities. Taking the first example above, we can clearly show why it is no longer suitable:

  1. An operation works Monday to Saturday 06:00 – 23:00 (but the final transport pick up is at 21:00). Therefore the service window is 60 x 15 x 6 = 5400 mins per week
  2. System availability target is calculated as 98.4% (measured from example server target = 99.5%, network target = 99.75%, application support target based on P1 (90 min fix across 24 x 7) = 99.1%)
  3. 1.6% allowed unplanned downtime against 5400 minutes gives 86 minutes
  4. With an order cut off of 17:00 and a last order pick up at 21:00, the 4-hour window to pick, pack and dispatch can be reduced to just over 2½ hours. The risk and potential penalty is now moved from IT to the operation. Unless a sliding application fix SLA is provided that reduces the P1 fix time to 30 minutes during that 17:00 – 21:00 window, the availability-driven SLA no longer supports the outcome-driven contract clause
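The arithmetic in the steps above can be checked in a few lines (the availability figures are the illustrative targets from the example, not real contract values):

```python
# Illustrative component targets taken from the worked example above
server = 0.995        # server availability target (99.5%)
network = 0.9975      # network availability target (99.75%)
application = 0.991   # application support target (99.1%, P1 90-minute fix, 24x7)

# Composite availability is the product of the component targets,
# rounded here to match the 98.4% figure quoted in step 2
composite = round(server * network * application, 3)   # 0.984

# Step 1: service window is 06:00-21:00 (last pick up), i.e. 15 hours, 6 days a week
window_minutes = 60 * 15 * 6                           # 5400 minutes per week

# Step 3: unplanned downtime implied by the composite target
allowed_downtime = (1 - composite) * window_minutes    # ~86 minutes

print(f"Composite availability target: {composite:.1%}")
print(f"Allowed weekly unplanned downtime: {allowed_downtime:.0f} minutes")
```

Multiplying component targets together like this is standard practice for serially dependent components; the point of the example is that 86 minutes of “acceptable” downtime is meaningless against a hard 21:00 pick up.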

In a similar way, if we take the second contract criterion of messages being delivered back to the customer system within 15 minutes, the traditional availability-target SLA, which allows around 86 minutes of unplanned downtime, clearly does not support that requirement.

The challenge then comes that if you accept the observations above as a basic principle, what are the alternatives?

The obvious move is to realign the IT SLAs to the business outcomes, but in doing so a number of factors such as those below may need to be considered:

  • IT need a change of mindset to align them closer to the operational contract. In doing so there is an inevitable risk that differences in operational culture could create natural obstacles
  • In order to prevent future challenges, IT need to be engaged as early as possible in any client contract negotiations
  • Traditional supplier SLAs may need to be aligned to deliver the outcome based approach
  • The concept of fix times for P1s has to be removed, as the measure is no longer time bound but measured by the success of “outcomes”
  • Systems may need to be designed with a greater level of resilience / availability, considering true high availability during change activities to support business outcomes
  • Service processes such as Major Incident Management and service reporting will need to be aligned

Whilst the SLA may not have seen its day, certainly in order to keep up with increasing demands I would suggest that the traditional service measurements may need to be revisited and replaced with an outcome-based expectation.

Ten top tips for Major Incident Management

Article first published 12th June 2017

When I was looking to have the website redeveloped, it was a good opportunity to look back at the old blog articles. A great deal of these were written with a different focus, and technologies have changed significantly, but every now and then I came across one which I felt was still pertinent. This article, contributed to by a good friend and colleague of mine, still (in my opinion) hits all of the spots.

Our top 10 tips:
1. Information is key – The last thing you want to be doing in the middle of a major system failure is scrabbling around for information. Worse still is knowing that all the information you need is saved on the hard drive of the system that has just failed. To manage a major system failure you usually need the following things:
  • Key contact names and numbers
  • Copies of any support agreements
  • Technical information and, if possible, system diagrams
These items alone will not guarantee you a fix, but they do provide you with all of the raw information you need to get the right people together and to make the right decisions. Make sure all of this information is held in a paper format, ideally in a file that can be accessed easily. Assign an owner, and if anything changes make sure it is updated.
2. Deadlines help focus the mind – One of the first questions we always ask at the start of a major system failure is “when do we need to get the service back?” Now there are always two times: the time the customer would like it back (which is normally within the next hour!) and the time the customer actually needs it back. Working to an unrealistic and unnecessary deadline causes corners to be cut and poor decisions to be made. This can result in only a partial success which just comes back to catch you out later on. Deadlines should be driven by key business activities, and if flexibility exists always take the latest window. Experience has told us that if a technical person tells you how long it will take to fix, always multiply it by 3.
3. When to have a plan B – Having clearly understood deadlines also allows you to re-plan when things are not going well. If you have been working on a strategy for 12 hours and you only have 4 hours left, it may be prudent to start considering other options. It is not normally efficient to approach a problem from various angles. Firstly, resource and costs do not normally allow it; secondly, it pays to have all of your knowledge working in the same direction. BUT occasions do occur when, no matter how long you have been working on it or what resource has been thrown at it, the breakthrough has not been made. The hardest (and sometimes bravest) decision is to ask all of the support people to down tools for a short period of time and to start considering contingency plans or other avenues of investigation. Normally they will want to continue to explore their current theory, and that’s fine, but at some point you have got to start putting eggs into other baskets and getting the framework of these into place.
4. If needed, allow one to direct whilst another writes – I had only been in the job for 6 weeks when I received a phone call from my boss telling me our main computer room was on fire. This resulted in multiple systems failing and having to juggle a lot of deadlines and support people. The approach was quite simple: he directed the operations and I recorded everything, no matter how trivial or insignificant. From the times that people arrived on site, to decisions we made (and who signed them off), down to things we needed to check and key deadlines, it was my responsibility to capture everything and then remind my boss of any key points or key times. This is only really relevant for big system failures or when you lose a few systems at the same time, but by clearly defining the roles of the “manager” and the “scribe/reminder” the problem can be managed much more tightly, with key issues not being missed. Also, when you come to retrace your steps later on (to prevent it happening again or to review what worked and what did not), you will be surprised at the small details that are recorded.
5. Conference calls – Most system failures involve 4 groups of people:
a) Those controlling it
b) Those trying to fix it
c) Those who are directly affected
d) Those who are indirectly affected but just want to be involved
The most time-consuming part of managing a system failure is normally around gathering information, making decisions and keeping people updated. Depending on the size of the groups involved, the most efficient way to do this is to utilise one of the many telephone conference call services advertised on the internet. With rates of around 5p per minute billed directly to the caller’s bill, these can be a low-cost method of updating large groups quickly, as well as discussing possible options without people feeling left out. Be aware that these non face-to-face meetings need a strong person in the chair, and the expectations of the meeting need to be outlined at the start. Because people’s attention tends to drift quickly, the chair should take the opportunity to recap and clarify points on a regular basis.
6. Don’t change anything without… – recording it and making sure everyone who needs to agrees to it. It is so easy in the heat of a major system failure to stumble across a possible fix and to make changes in quick succession with no real rationale for why you have done it. Worse still, if the changes do not work (and notice the emphasis on “changes”, as it normally results in several), it is very difficult, without referring back to records, to remember exactly what was changed and why. This is especially relevant if several different individuals have provided the support over a prolonged period (as at the point of regression some of them may be catching up on their sleep). Agreement to change is important for two reasons. Firstly, most changes come with a risk: the risk that you may extend the outage, or the risk that it may do more damage than the current situation you find yourself in, are two which spring to mind very quickly. Therefore it is essential that, in controlling the system failure, you set up a small group of people who will validate any decisions and where necessary question the reason for them. These do not have to be people from the end user community, but it is beneficial to have someone who understands the business and can explain the impact if the change causes further issues. Secondly, some changes need a level of technical understanding. A change to “system x” to resolve a problem may seem straightforward, but in doing so may cause a new problem for “system y”. It is essential that all changes are considered within the scope of all of your computer systems, not just the one that is broken.
7. Trade off full service for vital business functionality – The common approach when dealing with a system failure is normally to want to restore the full service, but when deadlines are short or the availability of technical support is limited, consideration should be given to restoring vital business functionality only. Vital business functionality, as the name suggests, looks at restoring only the key aspects of a system that the business needs to continue its operations, maintaining profitability or customer satisfaction. It is not usual for companies to have this documented for each system (although that would be beneficial), but generally it is discussed near the start of a system failure to establish requirements early on. This gives the technical support the option of restoring a partial service to keep the business going or trying to fix the full problem.
8. 18 hours is enough! – Whether it is the support staff or the person managing the system failure, productivity and accuracy start to become impaired at around 12 hours into the fault. Once you hit 18 hours, your effectiveness is significantly diminished. Strangely, this does not seem to be influenced by when you last woke up from a night’s sleep, so whether the problem starts at 9AM or 9PM the 12 and 18 hour rule seems to be the same (the difference is in your personal recovery time after getting the next good night’s sleep!) Whenever a system failure is drawn out and, due to its impact on your business, you are working into the late hours, you should always aim to have a handover point somewhere between the 12–18 hour mark. Don’t forget that this handover should include a full briefing on all the decisions that have been made and any courses of action that are currently being considered, along with a summary of key milestones that are due.
9. Blame has no place in restoring the service – It is so easy in the heat of dealing with a system failure to start looking for someone to blame. This is normally linked with the question why. Why did it happen? Why did we do it at that time? Why did we listen to that advice? Why didn’t we take a backup? All of these whys are great questions and actually add to prevention next time, but during the failure they can encourage negative feeling or create barriers. No matter how much you want to pin the blame on someone or something, it should always be avoided; your energy is best invested in finding out why the failure has happened and what you need to do to fix it.
10. Restoration is only half of the job – Once you have returned your service back to a useable state, it may be time to take a deep breath and pat both yourself and any people who have assisted you firmly on the back. But good problem management does not stop there. The next steps involve fully understanding and documenting both the reason for the failure and what actions were taken to restore the service back to normal. Following this, actions to prevent it happening again need to be identified and where costs allow, implemented.

People and Change Management

Article first published 22nd May 2017

Process re-engineering is never an easy thing. Studies have been carried out which map people’s reactions when subjected to significant life-changing events, and this is nowhere more apparent in the workplace than when reviewing some of the core ITSM processes. The one I have observed having the most significant people impact is Change Management.

Since starting out as a contractor in 2014, I have had the opportunity to reshape ITSM processes on a few occasions. These engagements resulted in a significant shift in the approach to change management, the underpinning processes and the use of the toolset.

Let’s be honest, Change Management as a process is very well defined. The common text books have it documented to a good level of maturity, and as a framework the key pillars (the presence of the CAB, the easily recognised change types of normal, standard and emergency, and a logical process flow) allow a maturity review to be carried out quite easily.

But this article is not about that; it introduces the key observations which I have found to be common across a number of reviews, and they are based around people, not process.

I am going to start this short example list with “Lead Time”, as from an emotional viewpoint it tends to be up there at the top. The Change Manager (in administering the process) tends to want the maximum length, and it is not uncommon for this to be between 7 and 14 days. This allows a good review of the change record to ensure it has been written to a good standard, a period of time for the change to progress through a technical approval cycle (and, where end user or client approval is needed, for this also to be sought), then, in the event of it needing a CAB, a number of days to prepare and execute prior to the required implementation date.

Unfortunately this extended lead time tends to push against the needs of the developers and project managers who, on average, I have found sit in the 3 to 4 day camp. In a large number of cases the requirement for a shortened lead time can be clearly traced back to poor planning, but with project timelines, the risk of escalation and “the change process” being cited as reasons for late delivery, any transformation of the Change Management process should approach pushing the lead time out with a clear communication plan. That plan must state clearly why the lead time is what it is and what the benefits are, and, most importantly, be backed by a clear process for expedited changes: one which does not become the norm, but which gives reassurance that where a shortened lead time is genuinely required, the process can accommodate it without adding risk to the deployment.

The second issue that has been common in the transformations is whether to approve or not to approve before CAB. Let me explain. The general lifecycle for a normal change involves some form of peer review and then a decision gate which, based on set criteria, either routes the change to a CAB or sends it straight to the Change Manager for final approval. One debate I regularly find myself facilitating is that of pre-CAB approval, and there are generally two routes if an approver has a query and wants the matter discussed in detail at the CAB.

Option 1 is to refine the process to allow the change to progress to CAB “part approved”. In this model, if a change had for example 5 pre-CAB approvers and only 3 approved it, a pre-defined CAB date in the change record could automatically progress the change into the CAB while providing clear visibility that the record has not completed its approval cycle, prompting a level of questioning at the board. The problem I find with this model is that the CAB can become logjammed and extended with part-approved changes, especially if the discussion is quite in-depth. On a people note, it is also a convenient way to defer decision making, and if it becomes the norm it makes the Change Manager role much more complex.

The second option is to have a rule that all changes must complete the peer review approval cycle before presentation at the CAB. This creates accountability on the approvers to really own the action of approving the change and to ensure that any issues they have are resolved prior to the CAB. In this scenario I regularly get challenged with “well, if we have all approved it, what is the purpose of the CAB?”. My reply is fairly standard: I see the CAB as a collaborative review which moves past what tends to be a siloed approval process at the peer review stage to a more collective review of the change, its back-out plan, the testing evidence and so on. By forcing the individual peer reviewers to get down to the detail of their level of accountability prior to the CAB, it also tends to drive up the quality of the change approval.
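The difference between the two routes can be shown as a toy gating rule. Everything here is illustrative: the field names, change IDs and approver counts are made up, not taken from any real toolset.

```python
# Hypothetical sketch of the two pre-CAB gating models; the data
# shape and field names are invented for illustration only.
def cab_agenda(changes, require_full_approval):
    """Return the change IDs admitted to the CAB agenda."""
    agenda = []
    for change in changes:
        fully_approved = change["approvals"] >= change["approvers"]
        # Option 1: part-approved changes still progress to CAB.
        # Option 2: only fully approved changes are presented.
        if fully_approved or not require_full_approval:
            agenda.append(change["id"])
    return agenda

changes = [
    {"id": "CHG-001", "approvers": 5, "approvals": 5},
    {"id": "CHG-002", "approvers": 5, "approvals": 3},  # part approved
]
option_1 = cab_agenda(changes, require_full_approval=False)  # both admitted
option_2 = cab_agenda(changes, require_full_approval=True)   # CHG-001 only
```

Under Option 1 the part-approved CHG-002 still lands on the agenda, which is exactly how the logjam described above builds up.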

You will probably get a feel for the fact that the second approach is my preferred one, although it tends to need more people input to achieve buy-in when compared with the first option.

Where the nature of the application and infrastructure drives a regular need to engage with end users or clients to seek approval, the decision to include any back-out time inside or outside the change window is an interesting debate. Take a simple change that will take 3 hours to execute, excluding back-out, with a 1 hour window to back out. The question is whether that last hour should be included in the change window. If you include it, it appears that you are planning to fail; in some cases it can extend the change window significantly and does not give a true picture of the “happy path” service handover point. By excluding the back-out you give a more realistic point of service handover, but if the change fails and needs to be rolled back you may breach the SLA.
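The arithmetic in that example can be sketched as a quick check. The 22:00 start time is hypothetical, chosen only to show the window rolling past midnight:

```python
from datetime import datetime, timedelta

def handover_time(start, execute_hours, backout_hours, include_backout):
    """Advertised service handover point for a change window.

    Including the back-out hour extends the window past the
    "happy path" handover; excluding it risks an SLA breach on
    a rolled-back change.
    """
    window = timedelta(hours=execute_hours)
    if include_backout:
        window += timedelta(hours=backout_hours)
    return start + window

start = datetime(2017, 5, 22, 22, 0)  # hypothetical 22:00 change start
happy_path = handover_time(start, 3, 1, include_backout=False)  # 01:00
worst_case = handover_time(start, 3, 1, include_backout=True)   # 02:00
```

The hour between the two results is exactly the hour the approvers are arguing over: advertise it, and the window looks pessimistic; hide it, and a rolled-back change breaches.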

The people element here, for me, really comes down to the approach and mindset towards failure. I have found that where the consensus falls on including the regression time, there can be an underlying culture that finds failed changes acceptable. By forcing the change to be approved without the regression time included (the penalty being an imposed SLA breach), the senses are sharpened and accountability for the rigour around the change is elevated.

The approach in the three examples above ultimately comes down to engagement. Any process review, and certainly one concerning Change Management, can only succeed with a strong engagement plan that gives early visibility of the options available and creates an environment where they can be debated to a satisfactory conclusion. The days of reviewing the process and then presenting a “fait accompli” are long gone, but I would close by stating that this is not process design by committee. Ultimately someone has to take the lead and carry the accountability for the final process, but by recognising the people triggers and considering items such as the three highlighted above, a smoother approach can be achieved.

Supplier Paradox

Article first published 2nd August 2017

One of the more unusual elements of working as a contractor is when you support the recruitment process for your own replacement. On this occasion I was assisting with the recruitment of an IT Supplier Manager, and part of the client brief was to develop and agree the competency questions with the hiring manager. While conducting the interviews I suddenly thought, “if it was me sat there, how would I answer?” Well, below is my likely reply to one of the questions.

What is your view of managing a supplier as a partnership or transactional?

Cold silence… a few seconds to think. Engage brain before mouth? Maybe? Or should I just say what I think?

Transactional is certainly the favoured response when answering this question. Whether that is an interview standpoint (saying what you think the interviewer wants to hear) or where the majority genuinely sit, I am not sure. My gut feel, watching the body language, makes me think it's the former, but here goes…

“My initial answer would always be transactional. I know it is common to think about large IT suppliers as a partnership, but I always believe this is the end game and not the starting point. Regardless of how much of a relationship has been formed whilst striking the deal, you only really know how it is going to work out once the services start to be delivered and you have the first two or three service reviews under your belt.”

Right, opening statement down, now I have to explain why I have probably gone against convention…

“For me the transactional phase of the relationship, certainly at the start, establishes a clear boundary between the delivery of BAU service (to the required service level and contract expectations) and the opportunity to introduce new revenue (for the supplier) and improvements for the customer through the introduction of new services. The problem is, if you operate as a partnership from the start, small service misses can quickly get overlooked as no party wants to run the risk of damaging the partnership. For me it is important that the customer is able to enforce its position as the customer, and that the supplier firmly knows its starting place and what is expected in terms of delivery.”

That's got that over with. Probably worth talking about the longer term plan now…

“It is likely that a key supplier has been brought in with the Exec mindset of them being a partner. Don't get me wrong, I am not against partnerships, and if I had to put a figure on it, for a 5 year term I would expect that change of mindset to kick in at around month 6 of good, solid delivery. By that point the successful run rate of service delivery should have been established and the key personalities should be comfortable working together. I would then entertain a change of approach and mindset towards a more collaborative, partner-driven service relationship, but that has to be maintained with the onus on the supplier to deliver not just the added-value element but to maintain the BAU.”

Should I say the next bit? Hmmm, yep, OK then…

“But it is important to remember that a partnership can move back to transactional. If that happens it genuinely would be a sad day, and the internal vendor management team has to carry some culpability. Ultimately the right to be treated as a partner has to be earnt by the supplier, and the protection of that supplier as a partner then sits with the vendor management team.”

Are the free ITSM tools any good?

First published 18th March 2018

Over the last 6 months we have been exploring the world of ITSM tools. As we have been developing the new service offering, it has been important to test the tools with a view to a low-cost route to market. After looking at a few we narrowed it down to two. This review is purely the reflection of our day-to-day usage, but we hope you find it of some benefit.

This is not meant to be a comparison or rating exercise. To be clear, we liked both products and there are other products out there. The key point is that we went for the free versions and tried to see whether, using a cloud-based, non-SLA'd service, we could get a usable product giving us basic incident management with a good user experience. The approach was based around simulating an organisation with limited technical understanding getting a basic incident-management-capable support desk operational.

We start off with Spiceworks. This is the second tool we deployed, and it has been embedded in a client website as a direct logging portal for any website issues, updates and development requests. It only offers incident management but comes with a portal URL, a nice advisor interface and a tidy Android app.

Portal – Spiceworks has a nice little trick from the off. A single instance (account) allows you to create multiple “organisations” within the tool. Why is this good? Well, if for example you are managing multiple websites for your clients, each can have its own portal and address, but the incidents feed into a single dashboard at the service desk end. The other strong feature of the portal is that as you create custom fields (e.g. fault codes) you can embed them into the portal, making it quite flexible. It also allows non-registered users to log an incident without having to register, and offers an email address as an alternative route to the portal. All in all, very well thought out!

Configuration – We have already mentioned that once you have an account, or “instance”, you can create individual organisations within the tool. Each organisation can then have its own set of attributes. This makes it really good for multi-client service desks who need to individualise the end user experience but manage the outcomes in one place. The “custom attributes” feature gives you the ability to create fields such as “Closure code” and then, using a simple list, offer a number of options for selection. This really allows you to structure the data in a usable format. Ticket monitors, alerts and notifications give you the final parts of the configuration to deliver a viable Incident Management tool.

Advisor View – A browser-based interface presents the Service Desk Advisor with a simple, uncluttered view of the world, and clicking “New Ticket” fires up a pop-up screen that, once the organisation is selected, resets the remaining fields with options only for that organisation. As Spiceworks does not automatically assign a fix target based on the priority, a nice feature is that these fields are presented at the call logging stage, so the advisor, using a simple look-up table, can manually enter them at the point of call logging. The dashboard also breaks the tickets down into Unassigned and Open tickets, making the prioritisation of initial call handling very intuitive.
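Since Spiceworks leaves the fix target to the advisor, the look-up table mentioned above might be as simple as this. The priority bands and target hours are our own illustration, not Spiceworks defaults:

```python
from datetime import datetime, timedelta

# Illustrative priority-to-fix-target table; Spiceworks does not
# calculate these itself, so the advisor applies them at logging time.
FIX_TARGET_HOURS = {"P1": 4, "P2": 8, "P3": 24, "P4": 72}

def fix_due_by(logged_at, priority):
    """Value the advisor manually enters in the fix target field."""
    return logged_at + timedelta(hours=FIX_TARGET_HOURS[priority])

# A P2 logged at 09:00 gets a fix target of 17:00 the same day.
due = fix_due_by(datetime(2018, 3, 18, 9, 0), "P2")
```

Even a laminated card on the desk with these four rows would do; the point is that the tool expects the advisor to supply the value.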

Advisors and users – We have had a really good look through the literature and so far we cannot find any restriction on the number of advisors or techs that can have access to the front end, or any limit on the number of customers / contacts who can log calls. If this is correct, it's a really nice feature.

Categorisation – We have talked about this earlier, but I think for me this is one of the big selling points. The fact that you can create multiple organisations and then, in each one, create your own custom fields gives a level of flexibility that is really nice and somewhat unexpected in a “free to use” product. Standard field types include number, text, date and list.

User emails – Once again this covers the basics. As a user I get an email on ticket logging, updates and closure. The emails are a bit messy in my opinion, but I am sure there is a level of customisation in the forums to tidy these up. The admin dashboard features a set of toggle switches to activate or disable the emails to both the customer and the agents.

SLA Management – Out of the box, each “Organisation” has a pre-populated mandatory field titled “Ticket Categories”. Unfortunately, apart from being a flagged field it appears to serve no purpose. You are unable to assign target fix times to it, so when the incident is logged you have to undertake the manual task of updating the “fix due by” field with a manually calculated date and time. I am sure if you explore the forums there may be a method of automating this, but out of the box the only option appears to be manual assignment.

Android App – This is a really nice feature as it provides a visual alert when a new incident is received as well as allowing an agent both to log new tickets and to manage existing ones in a feature-rich way. It was tested on both a phone and a tablet and was usable in both cases.

Reporting – This is where Spiceworks has an opportunity to improve slightly. Out of the box you get a nice dashboard that shows today, the last 7 days and the last 30 days. Within this it also shows average response and close times, and it can be filtered by organisation. It does not drill down to a Priority / SLA level, so it really is high-level analysis. It also provides pre-canned extracts of all ticket data in both .csv and .json formats. On analysing the .csv files (which most users would be used to), the only two timestamps of use are logged and closed, so whilst you can calculate a fix time you are unable to do the same for response. On two attempts at running the same report, a number of the custom fields did not pull back any information even though it was present in the ticket. Alternatively, the .json file is data rich, but you would need to be comfortable with the file structure and do some data manipulation in order to create a meaningful reporting deck.
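As a sketch of what that fix-time calculation looks like against the .csv extract. The column names and sample rows below are assumptions for illustration; check them against the headers in your own export:

```python
import csv
import io
from datetime import datetime

# Fabricated sample rows standing in for a Spiceworks .csv extract;
# the real header names may well differ.
SAMPLE = """summary,created_at,closed_at
Website down,2018-03-01 09:15,2018-03-01 11:45
Login error,2018-03-02 14:00,2018-03-03 10:00
"""

def fix_times(csv_text, fmt="%Y-%m-%d %H:%M"):
    """Yield (summary, fix duration) per ticket. Only the logged and
    closed stamps are present, so response time cannot be derived."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        logged = datetime.strptime(row["created_at"], fmt)
        closed = datetime.strptime(row["closed_at"], fmt)
        yield row["summary"], closed - logged

durations = dict(fix_times(SAMPLE))
```

This is the whole extent of what the .csv supports: with no "first responded" stamp, the response half of the SLA picture simply is not there to calculate.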

Knowledge Base – After 10 minutes of exploring this functionality, I must admit that for the first time in the product I started to get frustrated. The front end seems intuitive enough, with the ability to create a knowledge article visible to yourself, the team and what appears to be the whole Spiceworks Community. That's straightforward, but after logging a test article, deleting it or moving it between groups appears impossible. I may be missing something obvious in the interface, as I would expect this to be core product. If I find out how to do this I will update this article…

Summary – If you are looking for a cloud-based Incident Management system that provides a level of flexibility, a straightforward portal and unlimited user and agent contacts, this certainly works straight out of the box. The Android app and email interaction put a nice usability wrap around it, but be aware of the shortfalls in reporting unless you are happy to spend some time digging through forums. The lack of automated SLA times is also an inconvenience, but can be accommodated with an operational workaround at the point of ticket assignment. Knowledge Management also appears to need further exploration.

Freshservice

We follow up with Freshservice. We have been using the free version for a few years for our own business and also explored the full module set for a client about 12 months ago. For the purpose of this comparison we will only be exploring the Incident Management module, which once again comes with a portal URL, a nice advisor interface and a tidy Android app.

Portal – Unlike Spiceworks, which allows a different URL per organisation, with Freshservice the URL is set at the instance level, and therefore you could end up with multiple clients using the same portal. Not a problem, but you need to consider branding and organisation identification if this is the case. On a really strong plus side, the portal not only allows users to log tickets but also lets them review their open and closed tickets (and see the updates) as well as browse Knowledge Articles allocated to their organisational structure. A small weakness is that the portal has limited customisation, so at the point of logging you only get the user's name, a summary and a description field. I suspect this will result in some additional call handling as you go back to the user at first contact to ascertain information such as organisation name, application and so on.

Configuration / Categorisation – The primary configuration in the free version comes in the form of “Tags”. These are free-form tags that can be applied to each incident record. In our structure we use three leading tags (CI-, Cat-, Clo-): CI for “Service Name”, Cat for “Categorisation of Issue” and Clo for “Closure Code”. By keeping the text string to 16 characters (e.g. “Cat-Apprunslow”), when we run the report extract we are able, using formulas in Excel, to create meaningful data for reporting. Whilst Freshservice does have a CI field, we found this does not form part of the extract in the free version, hence the reason for creating a CI tag.
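The same tag convention can be unpicked in a few lines of code rather than Excel formulas. To be clear, the prefixes are our own naming scheme, not anything built into Freshservice:

```python
# Our own tag prefixes mapped to reporting columns; this mirrors the
# convention described above and is not a Freshservice feature.
PREFIXES = {"CI-": "service", "Cat-": "category", "Clo-": "closure"}

def parse_tags(tags):
    """Split a ticket's tag list into reporting columns by prefix."""
    columns = {}
    for tag in tags:
        for prefix, column in PREFIXES.items():
            if tag.startswith(prefix):
                # Keep everything after the prefix as the value.
                columns[column] = tag[len(prefix):]
    return columns

row = parse_tags(["CI-Website", "Cat-Apprunslow", "Clo-UserError"])
```

One row per ticket in this shape drops straight into a reporting deck, which is exactly the workaround the missing CI field forces on the free tier.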

Advisor View – Once again, a browser-based interface gives a simple ticket summary, and a nice touch is that it categorises tickets not only as Open and Unassigned but also includes groupings for On Hold and Due Today. Clicking into a group shows the tickets for that classification along with other filters to review smaller subsets of the data. A button to create a new ticket exists in the main view, and from here the advisor is able to access a wide set of fields to complete the incident logging process.

Advisors and users – Freshservice offers 5 plans, of which the first, “Sprout”, is free. This limits you to Incident Management (with the Self Service Portal and Knowledge Management) for 100 end users and 3 agents. The next offering is “Blossom”, which takes you to unlimited end users but is charged per agent. At the time of writing (March 2018) this is £20 per agent per month, so if you are running 3 agents and tip into 101 users you are then faced with a monthly bill of £60.

User emails – The features here are very similar to Spiceworks, with a set of configurable scenarios to trigger emails to both users and agents. On balance, the structure of the Freshservice emails does appear more intuitive for the user.

SLA Management – Whilst the free version only gives you the ability to create a single SLA policy, it does interface with ticket logging and drives the SLA breach and escalation process. You get 4 priorities, each allowing a response and a resolve target applied in days or hours against a configurable calendar. A nice touch.
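The shape of such a policy, a response and a resolve target per priority, can be sketched as follows. The hours are illustrative rather than Freshservice defaults, and this simple version assumes a 24x7 calendar; the product itself handles the configurable business-hours calendar internally:

```python
from datetime import datetime, timedelta

# Illustrative (response_hours, resolve_hours) per priority; not the
# Freshservice defaults. A business-hours calendar would pause the
# clock outside working time, which this 24x7 sketch does not attempt.
SLA_POLICY = {1: (1, 4), 2: (2, 8), 3: (4, 24), 4: (8, 72)}

def sla_targets(logged_at, priority):
    """Return (respond_by, resolve_by) for a ticket."""
    respond_hrs, resolve_hrs = SLA_POLICY[priority]
    return (logged_at + timedelta(hours=respond_hrs),
            logged_at + timedelta(hours=resolve_hrs))

# A priority 2 ticket logged at 09:00: respond by 11:00, resolve by 17:00.
respond_by, resolve_by = sla_targets(datetime(2018, 3, 18, 9, 0), 2)
```

Having the tool do this automatically, and then drive the breach and escalation flow from it, is precisely what the free Spiceworks tier lacks.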

Android App – Similar functionality here, with the app providing a visual alert when a new incident is received as well as allowing an agent both to log new tickets and to manage existing ones in a feature-rich way. It was tested on both a phone and a tablet and was usable in both cases.

Reporting – A simple “Export” button in the “all tickets” view allows you to pick parameters such as closed, resolved or created date and then, using a custom date range, extract the database information across enough fields to drive meaningful reporting. As we pointed out earlier, this does not extract CI information, so using the Tags functionality is highly recommended.

Knowledge Base – This is where Freshservice plays a nice trump card, as its knowledge management module not only allows a logical hierarchy (Category > Folder > Article) to be created but also lets you mark articles as “public” or “private”, with public articles appearing in the end user portal view.

Summary – If you know that you are going to stay under 3 advisors and 100 users, Freshservice provides a feature-rich incident management product with a very strong knowledge management add-on. The risk comes from the uplift cost once you breach the 100 users or 3 advisors.