21 August 2008

How Things Can Go Wrong – a Fantasy

I thought I would try to sum up my last two postings with an example or fantasy to illustrate how “the best laid schemes o' Mice an' Men, gang aft agley”.

To set the scene: imagine a start-up Telco that launches with a single ‘all singing all dancing’ software suite that does everything from Customer Care and Order Management through to Billing and Provisioning.

Things work fine for several years and the Telco grows, but eventually the customer base gets too big for its unified OSS to handle, and the new products that marketing want to launch cannot be easily implemented in the system and often require code changes. It is decided that it is time to update the OSS and, rather than a ‘big bang’ approach, they decide on the low-risk strategy of a stepwise replacement of functionality.

First of all they move Billing out to another platform. This will increase throughput and allow more sophisticated pricing plans and discounts.

They select a leading Billing package and then spend a lot of time and effort tailoring it to integrate with the legacy system, so the state-of-the-art billing functionality, pricing plans, discounting rules and product definitions are amended to reflect the legacy system’s functionality and product definitions. The official reason for this decision is that it would be a waste of time and effort to update the interfaces and functionality in the soon-to-be-replaced legacy system (but, as we will discover later, there is an additional dark hidden motive driving this decision).

A similar operation is then performed with the Provisioning subsystem and each of the other parts of the OSS. Each interface is altered so that no change is required within the legacy system. Where the legacy system cannot describe the complexity of, for example, provisioning some of the new products and services, the other systems add information and build special rules and processes to supply the level of sophistication required.

Finally the Telco believes it is the right time to completely replace what remains of the legacy system with a market leading Order Management System.

Unfortunately, and inevitably, the project, like its predecessors, quickly runs into problems – the project team finds that they are not allowed to ask the Billing system or the Provisioning system or any other of the OSS components that interface to the Order Management System to change their interfaces or product definitions.

This has been part of the whole strategy for the stepwise OSS replacement: none of the new components is allowed to ask for a change in any of the other OSS components.

The Order Management project is now faced with the daunting task of amending the market leading functionality in the selected software package to work with the weird product catalogue, interfaces and constraints adopted by all the other components of the OSS. They are in fact re-implementing the legacy system they are replacing within the new Order Management package. While the new package has the functionality to completely describe the new advanced products within the Order Management system, they cannot use it, because the interfaces to the Billing and Provisioning systems are expecting the ‘dumbed down’ legacy system’s definitions, and any change would mean that the special rules and procedures developed at great expense would no longer work.

The project deadlines slip, the volume of change requests soars, budgets are broken, doubled and redoubled, but eventually the Order Management system is launched.

To everyone’s horror it is quickly discovered that the problems of the old system are still present in the new one. It is impossible to define new products easily as every new product requires major software changes in every major component of the OSS.
The Telco still has the “ghost” of the legacy system in its OSS, and it will rule their product catalogue and limit their IT architectural options for years to come.

So how could this imaginary Telco have done things differently? Each step along the way seemed to be sensible and avoided scope creep and spending money on amending soon-to-be-replaced software. Despite this caution (or because of it) they ended up trapped by architectural mistakes made right at the start of the very first migration project. They had tightly coupled the components in their architecture and had never acknowledged that fact, nor faced up to the consequences of sticking with that architecture.

One thing they needed was a loosely coupled architecture: one in which each component (system) is completely unaware of the inner workings of the other components, and only aware of the services another component supports (or requires) and the functions (methods) and information required or delivered by each service. A loosely coupled architecture allows each component to be independent of every other, so that it can be developed, used, and replaced without impacting any other component.
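To make the idea concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the `BillingService` contract, the two Billing implementations, the product code and the discount rule are assumptions of mine, not anything taken from a real package. The point is only that Order Management knows the service contract and nothing else.

```python
from typing import Protocol


class BillingService(Protocol):
    """The only thing Order Management knows about Billing (hypothetical contract)."""

    def rate_order(self, offering_id: str, quantity: int) -> float:
        ...


class LegacyBilling:
    """One implementation; its internals are invisible to callers."""

    def __init__(self) -> None:
        self._flat_prices = {"BROADBAND_20": 25.0}

    def rate_order(self, offering_id: str, quantity: int) -> float:
        return self._flat_prices[offering_id] * quantity


class NewBilling:
    """A replacement with richer pricing (a volume discount), same contract."""

    def __init__(self) -> None:
        self._base_prices = {"BROADBAND_20": 25.0}

    def rate_order(self, offering_id: str, quantity: int) -> float:
        total = self._base_prices[offering_id] * quantity
        # Hypothetical rule: 10% off for three or more units.
        return total * 0.9 if quantity >= 3 else total


class OrderManagement:
    """Depends on the BillingService contract, not on any concrete system."""

    def __init__(self, billing: BillingService) -> None:
        self._billing = billing

    def quote(self, offering_id: str, quantity: int) -> float:
        return self._billing.rate_order(offering_id, quantity)
```

Swapping `LegacyBilling` for `NewBilling` changes the prices quoted but requires no change at all to `OrderManagement`, which is the property the imaginary Telco never had.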

To achieve this they should have adopted the TMF SID model and mapped the legacy system’s concepts to it. Then each new system should have used the SID definitions (and not the legacy system’s) to map to (and not implement as) its own internal data structures. This is a lot of effort initially, but it would have saved them a lot of grief in the long run.
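A rough sketch of what "map to, not implement as" might look like follows. The `CanonicalOffering` class is a stand-in of my own and is far simpler than anything in the real SID; the legacy field names (`PRD_CD`, `PRD_DESC`, `CHG_PENCE`) and the Billing structure are equally hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalOffering:
    """Shared, system-independent definition (hypothetical, far simpler than SID)."""

    offering_id: str
    name: str
    monthly_price: float


def from_legacy(record: dict) -> CanonicalOffering:
    """Translate the legacy system's quirky record into the shared model."""
    return CanonicalOffering(
        offering_id=record["PRD_CD"],
        name=record["PRD_DESC"].strip().title(),
        monthly_price=record["CHG_PENCE"] / 100,
    )


def to_billing(offering: CanonicalOffering) -> dict:
    """Translate the shared model into an imagined Billing package's own structure."""
    return {
        "plan_code": offering.offering_id,
        "display_name": offering.name,
        "recurring_charge": {"amount": offering.monthly_price, "period": "month"},
    }


# The legacy vocabulary stops at from_legacy(); Billing never sees it.
legacy_record = {"PRD_CD": "BB20", "PRD_DESC": " home broadband ", "CHG_PENCE": 2500}
billing_plan = to_billing(from_legacy(legacy_record))
```

Each system owns one small translation to and from the shared model, so replacing the legacy system later means rewriting `from_legacy` and nothing else.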

There is also a hidden problem with the architectural approach inherent in the whole process that no one in this imaginary Telco could see.

No one stopped to think about the product catalogue and its definitions. The Telco had ‘grown up’ with the legacy system as initially it ruled every aspect of their business. The way that system defined products and services became part of the corporate culture, to the extent that the quirky vocabulary the legacy system used to describe the products ended up in the company’s marketing material.

The product catalogue worked fine for the legacy system, but the Billing package had a more advanced and flexible way of working. Rather than adopting that, the Telco decided to amend the product definitions in the Billing package to match those in the legacy system. This was, after all, the way the Telco itself thought about its products and services, so it seemed a very sensible thing to do…

This is actually a form of Stockholm Syndrome. The Telco feels trapped by the legacy system, but is secretly ‘in love’ with it and has ‘subconsciously’ absorbed its constraints and ways of working, even though these are the very reasons for wanting to escape from it.

Again, use of the TMF’s SID model could have prevented this. There would be considerable pain in the early stages of the whole migration strategy, where the Telco has to stop using the legacy vocabulary, map its products into the SID concepts of Product Offering, CFS, and Resource Specification, and in the process learn a new system-independent vocabulary. A painful process indeed, especially where a good number of Product Managers, and the users and owners of the legacy systems, will see no point in it.
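As an illustration of that mapping exercise, the sketch below takes one made-up legacy package code and describes it using the three concepts named above. The class shapes are my own simplification and should not be read as the SID schema; the code `PKG-07` and the product itself are invented.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ResourceSpecification:
    name: str


@dataclass(frozen=True)
class CustomerFacingService:
    name: str
    resources: Tuple[ResourceSpecification, ...] = ()


@dataclass(frozen=True)
class ProductOffering:
    name: str
    services: Tuple[CustomerFacingService, ...] = ()


# Hypothetical legacy vocabulary, mapped once into the shared concepts.
LEGACY_TO_OFFERING = {
    "PKG-07": ProductOffering(
        name="Home Broadband with Voice",
        services=(
            CustomerFacingService("Broadband Access", (ResourceSpecification("DSL Port"),)),
            CustomerFacingService("Voice Line", (ResourceSpecification("Telephone Number"),)),
        ),
    ),
}


def required_resources(offering: ProductOffering) -> list:
    """List the resource specifications an offering ultimately depends on."""
    return [r.name for s in offering.services for r in s.resources]
```

Once the catalogue is expressed this way, Provisioning can ask what an offering needs without ever learning that the legacy system called it `PKG-07`.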

But having taken that pain early on, one of the reasons for moving away from the legacy system – its inability to describe complex products – would then be sealed into that system. It would no longer “infect” each and every other component of the OSS, nor stunt the Telco’s ability to visualise and describe (in a way their customers can understand) new complex Offerings.

Fact or fantasy? – Well, as I said, the scenario is a fantasy, but does this sound like anything you have experienced?

1 comment:

Anonymous said...

I fully agree with the first half, I mean the diagnosis of the problem, but not with every word of the suggestion.
Loose coupling looks nice, but it kills optimisation at the Enterprise level! On the other hand, the real problem is the inflexibility of the software components: high costs and slow reactions to necessary changes.
Code changes, on the other hand, would not be a problem at all if your software components were well designed and implemented.
In other words: parameterisation means, for me, that I can go live with a change from one day to the next - and I don't care whether it is an update in a database or in SVN!
The word "can" means it is not a blind change, but a properly tested one...