Should you build, buy or adopt an observable AIOps platform for your organization?
Every successful digital transformation depends on delivering a world-class user experience to employees, customers and partners. In our earlier blogs, we discussed why a flexible, scalable, intelligent monitoring and observable AIOps platform is the key foundation for your digital business growth. Once you and your stakeholders agree that successful digital transformation initiatives rely on AIOps, you will have to decide whether to build, buy or adopt:
- Should you build your AIOps platform in-house from scratch?
- Should you buy an AIOps platform as a SaaS or hosted licensed solution?
- Should you adopt an open-source AIOps solution by integrating multiple open-source components?
If you're scratching your head right now, rest assured you're not alone. Let's explore the critical considerations in build vs adopt vs buy.
In your enterprise, you likely already have a few siloed monitoring tools, but they do not meet your need to ingest data from multiple application domains, vendors and sources, including infrastructure, networks, applications, the cloud and existing monitoring tools.
If you are a web-scale company (such as Google, LinkedIn, Netflix, Facebook or Etsy), your business and operational needs are so specialized and innovative that no one but you can build the platform. A few examples: Jaeger, a distributed tracing platform built by Uber; Prometheus, a monitoring system built by SoundCloud; and Skyline, an anomaly detection system built by Etsy. All of these were open-sourced after their initial development and battle-hardening at a web-scale company.
Alternatively, you and your team may feel that your AIOps platform requirements can be met by adopting an open-source solution such as the ELK stack or the Grafana Loki stack. Your team may be tasked with a Proof-of-Concept (PoC) to evaluate open-source stacks against your requirements. You are pleasantly surprised that the PoC is relatively inexpensive and your team completes it on time. Your stakeholders receive the PoC demo well, and your team is very excited to move forward.
As an experienced IT leader, you realize that the gap between building a PoC and implementing a mature, scalable, future-ready observable AIOps solution, while balancing CapEx (Capital Expenditure) and OpEx (Operational Expenditure), is huge. The unfortunate reality is that working on something cool is not always what's best for your business. As that reality sinks in, you ask your team to put together a set of considerations for carefully assessing the build, adopt and buy approaches for your organization.
Here are eight critical considerations to help you decide between building, adopting and buying an observable AIOps platform:
1. Complexity of building and integrating open-source tools
Is integrating open-source components a core competency of your organization?
Adopting an AIOps solution assembled from open-source components often requires a deep understanding of data pipelines, log analytics, distributed tracing, customer journey mapping, time-series metrics, dashboards, alerting, search, machine learning models and more. You need to decide whether you, your team and your organization are ready to build competency in each open-source component and its lifecycle, along with the integration skills required to assemble the solution. The complexity of such a system is two orders of magnitude greater than that of a CRM or ERP system.
Many have attempted to build such a system, and few have succeeded.
The classic case study of a build-and-adopt approach is GE Digital, which spent eight years, 3,000 programmers and $7 billion trying to succeed at this task. The end result of that effort included the collapse of that division and the termination of the CEO, and it contributed to the dissolution of one of the world's iconic companies.
On the other hand, buying an off-the-shelf SaaS observability solution with a built-in data pipeline, log analytics, time-series metrics reporting, unified visibility across the entire IT stack and machine learning models may be more effective in the long run than trying to adopt an open-source solution such as the ELK stack.
2. Skillset to build an observability platform
What skillset does adopting an open-source solution require, compared with buying one?
You and your existing team may be world-class, and you may have proved it time and again. Your team may excel at the SRE, DevOps and coding skills an AIOps platform requires. Still, chances are you and your team aren't experts in everything, such as observability and MLOps. Underestimating the skills required to drive observability and MLOps can be the difference between the failure and success of your AIOps platform.
Most, if not all, open-source monitoring solutions do not come with built-in observability and machine learning capabilities such as anomaly detection, automated root-cause analysis and capacity planning.
The learning curve is steep if you are building observability and MLOps skillsets in your team from scratch.
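To give a sense of what even the simplest "anomaly detection" involves, here is a minimal sketch of a rolling z-score detector over a single metric series. It is an illustration under stated assumptions, not how any particular AIOps product works: the function name, window size, threshold and sample latency values are all hypothetical.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Return indices of points that sit more than `threshold` standard
    deviations away from the mean of the preceding `window` points.

    `window` must be at least 2, because statistics.stdev needs two values.
    """
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if series[i] != mu:
                anomalies.append(i)
        elif abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response-time samples (ms) with one obvious spike.
latency_ms = [120, 118, 125, 122, 119, 121, 480, 123, 120, 124]
print(rolling_zscore_anomalies(latency_ms))  # prints [6]
```

Even this toy exposes the skills gap: once the spike at index 6 enters the window, it inflates the baseline for the next five points, so a second real spike in that stretch could be missed. A production detector needs robust statistics, seasonality handling and per-metric tuning, plus someone to own those models over time.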
3. CapEx investment
What is the estimated CapEx for open-source adoption vs buy?
In the buy scenario, CapEx is generally lower than with open-source adoption because little integration work is involved. In the open-source adoption scenario, do not underestimate the effort needed to support new data sources, configure them and ensure your data pipeline keeps up with their growing number.
An integrated, federated common object data model is absolutely necessary for your observability platform.
Developing an integrated data model with this type of structured, API-driven architecture will require hundreds of person-years at any large corporation. The Fortune 500 is littered with such disaster stories.
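At its smallest, a common object data model means every source is translated into one shared record type before anything downstream (correlation, dashboards, ML) sees it. The sketch below assumes two hypothetical input shapes, a metrics sample and a JSON log record; the class, field and function names are illustrative only. The hard part is doing this consistently for hundreds of sources, versions and domains, not for two.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ObservabilityEvent:
    """Hypothetical common record type that every source is mapped into."""
    timestamp: datetime      # always timezone-aware UTC
    source: str              # tool or pipeline that produced the record
    entity: str              # host, service or device the record describes
    kind: str                # "metric" or "log"
    name: str                # metric name, or log level for logs
    value: Optional[float]   # numeric value for metrics; None for logs
    message: str = ""        # free text for logs


def from_metric_sample(sample: dict) -> ObservabilityEvent:
    # Assumed input: {"ts": <Unix seconds>, "host": str, "metric": str, "val": number}
    return ObservabilityEvent(
        timestamp=datetime.fromtimestamp(sample["ts"], tz=timezone.utc),
        source="metrics-agent",
        entity=sample["host"],
        kind="metric",
        name=sample["metric"],
        value=float(sample["val"]),
    )


def from_log_record(record: dict) -> ObservabilityEvent:
    # Assumed input: {"@timestamp": ISO-8601 string with a UTC offset,
    #                 "service": str, "level": str, "msg": str}
    return ObservabilityEvent(
        timestamp=datetime.fromisoformat(record["@timestamp"]).astimezone(timezone.utc),
        source="log-shipper",
        entity=record["service"],
        kind="log",
        name=record["level"],
        value=None,
        message=record["msg"],
    )


events = [
    from_metric_sample({"ts": 1700000000, "host": "web-01", "metric": "cpu_util", "val": 87}),
    from_log_record({"@timestamp": "2023-11-14T22:13:20+00:00", "service": "payments",
                     "level": "ERROR", "msg": "upstream timeout"}),
]
# Both records now share one schema and one clock, so they can be correlated directly.
print(events[0].timestamp == events[1].timestamp)  # prints True
```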
4. OpEx estimation
What is the estimated OpEx for open-source adoption vs buy?
In the buy scenario, OpEx is perceived to be higher than in the open-source adoption scenario. But for an open-source solution, you have to carefully factor in the cost of the technical SMEs who integrate, assemble and maintain the open-source components, and weigh it against the cost of buying an off-the-shelf observability solution.
In the long run, your OpEx is very likely to be higher for an open-source solution because of the continued investment required to integrate and maintain its components.
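The SME cost is easy to leave out of a spreadsheet, so here is a minimal total-cost-of-ownership sketch that keeps it as its own line. Every figure below is invented purely to show the mechanics; the variable names and the 5-year horizon are assumptions, not data from any real deployment.

```python
from typing import Sequence


def total_cost_of_ownership(capex: float, yearly_opex: Sequence[float]) -> float:
    """One-time CapEx plus every year's OpEx over the planning horizon."""
    return capex + sum(yearly_opex)


# Hypothetical 5-year figures in thousands of any currency; substitute your own estimates.
adopt_capex = 400                            # integration work, hardware
adopt_infra = [120, 130, 140, 150, 160]      # hosting, storage
adopt_sme = [300, 335, 365, 400, 440]        # SMEs integrating and maintaining components
adopt_opex = [infra + sme for infra, sme in zip(adopt_infra, adopt_sme)]

buy_capex = 150                              # onboarding, initial configuration
buy_opex = [450, 460, 470, 480, 490]         # subscription or licence fees

print("adopt, infra only:", total_cost_of_ownership(adopt_capex, adopt_infra))  # 1100
print("adopt, with SMEs: ", total_cost_of_ownership(adopt_capex, adopt_opex))   # 2940
print("buy:              ", total_cost_of_ownership(buy_capex, buy_opex))       # 2500

# First year in which adopt's annual OpEx exceeds buy's (None if it never does).
crossover_year = next(
    (year for year, (a, b) in enumerate(zip(adopt_opex, buy_opex), start=1) if a > b),
    None,
)
print("adopt OpEx overtakes buy in year", crossover_year)  # 2
```

The "infra only" line is the perceived OpEx of the open-source route; adding SME effort is what changes which option looks cheaper over the planning horizon.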
5. Time to deploy
What is the estimated time to deploy for open-source adoption vs buy?
Assess the integration timeline for open-source components against the timeline for buying. Don't be misled by how quickly your team built the Proof-of-Concept with open-source tools. A production deployment of an open-source solution encompasses integration, testing, configuration, documentation, disaster recovery, bug fixes, deployment, operations, training and more.
As a rule of thumb, triple whatever time estimate you come up with for adopting an open-source solution. For the buy approach, double your time estimate to account for unexpected delays from vendors. Remember, whether you build, adopt or buy, testing in staging, deployment and training are common to all approaches.
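The rule of thumb is simple arithmetic, but writing it down keeps the buffer from being quietly dropped when plans are revised. A minimal sketch; the function name and the raw estimates are hypothetical, and the text gives no multiplier for the build approach, so none is assumed here.

```python
# Multipliers taken from the rule of thumb above; the text gives none for "build".
BUFFER_MULTIPLIER = {"adopt": 3, "buy": 2}


def buffered_estimate(approach: str, raw_estimate_weeks: float) -> float:
    """Apply the rule-of-thumb buffer to a team's raw production estimate."""
    if approach not in BUFFER_MULTIPLIER:
        raise ValueError(f"no rule-of-thumb multiplier for {approach!r}")
    return raw_estimate_weeks * BUFFER_MULTIPLIER[approach]


# Hypothetical raw estimates for a production rollout (not the PoC duration).
print(buffered_estimate("adopt", 8))  # prints 24
print(buffered_estimate("buy", 6))    # prints 12
```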
6. Supporting the SLAs
How will you monitor your open-source solution vs a bought one? What are the SLAs? Who is responsible for meeting them?
Your AIOps platform is mission-critical for your business. Talk to your stakeholders to understand the SLAs they expect you and your team to meet. Ask your team how they will monitor the AIOps platform itself. In the buy scenario, make sure during the purchase process that the vendor can meet your SLA requirements.
In the adopt scenario, you will have to plan for a dedicated team with the deep skills and commitment needed to meet the expected SLAs. Consider the risk of having to abandon the open-source solution during the lifetime of your AIOps platform if the SLAs are not met. In the buy scenario, consider the risk of later abandoning the vendor's solution for a cheaper or higher-quality one.
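When stakeholders state an expected SLA, it helps to translate the availability percentage into the downtime it actually permits, since that is the budget your team (or your vendor) must operate within. A minimal sketch, assuming a simple availability-based SLA measured over a 30-day period; the function name and the targets are illustrative.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Downtime an availability SLA permits over a period of `period_days` days."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% over 30 days allows {allowed_downtime_minutes(target):.1f} minutes of downtime")
```

For example, 99.9% over a 30-day month leaves about 43.2 minutes for every incident, upgrade and migration combined, which is why it matters who owns the platform's own monitoring.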
7. Handling bugs and security vulnerabilities
How will you handle bugs related to security vulnerabilities, memory leaks and data loss?
Your information security team will have a set of guidelines to ensure that your solution has minimal exposure to security vulnerabilities, memory leaks and data loss. All software products need to address bugs not just during the initial deployment but after implementation as well. Once your AIOps platform is deployed and used at scale, new security vulnerabilities, memory leaks and data loss might surface as a result of inevitable upgrades.
You need to consider who will be on the hook for handling such vulnerabilities when they surface in production, and how quickly they will be addressed.
8. Migration to future versions
How will you migrate to future versions of open-source components vs a bought solution?
Upgrading the open-source components you adopt, and re-integrating them, can create incompatibilities when features are deprecated. As a result, migrations can require configuration rewrites, break backward compatibility and cause operational instability.
You will need to factor in the time required to keep the platform operationally viable through the future migrations of open-source components, which are inevitable.
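For the open-source approach, one way to reduce upgrade surprises is to check, before each upgrade, which components cross a boundary that Semantic Versioning (SemVer) reserves for incompatible changes. The sketch below assumes every component follows SemVer with plain MAJOR.MINOR.PATCH strings (many projects do not, strictly); the component names, versions and function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release or build tags)."""
    return tuple(int(part) for part in version.split("."))


def is_potentially_breaking(old: str, new: str) -> bool:
    o, n = parse_version(old), parse_version(new)
    if o[0] == 0 or n[0] == 0:
        # Under SemVer, anything may change while the major version is 0,
        # so treat a minor-version change as potentially breaking too.
        return o[:2] != n[:2]
    return o[0] != n[0]  # a major-version change signals incompatible API changes


def breaking_upgrades(current: dict[str, str], target: dict[str, str]) -> list[str]:
    """Components present in both manifests whose upgrade may break integrations."""
    return [
        name for name, old in current.items()
        if name in target and is_potentially_breaking(old, target[name])
    ]


# Hypothetical component manifest for an assembled open-source stack.
current = {"log-store": "7.17.0", "dashboards": "9.5.2", "collector": "0.88.0"}
target = {"log-store": "8.11.0", "dashboards": "9.6.0", "collector": "0.89.0"}
print(breaking_upgrades(current, target))  # prints ['log-store', 'collector']
```

A check like this only flags where to look; someone still has to read the release notes, rewrite configuration and retest the integrations for each flagged component.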
In the buy approach, migration is less painful, as the vendor you choose should address these concerns as part of its product lifecycle.
- It is easy to get excited about a PoC, but there is real work involved in delivering a production-ready observable AIOps solution. The eight critical considerations outlined above are by no means exhaustive, but they are fundamental as you debate build vs adopt vs buy in your organization.
- It is important to have an experienced technical project manager who can ask your team the right questions about the build, adopt and buy approaches for your AIOps platform. After weighing the risks and benefits of each consideration, you may decide to buy. Quite often, though, the buy approach poses as many challenges as the adopt or build approach.
- As your digital business grows in customer transactions, users, channels and partners, your application infrastructure and big data analytics will grow in complexity as well.
- At the end of the day, you want to deliver a world-class solution to your stakeholders that is operationally viable.
✍ Srikanth Narasimhan, the author of the article, is a Technical Advisor @ VuNet Systems. He is an Enterprise Architect and has served as a distinguished engineer at Cisco.
VuNet's platform, vuSmartMaps™, is a next-generation, full-stack deep observability product that uses big data and ML models in innovative ways to monitor and analyze business journeys and provide a superior customer experience. Monitoring more than 3 billion transactions per month, VuNet's platform is improving the digital payment experience and accelerating digital transformation initiatives across BFSI, FinTechs, Payment Gateways and other verticals.