Tag Archives: Anomaly Detection

End of the Beginning for Machine Learning

Well, that’s a wrap for my initial journey into machine learning and artificial intelligence. But it’s only the end of the beginning for machine learning. Before I finish up this series, some people have asked that I provide some context. This technology is changing the industry. Some players, including my company and its competitors, have already adopted these technologies. So to placate the masses, let’s list out what I have heard is currently available in the marketplace. I am not omniscient, so if you see anything missing or wrong, please let me know.

Artificial Intelligence is Old Hat

 
Let’s first talk legacy. Artificial intelligence has been part of software since my start in the industry. The most common form is “rules”. With rules, though, humans define the model. This model could be a list of “if” statements. A tree stored in a database is also a model. The difference is not on the AI side, but on the machine learning side: automated model building is what is new. Other legacy concepts are algorithm based. Two examples I have for you: linear regression trending and Holt-Winters smoothing. Both are available in open-source tools like MRTG as well as many commercial applications today. The commonality is that the algorithm provides the model. Let’s be clear, the algorithm doesn’t build the model, it IS the model. These are robust and well-regarded solutions in the marketplace today.
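To make “the algorithm IS the model” concrete, here is a minimal Python sketch of both legacy techniques. It is an illustration only, not how MRTG or any commercial product implements them. The function names, the sample numbers, and the way the Holt-Winters state is initialized from the first two seasons are all my assumptions. Notice that nothing here is learned automatically: a human picks the formula and the smoothing parameters.

```python
from statistics import mean


def linear_trend(values):
    """Least-squares slope and intercept for evenly spaced samples (x = 0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    denominator = sum((x - x_bar) ** 2 for x in range(n))
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar


def forecast(values, steps_ahead):
    """Project the fitted line steps_ahead samples past the last observation."""
    slope, intercept = linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


def holt_winters_fitted(values, season_len, alpha, beta, gamma):
    """One-step-ahead predictions from additive Holt-Winters smoothing.

    Assumption: level, trend and seasonal terms are initialized from the
    first two seasons, which is one common convention among several.
    """
    if len(values) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")
    level = mean(values[:season_len])
    trend = (mean(values[season_len:2 * season_len]) - level) / season_len
    seasonal = [v - level for v in values[:season_len]]
    fitted = []
    for t, y in enumerate(values):
        s = seasonal[t % season_len]
        fitted.append(level + trend + s)  # prediction made before seeing y
        previous_level = level
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (y - level) + (1 - gamma) * s
    return fitted


if __name__ == "__main__":
    utilization = [10, 12, 14, 16]  # made-up samples
    print(forecast(utilization, steps_ahead=2))
    print(holt_winters_fitted([1, 5, 1, 5, 1, 5, 1, 5], 2, 0.5, 0.1, 0.1))
```

In both cases the “model” is just the formula plus a handful of numbers. Comparing an observed value against the fitted or projected value is where threshold-style alerting comes from.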

Anomaly Detection vs Chronic Detection

 
 
Now let’s move to machine learning. Anomaly detection, with varying degrees of accuracy, is becoming common in the marketplace. Many solutions are black boxes that strain credibility, and others are an open-ended time abyss of customization. The mature solutions are trying to provide a balance between out-of-the-box value and flexibility. There are plenty of options with anomaly detection. Chronic detection and mitigation is much rarer. I have not seen many who offer that functionality, especially accomplished with machine learning. Again, when dealing with chronics, your mileage may vary, but it’s out there.

Takeaways

Many of the products that use this technology do not specifically reference it. Usually when you hear analytics nowadays, you can expect machine learning to be part of it. Most performance alerting (threshold crossing) leverages it in the realm of big data analytics. Most historical performance tools leverage machine learning to reduce the footprint of reporting. These three areas commonly have machine learning technology baked in.
 
What this means is that machine learning is NOT a revolutionary technology that solves all our problems. At least not yet. It’s a revolutionary technology that lowers the bar. Because of this technology, problems can be solved more easily and with far fewer resources than ever before. The price you pay for this is simple: machine learning will not catch everything. You will have to be fine with 80% quality for 0% effort.
 
Thanks again for all the great input, keep commenting and I will keep posting.

About the Author

Serial entrepreneur and operations subject matter expert who likes to help customers and partners achieve solutions that solve critical problems. Experience in traditional telecom, ITIL enterprise, global managed service providers, and datacenter hosting providers. Expertise in optical DWDM, MPLS networks, MEF Ethernet, COTS applications, custom applications, SDDC virtualized, and SDN/NFV virtualized infrastructure. Based in the Dallas, Texas, US area and currently working for one of his founded companies, Monolith Software.

The Technology of Machine Learning with AI

The technology of machine learning with AI should be our first focus. As with all new technology, it has new terms and new concepts. Several of these are heretical to the status quo. It’s important to set a proper context so we can have serious discussions.
 
True to my word, here is the first blog in the series on machine learning with AI in service assurance. To explain some of the solutions in the marketplace, first we need to talk about the technology. The terms and concepts are new. The goals are the same for operations: increasing automation and increasing quality.

Defining Machine Learning

Let’s start with machine learning. First, check out the Wikipedia article. Below is the definition:
 
“Machine learning is a field of computer science that often uses statistical techniques to give computers the ability to “learn” (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.”
 
I summarize it like an ant farm. The worker ants build the farm by digging the tunnels. Once done, the model is complete. In this case, the ants are the machine learning. As days go on, the ants change the farm if necessary. Then, say the farm falls over (oops). The ants have to rebuild, digging new pathways and updating the model. The fact is the ant farm is always changing and learning from the environment. Like the ants, machine learning builds and maintains the patterns, or as I call them, models. These models are compared to the current situation in real time. Artificial intelligence can see if they align (chronics) or diverge (anomalies).
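The ant farm idea can be sketched in a few lines of Python. This is a toy, not a real machine learning product: the “model” is just a running mean and standard deviation (updated with Welford’s method), and the class name, method names, and the three-sigma tolerance are my assumptions for illustration. The point is the split in roles: `learn` keeps rebuilding the model as data arrives, and `diverges` does the comparison against the current situation.

```python
import math


class BaselineModel:
    """A tiny 'learned' model: running mean and spread of a metric."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations (Welford's method)

    def learn(self, value):
        """Update the model with one new observation (the ants keep digging)."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def stddev(self):
        """Sample standard deviation of everything learned so far."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))

    def diverges(self, value, tolerance=3.0):
        """True if value sits more than `tolerance` standard deviations from the mean."""
        spread = self.stddev()
        if spread == 0.0:
            return value != self.mean
        return abs(value - self.mean) > tolerance * spread


if __name__ == "__main__":
    model = BaselineModel()
    for sample in [10, 11, 9, 10, 10, 11, 9, 10]:  # made-up history
        model.learn(sample)
    print(model.diverges(10.5))  # close to what was learned
    print(model.diverges(20))    # far from what was learned
```

Because `learn` can be called forever, the model keeps adapting, which is the “farm fell over, rebuild” part of the analogy.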
 

Defining AI

Now AI, as defined by Wikipedia, is “the study of intelligent agents: any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.”
 
The complexity of AI is a spectrum. On one side, some AI is cognitive, performing virtual reasoning. On the other, AI is nothing more than rules processing. In the context of machine learning, AI is usually applying or comparing models. These models are what machine learning has learned from a separate set of data. In the context of service assurance, doing the comparison is the value: comparing the past with the present, or projecting the future using the past. This provides analytics with insights.

Understanding Anomalies

When performing a comparison, the model and the current situation either align or diverge within some set degree. If the current situation is a repeat of the past, you have detected a chronic situation. When the current situation is new and unusual, it’s called an anomaly. Datascience.com defines three types of anomalies:
 
  • Point anomalies: A single instance of data is anomalous if it’s too far off from the rest. Business use case: Detecting credit card fraud based on “amount spent.”
  • Contextual anomalies: The abnormality is context specific. This type of anomaly is common in time-series data. For example, spending $100 on food every day during the holiday season is normal, but may be odd otherwise.
  • Collective anomalies: A set of data instances collectively helps in detecting anomalies. Imagine someone is trying to copy data from a remote machine to a local host. If this is unexpected, it is an anomaly that would be flagged as a potential cyber attack.

Consider chronics as “negative” anomalies. Many customers I have talked to see chronic detection as the most common AI tool to have. Both anomalies and chronics are important AI tools operations can use to better monitor their estate.
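The difference between the first two anomaly types is easier to see in code. Below is a minimal Python sketch using a plain z-score, which is only one of many possible scoring methods. The function names, the 2.5 limit, and the spending figures are made up for illustration. Collective anomalies are left out because they need sequence-aware logic that does not fit in a short example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def point_anomalies(values, z_limit=2.5):
    """Indexes of values that are far from the rest of the whole data set."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_limit]


def contextual_anomalies(samples, z_limit=2.5):
    """Indexes of (context, value) samples that are far from others in the SAME context."""
    by_context = defaultdict(list)
    for context, value in samples:
        by_context[context].append(value)
    flagged = []
    for i, (context, value) in enumerate(samples):
        group = by_context[context]
        mu, sigma = mean(group), pstdev(group)
        if sigma > 0 and abs(value - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical daily food spending, labelled by season.
    holiday = [("holiday", v) for v in [100, 105, 95, 100]]
    regular = [("regular", v) for v in [20, 22, 18, 21, 19, 20, 22, 18, 21, 19]]
    samples = holiday + regular + [("regular", 100)]  # a $100 day outside the holidays

    print(point_anomalies([v for _, v in samples]))  # context ignored
    print(contextual_anomalies(samples))             # context respected
```

With these made-up numbers, the $100 day outside the holidays is not far from the data set as a whole, since holiday days look just like it. It only stands out once the comparison is done within its own context.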
 

Focusing on Service Assurance


This blog focuses on technology as it applies to service assurance (my major focus). Service assurance, as defined by Wikipedia, is:
 
“The application of policies and processes by a Communications Service Provider to ensure that services offered over networks meet a pre-defined service quality level for an optimal subscriber experience.”
 
Put another way, it is assuring the quality of the service. There are two ways to increase the quality of your services. One, automate resolutions, thus reducing how much issues impact customers. Two, proactively address issues before they become problems, and problems before they become outages.

Increasing Automation with Machine Learning

Increasing automation, and thereby reducing downtime, is always a goal for operations. Machine learning with AI tools focus on correlation: the ability to segment faults into new groupings like “situations”. Addressing each situation is key, but they must be actionable. Any correlation would leverage a model built by machine learning, then compare that model against the current fault inventory to produce the segmentation. This is the current focus of the industry, with mixed results, as we will discuss later.
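Here is a rough Python sketch of that learn-then-compare flow. It is not any vendor’s correlation engine. The approach (counting which fault types showed up together in past incidents, then grouping current faults that are linked by those learned pairs) is a deliberately simple stand-in, and the function names, fault names, and `min_support` value are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def learn_pairs(history, min_support=2):
    """Build the model: pairs of fault types seen together in at least min_support past incidents."""
    counts = Counter()
    for incident in history:
        for pair in combinations(sorted(incident), 2):
            counts[pair] += 1
    return {pair for pair, seen in counts.items() if seen >= min_support}


def segment(current_faults, learned_pairs):
    """Compare the model to the current fault inventory and group faults into situations."""
    remaining = set(current_faults)
    situations = []
    while remaining:
        seed = remaining.pop()
        group, frontier = {seed}, [seed]
        while frontier:
            fault = frontier.pop()
            linked = {
                other for other in remaining
                if tuple(sorted((fault, other))) in learned_pairs
            }
            remaining -= linked
            group |= linked
            frontier.extend(linked)
        situations.append(group)
    return situations


if __name__ == "__main__":
    # Hypothetical past incidents, each a set of fault types raised together.
    history = [
        {"link_down", "bgp_flap"},
        {"link_down", "bgp_flap", "high_latency"},
        {"fan_fail", "temp_high"},
        {"fan_fail", "temp_high"},
    ]
    model = learn_pairs(history)
    current = ["link_down", "bgp_flap", "temp_high", "fan_fail", "disk_full"]
    for situation in segment(current, model):
        print(sorted(situation))
```

Five faults collapse into three situations here, and the one fault the model has never seen paired with anything stays on its own. Whether each situation is actionable is a separate question that the grouping alone does not answer.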

Being Proactive with Machine Learning

Being proactive is another buzzword from the late 90s (that was a decade ago, right?). The reality is it’s hard: you need to provide educated guesses. You can’t make quality guesses if you don’t have the data. Data reduction, prevalent in the industry, has caused operations to discard 99% (or more) of their data. Without this data, your guesses will be poor. A machine learning model with AI can leverage low-level information that may predict a future outage. That prediction will give operations lead time to fix the issue before an outage occurs. This is the hype focus of the industry today.
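To show what “lead time” means in practice, here is a small Python sketch that fits a straight line to recent samples of a low-level metric and estimates how long until it reaches a limit. A straight-line fit is my simplifying assumption and real predictive models are far richer. The function name, the 90% limit, and the sample data are made up.

```python
def lead_time_to_threshold(samples, threshold):
    """Seconds from the last sample until the fitted trend reaches threshold.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    Returns None when there is no upward trend to project.
    """
    n = len(samples)
    if n < 2:
        return None
    t_bar = sum(t for t, _ in samples) / n
    v_bar = sum(v for _, v in samples) / n
    spread = sum((t - t_bar) ** 2 for t, _ in samples)
    if spread == 0:
        return None
    slope = sum((t - t_bar) * (v - v_bar) for t, v in samples) / spread
    if slope <= 0:
        return None  # flat or falling: no crossing predicted
    intercept = v_bar - slope * t_bar
    crossing_time = (threshold - intercept) / slope
    last_time = samples[-1][0]
    return max(0.0, crossing_time - last_time)


if __name__ == "__main__":
    # Hypothetical disk utilization (%) sampled once a minute.
    disk_usage = [(0, 50), (60, 52), (120, 54), (180, 56)]
    print(lead_time_to_threshold(disk_usage, threshold=90))
```

The estimate is only as good as the samples behind it, which is the point about data reduction: throw away the low-level samples and there is nothing left to fit.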
 
It is important to provide context when discussing machine learning with AI. This helps us all understand the technology and enables the deeper discussion. The concepts and terms are new, but the industry hasn’t changed. Next up will be applying this technology to problems experienced by the industry.
