
Reducing Outlier Noise in Splunk

This blog is a continuation of “Using Density Function for Advanced Outlier Detection”. Given the unique but complementary topics of the previous blog and this one, we decided to separate them. This blog describes a single approach to dealing with excess noise in outlier detection use cases. While multiple methods of reducing noise exist, this is one that has worked (at least in my experience) on multiple projects throughout the Splunk-verse to reduce outlier noise.

Multi-Tier Approach to Reducing Noise

Adding to the plethora of existing noise-reduction techniques in the alert management space, we’ve used a multi-tiered approach to find outliers at the entity, system, and organization levels. Once implemented, we can correlate outliers at each stage to answer one of the biggest questions in outlier detection: ‘Was this timeframe a true outlier?’ In this section we will discuss the theory of reducing outlier noise, with some visual aids to explain the concept.

There are three tiers we can generally look at when investigating an outlier use case. In my opinion, these tiers can be classified as entity-level, system-level, and aggregate-level. In each of these tiers, we can use the density function, or any other method such as LocalOutlierFactor, moving averages, or quartile ranges, to find timeframes that stood out. Once the timeframes have been detected, we correlate them across tiers to determine when the outlier occurred.
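To make the entity tier concrete, here is a minimal sketch using the MLTK DensityFunction. The index, sourcetype, and field names are hypothetical, and the IsOutlier(count) output field follows MLTK’s usual naming convention for fit/apply, so verify it against your MLTK version:

    index=auth sourcetype=authentication action=failure
    | bin _time span=30m
    | stats count by _time, user
    | fit DensityFunction count by user threshold=0.01 into entity_outlier_model ``` flag roughly the rarest 1% of counts per user ```
    | where 'IsOutlier(count)'=1
    | table _time, user, count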

For clarity, the visual below shows what a three-tier approach might look like. From the ground up, we start looking for outliers at the entity level; at the second stage we look at a grouping that identifies a collection of entities. These collections of entities could be AD groups, business units, network zones, and much more. A sketch of the system-tier search follows the figure.

[Figure: Hierarchy of Multi-Tier Approach. The shape does not have to be a pyramid, but represents the general number of outliers at each tier.]
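The system tier follows the same pattern once entities are rolled up into their group. A sketch, assuming a hypothetical lookup named user_to_group that maps each user to an AD group:

    index=auth sourcetype=authentication action=failure
    | lookup user_to_group user OUTPUT group
    | bin _time span=30m
    | stats count by _time, group ``` roll entity activity up to the group level ```
    | fit DensityFunction count by group threshold=0.01 into system_outlier_model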

Combining Outliers in a Multi-Tier Approach

After determining the outlier method at each tier, our next step is to correlate and combine the outliers. It’s important in the planning phase to find a common field across all tiers; I would recommend using _time in 15- or 30-minute buckets. Our outlier detection process will end up looking similar to the visual below, where each tier runs its own search and outputs a list of outliers with _time as the common field. The split-by fields can be different at each tier, which allows us to find out which entity, as part of a system or aggregate group, was marked as an outlier at a certain time. A sketch of the combined search follows the figure.

[Figure: Multi-Tier Outlier Process. If any user, group, or count is an outlier, we assign that time bucket a score of 1.]
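One way to wire the tiers together is to normalize every search to the same 30-minute _time bucket and a 0/1 is_outlier flag, then combine them. This sketch uses append for brevity (scheduled searches writing to a summary index scale better); the model names carry over from the earlier sketches, and the three-standard-deviation rule on the aggregate tier is purely illustrative:

    index=auth | bin _time span=30m | stats count by _time, user
    | apply entity_outlier_model
    | eval is_outlier=if('IsOutlier(count)'=1, 1, 0), tier="entity"
    | fields _time, tier, is_outlier
    | append
        [ search index=auth | lookup user_to_group user OUTPUT group
          | bin _time span=30m | stats count by _time, group
          | apply system_outlier_model
          | eval is_outlier=if('IsOutlier(count)'=1, 1, 0), tier="system"
          | fields _time, tier, is_outlier ]
    | append
        [ search index=auth | bin _time span=30m | stats count by _time
          | eventstats avg(count) as avg_count, stdev(count) as stdev_count
          | eval is_outlier=if(count > avg_count + 3*stdev_count, 1, 0), tier="aggregate"
          | fields _time, tier, is_outlier ]

The aggregate tier shows the manual case: no ML command sets is_outlier for a standard-deviation or moving-average rule, so we assign the 1 ourselves with eval.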

After running the outlier detection searches, we can prioritize outliers based on a tally or ranking system. Observe the tables on the right side of the picture above: each timeframe is marked 1 or 0 depending on whether it was detected as an outlier. ML algorithms automatically assign is_outlier a value of 1; for other methods we may have to set the value manually, as in the aggregate tier of the sketch above. Let’s add up the outlier counts for each timeframe.

Timeframe              outlier_count
11-02-2022 02:00:00    3 (high priority)
11-02-2022 17:40:00    2 (mid or low priority)
01-02-2022 13:30:00    0 (not an outlier)

Total Count of Outliers

Adding up the outlier count for each timeframe across the tiers gives us an idea of where to focus. Timeframes with the maximum 3 out of 3 outliers should take precedence in our investigations over timeframes with 2 out of the possible 3.
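Continuing the sketch above, the tally and ranking reduce to a sum over the common _time bucket. These lines would be appended to the combined multi-tier search, with priority labels mirroring the table:

    | stats sum(is_outlier) as outlier_count by _time
    | eval priority=case(outlier_count=3, "high", outlier_count>=1, "mid or low", true(), "not outlier")
    | sort - outlier_count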

Conclusion

In the field, I’ve encountered many areas where we have needed to adjust thresholds and find a way to reduce or analyze the outlier results. In doing so, a multi-tier approach has worked in the following scenarios:

  • Multi-tier data is available
  • Adjusting a single outlier function (such as the density function) captures too much or too little
  • Investigating an outlier leads to correlating whether another feature or data source had outliers at the same time

This can be complex to set up; however, once set up, it’s a repeatable process that can be applied to many use cases involving outlier or anomaly detection.