At Discovered Intelligence, the holiday season is an opportunity to celebrate our employees’ contributions to a successful year by connecting with everyone over a big holiday dinner and surprising them with a fun holiday gift.
These gifts vary from year to year, but for the past few years we have put together a very tasty and warm gift bag for our employees, filled with products from independent small businesses across Ontario. The last few years have been particularly challenging for small businesses due to the many lockdowns, and shopping local is a great way to provide support.
Read on to find out more about the items in our employee holiday bag and the small businesses they were sourced from. Please don’t forget to continue supporting small businesses in 2023!
Dried Cherries
Sourced from Cherry Lane, a cherry farm in Vineland, Ontario
Since 1907, Cherry Lane has been providing tart cherries, tart cherry juice concentrate, and a variety of other fruit products to consumers. The members of the Smith family, who own and operate Cherry Lane, have always been proud to be Ontario fruit farmers.
Peanuts were introduced to the sandy soils of Norfolk County in Southwestern Ontario in 1979 by the Picard family. You’ll find the best quality confections and freshly roasted peanuts, in many flavours and varieties, at each of their five locations in Ontario.
Sourced from Wellesley Brand Apple Products, Wellesley, Ontario
Wellesley has been manufacturing quality apple products using the finest ingredients and time-tested recipes for over 75 years. All products use fresh, Ontario-grown apples and are naturally sweet, made without any added sugar or sweeteners.
Sourced from the Niagara Food Company, St. Catharines, Ontario
Niagara Food Company has been creating home-style, gourmet meals and sweet treats since 2010, sourcing local ingredients to create high-quality, delicious food and snacks.
Sourced from CFX – Chocolate Factory Experience, St David’s, Ontario
Carefully crafted using artisanal chocolate panning and molding techniques at their in-house chocolate factory, CFX provides an incredible assortment of products that can delight even the most discriminating palate.
Sourced from Alexander’s Fudge, Smithville, Ontario
Alexander’s Fudge was started in Smithville, Ontario in 2013 by Chris Alexander. Made with real butter and cream, Alexander’s Fudge is simply the best you’ve ever had!
The Moyer apple farm has been in the family since 1799. The company specializes in thoughtfully crafted caramel apples and gourmet sauces that look great and taste even better, so that you enjoy every bite.
Interested in learning more about working at Discovered Intelligence? Good news, we’re hiring! Click here for more details.
Deploying apps to forwarders using the Deployment Server is a pretty commonplace use case and is well documented in Splunk Docs. However, it is possible to take this a step further and use it to distribute apps to the staging directories of management components, such as a cluster manager or a search head cluster deployer, from where the apps can then be pushed out to clustered indexers or search heads.
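As a rough sketch of the idea, the management component itself runs as a deployment client whose repository location points at its staging directory; the hostname, port and paths below are placeholders to adapt to your own environment.
# deploymentclient.conf on the cluster manager (sketch; adjust host and port)
[deployment-client]
# Land apps delivered by the deployment server in the staging directory
# instead of $SPLUNK_HOME/etc/apps
repositoryLocation = $SPLUNK_HOME/etc/manager-apps

[target-broker:deploymentServer]
targetUri = deploymentserver.example.com:8089
On a search head cluster deployer the same idea applies with repositoryLocation set to $SPLUNK_HOME/etc/shcluster/apps; from there, the usual splunk apply cluster-bundle or splunk apply shcluster-bundle push distributes the apps as normal.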
Splunk Cloud Admins rejoice! The Splunk Cloud ACS Command Line Interface is here! The Splunk Cloud Admin Config Service (ACS) was originally released in January 2021 to provide various self-service features for Splunk Cloud Admins. It was released as an API-based service that can be used for configuring IP allow lists, configuring outbound ports, managing HEC tokens, and much more, all of which is detailed in the Splunk ACS Documentation.
To our excitement, Splunk has recently released a CLI version of ACS. The ACS CLI is much easier to use and less error-prone than the complex curl commands or Postman setup one has had to deal with to date. One big advantage we see with the ACS CLI is that it can be used in a scripted approach or within a deployment CI/CD pipeline to handle application management and index management.
We would recommend that you first refer to the ACS Compatibility Matrix to understand what features are available to the Classic and Victoria experience Splunk Cloud platforms.
ACS CLI Setup Requirements
Before you get started with the ACS CLI there are a few requirements to be aware of:
You must have the sc_admin role to be able to leverage the ACS CLI.
You must be running a Mac or Linux operating system. However, if you are a Windows user you can use the Windows Subsystem for Linux (WSL), or any Linux VM running on Windows, to install and use the ACS CLI.
The Splunk Cloud version you are interacting with must be above 8.2.2109 to use the ACS CLI. To use Application Management functions, your Splunk Cloud version must be 8.2.2112 or greater.
Please refer to the Splunk ACS CLI documentation for further information regarding the requirements and the setup process.
ACS CLI Logging
At the time of authoring this blog, logging and auditing of interactions through the Splunk Cloud ACS is not readily available to customers. However, the ACS CLI creates a local log on the system where it is being used. It is recommended that any administrators given access to the ACS CLI have the log file listed below collected and forwarded to their Splunk Cloud stack. This log file can be collected using the Splunk Universal Forwarder, or another mechanism, to create an audit trail of activities.
Linux: $HOME/.acs/logs/acs.log
Mac: $HOME/Library/Logs/acs/acs.log
The acs.log allows an administrator to understand which operations were run, the request IDs, status codes and much more. We will keep an eye out for Splunk adding to the logging and auditing functionality, not just in the ACS CLI but in ACS as a whole, and will provide a future blog post on the topic when available.
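For illustration, a minimal Universal Forwarder inputs.conf monitor stanza for the Linux log path might look like the sketch below; the index and sourcetype names are placeholders of our own choosing.
# inputs.conf on the admin workstation's Universal Forwarder (sketch)
[monitor:///home/*/.acs/logs/acs.log]
index = acs_audit
sourcetype = acs:cli:log
disabled = 0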
Interacting With The ACS CLI
Below are examples of common interactions an administrator might have with Splunk Cloud, now done by leveraging the Splunk Cloud ACS CLI. There are many more self-service features supported by the ACS CLI; details of the supported features and CLI operations are available in the Splunk Cloud ACS CLI documentation.
Application Management
One of the most exciting features of the ACS CLI is the ability to control all aspects of application management. This means that, using the ACS CLI, you can install both private applications and Splunkbase applications.
The command is easy to understand and straightforward. For both private and Splunkbase applications, it supports commands to install, uninstall and describe applications within your environment, as well as a list command to return a complete list of all installed applications with their configurations. Specific to Splunkbase applications, there is an update command which allows you to, you guessed it, update an application to the latest published version.
For both private and Splunkbase apps, running a command will prompt you to enter your splunk.com credentials. You can pass the --username and --password parameters along with the command to avoid being prompted for credentials. For private apps, these credentials will be used to authenticate to AppInspect for application vetting.
Application Management: Installing a Private App
Let’s look at how we use the ACS CLI to install a private application. The following command will install a private app named company_test_app:
acs apps install private --acs-legal-ack Y --app-package /tmp/company_test_app.tgz
When a private app is installed using the ACS CLI, it is automatically submitted to AppInspect for vetting. A successful execution of the command results in the following response, which you will note includes the AppInspect summary:
Submitted app for inspection (requestId='*******-****-****-****-************')
Waiting for inspection to finish...
processing..
success
Vetting completed, summary:
{
  "error": 0,
  "failure": 0,
  "skipped": 0,
  "manual_check": 0,
  "not_applicable": 56,
  "warning": 1,
  "success": 161
}
Vetting successful
Installing the app...
{
  "appID": "company_test_app",
  "label": "Company Test App",
  "name": "company_test_app",
  "status": "installed",
  "version": "1.0.0"
}
Application Management: Installing a Splunkbase Application
Let’s now look at an example of installing a Splunkbase application by running a command to install the Config Quest application:
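The original command is not reproduced here; based on the pattern used elsewhere in the ACS CLI, a Splunkbase install takes roughly the following form, where the Splunkbase ID and licensing URL are placeholders you would look up for Config Quest (check the ACS CLI documentation for the exact flag names):
acs apps install --splunkbase-id <splunkbase-id> --acs-licensing-ack <licensing-url>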
The licensing URL passed as a parameter in the command above can be found in the application details on Splunkbase. Additionally, the licensing URL can be retrieved from the Splunkbase API by running a curl command.
Index management using the ACS CLI supports a wide range of functionality. The supported commands allow you to create, update, delete and describe an index within your environment, as well as a list command to return a list of all existing indexes with their configurations.
Let’s now look at how we run one of these commands, using a command that creates a metrics index with a 90-day searchable retention period, as shown below. Note that ACS supports creating either an event or a metrics index; however, it does not yet support configuring DDAA or DDSS.
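A command of roughly the following shape creates such an index; the index name is a placeholder and the flag names should be verified against the current ACS CLI documentation:
acs indexes create --name metrics_90d --data-type metric --searchable-days 90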
Managing HTTP Event Collector (HEC) tokens just got really easy. The ACS CLI supports commands to create, update, delete and describe a HEC token within your environment, as well as a list command to return a list of all existing HEC tokens with their configurations.
Let’s now look at how we run one of these commands, using a command that quickly and easily creates a HEC token in Splunk Cloud:
acs hec-token create --name test_token --default-index main --default-source-type test
A successful execution of the command returns the token value in the JSON response.
Planning a sequel to the blog – Moving bits around: Deploying Splunk Apps with Github Actions – led me to an interesting experiment. What if we could manage and automate the deployment server the same way, without having to log on to the server at all? After all, the deployment server is just a bunch of app directories and a serverclass.conf file.
This blog is a continuation of the blog “Using Density Function for Advanced Outlier Detection“. Given the unique but complementary topics of the previous blog and this one, we decided to separate them. This blog describes a single approach to dealing with excess noise in outlier detection use cases. While multiple methods of reducing noise exist, this is one that has worked (at least in my experience) on multiple projects throughout the Splunk-verse to reduce outlier noise.
Multi-Tier Approach to Reducing Noise
Adding to the plethora of existing noise reduction techniques in the alert management space, we’ve used a multi-tiered approach to find outliers at an entity, system and organization level. Once implemented, we can correlate outliers at each stage to answer one of the biggest questions in outlier detection: ‘Was this timeframe a true outlier?’. In this section we will discuss the theory of reducing outliers, with some visual aids to explain the concept.
There are three tiers we can generally look at when investigating an outlier use case. These tiers, in my opinion, can be classified as entity level, system level and aggregate level. In each of these tiers, we can utilize the density function, or other methods such as LocalOutlierFactor, moving averages and quartile ranges, to find timeframes that stood out. Once the timeframes have been detected, we correlate them across tiers to determine when the outlier occurred.
For clarity, the visual below shows what a three-tier approach might look like. From the ground up, we start looking at outliers at the entity level; at the second stage we look at a group that identifies a collection of entities. These collections of entities could be AD groups, business units, network zones and much more.
Combining Outliers in a Multi-Tier Approach
After determining the outlier method at each tier, our next step is to correlate and combine the outliers. It’s important in the planning phase to find a common field across all tiers. I would recommend using _time in 15 or 30 minute buckets as the common field. Our outlier detection process will end up looking similar to the visual below, where each level has its own unique search running and outputs a list of outliers based on _time as the common field. The split-by fields can be different at each tier; this will allow us to find out which entity, as part of a system or aggregate group, was marked as an outlier at a certain time.
After running the outlier detection searches, we can prioritize outliers based on a tally or ranking system. Observe the tables on the right side of the picture above. Each timeframe is marked 1 or 0 depending on whether it was detected as an outlier. ML algorithms automatically assign is_outlier a value of 1; for other methods we may have to set the value of 1 manually. Let’s add up the outlier counts for each timeframe.
Timeframe              outlier_count
11-02-2022 02:00:00    3 (high priority)
11-02-2022 17:40:00    2 (mid or low priority)
01-02-2022 13:30:00    0 (not outlier)
Total Count of Outliers
Adding up the outlier count for each timeframe across all levels gives us an idea of where to focus. Timeframes with the maximum 3 out of 3 outliers should take precedence in our investigations over timeframes with 2 out of a possible 3.
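As a sketch of this tallying step, assume each tier’s outlier search writes its results to its own lookup with an epoch _time, an is_outlier flag and a tier label; the lookup and field names below are illustrative, not from the original searches. The correlation search could then look something like this:
| inputlookup entity_tier_outliers.csv
| append [| inputlookup system_tier_outliers.csv]
| append [| inputlookup aggregate_tier_outliers.csv]
| bin _time span=30m
| stats sum(is_outlier) as outlier_count values(tier) as tiers_flagged by _time
| sort - outlier_count
Timeframes with an outlier_count of 3 bubble to the top for investigation, matching the priority order in the table above.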
Conclusion
In the field, I’ve encountered many areas where we have needed to adjust the thresholds and also find a way to reduce or analyze the outlier results. In doing so, a multi-tier approach has worked in some of the following specific scenarios:
Multi-tier data is available
Adjusting a single outlier function (such as the density function) captures too much or too little
Investigating an outlier requires correlating whether another feature or data source had outliers at a specific time
This can be complex to set up; however, once set up, it is a repeatable process that can be applied to many use cases that rely on outlier or anomaly detection.
In our previous blog we covered some common methods of finding outliers, starting with fixed thresholds and moving on to dynamic thresholds using averages and standard deviation. This forms the basis of finding data points that deviate from their norm. Standard methods of outlier detection have their pros and cons: on one hand they are very easy to implement and it is easy to explain why something was flagged as an outlier; on the other hand, they are too simple.
Key Concepts In this Blog
The terminology we will use will not differ significantly from our previous blog except with the introduction of the following terms:
Density Function Command: The density function command is housed within the MLTK application. Most mathematicians think about Z-scores or the normal distribution when a density function is mentioned. However, within Splunk’s MLTK command there are 4 distributions that attempt to map the distribution of the data: Normal (Z-score), Exponential, Gaussian Kernel Density and Beta. The command will attempt to find the best distribution fit for the data given the split-by field.
More can be read about this command in the Splunk blog post here.
Ensemble Methods: Ensemble methods are a technique within the machine learning space that combines multiple algorithms to create an optimal model. My favourite blog channel, Towards Data Science, explains it here.
Using MLTK Algorithms for Outlier Detection
Splunk’s Machine Learning Toolkit hosts 3 algorithms for anomaly detection. Whilst they each provide value, some may be better suited to specific scenarios. Most, if not all, of the anomaly detection algorithms can be used on numerical and categorical fields. To illustrate their features, we picked the DensityFunction to enumerate and deep-dive into here.
DensityFunction
The density function is a popular and relatively simple method of finding outliers, assuming no changes to the default parameters. Underneath the hood, though, it contains many configuration options for users to fine-tune their outlier detection. Let’s cover some of the parameters I’ve come across tuning in the field:
parameter: sample (default: false)
description: When set to true, it takes a sample from the ‘inlier’ region of the distribution.
when to adjust: When you have a large volume of data points and need to remove initial outliers from your model. This is beneficial when you have more than a million events, as the density function has a soft limit on events.

parameter: full_sample (default: false)
description: When set to true, it takes a sample from both the ‘inlier’ and ‘outlier’ regions of the distribution.
when to adjust: When you have a large volume of data points and would like outlier samples considered in your outlier detection model. This is beneficial when you have more than a million events, as the density function has a soft limit on events.

parameter: dist (default: auto)
description: Specifies which distribution to fit: Normal, Exponential, Gaussian Kernel Density, Beta, or auto.
when to adjust: Our suggestion is to keep this as ‘auto’. It allows MLTK to determine the best distribution based on the values of each group. Set a specific distribution only when the data is expected to follow a single distribution.

parameter: random_state (default: N/A)
description: Seed for the random number generator, so that runs are reproducible.
when to adjust: Use this parameter when developing your model on a fixed data set. Using the same random_state with all other configurations unchanged will return the same result. This is helpful when two or more users are working off the same data.

parameter: partial_fit (default: false)
description: Incrementally ‘train’ and update the model based on new data. This is useful when you have a large volume of data and are unable to train the model in a single fit.
when to adjust: partial_fit is helpful for training your function over time when running a single fit command would reach an MLTK limit, such as over 7 days, 1 month and so on.

parameter: threshold (default: 0.01)
description: The fraction of the area under the density function that is treated as the outlier region.
when to adjust: This parameter can be adjusted to ease or restrict the criteria for detecting outliers. The smaller the value, the fewer outliers are detected (more restrictive).

List of commonly used parameters from the DensityFunction
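To make a few of these parameters concrete, here is an illustrative fit built on the call_center example used later in this post; it pins the distribution, tightens the threshold and fixes the random seed. The parameter values and model name are arbitrary choices for demonstration only.
... | fit DensityFunction count dist=gaussian_kde threshold=0.005 random_state=42 into densityfunction_paramdemo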
In addition to the list of useful parameters, there is a ‘by’ clause within the DensityFunction. It gives the user the ability to group data based on a categorical field (grouping field). This grouping field can be location, intranet zone, host type, asset type, user type and much more. Let’s start building some searches that utilize the density function:
Base Query
The base query to understand the DensityFunction a bit better is shown below. We are using the out-of-the-box lookup to run our query. The file ‘call_center.csv’ contains call record counts over a two-month period from September to November 2017. Next, we run two eval functions to extract HourOfDay and DayOfWeek from the data.
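The original base query is not reproduced above, so here is a reconstruction from the description; it assumes call_center.csv already carries a per-interval count field and that its _time column parses with the format string shown, both of which you may need to adjust.
| inputlookup call_center.csv
| eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S")
| eval HourOfDay=strftime(_time, "%H")
| eval DayOfWeek=strftime(_time, "%A")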
Using The Density Function - Univariate Outlier
Univariate in this context means that we will analyze outliers based on a single field of interest. For this section we picked ‘DayOfWeek’ as the field and appended the new query to our base query. This search will run the density function on the total count of activity based on DayOfWeek. It will give us a list of outliers, the boundary ranges the outliers fall in, and the probability density of each outlier. The two columns to focus on are “IsOutlier(count)” and “ProbabilityDensity(count)”. The lower the probability density, the higher the chance that the count value for that DayOfWeek is an outlier.
... | fit DensityFunction count by DayOfWeek into densityfunction_testmodel
Once we run the fit command, we can see 3 new columns created, named “IsOutlier(count)”, “BoundaryRanges” and “ProbabilityDensity(count)”.
Using the density function is as simple as using the fit command provided by MLTK. To use the DensityFunction algorithm effectively, I recommend testing and verifying outliers with different parameter configurations. One such parameter that you can modify is ‘threshold’. To test its impact on the results, I created the table below to analyze the outlier count for different values.
Threshold         Outlier count (| stats count(eval(is_outlier=1)) as outlier_count)
0.01 (default)    116
0.05              590
0.09              1027
0.005             59
0.001             11
Table displaying outlier count on varied thresholds
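For reference, each row of the table can be reproduced with a search of the following shape; it assumes the base query above, and the rename simply aligns the field produced by fit with the is_outlier name used in the table header.
... | fit DensityFunction count by DayOfWeek threshold=0.05
| rename "IsOutlier(count)" as is_outlier
| stats count(eval(is_outlier=1)) as outlier_count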
After creating the table, it becomes clear that as the threshold value decreases, the DensityFunction is more restrictive in finding outliers; as we increase the threshold value, the number of outliers increases because the criteria become less restrictive.
Visualizing Changing Parameters with DensityFunction
To visualize the impact of the different threshold levels, we add the ‘show_density’ and ‘show_options’ parameters to the density function as shown below:
... | fit DensityFunction count by DayOfWeek into densityfunction_testmodel threshold=0.005 show_density=true show_options="feature_variables, split_by, params"
After that, we use the MLTK ‘distribution plot’ visualization to look at the distribution of the data.
In the above plot, each line represents a day of the week, Monday through Sunday. However, to view this with more clarity, let’s filter the visual to a single day only, as shown in the picture below.
... | fit DensityFunction count by DayOfWeek into densityfunction_testmodel threshold=0.005 show_density=true show_options="feature_variables, split_by, params"
| search DayOfWeek="Monday"
In the above distribution plot, we can see that on Monday, call volumes of around 900, 1100, 1250 and 1300 were detected as outliers. It might appear that the outliers exist on the right side of the graph because they are extremely high values compared to the rest of the call volumes. However, they are detected as outliers due to the low probability of these high values occurring.
Looking at the same distribution plot for Monday, the range of values between 300-500 has a low count as well. As we increase the threshold to 0.05, notice how more values with low counts are selected. This is due to the low probability of these values occurring in the dataset and the relaxed threshold. This concept ties back directly to the table we built in the ‘Using The Density Function - Univariate Outlier‘ section.
... | fit DensityFunction count by DayOfWeek into densityfunction_testmodel threshold=0.05 show_density=true show_options="feature_variables, split_by, params"
| search DayOfWeek="Monday"
Conclusion
The DensityFunction is a great and easy-to-use algorithm available with MLTK for finding outliers. It’s very well documented in the Splunk documentation and commonly used in the community. Some of the scenarios I’ve found it to be helpful in are listed below:
Quick and easy use of ML to find outliers
Use cases where anomalies in aggregate counts are used to find outliers
Finding unexpected activity counts at times and hours when activity is not expected
However, one drawback of using the DensityFunction is that it has no historic or outside context. It will find outliers based on counts and their deviations. This can result in the same entities being selected as outliers in multiple iterations over time. One way to tackle this is by adding context and using scores in Splunk. We talk about this method in our blog https://discoveredintelligence.ca/reducing-outlier-noise-in-splunk/.