Posts

Introducing the Cribl Search App for Splunk

Search Your Cribl Search Data Directly From Splunk: Stop Moving Logs. Stop Switching Tabs. Start Searching Data

We’re excited to announce the Cribl Search App for Splunk, an integration that lets you query your Cribl Search data—directly from the Splunk search interface.

Whether you’re hunting for threats in long-term archives or reporting on a high-volume API that isn’t worth the cost of indexing, this app brings the results back into Splunk as standard events. No more switching tabs, and no need to “rehydrate” data from Cribl before you can use it in Splunk searches.

The Cribl Search App for Splunk introduces a custom generating command, | criblsearch, to your Splunk environment. It sends your query to Cribl Search and streams the results back into your Splunk search pipeline.

Once the data hits Splunk, you can treat it just like any other event in SPL: pipe it into stats, eval, or outputlookup, use it in your favourite dashboards, or write it to an index with collect.

Core Features

  • Multi-Endpoint Control: Search across multiple Cribl Search environments.
  • Enterprise Auth: Authenticates to Cribl Cloud using OAuth and securely stores credentials using Splunk’s secure credential storage.
  • Broad Compatibility: Built to meet Splunk Cloud app vetting standards for seamless installation in both on-prem and cloud Splunk environments.

Cribl Search: A Primer

Cribl Search lets you search data where it lives. It can search data from many sources, including Cribl Lake, Cribl Edge, Amazon Security Lake, Amazon S3, Azure Blob Storage, Azure Data Explorer, Google Cloud Storage, Elasticsearch, OpenSearch, Prometheus, Snowflake, ClickHouse, and quite a few APIs (AWS, Azure, GCP, Google Workspace, Microsoft Graph, Okta, Tailscale, Zoom, plus a generic HTTP API data source provider that lets you search ones not already covered).

The benefits of Cribl Search are:

  • Slash Costs: Leave “low-value” logs in cheap object storage (S3). Search them only when you need them.
  • Instant Visibility: Access logs the second they hit your storage. No waiting for indexing delays.
  • Zero Infrastructure Bloat: Scale your search capabilities without adding more Splunk Indexers.

Cribl Search documentation can be found here.

Example Use Cases

1. Incident Response: Finding the initial compromise from long-term storage

The Challenge: An alert triggers today, but the compromise started 45 days ago. Data in Splunk is set to age out at 30 days, so those logs were moved to cold storage to save on Splunk storage costs.
The Solution: Pivot instantly to your S3 archive using Cribl Search directly in Splunk:

SPL
| criblsearch query="dataset:'firewall_archive' latest=-30d src_ip=='192.0.2.50' dest_ip=='27.133.154.218'"
| stats count by action, dest_port
| where action!="Blocked"

Impact: Get your full forensic timeline in minutes rather than hours of manual data recovery, with no need to go into Cribl and set up a rehydration job to make these events available.

2. High-Volume, Low-Value Logs

The Challenge: Your API generates 5TB of “200 OK” logs daily. Indexing them is a waste of money, but you need them for monthly compliance reports.
The Solution: Run the audit search across your data lake and bring only the summary data needed for the report back to Splunk:

SPL
| criblsearch query="dataset:'api_logs' | where response_time > 5000 | summarize avg(response_time) AS avg_latency by endpoint" 
| table avg_latency endpoint
| outputlookup monthly_api_report.csv

Impact: 100% visibility for 0% additional indexing cost.

3. Cross-Cloud Correlation (The “Power Join”)

The Challenge: You suspect a credential spray attack hitting both AWS and Azure, but the logs live in Cribl Search datasets.
The Solution: Use Splunk to join results from the two Cribl Search datasets:

SPL
| criblsearch query="dataset:'aws_cloudtrail' event=='ConsoleLogin'"
| rename sourceIPAddress AS src_ip, userIdentity.principalId AS user
| append [ 
    | criblsearch query="dataset:'azure_audit' event=='SignInActivity'"
    | rename ipAddress AS src_ip, userPrincipalName AS user
  ]
| stats count values(user) by src_ip
| where count > 5

Impact: Multi-cloud threat hunting from a single search bar.

Get Started

  1. Install: Download the Cribl Search App for Splunk from GitHub, then install the app on your Splunk Search Head or Search Head Cluster.
  2. Connect: Enter your Cribl Cloud credentials on the configuration page.
  3. Search: Start your first query with | criblsearch query="..." and see your data lake come to life.

Are you ready to unlock your data?

Download the App on GitHub

View the Documentation

Migrating Syslog to Cribl Stream: The Art of the “Zero Change” Migration

We’ve all been there. You’re ready to modernize your observability pipeline. You’ve got the green light to move from legacy syslog servers (like syslog-ng) to Cribl Stream. It sounds like a straightforward lift-and-shift, right? But then you flip the switch, and suddenly your downstream SIEM is screaming about unparsed events, your timestamps are drifting, and your load balancers are pinning traffic to a single node.

Read more

Cribl and GitOps: From Development to Production

If you’re running Cribl Stream in a distributed environment, you already know the Leader Node is critical, and Git is non-negotiable for handling config bundling and version control. You’ve probably also discovered how painful inconsistencies between development and production can be; ultimately, they can lead to unexpected outages, security vulnerabilities, or compliance violations. To avoid this, we like to implement a full GitOps workflow. This way, you apply disciplined CI/CD methods to your configurations, enforcing change control through standard Pull Requests, ensuring everything is auditable, and keeping production rock-solid.

The Foundation: Git Integration in Cribl Stream

To implement any truly sophisticated change management within a distributed Cribl environment, Git integration is the essential building block. Since Cribl’s architecture involves a Leader Node coordinating multiple Worker Groups, having centralized version control isn’t just a best practice – it’s mandatory. In a distributed deployment, the Leader Node simply won’t start without Git installed.

Why Git is Non-Negotiable for Cribl Leaders

Git provides several immediate, built-in benefits essential for managing your dynamic data pipelines:

  • Audit Trails: Every configuration change is recorded in Git, creating a history of who changed what and when, satisfying crucial security and compliance needs.
  • Version Comparison and Reversion: It’s an easy way to compare different configuration versions, simplifying the process of identifying and isolating problematic changes, and enabling rapid rollback when necessary.
  • Configuration Bundling: On a fundamental level, the Cribl Leader uses Git to bundle the finalized configurations, which are then distributed to the Workers in the field.

Beyond Local Commits: Leveraging Remote Git

While a basic deployment relies on local commits for managing configurations, we find that a true enterprise-grade strategy needs to utilize remote Git integration, using tools like GitHub or Bitbucket. This remote capability provides a robust backup and disaster recovery solution. The key advantage here is redundancy: since the Leader Node holds the main copy of all configurations, its failure could be catastrophic. By setting up the Cribl Leader to push its configurations on a schedule to the remote repository, we ensure an off-instance backup. That way, if the primary Leader Node ever goes down, we can spin up and restore a new Leader directly from the last known-good configuration in remote Git, drastically reducing recovery time.

Implementing Full GitOps: CI/CD for Data Pipelines

GitOps elevates Git beyond a backup tool; we use it as the single source of truth for our entire data pipeline ecosystem. We believe this model is ideal for organizations that need stringent control, especially those handling complex regulatory requirements or massive volumes of mission-critical data. The core concept is pretty straightforward: it means rigorously separating the development and production environments and strictly governing the flow of all changes between them using standard Git branches and pull requests.

The Two-Environment GitOps Model

In this approach, you maintain two separate Cribl environments, each tied to a dedicated Git branch on the remote repository:

  1. Development Environment: Connected to the dev branch. All initial configuration work – such as building new data Sources, Destinations, or Pipelines – is done here.
  2. Production Environment: Connected to the prod branch. Crucially, the Production Leader is set to a read-only mode. This hard constraint prevents manual, unauthorized changes directly in production, forcing all changes to follow the GitOps pipeline.

The Standard GitOps Workflow

The flow for deploying a new configuration involves a structured, multi-step process:

  1. Development and Commit: Create or modify a configuration (e.g., a new Pipeline) on the Dev Leader, then use the UI to deploy the changes to the Workers and push them to the remote Git repository’s dev branch.
  2. Pull Request and Review: Create a Pull Request (PR) to merge the changes from the dev branch into the prod branch. This triggers a review by the Cribl Administrator or a designated approver.
  3. Merge and Automation: Once reviewed and approved, the PR is merged, updating the prod branch with the verified configuration. This merge action does not automatically deploy the configuration to the Production Leader.
  4. External Sync Trigger: To apply the changes, an external CI/CD tool (such as Jenkins, GitHub Actions, or a homegrown script) must trigger the Production Leader. You can do this by hitting the Leader’s REST API endpoint /api/v1/version/sync.
  5. Deployment to Workers: Once the Production Leader has the new configuration, it automatically distributes the update to its connected Workers.
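To make the external sync trigger in step 4 concrete, here is a minimal sketch of what such a trigger could look like as a GitHub Actions workflow. This is an assumption-laden illustration: the secret names and Leader URL are placeholders, and the HTTP method and auth scheme should be verified against the Cribl API documentation for your version.

```yaml
# Hypothetical CI workflow -- secret names and the Leader URL are
# placeholders; verify the HTTP method and auth scheme against the
# Cribl API documentation for your version.
name: Sync Cribl Production Leader
on:
  push:
    branches: [prod]

jobs:
  sync-leader:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger config sync via the Leader REST API
        run: |
          curl -sf -X POST \
            -H "Authorization: Bearer ${{ secrets.CRIBL_API_TOKEN }}" \
            "${{ secrets.CRIBL_LEADER_URL }}/api/v1/version/sync"
```

The same call could just as easily live in a Jenkins stage or a scheduled script; the only essential piece is that something outside Cribl hits the sync endpoint after the merge to prod.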

Handling Environment-Specific Configurations

A key challenge in this two-environment model is that, by default, all development configurations are pushed to production. This isn’t always desirable, and sometimes you need granular control. This is where using environment tags comes into play to manage state:

  • C.LogStreamEnv Variable: Cribl automatically manages a C.LogStreamEnv variable that identifies whether an instance is DEV or PRD (Production).
  • Selective Configuration: The environment tag can be used in JavaScript expressions for Sources and Destinations. For example, a Destination defined for production will be enabled in the Prod environment but will appear disabled (“greyed out”) in the Dev environment, offering necessary flexibility while maintaining the core GitOps flow.
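As a sketch of how such an environment-aware expression might look: the C.LogStreamEnv variable name comes from Cribl itself, but the surrounding harness below is only there to make the expression runnable outside a pipeline.

```javascript
// Simulated Cribl global context -- inside Cribl, C.LogStreamEnv is
// provided automatically and would never be defined by hand like this.
const C = { LogStreamEnv: "PRD" };

// Example "enable" expression for a production-only Destination:
// true when running in the Production environment, false in Dev.
const destinationEnabled = C.LogStreamEnv === "PRD";

console.log(destinationEnabled); // true when LogStreamEnv is "PRD"
```

Flipping LogStreamEnv to "DEV" makes the same expression evaluate to false, which is how the Destination ends up greyed out in the Dev environment.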

Use Case: Updating Lookup Files in GitOps

One interesting use case we have come across since enabling GitOps is updating a lookup in Cribl via Git. While Cribl provides REST API endpoints for programmatically updating lookups, one customer wanted to use their existing CI/CD process to give their users a self-service capability for updating the lookup file. The following steps detail the update flow:

  1. User Update: The user (or an automated script) updates the Lookup File directly within the remote Git repository’s dev branch.
  2. Pull Request and Review: Create a Pull Request (PR) to merge the changes from the dev branch into the prod branch. This triggers a review by the Cribl Administrator or a designated approver.
  3. Merge and Automation: Once reviewed and approved, the PR is merged, updating the prod branch with the verified configuration. This merge action does not automatically deploy the configuration to the Production Leader.
  4. External Sync Trigger: To apply the changes, an external CI/CD tool (such as Jenkins, GitHub Actions, or a homegrown script) must trigger the Production Leader. You can do this by hitting the Leader’s REST API endpoint /api/v1/version/sync.
  5. Update the Dev Leader: Since the lookup update happened directly on the dev branch, the Dev Leader is not aware of the change, so we need to do a git pull on the Dev Leader to keep it in sync with the branch. This, too, can be part of the external trigger automation.

Final Thoughts

Transitioning to a GitOps workflow for Cribl Stream elevates how we manage our data pipelines, moving us away from manual, error-prone changes toward a scalable, auditable, and secure CI/CD process. By embracing Git as the control plane for configuration, we gain the confidence that every single deployment is consistent, every change is traceable, and the production environment is protected by a strong, automated defense against unauthorized modifications. This is more than just an operational improvement; it’s a critical step in building a truly resilient and compliant data observability platform.


Looking to expedite your success with Cribl? View our Cribl Professional Service offerings.

Discovered Intelligence Inc., 2025. Unauthorized use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.

Using Cribl Search to Monitor Instances in Google Cloud Platform (GCP)

One recurring challenge in managing cloud environments is the tendency for lab and development instances to remain active long after they’re needed. While it might seem like a small oversight, the impact can be significant. These idle instances rack up unnecessary costs, drain valuable resources, and open the door to security vulnerabilities. Configuring effective monitoring to surface these running instances is a good way to address the problem.

Read more

Beyond Smart: When ‘Always On’ Mode is the Best Choice for Cribl Persistent Queues

If your Cribl environment was set up a few years ago, it might be time to revisit some of your settings—particularly the Persistent Queue (PQ) settings on your source inputs. Recently, while troubleshooting an issue, I discovered that the PQ settings were the root cause of the problem. I wanted to share my findings in case they help you optimize your Cribl setup.

Read more

Cribl Stream: Things I wish I knew before diving in

If you are like me when I started with Cribl, you have plenty of Splunk knowledge but little to no Cribl experience. I had yet to take the training, had no JavaScript experience, and had only a basic understanding of Cribl, but I didn’t let that stop me and just dove in. Then I immediately struggled because of my lack of knowledge and spent countless hours Googling and asking questions. This post lists the information I wish I had possessed then, and will hopefully make your first Cribl experience easier than mine.

Cribl Quick Reference Guide

If I could have only one item on my wish list, it would be knowing about the Cribl Quick Reference Guide. This guide covers basic Stream concepts, performance tips, and built-in and commonly used functions.

Creating that first ingestion, I experienced many “how do I do this?” moments and searched for hours for answers to questions such as “How do I create a filter expression?” Generally, filters are JavaScript expressions essential to Event Breakers, Routes, and Pipelines. I was lost unless the filter was as simple as field == 'value'. I didn’t know how to configure a filter to evaluate “starts with,” “ends with,” or “contains.” This knowledge was available in the Cribl Quick Reference Guide’s “Useful JS methods” section, which documents the most popular string, number, and text JavaScript methods.
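Those three cases map directly onto standard JavaScript string methods. A small sketch, using hypothetical field values purely for illustration:

```javascript
// Hypothetical event fields, invented for this example.
const source = "/var/log/nginx/access.log";
const sourcetype = "nginx:access";

// "starts with" -- e.g. match anything under /var/log/
const startsWithFilter = source.startsWith("/var/log/");

// "ends with" -- e.g. match only .log files
const endsWithFilter = source.endsWith(".log");

// "contains" -- e.g. match any nginx sourcetype
const containsFilter = sourcetype.includes("nginx");

console.log(startsWithFilter, endsWithFilter, containsFilter); // true true true
```

In Cribl, you would paste just the expression (e.g. source.startsWith('/var/log/')) into the filter field; the variable declarations here only exist to make the sketch runnable.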

Common JavaScript Operators

  • &&: Logical AND
  • ||: Logical OR
  • !: Logical NOT
  • ==: Equal. Returns true if both values are equal; the operands can be of different types.
  • ===: Strict equal. Returns true if both values are equal and of the same type.
  • !=: Not equal. Returns true if the operands are not equal.
  • !==: Strict not equal. Returns true if the operands are of the same type but not equal, or are of different types.
  • >: Greater than. Returns true if the left operand is greater than the right operand.
  • >=: Greater than or equal. Returns true if the left operand is greater than or equal to the right operand.
  • <: Less than. Returns true if the left operand is less than the right operand.
  • <=: Less than or equal. Returns true if the left operand is less than or equal to the right operand.
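The equal/strict-equal distinction above trips up a lot of newcomers, so here is a quick runnable illustration of the difference:

```javascript
// Loose equality (==) coerces types before comparing;
// strict equality (===) does not.
const loose  = (5 == "5");   // the string "5" is coerced to the number 5
const strict = (5 === "5");  // number vs. string: different types

console.log(loose);   // true
console.log(strict);  // false

// The negated forms behave the same way:
console.log(5 != "5");   // false
console.log(5 !== "5");  // true
```

In filter expressions, preferring === and !== avoids surprises from implicit type coercion.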

Regex

Cribl uses a different flavour of regex: Cribl uses ECMAScript, while Splunk uses PCRE2. These are similar, but there are differences. Before I understood this, I spent many hours frustrated that my regex would work in Regex101 but fail in my pipeline.
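One concrete flavour difference: PCRE accepts the (?P&lt;name&gt;...) named-group syntax, while ECMAScript only accepts (?&lt;name&gt;...), so a pattern copied from a PCRE-mode tester can throw outright in an ECMAScript engine. A small sketch with a made-up log line:

```javascript
// ECMAScript named group: works.
const ecma = "src=10.0.0.1".match(/src=(?<ip>[\d.]+)/);
console.log(ecma.groups.ip); // "10.0.0.1"

// PCRE-style named group: throws a SyntaxError in ECMAScript engines.
let pcreFails = false;
try {
  new RegExp("src=(?P<ip>[\\d.]+)");
} catch (e) {
  pcreFails = e instanceof SyntaxError;
}
console.log(pcreFails); // true
```

When testing patterns in Regex101, selecting the ECMAScript/JavaScript flavour up front avoids this class of surprise.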

Strptime

Cribl’s strptime is almost identical to the version that Splunk uses, but there are a few differences. Most of my problems involved milliseconds: Cribl uses %L, while Splunk uses %3Q or %3N. Consult D3JS.org for more details on the strptime formatters.

JSON.parse(_raw)

When the Parser function in a pipeline does not parse your JSON event, it may be because the JSON event is a string and not an object. Use an Eval function with the Name set to _raw and the Value Expression set to JSON.parse(_raw), which converts the JSON string to an object. A side benefit of JSON.parse(_raw) is that it shrinks the event’s size, so I generally include it in all my JSON pipelines.

[Image: JSON parse example]
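The string-versus-object distinction, and the size saving, can be seen in a few lines of plain JavaScript (the sample event below is invented for illustration):

```javascript
// A JSON event arriving as a *string*, pretty-printed with whitespace,
// as it often does from upstream senders -- hypothetical sample data.
const _rawString = '{\n  "status": 200,\n  "path": "/health"\n}';
console.log(typeof _rawString); // "string" -- field access would fail here

// JSON.parse converts it to an object, which is what the Eval
// function's Value Expression JSON.parse(_raw) does in the pipeline.
const _raw = JSON.parse(_rawString);
console.log(_raw.status); // 200

// Re-serializing drops the whitespace, which is why parsing
// can shrink the event's size.
console.log(JSON.stringify(_raw).length < _rawString.length); // true
```

Inside Cribl you only supply the JSON.parse(_raw) expression; the rest of this sketch exists to make the behaviour visible outside a pipeline.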

Internal Fields

All Cribl source events include internal fields, which start with a double underscore and contain information Cribl maintains about the event. Cribl does not include internal fields when routing an event to a destination; for this reason, internal fields are ideal for temporary fields, since you do not have to exclude them from the serialization of _raw. To show internal fields in the Capture window, click the … (Advanced Settings) menu and toggle Show Internal Fields to “On”.

[Image: Cribl source internal fields]

Event Breaker Filters for REST Collector or Amazon S3  

Frequently, expressions such as sourcetype=='aws:cloudwatchlogs:vpcflow' are used in an Event Breaker filter, but sourcetype cannot be used in an Event Breaker for a REST Collector or an Amazon S3 Source. This is because the sourcetype field is set in the input’s Fields/Metadata section, and the Event Breaker is processed before the Fields/Metadata section.

For a REST Collector, use the internal field __collectible.collectorId=='<rest collector id>' in your filter expression; the REST Collector creates this field on execution.
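A tiny sketch of how that filter evaluates, using a made-up collector id purely for illustration (in Cribl, __collectible is set by the collector itself):

```javascript
// Hypothetical internal field as a REST Collector would set it;
// the id "my-rest-collector" is invented for this sketch.
const __collectible = { collectorId: "my-rest-collector" };

// Event Breaker filter expression: match only events produced
// by that specific collector.
const matches = __collectible.collectorId === "my-rest-collector";
console.log(matches); // true
```

In the Event Breaker ruleset, only the comparison expression itself goes into the filter field.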

[Image: Amazon S3 source]

For further information, refer to the Cribl Docs – Event Processing Order.

Dropping Null fields

One of Cribl Stream’s most valuable functions is the ability to effortlessly drop fields that contain null values. Within the Parser function, you can populate the “Fields Filter Expression” with expressions like value !== null.

Some example expressions are:

  • value !== null: Drop any null field
  • value !== null && value !== 'N/A': Drop any field that is null or equals 'N/A'
[Image: dropping null fields]
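Conceptually, the Fields Filter Expression is evaluated once per field, with value bound to that field’s value, and only fields where the expression is true are kept. A runnable sketch of that behaviour, with invented sample fields:

```javascript
// Hypothetical parsed fields for illustration.
const fields = { status: 200, user: null, region: "N/A" };

// The "keep" predicate, mirroring the Fields Filter Expression
// value !== null && value !== 'N/A'
const keep = (value) => value !== null && value !== "N/A";

// Apply it per field, as the Parser function conceptually does.
const kept = Object.fromEntries(
  Object.entries(fields).filter(([, value]) => keep(value))
);
console.log(kept); // only the status field survives
```

Note this is a model of the behaviour, not Cribl’s actual implementation; in the Parser function you supply only the expression itself.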

Once I obtained these knowledge nuggets, my Cribl Stream deployment became more efficient. Hopefully, my pain will be your gain when you start your Cribl Stream journey.


Looking to expedite your success with Splunk and Cribl? Click here to view our Professional Service offerings.

© Discovered Intelligence Inc., 2024. Unauthorized use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.

Introducing the benefits and features of Cribl Lake

April marked the beginning of a new era for Cribl with the introduction of Cribl Lake, which brings Cribl’s suite of products full circle in the realm of data management. In this post we dive a bit deeper into some of the benefits and features of Cribl Lake.

Read more

Deploying Cribl Workers in AWS ECS for Data Replay

Cribl Stream provides a flexible way of storing full-fidelity raw data into low-cost storage solutions like AWS S3 while sending a reduced/filtered/summarized version into Analytical Platforms for cost-effectiveness. In this blog post, I’ll walk you through setting up Cribl workers on AWS ECS and implementing dynamic auto scaling for seamless scale-out and scale-in as the demand fluctuates.

Read more

Building a Unified View: Integrating Google Cloud Platform Events with Splunk

By: Carlos Moreno Buitrago and Anoop Ramachandran

In this blog, we talk about the options available for collecting GCP events and show how to get them into Splunk. As an optional step, we also add an integration with Cribl to streamline and optimize the ingestion process. After reading, you will have a solid understanding of the options available, depending on the conditions of the project or team you work in.

Read more

Help Getting Started with Cribl Stream

Getting Started With Cribl

Once you have embraced and grasped the power of Cribl Stream, “Reduce! Simplify!” will become your new mantra.

Here we list some of the best Cribl Stream resources available to get you started. Most of these resources are completely free, so money is not an obstacle when beginning your Cribl Stream journey. Keep reading and start learning today!

Read more