Ditch the Deployment Server: Why We Used Ansible for Splunk in a Secure OT Environment

Have you ever tried to manage a net-new Splunk deployment across dozens of isolated gas plants while staring down an aggressive six-week deadline?

We recently partnered with a major gas extraction company to do exactly that. In their highly secure Industrial Control Systems (ICS) and Operational Technology (OT) environments, you can’t just “hope” your configurations stick; you need a process that is repeatable, version-controlled, and bulletproof.

When the network is locked down tighter than a bank vault, standard Splunk config-management doesn’t just not work — it becomes a security risk. Here is why we moved away from a traditional Splunk Deployment Server setup and leaned into Ansible to get the job done.

The OT Challenge: Navigating the Purdue Model

Managing data in a standard IT environment is (mostly) straightforward. But our customer’s environment follows the Purdue Model—a network architecture of increasingly secured rings designed to protect critical assets like pumps, manufacturing tools, and sensors.

While the Purdue Model is great for security, it’s a bit of a nightmare for traditional Splunk management. Levels 1 and 2 are incredibly locked down: using a Splunk Deployment Server (DS) would require punching holes in firewalls to allow forwarders to “phone home” for updates, and that is simply forbidden.

We faced a choice: introduce a new management technology that might trigger security red flags, or leverage the tool already in place. Since the customer already had Ansible “plumbed” into those secure OT layers for other tasks, it became our tool of choice for orchestration.

Why Infrastructure as Code (IaC)

When you’re onboarding data volumes approaching hundreds of GB per day across network devices, servers and appliances, manual configuration is a recipe for disaster. We’ve all seen “configuration drift”—that slow, silent divergence where systems move away from standard configurations over time.

By using Ansible, we gained three critical advantages:

  1. Idempotency: We can run the same playbook ten times, and it will only make changes if the target state isn’t met. No accidental overwrites.
  2. Cross-Platform Consistency: We used the same playbook logic for both Linux and Windows hosts; the automation handled the heavy lifting.
  3. Tag-Based Flexibility: We utilized Ansible tags (like site14 or windows_uf) to handle different physical locations and server roles without needing separate “Server Classes” for every tiny variation.
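
To make the tag-based approach concrete, here is a minimal sketch of the kind of playbook we mean. The group name, app names, paths, and tags are illustrative assumptions, not the customer’s actual configuration:

```yaml
# Illustrative playbook: push Git-managed Splunk apps to Linux forwarders.
# Group name, app list, paths, and tags are assumptions.
- name: Deploy Splunk UF apps to Linux forwarders
  hosts: linux_uf
  become: true
  vars:
    splunk_apps:
      - org_all_outputs
      - org_site14_inputs
  tasks:
    - name: Copy each app from the version-controlled source tree
      ansible.builtin.copy:
        src: "apps/{{ item }}/"
        dest: "/opt/splunkforwarder/etc/apps/{{ item }}/"
        owner: splunk
        group: splunk
      loop: "{{ splunk_apps }}"
      notify: Restart Splunk forwarder
      tags: [linux, uf, site14]

  handlers:
    - name: Restart Splunk forwarder
      ansible.builtin.service:
        name: SplunkForwarder
        state: restarted
```

Note how idempotency falls out naturally: the copy module reports “changed” only when file content actually differs, so the restart handler fires only on a real change.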

Mapping Splunk Concepts to Ansible

If you’re comfortable with Splunk, the jump to Ansible is shorter than you think. We essentially re-mapped familiar Splunk architecture to Ansible equivalents:

  • Deployment Server → Ansible Control Node: the central “source of truth” running our playbooks.
  • Deployment Client → Inventory Host: each forwarder (UF/IF) is defined in a YAML inventory file.
  • Server Classes → Host Tags and Groups: we use tags like linux or uf to target specific systems.
  • Deployment Apps → Roles & Files Structure: apps are managed in Git and pushed to targets via playbooks.
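
For instance, treating host groups as the equivalent of Splunk server classes can be expressed in a YAML inventory along these lines (hostnames and variables are hypothetical):

```yaml
# Hypothetical inventory fragment: groups play the role of server classes.
all:
  children:
    linux_uf:
      hosts:
        site14-hist01.ot.example.com:
        site14-hist02.ot.example.com:
      vars:
        splunk_home: /opt/splunkforwarder
    windows_uf:
      hosts:
        site14-eng01.ot.example.com:
      vars:
        ansible_connection: winrm
```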

What We Learned

Even with the best automation, a tight six-week turnaround like this had its “gotchas.” Here are two lessons that could save you time on your next project:

1. Splunk ARI and GDI Dependency

We were tasked with setting up Splunk Asset and Risk Intelligence (ARI). A key lesson: don’t start the ARI “polish” until Getting Data In (GDI) is 100% finished. ARI relies entirely on the quality and consistency of your data inputs. If you’re still tweaking data inputs a week before the project ends, ARI dashboards can break. Finish the data onboarding first; the intelligence layer comes second.

2. Permissions and Ownership

Automation is only as good as its permissions. For Linux targets, we had to ensure a Splunk user was consistently defined across all sites to avoid ownership errors upon file delivery. On the Windows side, we found that using the local administrator account for the Ansible connection was the most reliable way to ensure the Splunk service could be restarted remotely after a configuration change.
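
On the Linux side, the “consistently defined Splunk user” requirement is the kind of thing we encode as a task rather than a runbook step. A sketch, with illustrative UID/GID values:

```yaml
# Ensure the splunk service account exists with the same IDs everywhere.
# UID/GID and home path are illustrative assumptions.
- name: Standardize the splunk user across all Linux targets
  hosts: linux_uf
  become: true
  tasks:
    - name: Create splunk group with a fixed GID
      ansible.builtin.group:
        name: splunk
        gid: 10777
    - name: Create splunk user with a fixed UID
      ansible.builtin.user:
        name: splunk
        uid: 10777
        group: splunk
        home: /opt/splunkforwarder
        shell: /bin/bash
```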

Conclusion: Focus on the Plumbing

Building a massive Splunk environment in six weeks reaffirmed to us that agility requires automation. By replacing the traditional Deployment Server with an Ansible-driven process, we created a system that is secure enough for the Purdue Model and repeatable for future expansions.

Whether you’re dealing with isolated gas plants or a complex cloud-hybrid stack, having a version-controlled “source of truth” for your configurations is what can save the project.


Ready to modernize your Splunk environment? Contact Us to learn how our experts can help you automate your secure Splunk environment.

Discovered Intelligence Inc., 2026. Unauthorized use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.

Splunk Universal Forwarder Upgrades: From Manual Pain to Automated Gain

When was the last time you actually looked forward to upgrading your Splunk Universal Forwarders (UFs)? If you’re like most of the engineers we talk to, UFs are the last things to get touched. They’re usually stuck on the back burner because the sheer effort of touching hundreds—or thousands—of endpoints is incredibly tedious. While we focus our energy on keeping the core Splunk instances shiny and updated, the UF fleet often lingers several versions behind, creating a maintenance debt that only gets heavier over time. But what if we told you there’s finally a native way to solve this headache? 

The “Back Burner” Dilemma: Why UFs Are So Hard

In the past, we’ve really only had three ways to handle these upgrades: manual, scripted, or through external automation platforms like Ansible or SCCM. If you’re a smaller shop, you’re likely doing manual installs, which means an engineer has to remotely access or physically touch every single box. Even if you’re a bit more mature and use scripts, it’s still a fragmented process.

The largest, most “mature” customers have already moved to heavy-duty automation platforms to manage their fleet, and they’ve built their own processes for this. But for everyone else—the folks relying on manual or basic scripted processes—Splunk didn’t have a native solution. Until now.

The Splunk Remote Upgrader

The Splunk Remote Upgrader is a free, Splunk-supported tool available as two separate apps on Splunkbase – one for Linux and one for Windows. It’s designed to run right alongside your existing UF on the endpoint.

Essentially, it acts as a separate application that monitors a predetermined directory (usually under temp) for new installation packages. As soon as it sees a new package land in that directory, it takes over the installation process for you.

What Can It Actually Upgrade?

  • Target Versions: It can upgrade UFs to any version 9.0 or higher.
  • Starting Point: You can use this process if your current forwarder is at version 8.0 or higher.
  • Security First: It only supports signed UF packages. This is why the target must be 9+, as these versions include the necessary signature files for verification.
  • OS Support: Currently available for Linux and Windows platforms.

The Deployment Process

The biggest point of confusion we see is the relationship between the Upgrader and the Forwarder package. Think of them as two distinct pieces of the same puzzle.

1. Initial Setup

You still have to do the “first mile” yourself. You need to get the Remote Upgrader installed on the endpoint machine manually or through your existing external tools first. Once that Remote Upgrader daemon is running, it starts its “watch” on the /tmp/SPLUNK_UPDATER_MONITORED_DIR/ folder.

2. Preparing the Package

On your Deployment Server, you’ll prepare a package that contains the new UF version you want to deploy, along with its signature (.sig) file.

3. Execution and Monitoring

When you push this application via the Deployment Server, the UF pulls it down. The package contains a script that copies the new files over to the temp directory the Upgrader is monitoring.
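
The bundled copy script itself can be very small. Here is a sketch of the idea; the function name and package filenames are illustrative, and the monitored path is the one the Upgrader watches:

```shell
# copy_uf_package: drop a signed UF package (and its .sig file) into the
# directory the Remote Upgrader is watching. Names are illustrative.
copy_uf_package() {
  app_dir="$1"      # the deployment app directory holding the package
  monitor_dir="$2"  # e.g. /tmp/SPLUNK_UPDATER_MONITORED_DIR
  mkdir -p "$monitor_dir"
  for f in "$app_dir"/splunkforwarder-*.tgz "$app_dir"/splunkforwarder-*.tgz.sig; do
    [ -e "$f" ] && cp "$f" "$monitor_dir"/
  done
  return 0
}

# Inside the deployment app this would be invoked roughly as:
# copy_uf_package "$(dirname "$0")/.." /tmp/SPLUNK_UPDATER_MONITORED_DIR
```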

Once the Upgrader detects those files, the real work begins:

  • Three Strikes Rule: The Upgrader will try the installation up to three times if it fails.
  • Timeout Safety: If an attempt gets stuck for more than five minutes, it gives up on that attempt.
  • The Safety Net: If all attempts fail, it triggers an automatic rollback to your previous version. It even keeps a backup of your old configuration for 30 days by default, just in case.

Ready to finally tackle that fleet of 500 forwarders? It’s not just about the convenience; it’s about the peace of mind knowing you have a centralized, logged, and recoverable way to stay current.

Real-World Considerations and Constraints

While we’re big fans of this new tool, we have to stay grounded in reality. It’s not a “set it and forget it” magic wand for every scenario.

  • Initial Effort: As we mentioned, the very first install of the Upgrader must be manual. However, once it’s there, the Upgrader can actually upgrade itself automatically in the future.
  • Storage Requirements: You need at least 1GB of free space on the endpoint to handle the packages and the backups.
  • Deployment Server Strategy: If you have a massive environment, you probably don’t want to hit 1,000 servers at once. You’ll need to be creative with your Server Classes to roll out the upgrades in waves.
  • Windows Requirements: For those of you on Windows, make sure PowerShell scripting is enabled, as the process relies on it to function.

Conclusion

By adopting the Splunk Remote Upgrader, we’re moving away from the era of “neglected forwarders” and into a world of centralized, secure lifecycle management. It reduces maintenance overhead, ensures your fleet is consistent with the latest security patches, and lets you adopt new features faster than ever before. It might take a bit of initial legwork to get the Upgrader daemon onto your hosts, but the long-term payoff for your operations and security posture is massive.


Need help? If you need help architecting a massive UF rollout, contact us today – we’d love to help you streamline your data pipeline.


Migrating Syslog to Cribl Stream: The Art of the “Zero Change” Migration

We’ve all been there. You’re ready to modernize your observability pipeline. You’ve got the green light to move from legacy syslog servers (like syslog-ng) to Cribl Stream. It sounds like a straightforward lift-and-shift, right? But then you flip the switch, and suddenly your downstream SIEM is screaming about unparsed events, your timestamps are drifting, and your load balancers are pinning traffic to a single node.

Read more

Cribl and GitOps: From Development to Production

If you’re running Cribl Stream in a distributed environment, you already know the Leader Node is critical, and Git is non-negotiable for handling config bundling and version control. You’ve probably already discovered how painful it is to experience inconsistencies between development and production, and ultimately, these can lead to unexpected outages, security vulnerabilities, or compliance violations. To avoid this, we like to implement a full GitOps workflow. This way, you apply disciplined CI/CD methods to your configurations, enforcing change control through standard Pull Requests, ensuring everything is auditable, and keeping production rock-solid.

The Foundation: Git Integration in Cribl Stream

To implement any truly sophisticated change management within a distributed Cribl environment, Git integration is the essential building block. Since Cribl’s architecture involves a Leader Node coordinating multiple Worker Groups, centralized version control isn’t just a best practice – it’s mandatory. In a distributed deployment, the Leader Node simply won’t start without Git installed.

Why Git is Non-Negotiable for Cribl Leaders

Git provides several immediate, built-in benefits essential for managing your dynamic data pipelines:

  • Audit Trails: Every configuration change is recorded in Git, creating a history of who changed what and when, satisfying crucial security and compliance needs.
  • Version Comparison and Reversion: It’s an easy way to compare different configuration versions, simplifying the process of identifying and isolating problematic changes, and enabling rapid rollback when necessary.
  • Configuration Bundling: On a fundamental level, the Cribl Leader uses Git to bundle the finalized configurations, which are then distributed to the Workers in the field.

Beyond Local Commits: Leveraging Remote Git

While a basic deployment relies on local commits for managing configurations, we find that a true enterprise-grade strategy needs Remote Git integration, using tools like GitHub or Bitbucket. This remote capability doubles as a backup and disaster recovery solution. The key advantage is redundancy: the Leader Node holds the main copy of all configurations, so its failure could be catastrophic. By setting up the Cribl Leader to push its configurations on a schedule to the remote repository, we ensure an off-instance backup. That way, if the primary Leader Node ever goes down, we can spin up a new Leader and restore it directly from the last known-good configuration copy in Remote Git, drastically reducing recovery time.

Implementing Full GitOps: CI/CD for Data Pipelines

GitOps elevates Git beyond a backup tool; we use it as the single source of truth for the entire data pipeline ecosystem. We believe this model is ideal for organizations that need stringent control, especially those handling complex regulatory requirements or massive volumes of mission-critical data. The core concept is pretty straightforward: rigorously separate the development and production environments, and strictly govern the flow of all changes between them using standard Git branches and pull requests.

The Two-Environment GitOps Model

In this approach, you maintain two separate Cribl environments, each tied to a dedicated Git branch on the remote repository:

  1. Development Environment: Connected to the dev branch. All initial configuration work – such as building new data Sources, Destinations, or Pipelines – is done here.
  2. Production Environment: Connected to the prod branch. Crucially, the Production Leader is set to a read-only mode. This hard constraint prevents manual, unauthorized changes directly in production, forcing all changes to follow the GitOps pipeline.

The Standard GitOps Workflow

The flow for deploying a new configuration involves a structured, multi-step process:

  1. Development and Commit: Create or modify a configuration (e.g., a new Pipeline) on the Dev Leader. Then use the UI to deploy the changes to the Workers and push them to the remote Git repository’s dev branch.
  2. Pull Request and Review: Create a Pull Request (PR) to merge the changes from the dev branch into the prod branch. This triggers a review by the Cribl Administrator or a designated approver.
  3. Merge and Automation: Once reviewed and approved, the PR is merged, updating the prod branch with the verified configuration. This merge action does not automatically deploy the configuration to the Production Leader.
  4. External Sync Trigger: To apply the changes, an external CI/CD tool (such as Jenkins, GitHub Actions, or a homegrown script) must trigger the Production Leader. You can do this by hitting the Leader’s REST API endpoint /api/v1/version/sync.
  5. Deployment to Workers: Once the Production Leader has the new configuration, it automatically distributes the update to its connected Workers.
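
Wired into a CI/CD tool, step 4 is usually a single API call. A sketch, assuming a hypothetical Leader hostname, Cribl’s default port 9000, and an API token in $CRIBL_TOKEN:

```shell
# Build the sync URL for the Production Leader. The hostname is
# illustrative; 9000 is Cribl's default API/UI port, adjust if yours differs.
LEADER_HOST="prod-leader.example.com"
SYNC_URL="https://${LEADER_HOST}:9000/api/v1/version/sync"
echo "$SYNC_URL"

# The CI job would then POST to it with an auth token, e.g.:
# curl -s -X POST "$SYNC_URL" -H "Authorization: Bearer $CRIBL_TOKEN"
```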

Handling Environment-Specific Configurations

A key challenge in this two-environment model is that, by default, all development configurations are pushed to production. This isn’t always desirable, and sometimes you need granular control. This is where using environment tags comes into play to manage state:

  • C.LogStreamEnv Variable: Cribl automatically manages a C.LogStreamEnv variable that identifies whether an instance is DEV or PRD (Production).
  • Selective Configuration: The environment tag can be used in JavaScript expressions for Sources and Destinations. For example, a Destination defined for production will be enabled in the Prod environment but will appear disabled (“greyed out”) in the Dev environment, offering necessary flexibility while maintaining the core GitOps flow.
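
As an illustration, a gating expression on a Destination might look like the following one-liner; C.LogStreamEnv is the variable described above, and the exact place you apply it depends on your setup:

```javascript
// True only when this instance is tagged as production,
// so the Destination stays greyed out in Dev.
C.LogStreamEnv === 'PRD'
```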

Use Case: Updating Lookup Files in GitOps

With GitOps enabled, one interesting use case we have come across is updating a lookup in Cribl via Git. While Cribl provides REST API endpoints for programmatically updating lookups, this customer was interested in using their existing CI/CD process to provide a self-service capability for users to update the lookup file. The following steps detail how the update flow looks:

  1. User Update: The user (or an automated script) updates the Lookup File directly within the remote Git repository’s dev branch.
  2. Pull Request and Review: Create a Pull Request (PR) to merge the changes from the dev branch into the prod branch. This triggers a review by the Cribl Administrator or a designated approver.
  3. Merge and Automation: Once reviewed and approved, the PR is merged, updating the prod branch with the verified configuration. This merge action does not automatically deploy the configuration to the Production Leader.
  4. External Sync Trigger: To apply the changes, an external CI/CD tool (such as Jenkins, GitHub Actions, or a homegrown script) must trigger the Production Leader. You can do this by hitting the Leader’s REST API endpoint /api/v1/version/sync.
  5. Update the Dev Leader: Since the lookup update happened directly on the dev branch, the Dev Leader is not aware of the change, so we need to do a git pull on the Dev Leader to keep it up to date with the branch. This, again, can be part of the external trigger automation.
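
Step 5 is easy to forget. In a post-merge job, the Dev Leader refresh can be as simple as one remote command; the hostname and Cribl home path below are assumptions:

```shell
# Command a post-merge CI job could run to re-sync the Dev Leader's
# local working copy with the remote dev branch (host/path illustrative).
DEV_LEADER="dev-leader.example.com"
CRIBL_HOME="/opt/cribl"
echo "ssh $DEV_LEADER 'cd $CRIBL_HOME && git pull origin dev'"
```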

Final Thoughts

Transitioning to a GitOps workflow for Cribl Stream elevates how we manage our data pipelines, moving us away from manual, error-prone changes toward a scalable, auditable, and secure CI/CD process. By embracing Git as the control plane for configuration, we gain the confidence that every single deployment is consistent, every change is traceable, and the production environment is protected by a strong, automated defense against unauthorized modifications. This is more than just an operational improvement; it’s a critical step in building a truly resilient and compliant data observability platform.


Looking to expedite your success with Cribl? View our Cribl Professional Service offerings.

Discovered Intelligence Inc., 2025. Unauthorized use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.

Finding Asset and Identity Risk with Splunk Asset and Risk Intelligence

Splunk Asset and Risk Intelligence (Splunk ARI) discovers and reports on risks affecting assets and identities. This risk discovery is performed in real-time, ensuring that risks can be quickly addressed, helping to limit exposure and increase overall security posture. In this post, we highlight three use cases related to asset risk using Splunk ARI.

Read more

Reveal Asset and Identity Activity with Splunk Asset and Risk Intelligence

Splunk Asset and Risk Intelligence (Splunk ARI) keeps track of asset and identity discovery activity over time. This activity supports investigations into who had what asset and when, in addition to providing insights about asset changes over time and when they were first or last discovered. In this post, we highlight three use cases related to asset activity using Splunk ARI.

Read more

Investigating Assets and Identities with Splunk Asset and Risk Intelligence

Splunk Asset and Risk Intelligence (Splunk ARI) has powerful asset and identity investigative capabilities. Investigations help to reveal the full asset record, cybersecurity control gaps and any associated activity. In this post, we highlight three use cases related to asset investigations using Splunk ARI.

Read more

Discovering Assets and Identities with Splunk Asset and Risk Intelligence

Splunk Asset and Risk Intelligence (Splunk ARI) continually discovers assets and identities. It does this using a patented approach that correlates data across multiple sources in real-time. In this post, we highlight three use cases related to asset discovery using Splunk ARI.

Read more

Field Filters 101: The Basics You Need to Know

Hello, Field Filters!

Data protection is a critical priority for any organization, especially when dealing with sensitive information like personally identifiable information (PII) and protected health information (PHI). Implementing robust protection mechanisms not only ensures compliance with regulations like the General Data Protection Regulation (GDPR) but also mitigates the risk of data breaches.

Read more

Using Cribl Search to Monitor Instances in Google Cloud Platform (GCP)

One recurring challenge in managing cloud environments is the tendency for lab and development instances to remain active long after they’re needed. While it might seem like a small oversight, the impact can be significant. These idle instances rack up unnecessary costs, drain valuable resources, and open the door to security vulnerabilities. Configuring effective monitoring to alert on running instances is a good way to address this problem.

Read more