Moving bits around: Deploying Splunk Apps with GitHub Actions

It would be reasonable to argue that, no matter the size of the deployment, few Splunk deployments out there have not leveraged the Deployment Server to manage and distribute Splunk apps to other components. Just put everything in the $SPLUNK_HOME/etc/deployment-apps directory of the Deployment Server and create server classes that map the relevant apps to the appropriate clients phoning home. Easy, right? The big catch is this: what if we overwrite a working app with modifications that later have to be rolled back, multiple Splunk admins edit the same configurations, or we accidentally delete one or more apps in the directory and don't know which ones? Of course, restoring a full backup of that directory might solve all these problems, provided a full backup is taken regularly at a short enough interval, but this isn't a great way of managing a dynamic environment where changes are constantly being pushed to the apps. It turns out that these are exactly the problems a version control tool is designed to solve.

Now for most folks, when you hear about version control or source code control, Git is the first and perhaps the only word that comes to mind. The second will likely be GitHub, arguably the most popular Git-based source code hosting service out there. But is it enough to use Git and GitHub for version-controlling and hosting Splunk apps for deployment? In a functional sense yes, but not so much from an admin perspective: you must still manage deploying these apps to the Splunk Deployment Server. This is an example of what Google's SRE principles call "toil", and it can and should be eliminated with a CI/CD setup. By the end of 2019, GitHub introduced its own CI/CD capability native to the GitHub platform, called GitHub Actions. GitHub Actions is a workflow orchestration and automation tool that can trigger actions based on events such as changes in a GitHub repository. In our case, GitHub Actions can automate the task of deploying apps to the Deployment Server's staging directory.

Automate Splunk App Deployment with GitHub Actions

So our Splunk apps are now hosted, properly source-controlled, in a GitHub repository. Now let's explore how we can automate deploying them to the Deployment Server using GitHub Actions.

Note: What this article covers is not a production-ready, prescriptive solution. GitHub Actions is used here solely because it offers a relatively simple, one-stop-shop approach to realizing the benefits of both version-controlled hosting and continuous deployment of Splunk apps.

The setup consists of three parts: the source (GitHub repository), the intermediary (runner) and the destination (Deployment Server). GitHub Actions uses a runner instance as the intermediary on which the actions run, and this instance is what connects to the target server. It can be either a self-hosted runner that you provision in your own infrastructure or a GitHub-hosted runner.

Let me highlight a couple of important factors at play in choosing the runner instance type.

1. Security Considerations

Self-hosted runners and GitHub-hosted runners have some common and some unique security implications. While the network connectivity requirements differ between the two approaches, SSH authentication to the Deployment Server is common to both. For example, you may not want to allow external connections directly to the Deployment Server, which a GitHub-hosted runner requires, or your repository may be public, which makes a self-hosted runner risky. GitHub recommends that you only use self-hosted runners with private repositories, because forks of your repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow. This is not an issue with GitHub-hosted runners, because each GitHub-hosted runner is always a clean, isolated virtual machine that is destroyed at the end of the job execution.

2. Usage limits and Billing

Usage limits and billing are primarily based on storage and minutes used. Self-hosted runners are free to use but come with some usage limits; GitHub-hosted runners are subject to different usage limits.

I have linked the documentation in the appendix for further reading on this topic.

For demonstration purposes, I am going to use a self-hosted runner.

Destination:

Let’s configure the destination first which is the Deployment Server.

At a high level, the steps involve:

  1. Creating an SSH key pair
  2. Creating a dedicated user for the task on the Deployment Server
  3. Making the Deployment Server accessible to that user with the SSH key pair created above
  4. Setting proper permissions on the target staging directory

First off, we create an SSH key locally like so:

ssh-keygen -t ed25519 -C "your_email@example.com"

Enter a file name to save the keys to, and leave the passphrase empty, since the workflow will use the private key non-interactively.

Then we log in to the Deployment Server and create a user there, say, ghuser.

Make the host accessible to this user over SSH by adding the public key created above to /home/ghuser/.ssh/authorized_keys. I have linked a page in the appendix with step-by-step instructions on how to do this on a Linux host.
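If you prefer to script this step rather than follow the linked instructions, here is a minimal Python sketch. It assumes it runs as ghuser on the Deployment Server (so the files end up owned by that user) and that the public key has already been copied over; install_public_key is a hypothetical helper name. The permissions are set explicitly because sshd's StrictModes check rejects key files that are writable by other users. Where password login is enabled, ssh-copy-id from your local machine achieves the same result.

```python
import os
from pathlib import Path


def install_public_key(home: Path, pubkey: str) -> Path:
    """Append pubkey to <home>/.ssh/authorized_keys once, with sshd-friendly permissions.

    Hypothetical helper; assumes it runs as the target user (e.g. ghuser),
    so file ownership is already correct and is not changed here.
    """
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # mkdir's mode is masked by the umask, so set it explicitly

    auth_keys = ssh_dir / "authorized_keys"
    key_line = pubkey.strip()
    text = auth_keys.read_text() if auth_keys.exists() else ""
    if key_line not in text.splitlines():  # idempotent: skip keys already present
        with auth_keys.open("a") as fh:
            if text and not text.endswith("\n"):
                fh.write("\n")  # don't glue the new key onto an unterminated last line
            fh.write(key_line + "\n")
    os.chmod(auth_keys, 0o600)  # sshd (StrictModes) rejects group/world-writable key files
    return auth_keys
```

For example, install_public_key(Path("/home/ghuser"), Path("/tmp/ghkey.pub").read_text()), where /tmp/ghkey.pub is a hypothetical location for the copied public key. Running it twice leaves a single entry in the file.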

Next, we need to give this user full access to the $SPLUNK_HOME/etc/deployment-apps directory. For instance, if Splunk is installed under /opt, then:

setfacl -R -m u:ghuser:rwx /opt/splunk/etc/deployment-apps

Alternatively, if Splunk runs as a non-root user (commonly named splunk), that user can be leveraged for this purpose instead, in which case you do not need to grant any additional directory permissions as above.
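To confirm that the recursive ACL took effect, a small Python check can be run as ghuser (for example with sudo -u ghuser python3 check_access.py, where check_access.py is a hypothetical file name). This is a sketch under the assumption that rsync needs read, write and traverse rights on directories and read and write rights on files; find_inaccessible is a hypothetical name. os.access() defers to the kernel's permission check, so ACLs set with setfacl are taken into account, but note that it always succeeds for root.

```python
import os
from pathlib import Path
from typing import List


def find_inaccessible(root: Path) -> List[Path]:
    """Return paths under root that the invoking (real) user cannot fully manage.

    Directories need read, write and execute (list, create/delete entries, traverse);
    files need read and write. Unreadable subdirectories are reported but not descended.
    """
    problems: List[Path] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if not os.access(dirpath, os.R_OK | os.W_OK | os.X_OK):
            problems.append(Path(dirpath))
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK | os.W_OK):
                problems.append(Path(path))
    return problems


if __name__ == "__main__":
    # Path from the example above; adjust to your $SPLUNK_HOME.
    for p in find_inaccessible(Path("/opt/splunk/etc/deployment-apps")):
        print(p)
```

An empty result means the user can create, update and delete everything rsync will touch.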

Once this is completed, we now have a user that can SSH to the deployment server and modify the deployment-apps directory. We will be using this user in our GitHub Actions.

Intermediary:

Once the runner instance is provisioned, we need to install the runner client application on the host so that it can poll the repository for jobs. Go to Settings -> Actions -> Runners in the GitHub repository.

When you click the Add runner button and select the OS and CPU architecture, you are presented with instructions to set up the client application. For the client application to successfully perform HTTPS long polls to GitHub, you must ensure that the host has the appropriate network access to communicate with specific GitHub URLs. The Appendix has a link that lists those URLs.

Next, the self-hosted runner needs Docker installed, because the rsync action we set up in the next step runs inside a Docker container. This is also straightforward. Here I am using an Amazon Linux 2 EC2 instance, and these are the installation steps for it:

  1. Update your system
    $ sudo yum update -y
  2. Install Docker
    $ sudo yum install docker -y
  3. Start Docker
    $ sudo service docker start
  4. Add the user that runs the runner application to the docker group
    $ sudo usermod -a -G docker USERNAME
  5. Log out and log back in.
  6. Verify Docker runs without sudo
    $ docker run hello-world

I have linked a document in Appendix that covers Docker installation on different Linux flavors.

Source:

GitHub Actions has a marketplace where we can look for off-the-shelf solutions, which in our case means pushing the apps from the repository to the Deployment Server. In this example, I have used two actions: 1) checkout, a standard GitHub-provided action that checks out the repository, and 2) rsync-deployments, which spins up a Docker container on the runner to rsync the specified directory from the checked-out repository to the destination directory on the target host.

First, we create a repository with a sub-directory that contains all the Splunk apps to be copied to the Deployment Server's deployment-apps directory. In this example, the repository is named test-deploy-ds, and all the Splunk apps reside in a subdirectory that I have named deployment-apps to match the target directory, but this can be any name you want.

Then we create a simple workflow from the Actions tab of the repository.

Give the YAML file that opens on the next screen a suitable name, such as push2ds.yml.

Modify the file as below.

# This is a basic workflow to help you get started with Actions

name: CI

# Controls when the action will run. 
on:
  # Triggers the workflow on push events, but only for the main branch
  push:
    branches: [ main ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: self-hosted

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v2
 
      # Runs the Rsync Deployment action
      - name: Rsync Deployments Action
        uses: Burnett01/rsync-deployments@4.1
        with:
          switches: -avzr --delete --omit-dir-times --no-perms --no-owner
          path: deployment-apps/
          remote_path: /opt/splunk/etc/deployment-apps
          remote_host: ${{ secrets.DEPLOY_HOST }}
          remote_user: ${{ secrets.DEPLOY_USER }}
          remote_key: ${{ secrets.DEPLOY_KEY }}

Explanation:

1) This workflow is triggered by a push to the main branch, and can also be run manually thanks to workflow_dispatch

2) The build job runs on a self-hosted runner

3) The steps in the build job check out the repository using the checkout action and then run rsync using the rsync-deployments action

Let's dissect the rsync-deployments step, as this is the custom code I had to write for this use case:

  • the name attribute is a brief description of what the step does
  • the uses attribute references the marketplace action rsync-deployments
  • the with attribute contains several inputs, as below
    • switches holds the parameters passed to the rsync command. Check out the link in the appendix for what each of them does.
    • path is the source directory within the repository, which in this case is named deployment-apps
    • remote_path is the Deployment Server's $SPLUNK_HOME/etc/deployment-apps directory
    • remote_host is the Deployment Server's public IP or hostname
    • remote_user is the user we created on the Deployment Server, i.e. ghuser
    • remote_key is the SSH private key created earlier, used to authenticate to the Deployment Server

Note the use of GitHub Secrets in the last three attributes. This is a simple yet secure way of storing and accessing sensitive data that could be misused by a threat actor. Secrets are defined in the repository's settings.

PS: remote_port is also an accepted attribute; it has been skipped here because it defaults to 22. Specify it if SSH on the Deployment Server listens on a different port.
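Conceptually, the step's inputs boil down to one rsync invocation over SSH. The Python sketch below assembles such a command from the same inputs, assuming the action runs rsync with ssh as the remote shell; build_rsync_command, the host ds.example.com and the key path are hypothetical, and the action's real internal command line may differ.

```python
import shlex
from typing import List


def build_rsync_command(
    switches: str,
    path: str,
    remote_path: str,
    remote_host: str,
    remote_user: str,
    key_file: str,
    remote_port: int = 22,
) -> List[str]:
    """Build an rsync argv from the same inputs the workflow passes to the action.

    Hypothetical helper: it mirrors the action's inputs, not its actual source code.
    """
    # rsync runs ssh as its remote shell; -p and -i select the port and private key.
    remote_shell = f"ssh -p {remote_port} -i {shlex.quote(key_file)}"
    return [
        "rsync",
        *shlex.split(switches),  # e.g. -avzr --delete --omit-dir-times --no-perms --no-owner
        "-e",
        remote_shell,
        path,  # trailing slash: copy the directory's contents, not the directory itself
        f"{remote_user}@{remote_host}:{remote_path}",
    ]


if __name__ == "__main__":
    cmd = build_rsync_command(
        "-avzr --delete --omit-dir-times --no-perms --no-owner",
        "deployment-apps/",
        "/opt/splunk/etc/deployment-apps",
        "ds.example.com",  # hypothetical host
        "ghuser",
        "/tmp/deploy_key",  # hypothetical path to the private key
    )
    print(shlex.join(cmd))
```

Two rsync details matter here. The trailing slash on deployment-apps/ copies the apps themselves into the target directory rather than creating a nested deployment-apps folder, and --delete removes anything on the Deployment Server that no longer exists in the repository, so deleting an app from the repository also deletes it from the server. Passing the list to subprocess.run(cmd, check=True) would execute it.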

As soon as the above YAML file is committed, or a new app is committed, the workflow job kicks off. The job status can be verified as follows.

Go to the Actions tab.

Click the latest workflow run at the top, here ‘trigger GHA only on push to main’, which is the commit message.

Click the job, build. You can expand each step in the build job to see its detailed execution. The build status page also highlights any failed step in red; expand that step to check the reasons for the failure.

Once the job has completed successfully, we can log in to the Deployment Server and confirm that the Splunk apps have been pushed to the $SPLUNK_HOME/etc/deployment-apps directory.

$ ls -lart /opt/splunk/etc/deployment-apps/
total 8
drwxr-xr-x  16 splunk splunk 4096 Jun 24 18:11 ..
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 TA-org_splunk
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_APP_TEMPLATE
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_all_indexer_base
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_all_forwarder_outputs
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_all_deploymentclient
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_all_app_props
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_search_volume_indexes
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_indexer_volume_indexes
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_full_license_server
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_dept_app_inputs
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_cluster_forwarder_outputs
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_all_search_base
drwxrwxr-x   4 ghuser ghuser   35 Jun 29 05:00 org_all_indexes
drwxrwxr-x+ 16 splunk splunk 4096 Jun 29 05:00 .
drwxrwxr-x   3 ghuser ghuser   37 Jun 29 15:21 000_all_forwarder_outputs_route_onprem_and_cloud
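Eyeballing the listing works for a handful of apps; for more, the comparison can be scripted. The Python sketch below, a hypothetical helper rather than part of the workflow, compares the app directories in a local clone of the repository with those in deployment-apps, assuming both are readable from where it runs. Because the workflow uses --delete, both sets should normally be empty after a successful run.

```python
from pathlib import Path
from typing import Dict, Set


def app_dirs(directory: Path) -> Set[str]:
    """Names of the immediate subdirectories, i.e. one entry per Splunk app."""
    return {p.name for p in directory.iterdir() if p.is_dir()}


def compare_app_dirs(repo_dir: Path, deploy_dir: Path) -> Dict[str, Set[str]]:
    """Apps missing from the Deployment Server and apps present there but not in the repo."""
    repo_apps = app_dirs(repo_dir)
    deployed_apps = app_dirs(deploy_dir)
    return {
        "missing": repo_apps - deployed_apps,
        "extra": deployed_apps - repo_apps,
    }
```

For example, compare_app_dirs(Path("test-deploy-ds/deployment-apps"), Path("/opt/splunk/etc/deployment-apps")), where the first path is a hypothetical checkout location.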

A word of caution, though: if we push the apps as a user other than splunk, which owns $SPLUNK_HOME, those apps will not preserve their ownership or permissions when deployed to the deployment clients; instead, they will have a permission mode of 700. Let's look at how one of these apps, org_APP_TEMPLATE, appears on a target forwarder in a serverclass.

$ ls -lart /opt/splunkforwarder/etc/apps/ | grep org
drwx------  4 splunk splunk   35 Jun 29 18:37 org_APP_TEMPLATE
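To spot affected apps across a forwarder's apps directory without reading ls output by eye, a short Python sketch can list app directories whose mode is exactly 700. owner_only_apps is a hypothetical name, and /opt/splunkforwarder/etc/apps is assumed from the listing above.

```python
import stat
from pathlib import Path
from typing import List


def owner_only_apps(apps_dir: Path) -> List[str]:
    """Sorted names of app directories whose permission bits are exactly 0700 (drwx------)."""
    return sorted(
        p.name
        for p in apps_dir.iterdir()
        if p.is_dir() and stat.S_IMODE(p.stat().st_mode) == 0o700
    )
```

For example, owner_only_apps(Path("/opt/splunkforwarder/etc/apps")) on the forwarder above would include org_APP_TEMPLATE. Pushing as the splunk user, as described earlier, avoids the issue in the first place.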

Now if you're wondering whether you need to provision an extra server, be aware that there is also the option of using a GitHub-hosted runner. This requires updating the runs-on: attribute in push2ds.yml. For example, if you simply want a Linux host as the intermediary, update the attribute to runs-on: ubuntu-latest. Keep in mind, however, that this requires opening the Deployment Server's SSH port to external IPs and may have cost implications.

Conclusion

In this article, we touched upon the benefits of version control for Splunk apps managed and distributed via a Deployment Server. We then explored a simple, practical approach to this using GitHub Actions, covered the main considerations when going down this path, and applied it to a practical use case. If your organization does not use GitHub, you could adapt the solution to fit your own CI/CD pipeline. If you found this useful, watch this space for a sequel on how this opens up further possibilities for end-to-end Splunk app management in a distributed, clustered deployment.


Appendix:

Communication between self-hosted runners and GitHub
About GitHub-hosted runners – IP Addresses allow-list
About billing for GitHub Actions
Self-hosted runners – Usage limits
GitHub-hosted runners – Usage limits
How to create a new user that can SSH into a Linux host
Install Docker on Linux
GitHub Action for Rsync – rsync deployments
Rsync Parameters
Customizing GitHub-hosted runners


Looking to expedite your success with Splunk? Click here to view our Splunk Professional Service offerings.

© Discovered Intelligence Inc., 2021. Unauthorised use and/or duplication of this material without express and written permission from this site’s owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Discovered Intelligence, with appropriate and specific direction (i.e. a linked URL) to this original content.

Solving Roaming Users: HTTP Out for the Splunk Universal Forwarder

The release of version 8.1.0 of the Splunk Universal Forwarder introduced a brand new feature that supports sending data over HTTP. Traditionally, a Splunk Universal Forwarder uses the proprietary Splunk-to-Splunk (S2S) protocol to communicate with the Indexers. Using the ‘HTTP Out Sender for Universal Forwarder’, it can now send data to a Splunk Indexer over HTTP. Effectively, this feature encapsulates the S2S message within an HTTP payload. Additionally, it enables the use of a 3rd-party load balancer between Universal Forwarders and Splunk receivers, a practice that, to date, has not been recommended, or supported, for traditional S2S-based data forwarding.

The new HTTP Out feature is especially useful in scenarios such as collecting data from systems in an edge location or from a roaming user's device. These situations typically require more complex network configuration, or network traffic exceptions, to support traditional S2S for the connection from the Universal Forwarder to the Indexers. HTTP Out instead allows the Universal Forwarder to use a standard protocol and port (443), which is generally open and trusted for outgoing traffic.

Use Case: The Roaming User

Let’s take a look at how we can use the HTTP Out feature of the Splunk Universal Forwarder to transmit data from the laptop of a roaming user, or more generally any device outside our corporate perimeter, a scenario that has become more and more common with the shift to working from home during the pandemic.

For the purpose of this demonstration, we will be working with the following environment configuration:

  1. Splunk environment in AWS with 2 Indexers and 1 Search Head
  2. Internet-facing AWS Load Balancer
  3. Laptop with the Splunk Universal Forwarder (8.1.0)

Step 1: Configure The Receiver

On our Splunk Indexers, we have already configured the HTTP Event Collector (HEC) and created a token for receiving data from the Universal Forwarder. Detailed steps for enabling HEC and creating a token can be found in the Splunk documentation.
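Before involving the Universal Forwarder, it can help to confirm that HEC accepts the token with a plain test event. The Python sketch below builds a POST to HEC's /services/collector/event endpoint using the "Authorization: Splunk <token>" header; build_hec_request, the host idx1.example.com and the sourcetype hec:test are hypothetical.

```python
import json
import urllib.request


def build_hec_request(base_url: str, token: str, event: dict,
                      sourcetype: str = "hec:test") -> urllib.request.Request:
    """Build (but do not send) a POST to the HEC /services/collector/event endpoint."""
    body = json.dumps({"event": event, "sourcetype": sourcetype}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=body,
        headers={
            "Authorization": f"Splunk {token}",  # HEC's token authentication scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a matter of urllib.request.urlopen(req, timeout=10). If the Indexers still use a self-signed certificate on port 8088, urlopen may reject it unless you pass a suitable ssl context. Once the Load Balancer exists, pointing base_url at its address is a quick end-to-end check of the listener and forwarding rule.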

Step 2: Configure The Load Balancer

Next, we need an Internet-facing Load Balancer. HTTP Out on the Splunk Universal Forwarder supports Network Load Balancers and Application Load Balancers. For this use case, we have created an Application Load Balancer in AWS. The Load Balancer has a listener that receives connection requests on port 443 and forwards them to the Splunk Indexers on port 8088 (the default HEC port). The AWS Application Load Balancer provides a DNS name, which we will use in the Universal Forwarder's outputs configuration.
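A quick way to confirm the network path before configuring the forwarder is a plain TCP check: from the laptop against port 443 on the Load Balancer's DNS name, and from inside AWS against port 8088 on each Indexer. The Python sketch below, with the hypothetical name port_reachable, only proves that a TCP connection can be opened; it says nothing about whether the HEC token or the forwarding rule is correct.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        # create_connection resolves the DNS name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False
```

For example, port_reachable("splunk-s2s-over-http-312409306.us-west-2.elb.amazonaws.com", 443) from the roaming laptop.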

Step 3: Configure The Universal Forwarder

The last step is to install Splunk Universal Forwarder on the roaming user’s laptop and configure HTTP Out using the new httpout stanza in outputs.conf.

We have installed the Universal Forwarder on one of our laptops and created the following configuration within the outputs.conf file. For ease of deployment, the outputs.conf configuration file is packaged in a Splunk application and deployed to the laptop to enable data forwarding via HTTP.

[httpout]
httpEventCollectorToken = 65d65045-302c-4cfc-909a-ad70b7d4e593
uri = https://splunk-s2s-over-http-312409306.us-west-2.elb.amazonaws.com:443 

The uri in this configuration is the Load Balancer's DNS address, which handles the connection requests to the Splunk HTTP Event Collector endpoints on the Indexers.
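Because this outputs.conf is packaged in an app and pushed to many laptops, a pre-deployment sanity check can catch an empty token or a mistyped uri. The Python sketch below uses configparser as an approximation of Splunk's .conf parsing (for example, settings outside any stanza raise an error here, unlike in Splunk); check_httpout is a hypothetical name, and requiring https is our own design choice for traffic crossing the Internet, not a Splunk requirement.

```python
import configparser
from typing import Dict, List, Union
from urllib.parse import urlparse


def check_httpout(conf_text: str) -> Dict[str, Union[str, int, List[str]]]:
    """Parse an outputs.conf snippet and sanity-check its [httpout] stanza."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep Splunk's camelCase setting names unchanged
    parser.read_string(conf_text)
    if not parser.has_section("httpout"):
        return {"host": "", "port": 0, "problems": ["no [httpout] stanza"]}

    stanza = parser["httpout"]
    problems: List[str] = []
    if not stanza.get("httpEventCollectorToken", "").strip():
        problems.append("httpEventCollectorToken is empty")
    uri = urlparse(stanza.get("uri", "").strip())
    if uri.scheme != "https":
        problems.append("uri should use https")
    if not uri.hostname:
        problems.append("uri has no host")
    return {
        "host": uri.hostname or "",
        "port": uri.port or 443,  # explicit port, else the https default
        "problems": problems,
    }
```

Running it against the stanza above should report no problems, with port 443 and the Load Balancer's DNS name as the host.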

The Splunk Universal Forwarder HTTP Out feature also supports batching to reduce the number of transactions used to send out data. Additionally, a new setting, LB_CHUNK_BREAKER, was introduced in props.conf. Use this setting on the Universal Forwarder to break events properly before the data is sent out. When HTTP Out is used with a 3rd-party load balancer, LB_CHUNK_BREAKER prevents an event from being split partway, so that a complete event is sent to a Splunk Indexer. Refer to the Splunk documentation for detailed information on the available parameters.
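To make the chunk-breaking idea concrete, here is a conceptual Python model, not Splunk's implementation: a buffer is cut only at breaker matches, and whatever follows the last match is held back until more data arrives, so a load balancer spreading chunks across Indexers never delivers half an event. It borrows LINE_BREAKER's convention that the first capture group is the delimiter between events; the pattern ([\r\n]+) is an assumed example and split_complete_events is a hypothetical name.

```python
import re
from typing import List, Tuple

# Assumed example pattern: break on runs of newlines (capture group 1 is the delimiter).
DEFAULT_BREAKER = r"([\r\n]+)"


def split_complete_events(buffer: str, breaker: str = DEFAULT_BREAKER) -> Tuple[List[str], str]:
    """Return (complete_events, remainder) for a buffer of raw data.

    Conceptual model only: the text up to the start of each breaker match is one
    complete event; the text after the last match is an incomplete event that
    would be held back rather than sent to an Indexer.
    """
    pattern = re.compile(breaker)
    if pattern.groups < 1:
        raise ValueError("breaker must contain a capture group")
    events: List[str] = []
    start = 0
    for match in pattern.finditer(buffer):
        event = buffer[start:match.start(1)]
        if event:  # skip empty events caused by leading breakers
            events.append(event)
        start = match.end(1)
    return events, buffer[start:]
```

In a sender loop, the complete events would go into the next batch, and the remainder would be prepended to the next read from the input.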

Test and Verify Connectivity

Now that our configuration is in place, we need to restart the Splunk Universal Forwarder service. Right after the restart, we can see that the internal logs are being received by our Splunk Indexers in AWS. This clearly indicates that the HTTP Out connection is working as expected and data is flowing from the Universal Forwarder to the Load Balancer and through to our Splunk Indexers.

To demonstrate the roaming use case, we have written a small PowerShell script that will run on the laptop. The PowerShell script will generate events printing the current IP address, user, location, city, etc. The Splunk Universal Forwarder will execute this PowerShell script as a scripted input and read the events generated by it. Now, when we search within our Splunk environment we can see that the events being generated by the PowerShell script are flowing correctly, and continuously, to our Splunk Indexers. The laptop connects to the Load Balancer via a home network with no special requirements for network routing or rules.
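The PowerShell script itself is not shown here. As a rough, hypothetical analogue, the Python sketch below prints one key=value event with the time, user, hostname and outbound IP address; Splunk indexes whatever a scripted input writes to stdout. It leaves out the location and city lookup, which would need an external geolocation service, and it assumes a separately installed Python interpreter, since the Universal Forwarder does not bundle one (PowerShell avoids that dependency on Windows laptops).

```python
import datetime
import getpass
import socket


def local_ip() -> str:
    """Best-effort IPv4 address of the interface used for outbound traffic."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # connect() on a UDP socket only selects a route; no packet is sent.
            # 192.0.2.1 is a reserved documentation (TEST-NET-1) address.
            s.connect(("192.0.2.1", 80))
            return s.getsockname()[0]
        except OSError:  # no route, e.g. while offline
            return "127.0.0.1"


def current_user() -> str:
    try:
        return getpass.getuser()
    except (KeyError, OSError):  # no login env vars and no passwd entry
        return "unknown"


def make_event() -> str:
    """One key=value event line with a UTC timestamp."""
    now = datetime.datetime.now(datetime.timezone.utc).isoformat()
    return (f'{now} user="{current_user()}" host="{socket.gethostname()}" '
            f'ip="{local_ip()}"')


if __name__ == "__main__":
    print(make_event(), flush=True)
```

When the laptop moves between networks, the ip value in successive events changes, which is the behaviour the test below relies on.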

Let’s now move to a different network by tethering the laptop to a mobile phone for Internet connectivity, something that may be common for people on the road or in areas with minimal Wi-Fi access. What we now observe is that data forwarding to our Splunk Indexers continues without interruption, even though we are on a completely new network with its own infrastructure, connectivity rules, etc. The events show that the location and IP address of the laptop changed, yet the flow of events from the laptop was not interrupted.

This configuration could now be deployed to an entire fleet of roaming user devices to ensure that no matter where they are or what network they are on, there is continuous delivery of events using an Internet-facing Load Balancer. This will help IT and Security teams make sure they have the necessary information at all times to support, and protect, their corporate devices.



2020 DI Holiday Gift: Focusing on our employees and small businesses

At Discovered Intelligence, we are always excited to enter the holiday season as an opportunity to celebrate our employees’ contributions to a successful year by connecting with everyone over a big holiday dinner and surprising them with fun, and cool, holiday gifts. In past years, these holiday gifts have generally been technology-focused items sourced from major e-commerce sites like Amazon or BestBuy.

Well, after all of the challenges faced as a business and a society throughout 2020, we decided to take a slightly different approach to this season’s employee holiday gift by focusing on supporting small businesses and our most important asset, our employees.

Locally sourced products from the Niagara Region of Ontario.
An extra week of vacation in 2021 for each employee.

For this season’s holiday gift, we carefully crafted a gift bag for each of our employees, sourcing products from various small businesses around the Niagara Region, just outside of Toronto, to make up a very tasty and warm (it’s Canada after all) gift bag. The gift bag was finished off with the one thing we felt each of our amazing employees could use most in 2021, more time to connect with family and friends: each employee has received an additional week of vacation for 2021.

Read on to find out more about each item in our employee holiday bag and the small business it came from. We urge everyone to continue supporting small businesses in 2021!

Discovered Intelligence Hoodie
o  Sourced from A & E Custom Apparel in Vineland, Ontario
o  A & E is a local business that provides custom apparel and gifts to individuals and businesses. All work is done in-house. They provide embroidery, direct to garment printing, sublimation, heat vinyl and screen printing services.
o  https://www.aecustom.net/

Sour Cherry and Peach Jams
o  Sourced from Southridge Jam Co in Vineland, Ontario
o  The Southridge Jam Co. is a social enterprise in Niagara that provides individuals who have recently experienced homelessness the opportunity to develop job training and life skills. The jam is made by formerly homeless using locally grown fruit that is donated by farmers.
o  https://southridgejam.com/

Wildflower Honey
o  Sourced from Rosewood Estates Winery in Beamsville, Ontario
o  With 88 years of family beekeeping experience, they make all of their honey locally using innovative and sustainable practices to maintain healthy and productive hives. Keeping their bees happy and healthy is their top priority.
o  https://www.rosewoodwine.com/

Dark Chocolate Cherries
o  Sourced from Cherry Lane farm in Vineland, Ontario
o  Since 1907, Cherry Lane has been providing tart cherries, tart cherry juice concentrate, and a variety of other fruit products to consumers. The members of the Smith family, who own and operate Cherry Lane, have always been proud to be Ontario fruit farmers.
o  https://cherrylane.net/

Parmesan & Rosemary Shortbreads
o  Sourced from Provisions Food Company in Beamsville, Ontario
o  In 2012, Lori McDonald decided to combine her background with her passion for the vineyards, orchards, and farms of the beautiful Niagara growing region of Canada. What began as Lori canning jams in her kitchen has grown into a talented and passionate team making Provisions Food Company products in their own production facility.
o  https://provisionsfoodcompany.com/

Maple Syrup and Kettle Corn
o  Sourced from White Meadows Farms in St Catharines, Ontario
o  White Meadows Farms has been around since 1937 and makes these products from its maple trees and locally grown corn. They craft the very best maple syrup from the sap they collect from over 150 acres of sugar bush in Niagara, and the popping corn is harvested from their own 2.5 acres of cobs.
o  https://whitemeadowsfarms.com/


Interested in learning more about working at Discovered Intelligence? Good news, we’re hiring! Click here for more details.