Getting Started with Fluentd: Unified Logging for Modern Applications

When you run applications across multiple servers, containers, and cloud services, logs quickly become… chaos. Different formats, different locations, and no easy way to search or analyze them.

Fluentd is a powerful open-source data collector that helps you solve exactly this problem by unifying log collection and routing from many sources to many destinations.

In this post, we’ll cover:

  • What Fluentd is and why you’d use it

  • Key concepts (inputs, filters, outputs, buffers)

  • How to install Fluentd

  • A simple architecture diagram and explanation


What is Fluentd?

Fluentd is an open-source log and data collector. It sits between your applications and your log storage/analytics systems and acts as a unified logging layer.

Think of Fluentd as:

“A smart pipe that collects data from many places, transforms it, and sends it where you want.”

Why use Fluentd?

Some reasons Fluentd is popular:

  • Unified logging: Collect logs from apps, containers (like Docker/Kubernetes), system logs, Nginx, Apache, etc.

  • Flexible routing: Send data to Elasticsearch, OpenSearch, Loki, Kafka, S3, CloudWatch, BigQuery, Datadog, Splunk, and many more.

  • Plugin-based: a small core plus hundreds of plugins for inputs, filters, and outputs.

  • JSON by default: Fluentd structures log events as JSON records, making them easy to parse, enrich, and query.

  • Reliable: Supports buffering, retries, and backpressure so you don’t lose logs when a destination is down.


Fluentd Core Concepts

Fluentd’s configuration is typically done in a file like /etc/fluent/fluent.conf. It’s built around a simple pipeline idea: Input → Filter → Output, with buffering in between.
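Put together, even a tiny configuration follows that shape. Here's a minimal sketch (paths, tags, and plugin choices below are placeholders, not recommendations):

# A minimal skeleton showing the Input -> Filter -> Output shape.

# Input: tail a log file and tag each event
<source>
  @type tail
  path /var/log/myapp/app.log
  pos_file /var/log/td-agent/myapp.pos
  tag myapp.log
  <parse>
    @type none
  </parse>
</source>

# Filter: enrich events whose tag matches myapp.**
<filter myapp.**>
  @type record_transformer
  <record>
    team backend
  </record>
</filter>

# Output: route matching events (stdout is handy for testing)
<match myapp.**>
  @type stdout
</match>

The sections below look at each stage in turn.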

1. Inputs

Inputs define where Fluentd reads data from. Examples:

  • Tail a log file

  • Listen on a TCP/UDP port

  • Read from syslog

  • Receive logs from another Fluentd or Fluent Bit instance

Example (tail input):

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.pos
  tag nginx.access
  <parse>
    @type nginx
  </parse>
</source>
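The other bullets map to their own input plugins. As a quick sketch, a syslog listener (the port and tag here are arbitrary choices) looks like:

# Receive syslog messages on port 5140 (UDP by default)
<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  tag system.syslog
</source>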

2. Filters

Filters modify or enrich logs as they pass through. You can:

  • Add Kubernetes metadata

  • Mask sensitive fields

  • Rename or remove fields

  • Change log format

Example (adding a field):

<filter nginx.access>
  @type record_transformer
  <record>
    environment production
  </record>
</filter>
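One of the bullets above mentions masking sensitive fields. Here's a hedged sketch using record_transformer's embedded Ruby mode; the app.** tag and the email field are assumptions about your logs, not something Fluentd provides:

# Mask the local part of a hypothetical "email" field
<filter app.**>
  @type record_transformer
  enable_ruby true
  <record>
    email ${record["email"].to_s.gsub(/.+@/, "***@")}
  </record>
</filter>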

3. Outputs

Outputs define where logs go: Elasticsearch, Loki, S3, Kafka, stdout, etc.

Example (send to Elasticsearch):

<match nginx.access>
  @type elasticsearch
  host elasticsearch.logging.svc
  port 9200
  index_name nginx-access
  # type_name is ignored on modern Elasticsearch (mapping types were
  # deprecated in 7.x and removed in 8); shown here for older clusters
  type_name _doc
</match>
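Routing isn't limited to one destination per tag. Here's a sketch of fan-out using the built-in copy output, sending the same events to Elasticsearch and to stdout (the latter purely for debugging):

<match nginx.access>
  @type copy
  <store>
    @type elasticsearch
    host elasticsearch.logging.svc
    port 9200
    index_name nginx-access
  </store>
  <store>
    @type stdout
  </store>
</match>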

4. Buffering & Reliability

Fluentd uses buffers to handle spikes and destination downtime.

  • Memory buffer: Fast but limited; good for small/low-risk workloads

  • File buffer: Stores data on disk; better for reliability

Example (file buffer inside an output):

<match nginx.access>
  @type elasticsearch
  host elasticsearch.logging.svc
  port 9200
  index_name nginx-access
  <buffer>
    @type file
    path /var/log/td-agent/buffer/nginx.access
    flush_interval 5s
    retry_forever true
  </buffer>
</match>
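Beyond flush_interval and retry_forever, the <buffer> section accepts sizing and retry knobs. A sketch with illustrative values (tune them for your workload; see the Fluentd buffer docs):

<buffer>
  @type file
  path /var/log/td-agent/buffer/nginx.access
  # Size of each on-disk chunk and total disk budget for the buffer
  chunk_limit_size 8MB
  total_limit_size 1GB
  # How often to flush, and how far retry backoff may grow
  flush_interval 5s
  retry_max_interval 30s
  retry_forever true
</buffer>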

Installing Fluentd

There are several ways to install Fluentd depending on your environment.

Option 1: Install Fluentd (td-agent) on Linux (Ubuntu/Debian)

The easiest way is usually via the Fluentd/td-agent package.

Step 1: Install via package (example: Ubuntu)

For modern Ubuntu/Debian, you typically:

  1. Add the Fluentd repository

  2. Install td-agent (the production-ready Fluentd package)

Example (generic pattern – adjust for your OS version as needed):

# 1. Install prerequisites
sudo apt-get update

# 2. Install Fluentd via td-agent (example via the treasuredata repo)
#    (Exact repo commands depend on your OS version; check the Fluentd docs)
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-focal-td-agent4.sh | sh

Replace focal/td-agent4 with the script matching your distribution and release. Note that recent Fluentd releases ship as fluent-package, the successor to td-agent, so check the official install docs for the current package name.

Step 2: Manage the Fluentd (td-agent) service

# Start the service
sudo systemctl start td-agent

# Enable on boot
sudo systemctl enable td-agent

# Check status
sudo systemctl status td-agent

Step 3: Configuration file

The main configuration for td-agent is usually at:

/etc/td-agent/td-agent.conf

You edit this file to add <source>, <filter>, and <match> sections as needed, then restart:

sudo systemctl restart td-agent
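Before restarting, it's worth validating the config. Assuming the standard td-agent layout, a dry run looks like this:

# Parse and validate the configuration without starting the daemon
sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf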

Option 2: Run Fluentd with Docker

If you prefer containers:

Step 1: Pull the Fluentd image

docker pull fluent/fluentd:v1.17-1

(Tag is just an example; use a recent stable tag from Docker Hub.)

Step 2: Create a Fluentd configuration file

Create a local fluent.conf:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match **>
  @type stdout
</match>

This configuration:

  • Listens on port 24224 for incoming logs (Fluentd forward protocol)

  • Prints everything to stdout (useful for testing)

Step 3: Run the container

docker run -d \
  -p 24224:24224 \
  -v $(pwd)/fluent.conf:/fluentd/etc/fluent.conf \
  --name fluentd \
  fluent/fluentd:v1.17-1

Now any client that speaks Fluentd’s forward protocol (like Fluent Bit) can send logs to this instance.
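Docker itself speaks that protocol via its fluentd log driver, so you can smoke-test the setup with a throwaway container (the docker.test tag is arbitrary):

# Send a test log line through Docker's fluentd log driver
docker run --rm \
  --log-driver=fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt tag=docker.test \
  alpine echo "hello fluentd"

# The line should show up in the Fluentd container's output
docker logs fluentd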


Option 3: Kubernetes (High Level)

In Kubernetes, Fluentd is usually deployed as a DaemonSet so that each node runs a Fluentd pod that:

  • Mounts /var/log/containers or /var/log/pods

  • Parses container logs

  • Sends them to a central backend (e.g., Elasticsearch, Loki, OpenSearch, etc.)

Most logging stacks (EFK, Elastic, Loki, etc.) provide ready-made Helm charts or YAML manifests you can install and customize.
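For example, the official fluent Helm charts can be installed like this (a sketch; chart names and default values evolve, so check https://github.com/fluent/helm-charts first):

# Add the official fluent chart repository and install the Fluentd chart
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm install fluentd fluent/fluentd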


Fluentd Architecture Diagram (Text Explanation)

Here’s a simple conceptual architecture you can visualize or convert into a proper diagram tool (Draw.io, Mermaid, Lucidchart, etc.):

+-------------------------------+
|         Applications          |
| (Services, APIs, Batch Jobs)  |
+-------------------------------+
               |
               |  Logs (file, stdout, syslog)
               v
+-------------------------------+
|            Fluentd            |
|    (Running on each node)     |
+-------------------------------+
|  [Input plugins]              |
|   - tail: /var/log/app.log    |
|   - tail: /var/log/nginx...   |
|   - forward (from Fluent Bit) |
|                               |
|  [Filter plugins]             |
|   - parse / format logs       |
|   - add metadata (env, pod)   |
|   - mask sensitive fields     |
|                               |
|  [Buffer]                     |
|   - memory / file             |
|                               |
|  [Output plugins]             |
|   - Elasticsearch             |
|   - S3 / GCS                  |
|   - Kafka                     |
|   - Datadog / Splunk          |
+-------------------------------+
               |
               v
+-------------------------------+
|   Log Storage / Analytics     |
| (e.g., ES, Loki, SIEM, etc.)  |
+-------------------------------+

If you want a Mermaid diagram (for Markdown-based blogs):

flowchart LR
  subgraph FD[Fluentd]
    B1["Input Plugins<br/>(tail, syslog, forward)"]
    B2["Filter Plugins<br/>(parse, enrich, mask)"]
    B3["Buffer<br/>(memory / file)"]
    B4["Output Plugins<br/>(ES, S3, Kafka, etc.)"]
    B1 --> B2 --> B3 --> B4
  end
  A["Applications<br/>Services / APIs"] -->|Logs| B1
  B4 --> C["Log Storage / Analytics<br/>Elasticsearch / Loki / etc."]

You can paste this into any Markdown renderer that supports Mermaid (GitHub, GitLab, Obsidian, many blog engines, etc.).


Minimal End-to-End Example

Here’s a tiny example of a full td-agent.conf snippet that:

  • Tails an app log

  • Adds an environment field

  • Sends data to Elasticsearch

# Input: Tail application log file
<source>
  @type tail
  path /var/log/myapp/app.log
  pos_file /var/log/td-agent/myapp.pos
  tag myapp.log
  <parse>
    @type json
  </parse>
</source>

# Filter: Add environment field
<filter myapp.log>
  @type record_transformer
  <record>
    environment production
  </record>
</filter>

# Output: Send to Elasticsearch
<match myapp.log>
  @type elasticsearch
  host elasticsearch.logging.svc
  port 9200
  index_name myapp-logs
  <buffer>
    @type file
    path /var/log/td-agent/buffer/myapp
    flush_interval 5s
  </buffer>
</match>
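After restarting td-agent with this config, you can confirm events are flowing by watching Fluentd's own log (the path assumes the td-agent package layout):

# Watch Fluentd's own log for tail/flush activity and errors
sudo tail -f /var/log/td-agent/td-agent.log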

Wrap-Up

Fluentd gives you:

  • A unified way to collect and route logs

  • Flexibility via a huge plugin ecosystem

  • Reliability with buffering and retries

From a single node setup to a large-scale Kubernetes cluster, Fluentd can grow with your infrastructure.

From here, you could:

  • Tailor this setup to a specific stack (e.g., Fluentd + Elasticsearch + Kibana, or Fluentd + Loki + Grafana)

  • Add Kubernetes-specific YAML/Helm manifests

  • Expand the architecture diagram into a more detailed microservices/logging view of your environment
