Getting Started with Fluentd: Unified Logging for Modern Applications

When you run applications across multiple servers, containers, and cloud services, logs quickly become… chaos. Different formats, different locations, and no easy way to search or analyze them.

Fluentd is a powerful open-source data collector that helps you solve exactly this problem by unifying log collection and routing from many sources to many destinations.

In this post, we’ll cover:

  • What Fluentd is and why you’d use it

  • Key concepts (inputs, filters, outputs, buffers)

  • How to install Fluentd

  • A simple architecture diagram and explanation


What is Fluentd?

Fluentd is an open-source log and data collector. It sits between your applications and your log storage/analytics systems and acts as a unified logging layer.

Think of Fluentd as:

“A smart pipe that collects data from many places, transforms it, and sends it where you want.”

Why use Fluentd?

Some reasons Fluentd is popular:

  • Unified logging: Collect logs from apps, containers (like Docker/Kubernetes), system logs, Nginx, Apache, etc.

  • Flexible routing: Send data to Elasticsearch, OpenSearch, Loki, Kafka, S3, CloudWatch, BigQuery, Datadog, Splunk, and many more.

  • Plugin-based: a small core plus hundreds of plugins for inputs, filters, and outputs.

  • JSON by default: Fluentd structures log events as JSON records, making them easy to parse, enrich, and query.

  • Reliable: Supports buffering, retries, and backpressure so you don’t lose logs when a destination is down.


Fluentd Core Concepts

Fluentd’s configuration is typically done in a file like /etc/fluent/fluent.conf. It’s built around a simple pipeline idea: Input → Filter → Output, with buffering in between.
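Put together, even a tiny configuration follows that shape. Here's a minimal sketch (paths, tags, and plugin choices below are placeholders, not recommendations):

# A minimal skeleton showing the Input -> Filter -> Output shape.

# Input: tail a log file and tag each event
<source>
  @type tail
  path /var/log/myapp/app.log
  pos_file /var/log/td-agent/myapp.pos
  tag myapp.log
  <parse>
    @type none
  </parse>
</source>

# Filter: enrich events whose tag matches myapp.**
<filter myapp.**>
  @type record_transformer
  <record>
    team backend
  </record>
</filter>

# Output: route matching events (stdout is handy for testing)
<match myapp.**>
  @type stdout
</match>

The sections below look at each stage in turn.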

1. Inputs

Inputs define where Fluentd reads data from. Examples:

  • Tail a log file

  • Listen on a TCP/UDP port

  • Read from syslog

  • Receive logs from another Fluentd or Fluent Bit instance

Example (tail input):

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.pos
  tag nginx.access
  <parse>
    @type nginx
  </parse>
</source>
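The other bullets map to their own input plugins. As a quick sketch, a syslog listener (the port and tag here are arbitrary choices) looks like:

# Receive syslog messages on port 5140 (UDP by default)
<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  tag system.syslog
</source>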

2. Filters

Filters modify or enrich logs as they pass through. You can:

  • Add Kubernetes metadata

  • Mask sensitive fields

  • Rename or remove fields

  • Change log format

Example (adding a field):

<filter nginx.access>
  @type record_transformer
  <record>
    environment production
  </record>
</filter>
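One of the bullets above mentions masking sensitive fields. Here's a hedged sketch using record_transformer's embedded Ruby mode; the app.** tag and the email field are assumptions about your logs, not something Fluentd provides:

# Mask the local part of a hypothetical "email" field
<filter app.**>
  @type record_transformer
  enable_ruby true
  <record>
    email ${record["email"].to_s.gsub(/.+@/, "***@")}
  </record>
</filter>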

3. Outputs

Outputs define where logs go: Elasticsearch, Loki, S3, Kafka, stdout, etc.

Example (send to Elasticsearch):

<match nginx.access>
  @type elasticsearch
  host elasticsearch.logging.svc
  port 9200
  index_name nginx-access
  # type_name is ignored on modern Elasticsearch (mapping types were
  # deprecated in 7.x and removed in 8); shown here for older clusters
  type_name _doc
</match>
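Routing isn't limited to one destination per tag. Here's a sketch of fan-out using the built-in copy output, sending the same events to Elasticsearch and to stdout (the latter purely for debugging):

<match nginx.access>
  @type copy
  <store>
    @type elasticsearch
    host elasticsearch.logging.svc
    port 9200
    index_name nginx-access
  </store>
  <store>
    @type stdout
  </store>
</match>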

4. Buffering & Reliability

Fluentd uses buffers to handle spikes and destination downtime.

  • Memory buffer: Fast but limited; good for small/low-risk workloads

  • File buffer: Stores data on disk; better for reliability

Example (file buffer inside an output):

<match nginx.access>
  @type elasticsearch
  host elasticsearch.logging.svc
  port 9200
  index_name nginx-access
  <buffer>
    @type file
    path /var/log/td-agent/buffer/nginx.access
    flush_interval 5s
    retry_forever true
  </buffer>
</match>
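Beyond flush_interval and retry_forever, the <buffer> section accepts sizing and retry knobs. A sketch with illustrative values (tune them for your workload; see the Fluentd buffer docs):

<buffer>
  @type file
  path /var/log/td-agent/buffer/nginx.access
  # Size of each on-disk chunk and total disk budget for the buffer
  chunk_limit_size 8MB
  total_limit_size 1GB
  # How often to flush, and how far retry backoff may grow
  flush_interval 5s
  retry_max_interval 30s
  retry_forever true
</buffer>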

Installing Fluentd

There are several ways to install Fluentd depending on your environment.

Option 1: Install Fluentd (td-agent) on Linux (Ubuntu/Debian)

The easiest way is usually via the Fluentd/td-agent package.

Step 1: Install via package (example: Ubuntu)

For modern Ubuntu/Debian, you typically:

  1. Add the Fluentd repository

  2. Install td-agent (the production-ready Fluentd package)

Example (generic pattern – adjust for your OS version as needed):

# 1. Install prerequisites
sudo apt-get update

# 2. Install Fluentd via td-agent (example via the treasuredata repo)
#    (Exact repo commands depend on your OS version; check the Fluentd docs)
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-focal-td-agent4.sh | sh

Replace focal/td-agent4 with the script matching your distribution and release. Note that recent Fluentd releases ship as fluent-package, the successor to td-agent, so check the official install docs for the current package name.

Step 2: Manage the Fluentd (td-agent) service

# Start the service
sudo systemctl start td-agent

# Enable on boot
sudo systemctl enable td-agent

# Check status
sudo systemctl status td-agent

Step 3: Configuration file

The main configuration for td-agent is usually at:

/etc/td-agent/td-agent.conf

You edit this file to add <source>, <filter>, and <match> sections as needed, then restart:

sudo systemctl restart td-agent
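Before restarting, it's worth validating the config. Assuming the standard td-agent layout, a dry run looks like this:

# Parse and validate the configuration without starting the daemon
sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf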

Option 2: Run Fluentd with Docker

If you prefer containers:

Step 1: Pull the Fluentd image

docker pull fluent/fluentd:v1.17-1

(Tag is just an example; use a recent stable tag from Docker Hub.)

Step 2: Create a Fluentd configuration file

Create a local fluent.conf:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match **>
  @type stdout
</match>

This configuration:

  • Listens on port 24224 for incoming logs (Fluentd forward protocol)

  • Prints everything to stdout (useful for testing)

Step 3: Run the container

docker run -d \
  -p 24224:24224 \
  -v $(pwd)/fluent.conf:/fluentd/etc/fluent.conf \
  --name fluentd \
  fluent/fluentd:v1.17-1

Now any client that speaks Fluentd’s forward protocol (like Fluent Bit) can send logs to this instance.
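Docker itself speaks that protocol via its fluentd log driver, so you can smoke-test the setup with a throwaway container (the docker.test tag is arbitrary):

# Send a test log line through Docker's fluentd log driver
docker run --rm \
  --log-driver=fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt tag=docker.test \
  alpine echo "hello fluentd"

# The line should show up in the Fluentd container's output
docker logs fluentd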


Option 3: Kubernetes (High Level)

In Kubernetes, Fluentd is usually deployed as a DaemonSet so that each node runs a Fluentd pod that:

  • Mounts /var/log/containers or /var/log/pods

  • Parses container logs

  • Sends them to a central backend (e.g., Elasticsearch, Loki, OpenSearch, etc.)

Most logging stacks (EFK, Elastic, Loki, etc.) provide ready-made Helm charts or YAML manifests you can install and customize.
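For example, the official fluent Helm charts can be installed like this (a sketch; chart names and default values evolve, so check https://github.com/fluent/helm-charts first):

# Add the official fluent chart repository and install the Fluentd chart
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm install fluentd fluent/fluentd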


Fluentd Architecture Diagram (Text Explanation)

Here’s a simple conceptual architecture you can visualize or convert into a proper diagram tool (Draw.io, Mermaid, Lucidchart, etc.):

+-------------------------------+
|         Applications          |
| (Services, APIs, Batch Jobs)  |
+-------------------------------+
               |
               |  Logs (file, stdout, syslog)
               v
+-------------------------------+
|            Fluentd            |
|    (Running on each node)     |
+-------------------------------+
|  [Input plugins]              |
|   - tail: /var/log/app.log    |
|   - tail: /var/log/nginx...   |
|   - forward (from Fluent Bit) |
|                               |
|  [Filter plugins]             |
|   - parse / format logs       |
|   - add metadata (env, pod)   |
|   - mask sensitive fields     |
|                               |
|  [Buffer]                     |
|   - memory / file             |
|                               |
|  [Output plugins]             |
|   - Elasticsearch             |
|   - S3 / GCS                  |
|   - Kafka                     |
|   - Datadog / Splunk          |
+-------------------------------+
               |
               v
+-------------------------------+
|   Log Storage / Analytics     |
| (e.g., ES, Loki, SIEM, etc.)  |
+-------------------------------+

If you want a Mermaid diagram (for Markdown-based blogs):

flowchart LR
  subgraph FD[Fluentd]
    B1["Input Plugins<br/>(tail, syslog, forward)"]
    B2["Filter Plugins<br/>(parse, enrich, mask)"]
    B3["Buffer<br/>(memory / file)"]
    B4["Output Plugins<br/>(ES, S3, Kafka, etc.)"]
    B1 --> B2 --> B3 --> B4
  end
  A["Applications<br/>Services / APIs"] -->|Logs| B1
  B4 --> C["Log Storage / Analytics<br/>Elasticsearch / Loki / etc."]

You can paste this into any Markdown renderer that supports Mermaid (GitHub, GitLab, Obsidian, many blog engines, etc.).


Minimal End-to-End Example

Here’s a tiny example of a full td-agent.conf snippet that:

  • Tails an app log

  • Adds an environment field

  • Sends data to Elasticsearch

# Input: Tail application log file
<source>
  @type tail
  path /var/log/myapp/app.log
  pos_file /var/log/td-agent/myapp.pos
  tag myapp.log
  <parse>
    @type json
  </parse>
</source>

# Filter: Add environment field
<filter myapp.log>
  @type record_transformer
  <record>
    environment production
  </record>
</filter>

# Output: Send to Elasticsearch
<match myapp.log>
  @type elasticsearch
  host elasticsearch.logging.svc
  port 9200
  index_name myapp-logs
  <buffer>
    @type file
    path /var/log/td-agent/buffer/myapp
    flush_interval 5s
  </buffer>
</match>
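After restarting td-agent with this config, you can confirm events are flowing by watching Fluentd's own log (the path assumes the td-agent package layout):

# Watch Fluentd's own log for tail/flush activity and errors
sudo tail -f /var/log/td-agent/td-agent.log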

Wrap-Up

Fluentd gives you:

  • A unified way to collect and route logs

  • Flexibility via a huge plugin ecosystem

  • Reliability with buffering and retries

From a single node setup to a large-scale Kubernetes cluster, Fluentd can grow with your infrastructure.

From here, you could:

  • Tailor this setup to a specific stack (e.g., Fluentd + Elasticsearch + Kibana, or Fluentd + Loki + Grafana)

  • Add Kubernetes-specific YAML/Helm manifests

  • Expand the architecture diagram into a more detailed microservices/logging view of your environment
