What is fluent-bit:
Quoting the documentation as-is: it is an open-source telemetry agent specifically designed to efficiently handle the challenges of collecting and processing telemetry data across a wide range of environments, from constrained systems to complex cloud infrastructures. Managing telemetry data from various sources and formats can be a constant challenge, particularly when performance is a critical factor.
What does the data pipeline for fluent-bit look like:
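At a high level, the stages fluent-bit moves records through look like this (a simplified sketch of the documented pipeline):

```
Input -> Parser -> Filter -> Buffer -> Router -> Output
```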
Now try mapping that flow to the ConfigMap fluent-bit uses, i.e. log-config:
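A minimal sketch of what such a ConfigMap could look like (the paths, tags, and output settings below are illustrative assumptions, not the values from the actual log-config):

```yaml
# Illustrative sketch only; the real log-config will differ per deployment.
apiVersion: v1
kind: ConfigMap
metadata:
  name: log-config
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush        1
        Log_Level    info
        Parsers_File parsers.conf

    [INPUT]
        Name    tail
        Path    /var/log/containers/*.log
        Tag     kube.*
        Parser  json

    [FILTER]
        Name    kubernetes
        Match   kube.*

    [OUTPUT]
        Name    loki
        Match   *
        Host    loki-write
        Port    3100
```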
Notes on the sections of the embedded fluent-bit conf above:
- Service defines the global behaviour of the fluent-bit engine.
- Input determines where fluent-bit will collect data from. We are using the tail plugin in IB.
- Filters let you manipulate the data coming in from the input source. This is the section where we will inject obfuscation.
- Parser is where you can structure unstructured data. It supports two types of formatting: JSON and regular expression (see the parser sketch after this list).
- Output is where fluent-bit flushes the data it collected from the input. In our case we are pushing it to the loki-write module.
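To illustrate the two parser formats, a parsers.conf could look like the sketch below; it could sit under a parsers.conf key in the same ConfigMap. The parser names and the regex are assumptions, not taken from the actual log-config:

```
[PARSER]
    Name   json
    Format json

[PARSER]
    Name        simple_line
    Format      regex
    Regex       ^(?<time>[^ ]+ [^ ]+) (?<level>[A-Z]+) +(?<message>.*)$
    Time_Key    time
    Time_Format %Y-%m-%d %H:%M:%S
```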
How to utilise the filter section for obfuscation:
Although Nightfall is a third-party plugin with various masking utilities already defined, and it can be plugged into fluent-bit to detect many kinds of sensitive information (there is a long list of patterns) in ingested logs, we will be using a Lua script for our demo. It avoids introducing any new plugin on top of the already baked-in modules.
The filter section would need a new entry like the one below:
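A sketch of such an entry, assuming the Lua script file is named obfuscate.lua and exposes a function called obfuscate (both names are illustrative):

```
[FILTER]
    Name    lua
    Match   *
    script  obfuscate.lua
    call    obfuscate
```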
The script and method can be embedded into the ConfigMap as below:
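A minimal sketch of such a script, assuming the raw line sits in the record's log field and masking only e-mail addresses and IPv4 addresses; in the ConfigMap it would go under a data key such as obfuscate.lua:

```lua
-- obfuscate.lua (illustrative sketch): masks e-mail addresses and IPv4
-- addresses found in the "log" field of each record.
function obfuscate(tag, timestamp, record)
    local log = record["log"]
    if log == nil then
        -- Nothing to sanitise; 0 tells fluent-bit to keep the record unmodified.
        return 0, timestamp, record
    end
    -- Mask e-mail addresses.
    log = string.gsub(log, "[%w%.%-_]+@[%w%.%-_]+", "***@***")
    -- Mask IPv4 addresses.
    log = string.gsub(log, "%d+%.%d+%.%d+%.%d+", "x.x.x.x")
    record["log"] = log
    -- 1 tells fluent-bit the record was modified and should be kept.
    return 1, timestamp, record
end
```

More patterns (card numbers, tokens, hostnames, and so on) can be added as extra gsub calls in the same function.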
Results:
Now let's say your UDF is emitting certain details in the log:
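For example (a purely hypothetical log line):

```
2024-05-14 10:32:07 INFO  udf-runner: notifying user john.doe@example.com about job 4711
```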
It gets sanitised as below:
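With the Lua filter from above in place, roughly:

```
2024-05-14 10:32:07 INFO  udf-runner: notifying user ***@*** about job 4711
```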
Or, for that matter, there are IPs being logged for the different services app-tasks is trying to communicate with:
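Again with hypothetical values, a line such as:

```
2024-05-14 10:32:09 WARN  app-tasks: connection to metadata-service at 10.42.17.9:8080 timed out, retrying
```

would be flushed as:

```
2024-05-14 10:32:09 WARN  app-tasks: connection to metadata-service at x.x.x.x:8080 timed out, retrying
```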
Notes:
- The script can be enhanced further to catch as many patterns as the team wants to sanitise in the generated logs.
- The output can also be modified to emit the same log to standard output (see the sketch after this list).
- This avoids explicit sanitisation of logs after collection, eventually saving time.
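For the second note, a sketch of an additional output section that mirrors the same records to standard output (illustrative, using fluent-bit's built-in stdout plugin):

```
[OUTPUT]
    Name   stdout
    Match  *
```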