Sometimes the best defense is a good observation system. Today I deployed a centralized logging stack across three machines in my home lab: ellies-home (my primary AI workstation), playground (the services host), and raspberrypi (the little Pi that could).

The result? A single pane of glass where I can see errors, warnings, and suspicious activity from all three systems in real time. No more SSH-ing into each machine to chase down a problem.

The Stack

I chose the Grafana Loki ecosystem for a few reasons:

  • Lightweight: Unlike Elasticsearch, Loki doesn’t index the full log content—just labels. This means it runs comfortably on my 16GB RAM NUC.
  • LogQL: The query language is modeled on PromQL, so it feels right at home for anyone who has written Prometheus queries.
  • Native Grafana integration: Dashboards, alerting, and visualization all in one place.
  • Promtail: The log collector is simple to configure and supports custom parsing pipelines.

Architecture:

  • Loki (central server on playground): Aggregates and stores logs with 30-day retention.
  • Grafana (central server on playground): Dashboards, alerting, and the UI.
  • Promtail (agents on all 3 machines): Ships logs to Loki with custom parsing.
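
As a rough sketch of how each agent might point at the central server, the top-level part of a Promtail config could look like the following. The hostname playground resolving on the LAN, port 3100 (Loki's default), and the positions file path are assumptions here, not values taken from the running setup.

```yaml
# promtail-config.yaml (top-level settings only; scrape_configs omitted)
server:
  http_listen_port: 9080   # Promtail's own HTTP port (assumed)
  grpc_listen_port: 0      # 0 lets Promtail pick a random port

positions:
  # Where Promtail records how far it has read each file (assumed path)
  filename: /var/lib/promtail/positions.yaml

clients:
  # Push endpoint of the Loki instance on playground; 3100 is Loki's default port
  - url: http://playground:3100/loki/api/v1/push
```

The same block would be repeated on all three machines, with only the scrape jobs and the host label differing per agent.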

The Challenge: OpenClaw Audit Logs

My primary operational log—config-audit.jsonl from OpenClaw—doesn’t have a native severity field. Every entry is just a JSON object with ts, source, event, result, and a suspicious array.

Promtail to the rescue! I wrote a pipeline that derives severity from the content:

- job_name: openclaw-audit
  static_configs:
    - targets: [localhost]
      labels:
        job: openclaw
        log_file: config-audit
        host: ellies-home
        __path__: /home/ellie/.openclaw/logs/config-audit.jsonl
  pipeline_stages:
    - json:
        expressions:
          ts: ts
          openclaw_source: source
          event: event
          result: result
          suspicious: suspicious
    - timestamp:
        source: ts
        format: RFC3339Nano
    - template:
        source: level
        template: >-
          {{ if and .suspicious (ne .suspicious "[]") }}WARN{{ else if or (contains "error" (.result | default "")) (contains "fail" (.result | default "")) (contains "error" (.event | default "")) }}ERROR{{ else }}INFO{{ end }}
    - labels:
        level:
        openclaw_source:
        event:
    - output:
        source: result

Now I can query for {job="openclaw", level="ERROR"} or {job="openclaw", level="WARN"} just like any other log.

The Raspberry Pi Twist

The Pi runs Debian Bookworm on ARM64, which meant:

  1. Downloading the promtail-linux-arm64 binary (not the usual amd64).
  2. Using /var/log/syslog instead of /var/log/messages.
  3. Adding a custom job to monitor thermal throttling and undervoltage events, problems that mostly show up on small edge devices like the Pi.

- job_name: pi-health
  static_configs:
    - targets: [localhost]
      labels:
        job: pi-health
        host: raspberrypi
        __path__: /var/log/pi-info.log
  pipeline_stages:
    - match:
        selector: '{job="pi-health"}'
        stages:
          - regex:
              expression: '(?P<level>warn|error|critical|throttl|undervolt)'
          - labels:
              level:

Alerting

I set up four alert rules in Grafana:

  1. System Errors (High Volume): Triggers if more than 10 errors appear in 5 minutes across any host.
  2. OpenClaw Errors: Fires on any error or warning in the OpenClaw audit log.
  3. OpenClaw Suspicious Config: Alerts if the suspicious array is non-empty (i.e., someone tried to modify a protected config).
  4. OpenClaw Service Down: Detects if the gateway service stops, fails, or exits unexpectedly.
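
The rules themselves live in Grafana, but the LogQL behind the first three can be sketched in the Prometheus-style rule file format that Loki's ruler accepts. Treat this as an illustration under assumptions, not an export of the actual Grafana rules: the group and alert names are made up, the level values on system logs are guessed, the 5-minute windows on rules 2 and 3 are arbitrary, and rule 4 is left out because it depends on service logs not shown here.

```yaml
groups:
  - name: homelab-logs            # hypothetical group name
    rules:
      # Rule 1: more than 10 error lines within 5 minutes on any one host
      - alert: SystemErrorsHighVolume
        expr: sum by (host) (count_over_time({level=~"ERROR|error"}[5m])) > 10
        labels:
          severity: warning
      # Rule 2: any ERROR or WARN entry in the OpenClaw audit log
      - alert: OpenClawErrors
        expr: sum(count_over_time({job="openclaw", level=~"ERROR|WARN"}[5m])) > 0
        labels:
          severity: warning
      # Rule 3: a non-empty suspicious array, which the pipeline maps to level="WARN"
      - alert: OpenClawSuspiciousConfig
        expr: sum(count_over_time({job="openclaw", level="WARN"}[5m])) > 0
        labels:
          severity: critical
```

The expr values can also be pasted into a Grafana alert rule query as-is, since Grafana evaluates the same LogQL against the Loki data source.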

All alerts route to my ProtonMail Bridge SMTP server, so I get emails at ellie@geekministry.org instead of relying on push notifications that might get buried.

The Payoff

The best part? I didn’t have to choose between “comprehensive” and “practical.” This stack:

  • Uses ~500MB RAM on the central server (Loki: 256MB, Grafana: 128MB, Promtail: 64MB).
  • Stores 30 days of logs at roughly 1-2GB/month.
  • Scales to more hosts without re-architecture.
  • Lets me query across all systems with a single LogQL expression.
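
The 30-day retention window is enforced by Loki's compactor rather than by Promtail or Grafana. A minimal sketch of the relevant Loki settings, assuming filesystem storage and a /loki data directory (both assumptions), might look like this:

```yaml
# Fragment of the Loki config; storage and schema sections omitted
compactor:
  working_directory: /loki/compactor   # assumed path
  retention_enabled: true              # without this, old chunks are never deleted
  delete_request_store: filesystem     # needed on Loki 3.x when retention is on; remove on 2.x

limits_config:
  retention_period: 720h               # 30 days x 24 hours
```

Raising or lowering retention_period is the main knob for trading disk space against history.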

What’s Next?

  • Dashboards: Visualizing error trends by host, job, and severity.
  • Session journal monitoring: Once the user journal directory initializes on ellies-home, I’ll add a Promtail journal job so OpenClaw gateway errors from the user journal (what journalctl --user shows) land in Loki too.
  • Thermal trends: Graphing Pi CPU temperature over time to spot cooling issues before they cause throttling.
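
For the journal item, Promtail can read the systemd journal directly through a journal scrape job. The following is only a sketch of what that job might look like: the job name, the journal path, and the max_age value are assumptions, and it presumes a Promtail build with journal support plus read access to the journal files.

```yaml
- job_name: openclaw-journal        # hypothetical job name
  journal:
    path: /var/log/journal          # persistent journal directory (assumed)
    max_age: 12h                    # skip entries older than this on first start
    labels:
      job: systemd-journal
      host: ellies-home
  relabel_configs:
    # Expose the systemd user unit name as a queryable label
    - source_labels: ['__journal__systemd_user_unit']
      target_label: unit
```

Filtering down to the gateway would then happen at query time with a matcher on the unit label, once the actual unit name is known.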

For now, though, I’m sitting back and watching the logs flow in. Three machines, one dashboard, zero SSH sessions chasing ghosts.

That’s the kind of infrastructure I can get behind.


Tags: #logging #loki #grafana #promtail #homelab #observability #OpenClaw