Workload Resources
Configure resource allocations for Kubeshark’s three main components: Hub, Worker (Sniffer/Tracer), and Front-end.
Component Overview
| Component | Role | Runs On |
|---|---|---|
| Hub | Central aggregator, API server, snapshot storage | Single pod |
| Worker | Traffic capture and indexing | DaemonSet (every targeted node) |
| Front | Dashboard UI | Single pod |
Hub Resources
The Hub aggregates traffic from all workers and serves the dashboard API.
tap:
  resources:
    hub:
      limits:
        cpu: ""        # No limit (default)
        memory: 5Gi
      requests:
        cpu: 50m
        memory: 50Mi
| Setting | Default | Description |
|---|---|---|
| limits.cpu | "" (unlimited) | Maximum CPU |
| limits.memory | 5Gi | Maximum memory |
| requests.cpu | 50m | Guaranteed CPU |
| requests.memory | 50Mi | Guaranteed memory |
Sizing considerations:
- Memory scales with number of concurrent connections and API call volume
- Increase limits for high-traffic clusters
- Snapshot storage is separate (see Snapshots Configuration)
Recommended Sizing by Cluster Size
The chart defaults (requests: cpu 50m, memory 50Mi) are intentionally low so a first install fits anywhere — they are not appropriate for production.
The profiles below are sized against load tests at the corresponding cluster sizes, assuming roughly 100 captured entries per second per worker (e.g. about 1k entries/s aggregate for a 10-worker cluster, about 20k for a 200-worker cluster).
| Cluster size | Workers (DaemonSet pods) | requests.cpu | requests.memory | limits.memory |
|---|---|---|---|---|
| Small | ≤10 | 250m | 4Gi | 5Gi |
| Medium | ≤50 | 1 | 4Gi | 5Gi |
| Large | ≤100 | 1500m | 4Gi | 5Gi |
| X-Large | ≤200 | 2 | 5Gi | 6Gi |
limits.cpu is intentionally not set — the chart’s default leaves it unset too. The CFS bandwidth controller that enforces CPU limits can throttle bursty workloads (the Hub’s pattern during traffic spikes and dashboard joins) even when the node has idle CPU available. Set limits.cpu only for specific reasons such as strict multi-tenant billing or hard latency SLOs.
If your per-worker entry rate is materially higher than ~100/s, or you have many concurrent dashboard clients, start from the closest profile, raise requests.cpu and both memory values proportionally, and then measure actual usage.
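As a sketch, the Medium profile from the table above (up to 50 workers) maps onto the Hub values as follows. The numbers are copied from the table, with requests.cpu written as 1000m (equivalent to 1). Treat them as a starting point and adjust after measuring actual usage.

```yaml
tap:
  resources:
    hub:
      limits:
        cpu: ""        # left unset, as recommended above
        memory: 5Gi    # Medium profile limits.memory
      requests:
        cpu: 1000m     # Medium profile requests.cpu (1 core)
        memory: 4Gi    # Medium profile requests.memory
```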
Worker Resources
Workers run on each node as a DaemonSet, capturing and indexing traffic.
Sniffer
Captures network packets:
tap:
  resources:
    sniffer:
      limits:
        cpu: ""        # No limit (default)
        memory: 5Gi
      requests:
        cpu: 50m
        memory: 50Mi
Tracer
Handles eBPF-based tracing (TLS decryption, process correlation):
tap:
  resources:
    tracer:
      limits:
        cpu: ""        # No limit (default)
        memory: 5Gi
      requests:
        cpu: 50m
        memory: 50Mi
| Setting | Default | Description |
|---|---|---|
| limits.cpu | "" (unlimited) | Maximum CPU |
| limits.memory | 5Gi | Maximum memory |
| requests.cpu | 50m | Guaranteed CPU |
| requests.memory | 50Mi | Guaranteed memory |
Sizing considerations:
- CPU usage scales with traffic volume and indexing complexity
- Memory scales with connection tracking and payload buffering
- Use Capture Filters to reduce load
Front-end Resources
The front-end serves the dashboard UI:
tap:
  resources:
    front:
      limits:
        cpu: 750m
        memory: 1Gi
      requests:
        cpu: 50m
        memory: 50Mi
The front-end is lightweight and typically doesn’t require adjustment.
Storage
Worker Storage
Each worker stores captured traffic temporarily:
tap:
  storageLimit: 5Gi   # Max storage per worker
When storage exceeds this limit, the pod is evicted and restarted.
Raw Capture Storage
Node-level FIFO buffer for raw packet capture:
tap:
  capture:
    raw:
      storageSize: 1Gi   # Per-node buffer size
The raw capture buffer (tap.capture.raw.storageSize) must be smaller than tap.storageLimit.
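For illustration, the two values shown above combine as follows. The 1Gi raw capture buffer stays below the 5Gi per-worker storage limit, which satisfies the constraint. If you raise one, check the other.

```yaml
tap:
  storageLimit: 5Gi        # per-worker storage limit
  capture:
    raw:
      storageSize: 1Gi     # per-node raw capture buffer, kept below storageLimit
```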
Snapshot Storage
Dedicated Hub storage for snapshots:
tap:
  snapshots:
    local:
      storageClass: ""     # Storage class (e.g., gp2)
      storageSize: 20Gi    # Total snapshot storage
See Raw Capture & Snapshots Configuration for details.
Traffic Sampling
Reduce resource usage by processing only a percentage of traffic:
tap:
  trafficSampleRate: 100   # 0-100, default is 100 (all traffic)
Setting trafficSampleRate: 20 processes only 20% of L4 streams.
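The 20% case described above would look like this in a values file:

```yaml
tap:
  trafficSampleRate: 20   # process 20% of L4 streams; 100 processes all traffic
```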
Health Probes
Configure liveness and readiness probes:
Hub Probes
tap:
  probes:
    hub:
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 3
Sniffer Probes
tap:
  probes:
    sniffer:
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 3
OOMKilled and Evictions
If containers exceed memory limits, they are OOMKilled. If storage exceeds limits, pods are evicted.
To prevent this:
- Increase resource limits
- Use Capture Filters to target fewer workloads
- Reduce trafficSampleRate
- Disable real-time indexing and use Delayed Indexing instead
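A minimal sketch combining two of these mitigations (higher limits and a lower sample rate) is shown below. The specific numbers are illustrative assumptions rather than recommendations; choose yours from observed usage. Capture Filters and Delayed Indexing are configured separately and are not shown here.

```yaml
tap:
  storageLimit: 10Gi        # assumed value: more headroom before storage-based eviction
  resources:
    sniffer:
      limits:
        memory: 8Gi         # assumed value: raised from the 5Gi shown earlier to reduce OOMKilled risk
  trafficSampleRate: 50     # assumed value: process half of L4 streams
```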
Complete Example
tap:
  # Storage
  storageLimit: 10Gi
  capture:
    raw:
      storageSize: 5Gi
  snapshots:
    local:
      storageClass: gp2
      storageSize: 100Gi
  # Hub resources
  resources:
    hub:
      limits:
        cpu: 2000m
        memory: 8Gi
      requests:
        cpu: 100m
        memory: 256Mi
    # Worker resources
    sniffer:
      limits:
        cpu: 1000m
        memory: 4Gi
      requests:
        cpu: 100m
        memory: 128Mi
    tracer:
      limits:
        cpu: 1000m
        memory: 4Gi
      requests:
        cpu: 100m
        memory: 128Mi
    # Front-end resources
    front:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 50m
        memory: 64Mi
  # Reduce load
  trafficSampleRate: 50
What’s Next
- Helm Configuration Reference — All configuration options
- Capture Filters — Reduce workload targeting
- Performance — Performance tuning guide