MCP in Action
Coming Soon — V2.00: MCP integration ships as part of Kubeshark V2.00. Read the announcement.
This page walks through a complete example of AI-powered network analysis—from the user’s question to actionable insights.
The Scenario
A developer gets paged: checkout latency has spiked and users are abandoning carts. Instead of diving into dashboards and logs, they ask their AI assistant (configured with Kubeshark’s MCP server) to investigate.
The Flow
```
 USER                          AI                               KUBESHARK
  |                             |                                   |
  |  "What can you tell me      |                                   |
  |   about my frontend         |                                   |
  |   service?"                 |                                   |
  | --------------------------> |                                   |
  |                             |                                   |
  |                             |  GET /mcp/data-boundaries         |
  |                             | --------------------------------> |
  |                             |                                   |
  |                             |  <- 3 nodes, 14:00-15:00 UTC      |
  |                             |                                   |
  |                             |  POST /mcp/snapshots              |
  |                             |  {name: "frontend-analysis"}      |
  |                             | --------------------------------> |
  |                             |                                   |
  |                             |  <- 850 MB snapshot ready         |
  |                             |                                   |
  |                             |  POST /delayed_dissection         |
  |                             | --------------------------------> |
  |                             |                                   |
  |                             |  <- 12,847 API calls dissected    |
  |                             |                                   |
  |                             |  GET /mcp/calls?kfl=...           |
  |                             | --------------------------------> |
  |                             |                                   |
  |                             |  <- Traffic data (JSON)           |
  |                             |                                   |
  |                             |  +---------------------+          |
  |                             |  | AI analyzes:        |          |
  |                             |  |  - Performance      |          |
  |                             |  |  - Anomalies        |          |
  |                             |  |  - API patterns     |          |
  |                             |  +---------------------+          |
  |                             |                                   |
  |  "Based on 12,847 API       |                                   |
  |   calls, here's what I      |                                   |
  |   found..."                 |                                   |
  | <--------------------------- |                                  |
  |                             |                                   |
  |  "Tell me more about        |                                   |
  |   those 503 errors"         |                                   |
  | --------------------------> |                                   |
  |                             |                                   |
  |                             |  GET /mcp/calls?kfl=              |
  |                             |    response.status==503           |
  |                             | --------------------------------> |
  |                             |                                   |
  |                             |  <- 127 failed requests           |
  |                             |                                   |
  |  "All 503 errors occur      |                                   |
  |   when payment-service      |                                   |
  |   exceeds 5s timeout..."    |                                   |
  | <--------------------------- |                                  |
  |                             |                                   |
  v                             v                                   v
```
The Conversation
```
$ claude

> What can you tell me about my service? The name is frontend.

* Analyzing service: frontend
  - Checking recorded traffic availability...
  - Creating snapshot from last 1 hour of traffic...
  - Running L7 protocol dissection...
  - Querying API calls (12,847 found)...
  - Analyzing patterns and anomalies...

Based on analyzing 12,847 API calls over the last hour, here's what I
found about your frontend service:

OK Healthy:
  - Overall latency is good (45ms P50)
  - TCP connection establishment is fast (< 5ms)
  - Most endpoints return success (97.7% 2xx responses)

! Needs Attention:
  1. Checkout failures - 2.3% of /api/v1/checkout return 503
  2. Retry storms      - cart-service retrying 3x within 100ms
  3. Connection churn  - 847 new connections/min to catalogue-db
  4. Sync blocking     - checkout calls payment-service synchronously

Want me to dig deeper into any of these issues?

> Tell me more about those checkout 503 errors. What's causing them?

* Investigating checkout 503 errors
  - Filtering failed checkout requests...
  - Analyzing 127 failed requests...

I examined all 127 failed checkout requests. Here's what I found:

Pattern: All 503 errors occur when payment-service response time
exceeds 5 seconds (your configured timeout).

Root cause: payment-service makes synchronous calls to an external
payment gateway. When the gateway is slow (P99: 4.8s), requests
stack up and exhaust the connection pool.

Timeline:
  14:22:58  Gateway latency: 200ms -> 4.2s
  14:23:15  First 503 from payment-service
  14:23:45  Connection pool exhausted (TCP RST packets observed)

Recommendation: Add circuit breaker on payment-service's gateway
calls, or increase connection pool with shorter timeouts.

> _
```
Behind the Scenes: MCP Tool Calls
Kubeshark continuously records network traffic, Kubernetes events, and operating system events across all nodes. When the AI needs to analyze traffic, it creates a snapshot—isolating a time window from this continuous recording for focused analysis.
1. Check Data Availability
GET /mcp/data-boundaries
→ Returns: 3 nodes with recorded data from 14:00 to 15:00 UTC
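The same check can be scripted directly. The sketch below is illustrative rather than an official client: it assumes the MCP endpoints are served over plain HTTP at http://localhost:8899 and return JSON, so adjust the base URL (and any authentication) to match your deployment.

```python
# Illustrative sketch: check what recorded traffic exists before snapshotting.
# The base URL and the exact shape of the JSON response are assumptions.
import requests

BASE = "http://localhost:8899"  # wherever your Kubeshark MCP server is exposed

resp = requests.get(f"{BASE}/mcp/data-boundaries", timeout=10)
resp.raise_for_status()
print(resp.json())  # expected: per-node time boundaries of the recorded traffic
```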
2. Create Snapshot from Recorded Traffic
POST /mcp/snapshots
{
  "name": "frontend-analysis",
  "duration": "1h"
}
→ Kubeshark isolates the last hour of recorded traffic into a dedicated snapshot
→ Returns: Snapshot creation started
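Scripted, this step is a single POST carrying the same body. As above, the base URL and the acknowledgement format are assumptions.

```python
import requests

BASE = "http://localhost:8899"  # assumed MCP server address

resp = requests.post(
    f"{BASE}/mcp/snapshots",
    json={"name": "frontend-analysis", "duration": "1h"},
    timeout=10,
)
resp.raise_for_status()
# Snapshot creation is asynchronous: the server acknowledges the request and
# materializes the snapshot in the background (polled in the next step).
print(resp.json())
```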
3. Wait for Snapshot Completion
GET /mcp/snapshots/frontend-analysis
→ Polls until complete
→ Returns: 850 MB isolated across 3 nodes
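A simple polling loop covers this step. The status field name and the "complete" value below are placeholders; check the actual response payload in your environment.

```python
import time

import requests

BASE = "http://localhost:8899"  # assumed MCP server address
SNAPSHOT = "frontend-analysis"

# Poll until the snapshot is ready; "status" == "complete" is an assumed convention.
while True:
    state = requests.get(f"{BASE}/mcp/snapshots/{SNAPSHOT}", timeout=10).json()
    if state.get("status") == "complete":
        break
    time.sleep(2)

print(state)  # expected to report the snapshot size and the nodes it covers
```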
4. Start L7 Dissection
POST /delayed_dissection.DelayedDissection/Start
{
  "snapshot_name": "frontend-analysis",
  "dissection_name": "analysis-1"
}
→ Returns: Dissection job running
5. Wait for Dissection Completion
GET /delayed_dissection.DelayedDissection/GetStatus
→ Polls until complete
→ Returns: 12,847 API calls dissected
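Steps 4 and 5 follow the same start-then-poll pattern. In the sketch below, only the two endpoint paths and the Start request body come from the steps above; the GetStatus query parameters and the response fields are assumptions.

```python
import time

import requests

BASE = "http://localhost:8899"  # assumed MCP server address
JOB = {"snapshot_name": "frontend-analysis", "dissection_name": "analysis-1"}

# Kick off L7 dissection of the snapshot.
requests.post(
    f"{BASE}/delayed_dissection.DelayedDissection/Start", json=JOB, timeout=10
).raise_for_status()

# Poll the job until it finishes; "state" == "complete" is an assumed convention.
while True:
    status = requests.get(
        f"{BASE}/delayed_dissection.DelayedDissection/GetStatus",
        params=JOB,
        timeout=10,
    ).json()
    if status.get("state") == "complete":
        break
    time.sleep(2)

print(status)  # expected to include the number of API calls dissected
```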
6. Query Outbound Traffic (from frontend)
GET /mcp/calls?db=frontend-analysis/analysis-1&kfl=src.pod contains "frontend"&limit=1000&format=full
→ Returns: 3,421 calls from frontend service
7. Query Inbound Traffic (to frontend)
GET /mcp/calls?db=frontend-analysis/analysis-1&kfl=dst.svc=="frontend"&limit=1000&format=full
→ Returns: 8,926 calls to frontend service
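Both queries hit the same endpoint and differ only in their KFL filter, so a small helper keeps them readable. The db, kfl, limit, and format parameters come from the requests above; the base URL and the JSON return shape are assumptions.

```python
import requests

BASE = "http://localhost:8899"         # assumed MCP server address
DB = "frontend-analysis/analysis-1"    # snapshot/dissection pair from the steps above

def query_calls(kfl: str, limit: int = 1000):
    """Fetch dissected API calls matching a KFL filter."""
    resp = requests.get(
        f"{BASE}/mcp/calls",
        params={"db": DB, "kfl": kfl, "limit": limit, "format": "full"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

outbound = query_calls('src.pod contains "frontend"')  # calls made by frontend
inbound = query_calls('dst.svc=="frontend"')           # calls received by frontend
print(len(outbound), len(inbound))
```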
The Analysis
With the dissected traffic in hand, the AI examines the data and identifies:
Performance Metrics
| Metric | Value | Assessment |
|---|---|---|
| Latency P50 | 45ms | Healthy |
| Latency P99 | 230ms | Acceptable |
| TCP handshake P50 | 2ms | Normal |
| TCP handshake P99 | 8ms | Normal |
| Error rate | 2.3% | Elevated |
TCP Handshake RTT: The tcp_handshake_p50/p90/p99_us fields measure TCP 3-way handshake completion time in microseconds. Values under 10ms indicate healthy network connectivity. Elevated values (>10ms) may indicate network congestion or cross-AZ traffic.
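As a concrete example of using those fields, a script could flag the records whose handshake latency crosses that 10ms threshold. This assumes the records returned by /mcp/calls carry tcp_handshake_p99_us as an integer number of microseconds.

```python
# Flag records whose TCP handshake P99 suggests congestion or cross-AZ traffic.
# Assumes each record exposes "tcp_handshake_p99_us" in microseconds.
THRESHOLD_US = 10_000  # 10 ms

def slow_handshakes(calls):
    return [c for c in calls if c.get("tcp_handshake_p99_us", 0) > THRESHOLD_US]
```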
Anomalies Detected
- Checkout failures: /api/v1/checkout returning 503 errors (2.3% of requests)
- Retry storms: cart-service → frontend showing same request 3x within 100ms
- Connection churn: 847 new TCP connections/minute to catalogue-db
API Flow Analysis
- User journey: login → catalogue → cart → checkout
- Missing observability: 12% of requests lack correlation IDs
- Blocking pattern: checkout calls payment-service synchronously
The Report
AI Response:

“Based on analyzing 12,847 API calls over the last hour, here’s what I found about your frontend service:

Healthy:

- Overall latency is good (45ms P50)
- TCP connection establishment is fast (< 5ms)
- Most endpoints return success (97.7% 2xx responses)

Needs Attention:

1. Checkout failures — 2.3% of /api/v1/checkout calls return 503. The upstream payment-service appears to be timing out under load.
2. Retry storms — cart-service is retrying failed requests too aggressively, sending the same request 3 times within 100ms. This amplifies traffic during incidents.
3. Database connection churn — Your service is creating 847 new connections/minute to catalogue-db. Consider implementing connection pooling.
4. Synchronous blocking — The checkout flow calls payment-service synchronously, blocking the request thread. Consider making this async or adding a timeout.

Want me to dig deeper into any of these issues?”
Drilling Deeper
The developer wants to understand the checkout failures:
User:
“Tell me more about those checkout 503 errors. What’s causing them?”
AI (makes additional MCP calls):
GET /mcp/calls?db=frontend-analysis/analysis-1&kfl=request.path=="/api/v1/checkout" and response.status==503&format=full
AI Response:
“I examined all 127 failed checkout requests. Here’s what I found:
Pattern: All 503 errors occur when payment-service response time exceeds 5 seconds. Your frontend has a 5-second timeout configured.

Root cause: payment-service is making synchronous calls to an external payment gateway. When the gateway is slow (P99 latency 4.8s), requests stack up and exhaust the connection pool.

Timeline: Failures started at 14:23 UTC, correlating with a latency spike from the external gateway.
Evidence:
- 14:22:58 — Gateway latency jumps from 200ms to 4.2s
- 14:23:15 — First 503 from payment-service
- 14:23:45 — payment-service connection pool exhausted (saw TCP RST packets)
Recommendation: Add a circuit breaker on payment-service’s external gateway calls, or increase the connection pool size with shorter timeouts.
Want me to capture a PCAP of the failing requests for deeper analysis?”
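In script form, the drill-down is just another /mcp/calls query with a compound KFL filter (same assumed base URL as the earlier sketches):

```python
import requests

BASE = "http://localhost:8899"  # assumed MCP server address

resp = requests.get(
    f"{BASE}/mcp/calls",
    params={
        "db": "frontend-analysis/analysis-1",
        # Combine KFL predicates with "and" to isolate the failing checkouts.
        "kfl": 'request.path=="/api/v1/checkout" and response.status==503',
        "format": "full",
    },
    timeout=30,
)
resp.raise_for_status()
print(len(resp.json()))  # 127 failed checkout requests in this example
```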
Key MCP Capabilities Used
| Capability | MCP Endpoint | Purpose |
|---|---|---|
| Data availability | GET /mcp/data-boundaries | Know what time range of recorded traffic is available |
| Snapshot creation | POST /mcp/snapshots | Isolate a time window from continuous recording |
| Protocol dissection | DelayedDissection/Start | Parse L7 protocols (HTTP, gRPC, etc.) |
| Traffic query | GET /mcp/calls | Retrieve dissected API calls |
| Filtering | kfl=... parameter | Focus on a specific service, path, or status |
What Made This Possible
Without MCP, this investigation would require:
- SSH into nodes or configure tcpdump
- Export PCAPs and load into Wireshark
- Manually filter and correlate thousands of packets
- Write custom scripts to aggregate findings
- Build the timeline by hand
Time estimate: 4-6 hours for an experienced engineer
With Kubeshark’s MCP server, the AI:
- Queried continuously recorded traffic—no setup needed
- Created a snapshot isolating the relevant time window
- Dissected protocols and correlated requests/responses
- Identified patterns across 12,847 API calls
- Built the timeline and traced the root cause
Time estimate: 3 minutes of conversation
Try It Yourself
Once you have Kubeshark’s MCP server configured, try asking:
- “Analyze my [service-name] service and tell me what you find”
- “Are there any error patterns in the [namespace] namespace?”
- “Compare traffic to [service] between 9 AM and 10 AM vs yesterday”
- “What services does [service] depend on? Are any of them unhealthy?”
The AI will handle the snapshot creation, dissection, and analysis—you just ask questions.
What’s Next
- Use Cases — More scenarios for AI-powered analysis
- How It Works — Technical details of the MCP protocol
- L7 API Dissection — How Kubeshark parses protocols