LLM Integration

Generate AI-optimized output for automated analysis

burl can output results in formats optimized for Large Language Model consumption, including automatic performance analysis and recommendations.

LLM Output Modes

JSON Format

burl https://api.example.com/health --llm json

Produces structured JSON with schema information and automatic interpretation:

{
  "$schema": "https://burl.wania.app/schema/v1/result.json",
  "meta": {
    "tool": "burl",
    "version": "0.1.0",
    "timestamp": "2024-12-29T15:30:00Z"
  },
  "config": {
    "url": "https://api.example.com/health",
    "method": "GET",
    "connections": 10,
    "duration_ms": 10000
  },
  "summary": {
    "total_requests": 1234,
    "successful_requests": 1234,
    "failed_requests": 0,
    "requests_per_second": 123.15,
    "bytes_per_second": 46315,
    "success_rate": 1.0
  },
  "latency_ms": {
    "min": 5.2,
    "max": 156.8,
    "mean": 25.4,
    "p50": 12.4,
    "p90": 32.1,
    "p95": 45.2,
    "p99": 89.3
  },
  "interpretation": {
    "performance": "good",
    "issues": [],
    "recommendations": []
  }
}
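
For scripted consumption, the structured fields can be read directly with jq; the paths below follow the sample output above (adjust them if your version's schema differs):

burl https://api.example.com/health --llm json \
  | jq '{rps: .summary.requests_per_second, p99: .latency_ms.p99, rating: .interpretation.performance}'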

Markdown Format

burl https://api.example.com/health --llm markdown

Produces human-readable Markdown optimized for LLM context:

# HTTP Benchmark Results

## Summary
| Metric | Value |
|--------|-------|
| Total Requests | 1,234 |
| Success Rate | 100% |
| Requests/sec | 123.15 |
| Throughput | 45.23 KB/s |

## Latency Distribution
| Percentile | Value |
|------------|-------|
| P50 | 12.4ms |
| P90 | 32.1ms |
| P95 | 45.2ms |
| P99 | 89.3ms |

## Analysis
Performance is **good**. No issues detected.

Automatic Analysis

The interpretation field provides an automatic analysis of your benchmark results: a performance rating, any detected issues, and recommendations.

Performance Rating

Rating       Criteria
excellent    P99 < 50ms, success rate = 100%
good         P99 < 200ms, success rate > 99%
acceptable   P99 < 500ms, success rate > 95%
poor         P99 > 500ms or success rate < 95%
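
If you need the rating logic outside of burl, the thresholds above are easy to re-derive from the raw numbers. A minimal jq sketch that mirrors the table (not burl's internal code):

burl https://api.example.com --llm json | jq -r '
  .latency_ms.p99 as $p99 | .summary.success_rate as $sr |
  if $p99 < 50 and $sr == 1.0 then "excellent"
  elif $p99 < 200 and $sr > 0.99 then "good"
  elif $p99 < 500 and $sr > 0.95 then "acceptable"
  else "poor" end'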

Issue Detection

burl automatically detects common issues:

{
  "interpretation": {
    "performance": "poor",
    "issues": [
      "High P99 latency (523ms) indicates tail latency problems",
      "5.2% error rate exceeds acceptable threshold",
      "Large gap between P50 (45ms) and P99 (523ms) suggests inconsistent performance"
    ],
    "recommendations": [
      "Investigate server-side processing for slow requests",
      "Check for resource contention under load",
      "Consider implementing request timeouts",
      "Review error responses for root cause"
    ]
  }
}

What Gets Detected

  • High error rates
  • Timeout frequency
  • Latency spikes (P99 >> P50)
  • Connection errors
  • Unusual status code distributions
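
These checks can also be approximated client-side. For example, a tail-latency test with jq, where the 10x ratio is an illustrative threshold rather than the one burl applies internally:

burl https://api.example.com --llm json \
  | jq -e '.latency_ms.p99 < (.latency_ms.p50 * 10)' > /dev/null \
  || echo "Latency spike detected: P99 >> P50"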

Use Cases

CI/CD Integration

#!/bin/bash
result=$(burl https://api.example.com --llm json)
performance=$(echo "$result" | jq -r '.interpretation.performance')

if [ "$performance" = "poor" ]; then
  echo "Performance regression detected!"
  echo "$result" | jq -r '.interpretation.issues[]'
  exit 1
fi
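
A terser gate can lean on jq's exit status instead. This variant also fails the build on an "acceptable" rating (rating names as in the table above):

burl https://api.example.com --llm json \
  | jq -e '.interpretation.performance | IN("excellent", "good")' > /dev/null \
  || { echo "Performance gate failed"; exit 1; }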

AI Agent Workflows

# Generate report for AI analysis
burl https://api.example.com --llm markdown > report.md

# Feed to AI for recommendations
cat report.md | llm "Analyze this benchmark and suggest optimizations"

Automated Monitoring

# Store results with timestamp
burl https://api.example.com --llm json \
  -o "benchmarks/$(date +%Y%m%d_%H%M%S).json"

# Parse multiple results for trending
for f in benchmarks/*.json; do
  echo "$(basename $f): $(jq '.latency_ms.p99' $f)ms P99"
done
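
For a quick trend series across all stored runs, jq can slurp every file into one array; this assumes the timestamped filenames sort chronologically, as produced above:

jq -s 'map(.latency_ms.p99)' benchmarks/*.json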

Combining with Other Options

LLM output works with all other burl options:

# High concurrency with LLM analysis
burl https://api.example.com \
  -c 100 \
  -d 60s \
  --llm json \
  -o analysis.json

# Rate-limited test with markdown report
burl https://api.example.com \
  -q 500 \
  -d 5m \
  --llm markdown \
  -o report.md

Output to File

# LLM JSON to file
burl https://api.example.com --llm json -o analysis.json

# LLM Markdown to file
burl https://api.example.com --llm markdown -o report.md

Schema Information

The LLM JSON output includes a $schema field pointing to the JSON schema:

https://burl.wania.app/schema/v1/result.json

This allows LLMs to understand the structure and validate their interpretations.
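
Stored results can also be checked against that schema with any JSON Schema validator; for example, with the third-party check-jsonschema tool (not part of burl):

check-jsonschema --schemafile https://burl.wania.app/schema/v1/result.json analysis.json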

Best Practices

  1. Use JSON for programmatic analysis: The structured format is easier for scripts and APIs to parse.
  2. Use Markdown for context: When feeding results to an LLM for natural language analysis, Markdown provides better context.
  3. Include the interpretation: The automatic analysis gives LLMs a starting point for deeper insights.
  4. Store results over time: Build a history of benchmark results to detect regressions and trends.