Introduction

AI Red vs Blue Arena

A self-contained adversarial simulation platform where AI agents attack and defend each other to test robustness, security, alignment, and reasoning integrity.

What is the AI Arena?

The AI Red vs Blue Arena creates a continuous loop of:

Attack Generation (Red) - AI agents attempt to exploit vulnerabilities
Defense + Hardening (Blue) - AI agents defend and patch vulnerabilities
Evaluation & Scoring - Automated scoring and analysis
Retraining & Improvement - Integration with Oumi for model fine-tuning
Continuous Simulation + Monitoring - Real-time dashboards and alerts

Core Concept

Red Teaming (Attacker LLM)

Red Agents attempt to break the target AI via:

Prompt injection & jailbreaks
Tool execution manipulation
Context poisoning
Chain-of-thought leakage extraction
Reasoning corruption
Goal drift and long-game manipulation
Adversarial content crafting
Safety boundary probing

Blue Teaming (Defender LLM)

Blue Agents defend by:

Sanitizing incoming prompts
Enforcing guardrails & permissions
Validating tool calls
Intercepting policy violations
Auto-rewriting system prompts
Creating patch suggestions
Identifying harmful inputs
Suppressing hallucinations

The Arena Engine

The Arena runs Red vs Blue simulations on user-provided agents. Each simulation consists of:

Attack rounds
Defense rounds
Scoring
Logging
Post-analysis
Recommendations

Architecture Overview

Frontend: Next.js dashboard with real-time visualization
Backend: Express.js API server with WebSocket support
Database: lowdb (JSON-based) for persistence
LLM Integration: Groq (recommended), OpenAI, and Anthropic (with mock fallback)
Tool Execution: Cline for agent tool execution and sandboxing
Orchestration: Kestra workflows + custom match runner
Training: Oumi integration stubs for fine-tuning

Key Features

✅ Real tool execution via Cline (not just simulation)
✅ Sandboxed environment with permission gates
✅ Multiple LLM providers with automatic fallback
✅ 13+ Groq models available with automatic rotation
✅ Real-time WebSocket updates
✅ Kestra orchestration for automated workflows
✅ Oumi integration for fine-tuning
✅ Comprehensive scoring and evaluation system

Use Cases

For Developers

Validate agents before shipping
Test specific vulnerabilities
Auto-generate secure system prompts
Ensure tools cannot be exploited

For Enterprises

Continuous evaluation of internal AI systems
Centralized security metrics
Compliance reporting
Early detection of risky behaviors

For AI Safety Research

Observe model drift
Study multi-step manipulation
Evaluate long-horizon deception
Compare frontier models

For Security Teams

Automated LLM penetration testing
Attack simulation libraries
Replay and audit logs

Next Steps

Quick Start Guide - Get up and running in minutes
Architecture Overview - Understand the system design
Creating Agents - Learn how to create and configure agents