Advanced Topics - MCP Academy

Adversarial Multi-Agent Reasoning with MCP

Multi-agent debate patterns assign opposing positions to two or more agents to produce more reliable, better-calibrated outputs than a single agent produces alone.

Introduction

In this lesson, we explore the adversarial multi-agent pattern — a technique where two AI agents are assigned opposing positions on a topic and must reason, call MCP tools, and challenge each other's conclusions.

A third agent (or a human reviewer) then evaluates the arguments and determines the best outcome.

This pattern is especially useful for:

Hallucination detection: A second agent challenges unsubstantiated claims the first agent makes.

Threat modeling and security reviews: One agent argues that a system is safe; the other looks for vulnerabilities.

API or requirements design: One agent defends a proposed design; the other raises objections.

Factual verification: Both agents independently query the same MCP tools and cross-check each other's conclusions.

Because both agents share the same MCP tool set, they operate in the same information environment. Disagreements are therefore more likely to reflect differences in reasoning than differences in data access, although each agent still chooses its own queries and may see different results.

Learning Objectives

By the end of this lesson, you will be able to:

Explain why adversarial multi-agent patterns catch errors that single-agent pipelines miss.

Design a debate architecture where two agents share a common MCP tool set.

Implement "for" and "against" system prompts that guide each agent to argue its assigned position.

Add a judge agent (or human review step) that synthesizes the debate into a final verdict.

Understand how multiple agents share MCP tools through a single client session.

Architecture Overview

The adversarial pattern follows this high-level flow:


flowchart TD

    Topic([Debate Topic / Claim]) --> ForAgent

    Topic --> AgainstAgent



    subgraph SharedMCPServer["Shared MCP Tool Server"]

        WebSearch[Web Search Tool]

        CodeExec[Code Execution Tool]

        DocReader[Optional: Document Reader Tool]

    end



    ForAgent["Agent A\n(Argues FOR)"] -->|Tool calls| SharedMCPServer

    AgainstAgent["Agent B\n(Argues AGAINST)"] -->|Tool calls| SharedMCPServer



    SharedMCPServer -->|Results| ForAgent

    SharedMCPServer -->|Results| AgainstAgent



    ForAgent -->|Opening argument| Debate[(Debate Transcript)]

    AgainstAgent -->|Rebuttal| Debate



    ForAgent -->|Counter-rebuttal| Debate

    AgainstAgent -->|Counter-rebuttal| Debate



    Debate --> JudgeAgent["Judge Agent\n(Evaluates arguments)"]

    JudgeAgent --> Verdict([Final Verdict & Reasoning])



    style ForAgent fill:#c2f0c2,stroke:#333

    style AgainstAgent fill:#f9d5e5,stroke:#333

    style JudgeAgent fill:#d5e8f9,stroke:#333

    style SharedMCPServer fill:#fff9c4,stroke:#333

Key design decisions

| Decision | Rationale |
|----------|-----------|
| Both agents share one MCP server | Reduces information asymmetry, so disagreements mostly reflect reasoning rather than data access |
| Agents have opposing system prompts | Forces each agent to stress-test the other side's position |
| A judge agent synthesizes the debate | Produces a single actionable output without a human bottleneck |
| Multiple debate rounds | Allows each agent to respond to the other's tool-backed evidence |

Implementation

Step 1 — Shared MCP Tool Server

Start by exposing the tools that both agents will call. In this example we use a minimal Python MCP server built with FastMCP.

Python – Shared Tool Server


# shared_tools_server.py

from mcp.server.fastmcp import FastMCP

import httpx



mcp = FastMCP("debate-tools")



@mcp.tool()

async def web_search(query: str) -> str:

    """Search the web and return a short summary of the top results."""

    # Replace with your preferred search API (e.g., SerpAPI, Brave Search).

    async with httpx.AsyncClient() as client:

        response = await client.get(

            "https://api.search.example.com/search",

            params={"q": query, "num": 3},

            headers={"Authorization": "Bearer YOUR_API_KEY"},

        )

        response.raise_for_status()

        results = response.json().get("results", [])

    snippets = "\n".join(r["snippet"] for r in results)

    return f"Search results for '{query}':\n{snippets}"



@mcp.tool()

async def run_python(code: str) -> str:

    """Execute a Python snippet and return stdout + stderr.



    WARNING: This is an unsafe placeholder that runs code directly on the host.

    In production, replace with a sandboxed execution environment (e.g., a container

    with no network access, strict resource limits, and no access to the host filesystem).

    """

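    import subprocess, sys, textwrap

    try:
        # Note: subprocess.run blocks the event loop for up to `timeout` seconds.
        result = subprocess.run(
            [sys.executable, "-c", textwrap.dedent(code)],
            capture_output=True, text=True, timeout=10
        )
    except subprocess.TimeoutExpired:
        # Report the timeout to the agent instead of failing the tool call.
        return "Error: execution timed out after 10 seconds"

    return result.stdout + result.stderr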



if __name__ == "__main__":

    mcp.run(transport="stdio")

Run with:


python shared_tools_server.py

TypeScript – Shared Tool Server


// shared-tools-server.ts

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";

import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

import { z } from "zod";

import { execFile } from "child_process";

import { promisify } from "util";



const execFileAsync = promisify(execFile);



const server = new McpServer({ name: "debate-tools", version: "1.0.0" });



server.tool(

  "web_search",

  "Search the web and return a short summary of the top results",

  { query: z.string() },

  async ({ query }) => {

    // Replace with your preferred search API.

    const url = `https://api.search.example.com/search?q=${encodeURIComponent(query)}&num=3`;

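    const response = await fetch(url, {
      headers: { Authorization: "Bearer YOUR_API_KEY" },
    });

    // Surface HTTP errors instead of parsing an error body as search results.
    if (!response.ok) {
      throw new Error(`Search request failed with status ${response.status}`);
    }

    const data = (await response.json()) as { results: { snippet: string }[] };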

    const snippets = data.results.map((r) => r.snippet).join("\n");

    return {

      content: [{ type: "text", text: `Search results for '${query}':\n${snippets}` }],

    };

  }

);



server.tool(

  "run_python",

  "Execute a Python snippet and return stdout + stderr (placeholder — use a real sandbox in production)",

  { code: z.string() },

  async ({ code }) => {

    // WARNING: This executes LLM-controlled code directly on the host process.

    // In production, always run inside an isolated sandbox (e.g., a container

    // with no network access and strict resource limits).

    // See the Security Considerations section for details.

    try {

      // Pass code as a direct argument to python3 — no shell invocation,

      // no string interpolation, no command-injection risk.

      const { stdout, stderr } = await execFileAsync("python3", ["-c", code], {

        timeout: 10000,

      });

      return { content: [{ type: "text", text: stdout + stderr }] };

    } catch (err: unknown) {

      const message = err instanceof Error ? err.message : String(err);

      return { content: [{ type: "text", text: `Error: ${message}` }] };

    }

  }

);



const transport = new StdioServerTransport();

await server.connect(transport);

Run with:


npx ts-node shared-tools-server.ts

---

Step 2 — Agent System Prompts

Each agent receives a system prompt that locks it into its assigned position. The key is that both agents know they are in a debate and that they *must* use tools to back their claims.

Python – System Prompts


# prompts.py



FOR_SYSTEM_PROMPT = """You are Agent A in a structured debate.

Your role is to argue *in favour* of the proposition given to you.

Rules:

- Support your position with evidence gathered from the available MCP tools.

- Call the web_search tool to find real supporting data.

- Call the run_python tool to verify quantitative claims with code.

- When your opponent makes a claim, challenge it specifically and with evidence.

- Do not concede your position unless your opponent provides irrefutable evidence.

- Keep each turn concise (≤ 200 words)."""



AGAINST_SYSTEM_PROMPT = """You are Agent B in a structured debate.

Your role is to argue *against* the proposition given to you.

Rules:

- Challenge the opposing agent's arguments with evidence from the available MCP tools.

- Call the web_search tool to find counter-evidence.

- Call the run_python tool to verify or disprove quantitative claims with code.

- Point out logical fallacies, missing context, or unsupported assertions.

- Do not concede your position unless the evidence is irrefutable.

- Keep each turn concise (≤ 200 words)."""



JUDGE_SYSTEM_PROMPT = """You are an impartial judge evaluating a structured debate.

Your task:

1. Read the full debate transcript.

2. Identify the strongest evidence-backed arguments on each side.

3. Note any claims that were left unchallenged.

4. Deliver a balanced verdict that states:

   - Which side presented the more compelling case and why.

   - Key caveats or nuances that neither side addressed adequately.

   - A confidence score (0–100) for the winning position."""
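
The judge prompt asks for a confidence score, but the verdict comes back as free text. The sketch below shows one way to pull that score out so downstream code can act on it. The helper names `extract_confidence` and `needs_human_review`, the regular expression, and the threshold of 80 are all illustrative assumptions, not part of the lesson's code; the pattern only recognises phrasings such as "Confidence: 72" or "Confidence score (0–100): 85".

```python
import re
from typing import Optional

# Hypothetical pattern: "confidence", an optional "score", an optional
# parenthesised range such as "(0–100)", separators, then the number itself.
_SCORE = re.compile(
    r"confidence(?:\s+score)?\s*(?:\([^)]*\))?[\s:=*]*(\d{1,3})\b",
    re.IGNORECASE,
)


def extract_confidence(verdict: str) -> Optional[int]:
    """Return the first 0-100 confidence score found in the verdict, else None."""
    for match in _SCORE.finditer(verdict):
        score = int(match.group(1))
        if 0 <= score <= 100:
            return score
    return None


def needs_human_review(verdict: str, threshold: int = 80) -> bool:
    """Flag verdicts with a missing score or a score below the (assumed) threshold."""
    score = extract_confidence(verdict)
    return score is None or score < threshold
```

Treating a missing score the same as a low score is a deliberate choice in this sketch: a verdict the code cannot parse is sent to a person rather than trusted silently.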

---

Step 3 — Debate Orchestrator

The orchestrator creates both agents, manages the debate turns, then passes the full transcript to the judge.

Python – Debate Orchestrator


# debate_orchestrator.py

import asyncio

from anthropic import AsyncAnthropic

from mcp import ClientSession, StdioServerParameters

from mcp.client.stdio import stdio_client

from prompts import FOR_SYSTEM_PROMPT, AGAINST_SYSTEM_PROMPT, JUDGE_SYSTEM_PROMPT



client = AsyncAnthropic()



NUM_ROUNDS = 3  # Number of back-and-forth exchange rounds





async def run_agent_turn(

    conversation_history: list[dict],

    system_prompt: str,

    session: ClientSession,

) -> str:

    """Run one agent turn with MCP tool support.



    Lists tools from the shared MCP session, passes them to the LLM, and

    handles tool_use blocks in a loop until the model returns a final text reply.

    """

    # Fetch the current tool list from the shared MCP server.

    tools_result = await session.list_tools()

    tools = [

        {

            "name": t.name,

            "description": t.description or "",

            "input_schema": t.inputSchema,

        }

        for t in tools_result.tools

    ]



    messages = list(conversation_history)

    while True:

        response = await client.messages.create(

            model="claude-opus-4-5",

            max_tokens=512,

            system=system_prompt,

            messages=messages,

            tools=tools,

        )



        # Split the response into text blocks and tool calls.
        text_blocks = [b for b in response.content if b.type == "text"]
        tool_uses = [b for b in response.content if b.type == "tool_use"]

        # If the model is done (no tool calls), return all of its text.
        if not tool_uses:
            return "\n".join(b.text for b in text_blocks).strip()



        # Record the assistant turn (may mix text + tool_use blocks).

        messages.append({"role": "assistant", "content": response.content})



        # Execute each tool call and collect results.

        tool_results = []

        for tool_use in tool_uses:

            result = await session.call_tool(tool_use.name, tool_use.input)

            tool_results.append(

                {

                    "type": "tool_result",

                    "tool_use_id": tool_use.id,

                    # Keep only text content; a tool may also return non-text blocks.
                    "content": "\n".join(c.text for c in result.content if c.type == "text"),

                }

            )



        # Feed the tool results back to the model.

        messages.append({"role": "user", "content": tool_results})





async def run_debate(proposition: str) -> dict:

    """

    Run a full adversarial debate on a proposition.



    Both agents share a single MCP session so they operate in the same

    tool environment. Returns a dictionary with the transcript and verdict.

    """

    server_params = StdioServerParameters(

        command="python", args=["shared_tools_server.py"]

    )

    async with stdio_client(server_params) as (read, write):

        async with ClientSession(read, write) as session:

            await session.initialize()



            transcript: list[dict] = []



            # Seed the debate with the proposition.

            opening_message = {"role": "user", "content": f"Proposition: {proposition}"}



            for_history: list[dict] = [opening_message]

            against_history: list[dict] = [opening_message]



            for round_num in range(1, NUM_ROUNDS + 1):

                print(f"\n--- Round {round_num} ---")



                # Agent A argues FOR.

                for_response = await run_agent_turn(for_history, FOR_SYSTEM_PROMPT, session)

                print(f"Agent A (FOR): {for_response}")

                transcript.append({"round": round_num, "agent": "FOR", "text": for_response})



                # Share Agent A's argument with Agent B.

                for_history.append({"role": "assistant", "content": for_response})

                against_history.append({"role": "user", "content": f"Opponent argued: {for_response}"})



                # Agent B argues AGAINST.

                against_response = await run_agent_turn(

                    against_history, AGAINST_SYSTEM_PROMPT, session

                )

                print(f"Agent B (AGAINST): {against_response}")

                transcript.append({"round": round_num, "agent": "AGAINST", "text": against_response})



                # Share Agent B's argument with Agent A for the next round.

                against_history.append({"role": "assistant", "content": against_response})

                for_history.append({"role": "user", "content": f"Opponent argued: {against_response}"})



            # Build the transcript summary for the judge.

            transcript_text = "\n\n".join(

                f"Round {t['round']} – {t['agent']}:\n{t['text']}" for t in transcript

            )

            judge_input = [

                {

                    "role": "user",

                    "content": f"Proposition: {proposition}\n\nDebate transcript:\n{transcript_text}",

                }

            ]



            # Judge evaluates the debate.

            verdict = await run_agent_turn(judge_input, JUDGE_SYSTEM_PROMPT, session)

            print(f"\n=== Judge Verdict ===\n{verdict}")



            return {"transcript": transcript, "verdict": verdict}





if __name__ == "__main__":

    proposition = (

        "Large language models will eliminate the need for junior software developers within five years."

    )

    result = asyncio.run(run_debate(proposition))
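
The turn-taking logic above can be checked without calling a model or starting the MCP server. The following is a minimal sketch under that assumption: `run_rounds`, `AgentFn`, and `echo_agent` are hypothetical names introduced here, and the stub agent simply reports how many messages it was shown. It mirrors the bookkeeping in `run_debate`, where each reply is stored as an assistant message in the speaker's history and as a user message in the opponent's.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signature: any coroutine mapping (history, system_prompt) to a reply.
AgentFn = Callable[[list[dict], str], Awaitable[str]]


async def run_rounds(proposition: str, agent: AgentFn, num_rounds: int = 2) -> list[dict]:
    """Alternate FOR/AGAINST turns, mirroring each reply into the opponent's history."""
    opening = {"role": "user", "content": f"Proposition: {proposition}"}
    histories = {"FOR": [opening], "AGAINST": [opening]}
    transcript: list[dict] = []
    for round_num in range(1, num_rounds + 1):
        for side, other in (("FOR", "AGAINST"), ("AGAINST", "FOR")):
            reply = await agent(histories[side], f"{side} system prompt")
            transcript.append({"round": round_num, "agent": side, "text": reply})
            # The speaker sees its own reply as an assistant turn...
            histories[side].append({"role": "assistant", "content": reply})
            # ...and the opponent sees it as a user turn to respond to.
            histories[other].append({"role": "user", "content": f"Opponent argued: {reply}"})
    return transcript


async def echo_agent(history: list[dict], system_prompt: str) -> str:
    """Stub agent: reports how many messages it has seen instead of calling an LLM."""
    return f"{system_prompt.split()[0]} saw {len(history)} messages"


if __name__ == "__main__":
    for turn in asyncio.run(run_rounds("Example proposition", echo_agent)):
        print(turn)
```

Swapping `echo_agent` for a wrapper around `run_agent_turn` would give the same loop with a real model, which keeps the orchestration logic testable on its own.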

TypeScript – Debate Orchestrator


// debate-orchestrator.ts

import Anthropic from "@anthropic-ai/sdk";



const client = new Anthropic();



const FOR_SYSTEM_PROMPT = `You are Agent A in a structured debate.

Your role is to argue *in favour* of the proposition given to you.

Rules:

- Support your position with evidence gathered from the available MCP tools.

- Call the web_search tool to find real supporting data.

- When your opponent makes a claim, challenge it specifically and with evidence.

- Keep each turn concise (≤ 200 words).`;



const AGAINST_SYSTEM_PROMPT = `You are Agent B in a structured debate.

Your role is to argue *against* the proposition given to you.

Rules:

- Challenge the opposing agent's arguments with evidence from the available MCP tools.

- Call the web_search tool to find counter-evidence.

- Point out logical fallacies, missing context, or unsupported assertions.

- Keep each turn concise (≤ 200 words).`;



const JUDGE_SYSTEM_PROMPT = `You are an impartial judge evaluating a structured debate.

Deliver a verdict with:

1. Which side presented the more compelling case and why.

2. Key caveats or nuances that neither side addressed.

3. A confidence score (0–100) for the winning position.`;



type Message = { role: "user" | "assistant"; content: string };



type DebateTurn = { round: number; agent: "FOR" | "AGAINST"; text: string };



async function runAgentTurn(history: Message[], systemPrompt: string): Promise<string> {

  const response = await client.messages.create({

    model: "claude-opus-4-5",

    max_tokens: 512,

    system: systemPrompt,

    messages: history,

  });



  const text = response.content

    .filter((block): block is Anthropic.TextBlock => block.type === "text")

    .map((block) => block.text)

    .join("\n")

    .trim();



  if (!text) {

    const blockTypes = response.content.map((block) => block.type).join(", ");

    throw new Error(

      `Expected at least one text response block, but received: ${blockTypes || "none"}`

    );

  }



  return text;

}



async function runDebate(

  proposition: string,

  numRounds = 3

): Promise<{ transcript: DebateTurn[]; verdict: string }> {

  const transcript: DebateTurn[] = [];

  const openingMessage: Message = { role: "user", content: `Proposition: ${proposition}` };

  const forHistory: Message[] = [openingMessage];

  const againstHistory: Message[] = [openingMessage];



  for (let round = 1; round <= numRounds; round++) {

    console.log(`\n--- Round ${round} ---`);



    // Agent A (FOR)

    const forResponse = await runAgentTurn(forHistory, FOR_SYSTEM_PROMPT);

    console.log(`Agent A (FOR): ${forResponse}`);

    transcript.push({ round, agent: "FOR", text: forResponse });

    forHistory.push({ role: "assistant", content: forResponse });

    againstHistory.push({ role: "user", content: `Opponent argued: ${forResponse}` });



    // Agent B (AGAINST)

    const againstResponse = await runAgentTurn(againstHistory, AGAINST_SYSTEM_PROMPT);

    console.log(`Agent B (AGAINST): ${againstResponse}`);

    transcript.push({ round, agent: "AGAINST", text: againstResponse });

    againstHistory.push({ role: "assistant", content: againstResponse });

    forHistory.push({ role: "user", content: `Opponent argued: ${againstResponse}` });

  }



  // Judge

  const transcriptText = transcript

    .map((t) => `Round ${t.round} – ${t.agent}:\n${t.text}`)

    .join("\n\n");

  const judgeHistory: Message[] = [

    {

      role: "user",

      content: `Proposition: ${proposition}\n\nDebate transcript:\n${transcriptText}`,

    },

  ];

  const verdict = await runAgentTurn(judgeHistory, JUDGE_SYSTEM_PROMPT);

  console.log(`\n=== Judge Verdict ===\n${verdict}`);



  return { transcript, verdict };

}



// Run

const proposition =

  "Large language models will eliminate the need for junior software developers within five years.";

runDebate(proposition).catch(console.error);

C# – Debate Orchestrator


// DebateOrchestrator.cs

using System;

using System.Collections.Generic;

using System.Linq;

using System.Threading.Tasks;

using Anthropic.SDK;

using Anthropic.SDK.Messaging;



public class DebateOrchestrator

{

    private const string Model = "claude-opus-4-5";

    private readonly AnthropicClient _client = new();



    private const string ForSystemPrompt = @"You are Agent A in a structured debate.

Your role is to argue *in favour* of the proposition given to you.

Rules:

- Support your position with evidence.

- Challenge your opponent's claims specifically.

- Keep each turn concise (≤ 200 words).";



    private const string AgainstSystemPrompt = @"You are Agent B in a structured debate.

Your role is to argue *against* the proposition given to you.

Rules:

- Challenge the opposing agent's arguments with evidence.

- Point out logical fallacies or unsupported assertions.

- Keep each turn concise (≤ 200 words).";



    private const string JudgeSystemPrompt = @"You are an impartial judge evaluating a structured debate.

Deliver a verdict with:

1. Which side presented the more compelling case and why.

2. Key caveats neither side addressed.

3. A confidence score (0–100) for the winning position.";



    // Public because RunDebateAsync, a public method, returns List<DebateTurn>.
    public record DebateTurn(int Round, string Agent, string Text);



    private async Task<string> RunAgentTurnAsync(

        List<Message> history,

        string systemPrompt)

    {

        var request = new MessageParameters

        {

            Model = Model,

            MaxTokens = 512,

            System = [new SystemMessage(systemPrompt)],

            Messages = history

        };

        var response = await _client.Messages.GetClaudeMessageAsync(request);

        return response.Content.OfType<TextContent>().FirstOrDefault()?.Text ?? string.Empty;

    }



    public async Task<(List<DebateTurn> Transcript, string Verdict)> RunDebateAsync(

        string proposition,

        int numRounds = 3)

    {

        var transcript = new List<DebateTurn>();

        var opening = new Message { Role = RoleType.User, Content = $"Proposition: {proposition}" };



        var forHistory = new List<Message> { opening };

        var againstHistory = new List<Message> { opening };



        for (int round = 1; round <= numRounds; round++)

        {

            Console.WriteLine($"\n--- Round {round} ---");



            // Agent A (FOR)

            var forResponse = await RunAgentTurnAsync(forHistory, ForSystemPrompt);

            Console.WriteLine($"Agent A (FOR): {forResponse}");

            transcript.Add(new DebateTurn(round, "FOR", forResponse));

            forHistory.Add(new Message { Role = RoleType.Assistant, Content = forResponse });

            againstHistory.Add(new Message { Role = RoleType.User, Content = $"Opponent argued: {forResponse}" });



            // Agent B (AGAINST)

            var againstResponse = await RunAgentTurnAsync(againstHistory, AgainstSystemPrompt);

            Console.WriteLine($"Agent B (AGAINST): {againstResponse}");

            transcript.Add(new DebateTurn(round, "AGAINST", againstResponse));

            againstHistory.Add(new Message { Role = RoleType.Assistant, Content = againstResponse });

            forHistory.Add(new Message { Role = RoleType.User, Content = $"Opponent argued: {againstResponse}" });

        }



        // Judge

        var transcriptText = string.Join("\n\n",

            transcript.Select(t => $"Round {t.Round} – {t.Agent}:\n{t.Text}"));

        var judgeHistory = new List<Message>

        {

            new() { Role = RoleType.User, Content = $"Proposition: {proposition}\n\nDebate transcript:\n{transcriptText}" }

        };

        var verdict = await RunAgentTurnAsync(judgeHistory, JudgeSystemPrompt);

        Console.WriteLine($"\n=== Judge Verdict ===\n{verdict}");



        return (transcript, verdict);

    }



    public static async Task Main()

    {

        var orchestrator = new DebateOrchestrator();

        const string proposition =

            "Large language models will eliminate the need for junior software developers within five years.";

        await orchestrator.RunDebateAsync(proposition);

    }

}

---

Step 4 — Wiring MCP Tools into the Agents

The Python orchestrator above already shows the complete MCP-wired implementation; the TypeScript and C# orchestrators call the model directly and do not wire in MCP tools. The key pattern is:

One shared session: run_debate opens a single ClientSession and passes it to every run_agent_turn call, so both agents and the judge operate in the same tool environment.

Tool listing per turn: run_agent_turn calls session.list_tools() to fetch the current tool definitions and forwards them to the LLM as the tools parameter.

Tool-use loop: When the model returns tool_use blocks, run_agent_turn calls session.call_tool() for each one and feeds the results back to the model, repeating until the model produces a final text response.
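
The tool-use loop described above runs until the model stops requesting tools. A minimal sketch of a bounded variant follows, using plain dictionaries and stub callables in place of the Anthropic client and the MCP session. `bounded_tool_loop`, `fake_model`, `fake_tool`, and the `max_steps` limit are hypothetical names and values chosen for illustration; the block shapes imitate the `tool_use` and `tool_result` blocks used in `run_agent_turn`.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stand-ins for the LLM client and the MCP session.
ModelFn = Callable[[list[dict]], Awaitable[list[dict]]]  # messages -> content blocks
ToolFn = Callable[[str, dict], Awaitable[str]]           # (tool name, arguments) -> text


async def bounded_tool_loop(
    messages: list[dict], call_model: ModelFn, call_tool: ToolFn, max_steps: int = 5
) -> str:
    """Run the tool-use loop, giving up after max_steps model calls."""
    messages = list(messages)  # work on a copy so the caller's history is untouched
    for _ in range(max_steps):
        blocks = await call_model(messages)
        tool_uses = [b for b in blocks if b["type"] == "tool_use"]
        if not tool_uses:
            # No tool calls left: the text blocks are the final answer.
            return "\n".join(b["text"] for b in blocks if b["type"] == "text")
        messages.append({"role": "assistant", "content": blocks})
        results = []
        for use in tool_uses:
            output = await call_tool(use["name"], use["input"])
            results.append({"type": "tool_result", "tool_use_id": use["id"], "content": output})
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"No final answer after {max_steps} model calls")


async def fake_model(messages: list[dict]) -> list[dict]:
    """Stub model: requests one tool call, then answers once a result is present."""
    last = messages[-1]
    if last["role"] == "user" and isinstance(last["content"], list):
        return [{"type": "text", "text": f"Tool said: {last['content'][0]['content']}"}]
    return [{"type": "tool_use", "id": "t1", "name": "web_search", "input": {"query": "mcp"}}]


async def fake_tool(name: str, arguments: dict) -> str:
    """Stub tool: echoes the call instead of contacting an MCP server."""
    return f"{name}({arguments['query']})"


if __name__ == "__main__":
    start = [{"role": "user", "content": "Proposition: example"}]
    print(asyncio.run(bounded_tool_loop(start, fake_model, fake_tool)))
```

Capping the number of model calls is one simple guard against an agent that keeps requesting tools indefinitely; raising an error makes that failure visible to the orchestrator rather than hiding it.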

Refer to 03-GettingStarted/02-client for complete MCP client examples in each language.

---

Practical Use Cases


---

Security Considerations

When running adversarial agents in production, keep these points in mind:

Sandbox code execution: The run_python tool must execute in an isolated environment (e.g., a container with no network access and resource limits). Never run untrusted LLM-generated code directly on the host.

Tool call validation: Validate all tool inputs before execution. Both agents share the same tool server, so a malicious prompt injected into the debate could attempt to misuse tools.

Rate limiting: Implement per-agent rate limits on tool calls to prevent runaway loops.

Audit logging: Log every tool call and result so you can review what evidence each agent used to reach its conclusions.

Human-in-the-loop: For high-stakes decisions, route the judge's verdict through a human reviewer before acting on it.
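
Several of the points above (input validation, per-agent rate limits, audit logging) can be combined into one check that runs before each tool call. The sketch below is an illustration under stated assumptions rather than a hardened implementation: the `ToolGuard` class, its default limits, and the `debate.audit` logger name are hypothetical, and the only validation shown is a size cap on the serialized arguments.

```python
import json
import logging
import time
from collections import defaultdict, deque
from typing import Optional

logger = logging.getLogger("debate.audit")  # hypothetical logger name


class ToolGuard:
    """Per-agent sliding-window rate limit plus one audit log entry per allowed call."""

    def __init__(self, max_calls: int = 10, window_seconds: float = 60.0,
                 max_arg_chars: int = 4000) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.max_arg_chars = max_arg_chars
        self._calls: dict[str, deque] = defaultdict(deque)

    def check(self, agent: str, tool: str, arguments: dict,
              now: Optional[float] = None) -> None:
        """Raise if the call should be blocked; otherwise record and log it."""
        now = time.monotonic() if now is None else now

        # Validation: reject oversized inputs before they reach the tool.
        payload = json.dumps(arguments, sort_keys=True)
        if len(payload) > self.max_arg_chars:
            raise ValueError(f"{tool}: arguments exceed {self.max_arg_chars} characters")

        # Rate limiting: drop timestamps that fell out of the window, then count.
        recent = self._calls[agent]
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_calls:
            raise RuntimeError(f"{agent}: rate limit of {self.max_calls} calls exceeded")
        recent.append(now)

        # Audit logging: record who called which tool with which arguments.
        logger.info("agent=%s tool=%s args=%s", agent, tool, payload)
```

In the Python orchestrator, a call such as `guard.check("FOR", tool_use.name, tool_use.input)` could be placed just before `session.call_tool(...)`; this would require passing an agent label into `run_agent_turn`, which the lesson's code does not currently do.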

See 02-Security for a comprehensive guide to MCP security best practices.

---

Exercise

Design an adversarial MCP pipeline for one of the following scenarios:

1. Code review: Agent A defends a pull request; Agent B looks for bugs, security issues, and style problems. The judge summarises the top issues.

2. Architecture decision: Agent A proposes microservices; Agent B advocates for a monolith. The judge produces a decision matrix.

3. Content moderation: Agent A argues a piece of content is safe to publish; Agent B finds policy violations. The judge assigns a risk score.

For each scenario:

Define the system prompts for both agents and the judge.

Identify which MCP tools each agent needs.

Sketch the message flow (opening argument → rebuttal → counter-rebuttal → verdict).

Describe how you would validate the judge's verdict before acting on it.

---

Key Takeaways

Adversarial multi-agent patterns use opposing system prompts to force agents to stress-test each other's reasoning.

Sharing a single MCP tool server gives both agents access to the same information sources, so disagreements are more likely to stem from reasoning than from data access.

A judge agent synthesizes the debate into an actionable verdict without requiring a human bottleneck for every decision.

This pattern is especially powerful for hallucination detection, threat modeling, factual verification, and design reviews.

Secure tool execution and robust logging are essential when running adversarial agents in production.

---

What's next

5.1 MCP Integration

5.8 Security

5.5 Routing

MCP를 이용한 적대적 다중 에이전트 추론

다중 에이전트 토론 패턴은 서로 반대 입장을 가진 두 명 이상의 에이전트를 사용하여 단일 에이전트가 단독으로 달성할 수 있는 것보다 더 신뢰할 수 있고 잘 보정된 출력을 생성합니다.

소개

이 강의에서는 적대적 다중 에이전트 패턴을 살펴봅니다 — 이는 두 AI 에이전트가 특정 주제에 대해 상반된 입장을 할당받아 추론하고 MCP 도구를 호출하며 서로의 결론에 도전하는 기법입니다. 세 번째 에이전트(또는 인간 리뷰어)가 그 논거를 평가하여 최선의 결과를 결정합니다.

이 패턴은 특히 다음에 유용합니다:

환각 감지: 두 번째 에이전트가 첫 번째 에이전트가 제시한 근거 없는 주장에 도전합니다.

위협 모델링 및 보안 리뷰: 한 에이전트는 시스템이 안전하다고 주장하고, 다른 에이전트는 취약점을 찾습니다.

API 또는 요구사항 설계: 한 에이전트는 제안된 설계를 방어하고, 다른 에이전트는 반론을 제기합니다.

사실 검증: 두 에이전트 모두 독립적으로 동일한 MCP 도구를 조회하고 서로의 결론을 상호 검증합니다.

동일한 MCP 도구 집합을 공유함으로써 두 에이전트는 동일한 정보 환경에서 작동합니다 — 이는 어떠한 의견 차이도 정보 비대칭이 아닌 진정한 추론 차이를 반영함을 의미합니다.

학습 목표

이 강의가 끝나면 다음을 할 수 있습니다:

적대적 다중 에이전트 패턴이 단일 에이전트 파이프라인이 놓치는 오류를 포착하는 이유 설명하기

두 에이전트가 공통 MCP 도구 집합을 공유하는 토론 아키텍처 설계하기

각 에이전트가 할당된 입장을 주장하도록 안내하는 "찬성" 및 "반대" 시스템 프롬프트 구현하기

토론을 최종 평결로 종합하는 판사 에이전트(또는 인간 리뷰 단계) 추가하기

동시 에이전트 간 MCP 도구 공유 작동 방식 이해하기

아키텍처 개요

적대적 패턴은 다음과 같은 상위 흐름을 따릅니다:


flowchart TD

    Topic([토론 주제 / 주장]) --> ForAgent

    Topic --> AgainstAgent



    subgraph SharedMCPServer["공유 MCP 도구 서버"]

        WebSearch[웹 검색 도구]

        CodeExec[코드 실행 도구]

        DocReader[선택 사항: 문서 읽기 도구]

    end



    ForAgent["에이전트 A\n(찬성 주장)"] -->|도구 호출| SharedMCPServer

    AgainstAgent["에이전트 B\n(반대 주장)"] -->|도구 호출| SharedMCPServer



    SharedMCPServer -->|결과| ForAgent

    SharedMCPServer -->|결과| AgainstAgent



    ForAgent -->|개회 발언| Debate[(토론 기록)]

    AgainstAgent -->|반박| Debate



    ForAgent -->|재반박| Debate

    AgainstAgent -->|재반박| Debate



    Debate --> JudgeAgent["심판 에이전트\n(주장 평가)"]

    JudgeAgent --> Verdict([최종 평결 및 이유])



    style ForAgent fill:#c2f0c2,stroke:#333

    style AgainstAgent fill:#f9d5e5,stroke:#333

    style JudgeAgent fill:#d5e8f9,stroke:#333

    style SharedMCPServer fill:#fff9c4,stroke:#333

주요 설계 결정사항

| 결정사항 | 이유 |

|----------|-------|

| 두 에이전트가 하나의 MCP 서버 공유 | 정보 비대칭 제거 — 의견 차이는 데이터 접근이 아닌 추론 차이 반영 |

| 에이전트별 상반된 시스템 프롬프트 | 각 에이전트가 상대방 입장을 철저히 검증하도록 강제 |

| 판사 에이전트가 토론 종합 | 인간 병목 없이 단일 실행 가능한 출력 생성 |

| 여러 차례 토론 라운드 | 각 에이전트가 상대방의 도구 기반 증거에 응답할 기회 제공 |

구현

1단계 — 공유 MCP 도구 서버

두 에이전트가 호출할 도구를 노출하는 것부터 시작합니다. 이 예제에서는 FastMCP로 구축된 최소한의 Python MCP 서버를 사용합니다.

Python – 공유 도구 서버


# shared_tools_server.py

from mcp.server.fastmcp import FastMCP

import httpx



mcp = FastMCP("debate-tools")



@mcp.tool()

async def web_search(query: str) -> str:

    """Search the web and return a short summary of the top results."""

    # 선호하는 검색 API로 교체하세요 (예: SerpAPI, Brave Search).

    async with httpx.AsyncClient() as client:

        response = await client.get(

            "https://api.search.example.com/search",

            params={"q": query, "num": 3},

            headers={"Authorization": "Bearer YOUR_API_KEY"},

        )

        response.raise_for_status()

        results = response.json().get("results", [])

    snippets = "\n".join(r["snippet"] for r in results)

    return f"Search results for '{query}':\n{snippets}"



@mcp.tool()

async def run_python(code: str) -> str:

    """Execute a Python snippet and return stdout + stderr.



    WARNING: This is an unsafe placeholder that runs code directly on the host.

    In production, replace with a sandboxed execution environment (e.g., a container

    with no network access, strict resource limits, and no access to the host filesystem).

    """

    import subprocess, sys, textwrap

    result = subprocess.run(

        [sys.executable, "-c", textwrap.dedent(code)],

        capture_output=True, text=True, timeout=10

    )

    return result.stdout + result.stderr



if __name__ == "__main__":

    mcp.run(transport="stdio")

실행 방법:


python shared_tools_server.py

TypeScript – 공유 도구 서버


// shared-tools-server.ts

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";

import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

import { z } from "zod";

import { execFile } from "child_process";

import { promisify } from "util";



const execFileAsync = promisify(execFile);



const server = new McpServer({ name: "debate-tools", version: "1.0.0" });



server.tool(

  "web_search",

  "Search the web and return a short summary of the top results",

  { query: z.string() },

  async ({ query }) => {

    // 선호하는 검색 API로 교체하세요.

    const url = `https://api.search.example.com/search?q=${encodeURIComponent(query)}&num=3`;

    const response = await fetch(url, {

      headers: { Authorization: "Bearer YOUR_API_KEY" },

    });

    const data = (await response.json()) as { results: { snippet: string }[] };

    const snippets = data.results.map((r) => r.snippet).join("\n");

    return {

      content: [{ type: "text", text: `Search results for '${query}':\n${snippets}` }],

    };

  }

);



server.tool(

  "run_python",

  "Execute a Python snippet and return stdout + stderr (placeholder — use a real sandbox in production)",

  { code: z.string() },

  async ({ code }) => {

    // 경고: 이것은 LLM이 제어하는 코드를 호스트 프로세스에서 직접 실행합니다.

    // 운영 환경에서는 항상 격리된 샌드박스(예: 네트워크 접근 불가 및 엄격한 리소스 제한이 있는 컨테이너) 내에서 실행하세요.

    // 네트워크 접근 불가 및 엄격한 리소스 제한이 있는 컨테이너).

    // 자세한 내용은 보안 고려사항 섹션을 참조하세요.

    try {

      // 코드를 python3에 직접 인수로 전달하세요 — 셸 호출 없이,

      // 문자열 보간 없이, 명령어 삽입 위험 없이.

      const { stdout, stderr } = await execFileAsync("python3", ["-c", code], {

        timeout: 10000,

      });

      return { content: [{ type: "text", text: stdout + stderr }] };

    } catch (err: unknown) {

      const message = err instanceof Error ? err.message : String(err);

      return { content: [{ type: "text", text: `Error: ${message}` }] };

    }

  }

);



const transport = new StdioServerTransport();

await server.connect(transport);

실행 방법:


npx ts-node shared-tools-server.ts

---

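Step 2: Agent System Prompts

Each agent receives a system prompt that locks it into its assigned position. The key point is that both agents know they are in a debate and must use tools to back up their claims.

Python – System Prompts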


# prompts.py



FOR_SYSTEM_PROMPT = """You are Agent A in a structured debate.

Your role is to argue *in favour* of the proposition given to you.

Rules:

- Support your position with evidence gathered from the available MCP tools.

- Call the web_search tool to find real supporting data.

- Call the run_python tool to verify quantitative claims with code.

- When your opponent makes a claim, challenge it specifically and with evidence.

- Do not concede your position unless your opponent provides irrefutable evidence.

- Keep each turn concise (≤ 200 words)."""



AGAINST_SYSTEM_PROMPT = """You are Agent B in a structured debate.

Your role is to argue *against* the proposition given to you.

Rules:

- Challenge the opposing agent's arguments with evidence from the available MCP tools.

- Call the web_search tool to find counter-evidence.

- Call the run_python tool to verify or disprove quantitative claims with code.

- Point out logical fallacies, missing context, or unsupported assertions.

- Do not concede your position unless the evidence is irrefutable.

- Keep each turn concise (≤ 200 words)."""



JUDGE_SYSTEM_PROMPT = """You are an impartial judge evaluating a structured debate.

Your task:

1. Read the full debate transcript.

2. Identify the strongest evidence-backed arguments on each side.

3. Note any claims that were left unchallenged.

4. Deliver a balanced verdict that states:

   - Which side presented the more compelling case and why.

   - Key caveats or nuances that neither side addressed adequately.

   - A confidence score (0–100) for the winning position."""
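
The judge prompt asks for a confidence score between 0 and 100, and downstream code may want that number rather than free text. The sketch below shows one way to pull it out. It assumes the judge writes the score in a form such as "Confidence: 72" or "Confidence score: 72/100"; the function name `extract_confidence` is hypothetical, and any other phrasing simply yields `None` instead of a guess.

```python
import re
from typing import Optional

# Matches "confidence: 72", "Confidence score = 72/100", and similar forms.
_CONFIDENCE_RE = re.compile(
    r"confidence(?:\s+score)?\s*[:=]\s*(\d{1,3})\b",
    re.IGNORECASE,
)


def extract_confidence(verdict: str) -> Optional[int]:
    """Return the first in-range (0-100) confidence score, or None if absent."""
    for match in _CONFIDENCE_RE.finditer(verdict):
        score = int(match.group(1))
        if 0 <= score <= 100:
            return score
    return None
```

If the verdict has to be machine-readable every time, asking the judge for structured output (for example JSON) is likely more robust than pattern matching on prose.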

---

Step 3: Debate Orchestrator

The orchestrator creates both agents, manages the debate turns, and passes the full transcript to the judge.

Python – Debate Orchestrator


# debate_orchestrator.py

import asyncio

from anthropic import AsyncAnthropic

from mcp import ClientSession, StdioServerParameters

from mcp.client.stdio import stdio_client

from prompts import FOR_SYSTEM_PROMPT, AGAINST_SYSTEM_PROMPT, JUDGE_SYSTEM_PROMPT



client = AsyncAnthropic()



NUM_ROUNDS = 3  # number of back-and-forth exchange rounds





async def run_agent_turn(

    conversation_history: list[dict],

    system_prompt: str,

    session: ClientSession,

) -> str:

    """Run one agent turn with MCP tool support.



    Lists tools from the shared MCP session, passes them to the LLM, and

    handles tool_use blocks in a loop until the model returns a final text reply.

    """

    # Fetch the current tool list from the shared MCP server.

    tools_result = await session.list_tools()

    tools = [

        {

            "name": t.name,

            "description": t.description or "",

            "input_schema": t.inputSchema,

        }

        for t in tools_result.tools

    ]



    messages = list(conversation_history)

    while True:

        response = await client.messages.create(

            model="claude-opus-4-5",

            max_tokens=512,

            system=system_prompt,

            messages=messages,

            tools=tools,

        )



        # Collect all text blocks the model produced in this response.

        text_blocks = [b for b in response.content if b.type == "text"]



        # If the model is done (no tool calls), return all of its text.

        tool_uses = [b for b in response.content if b.type == "tool_use"]

        if not tool_uses:

            return "\n".join(b.text for b in text_blocks)



        # Record the assistant turn (it may mix text and tool_use blocks).

        messages.append({"role": "assistant", "content": response.content})



        # Execute each tool call and collect the results.

        tool_results = []

        for tool_use in tool_uses:

            result = await session.call_tool(tool_use.name, tool_use.input)

            tool_results.append(

                {

                    "type": "tool_result",

                    "tool_use_id": tool_use.id,

                    "content": result.content[0].text if result.content else "",

                }

            )



        # Feed the tool results back to the model.

        messages.append({"role": "user", "content": tool_results})





async def run_debate(proposition: str) -> dict:

    """

    Run a full adversarial debate on a proposition.



    Both agents share a single MCP session so they operate in the same

    tool environment. Returns a dictionary with the transcript and verdict.

    """

    server_params = StdioServerParameters(

        command="python", args=["shared_tools_server.py"]

    )

    async with stdio_client(server_params) as (read, write):

        async with ClientSession(read, write) as session:

            await session.initialize()



            transcript: list[dict] = []



            # Open the debate with the proposition.

            opening_message = {"role": "user", "content": f"Proposition: {proposition}"}



            for_history: list[dict] = [opening_message]

            against_history: list[dict] = [opening_message]



            for round_num in range(1, NUM_ROUNDS + 1):

                print(f"\n--- Round {round_num} ---")



                # Agent A argues the FOR position.

                for_response = await run_agent_turn(for_history, FOR_SYSTEM_PROMPT, session)

                print(f"Agent A (FOR): {for_response}")

                transcript.append({"round": round_num, "agent": "FOR", "text": for_response})



                # Share Agent A's argument with Agent B.

                for_history.append({"role": "assistant", "content": for_response})

                against_history.append({"role": "user", "content": f"Opponent argued: {for_response}"})



                # Agent B argues the AGAINST position.

                against_response = await run_agent_turn(

                    against_history, AGAINST_SYSTEM_PROMPT, session

                )

                print(f"Agent B (AGAINST): {against_response}")

                transcript.append({"round": round_num, "agent": "AGAINST", "text": against_response})



                # Share Agent B's argument with Agent A for the next round.

                against_history.append({"role": "assistant", "content": against_response})

                for_history.append({"role": "user", "content": f"Opponent argued: {against_response}"})



            # Build the transcript text for the judge.

            transcript_text = "\n\n".join(

                f"Round {t['round']} – {t['agent']}:\n{t['text']}" for t in transcript

            )

            judge_input = [

                {

                    "role": "user",

                    "content": f"Proposition: {proposition}\n\nDebate transcript:\n{transcript_text}",

                }

            ]



            # The judge evaluates the debate.

            verdict = await run_agent_turn(judge_input, JUDGE_SYSTEM_PROMPT, session)

            print(f"\n=== Judge Verdict ===\n{verdict}")



            return {"transcript": transcript, "verdict": verdict}





if __name__ == "__main__":

    proposition = (

        "Large language models will eliminate the need for junior software developers within five years."

    )

    result = asyncio.run(run_debate(proposition))

TypeScript – Debate Orchestrator


// debate-orchestrator.ts

import Anthropic from "@anthropic-ai/sdk";



const client = new Anthropic();



const FOR_SYSTEM_PROMPT = `You are Agent A in a structured debate.

Your role is to argue *in favour* of the proposition given to you.

Rules:

- Support your position with evidence gathered from the available MCP tools.

- Call the web_search tool to find real supporting data.

- When your opponent makes a claim, challenge it specifically and with evidence.

- Keep each turn concise (≤ 200 words).`;



const AGAINST_SYSTEM_PROMPT = `You are Agent B in a structured debate.

Your role is to argue *against* the proposition given to you.

Rules:

- Challenge the opposing agent's arguments with evidence from the available MCP tools.

- Call the web_search tool to find counter-evidence.

- Point out logical fallacies, missing context, or unsupported assertions.

- Keep each turn concise (≤ 200 words).`;



const JUDGE_SYSTEM_PROMPT = `You are an impartial judge evaluating a structured debate.

Deliver a verdict with:

1. Which side presented the more compelling case and why.

2. Key caveats or nuances that neither side addressed.

3. A confidence score (0–100) for the winning position.`;



type Message = { role: "user" | "assistant"; content: string };



type DebateTurn = { round: number; agent: "FOR" | "AGAINST"; text: string };



async function runAgentTurn(history: Message[], systemPrompt: string): Promise<string> {

  const response = await client.messages.create({

    model: "claude-opus-4-5",

    max_tokens: 512,

    system: systemPrompt,

    messages: history,

  });



  const text = response.content

    .filter((block) => block.type === "text")

    .map((block) => block.text)

    .join("\n")

    .trim();



  if (!text) {

    const blockTypes = response.content.map((block) => block.type).join(", ");

    throw new Error(

      `Expected at least one text response block, but received: ${blockTypes || "none"}`

    );

  }



  return text;

}



async function runDebate(

  proposition: string,

  numRounds = 3

): Promise<{ transcript: DebateTurn[]; verdict: string }> {

  const transcript: DebateTurn[] = [];

  const openingMessage: Message = { role: "user", content: `Proposition: ${proposition}` };

  const forHistory: Message[] = [openingMessage];

  const againstHistory: Message[] = [openingMessage];



  for (let round = 1; round <= numRounds; round++) {

    console.log(`\n--- Round ${round} ---`);



    // Agent A (FOR)

    const forResponse = await runAgentTurn(forHistory, FOR_SYSTEM_PROMPT);

    console.log(`Agent A (FOR): ${forResponse}`);

    transcript.push({ round, agent: "FOR", text: forResponse });

    forHistory.push({ role: "assistant", content: forResponse });

    againstHistory.push({ role: "user", content: `Opponent argued: ${forResponse}` });



    // Agent B (AGAINST)

    const againstResponse = await runAgentTurn(againstHistory, AGAINST_SYSTEM_PROMPT);

    console.log(`Agent B (AGAINST): ${againstResponse}`);

    transcript.push({ round, agent: "AGAINST", text: againstResponse });

    againstHistory.push({ role: "assistant", content: againstResponse });

    forHistory.push({ role: "user", content: `Opponent argued: ${againstResponse}` });

  }



  // Judge

  const transcriptText = transcript

    .map((t) => `Round ${t.round} – ${t.agent}:\n${t.text}`)

    .join("\n\n");

  const judgeHistory: Message[] = [

    {

      role: "user",

      content: `Proposition: ${proposition}\n\nDebate transcript:\n${transcriptText}`,

    },

  ];

  const verdict = await runAgentTurn(judgeHistory, JUDGE_SYSTEM_PROMPT);

  console.log(`\n=== Judge Verdict ===\n${verdict}`);



  return { transcript, verdict };

}



// Run

const proposition =

  "Large language models will eliminate the need for junior software developers within five years.";

runDebate(proposition).catch(console.error);

C# – Debate Orchestrator


// DebateOrchestrator.cs

using System;

using System.Collections.Generic;

using System.Linq;

using System.Threading.Tasks;

using Anthropic.SDK;

using Anthropic.SDK.Messaging;



public class DebateOrchestrator

{

    private const string Model = "claude-opus-4-5";

    private readonly AnthropicClient _client = new();



    private const string ForSystemPrompt = @"You are Agent A in a structured debate.

Your role is to argue *in favour* of the proposition given to you.

Rules:

- Support your position with evidence.

- Challenge your opponent's claims specifically.

- Keep each turn concise (≤ 200 words).";



    private const string AgainstSystemPrompt = @"You are Agent B in a structured debate.

Your role is to argue *against* the proposition given to you.

Rules:

- Challenge the opposing agent's arguments with evidence.

- Point out logical fallacies or unsupported assertions.

- Keep each turn concise (≤ 200 words).";



    private const string JudgeSystemPrompt = @"You are an impartial judge evaluating a structured debate.

Deliver a verdict with:

1. Which side presented the more compelling case and why.

2. Key caveats neither side addressed.

3. A confidence score (0–100) for the winning position.";



    // Public because the public RunDebateAsync method exposes it in its return type.
    public record DebateTurn(int Round, string Agent, string Text);



    private async Task<string> RunAgentTurnAsync(

        List<Message> history,

        string systemPrompt)

    {

        var request = new MessageParameters

        {

            Model = Model,

            MaxTokens = 512,

            System = [new SystemMessage(systemPrompt)],

            Messages = history

        };

        var response = await _client.Messages.GetClaudeMessageAsync(request);

        return response.Content.OfType<TextContent>().FirstOrDefault()?.Text ?? string.Empty;

    }



    public async Task<(List<DebateTurn> Transcript, string Verdict)> RunDebateAsync(

        string proposition,

        int numRounds = 3)

    {

        var transcript = new List<DebateTurn>();

        var opening = new Message { Role = RoleType.User, Content = $"Proposition: {proposition}" };



        var forHistory = new List<Message> { opening };

        var againstHistory = new List<Message> { opening };



        for (int round = 1; round <= numRounds; round++)

        {

            Console.WriteLine($"\n--- Round {round} ---");



            // Agent A (FOR)

            var forResponse = await RunAgentTurnAsync(forHistory, ForSystemPrompt);

            Console.WriteLine($"Agent A (FOR): {forResponse}");

            transcript.Add(new DebateTurn(round, "FOR", forResponse));

            forHistory.Add(new Message { Role = RoleType.Assistant, Content = forResponse });

            againstHistory.Add(new Message { Role = RoleType.User, Content = $"Opponent argued: {forResponse}" });



            // Agent B (AGAINST)

            var againstResponse = await RunAgentTurnAsync(againstHistory, AgainstSystemPrompt);

            Console.WriteLine($"Agent B (AGAINST): {againstResponse}");

            transcript.Add(new DebateTurn(round, "AGAINST", againstResponse));

            againstHistory.Add(new Message { Role = RoleType.Assistant, Content = againstResponse });

            forHistory.Add(new Message { Role = RoleType.User, Content = $"Opponent argued: {againstResponse}" });

        }



        // Judge

        var transcriptText = string.Join("\n\n",

            transcript.Select(t => $"Round {t.Round} – {t.Agent}:\n{t.Text}"));

        var judgeHistory = new List<Message>

        {

            new() { Role = RoleType.User, Content = $"Proposition: {proposition}\n\nDebate transcript:\n{transcriptText}" }

        };

        var verdict = await RunAgentTurnAsync(judgeHistory, JudgeSystemPrompt);

        Console.WriteLine($"\n=== Judge Verdict ===\n{verdict}");



        return (transcript, verdict);

    }



    public static async Task Main()

    {

        var orchestrator = new DebateOrchestrator();

        const string proposition =

            "Large language models will eliminate the need for junior software developers within five years.";

        await orchestrator.RunDebateAsync(proposition);

    }

}

---

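Step 4: Wiring MCP Tools into the Agents

The Python orchestrator above already shows a complete MCP integration (the TypeScript and C# versions call the model without tools). The key patterns are:

One shared session: run_debate opens a single ClientSession and passes it to every run_agent_turn call, so both agents and the judge operate in the same tool environment.

Per-turn tool listing: run_agent_turn calls session.list_tools() to fetch the current tool definitions and passes them to the LLM as the tools parameter.

Tool-use loop: When the model returns tool_use blocks, run_agent_turn calls session.call_tool() for each one and feeds the results back to the model, repeating until the model produces a final text reply.

For complete MCP client examples in each language, see 03-GettingStarted/02-client.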
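
In the tool-use loop, the Python orchestrator forwards only `result.content[0].text` to the model. A tool may return several content blocks, or a first block that is not text. The sketch below is one way to flatten a result more defensively. It assumes each block exposes a `type` attribute and that text blocks carry a `text` attribute; the helper name `tool_result_to_text` is hypothetical, and `SimpleNamespace` stands in for real content blocks in the demo.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def tool_result_to_text(content: Optional[Iterable[Any]]) -> str:
    """Join every text block; replace non-text blocks with a short marker."""
    parts = []
    for block in content or []:
        if getattr(block, "type", None) == "text":
            parts.append(block.text)
        else:
            kind = getattr(block, "type", "unknown")
            parts.append(f"[{kind} content omitted]")
    return "\n".join(parts)


if __name__ == "__main__":
    # Stand-in blocks for demonstration only.
    demo = [
        SimpleNamespace(type="text", text="Top result: ..."),
        SimpleNamespace(type="image"),
    ]
    print(tool_result_to_text(demo))
```

Inside run_agent_turn, the `"content"` field of each tool_result could then be set to `tool_result_to_text(result.content)`.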

---

Practical Use Cases

|----------|-------------|---------------|----------|

---

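Security Considerations

When running adversarial agents in production, keep the following in mind:

Sandbox code execution: The run_python tool must run in an isolated environment (for example, a container with no network access and resource limits). Never run untrusted LLM-generated code directly on the host.

Validate tool calls: Validate every tool input before executing it. Because both agents share the same tool server, a malicious prompt injected during the debate could abuse the tools.

Rate limiting: Enforce a per-agent cap on tool calls to prevent runaway call loops.

Audit logging: Log every tool call and its result so you can trace which evidence each agent used to reach its conclusion.

Human in the loop: For high-stakes decisions, route the judge's verdict through a human reviewer before acting on it.

For a full guide to MCP security best practices, see 02-Security.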
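
The rate-limiting and audit-logging points can be combined in a thin wrapper around the tool-call function. The following is a minimal sketch, not the lesson's implementation: the class name `GuardedToolCaller`, the default budget of 10 calls, and the log record fields are all assumptions. It wraps any async callable that takes a tool name and an arguments dictionary, which is how `session.call_tool` is invoked in the Python orchestrator.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


class ToolBudgetExceeded(RuntimeError):
    """Raised when an agent has used up its tool-call allowance."""


class GuardedToolCaller:
    """Enforce a per-agent tool-call budget and keep an in-memory audit log."""

    def __init__(
        self,
        call_tool: Callable[[str, dict], Awaitable[Any]],
        max_calls_per_agent: int = 10,
    ) -> None:
        self._call_tool = call_tool
        self._max_calls = max_calls_per_agent
        self._counts: dict[str, int] = {}
        self.audit_log: list[dict] = []

    async def call(self, agent: str, name: str, arguments: dict) -> Any:
        used = self._counts.get(agent, 0)
        if used >= self._max_calls:
            raise ToolBudgetExceeded(f"{agent} exceeded {self._max_calls} tool calls")
        self._counts[agent] = used + 1

        result = await self._call_tool(name, arguments)

        # Record who called what, with which input, and a truncated result.
        self.audit_log.append(
            {
                "ts": time.time(),
                "agent": agent,
                "tool": name,
                "arguments": arguments,
                "result": repr(result)[:500],
            }
        )
        return result


async def _demo() -> None:
    async def fake_tool(name: str, arguments: dict) -> str:
        return f"{name} called"

    guard = GuardedToolCaller(fake_tool, max_calls_per_agent=1)
    print(await guard.call("FOR", "web_search", {"query": "example"}))
    print(guard.audit_log[0]["tool"])


if __name__ == "__main__":
    asyncio.run(_demo())
```

To use it with the orchestrator, one option is to construct `GuardedToolCaller(session.call_tool)` once per debate and pass an agent label ("FOR", "AGAINST", "JUDGE") into each turn. A production version would presumably write the log to durable storage rather than a list.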

---

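Exercise

Design an adversarial MCP pipeline for one of the following scenarios:

1. Code review: Agent A defends a pull request; Agent B looks for bugs, security issues, and style problems. The judge summarizes the key issues.

2. Architecture decision: Agent A proposes microservices; Agent B advocates for a monolith. The judge produces a decision matrix.

3. Content moderation: Agent A argues that a piece of content is safe to publish; Agent B looks for policy violations. The judge assigns a risk score.

For each scenario:

Define the system prompts for both agents and the judge.

Identify the MCP tools each agent needs.

Sketch the message flow (opening argument → rebuttal → counter-rebuttal → verdict).

Describe how you would validate the verdict before acting on it.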
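
As a starting point for the last item, the sketch below shows one possible gate that decides whether a verdict can be acted on automatically or must go to a human reviewer. It assumes a confidence score has already been parsed from the judge's verdict (or is `None` if none was found). The names `ReviewDecision` and `route_verdict` and the default threshold of 80 are illustrative choices, not values prescribed by the lesson.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewDecision:
    needs_human: bool
    reason: str


def route_verdict(
    confidence: Optional[int],
    high_stakes: bool,
    threshold: int = 80,
) -> ReviewDecision:
    """Decide whether a judge verdict needs human review before acting on it."""
    if high_stakes:
        return ReviewDecision(True, "high-stakes decision")
    if confidence is None:
        return ReviewDecision(True, "no confidence score found")
    if confidence < threshold:
        return ReviewDecision(
            True, f"confidence {confidence} is below threshold {threshold}"
        )
    return ReviewDecision(False, "auto-approved")
```

The gate fails closed: a missing score or a high-stakes flag always routes to a human, so an unparseable verdict is never approved by accident.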

---

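Key Takeaways

The adversarial multi-agent pattern uses opposing system prompts to make agents rigorously test each other's reasoning.

Sharing a single MCP tool server means both agents work from the same information, so disagreements are about reasoning rather than data access.

A judge agent synthesizes the debate into an actionable verdict, so a human does not become a bottleneck on every decision.

The pattern is especially powerful for hallucination detection, threat modeling, factual verification, and design review.

Tool-execution security and robust logging are essential when running adversarial agents in production.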

---

Next Steps