MiniMax-Agent-Guide-2.md

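Mini Agent Source Code Walkthrough, Part 2: How Tools Are Implemented and Called

Introduction

In the previous post we got a first look at Mini Agent's overall structure and its entry point. Starting with this post we narrow the focus to the part that actually makes the agent act: how tools are defined, organized, and called. In an agent system, the model sets the ceiling for understanding and reasoning, while the tool mechanism determines whether the agent can interact with its environment and complete real tasks. Understanding this layer helps us follow Mini Agent's execution logic and makes it easier to add custom capabilities later.

Code Examples

We start with the example file examples/01_basic_tools.py. This example does not focus on the full agent scheduling loop. Instead it demonstrates, one at a time, how to call several basic tools: WriteTool, ReadTool, EditTool, and BashTool.

Structurally, the example wraps each tool in its own async function and then runs them in sequence from main():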

async def main():
    """Run all demos."""
    print("=" * 60)
    print("Basic Tools Usage Examples")
    print("=" * 60)
    print("\nThese examples show how to use the core tools directly.")
    print("In a real agent scenario, the LLM decides which tools to use.\n")

    await demo_write_tool()
    await demo_read_tool()
    await demo_edit_tool()
    await demo_bash_tool()

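This organization is worth noting. It shows that in Mini Agent a tool is an execution unit that can be called on its own; the agent sits one level above and decides, based on context, which tool to call and when.

1. WriteTool: writing a file

First, the WriteTool example: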

async def demo_write_tool():
    with tempfile.TemporaryDirectory() as tmpdir:
        file_path = Path(tmpdir) / "hello.txt"

        tool = WriteTool()
        result = await tool.execute(
            path=str(file_path), content="Hello, Mini Agent!\nThis is a test file."
        )

        if result.success:
            print(f"✅ File created: {file_path}")
            print(f"Content:\n{file_path.read_text()}")
        else:
            print(f"❌ Failed: {result.error}")

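There are two key steps here: instantiate WriteTool(), then call the async execute() method with path and content. The calling interface is uniform: the caller does not need to know how the file is opened or written, and only supplies the parameters the task requires.

The return value is also not a bare boolean or string but a structured result object. The caller checks result.success and then reads either the content or the error, depending on the outcome. This makes the upper layer easy to extend: hand-written example code and a model-driven agent can process tool results in the same way.

2. ReadTool: reading a file

Next, ReadTool: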

async def demo_read_tool():
    with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".txt") as f:
        f.write("Line 1: Hello\nLine 2: World\nLine 3: Mini Agent")
        temp_path = f.name

    try:
        tool = ReadTool()
        result = await tool.execute(path=temp_path)

        if result.success:
            print(f"✅ File read successfully")
            print(f"Content:\n{result.content}")
        else:
            print(f"❌ Failed: {result.error}")
    finally:
        Path(temp_path).unlink()

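This example shows again that Mini Agent's tool abstraction is not designed around one particular resource but follows a single calling pattern: create the tool instance, pass parameters, receive a result. For ReadTool the input is just a path, and the output is carried in result.content.

This matters because in an agent setting the model needs to know not only what was read, but also whether the read succeeded and, if it failed, what the error was. If a tool returned only raw text, the upper layer would have no clean way to tell file content apart from an error message.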
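To make the point concrete, here is a minimal sketch of a read-style tool that reports both success and failure through one result type. SketchResult and SketchReadTool are hypothetical stand-ins written with the standard library only; they are not Mini Agent's ReadTool, whose implementation is not shown in this post.

```python
import asyncio
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in mirroring the success/content/error fields."""
    success: bool
    content: str = ""
    error: Optional[str] = None


class SketchReadTool:
    """Hypothetical read tool for illustration; not Mini Agent's ReadTool."""

    async def execute(self, path: str) -> SketchResult:
        try:
            text = Path(path).read_text(encoding="utf-8")
        except OSError as exc:
            # Failure is reported as data instead of being raised to the caller
            return SketchResult(success=False, error=str(exc))
        return SketchResult(success=True, content=text)


async def read_both() -> tuple:
    """Read one existing file and one missing file with the same tool."""
    tool = SketchReadTool()
    with tempfile.TemporaryDirectory() as tmpdir:
        existing = Path(tmpdir) / "note.txt"
        existing.write_text("hello", encoding="utf-8")
        ok = await tool.execute(path=str(existing))
        missing = await tool.execute(path=str(Path(tmpdir) / "absent.txt"))
    return ok, missing


ok, missing = asyncio.run(read_both())
print(ok.success, ok.content)
print(missing.success, missing.error is not None)
```

Both calls return the same type, so the caller branches on success rather than guessing whether a string is file content or an error message.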

3. EditTool: modifying a file

Compared with plain reads and writes, EditTool better illustrates Mini Agent's approach of giving tools task-level semantics:

async def demo_edit_tool():
    with tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".txt") as f:
        f.write("Python is great!\nI love Python programming.")
        temp_path = f.name

    try:
        print(f"Original content:\n{Path(temp_path).read_text()}\n")

        tool = EditTool()
        result = await tool.execute(
            path=temp_path, old_str="Python", new_str="Agent"
        )

        if result.success:
            print(f"✅ File edited successfully")
            print(f"New content:\n{Path(temp_path).read_text()}")
        else:
            print(f"❌ Failed: {result.error}")
    finally:
        Path(temp_path).unlink()

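Here the caller does not read the whole file, manipulate the string, and write it back to disk by hand. It describes the intended change directly: in this file, replace old_str with new_str. This is a typical agent-friendly design: models are better at producing a goal description such as "what I want to change" than at reliably producing a long sequence of low-level file operations.

So the value of EditTool is not only that it edits files for you. At the interface level it folds away the implementation details and exposes the capability as an action that matches the task. This is also why a usable agent system tends to wrap tools in a form the model can call correctly, rather than exposing only low-level APIs.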
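The "replace old_str with new_str" intent can be sketched as a small pure function. The exactly-one-match rule below is an assumption made for illustration: the EditTool parameter schema quoted later in this post describes old_str as "must be unique in file", but EditTool's implementation is not shown, so replace_once and EditOutcome are hypothetical names, not the project's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EditOutcome:
    """Hypothetical result type for this sketch."""
    success: bool
    content: str = ""
    error: Optional[str] = None


def replace_once(text: str, old_str: str, new_str: str) -> EditOutcome:
    """Replace old_str with new_str, assuming exactly one match is required."""
    matches = text.count(old_str)
    if matches == 0:
        return EditOutcome(False, error="old_str not found")
    if matches > 1:
        # An ambiguous target is rejected rather than guessed at
        return EditOutcome(False, error=f"old_str matches {matches} times")
    return EditOutcome(True, content=text.replace(old_str, new_str))


unique = replace_once("I love Python programming.", "Python", "Agent")
ambiguous = replace_once(
    "Python is great!\nI love Python programming.", "Python", "Agent"
)
print(unique.content)
print(ambiguous.error)
```

Under this assumption the model only has to name the text it wants changed, and an ambiguous request comes back as a structured error it can react to.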

4. BashTool: connecting to an external execution environment

Finally, BashTool shows how Mini Agent extends its reach to shell commands:

async def demo_bash_tool():
    tool = BashTool()

    print("\nCommand: ls -la")
    result = await tool.execute(command="ls -la")
    if result.success:
        print(f"✅ Command executed successfully")
        print(f"Output:\n{result.content[:200]}...")

    print("\nCommand: pwd")
    result = await tool.execute(command="pwd")
    if result.success:
        print(f"✅ Current directory: {result.content.strip()}")

    print("\nCommand: echo 'Hello from BashTool!'")
    result = await tool.execute(command="echo 'Hello from BashTool!'")
    if result.success:
        print(f"✅ Output: {result.content.strip()}")

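Compared with the file tools above, BashTool differs in that it does not operate on one fixed resource but on the whole external command environment. The calling style is nevertheless the same: instantiate the tool, call execute(), pass keyword arguments, check the returned result.

This shows how far the single tool abstraction stretches in Mini Agent's design. Whether a capability comes from the file system or the command line, once it is wrapped in a clear input/output interface it fits into the same tool system. For the agent this uniformity is essential: the model does not have to learn a different calling convention for each external capability, only the general pattern of choosing a suitable tool for the goal and filling in its parameters.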
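The "choose a tool, fill in its parameters" pattern reduces to a lookup by name followed by a keyword-argument call. The sketch below assumes nothing about how Mini Agent registers or dispatches tools (that is not covered in this post); UpperTool, RepeatTool, and dispatch are hypothetical names used only to show the shape.

```python
import asyncio
from typing import Any


class UpperTool:
    """Hypothetical tool used only for this sketch."""
    name = "upper"

    async def execute(self, text: str) -> str:
        return text.upper()


class RepeatTool:
    """Hypothetical tool used only for this sketch."""
    name = "repeat"

    async def execute(self, text: str, times: int = 2) -> str:
        return text * times


async def dispatch(tools: dict, name: str, arguments: dict) -> Any:
    """Look up a tool by name and call it with keyword arguments."""
    tool = tools.get(name)
    if tool is None:
        return f"unknown tool: {name}"
    return await tool.execute(**arguments)


TOOLS = {tool.name: tool for tool in (UpperTool(), RepeatTool())}

# A model-issued call boils down to a tool name plus a dict of arguments
upper_out = asyncio.run(dispatch(TOOLS, "upper", {"text": "ls"}))
repeat_out = asyncio.run(dispatch(TOOLS, "repeat", {"text": "ab", "times": 3}))
print(upper_out, repeat_out)
```

Because every tool is called the same way, adding a new capability means adding one entry to the mapping rather than a new code path in the caller.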

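Tool Class Source Code

The examples above answer the question of how to use tools. Now we go one layer down: how does Mini Agent define a tool? The core code is in mini_agent/tools/base.py, which sets the shared data structure and abstract interface for the whole tool system.

1. ToolResult: a uniform wrapper for tool results

First, the definition of the result object: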

class ToolResult(BaseModel):
    """Tool execution result."""
    success: bool
    content: str = ""
    error: str | None = None

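The class looks simple, but it plays a key role. In the WriteTool, ReadTool, EditTool, and BashTool examples above, every tool returned a result object, and ToolResult is what gives that object its common shape.

Three fields matter here: success indicates whether execution succeeded, content holds the normal output, and error describes the reason for a failure. The direct benefit is that the upper layer does not need separate result-handling logic per tool. Whether the call went to a file tool or the shell tool, once it has a ToolResult it can check success first and then either consume the content or take the error branch.

The class also uses pydantic.BaseModel as its base, so the result object is a structured data model. For an agent framework this is more robust than returning a bare dict or an ad hoc string: the fields and their types are validated when the object is constructed, and serialization and debugging become easier.

2. Tool: the base class shared by all tools

Next, the definition of the tool itself: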

class Tool:
    """Base class for all tools."""

    @property
    def name(self) -> str:
        """Tool name."""
        raise NotImplementedError

    @property
    def description(self) -> str:
        """Tool description."""
        raise NotImplementedError

    @property
    def parameters(self) -> dict[str, Any]:
        """Tool parameters schema (JSON Schema format)."""
        raise NotImplementedError

    async def execute(self, *args, **kwargs) -> ToolResult:  # type: ignore
        """Execute the tool with arbitrary arguments."""
        raise NotImplementedError

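This defines Mini Agent's core contract for a tool. In other words, any subclass that implements these members can be plugged into the framework as a standard tool. Note that Tool is a plain class rather than an abc.ABC, so a missing member is not caught at instantiation; it only surfaces as NotImplementedError when the member is accessed.

The name property: the tool's identifier in the system, and usually the name the model refers to when it issues a tool call.
The description property: tells the model what the tool does. In an agent setting this field is not just a comment for developers; it is an important hint the model uses when choosing a tool.
The parameters property: requires every tool to provide its parameter definition as JSON Schema. A tool therefore exposes not only what it can do but also which parameters a call needs, what their types are, and which are required. This step translates a natural-language task into a structured call interface and is what makes the model's tool calling work reliably. For example, EditTool's parameters look like this: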

{
  "type": "object",
  "properties": {
    "path": {
      "type": "string",
      "description": "Absolute or relative path to the file"
    },
    "old_str": {
      "type": "string",
      "description": "Exact string to find and replace (must be unique in file)"
    },
    "new_str": {
      "type": "string",
      "description": "Replacement string (use for refactoring, renaming, etc.)"
    }
  },
  "required": [
    "path",
    "old_str",
    "new_str"
  ]
}

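The async execute() method: the single entry point through which every tool performs its action. Whether the tool reads a file, writes a file, replaces text, or runs a shell command, the actual call goes through this method and returns a ToolResult. From the framework's point of view, tools differ only in their parameter definitions and internal implementation; the calling protocol is identical.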
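As an illustration of the contract, the sketch below defines a custom tool against a trimmed copy of the base class. To stay runnable without extra dependencies it uses a dataclass stand-in for ToolResult instead of pydantic, and WordCountTool is a hypothetical tool that does not exist in Mini Agent.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolResult:
    """Dataclass stand-in for the pydantic ToolResult shown above."""
    success: bool
    content: str = ""
    error: Optional[str] = None


class Tool:
    """Trimmed copy of the base class contract shown above."""

    @property
    def name(self) -> str:
        raise NotImplementedError

    @property
    def description(self) -> str:
        raise NotImplementedError

    @property
    def parameters(self) -> dict:
        raise NotImplementedError

    async def execute(self, *args: Any, **kwargs: Any) -> ToolResult:
        raise NotImplementedError


class WordCountTool(Tool):
    """Hypothetical tool: counts whitespace-separated words."""

    @property
    def name(self) -> str:
        return "word_count"

    @property
    def description(self) -> str:
        return "Count the words in a piece of text."

    @property
    def parameters(self) -> dict:
        return {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "Text to count"}
            },
            "required": ["text"],
        }

    async def execute(self, text: str) -> ToolResult:
        return ToolResult(success=True, content=str(len(text.split())))


tool = WordCountTool()
result = asyncio.run(tool.execute(text="tools connect models to the world"))
print(tool.name, result.content)
```

The three properties are everything a model needs in order to choose and call the tool; execute() is the only place where tool-specific behavior lives.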

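3. to_schema(): converting to the Anthropic tool format

With the abstract interface defined, the tool still has to be exposed to the model. base.py first provides a conversion method for the Anthropic format: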

def to_schema(self) -> dict[str, Any]:
    """Convert tool to Anthropic tool schema."""
    return {
        "name": self.name,
        "description": self.description,
        "input_schema": self.parameters,
    }

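The code is straightforward: it repackages the tool's name, description, and parameters into the schema structure the model API expects. The field to note is input_schema, which carries the JSON Schema returned by parameters unchanged.

4. to_openai_schema(): compatibility with the OpenAI tool protocol

The same idea applies to OpenAI-style APIs: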

def to_openai_schema(self) -> dict[str, Any]:
    """Convert tool to OpenAI tool schema."""
    return {
        "type": "function",
        "function": {
            "name": self.name,
            "description": self.description,
            "parameters": self.parameters,
        },
    }

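The two APIs clearly differ in the outer structure of a tool schema. The Anthropic style uses name, description, and input_schema at the top level, while the OpenAI style wraps them in an object with type: "function" and a nested function object. Whatever the outer format, the core information stays the same: tool name, tool description, parameter definition.

This suggests the Tool base class is abstracted at the right level. It keeps the essential tool information in one interface and adapts to different model ecosystems through separate export methods. For an agent framework that wants to support several LLM providers, this is a natural and practical approach.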
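To see the two layouts side by side, the sketch below applies the same two return shapes to one set of tool metadata. The metadata values (word_count and its schema) are hypothetical, and the two helper functions are standalone copies of the layouts quoted above rather than the methods themselves.

```python
import json

# Hypothetical tool metadata used only for this sketch
NAME = "word_count"
DESCRIPTION = "Count the words in a piece of text."
PARAMETERS = {
    "type": "object",
    "properties": {"text": {"type": "string"}},
    "required": ["text"],
}


def anthropic_schema(name: str, description: str, parameters: dict) -> dict:
    """Same layout as to_schema() above."""
    return {"name": name, "description": description, "input_schema": parameters}


def openai_schema(name: str, description: str, parameters: dict) -> dict:
    """Same layout as to_openai_schema() above."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


anthropic_tool = anthropic_schema(NAME, DESCRIPTION, PARAMETERS)
openai_tool = openai_schema(NAME, DESCRIPTION, PARAMETERS)

print(json.dumps(anthropic_tool, indent=2))
print(json.dumps(openai_tool, indent=2))
```

Only the envelope differs: the JSON Schema that describes the arguments is the same object in both outputs.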

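A Concrete Tool: BashTool

If mini_agent/tools/base.py answers what a tool should look like, mini_agent/tools/bash_tool.py shows how a concrete tool maps the abstract interface onto a real capability. The file is a good case study because it does more than run a single shell command: it also handles cross-platform differences, foreground and background execution, output collection, and process management.

1. BashOutputResult: adding command execution details to the generic result

First, the extended result object (BashOutputResult inherits from ToolResult):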

class BashOutputResult(ToolResult):
    """Bash command execution result with separated stdout and stderr."""

    stdout: str = Field(description="The command's standard output")
    stderr: str = Field(description="The command's standard error output")
    exit_code: int = Field(description="The command's exit code")
    bash_id: str | None = Field(default=None, description="Shell process ID (only when run_in_background=True)")

    @model_validator(mode="after")
    def format_content(self) -> "BashOutputResult":
        output = ""
        if self.stdout:
            output += self.stdout
        if self.stderr:
            output += f"\n[stderr]:\n{self.stderr}"
        if self.bash_id:
            output += f"\n[bash_id]:\n{self.bash_id}"
        if self.exit_code:
            output += f"\n[exit_code]:\n{self.exit_code}"

        if not output:
            output = "(no output)"

        self.content = output
        return self

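This shows Mini Agent's design approach well: it does not replace ToolResult but extends it. ToolResult provides the minimal structure shared by all tools, and BashOutputResult adds the shell-specific fields: stdout, stderr, exit_code, and, for background tasks, bash_id.

Even more notable is the format_content() model_validator. Because it runs in mode="after", it is invoked once the object has been constructed and assembles stdout, stderr, bash_id, and exit_code into the common content field. Each part is appended only when it is truthy, so an exit code of 0 is not written into content, and a result with nothing to report becomes "(no output)". A caller that only wants the generic view can keep reading content, while one that needs finer detail can read the structured fields directly. This balances generality with extensibility.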
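The assembly rules are easier to see when they are pulled out into a plain function. The sketch below copies the branching of format_content() into a standalone helper (the function form and its argument list are choices made for this example, not part of the source) and runs it on three cases.

```python
from typing import Optional


def format_content(
    stdout: str, stderr: str, exit_code: int, bash_id: Optional[str] = None
) -> str:
    """Standalone copy of the branching in the format_content() validator."""
    output = ""
    if stdout:
        output += stdout
    if stderr:
        output += f"\n[stderr]:\n{stderr}"
    if bash_id:
        output += f"\n[bash_id]:\n{bash_id}"
    if exit_code:  # 0 is falsy, so a successful exit code is omitted
        output += f"\n[exit_code]:\n{exit_code}"
    if not output:
        output = "(no output)"
    return output


clean = format_content(stdout="ok\n", stderr="", exit_code=0)
failed = format_content(stdout="", stderr="boom", exit_code=1)
silent = format_content(stdout="", stderr="", exit_code=0)

print(repr(clean))
print(repr(failed))
print(repr(silent))
```

A successful command therefore produces content that is just its stdout, while a failing one carries labeled stderr and exit code sections that the model can read as plain text.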

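2. BackgroundShell and BackgroundShellManager: state management for background commands

One key reason BashTool is more complex than the file tools is that a command may not finish right away. The source therefore defines two helper classes: BackgroundShell and BackgroundShellManager.

First, BackgroundShell: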

class BackgroundShell:
    def __init__(self, bash_id: str, command: str, process: "asyncio.subprocess.Process", start_time: float):
        self.bash_id = bash_id
        self.command = command
        self.process = process
        self.start_time = start_time
        self.output_lines: list[str] = []
        self.last_read_index = 0
        self.status = "running"
        self.exit_code: int | None = None

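This class carries no complex logic of its own; it is a runtime data container that records the basic state of one background shell: which command it runs, which process it belongs to, what output it has produced so far, and its current status. Once BashTool enters background mode, the simple "one call, one return" model is no longer enough, and the framework has to keep the execution state around for later tracking.

The class that manages these background tasks as a whole is BackgroundShellManager: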

class BackgroundShellManager:
    _shells: dict[str, BackgroundShell] = {}
    _monitor_tasks: dict[str, asyncio.Task] = {}

    @classmethod
    def add(cls, shell: BackgroundShell) -> None:
        cls._shells[shell.bash_id] = shell

    @classmethod
    def get(cls, bash_id: str) -> BackgroundShell | None:
        return cls._shells.get(bash_id)

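It keeps class-level dictionaries that hold every background shell and its monitor task, and manages them through class methods such as add() and get(), plus terminate() (used by BashKillTool later in this post, not included in the excerpt above). This is practical because a background task cannot be handled inside a single function call; it has to survive across several interactions. The agent may start a background command now, look at its output in a later turn, and even decide to stop it early, so there must be a runtime registry that outlives any single tool call. Note that this state lives in memory for the lifetime of the Python process; nothing in the excerpt writes it to disk.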
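The combination of output_lines and last_read_index suggests a read cursor: each read returns only what arrived since the previous one. The sketch below shows that idea together with a class-level registry. ShellRecord, Registry, and read_new_lines are hypothetical names, and the cursor logic is an assumption about how such fields would typically be used, since the corresponding source is not quoted in this post.

```python
from typing import Dict, List, Optional


class ShellRecord:
    """Hypothetical stand-in for a background shell's bookkeeping."""

    def __init__(self, bash_id: str) -> None:
        self.bash_id = bash_id
        self.output_lines: List[str] = []
        self.last_read_index = 0

    def read_new_lines(self) -> List[str]:
        """Return lines added since the previous read and advance the cursor."""
        new_lines = self.output_lines[self.last_read_index:]
        self.last_read_index = len(self.output_lines)
        return new_lines


class Registry:
    """Hypothetical class-level registry shared by all callers in the process."""

    _records: Dict[str, ShellRecord] = {}

    @classmethod
    def add(cls, record: ShellRecord) -> None:
        cls._records[record.bash_id] = record

    @classmethod
    def get(cls, bash_id: str) -> Optional[ShellRecord]:
        return cls._records.get(bash_id)


Registry.add(ShellRecord("abc12345"))
record = Registry.get("abc12345")

record.output_lines += ["compiling", "linking"]
first = record.read_new_lines()   # everything so far
record.output_lines.append("done")
second = record.read_new_lines()  # only the line added since the first read
third = record.read_new_lines()   # nothing new

print(first, second, third)
```

Keeping the cursor on the record means a later tool call can pick up where the previous one stopped without re-sending output the model has already seen.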

3. BashTool's basic properties: name, description, and parameters

Now we reach BashTool itself. First, the most basic properties:

class BashTool(Tool):
    def __init__(self, workspace_dir: str | None = None):
        self.is_windows = platform.system() == "Windows"
        self.shell_name = "PowerShell" if self.is_windows else "bash"
        self.workspace_dir = workspace_dir

    @property
    def name(self) -> str:
        return "bash"

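BashTool follows the contract of the Tool base class: it supplies its own name and, at construction time, prepares the context it needs to run. That context has two parts: the operating system, which decides whether PowerShell or bash is used later, and workspace_dir, which is passed as cwd to set the working directory in which commands start. Setting cwd does not by itself sandbox the command or stop it from touching paths outside that directory.

Next, parameters: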

@property
def parameters(self) -> dict[str, Any]:
    cmd_desc = f"The {self.shell_name} command to execute. Quote file paths with spaces using double quotes."
    return {
        "type": "object",
        "properties": {
            "command": {
                "type": "string",
                "description": cmd_desc,
            },
            "timeout": {
                "type": "integer",
                "description": "Optional: Timeout in seconds (default: 120, max: 600). Only applies to foreground commands.",
                "default": 120,
            },
            "run_in_background": {
                "type": "boolean",
                "description": "Optional: Set to true to run the command in the background.",
                "default": False,
            },
        },
        "required": ["command"],
    }

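This confirms the earlier analysis of Tool.parameters: a concrete tool has to describe its input interface precisely. For BashTool the essential argument is command, and two optional parameters are supported, timeout and run_in_background. So before calling the tool, the model does not merely know that it can run commands; it has already been given explicit call constraints.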
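One practical use of such a schema is to check a model-supplied argument dict before calling execute(). The sketch below checks required keys and fills in declared defaults. Whether Mini Agent performs this kind of check is not shown in this post; prepare_arguments is a hypothetical helper, and BASH_SCHEMA is an abbreviated copy of the schema above with the descriptions removed.

```python
# Abbreviated copy of the BashTool schema above (descriptions omitted)
BASH_SCHEMA = {
    "type": "object",
    "properties": {
        "command": {"type": "string"},
        "timeout": {"type": "integer", "default": 120},
        "run_in_background": {"type": "boolean", "default": False},
    },
    "required": ["command"],
}


def prepare_arguments(schema: dict, arguments: dict) -> dict:
    """Hypothetical helper: reject missing required keys, then apply defaults."""
    missing = [key for key in schema.get("required", []) if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    prepared = {
        key: spec["default"]
        for key, spec in schema["properties"].items()
        if "default" in spec
    }
    prepared.update(arguments)  # explicit values override defaults
    return prepared


print(prepare_arguments(BASH_SCHEMA, {"command": "pwd"}))
print(prepare_arguments(BASH_SCHEMA, {"command": "make", "timeout": 300}))
```

This only covers required keys and defaults; full JSON Schema validation (types, ranges such as the 600 second maximum mentioned in the description) would need a dedicated validator.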

4. execute(): where foreground and background execution diverge

The core logic lives in execute():

async def execute(
    self,
    command: str,
    timeout: int = 120,
    run_in_background: bool = False,
) -> ToolResult:

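On the surface the method just takes three parameters, but internally it splits into two execution paths.

The first is foreground execution. For an ordinary command, BashTool starts a subprocess, waits for it to finish, and returns the output in one piece: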

process = await asyncio.create_subprocess_shell(
    shell_cmd,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    cwd=self.workspace_dir,
)

stdout, stderr = await asyncio.wait_for(process.communicate(), timeout=timeout)

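This is the classic tool-call model: issue one call, wait for the result, and return a success or failure status. The wait is an await, so the calling coroutine is suspended but the event loop is not blocked, and asyncio.wait_for bounds the wait with timeout. The source then decodes stdout and stderr and combines them with the exit code to build a BashOutputResult.

A note on asyncio.subprocess.PIPE. It tells Python not to let the subprocess write straight to the terminal, but to connect the output stream to a pipe that the current coroutine can read. Because stdout=asyncio.subprocess.PIPE and stderr=asyncio.subprocess.PIPE are set, the later process.communicate() call can actually return the command's standard output and standard error.

Without PIPE, the subprocess inherits the parent's stdout and stderr, so the output goes directly to the current terminal and communicate() returns None for that stream. BashTool would not receive the output and could not wrap it in a BashOutputResult for the agent. PIPE takes over the subprocess output and turns the result of a shell command from terminal behavior into data that can be processed programmatically.

stdout=asyncio.subprocess.PIPE is just as important in the background branch. A background task does not finish immediately, and the framework keeps reading its output and storing it in BackgroundShell so the bash_output tool can return it incrementally. Without PIPE, that continuous output path could not be set up.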
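The foreground pattern described above (pipes plus a bounded wait) can be tried in isolation. The sketch below is not BashTool's code: for portability it uses asyncio.create_subprocess_exec with the current Python interpreter instead of a shell command, and the timeout handling (kill, then reap) is an assumption, since the source's timeout branch is not quoted in this post.

```python
import asyncio
import sys


async def run_captured(argv: list, timeout: float):
    """Run argv, capture both streams through pipes, and enforce a timeout."""
    process = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(
            process.communicate(), timeout=timeout
        )
    except asyncio.TimeoutError:
        process.kill()        # stop the child so it does not linger
        await process.wait()  # reap it before returning
        return None
    return stdout.decode().strip(), stderr.decode().strip(), process.returncode


PRINT_BOTH = "import sys; print('out'); print('err', file=sys.stderr)"
SLEEP = "import time; time.sleep(5)"

finished = asyncio.run(run_captured([sys.executable, "-c", PRINT_BOTH], timeout=30))
timed_out = asyncio.run(run_captured([sys.executable, "-c", SLEEP], timeout=0.5))

print(finished)
print(timed_out)
```

The first call returns the two streams separately because each has its own pipe; the second shows that a command outliving its timeout yields no captured output at all, which is one reason long-running commands call for a different execution path.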

The second is background execution. When run_in_background=True, the logic is quite different:

bash_id = str(uuid.uuid4())[:8]

process = await asyncio.create_subprocess_shell(
    shell_cmd,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.STDOUT,
    cwd=self.workspace_dir,
)

bg_shell = BackgroundShell(bash_id=bash_id, command=command, process=process, start_time=time.time())
BackgroundShellManager.add(bg_shell)
await BackgroundShellManager.start_monitor(bash_id)

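Here the code does not wait for the command to finish. It generates a short bash_id (the first 8 characters of a UUID4), registers the new process with the background manager, starts a monitor task, and returns immediately. Note stderr=asyncio.subprocess.STDOUT: in this branch standard error is merged into the standard output stream, so only one pipe has to be read. For the agent, the result of this tool call is no longer the command output itself but a handle to a background task that can be tracked later.

This design matters because many real shell commands do not finish instantly, for example starting a dev server, running a long test suite, or executing a build. Forcing everything through the wait-until-done model would make the tool system rigid. With the two-phase interaction introduced by bash_id, the agent can start a task first and then poll its output, check its status, or terminate it in later steps as needed.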
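The start-now, read-later flow can be sketched with a monitor task that drains the pipe line by line. Everything below is an assumption about shape rather than a copy of bash_tool.py: Job, JOBS, start_background, and the "completed"/"failed" status strings are hypothetical, and the child process is the current Python interpreter rather than a shell.

```python
import asyncio
import sys
import uuid


class Job:
    """Hypothetical record for one background process."""

    def __init__(self, job_id: str, process: asyncio.subprocess.Process) -> None:
        self.job_id = job_id
        self.process = process
        self.output_lines = []
        self.status = "running"
        self.exit_code = None


JOBS = {}  # hypothetical in-memory registry keyed by job id


async def _monitor(job: Job) -> None:
    """Collect merged stdout/stderr line by line until the stream closes."""
    while True:
        line = await job.process.stdout.readline()
        if not line:  # empty bytes means EOF
            break
        job.output_lines.append(line.decode(errors="replace").rstrip())
    job.exit_code = await job.process.wait()
    job.status = "completed" if job.exit_code == 0 else "failed"


async def start_background(*argv: str):
    """Start a process, register it, and return without waiting for it."""
    process = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout
    )
    job = Job(str(uuid.uuid4())[:8], process)
    JOBS[job.job_id] = job
    task = asyncio.create_task(_monitor(job))
    return job.job_id, task


async def demo():
    job_id, task = await start_background(
        sys.executable, "-c", "print('a'); print('b')"
    )
    status_at_start = JOBS[job_id].status  # the handle is usable immediately
    await task                             # later: wait for the monitor to finish
    return JOBS[job_id], status_at_start


job, status_at_start = asyncio.run(demo())
print(status_at_start, job.status, job.output_lines, job.exit_code)
```

The caller gets the id back while the process is still running, and the collected lines stay available on the record for whichever later call asks for them.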

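5. BashOutputTool and BashKillTool: making background tasks operable

Once background execution exists, BashTool alone is not enough, so the source adds two companion tools: BashOutputTool and BashKillTool.

BashOutputTool fetches the new output of a background task by bash_id: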

class BashOutputTool(Tool):
    @property
    def name(self) -> str:
        return "bash_output"

    async def execute(
        self,
        bash_id: str,
        filter_str: str | None = None,
    ) -> BashOutputResult:
        bg_shell = BackgroundShellManager.get(bash_id)
        ...

BashKillTool terminates a background task:

class BashKillTool(Tool):
    @property
    def name(self) -> str:
        return "bash_kill"

    async def execute(self, bash_id: str) -> BashOutputResult:
        bg_shell = await BackgroundShellManager.terminate(bash_id)
        ...

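These two tools are interesting because they show that Mini Agent does not force background commands into one interface as a special case, but splits them into a clearer pattern of cooperating tools:

bash: starts a command
bash_output: reads incremental output
bash_kill: terminates a background task

This split fits how agent systems are designed. From the model's point of view, a complex capability should not be packed into one large black box but divided into a few composable actions with clear responsibilities. The model can then decide more easily which tool to call next and build stable multi-step chains.

Summary

This post examined Mini Agent's tool system at two levels: how to use it and how it is implemented.

First, through examples/01_basic_tools.py we saw the basic way to call WriteTool, ReadTool, EditTool, and BashTool. They all follow one usage pattern: instantiate the tool, pass keyword arguments to the async execute() method, and receive a uniformly structured result object. This consistency is what lets the agent, one level up, select tools and orchestrate multi-step work reliably.

Next, in mini_agent/tools/base.py we saw where this uniformity comes from. ToolResult constrains the result structure, and the Tool base class defines the tool name, description, parameter schema, and execution entry point. to_schema() and to_openai_schema() show how Mini Agent converts its internal tool abstraction into the tool description formats that different model APIs require. Although this is not currently a full MCP implementation, the idea is already close to what MCP emphasizes: exposing external capabilities to the model through a standard interface.

Finally, with mini_agent/tools/bash_tool.py as the example, we saw how a concrete tool puts the abstract interface into practice. BashTool does more than run commands: it handles cross-platform shell differences, output capture, background task tracking, and the cooperation with companion tools such as bash_output and bash_kill. Low-level details like asyncio.subprocess.PIPE also make it clear that a tool call is not just a conceptual wrapper but an engineering implementation built on process management, stream reading, and state tracking.

Put together, the tool system is clear: it first defines what a tool is through a single abstraction, then tells the model how to call a tool through a structured schema, and finally lets concrete tool classes carry out the real work and wrap the result. The agent appears to call tools intelligently, but that intelligence rests on this well-designed tool protocol.

That is why the tool system is one of the most important pieces of infrastructure in an agent framework. The model understands the task, tools connect to the real world, and the framework ties the two together. In later posts we can ask the natural follow-up question: once tools have a uniform description and execution interface, how does Mini Agent register them in the runtime, expose them to the model, and carry out multiple rounds of calls and result feedback within one conversation? That is the next key to understanding how the agent works.

Those questions are left for future posts.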

Mini Agent Source Code Walkthrough series

#LLM #Agent

https://onlyar.site/2026/04/02/MiniMax-Agent-Guide-2/

Author

Only(AR)

Published on

April 2, 2026
