Mini Agent Source Code Walkthrough, Part 5: The LLM Engine


Introduction

The previous posts covered the main ideas behind Mini Agent fairly thoroughly. This one takes a different angle and looks at how Mini Agent talks to different LLM providers: specifically, how it switches flexibly between the OpenAI and Anthropic calling conventions, and how the LLMEngine abstraction is built on top of that.

Code Example

examples/05_provider_selection.py demonstrates how to switch between LLM providers via the provider parameter of LLMClient. The example centers on how the LLMProvider enum is used, and how client attributes such as api_base change accordingly after a switch.

1. Selecting the call target via the provider parameter

The example initializes two clients, one with LLMProvider.ANTHROPIC and one with LLMProvider.OPENAI:

from mini_agent import LLMClient, LLMProvider, Message

anthropic_client = LLMClient(
    api_key=config["api_key"],
    provider=LLMProvider.ANTHROPIC,
    model=config.get("model", "MiniMax-M2.5"),
)

openai_client = LLMClient(
    api_key=config["api_key"],
    provider=LLMProvider.OPENAI,
    model=config.get("model", "MiniMax-M2.5"),
)

The two initializations are identical except for the value passed to provider. This shows that LLMClient shields callers from provider differences: the caller does not need to know which API is used underneath, because the provider value determines which endpoint the request goes to, how the request body is assembled, and how the response is parsed.

2. Default behavior when no provider is specified

The example also demonstrates the default behavior when the provider argument is omitted:

client = LLMClient(
    api_key=config["api_key"],
    model=config.get("model", "MiniMax-M2.5"),
)

print(f"Provider (default): {client.provider}")

The default provider is LLMProvider.ANTHROPIC, so when none is given the client automatically uses the Anthropic calling convention. This default is friendly for most scenarios: configuring an API key and a model name is enough to get started, without declaring a provider every time.

3. One interface, multiple providers

The example also compares the output of the two providers directly:

messages = [Message(role="user", content="What is 2+2?")]

anthropic_response = await anthropic_client.generate(messages)
print(f"Anthropic: {anthropic_response.content}")

openai_response = await openai_client.generate(messages)
print(f"OpenAI: {openai_response.content}")

The same messages object is passed to both clients, and both return correct results. Although the underlying calling conventions differ, the LLMClient.generate() interface is fully unified: the framework handles request-format conversion and response parsing internally, so upper-layer code never needs provider-specific branches.

Running this code reveals a difference between the two providers' outputs:

============================================================
DEMO: LLMClient with Anthropic Provider
============================================================
Provider: LLMProvider.ANTHROPIC
API Base: https://api.minimaxi.com/anthropic

👤 User: Say 'Hello from Anthropic!'
💭 Thinking: The user wants me to say "Hello from Anthropic!". This is a simple request to greet them on behalf of Anthropic.
💬 Model: Hello from Anthropic! 👋
✅ Anthropic provider demo completed

============================================================
DEMO: LLMClient with OpenAI Provider
============================================================
Provider: LLMProvider.OPENAI
API Base: https://api.minimaxi.com/v1

👤 User: Say 'Hello from OpenAI!'
💬 Model: Hello from OpenAI!
✅ OpenAI provider demo completed

That is, the OpenAI output contains no "Thinking" step, while the Anthropic output includes the model's reasoning process. This stems from differences in the two providers' interface designs, but from the caller's perspective those differences are already hidden by LLMClient.

Source Code Walkthrough

The files under mini_agent/llm/ make up Mini Agent's LLM calling layer, and the structure is very clear:

llm/
├── base.py               # LLMClientBase abstract base class
├── llm_wrapper.py        # LLMClient unified entry point (router)
├── anthropic_client.py   # Anthropic protocol implementation
└── openai_client.py      # OpenAI protocol implementation

The overall design: define a unified base-class interface, then use LLMClient as a dispatch layer that routes to the concrete protocol implementation based on the provider parameter.

1. The abstract base class: LLMClientBase

base.py defines the interface every LLM client must implement:

class LLMClientBase(ABC):
    @abstractmethod
    async def generate(
        self,
        messages: list[Message],
        tools: list[Any] | None = None,
    ) -> LLMResponse:
        pass

    @abstractmethod
    def _prepare_request(
        self,
        messages: list[Message],
        tools: list[Any] | None = None,
    ) -> dict[str, Any]:
        pass

    @abstractmethod
    def _convert_messages(self, messages: list[Message]) -> tuple[str | None, list[dict[str, Any]]]:
        pass

The three abstract methods map to the three stages every LLM call goes through: message-format conversion → request preparation → executing the call and parsing the response. A subclass only needs to implement these three methods to plug into the framework.

2. LLMClient: unified entry point and routing layer

LLMClient in llm_wrapper.py is the unified interface exposed to callers. Its job is to instantiate the appropriate underlying client based on the provider parameter:

class LLMClient:
    def __init__(
        self,
        api_key: str,
        provider: LLMProvider = LLMProvider.ANTHROPIC,
        api_base: str = "https://api.minimaxi.com",
        model: str = "MiniMax-M2.5",
        retry_config: RetryConfig | None = None,
    ):
        if provider == LLMProvider.ANTHROPIC:
            self._client = AnthropicClient(api_key=api_key, api_base=full_api_base, model=model, ...)
        elif provider == LLMProvider.OPENAI:
            self._client = OpenAIClient(api_key=api_key, api_base=full_api_base, model=model, ...)

After initialization, generate() simply delegates to the inner client:

async def generate(self, messages: list[Message], tools: list | None = None) -> LLMResponse:
    return await self._client.generate(messages, tools)

One detail worth noting: when api_base contains the MiniMax domain, LLMClient automatically appends a path suffix, /anthropic for Anthropic and /v1 for OpenAI. For third-party APIs, the api_base passed in is used unchanged. This automatic adaptation cuts down on configuration hassle.
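The suffix-completion behavior can be reconstructed roughly as follows. This is a hypothetical sketch based on the behavior described above and the API Base lines in the demo output; the actual helper in llm_wrapper.py may be named and structured differently:

```python
# Hypothetical reconstruction of the api_base auto-completion logic.
def resolve_api_base(api_base: str, provider: str) -> str:
    if "minimaxi.com" in api_base:
        # MiniMax endpoints get a protocol-specific path suffix
        suffix = "/anthropic" if provider == "anthropic" else "/v1"
        return api_base.rstrip("/") + suffix
    # Third-party endpoints are used exactly as passed in
    return api_base


print(resolve_api_base("https://api.minimaxi.com", "anthropic"))  # https://api.minimaxi.com/anthropic
print(resolve_api_base("https://api.minimaxi.com", "openai"))     # https://api.minimaxi.com/v1
print(resolve_api_base("https://example.com/v1", "openai"))       # https://example.com/v1
```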

3. AnthropicClient: the Anthropic protocol implementation

AnthropicClient follows Anthropic's API format; the main differences are concentrated in two places, message conversion and response parsing.

For message conversion, Anthropic's convention is to pull the system message out as a separate system parameter rather than keeping it in the messages array; it also supports thinking and tool_use as content blocks:

def _convert_messages(self, messages: list[Message]) -> tuple[str | None, list[dict[str, Any]]]:
    """Convert internal messages to Anthropic format.

    Args:
        messages: List of internal Message objects

    Returns:
        Tuple of (system_message, api_messages)
    """
    system_message = None
    api_messages = []

    for msg in messages:
        if msg.role == "system":  # handle the system message separately
            system_message = msg.content
            continue

        if msg.role in ["user", "assistant"]:
            if msg.role == "assistant" and (msg.thinking or msg.tool_calls):
                content_blocks = []

                if msg.thinking:
                    content_blocks.append({"type": "thinking", "thinking": msg.thinking})

                if msg.content:
                    content_blocks.append({"type": "text", "text": msg.content})

                if msg.tool_calls:
                    for tool_call in msg.tool_calls:
                        content_blocks.append(
                            {
                                "type": "tool_use",
                                "id": tool_call.id,
                                "name": tool_call.function.name,
                                "input": tool_call.function.arguments,
                            }
                        )

                api_messages.append({"role": "assistant", "content": content_blocks})
            else:
                api_messages.append({"role": msg.role, "content": msg.content})

        elif msg.role == "tool":  # tool execution results
            api_messages.append(
                {
                    "role": "user",  # Anthropic expects tool results under the user role
                    "content": [
                        {
                            "type": "tool_result",
                            "tool_use_id": msg.tool_call_id,
                            "content": msg.content,
                        }
                    ],
                }
            )

    return system_message, api_messages

After conversion, a simple input message looks like the following (for reference only):

{
  "system_message": "You are a useful assistant.",
  "api_messages": [
    {
      "role": "user",
      "content": "Say 'Hello from Anthropic!'"
    }
  ],
  "tools": [
    {
      "name": "read_file",
      "description": "Read file contents from the filesystem...",
      "input_schema": {
        "type": "object",
        "properties": {
          "path": {
            "type": "string",
            "description": "Absolute or relative path to the file"
          }, ...
        },
        "required": [
          "path"
        ]
      }
    }
  ]
}

For response parsing, Anthropic returns content as an array of blocks, each of which must be checked by type:

def _parse_response(self, response: anthropic.types.Message) -> LLMResponse:
    """Parse Anthropic response into LLMResponse.

    Args:
        response: Anthropic Message response

    Returns:
        LLMResponse object
    """
    # Extract text content, thinking, and tool calls
    text_content = ""
    thinking_content = ""
    tool_calls = []

    for block in response.content:
        if block.type == "text":
            text_content += block.text
        elif block.type == "thinking":
            thinking_content += block.thinking
        elif block.type == "tool_use":
            # Parse Anthropic tool_use block
            tool_calls.append(
                ToolCall(
                    id=block.id,
                    type="function",
                    function=FunctionCall(
                        name=block.name,
                        arguments=block.input,
                    ),
                )
            )

    # Handle token usage accounting
    usage = None
    if hasattr(response, "usage") and response.usage:
        input_tokens = response.usage.input_tokens or 0
        output_tokens = response.usage.output_tokens or 0
        cache_read_tokens = getattr(response.usage, "cache_read_input_tokens", 0) or 0
        cache_creation_tokens = getattr(response.usage, "cache_creation_input_tokens", 0) or 0
        total_input_tokens = input_tokens + cache_read_tokens + cache_creation_tokens
        usage = TokenUsage(
            prompt_tokens=total_input_tokens,
            completion_tokens=output_tokens,
            total_tokens=total_input_tokens + output_tokens,
        )

    return LLMResponse(
        content=text_content,
        thinking=thinking_content if thinking_content else None,
        tool_calls=tool_calls if tool_calls else None,
        finish_reason=response.stop_reason or "stop",
        usage=usage,
    )

Here is an example of an Anthropic-format response:

{
  "id": "...",
  "content": [
    {
      "signature": "...",
      "thinking": "The user wants me to say hello. This is a simple request that doesn't require any tools.",
      "type": "thinking"
    },
    {
      "citations": null,
      "text": "Hello from Anthropic!",
      "type": "text"
    }
  ],
  "model": "MiniMax-M2.7",
  "role": "assistant",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "type": "message",
  "usage": {
    "cache_creation": null,
    "cache_creation_input_tokens": null,
    "cache_read_input_tokens": null,
    "input_tokens": 296,
    "output_tokens": 28,
    "server_tool_use": null,
    "service_tier": null
  },
  "base_resp": {
    "status_code": 0,
    "status_msg": ""
  }
}

4. OpenAIClient: the OpenAI protocol implementation

OpenAIClient follows the same overall approach, but the concrete formats differ.

For message conversion, OpenAI keeps the system message inside the messages array, so no separate extraction is needed; it also carries the thinking process through a reasoning_details field:

def _convert_messages(self, messages: list[Message]) -> tuple[str | None, list[dict[str, Any]]]:
    """Convert internal messages to OpenAI format.

    Args:
        messages: List of internal Message objects

    Returns:
        Tuple of (system_message, api_messages)
        Note: OpenAI includes system message in the messages array
    """
    api_messages = []

    for msg in messages:
        if msg.role == "system":
            # OpenAI includes system message in messages array
            api_messages.append({"role": "system", "content": msg.content})
            continue

        # For user messages
        if msg.role == "user":
            api_messages.append({"role": "user", "content": msg.content})

        # For assistant messages
        elif msg.role == "assistant":
            assistant_msg = {"role": "assistant"}

            # Add content if present
            if msg.content:
                assistant_msg["content"] = msg.content

            # Add tool calls if present
            if msg.tool_calls:
                tool_calls_list = []
                for tool_call in msg.tool_calls:
                    tool_calls_list.append(
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": json.dumps(tool_call.function.arguments),
                            },
                        }
                    )
                assistant_msg["tool_calls"] = tool_calls_list

            # IMPORTANT: replay historical thinking here so the chain of thought is not broken
            if msg.thinking:
                assistant_msg["reasoning_details"] = [{"text": msg.thinking}]

            api_messages.append(assistant_msg)

        # For tool result messages
        elif msg.role == "tool":
            api_messages.append(
                {
                    "role": "tool",
                    "tool_call_id": msg.tool_call_id,
                    "content": msg.content,
                }
            )

    return None, api_messages

An example of an OpenAI-format request:

{
  "api_messages": [
    {
      "role": "system",
      "content": "You are a useful assistant."
    },
    {
      "role": "user",
      "content": "Say 'Hello from OpenAI!'"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "read_file",
        "description": "Read file contents from the filesystem...",
        "parameters": {
          "type": "object",
          "properties": {
            "path": {
              "type": "string",
              "description": "Absolute or relative path to the file"
            }, ...
          },
          "required": [
            "path"
          ]
        }
      }
    }
  ]
}

For response parsing, reasoning_details is a field on the message object and must be extracted separately:

def _parse_response(self, response: Any) -> LLMResponse:
    """Parse OpenAI response into LLMResponse.

    Args:
        response: OpenAI ChatCompletion response (full response object)

    Returns:
        LLMResponse object
    """
    message = response.choices[0].message

    text_content = message.content or ""

    thinking_content = ""
    if hasattr(message, "reasoning_details") and message.reasoning_details:
        for detail in message.reasoning_details:
            if hasattr(detail, "text"):
                thinking_content += detail.text

    tool_calls = []
    if message.tool_calls:
        for tool_call in message.tool_calls:
            arguments = json.loads(tool_call.function.arguments)

            tool_calls.append(
                ToolCall(
                    id=tool_call.id,
                    type="function",
                    function=FunctionCall(
                        name=tool_call.function.name,
                        arguments=arguments,
                    ),
                )
            )

    usage = None
    if hasattr(response, "usage") and response.usage:
        usage = TokenUsage(
            prompt_tokens=response.usage.prompt_tokens or 0,
            completion_tokens=response.usage.completion_tokens or 0,
            total_tokens=response.usage.total_tokens or 0,
        )

    return LLMResponse(
        content=text_content,
        thinking=thinking_content if thinking_content else None,
        tool_calls=tool_calls if tool_calls else None,
        finish_reason="stop",
        usage=usage,
    )

An example of an OpenAI-format response:

{
  "id": "...",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Hello from OpenAI!",
        "refusal": null,
        "role": "assistant",
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": null,
        "reasoning_content": "The user is asking me to say \"Hello from OpenAI!\". This is a simple request that doesn't require any tool usage. I'll just respond with the greeting.\n",
        "reasoning_details": [
          {
            "type": "reasoning.text",
            "id": "reasoning-text-1",
            "format": "MiniMax-response-v1",
            "index": 0,
            "text": "The user is asking me to say \"Hello from OpenAI!\". This is a simple request that doesn't require any tool usage. I'll just respond with the greeting.\n"
          }
        ]
      }
    }
  ],
  "created": 1776599306,
  "model": "MiniMax-M2.7",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 40,
    "prompt_tokens": 302,
    "total_tokens": 342,
    "completion_tokens_details": null,
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": 0
    },
    "total_characters": 0
  },
  "input_sensitive": false,
  "output_sensitive": false,
  "input_sensitive_type": 0,
  "output_sensitive_type": 0,
  "output_sensitive_int": 0,
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}

5. Core differences between the two protocols

Comparing the two implementations side by side, there are three core differences:

|                    | Anthropic                                   | OpenAI                        |
|--------------------|---------------------------------------------|-------------------------------|
| system handling    | extracted into a separate system parameter  | kept in the messages array    |
| thinking handling  | a thinking block inside content             | a reasoning_details field     |
| tool result format | tool_result block inside a user message     | a message with the tool role  |

On top of the two clients the framework wraps one more layer, LLMClient, so callers never have to care about these differences. Switching providers means changing a single parameter; the underlying message conversion and response parsing adapt automatically.
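The tool-result difference can be made concrete with a small side-by-side sketch. The two helper functions are illustrative (they do not exist in Mini Agent); the payload shapes are taken from the two _convert_messages implementations shown earlier:

```python
# How one internal tool-result message is rendered under each protocol.
def tool_result_anthropic(tool_call_id: str, content: str) -> dict:
    # Anthropic: a tool_result content block carried inside a user message
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": tool_call_id, "content": content}
        ],
    }


def tool_result_openai(tool_call_id: str, content: str) -> dict:
    # OpenAI: a dedicated message with the "tool" role
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}


print(tool_result_anthropic("call_1", "4")["role"])  # user
print(tool_result_openai("call_1", "4")["role"])     # tool
```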

Summary

This post walked through Mini Agent's LLM engine layer in full, from the code example to the source implementation.

At the example level, examples/05_provider_selection.py shows how LLMClient supports different LLM providers via the provider parameter. The same generate() interface switches seamlessly between the Anthropic and OpenAI calling conventions, with the differences hidden inside LLMClient.

At the source level, the four files under mini_agent/llm/ each have a clear role: LLMClientBase defines the unified abstract interface, LLMClient routes to a concrete implementation based on provider, and AnthropicClient and OpenAIClient each handle their own protocol details (message-format conversion, request-body assembly, and response parsing). The three main differences (system handling, thinking handling, and tool result format) are each absorbed inside the concrete clients.

The core benefit of this design is that it is friendly to extension: adding a new LLM provider only requires subclassing LLMClientBase, implementing the three core methods, and adding one conditional branch to LLMClient's initialization, with no changes to any upper-layer code. With this, the core modules of Mini Agent covered in this series (the tool system, the memory system, the workflow, and the LLM engine) have all been covered.


Original post: https://onlyar.site/2026/04/19/MiniMax-Agent-Guide-5/
Author: Only(AR), published April 19, 2026