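Tech Blog

Agent Development Basics (Part 4): Getting the Model to Talk Straight with Structured Output

By default a model outputs natural language, but an Agent needs stable JSON. Structured Output exists to close that gap.

Published

April 7, 2026

Reading info

About 19 minutes

Tags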

AI Agent / Structured Output / JSON Mode

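The first three posts covered API calls, Function Calling, and context management. At this point you can already get the model to "do work": call tools, remember the conversation, and hold multi-turn interactions.

But one question has never been tackled head-on: the model's output is natural language, so how is a program supposed to use it?

The problem

Suppose you ask the model to analyze a user request: determine the intent, extract the parameters, and decide the next action. The model might return something like this:

OK, I've analyzed the user's request. The user wants to check tomorrow's weather in Beijing. The intent is weather lookup, which requires calling the weather API with parameters city=Beijing, date=tomorrow.

A person reads this without trouble, but how does a program pull "city=Beijing" out of it? Regex? String matching? The moment the model phrases things differently, the program breaks.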
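To make the fragility concrete, here is a minimal sketch of the regex approach. The class name, the pattern, and the second phrasing are all made up for illustration; they are not taken from real model output.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexExtractionDemo {

    // Hypothetical pattern tuned to one specific phrasing: "city=<name>".
    private static final Pattern CITY = Pattern.compile("city=(\\w+)");

    // Returns the captured city, or empty when the text uses a different wording.
    static Optional<String> extractCity(String modelText) {
        Matcher m = CITY.matcher(modelText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String phrasingA = "The intent is weather lookup, with parameters city=Beijing, date=tomorrow.";
        String phrasingB = "The user is asking about the weather in Beijing for tomorrow.";

        System.out.println(extractCity(phrasingA)); // matches the expected wording
        System.out.println(extractCity(phrasingB)); // same meaning, but no match
    }
}
```

Both sentences carry the same information, yet only the first one survives the pattern. Every new phrasing would need another rule.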

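Every decision an Agent makes depends on parsing the model's output. If the output format is unstable, the whole Agent is a building on sand.

Structured Output addresses exactly this: have the model emit JSON directly, so the program can use it as soon as it arrives.

Approach 1: Ask for JSON in the prompt

The most direct approach: add a line such as "Please respond in JSON format" to the system prompt or user prompt.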

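String systemPrompt = """
    You are a task analysis assistant.
    Analyze the user's request and return the result as JSON with these fields:
    - intent: the user's intent (string)
    - action: the action to run (string)
    - params: parameters (object)
    - confidence: confidence (a float between 0 and 1)
    """;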

Pros: zero cost, and it works with any model.

Cons: unreliable. The model may return:

Sure, here is the analysis in JSON format:

```json
{
  "intent": "weather lookup",
  ...
}
```

Hope this result helps!


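Now there is a markdown code fence plus text before and after the JSON. You have to write extra parsing logic to strip this noise, and there is still no guarantee the format will be the same every time.

Relying on this approach alone is not recommended in production.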
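If you do end up working with prompt-only output, the stripping logic tends to look like the sketch below: keep everything from the first { to the last }. The class and method names are hypothetical, and the heuristic assumes the reply contains exactly one top-level JSON object. It does not check that the result is valid JSON.

```java
import java.util.Optional;

class JsonNoiseStripper {

    // Heuristic: keep the span from the first '{' to the last '}'.
    // This drops leading/trailing chatter and markdown fences,
    // but it does not validate the span as JSON.
    static Optional<String> extractJsonObject(String reply) {
        if (reply == null) return Optional.empty();
        int start = reply.indexOf('{');
        int end = reply.lastIndexOf('}');
        if (start == -1 || end < start) return Optional.empty();
        return Optional.of(reply.substring(start, end + 1));
    }

    public static void main(String[] args) {
        String fence = "`".repeat(3); // built at runtime to keep this listing readable
        String reply = "Sure, here is the analysis in JSON format:\n\n"
                + fence + "json\n"
                + "{\"intent\": \"weather lookup\"}\n"
                + fence + "\n\n"
                + "Hope this result helps!";
        System.out.println(extractJsonObject(reply).orElse("<no JSON found>"));
    }
}
```

This handles the noisy reply shown above, but a reply with two JSON objects or a stray brace in the surrounding text would defeat it, which is why the next two approaches exist.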

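Approach 2: JSON Mode (response_format: json_object)

The Bailian (百炼) platform is compatible with the OpenAI format and supports the response_format parameter. With the type set to json_object, the model is constrained to output valid JSON, without extra text or markdown wrapping.

Add one field to the request body: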

```json
{
  "model": "qwen-plus",
  "response_format": {"type": "json_object"},
  "messages": [...]
}
```

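Note: when JSON Mode is on, the prompt must also mention JSON output. Otherwise some models return an error saying the prompt does not state that JSON is expected. This requirement comes from the OpenAI spec, and Bailian follows the same behavior.

Full Java example: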

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class JsonModeDemo {

    static final String API_KEY = System.getenv("DASHSCOPE_API_KEY");
    static final String BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions";

    public static void main(String[] args) throws Exception {
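        String userInput = "What's the weather in Beijing tomorrow?";

        // The system prompt must mention "JSON" when JSON Mode is on
        String systemPrompt = """
                You are a task analysis assistant.
                Analyze the user's request and output JSON with these fields:
                intent (the user's intent), action (the action to run), city (the city, or null if absent), date (the date, or null if absent).
                """;

        String requestBody = """
                {
                  "model": "qwen-plus",
                  "response_format": {"type": "json_object"},
                  "messages": [
                    {"role": "system", "content": "%s"},
                    {"role": "user", "content": "%s"}
                  ]
                }
                """.formatted(
                // Escape backslashes first, then quotes and newlines, so both values are valid JSON strings
                systemPrompt.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n"),
                userInput.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n")
        );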

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL))
                .header("Authorization", "Bearer " + API_KEY)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody, StandardCharsets.UTF_8))
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        String responseBody = response.body();

        // Extract the content field from the response
        String content = extractContent(responseBody);
        System.out.println("Model output (JSON):");
        System.out.println(content);

        // Simple parsing: pull out the intent field
        String intent = extractJsonField(content, "intent");
        System.out.println("Parsed intent: " + intent);
    }

    // Extract choices[0].message.content from the full response body.
    // Assumes compact JSON, i.e. no whitespace between "content": and the opening quote.
    static String extractContent(String responseBody) {
        String marker = "\"content\":\"";
        int start = responseBody.indexOf(marker);
        if (start == -1) return "";
        start += marker.length();
        // Copy characters up to the closing quote, decoding common escape sequences on the way
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < responseBody.length(); i++) {
            char c = responseBody.charAt(i);
            if (c == '\\' && i + 1 < responseBody.length()) {
                char next = responseBody.charAt(i + 1);
                if (next == '"') { sb.append('"'); i++; continue; }
                if (next == 'n') { sb.append('\n'); i++; continue; }
                if (next == 't') { sb.append('\t'); i++; continue; }
                if (next == '\\') { sb.append('\\'); i++; continue; }
            }
            if (c == '"') break;
            sb.append(c);
        }
        return sb.toString();
    }

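    // Extract the string value of a field from a JSON string (simple cases only).
    // Tolerates whitespace around the colon, as in pretty-printed output.
    static String extractJsonField(String json, String field) {
        int keyPos = json.indexOf("\"" + field + "\"");
        if (keyPos == -1) return null;
        int colon = json.indexOf(':', keyPos + field.length() + 2);
        if (colon == -1) return null;
        int start = colon + 1;
        while (start < json.length() && Character.isWhitespace(json.charAt(start))) start++;
        // Only quoted string values are supported; null, numbers, and objects return null
        if (start >= json.length() || json.charAt(start) != '"') return null;
        int end = json.indexOf('"', start + 1);
        return end == -1 ? null : json.substring(start + 1, end);
    }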
}

After running it, content holds clean JSON with no extra text:

{
  "intent": "weather lookup",
  "action": "query_weather",
  "city": "Beijing",
  "date": "tomorrow"
}
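Because the demo assembles the request body with string formatting, every interpolated value has to be a valid JSON string literal. A more complete escaping helper might look like the sketch below. The class and method names are hypothetical, and in a real project a JSON library would do this for you.

```java
class JsonStrings {

    // Escapes a Java string so it can sit between double quotes in a JSON document.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"'  -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        // Remaining control characters become a four-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String userInput = "He said \"hi\"\nand left";
        System.out.println("{\"role\": \"user\", \"content\": \"" + escape(userInput) + "\"}");
    }
}
```

Passing every prompt through one helper like this keeps user input containing quotes or line breaks from producing a malformed request.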

Approach 3: JSON Schema constraints

JSON Mode guarantees that the output is valid JSON, but it does not constrain the structure. The model may return fields you did not expect, or get a field's type wrong.

The json_schema mode goes a step further: you define a JSON Schema, and the model must produce output that strictly conforms to it.

{
  "model": "qwen-plus",
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "task_analysis",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "intent": {"type": "string", "description": "The user's intent"},
          "action": {"type": "string", "description": "The action to run"},
          "city": {"type": ["string", "null"], "description": "City, or null if absent"},
          "date": {"type": ["string", "null"], "description": "Date, or null if absent"},
          "confidence": {"type": "number", "description": "Confidence, 0 to 1"}
        },
        "required": ["intent", "action", "city", "date", "confidence"],
        "additionalProperties": false
      }
    }
  },
  "messages": [...]
}

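Key points:

name is the schema's name; any value works, but make it meaningful
strict: true turns on strict mode, so the model cannot output fields outside the schema
required lists every field that must appear
additionalProperties: false forbids extra fields
nullable fields declare their type as ["string", "null"]

Nested structures and arrays are written like this: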

{
  "type": "object",
  "properties": {
    "steps": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "step_id": {"type": "integer"},
          "description": {"type": "string"},
          "tool": {"type": ["string", "null"]}
        },
        "required": ["step_id", "description", "tool"],
        "additionalProperties": false
      }
    }
  },
  "required": ["steps"],
  "additionalProperties": false
}

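Arrays use "type": "array", with items defining the structure of each element. A nested object is written the same way as the top-level object; just define it recursively.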
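On the Java side, a schema like this maps naturally onto nested types. The records below are a hypothetical sketch (the names are mine, not from any SDK): the steps array becomes a List, and the nullable tool field is exposed through an Optional accessor.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

class PlanTypes {

    // One element of the "steps" array; "tool" may be null per the schema.
    record Step(int stepId, String description, String tool) {
        Step {
            Objects.requireNonNull(description, "description is required");
        }

        // Convenience view of the nullable field.
        Optional<String> toolName() {
            return Optional.ofNullable(tool);
        }
    }

    // The top-level object: an array of steps becomes a List<Step>.
    record Plan(List<Step> steps) {
        Plan {
            steps = List.copyOf(steps); // defensive, unmodifiable copy
        }
    }

    public static void main(String[] args) {
        Plan plan = new Plan(List.of(
                new Step(1, "Translate the report into English", "translate"),
                new Step(2, "Summarize the result for the user", null)));
        for (Step s : plan.steps()) {
            System.out.println(s.stepId() + ": " + s.description()
                    + " (tool: " + s.toolName().orElse("none") + ")");
        }
    }
}
```

With types like these as the target, a mismatch between what the schema promises and what the code expects shows up at the point where the JSON is mapped, not deep inside the Agent logic.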

Java in practice: task analysis with a schema

Inline the schema into the request body. Full example:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class JsonSchemaDemo {

    static final String API_KEY = System.getenv("DASHSCOPE_API_KEY");
    static final String BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions";

    public static void main(String[] args) throws Exception {
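        String userInput = "Translate this report into English, then send it to team@example.com";

        String systemPrompt = "You are a task analysis assistant. Analyze the user's request and output the task breakdown as JSON.";

        // JSON Schema definition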
        String schema = """
                {
                  "type": "object",
                  "properties": {
                    "intent": {"type": "string"},
                    "steps": {
                      "type": "array",
                      "items": {
                        "type": "object",
                        "properties": {
                          "step_id": {"type": "integer"},
                          "description": {"type": "string"},
                          "tool": {"type": ["string", "null"]}
                        },
                        "required": ["step_id", "description", "tool"],
                        "additionalProperties": false
                      }
                    },
                    "confidence": {"type": "number"}
                  },
                  "required": ["intent", "steps", "confidence"],
                  "additionalProperties": false
                }
                """;

        String requestBody = """
                {
                  "model": "qwen-plus",
                  "response_format": {
                    "type": "json_schema",
                    "json_schema": {
                      "name": "task_analysis",
                      "strict": true,
                      "schema": %s
                    }
                  },
                  "messages": [
                    {"role": "system", "content": "%s"},
                    {"role": "user", "content": "%s"}
                  ]
                }
                """.formatted(schema, systemPrompt, userInput);

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL))
                .header("Authorization", "Bearer " + API_KEY)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody, StandardCharsets.UTF_8))
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        String content = extractContent(response.body());

        System.out.println("Structured output:");
        System.out.println(content);
    }

    static String extractContent(String responseBody) {
        // Locate the JSON content that follows the content field
        // Note: in json_schema mode, content is itself a JSON string, so escapes must be decoded
        String marker = "\"content\":\"";
        int start = responseBody.indexOf(marker);
        if (start == -1) return "";
        start += marker.length();
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < responseBody.length(); i++) {
            char c = responseBody.charAt(i);
            if (c == '\\' && i + 1 < responseBody.length()) {
                char next = responseBody.charAt(i + 1);
                if (next == '"') { sb.append('"'); i++; continue; }
                if (next == 'n') { sb.append('\n'); i++; continue; }
                if (next == 't') { sb.append('\t'); i++; continue; }
                if (next == '\\') { sb.append('\\'); i++; continue; }
            }
            if (c == '"') break;
            sb.append(c);
        }
        return sb.toString();
    }
}

The output looks roughly like this:

{
  "intent": "Translate and send the report",
  "steps": [
    {"step_id": 1, "description": "Translate the report into English", "tool": "translate"},
    {"step_id": 2, "description": "Send an email to team@example.com", "tool": "send_email"}
  ],
  "confidence": 0.95
}

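About JSON parsing

The code above uses hand-written string extraction. It runs, but it is not suitable for production.

Why don't the examples use Jackson/Gson? This series aims for zero dependencies, so you can see exactly what happens underneath. In real projects, Jackson is strongly recommended: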

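// Recommended for production (requires the jackson-databind dependency)
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

ObjectMapper mapper = new ObjectMapper();
JsonNode root = mapper.readTree(content);
// path() returns a "missing" node instead of null, so an absent field does not throw
String intent = root.path("intent").asText();
JsonNode steps = root.path("steps");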

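Jackson handles the edge cases for you: escape characters, Unicode, numeric precision, null values, and so on. Hand-written parsing breaks easily on complex JSON.

Common pitfalls

Pitfall 1: JSON Mode is on but the prompt never mentions JSON

With response_format: {"type": "json_object"} enabled, some models return an error outright if the word "JSON" appears in neither the system prompt nor the user prompt. Make it a habit: whenever you turn on JSON Mode, also say "respond in JSON format" in the prompt.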
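One way to build that habit into code is a small pre-flight check before enabling JSON Mode. This is a hypothetical helper. The case-insensitive match on the word json is my assumption about what the service looks for, so treat it as a local safety net rather than a guarantee.

```java
import java.util.List;
import java.util.Locale;

class JsonModeGuard {

    // True when at least one prompt mentions "json" (case-insensitive).
    static boolean mentionsJson(List<String> prompts) {
        return prompts.stream()
                .anyMatch(p -> p != null && p.toLowerCase(Locale.ROOT).contains("json"));
    }

    // Fails fast locally instead of waiting for the API to reject the request.
    static void requireJsonMention(List<String> prompts) {
        if (!mentionsJson(prompts)) {
            throw new IllegalArgumentException(
                    "JSON Mode is enabled, but no prompt mentions JSON");
        }
    }

    public static void main(String[] args) {
        requireJsonMention(List.of(
                "You are a task analysis assistant. Reply in JSON format.",
                "What's the weather in Beijing tomorrow?"));
        System.out.println("prompts are OK for JSON Mode");
    }
}
```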

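Pitfall 2: nullable fields in the schema

If a field can be null, write its type in array form, ["string", "null"], not just "string". Otherwise, when the value is empty, the model may output an empty string or omit the field entirely, which breaks parsing.

Pitfall 3: the model occasionally outputs invalid JSON

Even with JSON Mode on, in rare cases the model can still produce broken JSON (for example truncated output or stray characters). Add a fallback in production: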

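static String safeParseIntent(String json) {
    try {
        // extractJsonField returns null when the field is missing or malformed,
        // so treat null the same way as an exception
        String intent = extractJsonField(json, "intent");
        if (intent != null) return intent;
    } catch (Exception e) {
        // e.g. json itself is null; fall through to the fallback below
    }
    // Log the raw content and return a fallback value
    System.err.println("Failed to parse JSON, raw content: " + json);
    return "unknown";
}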

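A more robust approach: when parsing fails, send the raw output back to the model and ask it to fix the format. This is the "self-repair" pattern common in Agents, covered in a later post.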
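As a rough preview of that pattern, the sketch below retries with a repair prompt until the output passes a validity check or the attempt budget runs out. Everything here is hypothetical: the model is a plain function standing in for an API call, the validity check is pluggable, and the wording of the repair prompt is just an example.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

class JsonRepairLoop {

    // model: takes a prompt, returns the model's text (stand-in for a real API call).
    // isValid: decides whether the text is acceptable JSON (plug in a real parser here).
    static Optional<String> askWithRepair(UnaryOperator<String> model,
                                          Predicate<String> isValid,
                                          String prompt,
                                          int maxAttempts) {
        String currentPrompt = prompt;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String output = model.apply(currentPrompt);
            if (isValid.test(output)) {
                return Optional.of(output);
            }
            // Feed the broken output back and ask for a corrected version.
            currentPrompt = "The following output is not valid JSON. "
                    + "Return only the corrected JSON.\n" + output;
        }
        return Optional.empty(); // caller falls back to a default
    }

    public static void main(String[] args) {
        // Fake model: truncated JSON first, fixed JSON on the repair request.
        UnaryOperator<String> fakeModel = p ->
                p.startsWith("The following output") ? "{\"intent\": \"weather lookup\"}"
                                                     : "{\"intent\": \"weather";
        // Crude validity check for the demo only.
        Predicate<String> looksComplete = s -> s.startsWith("{") && s.endsWith("}");

        System.out.println(askWithRepair(fakeModel, looksComplete, "Analyze the request.", 3)
                .orElse("unknown"));
    }
}
```

The attempt limit matters: without it, a model that keeps producing broken output would loop forever and burn tokens on every round.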

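Pitfall 4: the schema must be complete in strict mode

With strict: true, every object in the schema must declare additionalProperties: false, and every field must be listed in required. Miss either one and the request fails with an error.

How this relates to Agents

Back to the opening question: why does an Agent need Structured Output?

An Agent's run loop looks roughly like this:

user input → model analysis → parse output → decide next step → call tool → feed result back to the model → repeat

Every "parse output" step depends on the model's output being a structure a program can process. If that step is unstable, the whole loop breaks.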
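The shape of that loop can be sketched in a few lines. This is only an illustration under heavy assumptions: the model is a fake function, Decision stands in for already-parsed structured output, and the action names (including finish) are invented for the example.

```java
import java.util.Map;
import java.util.function.Function;

class AgentLoopSketch {

    // Parsed structured output: which action to take and with what argument.
    record Decision(String action, String argument) {}

    // model: maps the conversation so far to the next decision (stand-in for call + parse).
    // tools: maps a tool name to its implementation.
    static String run(Function<String, Decision> model,
                      Map<String, Function<String, String>> tools,
                      String userInput,
                      int maxSteps) {
        String context = userInput;
        for (int step = 0; step < maxSteps; step++) {
            Decision d = model.apply(context);
            if ("finish".equals(d.action())) {
                return d.argument(); // final answer
            }
            Function<String, String> tool = tools.get(d.action());
            if (tool == null) {
                return "unknown action: " + d.action();
            }
            // Feed the tool result back into the context for the next round.
            context = context + "\n[" + d.action() + "] " + tool.apply(d.argument());
        }
        return "step limit reached";
    }

    public static void main(String[] args) {
        Function<String, Decision> fakeModel = ctx ->
                ctx.contains("[query_weather]")
                        ? new Decision("finish", "Tomorrow in Beijing: sunny")
                        : new Decision("query_weather", "Beijing");
        Map<String, Function<String, String>> tools =
                Map.of("query_weather", city -> "sunny in " + city);

        System.out.println(run(fakeModel, tools, "What's the weather in Beijing tomorrow?", 5));
    }
}
```

The dispatch on d.action() only works because the decision arrives as fields, not prose. That is the dependency on structured output that the loop above describes.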

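Structured Output is not a nice-to-have; it is infrastructure that an Agent needs to run reliably. With the API calls, Function Calling, and context management from the first three posts plus Structured Output from this one, all four puzzle pieces are in place.

In the next post we put the four pieces together and hand-write a minimal working Agent.

This is the fourth post in the "Agent Development Basics" series. Follow 粒方Lab and build an AI Agent from scratch with me.

AI Agent / Structured Output / JSON Mode / Beginner / Java / Bailian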