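Structured Outputs
Structured Outputs is a feature that ensures the model always generates responses that conform to the JSON Schema you provide. It is particularly useful for applications that need predictable, structured data formats.
Overview
With Structured Outputs, you can:
- Guarantee format consistency: model output always conforms to your JSON Schema
- Reduce parsing errors: eliminate parsing problems caused by inconsistent formatting
- Improve reliability: get more predictable results in production
- Simplify integration: use model output directly in your application logic
Supported models
Structured Outputs is supported on the following models: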
- gpt-4o-2024-08-06 and later
- gpt-4o-mini-2024-07-18 and later
Basic usage
Defining a JSON Schema
First, define the JSON Schema that you want the model output to follow:
from openai import OpenAI
import json

client = OpenAI()

# Define the JSON Schema
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string", "format": "email"},
        "skills": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["name", "age", "email"],
    "additionalProperties": False
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "user",
            "content": "Create a user profile for a software engineer"
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_profile",
            # Adding "strict": True here enforces the schema exactly; strict
            # mode also requires every property to be listed in "required"
            "schema": schema
        }
    }
)

# The message content is a JSON string shaped by the schema defined above
result = json.loads(response.choices[0].message.content)
print(result)
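The `json.loads` call above assumes the request completed normally. As a hedged sketch, the helper below checks two fields that a Chat Completions choice exposes, `finish_reason` and `message.refusal`, before parsing. The function name `parse_structured_choice` and the `SimpleNamespace` stand-ins are hypothetical; they exist only so the example runs offline without calling the API.

```python
import json
from types import SimpleNamespace


def parse_structured_choice(choice):
    """Return the parsed JSON payload of one choice, or raise ValueError.

    `choice` is expected to look like response.choices[0]: it has a
    `finish_reason` and a `message` with `content` and `refusal`.
    """
    # A response cut off by the token limit may contain incomplete JSON.
    if choice.finish_reason == "length":
        raise ValueError("output was truncated before the JSON was complete")
    # The model may decline the request; then there is no payload to parse.
    refusal = getattr(choice.message, "refusal", None)
    if refusal:
        raise ValueError(f"model refused the request: {refusal}")
    return json.loads(choice.message.content)


# Hypothetical stand-ins for API objects, so the sketch runs offline.
ok_choice = SimpleNamespace(
    finish_reason="stop",
    message=SimpleNamespace(content='{"name": "Ada", "age": 36}', refusal=None),
)
cut_choice = SimpleNamespace(
    finish_reason="length",
    message=SimpleNamespace(content='{"name": "Ad', refusal=None),
)

print(parse_structured_choice(ok_choice))
try:
    parse_structured_choice(cut_choice)
except ValueError as err:
    print(f"skipped: {err}")
```

In real code you would pass `response.choices[0]` instead of the stand-in objects.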
Using a Pydantic model
You can also define the structure with a Pydantic model:
from pydantic import BaseModel
from typing import List

# Reuses the `client` created in the previous example

class UserProfile(BaseModel):
    name: str
    age: int
    email: str
    skills: List[str]
    is_active: bool = True

# Convert the Pydantic model to a JSON Schema
schema = UserProfile.model_json_schema()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "user",
            "content": "Create a user profile for a data scientist"
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_profile",
            "schema": schema
        }
    }
)

# Parse the response directly into a Pydantic object
result = UserProfile.model_validate_json(response.choices[0].message.content)
print(result)
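Schemas produced by `model_json_schema()` typically leave `additionalProperties` unset and omit fields that have defaults (such as `is_active`) from `required`. Strict enforcement (`"strict": True` inside the `json_schema` object) expects every object to set `additionalProperties` to false and to list all of its properties as required. The sketch below rewrites a schema dict accordingly. `make_strict` is a hypothetical helper, and it only walks `properties`, `items`, and `$defs`, which is an assumption about the schemas you pass in; keywords such as `default` are left untouched.

```python
import copy


def make_strict(schema):
    """Return a copy of `schema` where every object forbids extra keys
    and lists all of its properties as required."""
    schema = copy.deepcopy(schema)
    _tighten(schema)
    return schema


def _tighten(node):
    if not isinstance(node, dict):
        return
    if node.get("type") == "object" and "properties" in node:
        node["additionalProperties"] = False
        node["required"] = list(node["properties"])
        for child in node["properties"].values():
            _tighten(child)
    # Array element schemas and shared definitions may hold objects too.
    _tighten(node.get("items"))
    for child in node.get("$defs", {}).values():
        _tighten(child)


# Hand-written stand-in for what a Pydantic model might emit.
loose = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
        "is_active": {"type": "boolean", "default": True},
    },
    "required": ["name", "skills"],
}

strict = make_strict(loose)
print(strict["required"])
print(strict["additionalProperties"])
```

The input dict is deep-copied, so the original schema stays usable for local validation with its defaults intact.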
Practical examples
1. Data extraction
Extract structured information from unstructured text:
# Define the extraction schema
extraction_schema = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "industry": {"type": "string"},
        "revenue": {"type": "number"},
        "employees": {"type": "integer"},
        "founded_year": {"type": "integer"},
        "headquarters": {"type": "string"}
    },
    "required": ["company_name", "industry"],
    "additionalProperties": False
}

text = """
Apple was founded in 1976 and is headquartered in Cupertino, California.
As a leading global technology company, Apple reported annual revenue of $394 billion in 2023
and has more than 164,000 employees.
"""

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "user",
            "content": f"Extract the company information from the following text:\n\n{text}"
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "company_info",
            "schema": extraction_schema
        }
    }
)

company_info = json.loads(response.choices[0].message.content)
print(company_info)
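Because only `company_name` and `industry` are required in `extraction_schema`, downstream code should tolerate missing optional fields. A minimal sketch, assuming a hypothetical `CompanyInfo` dataclass and a hand-written sample payload rather than real model output:

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CompanyInfo:
    company_name: str
    industry: str
    revenue: Optional[float] = None
    employees: Optional[int] = None
    founded_year: Optional[int] = None
    headquarters: Optional[str] = None


def to_company_info(payload: dict) -> CompanyInfo:
    """Build a CompanyInfo, ignoring keys the dataclass does not declare."""
    known = {f.name for f in fields(CompanyInfo)}
    return CompanyInfo(**{k: v for k, v in payload.items() if k in known})


# Sample payload written by hand for illustration (not real model output).
sample = json.loads(
    '{"company_name": "Apple", "industry": "Technology", "founded_year": 1976}'
)
info = to_company_info(sample)
print(info.company_name, info.founded_year, info.revenue)
```

Fields the model did not return stay `None`, so callers can distinguish "not mentioned in the text" from a real value.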
2. API response formatting
Create a consistent response format for API endpoints:
api_response_schema = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["success", "error"]},
        "data": {
            "type": "object",
            "properties": {
                "items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "id": {"type": "string"},
                            "title": {"type": "string"},
                            "description": {"type": "string"},
                            "price": {"type": "number"},
                            "category": {"type": "string"}
                        },
                        "required": ["id", "title", "price"]
                    }
                },
                "total": {"type": "integer"},
                "page": {"type": "integer"}
            },
            "required": ["items", "total", "page"]
        },
        "message": {"type": "string"}
    },
    "required": ["status", "data"],
    "additionalProperties": False
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "user",
            "content": "Generate a product-list API response containing 5 electronics products"
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "api_response",
            "schema": api_response_schema
        }
    }
)
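Once parsed, a response shaped like `api_response_schema` can be consumed with ordinary dictionary access. The sketch below uses a hand-written sample payload (an assumption, not real model output) and a hypothetical `summarize` helper that branches on `status` before touching `data`.

```python
def summarize(api_response: dict) -> dict:
    """Return the item count and the sum of item prices for a success response."""
    if api_response["status"] != "success":
        # "message" is optional in the schema, so fall back to a default.
        raise RuntimeError(api_response.get("message", "unknown error"))
    items = api_response["data"]["items"]
    return {
        "count": len(items),
        "price_sum": sum(item["price"] for item in items),
    }


# Hand-written sample that follows the shape of api_response_schema.
sample = {
    "status": "success",
    "data": {
        "items": [
            {"id": "p1", "title": "Headphones", "price": 50},
            {"id": "p2", "title": "Keyboard", "price": 75},
        ],
        "total": 2,
        "page": 1,
    },
}
print(summarize(sample))
```

Only `id`, `title`, and `price` are accessed directly because those are the item fields the schema marks as required.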
3. Configuration file generation
Generate application configuration files:
config_schema = {
    "type": "object",
    "properties": {
        "database": {
            "type": "object",
            "properties": {
                "host": {"type": "string"},
                "port": {"type": "integer"},
                "name": {"type": "string"},
                "ssl": {"type": "boolean"}
            },
            "required": ["host", "port", "name"]
        },
        "cache": {
            "type": "object",
            "properties": {
                "type": {"type": "string", "enum": ["redis", "memcached"]},
                "ttl": {"type": "integer"},
                "max_size": {"type": "integer"}
            },
            "required": ["type", "ttl"]
        },
        "logging": {
            "type": "object",
            "properties": {
                "level": {"type": "string", "enum": ["DEBUG", "INFO", "WARNING", "ERROR"]},
                "file": {"type": "string"},
                "max_size": {"type": "string"}
            },
            "required": ["level"]
        }
    },
    "required": ["database", "cache", "logging"],
    "additionalProperties": False
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "user",
            "content": "Generate a production configuration for a high-traffic e-commerce site"
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "app_config",
            "schema": config_schema
        }
    }
)
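A generated configuration deserves a sanity check before it is deployed, since `config_schema` constrains types but not value ranges. The following is a sketch only: `check_config` is a hypothetical helper, the two rules it applies are illustrative choices, and the sample dict is written by hand.

```python
def check_config(config: dict) -> list:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    port = config["database"]["port"]
    if not 1 <= port <= 65535:
        problems.append(f"database.port out of range: {port}")
    if config["cache"]["ttl"] <= 0:
        problems.append("cache.ttl must be positive")
    return problems


# Hand-written sample that follows the shape of config_schema.
sample = {
    "database": {"host": "db.internal", "port": 5432, "name": "shop", "ssl": True},
    "cache": {"type": "redis", "ttl": 300},
    "logging": {"level": "INFO"},
}
print(check_config(sample))

# Same config with an invalid port, to show a reported problem.
bad = {**sample, "database": {**sample["database"], "port": 70000}}
print(check_config(bad))
```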
Best practices
1. Explicit schema definitions
Make your JSON Schema as explicit and detailed as possible:
# ✅ Good schema: explicit and detailed
good_schema = {
    "type": "object",
    "properties": {
        "product_name": {
            "type": "string",
            "minLength": 1,
            "maxLength": 100
        },
        "price": {
            "type": "number",
            "minimum": 0,
            "multipleOf": 0.01
        },
        "category": {
            "type": "string",
            "enum": ["electronics", "clothing", "books", "home"]
        },
        "in_stock": {"type": "boolean"},
        "tags": {
            "type": "array",
            "items": {"type": "string"},
            "maxItems": 10
        }
    },
    "required": ["product_name", "price", "category", "in_stock"],
    "additionalProperties": False
}

# ❌ Schema that is not explicit enough
bad_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "category": {"type": "string"}
    }
}
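The difference between the two schemas above can be checked mechanically. As a rough sketch, the hypothetical `lint_schema` function below walks a schema and reports object levels that do not list `required` properties or do not set `additionalProperties` to `False`. It covers only these two checks and is not a full JSON Schema linter.

```python
def lint_schema(schema: dict, path: str = "$") -> list:
    """Collect warnings for object schemas that leave structure open."""
    warnings = []
    if schema.get("type") == "object":
        if schema.get("additionalProperties") is not False:
            warnings.append(f"{path}: additionalProperties is not False")
        if not schema.get("required"):
            warnings.append(f"{path}: no required properties listed")
        for name, child in schema.get("properties", {}).items():
            warnings.extend(lint_schema(child, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        warnings.extend(lint_schema(schema["items"], f"{path}[]"))
    return warnings


# A loose schema and a tightened copy of it, both written for illustration.
loose = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
}
tight = {**loose, "required": ["name", "price"], "additionalProperties": False}

print(lint_schema(loose))
print(lint_schema(tight))
```

Running a check like this in tests is one way to keep loosely defined schemas from reaching production.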
2. Handling nested structures
For complex nested data, make sure every level has a clear definition:
nested_schema = {
    "type": "object",
    "properties": {
        "user": {
            "type": "object",
            "properties": {
                "personal_info": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "age": {"type": "integer", "minimum": 0, "maximum": 150},
                        "email": {"type": "string", "format": "email"}
                    },
                    "required": ["name", "email"],
                    "additionalProperties": False
                },
                "preferences": {
                    "type": "object",
                    "properties": {
                        "notifications": {"type": "boolean"},
                        "theme": {"type": "string", "enum": ["light", "dark"]},
                        "language": {"type": "string", "pattern": "^[a-z]{2}$"}
                    },
                    "additionalProperties": False
                }
            },
            "required": ["personal_info"],
            "additionalProperties": False
        }
    },
    "required": ["user"],
    "additionalProperties": False
}
3. Error handling
Always include appropriate error handling:
try:
    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": "Generate user data"}],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "user_data",
                "schema": schema
            }
        }
    )
    result = json.loads(response.choices[0].message.content)
    # Validate the result. validate_data, process_data and
    # handle_validation_error are application-defined placeholders.
    if validate_data(result):
        process_data(result)
    else:
        handle_validation_error(result)
except json.JSONDecodeError as e:
    print(f"JSON parsing error: {e}")
except Exception as e:
    print(f"Request error: {e}")
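The `validate_data` call above is left undefined in the example. One possible shape for it, as a sketch only, is a small recursive checker written with the standard library. It handles just `type`, `required`, `enum`, `properties`, `items`, and `additionalProperties: False`; it takes the schema as a second argument, which differs from the one-argument call above, and `find_errors` is a hypothetical name. A complete JSON Schema validator library would be the sturdier choice in production.

```python
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def find_errors(data, schema, path="$"):
    """Return a list of mismatches between `data` and a small schema subset."""
    errors = []
    expected = schema.get("type")
    if expected in _TYPES:
        ok = isinstance(data, _TYPES[expected])
        # bool is a subclass of int in Python; reject it for numeric types.
        if expected in ("integer", "number") and isinstance(data, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    if "enum" in schema and data not in schema["enum"]:
        errors.append(f"{path}: value not in enum")
    if expected == "object":
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in data:
                errors.append(f"{path}.{key}: missing required property")
        for key, value in data.items():
            if key in props:
                errors.extend(find_errors(value, props[key], f"{path}.{key}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}.{key}: unexpected property")
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(data):
            errors.extend(find_errors(item, schema["items"], f"{path}[{i}]"))
    return errors


def validate_data(data, schema):
    """True when no mismatches are found."""
    return not find_errors(data, schema)


# Small schema and payloads written by hand for illustration.
user_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
    "additionalProperties": False,
}
print(find_errors({"name": "Ada", "age": 36}, user_schema))
print(find_errors({"name": "Ada", "age": "36", "x": 1}, user_schema))
```

Returning a list of error paths rather than a bare boolean gives `handle_validation_error` something concrete to log.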
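Limitations and considerations
Schema complexity
- Avoid overly complex nested structures
- Limit the maximum length of arrays
- Set reasonable string length limits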
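To make "overly complex" measurable, a project can track a couple of simple numbers per schema. The sketch below computes the object nesting depth and the total number of declared properties. Both helper names are hypothetical, and any threshold you compare these numbers against is your own choice rather than a documented limit.

```python
def schema_depth(schema) -> int:
    """Count nested object levels; a flat object schema has depth 1."""
    if not isinstance(schema, dict):
        return 0
    if schema.get("type") == "object":
        children = schema.get("properties", {}).values()
        return 1 + max((schema_depth(c) for c in children), default=0)
    if schema.get("type") == "array":
        return schema_depth(schema.get("items"))
    return 0


def property_count(schema) -> int:
    """Count every declared property across all nesting levels."""
    if not isinstance(schema, dict):
        return 0
    if schema.get("type") == "array":
        return property_count(schema.get("items"))
    props = schema.get("properties", {})
    return len(props) + sum(property_count(c) for c in props.values())


# Small nested schema written by hand for illustration.
example = {
    "type": "object",
    "properties": {
        "user": {
            "type": "object",
            "properties": {
                "personal_info": {
                    "type": "object",
                    "properties": {"name": {"type": "string"}},
                },
                "tags": {"type": "array", "items": {"type": "string"}},
            },
        }
    },
}
print(schema_depth(example), property_count(example))
```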
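Performance considerations
- Complex schemas may increase response time
- Look for a balance between performance and structural strictness
Model compatibility
- Make sure you use a model version that supports Structured Outputs
- Test output quality across different models
Pricing
Structured Outputs incurs no additional charge and is billed at standard token pricing. However, because the output must follow a strict format, output token usage may increase slightly.
With Structured Outputs, you can build AI applications that are more reliable and easier to integrate, with model output that consistently matches what your application needs.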