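Setting Trace Tags
Tags are mutable key-value pairs that you can attach to a trace to add metadata and context. This metadata helps you organize, search, and filter traces. For example, you can tag traces by the topic of the user's input, the environment they ran in, or the model version that was used.
MLflow lets you add, update, or remove tags at any time, even after a trace has been logged, through either its APIs or the MLflow UI.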
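When to Use Trace Tags
Trace tags are particularly useful in the following scenarios:
- Session management: group traces by conversation session or user interaction
- Environment tracking: distinguish production, staging, and development traces
- Model versioning: record which model version generated a given trace
- User context: associate traces with specific users or customer segments
- Performance monitoring: tag traces according to their performance characteristics
- A/B testing: distinguish between experiment variants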
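Setting Tags on an Active Trace
Use mlflow.update_current_trace() to add tags while the trace is still executing.
import mlflow
@mlflow.trace
def my_func(x):
    # Attach a tag to the trace that is currently being recorded
    mlflow.update_current_trace(tags={"fruit": "apple"})
    return x + 1
result = my_func(5)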
Example: Setting Service Tags in a Production System
import mlflow
import os
@mlflow.trace
def process_user_request(user_id: str, session_id: str, request_text: str):
    # Add comprehensive tags for production monitoring
    mlflow.update_current_trace(
        tags={
            "user_id": user_id,
            "session_id": session_id,
            "environment": os.getenv("ENVIRONMENT", "development"),
            "model_version": os.getenv("MODEL_VERSION", "1.0.0"),
            "request_type": "chat_completion",
            "priority": "high" if "urgent" in request_text.lower() else "normal",
        }
    )
    response = f"Processed: {request_text}"
    return response
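Note
`mlflow.update_current_trace()` adds each specified tag to the current trace when its key does not exist yet. If the key already exists, the tag is updated with the new value.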
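The add-or-overwrite behavior described in the note can be modeled with a plain dictionary merge. This is only a sketch of the semantics, assuming that keys not mentioned in the update are left as they are; `apply_tag_update` is a hypothetical helper, not an MLflow function, and it does not touch any real trace.

```python
def apply_tag_update(existing: dict, new_tags: dict) -> dict:
    """Return the tags a trace would carry after an update (model only)."""
    merged = dict(existing)   # copy, so the original mapping is not modified
    merged.update(new_tags)   # new keys are added, existing keys are overwritten
    return merged


before = {"fruit": "apple", "environment": "staging"}
after = apply_tag_update(before, {"fruit": "banana", "priority": "high"})
print(after)
```

Here `fruit` is overwritten, `priority` is added, and `environment` is carried over unchanged.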
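Setting Tags on a Finished Trace
You can add or modify tags on traces that have already completed and been logged.
Available APIs
| API | Use Case |
|---|---|
| `mlflow.set_trace_tag()` | Fluent API for setting a tag |
| `mlflow.client.MlflowClient.set_trace_tag()` | Client API for setting a tag |
| MLflow UI | Visual tag management |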
Basic Usage
import mlflow
from mlflow import MlflowClient
# Using fluent API
mlflow.set_trace_tag(trace_id="your-trace-id", key="tag_key", value="tag_value")
mlflow.delete_trace_tag(trace_id="your-trace-id", key="tag_key")
# Using client API
client = MlflowClient()
client.set_trace_tag(trace_id="your-trace-id", key="tag_key", value="tag_value")
client.delete_trace_tag(trace_id="your-trace-id", key="tag_key")
Batch Tagging
from mlflow import MlflowClient
client = MlflowClient()
# Find traces that need to be tagged. MlflowClient.search_traces returns Trace
# objects, which the loop below relies on.
traces = client.search_traces(
    experiment_ids=["1"], filter_string="status = 'ERROR'", max_results=100
)
# Add tags to all error traces
for trace in traces:
    client.set_trace_tag(trace_id=trace.info.trace_id, key="needs_review", value="true")
    client.set_trace_tag(
        trace_id=trace.info.trace_id, key="review_priority", value="high"
    )
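When several tags must be applied to many traces, the nested loop can be factored into a small helper. The sketch below is one possible shape, not part of MLflow: `tag_traces` and `RecordingClient` are hypothetical names, and the only assumption about the client is that it exposes `set_trace_tag(trace_id=..., key=..., value=...)` as used above. The stand-in client lets the example run without a tracking server.

```python
from typing import Iterable, Mapping


def tag_traces(client, trace_ids: Iterable[str], tags: Mapping[str, str]) -> int:
    """Apply every tag in `tags` to every trace ID; return the number of calls made."""
    calls = 0
    for trace_id in trace_ids:
        for key, value in tags.items():
            client.set_trace_tag(trace_id=trace_id, key=key, value=value)
            calls += 1
    return calls


class RecordingClient:
    """Stand-in for MlflowClient that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def set_trace_tag(self, trace_id, key, value):
        self.calls.append((trace_id, key, value))


fake = RecordingClient()
count = tag_traces(
    fake, ["tr-1", "tr-2"], {"needs_review": "true", "review_priority": "high"}
)
print(count, fake.calls)
```

With a real `MlflowClient` in place of `RecordingClient`, the same call would issue one tagging request per trace and tag.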
Performance Analysis Tagging
from mlflow import MlflowClient
client = MlflowClient()
# Get slow traces for analysis. MlflowClient.search_traces returns Trace
# objects, which the loop below relies on.
traces = client.search_traces(
    experiment_ids=["1"], filter_string="execution_time_ms > 5000", max_results=50
)
# Tag based on performance analysis
for trace in traces:
    execution_time = trace.info.execution_time_ms
    if execution_time > 10000:
        performance_tag = "very_slow"
    elif execution_time > 7500:
        performance_tag = "slow"
    else:
        performance_tag = "moderate"
    client.set_trace_tag(
        trace_id=trace.info.trace_id, key="performance_category", value=performance_tag
    )
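The threshold logic in the loop above can be isolated in a pure function, which makes the boundaries easy to check without any traces. This is a minimal sketch using the same thresholds; `performance_category` is a hypothetical helper name.

```python
def performance_category(execution_time_ms: float) -> str:
    """Map an execution time in milliseconds to a coarse performance label."""
    if execution_time_ms > 10000:
        return "very_slow"
    if execution_time_ms > 7500:
        return "slow"
    return "moderate"


for sample in (6000, 7500, 9000, 10000, 12000):
    print(sample, performance_category(sample))
```

Because the comparisons are strict, a trace that takes exactly 7500 ms is labeled `moderate` and one that takes exactly 10000 ms is labeled `slow`.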
Using the MLflow UI
Navigate to the trace details page and click the pencil icon next to the tags to edit them visually.

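UI Features
- Add a new tag by clicking the "+" button
- Edit an existing tag by clicking the pencil icon
- Delete a tag by clicking the trash icon
- View all tags associated with a trace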
Searching and Filtering with Tags
Use tags to find specific traces quickly and efficiently.
Basic Tag Filtering
import mlflow
# Find traces by environment
production_traces = mlflow.search_traces(
    experiment_ids=["1"], filter_string="tags.environment = 'production'"
)
# Find traces by user
user_traces = mlflow.search_traces(
    experiment_ids=["1"], filter_string="tags.user_id = 'user_123'"
)
# Find high-priority traces
urgent_traces = mlflow.search_traces(
    experiment_ids=["1"], filter_string="tags.priority = 'high'"
)
Complex Tag-Based Queries
import mlflow
# Combine tag filters with other conditions
slow_production_errors = mlflow.search_traces(
    experiment_ids=["1"],
    filter_string="""
        tags.environment = 'production'
        AND status = 'ERROR'
        AND execution_time_ms > 5000
    """,
)
# Find traces that need review
review_traces = mlflow.search_traces(
    experiment_ids=["1"],
    filter_string="tags.needs_review = 'true'",
    order_by=["timestamp_ms DESC"],
)
# Find specific user sessions
session_traces = mlflow.search_traces(
    experiment_ids=["1"],
    filter_string="tags.session_id = 'session_456'",
    order_by=["timestamp_ms ASC"],
)
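Filter strings like the ones above can also be assembled programmatically. The helper below is a sketch under stated assumptions: `build_tag_filter` is a hypothetical name, tag keys are assumed to be simple names without dots or spaces, and values containing a single quote are rejected rather than escaped, because the escaping rules are not covered here.

```python
def build_tag_filter(tags: dict, extra_conditions: tuple = ()) -> str:
    """Join tag equality checks and extra conditions with AND."""
    clauses = []
    for key, value in tags.items():
        if "'" in value:
            # Refuse rather than guess how quotes should be escaped
            raise ValueError(f"unsupported quote in value for tag {key!r}")
        clauses.append(f"tags.{key} = '{value}'")
    clauses.extend(extra_conditions)
    return " AND ".join(clauses)


flt = build_tag_filter(
    {"environment": "production"},
    ("status = 'ERROR'", "execution_time_ms > 5000"),
)
print(flt)
```

The resulting string can be passed as `filter_string` to the search calls shown above.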
Operational Monitoring Queries
import mlflow
# Monitor A/B test performance
control_group = mlflow.search_traces(
    experiment_ids=["1"], filter_string="tags.experiment_variant = 'control'"
)
treatment_group = mlflow.search_traces(
    experiment_ids=["1"], filter_string="tags.experiment_variant = 'treatment'"
)
# Find traces needing escalation
escalation_traces = mlflow.search_traces(
    experiment_ids=["1"],
    filter_string="""
        tags.sla_tier = 'critical'
        AND execution_time_ms > 30000
    """,
)
Analysis and Reporting
from mlflow import MlflowClient
client = MlflowClient()
# Generate performance reports by model version.
# MlflowClient.search_traces returns Trace objects, which the averages below rely on.
model_v1_traces = client.search_traces(
    experiment_ids=["1"], filter_string="tags.model_version = 'v1.0.0'"
)
model_v2_traces = client.search_traces(
    experiment_ids=["1"], filter_string="tags.model_version = 'v2.0.0'"
)
# Compare performance
v1_avg_time = sum(t.info.execution_time_ms for t in model_v1_traces) / len(
    model_v1_traces
)
v2_avg_time = sum(t.info.execution_time_ms for t in model_v2_traces) / len(
    model_v2_traces
)
print(f"V1 average time: {v1_avg_time:.2f}ms")
print(f"V2 average time: {v2_avg_time:.2f}ms")
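The averages above divide by the number of matching traces, which raises `ZeroDivisionError` when a search returns nothing. A guarded version might look like the following sketch; `average_ms` and `format_report` are hypothetical helpers that operate on plain lists of durations, so they run without MLflow.

```python
from typing import Optional, Sequence


def average_ms(durations: Sequence[float]) -> Optional[float]:
    """Return the mean duration, or None when there is nothing to average."""
    if not durations:
        return None
    return sum(durations) / len(durations)


def format_report(label: str, durations: Sequence[float]) -> str:
    """Build one report line, handling the empty case explicitly."""
    avg = average_ms(durations)
    if avg is None:
        return f"{label}: no traces found"
    return f"{label} average time: {avg:.2f}ms"


print(format_report("V1", [1200, 1800]))
print(format_report("V2", []))
```

In practice the duration lists would be built from the `execution_time_ms` values of the traces returned by each search.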
Trace Tag Best Practices
1. Keep Naming Conventions Consistent
# Good: Consistent naming
tags = {
    "environment": "production",  # lowercase
    "model_version": "v2.1.0",  # semantic versioning
    "user_segment": "premium",  # descriptive names
    "processing_stage": "preprocessing",  # clear context
}
# Avoid: Inconsistent naming
tags = {
    "env": "PROD",  # abbreviation + uppercase
    "ModelVer": "2.1",  # mixed case + different format
    "user_type": "premium",  # different terminology
    "stage": "pre",  # unclear abbreviation
}
2. Hierarchical Organization
# Use dots for hierarchical organization
tags = {
    "service.name": "chat_api",
    "service.version": "1.2.0",
    "service.region": "us-east-1",
    "user.segment": "enterprise",
    "user.plan": "premium",
    "request.type": "completion",
    "request.priority": "high",
}
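If the metadata already lives in a nested structure, dotted tag keys like those above can be derived from it mechanically. The function below is an illustrative sketch; `flatten_tags` is a hypothetical helper, and converting every leaf value with `str()` is a choice made here so that all tag values end up as strings.

```python
def flatten_tags(nested: dict, prefix: str = "") -> dict:
    """Flatten nested dictionaries into dotted keys with string values."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, extending the dotted prefix
            flat.update(flatten_tags(value, full_key))
        else:
            flat[full_key] = str(value)
    return flat


flat_tags = flatten_tags(
    {
        "service": {"name": "chat_api", "version": "1.2.0"},
        "user": {"plan": "premium"},
    }
)
print(flat_tags)
```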
3. Temporal Information
tags = {
    "deployment_date": "2024-01-15",
    "quarter": "Q1_2024",
    "week": "2024-W03",
    "shift": "evening",  # for operational monitoring
}
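Temporal tags in these formats can be computed from a date instead of being typed by hand. This is a minimal sketch using only the standard library; `temporal_tags` is a hypothetical helper, and it assumes that the week tag follows ISO 8601 week numbering.

```python
from datetime import date


def temporal_tags(day: date) -> dict:
    """Derive date, quarter, and ISO-week tags from a calendar date."""
    iso_year, iso_week, _ = day.isocalendar()
    quarter = (day.month - 1) // 3 + 1
    return {
        "deployment_date": day.isoformat(),
        "quarter": f"Q{quarter}_{day.year}",
        "week": f"{iso_year}-W{iso_week:02d}",
    }


date_tags = temporal_tags(date(2024, 1, 15))
print(date_tags)
```

For 15 January 2024 this reproduces the `deployment_date`, `quarter`, and `week` values shown above.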
4. Operational Monitoring
# Tags for monitoring and alerting
tags = {
    "sla_tier": "critical",  # for SLA monitoring
    "cost_center": "ml_platform",  # for cost attribution
    "alert_group": "ml_ops",  # for alert routing
    "escalation": "tier_1",  # for incident management
}
5. Experiment Tracking
# Tags for A/B testing and experiments
tags = {
    "experiment_name": "prompt_optimization_v2",
    "variant": "control",
    "hypothesis": "improved_context_helps",
    "feature_flag": "new_prompt_engine",
}
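Common Tag Categories
| Category | Example Tags | Use Case |
|---|---|---|
| Environment | environment: production/staging/dev | Deployment tracking |
| User context | user_id, session_id, user_segment | User behavior analysis |
| Model information | model_version, model_type, checkpoint | Model performance tracking |
| Request type | request_type, complexity, priority | Request categorization |
| Performance | latency_tier, cost_category, sla_tier | Performance monitoring |
| Business logic | feature_flag, experiment_variant, routing | A/B testing and routing |
| Operations | region, deployment_id, instance_type | Infrastructure tracking |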
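Tag Naming Guidelines
- Use lowercase and underscores for consistency
- Be descriptive but concise
- Use semantic versioning for versions (v1.2.3)
- Include units where relevant (time_seconds, size_mb)
- Use hierarchical names for related concepts (service.name, service.version)
- Avoid abbreviations unless they are well known in your domain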
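Some of these guidelines can be checked automatically before tags are written. The sketch below covers only the mechanical rules (lowercase, underscores, optional dotted hierarchy); `is_conventional_tag_key` is a hypothetical helper, and the regular expression is one possible reading of the guidelines, not a rule enforced by MLflow. Judgments such as "descriptive but concise" still need a human reviewer.

```python
import re

# Lowercase segments made of letters, digits, and underscores, separated by dots
_TAG_KEY = re.compile(r"[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)*")


def is_conventional_tag_key(key: str) -> bool:
    """Return True when the key follows the lowercase/underscore/dot convention."""
    return _TAG_KEY.fullmatch(key) is not None


for candidate in ("model_version", "service.name", "ModelVer", "env-name"):
    print(candidate, is_conventional_tag_key(candidate))
```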
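Summary
Trace tags are a flexible way to add context and metadata to your MLflow traces. They enable:
- Better organization: group related traces together
- Powerful filtering: find specific traces quickly through search
- Operational monitoring: track performance and issues by category
- User analytics: understand user behavior patterns
- Debugging: add context that helps with troubleshooting
Whether you add tags while a trace is executing or after the fact, they make your trace data more valuable and more actionable for production monitoring and analysis.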