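Search Traces

This guide walks you through searching for traces in MLflow with both the MLflow UI and the Python API. It is useful if you want to query specific traces by their metadata, tags, execution time, status, or other trace attributes.

MLflow's trace search lets you filter traces on a range of conditions using a SQL-like syntax. The OR keyword is not supported, but the search is still expressive enough for complex trace discovery and analysis queries.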

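Trace search overview

When you run MLflow Tracing in production, you will often have thousands of traces spread across experiments, representing model inferences, LLM calls, or ML pipeline executions. The search_traces API helps you find specific traces by their execution characteristics, metadata, tags, and other attributes, which makes trace analysis and debugging more efficient.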

Search query syntax

The search_traces API uses a SQL-like domain-specific language (DSL) to query traces.

Visual representation of the search components:

[Figure: search components]

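Key features:

Supported attributes: request_id, timestamp_ms, execution_time_ms, status, name, run_id
Tag support: filter by trace tags using the tags. or tag. prefix
Metadata support: filter by request metadata using the metadata. prefix
Timestamp filtering: built-in support for time-based queries
Status filtering: filter by trace execution status (OK, ERROR, IN_PROGRESS)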

Syntax rules:

Field syntax

Attributes: status, timestamp_ms, execution_time_ms, trace.name
Tags: tags.operation_type, tag.model_name (both prefixes are supported)
Metadata: metadata.run_id
Use backticks for keys with special characters: tags.`model-name`

Value syntax

String values must be quoted: status = 'OK'
Numeric values are not quoted: execution_time_ms > 1000
Tag and metadata values must always be quoted as strings

Supported comparison operators

Numeric (timestamp_ms, execution_time_ms): >, >=, =, !=, <, <=
String (name, status, request_id): =, !=, IN, NOT IN
Tags/metadata: =, !=
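The field and value rules above can be folded into a small helper that assembles filter strings. The sketch below is not part of MLflow: quote_key, quote_value, and build_filter are hypothetical names. It assumes that any tag or metadata key containing characters other than letters, digits, and underscores needs backticks, and it rejects string values that contain a single quote rather than guessing at an escaping rule. Pass tag and metadata values as strings so that they are quoted.

```python
import re

_PLAIN_KEY = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_key(field: str) -> str:
    """Backtick the key of tags.<key> / metadata.<key> when it has special characters."""
    prefix, sep, key = field.partition(".")
    if sep and prefix in {"tags", "tag", "metadata"} and not _PLAIN_KEY.match(key):
        return f"{prefix}.`{key}`"
    return field


def quote_value(value) -> str:
    """Numbers stay bare; everything else is wrapped in single quotes."""
    if isinstance(value, bool):
        raise TypeError("boolean values are not handled by this sketch")
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    if "'" in text:
        raise ValueError(f"value contains a single quote: {text!r}")
    return f"'{text}'"


def build_filter(conditions) -> str:
    """Join (field, operator, value) triples with AND."""
    return " AND ".join(
        f"{quote_key(field)} {op} {quote_value(value)}" for field, op, value in conditions
    )


filter_string = build_filter(
    [
        ("tags.model-name", "=", "gpt-4"),
        ("status", "=", "OK"),
        ("execution_time_ms", ">", 1000),
    ]
)
print(filter_string)
# tags.`model-name` = 'gpt-4' AND status = 'OK' AND execution_time_ms > 1000
```

The resulting string can be passed as the filter_string argument of mlflow.search_traces().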

Trace status values

OK - Execution succeeded
ERROR - Execution failed
IN_PROGRESS - Execution is still running

Example queries

Filter by name

# Search for traces by name
mlflow.search_traces(filter_string="trace.name = 'predict'")
mlflow.search_traces(filter_string="name = 'llm_inference'")

Filter by status

# Get successful traces
mlflow.search_traces(filter_string="trace.status = 'OK'")
mlflow.search_traces(filter_string="status = 'OK'")

# Get failed traces
mlflow.search_traces(filter_string="status = 'ERROR'")

# Multiple statuses
mlflow.search_traces(filter_string="status IN ('OK', 'ERROR')")

Filter by execution time

# Find slow traces (> 1 second)
mlflow.search_traces(filter_string="execution_time_ms > 1000")

# Performance range
mlflow.search_traces(
    filter_string="execution_time_ms >= 200 AND execution_time_ms <= 800"
)

Filter by timestamp

import time

# Get traces from last hour
timestamp = int(time.time() * 1000)
mlflow.search_traces(filter_string=f"trace.timestamp > {timestamp - 3600000}")

# Alternative syntax
mlflow.search_traces(filter_string=f"timestamp_ms > {timestamp - 3600000}")
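For a bounded window rather than "the last hour", it can help to compute both endpoints from datetime values. The following is a minimal sketch using only the standard library; to_epoch_ms and time_window_filter are hypothetical helper names, and the half-open window (start inclusive, end exclusive) is a design choice of this example, not something MLflow requires.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp() * 1000)


def time_window_filter(start: datetime, end: datetime) -> str:
    """Build a filter for traces that started in the half-open window [start, end)."""
    return f"timestamp_ms >= {to_epoch_ms(start)} AND timestamp_ms < {to_epoch_ms(end)}"


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
window = time_window_filter(start, start + timedelta(hours=1))
print(window)
# timestamp_ms >= 1704067200000 AND timestamp_ms < 1704070800000
```

Requiring timezone-aware datetimes avoids silently interpreting a naive value in the local timezone.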

Filter by tag

# Filter by tag values (both syntaxes supported)
mlflow.search_traces(filter_string="tag.model_name = 'gpt-4'")
mlflow.search_traces(filter_string="tags.operation_type = 'llm_inference'")

Filter by run association

# Find traces associated with a specific run
mlflow.search_traces(run_id="run_id_123456")

# Or using filter string
mlflow.search_traces(filter_string="metadata.run_id = 'run_id_123456'")

Combine multiple conditions

# Complex query
mlflow.search_traces(filter_string="trace.status = 'OK' AND tag.importance = 'high'")

# Production error analysis
mlflow.search_traces(
    filter_string="""
        tags.environment = 'production'
        AND status = 'ERROR'
        AND execution_time_ms > 500
    """
)
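Because OR is not supported, conditions on different fields cannot be alternated within one filter string (for a single string attribute, IN already covers that case). One possible workaround is to run one search per condition and merge the results on the client. The sketch below assumes the search function returns Trace-like objects whose identifier is at trace.info.trace_id; search_any and fake_search are hypothetical names, and the fake search only stands in for a tracking server so the example runs on its own.

```python
from types import SimpleNamespace


def search_any(search_fn, filter_strings, key=lambda trace: trace.info.trace_id):
    """Emulate OR: run one search per filter and keep the first copy of each trace."""
    seen = set()
    merged = []
    for filter_string in filter_strings:
        for trace in search_fn(filter_string):
            trace_key = key(trace)
            if trace_key not in seen:
                seen.add(trace_key)
                merged.append(trace)
    return merged


# Stand-in for the real search so the sketch runs without a tracking server.
def fake_search(filter_string):
    data = {
        "status = 'ERROR'": ["t1", "t2"],
        "execution_time_ms > 2000": ["t2", "t3"],
    }
    return [SimpleNamespace(info=SimpleNamespace(trace_id=i)) for i in data[filter_string]]


merged = search_any(fake_search, ["status = 'ERROR'", "execution_time_ms > 2000"])
print([trace.info.trace_id for trace in merged])
# ['t1', 't2', 't3']
```

With MLflow, search_fn could be lambda f: mlflow.search_traces(filter_string=f, return_type="list"). On MLflow 2.x, where the identifier is named request_id, pass key=lambda t: t.info.request_id. Results come back grouped by query, so re-sort them if a global order matters.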

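Filter traces in the UI

Use the search box in the MLflow Trace UI to filter traces on various conditions with the same syntax described above.

The UI search supports all of the filter syntax available in the API, so you can:

Filter by trace name, status, or execution time
Search by tags and metadata
Use timestamp ranges
Combine multiple conditions with AND

Programmatic search with Python

mlflow.search_traces() provides a convenient way to search traces: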

import mlflow

# Basic search with default DataFrame output
traces_df = mlflow.search_traces(filter_string="status = 'OK'")

# Return as list of Trace objects
traces_list = mlflow.search_traces(filter_string="status = 'OK'", return_type="list")

Note

The return_type parameter is available in MLflow 2.21.1 and later. On older versions, use mlflow.client.MlflowClient.search_traces() to get list output.

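Return format

1. DataFrame

By default, the search_traces API returns a pandas DataFrame. Its columns depend on the MLflow version:

MLflow 3.x

trace_id - Primary identifier
trace - Trace object
client_request_id - Client request ID
state - Trace state (OK, ERROR, IN_PROGRESS, STATE_UNSPECIFIED)
request_time - Start time (milliseconds)
execution_duration - Duration (milliseconds)
inputs - Inputs to the traced logic
outputs - Outputs of the traced logic
expectations - Dictionary of ground-truth labels annotated on the trace
trace_metadata - Key-value metadata
tags - Associated tags
assessments - List of assessment objects attached to the trace

MLflow 2.x

request_id - Primary identifier
trace - Trace object
timestamp_ms - Start time (milliseconds)
status - Trace status
execution_time_ms - Duration (milliseconds)
request - Inputs to the traced logic
response - Outputs of the traced logic
request_metadata - Key-value metadata
spans - Spans in the trace
tags - Associated tags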
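Code that has to run against both schemas can rename the 2.x columns to their 3.x counterparts before further processing. The mapping below is a sketch limited to the identifier, timing, status, and metadata columns, paired by their descriptions in the two lists above; COLUMNS_2X_TO_3X and rename_map are hypothetical names, not part of MLflow.

```python
# 2.x column name -> 3.x column name, limited to pairs whose descriptions match above.
COLUMNS_2X_TO_3X = {
    "request_id": "trace_id",
    "timestamp_ms": "request_time",
    "status": "state",
    "execution_time_ms": "execution_duration",
    "request_metadata": "trace_metadata",
}


def rename_map(columns):
    """Return only the renames that apply to the given column names."""
    return {old: new for old, new in COLUMNS_2X_TO_3X.items() if old in columns}


legacy_columns = ["request_id", "trace", "timestamp_ms", "status", "tags"]
print(rename_map(legacy_columns))
# {'request_id': 'trace_id', 'timestamp_ms': 'request_time', 'status': 'state'}
```

With a DataFrame returned by search_traces, this could be applied as traces_df.rename(columns=rename_map(traces_df.columns)); a DataFrame that already uses the 3.x names is left unchanged.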

2. List of Trace objects

Alternatively, specify return_type="list" to get a list of mlflow.entities.Trace objects instead of a DataFrame.

traces = mlflow.search_traces(filter_string="status = 'OK'", return_type="list")
# list[mlflow.entities.Trace]

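Ordering results

MLflow supports ordering results by the following keys:

timestamp_ms (default: descending) - Trace start time
execution_time_ms - Trace duration
status - Trace execution status
request_id - Trace identifier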

# Order by timestamp (most recent first)
traces = mlflow.search_traces(order_by=["timestamp_ms DESC"])

# Multiple ordering criteria
traces = mlflow.search_traces(order_by=["timestamp_ms DESC", "status ASC"])

Extracting span fields

Extract specific span data into DataFrame columns:

traces = mlflow.search_traces(
    extract_fields=[
        "morning_greeting.inputs.name",  # Extract specific input
        "morning_greeting.outputs",  # Extract all outputs
    ],
)

# Creates additional columns:
# - morning_greeting.inputs.name
# - morning_greeting.outputs

This is useful for creating evaluation datasets:

eval_data = traces.rename(
    columns={
        "morning_greeting.inputs.name": "inputs",
        "morning_greeting.outputs": "ground_truth",
    }
)

results = mlflow.genai.evaluate(data=eval_data, scorers=[...])

Note

extract_fields is only available with return_type="pandas".
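When working with return_type="list", similar values can be read from the Trace objects directly. The helper below is a minimal sketch, assuming each trace exposes its spans at trace.data.spans and each span has name, inputs, and outputs attributes; extract_span_field is a hypothetical name, and the stand-in objects only exist so the example runs without MLflow installed.

```python
from types import SimpleNamespace


def extract_span_field(trace, span_name, field, key=None):
    """Return inputs/outputs (or one key of them) from the first span with this name."""
    if field not in ("inputs", "outputs"):
        raise ValueError("field must be 'inputs' or 'outputs'")
    for span in trace.data.spans:
        if span.name == span_name:
            value = getattr(span, field)
            if key is None:
                return value
            return value.get(key) if isinstance(value, dict) else None
    return None


# Stand-in trace so the sketch runs without MLflow installed.
span = SimpleNamespace(
    name="morning_greeting", inputs={"name": "Tom"}, outputs="Good morning Tom."
)
trace = SimpleNamespace(data=SimpleNamespace(spans=[span]))

print(extract_span_field(trace, "morning_greeting", "inputs", "name"))  # Tom
print(extract_span_field(trace, "morning_greeting", "outputs"))  # Good morning Tom.
```

The function returns None when no span has the requested name, so callers can skip traces that did not pass through that span.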

Pagination

mlflow.client.MlflowClient.search_traces() supports pagination:

from mlflow import MlflowClient

client = MlflowClient()
page_token = None
all_traces = []

while True:
    results = client.search_traces(
        experiment_ids=["1"],
        filter_string="status = 'OK'",
        max_results=100,
        page_token=page_token,
    )

    all_traces.extend(results)

    if not results.token:
        break
    page_token = results.token

print(f"Found {len(all_traces)} total traces")
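The loop above collects everything into one list. If traces should be processed as they arrive, the same token-following logic can be wrapped in a generator. This is a sketch under the assumption that each page is iterable and carries a token attribute, as in the loop above; iter_pages and FakePage are hypothetical names, and FakePage only imitates a paged result so the example is self-contained.

```python
def iter_pages(fetch_page):
    """Yield every item from a paginated search, following tokens until none is returned."""
    page_token = None
    while True:
        page = fetch_page(page_token)
        yield from page
        page_token = getattr(page, "token", None)
        if not page_token:
            break


# Stand-in for a paged result: a list that carries a `token` attribute.
class FakePage(list):
    def __init__(self, items, token=None):
        super().__init__(items)
        self.token = token


pages = {None: FakePage(["t1", "t2"], token="p2"), "p2": FakePage(["t3"])}
print(list(iter_pages(lambda token: pages[token])))
# ['t1', 't2', 't3']
```

With a real client, fetch_page could be lambda token: client.search_traces(experiment_ids=["1"], max_results=100, page_token=token).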

Common use cases

Performance analysis

# Find slowest 10 traces
slowest_traces = mlflow.search_traces(
    filter_string="status = 'OK'",
    order_by=["execution_time_ms DESC"],
    max_results=10,
)

# Performance threshold violations
slow_production = mlflow.search_traces(
    filter_string="""
        tags.environment = 'production'
        AND execution_time_ms > 2000
        AND status = 'OK'
    """,
)

Error analysis

import time

# Recent errors
yesterday = int((time.time() - 24 * 3600) * 1000)
error_traces = mlflow.search_traces(
    filter_string=f"status = 'ERROR' AND timestamp_ms > {yesterday}",
    order_by=["timestamp_ms DESC"],
)

# Analyze error patterns
error_by_operation = {}
for _, trace in error_traces.iterrows():
    # Access tags from the trace object
    tags = trace["tags"] if "tags" in trace else {}
    op_type = tags.get("operation_type", "unknown")
    error_by_operation[op_type] = error_by_operation.get(op_type, 0) + 1

Model performance comparison

# Compare performance across models
models = ["gpt-4", "bert-base", "roberta-large"]
model_stats = {}

for model in models:
    traces = mlflow.search_traces(
        filter_string=f"tags.model_name = '{model}' AND status = 'OK'",
        return_type="list",
    )

    if traces:
        exec_times = [trace.info.execution_time_ms for trace in traces]
        model_stats[model] = {
            "count": len(traces),
            "avg_time": sum(exec_times) / len(exec_times),
            "max_time": max(exec_times),
        }

print("Model performance comparison:")
for model, stats in model_stats.items():
    print(f"{model}: {stats['count']} traces, avg {stats['avg_time']:.1f}ms")
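Averages can hide a slow tail, so a comparison like the one above may benefit from a median and a high percentile as well. The following is a minimal standard-library sketch; summarize_latencies is a hypothetical helper, and the nearest-rank definition of the 95th percentile is one choice among several.

```python
import statistics


def summarize_latencies(exec_times_ms):
    """Return count, mean, median, p95, and max for a list of durations in milliseconds."""
    if not exec_times_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(exec_times_ms)
    # Nearest-rank p95: the smallest value with at least 95% of samples at or below it.
    rank = max(1, -(-95 * len(ordered) // 100))  # ceil(0.95 * n) without floats
    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


stats = summarize_latencies([120, 80, 95, 400, 110, 105, 90, 100, 2000, 115])
print(stats["median"], stats["p95"], stats["max"])
# 107.5 2000 2000
```

In this sample the mean is 321.5 ms while the median is 107.5 ms, which shows how two slow traces dominate the average.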

Creating evaluation datasets

# Extract LLM conversation data for evaluation
conversation_data = mlflow.search_traces(
    filter_string="tags.task_type = 'conversation' AND status = 'OK'",
    extract_fields=["llm_call.inputs.prompt", "llm_call.outputs.response"],
)

# Rename for evaluation
eval_dataset = conversation_data.rename(
    columns={
        "llm_call.inputs.prompt": "inputs",
        "llm_call.outputs.response": "ground_truth",
    }
)

# Use with MLflow evaluate
results = mlflow.genai.evaluate(data=eval_dataset, scorers=[...])

Environment monitoring

# Monitor error rates across environments
environments = ["production", "staging", "development"]

for env in environments:
    total = mlflow.search_traces(filter_string=f"tags.environment = '{env}'")

    errors = mlflow.search_traces(
        filter_string=f"tags.environment = '{env}' AND status = 'ERROR'",
    )

    error_rate = len(errors) / len(total) * 100 if len(total) > 0 else 0
    print(f"{env}: {error_rate:.1f}% error rate ({len(errors)}/{len(total)})")

Creating example traces

Create example traces to explore the search functionality:

import time
import mlflow
from mlflow.entities import SpanType


# Define methods to be traced
@mlflow.trace(span_type=SpanType.TOOL, attributes={"time": "morning"})
def morning_greeting(name: str):
    time.sleep(1)
    mlflow.update_current_trace(tags={"person": name})
    return f"Good morning {name}."


@mlflow.trace(span_type=SpanType.TOOL, attributes={"time": "evening"})
def evening_greeting(name: str):
    time.sleep(1)
    mlflow.update_current_trace(tags={"person": name})
    return f"Good evening {name}."


@mlflow.trace(span_type=SpanType.TOOL)
def goodbye():
    raise Exception("Cannot say goodbye")


# Execute within different experiments
morning_experiment = mlflow.set_experiment("Morning Experiment")
morning_greeting("Tom")

# Get timestamp for filtering
morning_time = int(time.time() * 1000)

evening_experiment = mlflow.set_experiment("Evening Experiment")
evening_greeting("Mary")
try:
    goodbye()
except Exception:
    pass  # This creates an ERROR trace

print("Created example traces with different statuses and timing")

Alternative setup - production-like traces

import mlflow
import time
import random

mlflow.set_experiment("trace-search-guide")

# Configuration for realistic traces
operation_types = ["llm_inference", "embedding_generation", "text_classification"]
model_names = ["gpt-4", "bert-base", "roberta-large"]
environments = ["production", "staging", "development"]


def simulate_operation(op_type, model_name, duration_ms):
    """Simulate an AI/ML operation"""
    time.sleep(duration_ms / 1000.0)

    # Simulate occasional errors
    if random.random() < 0.1:
        raise Exception(f"Simulated error in {op_type}")

    return f"Completed {op_type} with {model_name}"


# Create diverse traces
for i in range(20):
    op_type = random.choice(operation_types)
    model_name = random.choice(model_names)
    environment = random.choice(environments)
    duration = random.randint(50, 2000)  # 50ms to 2s

    try:
        with mlflow.start_run():
            mlflow.set_tag("environment", environment)

            # start_span opens a new trace; update_current_trace attaches tags to it
            with mlflow.start_span(name=f"{op_type}_{i}") as span:
                mlflow.update_current_trace(
                    tags={
                        "operation_type": op_type,
                        "model_name": model_name,
                        "environment": environment,
                        "input_tokens": str(random.randint(10, 500)),
                    }
                )
                result = simulate_operation(op_type, model_name, duration)
                span.set_attribute("result", result)

    except Exception:
        # Creates ERROR status traces
        continue

print("Created 20 example traces with various characteristics")

Start the MLflow UI to explore

mlflow ui

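Visit http://localhost:5000/ to view your traces in the UI.

Once these traces exist, you can experiment with searching them through the UI or programmatically through the fluent or client search_traces API.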

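Important notes

MLflow version compatibility

Schema changes in MLflow 3

DataFrame schema: the format depends on the MLflow version used to call the search_traces API, not on the version used to log the traces. MLflow 3.x uses different column names than 2.x.

Return type support

MLflow 2.21.1+: the return_type parameter is available in mlflow.search_traces()
Earlier versions: use MlflowClient.search_traces() to get list output

Performance tips

Use timestamp filters to limit the search space
Limit max_results to speed up queries that sort
Use pagination for large result sets
Index frequently queried tags in your storage system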
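One way to act on the first tip is to split a long time range into smaller windows and query them one at a time, so each search covers a bounded slice. The sketch below only builds the filter strings; split_time_range and window_filters are hypothetical helpers, and whether windowing actually speeds up a query depends on the backend.

```python
def split_time_range(start_ms, end_ms, window_ms):
    """Yield half-open (lo, hi) windows that cover [start_ms, end_ms) without overlap."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    lo = start_ms
    while lo < end_ms:
        hi = min(lo + window_ms, end_ms)
        yield lo, hi
        lo = hi


def window_filters(start_ms, end_ms, window_ms, extra="status = 'ERROR'"):
    """Build one filter string per window, each combined with an extra condition."""
    return [
        f"{extra} AND timestamp_ms >= {lo} AND timestamp_ms < {hi}"
        for lo, hi in split_time_range(start_ms, end_ms, window_ms)
    ]


filters = window_filters(0, 2_500, 1_000)
for f in filters:
    print(f)
# status = 'ERROR' AND timestamp_ms >= 0 AND timestamp_ms < 1000
# status = 'ERROR' AND timestamp_ms >= 1000 AND timestamp_ms < 2000
# status = 'ERROR' AND timestamp_ms >= 2000 AND timestamp_ms < 2500
```

Each string can then be passed as filter_string together with a max_results limit. Because the windows are half-open, no trace is matched by two of them.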

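Backend considerations

Database backend: optimize performance with appropriate indexes on timestamp and status
Databricks: enhanced performance through the sql_warehouse_id parameter
Local file store: can be slow for large datasets. Not recommended; suitable only for storing a small number of traces.

Summary

The search_traces API provides powerful trace discovery and analysis capabilities in MLflow. By combining flexible filtering, time-based queries, tag-based organization, and advanced features such as span field extraction, you can efficiently investigate trace patterns, debug issues, and monitor system performance.

Key takeaways

Use the SQL-like syntax with tags./tag., metadata., and direct attribute references
Filter by execution time, status, timestamp, and custom tags
Combine multiple conditions with AND (OR is not supported)
Use ordering and pagination to explore data efficiently
Use span field extraction to create evaluation datasets
Choose the return type that fits your use case

Whether you are debugging production issues, analyzing model performance, monitoring system health, or creating evaluation datasets, mastering the trace search API will make your MLflow workflow more efficient and insightful.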
