Search Traces

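This guide walks you through searching traces in MLflow using the MLflow UI and the Python API. It is useful if you want to query traces by their metadata, tags, execution time, status, or other trace attributes.

MLflow's trace search lets you filter traces on a variety of conditions using a SQL-like syntax. The OR keyword is not supported, but conditions can be combined with AND, which is enough for complex trace discovery and analysis queries.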
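Because OR is not supported, one possible workaround is to run one query per alternative and merge the results, dropping duplicates. The sketch below is only an illustration: `search_any` is a hypothetical helper, not part of MLflow, and the default `key` assumes Trace objects expose `info.trace_id` (older MLflow versions may use `info.request_id` instead).

```python
from typing import Any, Callable, Iterable


def search_any(
    search: Callable[[str], Iterable[Any]],
    filters: Iterable[str],
    key: Callable[[Any], str] = lambda trace: trace.info.trace_id,  # assumed attribute
) -> list[Any]:
    """Emulate OR: run each filter separately and merge results without duplicates."""
    seen: set[str] = set()
    merged: list[Any] = []
    for filter_string in filters:
        for trace in search(filter_string):
            trace_key = key(trace)
            if trace_key not in seen:  # keep only the first occurrence of each trace
                seen.add(trace_key)
                merged.append(trace)
    return merged


# Possible usage with MLflow (not executed here):
# import mlflow
# traces = search_any(
#     lambda f: mlflow.search_traces(filter_string=f, return_type="list"),
#     ["trace.status = 'ERROR'", "trace.execution_time_ms > 1000"],
# )
```

Note that each alternative costs one query, so this approach suits a handful of alternatives rather than large disjunctions.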

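Important

The local file store offers only limited search capabilities and slows down as data volume grows. FileStore is deprecated as of MLflow 3.6.0. We recommend migrating to a SQL-based store or Databricks for better performance and more powerful search.

Search Traces Overview

When you work with MLflow tracing in production, you typically have thousands of traces spread across experiments, representing model inferences, LLM calls, or ML pipeline executions. The search_traces API helps you find specific traces by their execution characteristics, metadata, tags, and other attributes, which makes trace analysis and debugging more efficient.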

Filtering Traces in the UI

The UI search supports the same filter syntax as the API and lets you search by:

  • Trace inputs
  • Trace attributes: trace name, status, end time, execution time, run_id
  • Trace tags and metadata
  • Trace assessments: feedback or expectations

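Use the filter dropdown in the MLflow Trace UI to filter traces by various criteria:

[Image: search components]

For example, to search for traces with status ERROR:

[Image: Search Traces UI]

Search trace inputs:

[Image: Search Traces Inputs UI]

Search trace assessments by key and value:

[Image: Search Traces By Assessments UI]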

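Search Query Syntax

The search_traces API uses a SQL-like domain-specific language (DSL) to query traces.

Visual representation of the search components:

[Image: search components]

Supported Filters and Comparators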

| Field Type | Fields | Operators | Example |
|---|---|---|---|
| Trace Status | trace.status | =, != | trace.status = "OK" |
| Trace Timestamps | trace.timestamp_ms, trace.execution_time_ms, trace.end_time_ms | =, !=, >, <, >=, <= | trace.end_time_ms > 1762408895531 |
| Trace IDs | trace.run_id | = | trace.run_id = "run_id" |
| String Fields | trace.client_request_id, trace.name | =, !=, LIKE, ILIKE, RLIKE | trace.name LIKE "%Generate%" |
| Linked Prompts | prompt | = (format: "name/version") | prompt = "qa-system-prompt/4" |
| Span Name/Type | span.name, span.type | =, !=, LIKE, ILIKE, RLIKE | span.type RLIKE "^LLM" |
| Span Attributes | span.attributes.<key> | LIKE, ILIKE | span.attributes.model LIKE "%gpt%" |
| Tags | tag.<key> | =, !=, LIKE, ILIKE, RLIKE | tag.key = "value" |
| Metadata | metadata.<key> | =, !=, LIKE, ILIKE, RLIKE | metadata.user_id LIKE "user%" |
| Feedback | feedback.<name> | =, !=, LIKE, ILIKE, RLIKE | feedback.rating = "excellent" |
| Expectations | expectation.<name> | =, !=, LIKE, ILIKE, RLIKE | expectation.result = "pass" |
| Full Text | trace.text | LIKE (with % wildcards) | trace.text LIKE "%tell me a story" |

Value Syntax

  • String values must be quoted: trace.status = 'OK'
  • Numeric values don't need quotes: trace.execution_time_ms > 1000
  • Tag and metadata values must be quoted as strings
  • Full text search must use LIKE with % wildcards
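The quoting rules above can be captured in a small builder. This is a minimal sketch, not an MLflow API: `clause` and `all_of` are hypothetical helpers. Since how MLflow escapes quotes inside values is not covered here, the sketch rejects values containing a single quote rather than guessing.

```python
from numbers import Number


def clause(field: str, op: str, value) -> str:
    """Render one comparison: numbers stay bare, everything else is single-quoted."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled by this sketch")
    if isinstance(value, Number):
        return f"{field} {op} {value}"
    text = str(value)
    if "'" in text:
        # Escaping rules are not documented here, so refuse instead of guessing.
        raise ValueError("values containing a single quote are not supported")
    return f"{field} {op} '{text}'"


def all_of(*clauses: str) -> str:
    """Join clauses with AND, the only combinator the search syntax supports."""
    return " AND ".join(clauses)


print(all_of(
    clause("tag.environment", "=", "production"),
    clause("trace.execution_time_ms", ">", 500),
))
```

Tag and metadata values should be passed as `str` (for example `"4"` rather than `4`) so that they are rendered quoted.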

Pattern Matching Operators

  • LIKE: Case-sensitive pattern matching (use % for wildcards)
  • ILIKE: Case-insensitive pattern matching (use % for wildcards)
  • RLIKE: Regular expression matching
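To build intuition for how the three operators differ, the following sketch emulates them locally with Python's `re` module. It is only an approximation under stated assumptions: the functions `like`, `ilike`, and `rlike` are hypothetical, LIKE is treated as a whole-string match where only % is a wildcard (the SQL `_` wildcard is ignored), and RLIKE is assumed to match anywhere in the string unless anchored. The actual regex dialect depends on the storage backend.

```python
import re


def like_to_regex(pattern: str, case_insensitive: bool = False) -> re.Pattern:
    """Translate a LIKE pattern (% = any run of characters) into a regex."""
    parts = (re.escape(piece) for piece in pattern.split("%"))
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.compile(".*".join(parts), flags)


def like(value: str, pattern: str) -> bool:
    """Case-sensitive match of the whole value against the pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def ilike(value: str, pattern: str) -> bool:
    """Case-insensitive match of the whole value against the pattern."""
    return like_to_regex(pattern, case_insensitive=True).fullmatch(value) is not None


def rlike(value: str, pattern: str) -> bool:
    """Regular expression match (assumed to search anywhere unless anchored)."""
    return re.search(pattern, value) is not None


print(like("Generate answer", "%Generate%"))    # case matters
print(ilike("run_PREDICT_1", "%predict%"))      # case ignored
print(rlike("predict_42", "^(predict|inference)_[0-9]+"))
```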

Example Queries

Search for anything in a trace.

python
# Search for traces containing specific text
mlflow.search_traces(filter_string="trace.text LIKE '%authentication error%'")

# Search for a different term
mlflow.search_traces(filter_string="trace.text LIKE '%timeout%'")

Filter by Name

python
# Exact match
mlflow.search_traces(filter_string="trace.name = 'predict'")

# Pattern matching with LIKE
mlflow.search_traces(filter_string="trace.name LIKE '%inference%'")

# Case-insensitive pattern matching with ILIKE
mlflow.search_traces(filter_string="trace.name ILIKE '%PREDICT%'")

# Regular expression matching with RLIKE
mlflow.search_traces(filter_string="trace.name RLIKE '^(predict|inference)_[0-9]+'")

Filter by Status

python
# Get successful traces
mlflow.search_traces(filter_string="trace.status = 'OK'")

# Get failed traces
mlflow.search_traces(filter_string="trace.status = 'ERROR'")

# Get all traces whose status is not OK (e.g. ERROR or IN_PROGRESS)
mlflow.search_traces(filter_string="trace.status != 'OK'")

Filter by Execution Time

python
# Find slow traces (> 1 second)
mlflow.search_traces(filter_string="trace.execution_time_ms > 1000")

# Performance range
mlflow.search_traces(
    filter_string="trace.execution_time_ms >= 200 AND trace.execution_time_ms <= 800"
)

# Equal to specific duration
mlflow.search_traces(filter_string="trace.execution_time_ms = 500")

Filter by Timestamp

python
import time

# Get traces from last hour
timestamp = int(time.time() * 1000)
mlflow.search_traces(filter_string=f"trace.timestamp_ms > {timestamp - 3600000}")

# Exact timestamp match
mlflow.search_traces(filter_string=f"trace.timestamp_ms = {timestamp}")

# Timestamp range
mlflow.search_traces(
    filter_string=f"trace.timestamp_ms >= {timestamp - 7200000} AND trace.timestamp_ms <= {timestamp - 3600000}"
)
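The millisecond arithmetic above is easy to get wrong by a factor of 1000. As a sketch, a small helper can derive the filter from `datetime` objects instead. `to_ms` and `time_window_filter` are hypothetical names, not MLflow functions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_ms(moment: datetime) -> int:
    """Convert a datetime to epoch milliseconds.

    Naive datetimes are interpreted in local time, so prefer timezone-aware values.
    """
    return int(moment.timestamp() * 1000)


def time_window_filter(
    start: datetime,
    end: Optional[datetime] = None,
    field: str = "trace.timestamp_ms",
) -> str:
    """Build a filter for traces whose `field` falls within [start, end]."""
    clause = f"{field} >= {to_ms(start)}"
    if end is not None:
        clause += f" AND {field} <= {to_ms(end)}"
    return clause


# Example: traces that started during the last two hours.
now = datetime.now(timezone.utc)
print(time_window_filter(now - timedelta(hours=2), now))
```

Passing `field="trace.end_time_ms"` yields the equivalent filter on completion time.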

Filter by Tags

python
# Exact match
mlflow.search_traces(filter_string="tag.model_name = 'gpt-4'")

# Pattern matching with LIKE (case-sensitive)
mlflow.search_traces(filter_string="tag.model_name LIKE 'gpt-%'")

# Case-insensitive pattern matching with ILIKE
mlflow.search_traces(filter_string="tag.environment ILIKE '%prod%'")

# Regular expression matching with RLIKE
mlflow.search_traces(filter_string="tag.version RLIKE '^v[0-9]+\\.[0-9]+'")

Filter by Run Association

python
# Find traces associated with a specific run
mlflow.search_traces(filter_string="trace.run_id = 'run_id_123456'")

Filter by Linked Prompts

python
# Find traces using a specific prompt version
mlflow.search_traces(filter_string='prompt = "qa-agent-system-prompt/4"')

Note

The prompt filter supports only the exact match (=) operator, with the value in "name/version" format.

Filter by Span Attributes

python
# Filter by span name
mlflow.search_traces(filter_string="span.name = 'llm_call'")

# Pattern matching on span name
mlflow.search_traces(filter_string="span.name LIKE '%embedding%'")

# Filter by span type
mlflow.search_traces(filter_string="span.type = 'LLM'")

# Filter by custom span attributes (requires wildcards with LIKE/ILIKE)
mlflow.search_traces(filter_string="span.attributes.model_version LIKE '%v2%'")
mlflow.search_traces(filter_string="span.attributes.temperature LIKE '%0.7%'")
mlflow.search_traces(filter_string="span.attributes.model_version ILIKE '%V2%'")

Filter by Feedback

python
# Filter by feedback ratings
mlflow.search_traces(filter_string="feedback.rating = 'positive'")

# Pattern matching on feedback
mlflow.search_traces(filter_string="feedback.user_comment LIKE '%helpful%'")

Filter by Expectations

python
# Filter by expectation values
mlflow.search_traces(filter_string="expectation.accuracy = 'high'")

# Pattern matching on expectations
mlflow.search_traces(filter_string="expectation.label ILIKE '%success%'")

Filter by End Time

python
import time

# Get traces that completed in the last hour
end_time = int(time.time() * 1000)
mlflow.search_traces(filter_string=f"trace.end_time_ms > {end_time - 3600000}")

# Find traces that ended within a specific time range
mlflow.search_traces(
    filter_string=f"trace.end_time_ms >= {end_time - 7200000} AND trace.end_time_ms <= {end_time - 3600000}"
)

Combine Multiple Conditions

python
# Complex query with tags and status
mlflow.search_traces(filter_string="trace.status = 'OK' AND tag.importance = 'high'")

# Production error analysis with execution time
mlflow.search_traces(
    filter_string="""
        tag.environment = 'production'
        AND trace.status = 'ERROR'
        AND trace.execution_time_ms > 500
    """
)

# Advanced query with span attributes and feedback
mlflow.search_traces(
    filter_string="""
        span.name LIKE '%llm%'
        AND feedback.rating = 'positive'
        AND trace.execution_time_ms < 1000
    """
)

# Search with pattern matching and time range
mlflow.search_traces(
    filter_string="""
        trace.name ILIKE '%inference%'
        AND trace.timestamp_ms > 1700000000000
        AND span.attributes.model_version LIKE '%v2%'
    """
)

Programmatic Search with Python

mlflow.search_traces() provides a convenient way to search traces from Python.

python
import mlflow

# Basic search with default DataFrame output
traces_df = mlflow.search_traces(filter_string="trace.status = 'OK'")

# Return as list of Trace objects
traces_list = mlflow.search_traces(
    filter_string="trace.status = 'OK'", return_type="list"
)

Note

The return_type parameter is available in MLflow 2.21.1+. For older versions, use mlflow.client.MlflowClient.search_traces() to get list output.

Return Format

1. DataFrame

By default, the search_traces API returns a pandas DataFrame with the following columns:

  • trace_id - Primary identifier
  • trace - Trace object
  • client_request_id - Client request ID
  • state - Trace state (OK, ERROR, IN_PROGRESS, STATE_UNSPECIFIED)
  • request_time - Start time in milliseconds
  • execution_duration - Duration in milliseconds
  • inputs - Input to traced logic
  • outputs - Output of traced logic
  • expectations - A dictionary of ground truth labels annotated on the trace
  • trace_metadata - Key-value metadata
  • tags - Associated tags
  • assessments - List of assessment objects attached on the trace

2. List of Trace Objects

Alternatively, you can specify return_type="list" to get a list of mlflow.entities.Trace() objects instead of a DataFrame.

python
traces = mlflow.search_traces(filter_string="trace.status = 'OK'", return_type="list")
# list[mlflow.entities.Trace]

Ordering Results

MLflow supports ordering results by the following keys:

  • timestamp_ms (default: DESC) - Trace start time
  • execution_time_ms - Trace duration
  • status - Trace execution status
  • request_id - Trace identifier

python
# Order by timestamp (most recent first)
traces = mlflow.search_traces(order_by=["timestamp_ms DESC"])

# Multiple ordering criteria
traces = mlflow.search_traces(order_by=["timestamp_ms DESC", "status ASC"])
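Since `order_by` entries are plain strings, a typo in a key only surfaces when the query runs. As a sketch, a small hypothetical helper (`order_clause`, not part of MLflow) can check a key against the list above before the query is sent:

```python
# Keys taken from the list of supported ordering keys above.
SUPPORTED_ORDER_KEYS = {"timestamp_ms", "execution_time_ms", "status", "request_id"}


def order_clause(key: str, descending: bool = False) -> str:
    """Return an order_by entry such as 'timestamp_ms DESC', rejecting unknown keys."""
    if key not in SUPPORTED_ORDER_KEYS:
        raise ValueError(
            f"unsupported order key {key!r}; expected one of {sorted(SUPPORTED_ORDER_KEYS)}"
        )
    direction = "DESC" if descending else "ASC"
    return f"{key} {direction}"


order_by = [order_clause("timestamp_ms", descending=True), order_clause("status")]
print(order_by)
```

The resulting list can be passed directly as the `order_by` argument.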

Pagination

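mlflow.client.MlflowClient.search_traces() supports pagination through the max_results and page_token arguments; each page of results carries a token for fetching the next one.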

python
from mlflow import MlflowClient

client = MlflowClient()
page_token = None
all_traces = []

while True:
    # Fetch one page of up to 100 traces
    results = client.search_traces(
        experiment_ids=["1"],
        filter_string="trace.status = 'OK'",
        max_results=100,
        page_token=page_token,
    )

    all_traces.extend(results)

    # An empty token means there are no more pages
    if not results.token:
        break
    page_token = results.token

print(f"Found {len(all_traces)} total traces")
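When the full result set does not need to sit in memory at once, the same loop can be wrapped in a generator. This is a minimal sketch: `iter_traces` is a hypothetical helper, and it assumes only what the loop above relies on, namely that `client.search_traces()` accepts `max_results` and `page_token` and returns a list-like page with a `token` attribute.

```python
from typing import Any, Iterator


def iter_traces(client: Any, page_size: int = 100, **search_kwargs: Any) -> Iterator[Any]:
    """Yield traces one at a time, fetching a new page only when needed."""
    page_token = None
    while True:
        page = client.search_traces(
            max_results=page_size, page_token=page_token, **search_kwargs
        )
        yield from page
        page_token = getattr(page, "token", None)
        if not page_token:  # empty or missing token: last page reached
            break


# Possible usage with MLflow (not executed here):
# from mlflow import MlflowClient
# for trace in iter_traces(MlflowClient(), experiment_ids=["1"],
#                          filter_string="trace.status = 'OK'"):
#     ...
```

Because pages are fetched lazily, breaking out of the loop early avoids requesting the remaining pages.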

Important Notes

MLflow Version Compatibility

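Schema Changes in MLflow 3

DataFrame Schema: The format depends on the MLflow version used to **call** the search_traces API, not the version used to log the traces. MLflow 3.x uses different column names than 2.x.

Performance Tips

  1. Use timestamp filters to limit the search space
  2. Limit max_results for faster queries when ordering
  3. Use pagination for large result sets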

Backend Considerations

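  • SQL Store Backend: Supports the full search syntax documented above, including
    • All trace, span, metadata, tag, feedback, and expectation filters
    • Pattern matching operators (LIKE, ILIKE, RLIKE)
    • Full text search with trace.text
    • Optimized performance through proper indexing on timestamps
  • Local File Store: Limited search capabilities. May be slow for large datasets. Not recommended; suitable only for storing a small number of traces.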