Search Traces

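This guide shows how to search for traces in MLflow using both the MLflow UI and the Python API. It is useful when you need to query specific traces by metadata, tags, execution time, status, or other trace attributes.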

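MLflow's trace search lets you filter traces on a variety of conditions using a SQL-like syntax. The OR keyword is not supported, but conditions can be combined with AND, which is expressive enough for complex trace discovery and analysis queries.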
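
Because OR is not supported, one possible workaround is to run one query per alternative and merge the results on the client. The sketch below is illustrative only, not an MLflow API: `union_results` and `search_any` are hypothetical helper names, and `search_any` assumes MLflow 3.x, where each returned Trace exposes `info.trace_id`.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def union_results(batches: Iterable[Iterable[T]], key: Callable[[T], Hashable]) -> List[T]:
    """Merge several result batches, keeping the first item seen for each key."""
    merged = {}
    for batch in batches:
        for item in batch:
            merged.setdefault(key(item), item)
    return list(merged.values())


def search_any(filters: List[str], **search_kwargs) -> list:
    """Emulate `f1 OR f2 OR ...` by issuing one search per filter (hypothetical helper)."""
    import mlflow  # imported lazily so union_results stays usable without MLflow

    batches = [
        mlflow.search_traces(filter_string=f, return_type="list", **search_kwargs)
        for f in filters
    ]
    # Assumption: MLflow 3.x Trace objects expose `info.trace_id`.
    return union_results(batches, key=lambda trace: trace.info.trace_id)


# Example usage (needs a tracking backend that contains traces):
# traces = search_any(["trace.status = 'ERROR'", "trace.execution_time_ms > 5000"])
```

Each alternative costs a separate round trip, so this approach is best kept to a handful of filters.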

Important

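The local file store offers only limited search functionality and slows down as the data volume grows. As of MLflow 3.6.0, FileStore is deprecated. We recommend migrating to a SQL-based store or Databricks for better performance and more powerful search.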

Search Traces Overview

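When you use MLflow Tracing in production, you will often have thousands of traces spread across experiments, representing model inferences, LLM calls, or ML pipeline executions. The search_traces API helps you find specific traces by their execution characteristics, metadata, tags, and other attributes, which makes trace analysis and debugging more efficient.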

Filtering Traces in the UI

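The UI search supports the same filter syntax as the API and lets you search by:

  • Trace inputs
  • Trace attributes: trace name, status, end time, execution time, run_id
  • Trace tags and metadata
  • Trace assessments: feedback or expectations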

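Use the filter dropdown in the MLflow Trace UI to filter traces by various conditions.

[Screenshot: filter dropdown in the Trace UI]

For example, search for traces with status ERROR:

[Screenshot: Search Traces UI]

Search trace inputs:

[Screenshot: Search Traces Inputs UI]

Search trace assessments by key and value:

[Screenshot: Search Traces By Assessments UI]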

Search Query Syntax

The search_traces API uses a SQL-like domain-specific language (DSL) to query traces.

Visual representation of the search components:

[Figure: search components]

Supported Filters and Comparators

| Field type | Fields | Operators | Example |
|---|---|---|---|
| Trace status | trace.status | =, != | trace.status = "OK" |
| Trace timestamps | trace.timestamp_ms, trace.execution_time_ms, trace.end_time_ms | =, !=, >, <, >=, <= | trace.end_time_ms > 1762408895531 |
| Trace IDs | trace.run_id | = | trace.run_id = "run_id" |
| String fields | trace.client_request_id, trace.name | =, !=, LIKE, ILIKE, RLIKE | trace.name LIKE "%Generate%" |
| Linked prompts | prompt | = (format: "name/version") | prompt = "qa-system-prompt/4" |
| Span name/type | span.name, span.type | =, !=, LIKE, ILIKE, RLIKE | span.type RLIKE "^LLM" |
| Span attributes | span.attributes.<key> | LIKE, ILIKE | span.attributes.model LIKE "%gpt%" |
| Tags | tag.<key> | =, !=, LIKE, ILIKE, RLIKE | tag.key = "value" |
| Metadata | metadata.<key> | =, !=, LIKE, ILIKE, RLIKE, IS NULL, IS NOT NULL | metadata.user_id LIKE "user%" |
| Feedback | feedback.<name> | =, !=, LIKE, ILIKE, RLIKE | feedback.rating = "excellent" |
| Expectations | expectation.<name> | =, !=, LIKE, ILIKE, RLIKE | expectation.result = "pass" |
| Full text | trace.text | LIKE (with % wildcards) | trace.text LIKE "%tell me a story" |

Value Syntax

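  • String values must be quoted: trace.status = 'OK'
  • Numeric values do not need quotes: trace.execution_time_ms > 1000
  • Tag and metadata values must be quoted as strings
  • Full-text search must use LIKE with % wildcards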
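
The quoting rules above can be captured in a small helper so that filter strings are built consistently. This is a minimal sketch under stated assumptions: `condition` is a hypothetical name, not part of MLflow, and because this guide does not describe how to escape a quote inside a value, the helper simply rejects such values.

```python
from numbers import Real


def condition(field: str, operator: str, value) -> str:
    """Build one `field operator value` clause following the quoting rules above."""
    if isinstance(value, bool):
        raise TypeError("boolean values are not handled by this sketch")

    # Tag and metadata values are always compared as quoted strings.
    always_quote = field.startswith(("tag.", "metadata."))
    if isinstance(value, Real) and not always_quote:
        return f"{field} {operator} {value}"

    text = str(value)
    if "'" in text:
        # Escaping rules are not covered here, so refuse instead of guessing.
        raise ValueError("values containing a single quote are not supported")
    return f"{field} {operator} '{text}'"


print(condition("trace.status", "=", "OK"))
print(condition("trace.execution_time_ms", ">", 1000))
print(condition("tag.version", "=", 2))
```

The resulting strings can be passed as `filter_string` to `mlflow.search_traces`.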

Pattern Matching Operators

  • LIKE: case-sensitive pattern matching (use % as the wildcard)
  • ILIKE: case-insensitive pattern matching (use % as the wildcard)
  • RLIKE: regular expression matching
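
To reason about what a LIKE or ILIKE pattern will match before running a query, the % wildcard can be approximated locally with a regular expression. This is only an illustration of the semantics listed above, not how MLflow evaluates filters (the storage backend does that); `like` and `ilike` are hypothetical helpers, and only the % wildcard described in this guide is handled.

```python
import re


def like_to_regex(pattern: str, case_insensitive: bool = False) -> re.Pattern:
    """Translate a LIKE pattern into a regex where each % matches any run of characters."""
    literal_parts = (re.escape(part) for part in pattern.split("%"))
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.compile(".*".join(literal_parts), flags)


def like(pattern: str, text: str) -> bool:
    """Case-sensitive match of the whole string against the pattern."""
    return like_to_regex(pattern).fullmatch(text) is not None


def ilike(pattern: str, text: str) -> bool:
    """Case-insensitive variant of `like`."""
    return like_to_regex(pattern, case_insensitive=True).fullmatch(text) is not None


print(like("%inference%", "batch_inference_v2"))
print(like("%PREDICT%", "predict"))
print(ilike("%PREDICT%", "predict"))
```

A pattern without any % must match the entire value in this approximation, which is why substring searches wrap the term in % on both sides.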

Example Queries

Full-text search: match any content present in a trace.

python
import mlflow

# Search for traces containing specific text
mlflow.search_traces(filter_string="trace.text LIKE '%authentication error%'")

# Search for a single keyword anywhere in the trace
mlflow.search_traces(filter_string="trace.text LIKE '%timeout%'")

Filter by name

python
# Exact match
mlflow.search_traces(filter_string="trace.name = 'predict'")

# Pattern matching with LIKE
mlflow.search_traces(filter_string="trace.name LIKE '%inference%'")

# Case-insensitive pattern matching with ILIKE
mlflow.search_traces(filter_string="trace.name ILIKE '%PREDICT%'")

# Regular expression matching with RLIKE
mlflow.search_traces(filter_string="trace.name RLIKE '^(predict|inference)_[0-9]+'")

Filter by status

python
# Get successful traces
mlflow.search_traces(filter_string="trace.status = 'OK'")

# Get failed traces
mlflow.search_traces(filter_string="trace.status = 'ERROR'")

# Get every trace that is not OK (for example ERROR or IN_PROGRESS)
mlflow.search_traces(filter_string="trace.status != 'OK'")

Filter by execution time

python
# Find slow traces (> 1 second)
mlflow.search_traces(filter_string="trace.execution_time_ms > 1000")

# Performance range
mlflow.search_traces(
    filter_string="trace.execution_time_ms >= 200 AND trace.execution_time_ms <= 800"
)

# Equal to specific duration
mlflow.search_traces(filter_string="trace.execution_time_ms = 500")

Filter by timestamp

python
import time

# Get traces from last hour
timestamp = int(time.time() * 1000)
mlflow.search_traces(filter_string=f"trace.timestamp_ms > {timestamp - 3600000}")

# Exact timestamp match
mlflow.search_traces(filter_string=f"trace.timestamp_ms = {timestamp}")

# Timestamp range
mlflow.search_traces(
    filter_string=f"trace.timestamp_ms >= {timestamp - 7200000} AND trace.timestamp_ms <= {timestamp - 3600000}"
)
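
Hand-written millisecond offsets such as 3600000 are easy to get wrong. As a minimal sketch, the window can be derived from `datetime` objects instead; `to_epoch_ms` and `time_window_filter` are hypothetical helper names, not MLflow functions.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time surprises")
    return round(moment.timestamp() * 1000)


def time_window_filter(start: datetime, end: datetime, field: str = "trace.timestamp_ms") -> str:
    """Build an inclusive `start <= field <= end` filter string."""
    return f"{field} >= {to_epoch_ms(start)} AND {field} <= {to_epoch_ms(end)}"


start = datetime(2023, 11, 14, 22, 13, 20, tzinfo=timezone.utc)
window = time_window_filter(start, start + timedelta(hours=1))
print(window)

# Example usage (requires MLflow and stored traces):
# mlflow.search_traces(filter_string=window)
```

Passing `field="trace.end_time_ms"` builds the same kind of window on the end time.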

Filter by tags

python
# Exact match
mlflow.search_traces(filter_string="tag.model_name = 'gpt-4'")

# Pattern matching with LIKE (case-sensitive)
mlflow.search_traces(filter_string="tag.model_name LIKE 'gpt-%'")

# Case-insensitive pattern matching with ILIKE
mlflow.search_traces(filter_string="tag.environment ILIKE '%prod%'")

# Regular expression matching with RLIKE
mlflow.search_traces(filter_string="tag.version RLIKE '^v[0-9]+\\.[0-9]+'")

Filter by metadata

python
# Exact match
mlflow.search_traces(filter_string="metadata.user_id = 'user_123'")

# Find traces where metadata key exists
mlflow.search_traces(filter_string="metadata.session_id IS NOT NULL")

# Find traces where metadata key is missing
mlflow.search_traces(filter_string="metadata.region IS NULL")

# Combine null checks with other filters
mlflow.search_traces(
    filter_string="metadata.region IS NOT NULL AND metadata.env = 'production'"
)

Filter by run association

python
# Find traces associated with a specific run
mlflow.search_traces(filter_string="trace.run_id = 'run_id_123456'")

Filter by linked prompts

python
# Find traces using a specific prompt version
mlflow.search_traces(filter_string='prompt = "qa-agent-system-prompt/4"')

Note

The prompt filter supports only the exact-match (=) operator, and the value must use the format "name/version".

Filter by span attributes

python
# Filter by span name
mlflow.search_traces(filter_string="span.name = 'llm_call'")

# Pattern matching on span name
mlflow.search_traces(filter_string="span.name LIKE '%embedding%'")

# Filter by span type
mlflow.search_traces(filter_string="span.type = 'LLM'")

# Filter by custom span attributes (requires wildcards with LIKE/ILIKE)
mlflow.search_traces(filter_string="span.attributes.model_version LIKE '%v2%'")
mlflow.search_traces(filter_string="span.attributes.temperature LIKE '%0.7%'")
mlflow.search_traces(filter_string="span.attributes.model_version ILIKE '%V2%'")

Filter by feedback

python
# Filter by feedback ratings
mlflow.search_traces(filter_string="feedback.rating = 'positive'")

# Pattern matching on feedback
mlflow.search_traces(filter_string="feedback.user_comment LIKE '%helpful%'")

Filter by expectations

python
# Filter by expectation values
mlflow.search_traces(filter_string="expectation.accuracy = 'high'")

# Pattern matching on expectations
mlflow.search_traces(filter_string="expectation.label ILIKE '%success%'")

Filter by end time

python
import time

# Get traces that completed in the last hour
end_time = int(time.time() * 1000)
mlflow.search_traces(filter_string=f"trace.end_time_ms > {end_time - 3600000}")

# Find traces that ended within a specific time range
mlflow.search_traces(
    filter_string=f"trace.end_time_ms >= {end_time - 7200000} AND trace.end_time_ms <= {end_time - 3600000}"
)

Combine multiple conditions

python
# Complex query with tags and status
mlflow.search_traces(filter_string="trace.status = 'OK' AND tag.importance = 'high'")

# Production error analysis with execution time
mlflow.search_traces(
    filter_string="""
        tag.environment = 'production'
        AND trace.status = 'ERROR'
        AND trace.execution_time_ms > 500
    """
)

# Advanced query with span attributes and feedback
mlflow.search_traces(
    filter_string="""
        span.name LIKE '%llm%'
        AND feedback.rating = 'positive'
        AND trace.execution_time_ms < 1000
    """
)

# Search with pattern matching and time range
mlflow.search_traces(
    filter_string="""
        trace.name ILIKE '%inference%'
        AND trace.timestamp_ms > 1700000000000
        AND span.attributes.model_version LIKE '%v2%'
    """
)
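
When some conditions are optional, it can be easier to assemble the AND chain in code than to edit a multi-line string by hand. The following is a minimal sketch; `all_of` and `production_errors` are hypothetical helper names, not part of MLflow.

```python
from typing import Optional


def all_of(*conditions: Optional[str]) -> str:
    """Join the non-empty conditions with AND, skipping None and blank strings."""
    kept = [c.strip() for c in conditions if c and c.strip()]
    return " AND ".join(kept)


def production_errors(min_duration_ms: Optional[int] = None) -> str:
    """Filter for failed production traces, optionally restricted to slow ones."""
    return all_of(
        "tag.environment = 'production'",
        "trace.status = 'ERROR'",
        f"trace.execution_time_ms > {min_duration_ms}" if min_duration_ms is not None else None,
    )


print(production_errors())
print(production_errors(500))

# Example usage (requires MLflow and stored traces):
# mlflow.search_traces(filter_string=production_errors(500))
```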

Programmatic Search with Python

mlflow.search_traces() provides a convenient way to search traces from Python:

python
import mlflow

# Basic search with default DataFrame output
traces_df = mlflow.search_traces(filter_string="trace.status = 'OK'")

# Return as list of Trace objects
traces_list = mlflow.search_traces(
    filter_string="trace.status = 'OK'", return_type="list"
)

Note

The return_type parameter is available in MLflow 2.21.1+. For older versions, use mlflow.client.MlflowClient.search_traces() to get list output.

Return Format

1. DataFrame

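By default, the search_traces API returns a pandas DataFrame with the following columns:

  • trace_id - primary identifier
  • trace - the trace object
  • client_request_id - client request ID
  • state - trace state (OK, ERROR, IN_PROGRESS, STATE_UNSPECIFIED)
  • request_time - start time, in milliseconds
  • execution_duration - duration, in milliseconds
  • inputs - inputs to the traced logic
  • outputs - outputs of the traced logic
  • expectations - dictionary of ground truth labels annotated on the trace
  • trace_metadata - key-value metadata
  • tags - associated tags
  • assessments - list of assessment objects attached to the trace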
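
As an illustration of how these columns might be used, the sketch below computes an error rate and a mean duration from row dictionaries. `summarize` is a hypothetical helper; it assumes the MLflow 3.x column names listed above, that `state` compares equal to a plain string such as "ERROR", and that missing durations are represented as None.

```python
from statistics import mean
from typing import Iterable, Mapping


def summarize(records: Iterable[Mapping]) -> dict:
    """Aggregate rows that carry the `state` and `execution_duration` columns."""
    rows = list(records)
    if not rows:
        return {"count": 0, "error_rate": None, "mean_duration_ms": None}

    durations = [
        row["execution_duration"]
        for row in rows
        if row.get("execution_duration") is not None
    ]
    errors = sum(1 for row in rows if row["state"] == "ERROR")
    return {
        "count": len(rows),
        "error_rate": errors / len(rows),
        "mean_duration_ms": mean(durations) if durations else None,
    }


# Example usage with a real result (requires MLflow and pandas):
# df = mlflow.search_traces(filter_string="tag.environment = 'production'")
# print(summarize(df.to_dict("records")))
```

Working on plain row dictionaries keeps the helper independent of pandas; with a DataFrame at hand, the same aggregation could be done directly on the columns.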

2. List of Trace objects

Alternatively, specify return_type="list" to get a list of mlflow.entities.Trace objects instead of a DataFrame.

python
traces = mlflow.search_traces(filter_string="trace.status = 'OK'", return_type="list")
# list[mlflow.entities.Trace]

Ordering Results

MLflow supports ordering results by the following keys:

  • timestamp_ms (default: DESC) - trace start time
  • execution_time_ms - trace duration
  • status - trace execution status
  • request_id - trace identifier

python
# Order by timestamp (most recent first)
traces = mlflow.search_traces(order_by=["timestamp_ms DESC"])

# Multiple ordering criteria
traces = mlflow.search_traces(order_by=["timestamp_ms DESC", "status ASC"])
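
Since only a fixed set of keys is listed as sortable, a small guard can catch typos before a query is sent. This is a sketch under stated assumptions: `order_clause` is a hypothetical helper, and the key set is copied from the list above rather than read from MLflow.

```python
# Keys taken from the list above; not queried from MLflow itself.
SUPPORTED_ORDER_KEYS = frozenset(
    {"timestamp_ms", "execution_time_ms", "status", "request_id"}
)


def order_clause(key: str, descending: bool = False) -> str:
    """Return one `order_by` entry such as 'execution_time_ms DESC'."""
    if key not in SUPPORTED_ORDER_KEYS:
        raise ValueError(f"unsupported order key: {key!r}")
    direction = "DESC" if descending else "ASC"
    return f"{key} {direction}"


print(order_clause("execution_time_ms", descending=True))

# Example usage: the ten slowest traces (requires MLflow and stored traces).
# mlflow.search_traces(
#     order_by=[order_clause("execution_time_ms", descending=True)],
#     max_results=10,
# )
```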

Pagination

mlflow.client.MlflowClient.search_traces() supports pagination:

python
from mlflow import MlflowClient

client = MlflowClient()
page_token = None
all_traces = []

while True:
    # Fetch one page of up to 100 traces
    results = client.search_traces(
        experiment_ids=["1"],
        filter_string="trace.status = 'OK'",
        max_results=100,
        page_token=page_token,
    )

    all_traces.extend(results)

    # An empty token means there are no more pages
    if not results.token:
        break
    page_token = results.token

print(f"Found {len(all_traces)} total traces")
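
If holding every trace in memory is not desirable, the same loop can be wrapped in a generator that yields traces page by page. This is a minimal sketch: `iter_traces` is a hypothetical helper, and it assumes only what the example above shows, namely that `client.search_traces` accepts these keyword arguments and returns a list-like page with a `token` attribute.

```python
from typing import Iterator, List, Optional


def iter_traces(
    client,
    experiment_ids: List[str],
    filter_string: Optional[str] = None,
    page_size: int = 100,
) -> Iterator:
    """Yield traces one at a time, fetching the next page only when needed."""
    page_token = None
    while True:
        page = client.search_traces(
            experiment_ids=experiment_ids,
            filter_string=filter_string,
            max_results=page_size,
            page_token=page_token,
        )
        yield from page

        # A missing or empty token means the last page has been reached.
        page_token = getattr(page, "token", None)
        if not page_token:
            return


# Example usage (requires MLflow and stored traces):
# from mlflow import MlflowClient
# for trace in iter_traces(MlflowClient(), ["1"], "trace.status = 'ERROR'"):
#     print(trace)
```

Because the generator is lazy, breaking out of the consuming loop early avoids fetching the remaining pages.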

Important Notes

MLflow Version Compatibility

Schema Changes in MLflow 3

DataFrame schema: the format depends on the MLflow version used to call the search_traces API, not on the version used to log the traces. MLflow 3.x uses different column names than 2.x.
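
Code that has to run against both major versions can look up a column by a list of candidate names instead of hard-coding one. The sketch below is illustrative: `pick_column` is a hypothetical helper, and the candidate pair ("trace_id" for 3.x, "request_id" for 2.x) is an assumption that should be checked against the columns your installed version actually returns.

```python
from typing import Iterable, Sequence


def pick_column(columns: Iterable[str], candidates: Sequence[str]) -> str:
    """Return the first candidate name that is present in `columns`."""
    available = set(columns)
    for name in candidates:
        if name in available:
            return name
    raise KeyError(f"none of {list(candidates)} found in {sorted(available)}")


# Assumption: the identifier column is "trace_id" in 3.x and "request_id" in 2.x.
ID_CANDIDATES = ("trace_id", "request_id")

print(pick_column(["trace_id", "trace", "state"], ID_CANDIDATES))
print(pick_column(["request_id", "trace", "status"], ID_CANDIDATES))

# Example usage with a real result (requires MLflow and pandas):
# df = mlflow.search_traces()
# ids = df[pick_column(df.columns, ID_CANDIDATES)]
```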

Performance Tips

  1. Use timestamp filters to limit the search space
  2. Limit max_results to speed up queries when ordering
  3. Use pagination for large result sets
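
One way to apply the first and third tips together is to split a long time range into fixed-size windows and query them one at a time. This is a minimal sketch; `split_windows` and `window_filter` are hypothetical helpers, and the windows are half-open (>= start, < end) so that no trace is counted twice.

```python
from typing import Iterator, Tuple


def split_windows(start_ms: int, end_ms: int, window_ms: int) -> Iterator[Tuple[int, int]]:
    """Yield consecutive (lo, hi) millisecond bounds covering [start_ms, end_ms)."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    lo = start_ms
    while lo < end_ms:
        hi = min(lo + window_ms, end_ms)
        yield lo, hi
        lo = hi


def window_filter(lo: int, hi: int, field: str = "trace.timestamp_ms") -> str:
    """Build a half-open window filter so adjacent windows never overlap."""
    return f"{field} >= {lo} AND {field} < {hi}"


for lo, hi in split_windows(0, 250, 100):
    print(window_filter(lo, hi))

# Example usage (requires MLflow and stored traces), one query per hour:
# for lo, hi in split_windows(start_ms, end_ms, 3_600_000):
#     mlflow.search_traces(filter_string=window_filter(lo, hi), max_results=500)
```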

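Backend Considerations

  • SQL store backend: supports the full search syntax documented above, including
    • All trace, span, metadata, tag, feedback, and expectation filters
    • Pattern matching operators (LIKE, ILIKE, RLIKE)
    • Full-text search with trace.text
    • Proper indexing on timestamps for optimized performance
  • Local file store: limited search functionality, and it can be slow on large datasets. Not recommended; suitable only for storing a small number of traces.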