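Tracing DSPy🧩

DSPy is an open-source framework for building modular AI systems; it provides algorithms for optimizing their prompts and weights.

MLflow Tracing provides automatic tracing for DSPy. To enable it, call the mlflow.dspy.autolog() function. Whenever a DSPy module is invoked, nested traces are automatically logged to the active MLflow experiment.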

python
import mlflow

mlflow.dspy.autolog()

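Tip

The MLflow DSPy integration is not limited to tracing. MLflow offers a full tracking experience for DSPy, including model tracking, index management, and evaluation. See the MLflow DSPy Flavor to learn more!

Example Usage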

python
import dspy
import mlflow

# Enabling tracing for DSPy
mlflow.dspy.autolog()

# Optional: Set a tracking URI (use your own tracking server address) and an experiment
mlflow.set_tracking_uri("https://:5000")
mlflow.set_experiment("DSPy")

# Configure the language model that DSPy modules will call
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)


# Define a simple summarizer model and run it
class SummarizeSignature(dspy.Signature):
    """Given a passage, generate a summary."""

    passage: str = dspy.InputField(desc="a passage to summarize")
    summary: str = dspy.OutputField(desc="a one-line summary of the passage")


class Summarize(dspy.Module):
    def __init__(self):
        super().__init__()
        self.summarize = dspy.ChainOfThought(SummarizeSignature)

    def forward(self, passage: str):
        return self.summarize(passage=passage)


summarizer = Summarize()
summarizer(
    passage=(
        "MLflow Tracing is a feature that enhances LLM observability in your Generative AI (GenAI) applications "
        "by capturing detailed information about the execution of your application's services. Tracing provides "
        "a way to record the inputs, outputs, and metadata associated with each intermediate step of a request, "
        "enabling you to easily pinpoint the source of bugs and unexpected behaviors."
    )
)

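Tracing during Evaluation

Evaluating DSPy models is an important step in developing AI systems. MLflow Tracing helps you track how your programs performed after an evaluation by providing detailed information about the program's execution for each input.

When MLflow auto-tracing is enabled for DSPy, traces are generated automatically whenever you run DSPy's built-in evaluation suite. The following example shows how to run an evaluation and review the traces in MLflow.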

python
import dspy
from dspy.evaluate.metrics import answer_exact_match

import mlflow

# Enabling tracing for DSPy evaluation
mlflow.dspy.autolog(log_traces_from_eval=True)

# Define a simple evaluation set
eval_set = [
    dspy.Example(
        question="How many 'r's are in the word 'strawberry'?", answer="3"
    ).with_inputs("question"),
    dspy.Example(
        question="How many 'a's are in the word 'banana'?", answer="3"
    ).with_inputs("question"),
    dspy.Example(
        question="How many 'e's are in the word 'elephant'?", answer="2"
    ).with_inputs("question"),
]


# Define a program
class Counter(dspy.Signature):
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(
        desc="Should only contain a single number as an answer"
    )


cot = dspy.ChainOfThought(Counter)

# Evaluate the programs
with mlflow.start_run(run_name="CoT Evaluation"):
    evaluator = dspy.evaluate.Evaluate(
        devset=eval_set,
        return_all_scores=True,
        return_outputs=True,
        show_progress=True,
    )
    aggregated_score, outputs, all_scores = evaluator(cot, metric=answer_exact_match)

    # Log the aggregated score
    mlflow.log_metric("exact_match", aggregated_score)
    # Log the detailed evaluation results as a table
    mlflow.log_table(
        {
            "question": [example.question for example in eval_set],
            "answer": [example.answer for example in eval_set],
            "output": outputs,
            "exact_match": all_scores,
        },
        artifact_file="eval_results.json",
    )
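
The `outputs` value logged above may hold richer objects than plain strings. As a hedged sketch, assuming each entry is an `(example, prediction, score)` tuple (an assumption about the DSPy version in use, so inspect `outputs` first), a small helper can flatten the entries into JSON-friendly columns before logging. The helper name `to_table_columns` is hypothetical, and the stand-in objects below only mimic the `question` and `answer` attributes.

```python
from types import SimpleNamespace


def to_table_columns(results):
    """Flatten (example, prediction, score) tuples into plain columns.

    Assumption: each item is a 3-tuple whose first two members expose
    `question` / `answer` attributes, as in the evaluation example above.
    """
    columns = {"question": [], "answer": [], "output": [], "exact_match": []}
    for example, prediction, score in results:
        columns["question"].append(str(example.question))
        columns["answer"].append(str(example.answer))
        columns["output"].append(str(prediction.answer))
        columns["exact_match"].append(bool(score))
    return columns


# Stand-in data so the sketch runs without calling an LLM.
fake_results = [
    (SimpleNamespace(question="q1", answer="3"), SimpleNamespace(answer="3"), True),
    (SimpleNamespace(question="q2", answer="2"), SimpleNamespace(answer="4"), False),
]
table = to_table_columns(fake_results)
print(table["output"])
```

The resulting dict could then be passed to `mlflow.log_table(table, artifact_file="eval_results.json")` in place of the inline dict.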

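If you open the MLflow UI and go to the "CoT Evaluation" run, the "Traces" tab shows the evaluation results along with the list of traces generated during the evaluation.

Note

You can disable tracing for these steps by calling the mlflow.dspy.autolog() function with the log_traces_from_eval parameter set to False.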

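Tracing during Compilation (Optimization)

Compilation (optimization) is a core concept of DSPy. Through compilation, DSPy automatically optimizes the prompts and weights of your DSPy program to achieve the best performance.

By default, MLflow does **NOT** generate traces during compilation, because compilation can trigger hundreds or even thousands of DSPy module invocations. To enable tracing for compilation, call the mlflow.dspy.autolog() function with the log_traces_from_compile parameter set to True.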

python
import dspy
import mlflow

# Enable auto-tracing for compilation
mlflow.dspy.autolog(log_traces_from_compile=True)

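# Optimize the DSPy program as usual.
# `cot`, `metric`, and `trainset` are assumed to be defined beforehand
# (e.g. `cot` as in the evaluation example).
tp = dspy.MIPROv2(metric=metric, auto="medium", num_threads=24)
optimized = tp.compile(cot, trainset=trainset)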
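
The snippet above leaves `metric` and `trainset` undefined. As a rough sketch of the former, a DSPy metric is commonly written as a function that takes an example, a prediction, and an optional `trace` argument, and returns a score. The `exact_number_match` function below is a hypothetical, simplified stand-in, not DSPy's own `answer_exact_match`, and the `SimpleNamespace` objects merely imitate `dspy.Example` and prediction objects so the sketch runs without an LLM.

```python
from types import SimpleNamespace


def exact_number_match(example, pred, trace=None):
    """Return True when the predicted answer equals the gold answer.

    Both answers are compared as stripped strings. `trace` is accepted
    and ignored here; it is part of the usual metric signature.
    """
    return str(example.answer).strip() == str(pred.answer).strip()


# Stand-ins for a gold example and two predictions (illustrative only).
gold = SimpleNamespace(question="How many 'r's are in 'strawberry'?", answer="3")
match = exact_number_match(gold, SimpleNamespace(answer=" 3 "))
mismatch = exact_number_match(gold, SimpleNamespace(answer="2"))
print(match, mismatch)  # True False
```

A function like this could be passed as `metric=exact_number_match`, while `trainset` would be a list of `dspy.Example` objects built the same way as `eval_set` in the evaluation example.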

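Token Usage

MLflow >= 3.5.0 supports token usage tracking for DSPy. The token usage of each LLM call is logged in the mlflow.chat.tokenUsage attribute. The total token usage across the whole trace is available in the token_usage field of the trace info object.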

python
import dspy
import mlflow

mlflow.dspy.autolog()

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))

task = dspy.Predict("instruction -> response")
result = task(instruction="Translate 'hello' to French.")

last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)

# Print the token usage
total_usage = trace.info.token_usage
print("== Total token usage: ==")
print(f"  Input tokens: {total_usage['input_tokens']}")
print(f"  Output tokens: {total_usage['output_tokens']}")
print(f"  Total tokens: {total_usage['total_tokens']}")

# Print the token usage for each LLM call
print("\n== Detailed usage for each LLM call: ==")
for span in trace.data.spans:
    if usage := span.get_attribute("mlflow.chat.tokenUsage"):
        print(f"{span.name}:")
        print(f"  Input tokens: {usage['input_tokens']}")
        print(f"  Output tokens: {usage['output_tokens']}")
        print(f"  Total tokens: {usage['total_tokens']}")

bash
== Total token usage: ==
  Input tokens: 143
  Output tokens: 12
  Total tokens: 155

== Detailed usage for each LLM call: ==
LM.__call__:
  Input tokens: 143
  Output tokens: 12
  Total tokens: 155
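
When several traces are produced, for example during an evaluation run, it can be handy to total their usage. The following is a minimal sketch that sums dicts shaped like `trace.info.token_usage`. The helper name `sum_token_usage` is hypothetical, it works on plain dicts rather than MLflow objects, and the second sample dict holds made-up numbers used only for illustration.

```python
USAGE_KEYS = ("input_tokens", "output_tokens", "total_tokens")


def sum_token_usage(usages):
    """Add up usage dicts, treating missing keys or None entries as zero."""
    totals = {key: 0 for key in USAGE_KEYS}
    for usage in usages:
        for key in USAGE_KEYS:
            totals[key] += (usage or {}).get(key, 0)
    return totals


usages = [
    {"input_tokens": 143, "output_tokens": 12, "total_tokens": 155},  # from the output above
    {"input_tokens": 100, "output_tokens": 20, "total_tokens": 120},  # illustrative values
    None,  # stands for a trace with no recorded usage
]
totals = sum_token_usage(usages)
print(totals)
# {'input_tokens': 243, 'output_tokens': 32, 'total_tokens': 275}
```

In practice the input list might be collected from `trace.info.token_usage` for each trace of interest; how those traces are gathered is left open here.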

Disable Auto-tracing

Auto-tracing for DSPy can be disabled globally by calling mlflow.dspy.autolog(disable=True) or mlflow.autolog(disable=True).
