Tracing Haystack

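Haystack is an open-source AI orchestration framework developed by deepset that helps Python developers build production-ready, LLM-powered applications. Its modular architecture is built around components and pipelines, and it can be used to build anything from retrieval-augmented generation (RAG) workflows to autonomous agent systems and scalable search engines.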
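MLflow Tracing provides automatic tracing for Haystack pipelines and components. Once auto-tracing is enabled by calling mlflow.haystack.autolog(), every invocation of a Haystack pipeline or component records a trace automatically during interactive development.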
python
import mlflow
mlflow.haystack.autolog()
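MLflow Tracing automatically captures the following information:
- Pipelines and components
- Latency
- Metadata about the components that were added, such as tool names
- Token usage and cost
- Cache hits
- Any exceptions that are raised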
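To make the list above more concrete, here is a minimal sketch of how per-span latency and failures might be summarized once a trace is available. It assumes each span exposes name, start_time_ns, end_time_ns, and status.status_code (compared against the string "ERROR"), and that both timestamps are set. The helper name summarize_spans and the stand-in spans built with SimpleNamespace are hypothetical, chosen so the snippet runs without a tracking server or an API key.

```python
from types import SimpleNamespace


def summarize_spans(spans):
    """Return one dict per span with its name, latency in ms, and error flag."""
    summary = []
    for span in spans:
        # Timestamps are in nanoseconds; convert the difference to milliseconds.
        latency_ms = (span.end_time_ns - span.start_time_ns) / 1_000_000
        summary.append(
            {
                "name": span.name,
                "latency_ms": latency_ms,
                "failed": span.status.status_code == "ERROR",
            }
        )
    return summary


# Hypothetical stand-in spans; with MLflow these would come from trace.data.spans.
demo_spans = [
    SimpleNamespace(
        name="retriever",
        start_time_ns=0,
        end_time_ns=2_000_000,
        status=SimpleNamespace(status_code="OK"),
    ),
    SimpleNamespace(
        name="llm",
        start_time_ns=2_000_000,
        end_time_ns=502_000_000,
        status=SimpleNamespace(status_code="ERROR"),
    ),
]

span_summary = summarize_spans(demo_spans)
for row in span_summary:
    print(row)
```

With a real trace, the equivalent call would presumably be summarize_spans(trace.data.spans), where trace is the object returned by mlflow.get_trace.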
Basic Example
python
import mlflow
from haystack import Document, Pipeline
from haystack.components.builders.chat_prompt_builder import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.dataclasses import ChatMessage
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.utils import Secret
mlflow.haystack.autolog()
mlflow.set_experiment("Haystack Tracing")
# Write documents to InMemoryDocumentStore
document_store = InMemoryDocumentStore()
document_store.write_documents(
    [
        Document(content="My name is Jean and I live in Paris."),
        Document(content="My name is Mark and I live in Berlin."),
        Document(content="My name is Giorgio and I live in Rome."),
    ]
)
# Build a RAG pipeline
prompt_template = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user(
        "Given these documents, answer the question.\n"
        "Documents:\n{% for doc in documents %}{{ doc.content }}{% endfor %}\n"
        "Question: {{question}}\n"
        "Answer:"
    ),
]
# Define required variables explicitly (passed as a list, matching the parameter's type hint)
prompt_builder = ChatPromptBuilder(
    template=prompt_template, required_variables=["question", "documents"]
)
retriever = InMemoryBM25Retriever(document_store=document_store)
llm = OpenAIChatGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"))
rag_pipeline = Pipeline()
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm.messages")
# Ask a question
question = "Who lives in Paris?"
results = rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    }
)
print(results["llm"]["replies"])

Token Usage
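MLflow >= 3.4.0 supports token usage tracking for Haystack. The token usage of each LLM call is recorded in the mlflow.chat.tokenUsage span attribute. The total token usage for the whole trace is available in the token_usage field of the trace info object.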
python
question = "Who lives in Paris?"
results = rag_pipeline.run(
    {
        "retriever": {"query": question},
        "prompt_builder": {"question": question},
    }
)
print(results["llm"]["replies"])
last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)
# Print the token usage
total_usage = trace.info.token_usage
print("== Total token usage: ==")
print(f" Input tokens: {total_usage['input_tokens']}")
print(f" Output tokens: {total_usage['output_tokens']}")
print(f" Total tokens: {total_usage['total_tokens']}")
# Print the token usage for each LLM call
print("\n== Detailed usage for each LLM call: ==")
for span in trace.data.spans:
    if usage := span.get_attribute("mlflow.chat.tokenUsage"):
        print(f"{span.name}:")
        print(f" Input tokens: {usage['input_tokens']}")
        print(f" Output tokens: {usage['output_tokens']}")
        print(f" Total tokens: {usage['total_tokens']}")
bash
== Total token usage: ==
 Input tokens: 64
 Output tokens: 5
 Total tokens: 69

== Detailed usage for each LLM call: ==
OpenAIChatGenerator:
 Input tokens: 64
 Output tokens: 5
 Total tokens: 69
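When a pipeline makes several LLM calls, it can help to combine the per-call usage dictionaries and turn them into a rough cost figure. The sketch below is one way this might look. The helper names sum_token_usage and estimate_cost are hypothetical, the second usage entry is invented purely to show aggregation, and the prices are placeholders rather than real provider rates.

```python
def sum_token_usage(usages):
    """Add up per-call usage dicts into a single total."""
    total = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for usage in usages:
        for key in total:
            total[key] += usage.get(key, 0)
    return total


def estimate_cost(usage, input_price_per_1k, output_price_per_1k):
    """Estimate cost from token counts and caller-supplied prices per 1,000 tokens."""
    return (
        usage["input_tokens"] / 1000 * input_price_per_1k
        + usage["output_tokens"] / 1000 * output_price_per_1k
    )


# Usage dicts shaped like the mlflow.chat.tokenUsage attribute read above.
# The second entry is made up purely to show the aggregation.
per_call_usage = [
    {"input_tokens": 64, "output_tokens": 5, "total_tokens": 69},
    {"input_tokens": 30, "output_tokens": 10, "total_tokens": 40},
]

combined = sum_token_usage(per_call_usage)
print(combined)

# Placeholder prices, not real provider rates.
cost = estimate_cost(combined, input_price_per_1k=0.5, output_price_per_1k=1.5)
print(f"Estimated cost: {cost:.4f}")
```

In practice the list of usage dicts could be collected from the spans of a trace with span.get_attribute("mlflow.chat.tokenUsage"), skipping spans where the attribute is missing.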
Disable Auto-Tracing
Auto-tracing for Haystack can be disabled globally by calling mlflow.haystack.autolog(disable=True) or mlflow.autolog(disable=True).
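If tracing should be off only for part of a session, the disable call can be wrapped in a small context manager. This is a sketch under stated assumptions, not an MLflow feature. The helper tracing_paused is hypothetical, it takes the autolog function as an argument, and it assumes tracing was enabled beforehand, because it re-enables tracing unconditionally on exit. A stand-in function is used here so the snippet runs without MLflow installed.

```python
from contextlib import contextmanager


@contextmanager
def tracing_paused(autolog):
    """Disable auto-tracing inside the block, then turn it back on.

    `autolog` is any callable that accepts `disable=True`, for example
    mlflow.haystack.autolog. Assumption: tracing was enabled beforehand,
    because the helper re-enables it unconditionally on exit.
    """
    autolog(disable=True)
    try:
        yield
    finally:
        autolog()


# Hypothetical stand-in that records calls, so the sketch runs without MLflow.
calls = []


def fake_autolog(disable=False):
    calls.append("disabled" if disable else "enabled")


with tracing_paused(fake_autolog):
    calls.append("untraced work")

print(calls)
```

With MLflow available, the same pattern would read: with tracing_paused(mlflow.haystack.autolog): followed by the pipeline calls that should not be traced.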