Tracing LlamaIndex🦙

LlamaIndex Tracing via autolog

LlamaIndex is an open-source framework for building agentic generative AI applications that let large language models work with data in any format.

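MLflow Tracing provides automatic tracing for LlamaIndex. You can enable it by calling the mlflow.llama_index.autolog() function. When LlamaIndex engines and workflows are invoked, nested traces are logged automatically to the active MLflow experiment.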

import mlflow

mlflow.llama_index.autolog()

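Tip

The MLflow LlamaIndex integration is not only about tracing. MLflow offers a full tracking experience for LlamaIndex, including model tracking, index management, and evaluation. See the **MLflow LlamaIndex Flavor** to learn more!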

Example Usage

First, download some test data to create a toy index.

!mkdir -p data
!curl -L https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt -o ./data/paul_graham_essay.txt

Load the data into a simple in-memory vector index.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

Now you can enable LlamaIndex auto-tracing and start querying the index.

import mlflow

# Enabling tracing for LlamaIndex
mlflow.llama_index.autolog()

# Optional: Set a tracking URI and an experiment
# Assumes a local tracking server; replace with your own server's URI
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("LlamaIndex")

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What was the first program the author wrote?")

Token usage

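MLflow >= 3.2.0 supports token usage tracking for LlamaIndex. The token usage of each LLM call is logged in the mlflow.chat.tokenUsage attribute. The total token usage across the whole trace is available in the token_usage field of the trace info object.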

import mlflow
from llama_index.core import Settings
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI

mlflow.llama_index.autolog()

# Create the LLM, register it as the default, and send a single chat message.
llm = OpenAI(model="gpt-3.5-turbo")
Settings.llm = llm
response = llm.chat(
    [ChatMessage(role="user", content="What is the capital of France?")]
)

# Get the trace object just created
last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)

# Print the token usage
total_usage = trace.info.token_usage
print("== Total token usage: ==")
print(f"  Input tokens: {total_usage['input_tokens']}")
print(f"  Output tokens: {total_usage['output_tokens']}")
print(f"  Total tokens: {total_usage['total_tokens']}")

# Print the token usage for each LLM call
print("\n== Detailed usage for each LLM call: ==")
for span in trace.data.spans:
    if usage := span.get_attribute("mlflow.chat.tokenUsage"):
        print(f"{span.name}:")
        print(f"  Input tokens: {usage['input_tokens']}")
        print(f"  Output tokens: {usage['output_tokens']}")
        print(f"  Total tokens: {usage['total_tokens']}")

== Total token usage: ==
  Input tokens: 14
  Output tokens: 7
  Total tokens: 21

== Detailed usage for each LLM call: ==
OpenAI.chat:
  Input tokens: 14
  Output tokens: 7
  Total tokens: 21
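
The relationship between the per-call attribute and the trace-level total can be illustrated with a small, library-free sketch. It assumes the trace total is simply the sum of the per-span usage dictionaries, which the output above is consistent with but which is not confirmed here. `FakeSpan` and `sum_token_usage` are hypothetical names introduced for illustration only; they are not part of MLflow or LlamaIndex.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

TOKEN_USAGE_KEY = "mlflow.chat.tokenUsage"


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a trace span; it only mimics get_attribute()."""

    name: str
    attributes: dict = field(default_factory=dict)

    def get_attribute(self, key: str) -> Optional[Any]:
        return self.attributes.get(key)


def sum_token_usage(spans) -> dict:
    """Sum the token usage of every span that carries a usage attribute."""
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for span in spans:
        # Spans that are not LLM calls have no usage attribute and are skipped.
        if usage := span.get_attribute(TOKEN_USAGE_KEY):
            for key in totals:
                totals[key] += usage.get(key, 0)
    return totals


spans = [
    FakeSpan(
        "OpenAI.chat",
        {TOKEN_USAGE_KEY: {"input_tokens": 14, "output_tokens": 7, "total_tokens": 21}},
    ),
    FakeSpan("Retriever.retrieve"),  # no LLM call, so no usage attribute
]
totals = sum_token_usage(spans)
print(totals)
```

With a real trace, the same function could be given `trace.data.spans`, since those spans expose `get_attribute` as used in the example above.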

LlamaIndex workflow

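Workflow is LlamaIndex's next-generation GenAI orchestration framework. It is designed as a flexible and interpretable framework for building arbitrary LLM applications such as agents, RAG flows, and data extraction pipelines. MLflow supports tracing, evaluating, and logging Workflow objects, which makes them easier to observe and maintain.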

Automatic tracing for LlamaIndex workflows works out of the box by calling the same mlflow.llama_index.autolog().

To learn more about MLflow's integration with LlamaIndex Workflow, continue to the following tutorial.

Building Advanced RAG with MLflow and LlamaIndex Workflow

Disable auto-tracing

Auto-tracing for LlamaIndex can be disabled globally by calling mlflow.llama_index.autolog(disable=True) or mlflow.autolog(disable=True).
