
Tracing Ollama

Ollama Tracing via autolog

Ollama is an open-source platform that lets users run large language models (LLMs) such as Llama 3.2, Gemma 2, Mistral, and Code Llama locally on their own devices.

Since the local LLM endpoint served by Ollama is compatible with the OpenAI API, you can query it through the OpenAI SDK and enable tracing for Ollama with mlflow.openai.autolog(). Any LLM interaction via Ollama will be recorded to the active MLflow Experiment.

import mlflow

mlflow.openai.autolog()

Example Usage

  1. Run the Ollama server with the desired LLM model.
ollama run llama3.2:1b
  2. Enable auto-tracing for the OpenAI SDK.
import mlflow

# Enable auto-tracing for OpenAI
mlflow.openai.autolog()

# Optional: Set a tracking URI and an experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Ollama")
  3. Query the LLM and see the traces in the MLflow UI.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # The local Ollama REST endpoint
    api_key="dummy",  # Required to instantiate the OpenAI client; it can be any string
)

response = client.chat.completions.create(
    model="llama3.2:1b",
    messages=[
        {"role": "system", "content": "You are a science teacher."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)
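
The response is a standard OpenAI SDK chat completion object, so the model's answer can be printed the usual way:

print(response.choices[0].message.content)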

Token Usage

MLflow >= 3.2.0 supports token usage tracking for local LLM endpoints served via the OpenAI SDK. The token usage for each LLM call is logged in the mlflow.chat.tokenUsage span attribute, and the total token usage of the whole trace is available in the token_usage field of the trace info object.

import mlflow
from openai import OpenAI

mlflow.openai.autolog()

client = OpenAI(
    base_url="http://localhost:11434/v1",  # The local Ollama REST endpoint
    api_key="dummy",  # Required to instantiate the OpenAI client; it can be any string
)

response = client.chat.completions.create(
    model="llama3.2:1b",
    messages=[
        {"role": "system", "content": "You are a science teacher."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)

# Get the trace object just created
last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)

# Print the token usage
total_usage = trace.info.token_usage
print("== Total token usage: ==")
print(f" Input tokens: {total_usage['input_tokens']}")
print(f" Output tokens: {total_usage['output_tokens']}")
print(f" Total tokens: {total_usage['total_tokens']}")

# Print the token usage for each LLM call
print("\n== Detailed usage for each LLM call: ==")
for span in trace.data.spans:
    if usage := span.get_attribute("mlflow.chat.tokenUsage"):
        print(f"{span.name}:")
        print(f"  Input tokens: {usage['input_tokens']}")
        print(f"  Output tokens: {usage['output_tokens']}")
        print(f"  Total tokens: {usage['total_tokens']}")

== Total token usage: ==
Input tokens: 23
Output tokens: 194
Total tokens: 217

== Detailed usage for each LLM call: ==
Completions:
Input tokens: 23
Output tokens: 194
Total tokens: 217
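
Because trace.info.token_usage aggregates across every span in a trace, grouping several LLM calls under one parent span sums their usage. Below is a minimal sketch, assuming the same local Ollama endpoint and using the @mlflow.trace decorator to group two calls into a single trace (the ask_twice function name is illustrative):

import mlflow
from openai import OpenAI

mlflow.openai.autolog()

client = OpenAI(base_url="http://localhost:11434/v1", api_key="dummy")


@mlflow.trace  # Group both LLM calls below into a single trace
def ask_twice(question: str) -> str:
    first = client.chat.completions.create(
        model="llama3.2:1b",
        messages=[{"role": "user", "content": question}],
    )
    summary = client.chat.completions.create(
        model="llama3.2:1b",
        messages=[
            {
                "role": "user",
                "content": "Summarize in one sentence: "
                + first.choices[0].message.content,
            }
        ],
    )
    return summary.choices[0].message.content


ask_twice("Why is the sky blue?")

# token_usage now sums the input/output tokens of both Completions spans
trace = mlflow.get_trace(trace_id=mlflow.get_last_active_trace_id())
print(trace.info.token_usage)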

Disable auto-tracing

Auto-tracing for Ollama can be disabled globally by calling mlflow.openai.autolog(disable=True) or mlflow.autolog(disable=True).
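
The first form disables only the OpenAI integration (and therefore Ollama tracing), while the second disables all MLflow autologging integrations:

import mlflow

# Disable auto-tracing for the OpenAI (and thus Ollama) integration only
mlflow.openai.autolog(disable=True)

# Or disable all MLflow autologging integrations globally
mlflow.autolog(disable=True)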