MLflow LangChain Autologging

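The MLflow LangChain flavor supports autologging, a powerful feature that lets you log key details about a LangChain model and its execution without explicit logging statements. MLflow LangChain autologging covers several aspects of the model, including traces, models, signatures, and more.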

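Note

MLflow LangChain autologging has been verified to be compatible with LangChain versions 0.1.0 through 0.2.3. Outside this range, the feature may not work as expected. To install a compatible version of LangChain, run the following command: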

pip install mlflow[langchain] --upgrade
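
The compatibility range stated above can be checked before autologging is enabled. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `is_supported_langchain`, `installed_langchain_version`) are hypothetical and not part of MLflow, and pre-release suffixes such as `rc1` are ignored for simplicity.

```python
from importlib import metadata

# Range quoted in the note above; everything else here is illustrative.
MIN_SUPPORTED = (0, 1, 0)
MAX_SUPPORTED = (0, 2, 3)


def parse_version(version: str) -> tuple:
    """Turn '0.2.3' or '0.2.3rc1' into a tuple of its first three numeric parts."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_supported_langchain(version: str) -> bool:
    """Return True if the version falls inside the verified range."""
    return MIN_SUPPORTED <= parse_version(version) <= MAX_SUPPORTED


def installed_langchain_version():
    """Return the installed langchain version string, or None if it is absent."""
    try:
        return metadata.version("langchain")
    except metadata.PackageNotFoundError:
        return None


version = installed_langchain_version()
if version is None:
    print("langchain is not installed")
elif is_supported_langchain(version):
    print(f"langchain {version} is inside the verified range")
else:
    print(f"langchain {version} is outside the verified range")
```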

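Quickstart

To enable autologging for LangChain models, call mlflow.langchain.autolog() at the beginning of your script or notebook. By default this logs traces automatically; other artifacts such as models, input examples, and model signatures are logged only if you explicitly enable them. For more information about the configuration, see the Configure Autologging section.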

import mlflow

mlflow.langchain.autolog()

# Enable other optional logging
# mlflow.langchain.autolog(log_models=True, log_input_examples=True)

# Your LangChain model code here
...

Once you have invoked the chain, you can view the logged traces and artifacts in the MLflow UI.

LangChain Tracing via autolog

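Configure Autologging

MLflow LangChain autologging can log various information about the model and its inference. **By default, only trace logging is enabled**, but you can enable autologging of other information by setting the corresponding parameters when calling mlflow.langchain.autolog(). For other configurations, see the API documentation.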

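Target	Default	Parameter	Description
Traces	`True`	`log_traces`	Whether to generate and log traces for the model. See MLflow Tracing for more details about the tracing feature.
Model artifacts	`False`	`log_models`	If set to `True`, the LangChain model is logged when it is invoked. Supported models include `Chain`, `AgentExecutor`, `BaseRetriever`, `SimpleChatModel`, `ChatPromptTemplate`, and a subset of `Runnable` types. See the MLflow repository for the full list of supported models.
Model signatures	`False`	`log_model_signatures`	If set to `True`, ModelSignatures describing the inputs and outputs of the model are collected during inference and logged along with the LangChain model artifacts. This option is only available when `log_models` is enabled.
Input examples	`False`	`log_input_examples`	If set to `True`, input examples from the inference data are collected during inference and logged along with the LangChain model artifacts. This option is only available when `log_models` is enabled.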

For example, to disable trace logging and enable model logging instead, run the following code:

import mlflow

mlflow.langchain.autolog(
    log_traces=False,
    log_models=True,
)
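
Because `log_model_signatures` and `log_input_examples` are only available when `log_models` is enabled, it can help to validate the combination before calling autolog. The sketch below is a hypothetical helper, not an MLflow API; raising an error for an orphaned option is this sketch's own design choice and not documented MLflow behavior.

```python
def build_autolog_kwargs(
    log_traces=True,
    log_models=False,
    log_model_signatures=False,
    log_input_examples=False,
):
    """Return keyword arguments intended for mlflow.langchain.autolog().

    Hypothetical helper: it rejects options that depend on log_models
    when log_models itself is disabled.
    """
    dependent = {
        "log_model_signatures": log_model_signatures,
        "log_input_examples": log_input_examples,
    }
    orphaned = [name for name, enabled in dependent.items() if enabled and not log_models]
    if orphaned:
        raise ValueError(f"log_models=True is required for: {', '.join(orphaned)}")
    return {
        "log_traces": log_traces,
        "log_models": log_models,
        **dependent,
    }


kwargs = build_autolog_kwargs(log_traces=False, log_models=True)
print(kwargs)
# With MLflow installed, the result could be passed on as:
# mlflow.langchain.autolog(**kwargs)
```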

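Note

MLflow does not support automatic model logging for chains that contain retrievers. Saving a retriever requires additional loader_fn and persist_dir information in order to load the model. If you want to log a model that contains a retriever, log the model manually as shown in the retriever_chain example.

Example Code of LangChain Autologging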

import os
from operator import itemgetter

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableLambda

import mlflow

# Uncomment the following to use the full abilities of langchain autologging
# %pip install "langchain_community>=0.0.16"
# These two libraries enable autologging to log text analysis related artifacts
# %pip install textstat spacy

assert (
    "OPENAI_API_KEY" in os.environ
), "Please set the OPENAI_API_KEY environment variable."

# Enable mlflow langchain autologging
# Note: We only support auto-logging models that do not contain retrievers
mlflow.langchain.autolog(
    log_input_examples=True,
    log_model_signatures=True,
    log_models=True,
    registered_model_name="lc_model",
)

prompt_with_history_str = """
Here is a history between you and a human: {chat_history}
Now, please answer this question: {question}
"""
prompt_with_history = PromptTemplate(
    input_variables=["chat_history", "question"], template=prompt_with_history_str
)


def extract_question(input):
    return input[-1]["content"]


def extract_history(input):
    return input[:-1]


llm = OpenAI(temperature=0.9)

# Build a chain with LCEL
chain_with_history = (
    {
        "question": itemgetter("messages") | RunnableLambda(extract_question),
        "chat_history": itemgetter("messages") | RunnableLambda(extract_history),
    }
    | prompt_with_history
    | llm
    | StrOutputParser()
)

inputs = {"messages": [{"role": "user", "content": "Who owns MLflow?"}]}

print(chain_with_history.invoke(inputs))
# sample output:
# "1. Databricks\n2. Microsoft\n3. Google\n4. Amazon\n\nEnter your answer: 1\n\n
# Correct! MLflow is an open source project developed by Databricks. ...

# We automatically log the model and trace related artifacts
# A model with name `lc_model` is registered, we can load it back as a PyFunc model
model_name = "lc_model"
model_version = 1
loaded_model = mlflow.pyfunc.load_model(f"models:/{model_name}/{model_version}")
print(loaded_model.predict(inputs))

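Tracing LangGraph

MLflow supports automatic tracing for LangGraph, an open-source library from LangChain for building stateful, multi-actor applications with LLMs, commonly used to create agent and multi-agent workflows. To enable auto-tracing for LangGraph, use the same mlflow.langchain.autolog() function.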

from typing import Literal

import mlflow

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Enabling tracing for LangGraph (LangChain)
mlflow.langchain.autolog()

# Optional: Set a tracking URI and an experiment
mlflow.set_tracking_uri("https://:5000")
mlflow.set_experiment("LangGraph")


@tool
def get_weather(city: Literal["nyc", "sf"]):
    """Use this to get weather information."""
    if city == "nyc":
        return "It might be cloudy in nyc"
    elif city == "sf":
        return "It's always sunny in sf"


llm = ChatOpenAI(model="gpt-4o-mini")
tools = [get_weather]
graph = create_react_agent(llm, tools)

# Invoke the graph
result = graph.invoke(
    {"messages": [{"role": "user", "content": "what is the weather in sf?"}]}
)

Note

MLflow does not support other autologging features for LangGraph, such as automatic model logging. Only traces are logged for LangGraph.

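How It Works

MLflow LangChain autologging uses two mechanisms to log traces and other artifacts. Tracing is implemented through LangChain's Callbacks framework. Other artifacts are recorded by patching the invocation functions of supported models. In typical scenarios you do not need to care about the internal implementation details, but this section gives a brief overview of how it works under the hood.

MLflow Tracing Callbacks

MlflowLangchainTracer is a callback handler that is injected into the LangChain model inference process to log traces automatically. It starts a new span when a chain operation begins (through callbacks such as on_chain_start and on_llm_start) and ends it when the operation completes. Various metadata, such as span type, action name, inputs, outputs, and latency, is automatically recorded to the span.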
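
To make the start/end pairing concrete, here is a toy sketch of a callback-style tracer written with the standard library only. It is not MLflow's implementation: `ToySpan` and `ToyTracer` are hypothetical names, and the callback signatures are simplified compared with LangChain's real ones. It only illustrates the idea of opening a span keyed by `run_id` on a start event and closing it, with latency, on the matching end event.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToySpan:
    name: str
    span_type: str
    inputs: Any
    parent_run_id: Optional[uuid.UUID]
    start_time: float = field(default_factory=time.perf_counter)
    outputs: Any = None
    latency: Optional[float] = None


class ToyTracer:
    """Simplified stand-in for a tracing callback handler."""

    def __init__(self):
        self._open = {}      # run_id -> span that has started but not ended
        self.finished = []   # spans in the order they were closed

    def _start(self, run_id, name, span_type, inputs, parent_run_id=None):
        self._open[run_id] = ToySpan(name, span_type, inputs, parent_run_id)

    def _end(self, run_id, outputs):
        span = self._open.pop(run_id)
        span.outputs = outputs
        span.latency = time.perf_counter() - span.start_time
        self.finished.append(span)

    def on_chain_start(self, name, inputs, *, run_id, parent_run_id=None):
        self._start(run_id, name, "CHAIN", inputs, parent_run_id)

    def on_chain_end(self, outputs, *, run_id):
        self._end(run_id, outputs)

    def on_llm_start(self, name, prompts, *, run_id, parent_run_id=None):
        self._start(run_id, name, "LLM", prompts, parent_run_id)

    def on_llm_end(self, response, *, run_id):
        self._end(run_id, response)


# Simulate a chain that calls one LLM.
tracer = ToyTracer()
chain_id, llm_id = uuid.uuid4(), uuid.uuid4()
tracer.on_chain_start("my_chain", {"question": "hi"}, run_id=chain_id)
tracer.on_llm_start("my_llm", ["hi"], run_id=llm_id, parent_run_id=chain_id)
tracer.on_llm_end("hello", run_id=llm_id)
tracer.on_chain_end({"answer": "hello"}, run_id=chain_id)

for span in tracer.finished:
    print(span.span_type, span.name, "child" if span.parent_run_id else "root")
```

The inner LLM span closes before the outer chain span, and its `parent_run_id` links it back to the chain, which is how a nested trace can be reconstructed.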

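Customize Callback

Sometimes you may want to customize what information is logged in the traces. You can do so by creating a custom callback handler that inherits from MlflowLangchainTracer. The following example demonstrates how to record an additional attribute to the span when a chat model starts running.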

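from typing import Any, Dict, List, Optional
from uuid import UUID

from langchain_core.messages import BaseMessage

from mlflow.entities import SpanType
from mlflow.langchain.langchain_tracer import MlflowLangchainTracer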


class CustomLangchainTracer(MlflowLangchainTracer):
    # Override the handler functions to customize the behavior. The method signature is defined by LangChain Callbacks.
    def on_chat_model_start(
        self,
        serialized: Dict[str, Any],
        messages: List[List[BaseMessage]],
        *,
        run_id: UUID,
        tags: Optional[List[str]] = None,
        parent_run_id: Optional[UUID] = None,
        metadata: Optional[Dict[str, Any]] = None,
        name: Optional[str] = None,
        **kwargs: Any,
    ):
        """Run when a chat model starts running."""
        attributes = {
            **kwargs,
            **(metadata or {}),
            # Add additional attribute to the span
            "version": "1.0.0",
        }

        # Call the _start_span method at the end of the handler function to start a new span.
        self._start_span(
            span_name=name or self._assign_span_name(serialized, "chat model"),
            parent_run_id=parent_run_id,
            span_type=SpanType.CHAT_MODEL,
            run_id=run_id,
            inputs=messages,
            attributes=attributes,
        )

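Patch Functions for Logging Artifacts

Other artifacts, such as models, are logged by patching the invocation functions of supported models to insert logging calls. MLflow patches the following functions: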

invoke
batch
stream
get_relevant_documents (for retrievers)
__call__ (for Chains and AgentExecutors)
ainvoke
abatch
astream
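
The general patching technique can be sketched as follows. This is an illustration only, not MLflow's actual patching code: `ToyModel`, `patch_with_logging`, and `logged_calls` are hypothetical, and a plain list stands in for the artifact logging that MLflow performs.

```python
import functools


class ToyModel:
    """Stand-in for a LangChain runnable; not a real LangChain class."""

    def invoke(self, text):
        return text.upper()

    def batch(self, texts):
        return [t.upper() for t in texts]


# Records (method name, positional args, result) for every patched call.
logged_calls = []


def patch_with_logging(cls, method_name):
    """Replace cls.method_name with a wrapper that logs after the original call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        logged_calls.append((method_name, args, result))
        return result

    setattr(cls, method_name, wrapper)


for name in ("invoke", "batch"):
    patch_with_logging(ToyModel, name)

model = ToyModel()
print(model.invoke("hi"))
print(logged_calls)
```

The caller's code does not change: `model.invoke` is called as before, and the logging happens as a side effect of the wrapper.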

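Warning

MLflow supports autologging for async functions (e.g., ainvoke, abatch, astream); however, the logging operation itself is not asynchronous and may block the main thread. The invocation function is still non-blocking and returns a coroutine object, but the logging overhead may slow down model inference. Keep this side effect in mind when using async functions with autologging.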
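
The effect described in this warning can be reproduced with a small asyncio experiment. The sketch below does not use MLflow; `blocking_log` is a hypothetical stand-in for a synchronous logging call, and the sleep durations are arbitrary. It shows that a synchronous call inside a coroutine serializes otherwise concurrent work.

```python
import asyncio
import time


def blocking_log(record):
    # Pretend this is a slow, synchronous logging call.
    time.sleep(0.05)


async def infer_without_log(i):
    await asyncio.sleep(0.05)  # simulated non-blocking model call
    return i


async def infer_with_blocking_log(i):
    await asyncio.sleep(0.05)  # simulated non-blocking model call
    blocking_log(i)            # blocks the event loop while it runs
    return i


async def timed(worker, n=5):
    """Run n workers concurrently and return their results and elapsed time."""
    start = time.perf_counter()
    results = await asyncio.gather(*(worker(i) for i in range(n)))
    return results, time.perf_counter() - start


plain_results, plain_elapsed = asyncio.run(timed(infer_without_log))
logged_results, logged_elapsed = asyncio.run(timed(infer_with_blocking_log))

print(f"without logging: {plain_elapsed:.2f}s")
print(f"with blocking logging: {logged_elapsed:.2f}s")
```

Both runs return the same results, but the second takes longer because each blocking call holds the event loop, so the five logging steps run one after another instead of overlapping.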

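FAQ

If you encounter any issues with the MLflow LangChain flavor, refer to the FAQ. If you still have questions, feel free to open an issue in the MLflow GitHub repository.

How to suppress the warning messages during autologging?

MLflow LangChain autologging calls various logging functions and LangChain utilities under the hood. Some of them may generate warning messages that are not critical to the autologging process. To suppress these warnings, pass silent=True to the mlflow.langchain.autolog() function.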

import mlflow

mlflow.langchain.autolog(silent=True)

# No warning messages will be emitted from autologging

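I can't load the model logged by MLflow LangChain autologging

There are a few types of models that MLflow LangChain autologging does not support saving or loading natively.

Model contains LangChain retrievers

MLflow autologging does not support LangChain retrievers. If your model contains a retriever, you need to log the model manually with the mlflow.langchain.log_model API. Because loading such models requires specifying the loader_fn and persist_dir parameters, see the example in retriever_chain.

Can't pickle certain objects

For certain models that LangChain does not support saving or loading natively, MLflow pickles the object when saving it. Because of this, your cloudpickle version must be consistent between the saving and loading environments so that object references resolve correctly. To further ensure correct object representation, make sure your environment has pydantic version 2 or later installed.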
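
One way to catch such mismatches early is to record package versions when saving and compare them when loading. The following is a minimal sketch with hypothetical helper names (`installed_version`, `environment_snapshot`, `check_against_saved`); it is not an MLflow feature and only checks the two packages mentioned above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_snapshot(packages=("cloudpickle", "pydantic")):
    """Collect the versions to store next to a saved model."""
    return {name: installed_version(name) for name in packages}


def check_against_saved(saved, current):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if saved.get("cloudpickle") != current.get("cloudpickle"):
        problems.append(
            f"cloudpickle differs: saved {saved.get('cloudpickle')}, "
            f"current {current.get('cloudpickle')}"
        )
    pydantic_version = current.get("pydantic")
    if pydantic_version is None or int(pydantic_version.split(".")[0]) < 2:
        problems.append(f"pydantic 2 or later is expected, found {pydantic_version}")
    return problems


# In the saving environment, store this dictionary alongside the model.
print(environment_snapshot())
```

In the loading environment, calling `check_against_saved(saved_snapshot, environment_snapshot())` would report any differences before the model is unpickled.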

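How to customize span names in the traces?

By default, MLflow creates span names based on the class name in LangChain, such as ChatOpenAI or RunnableLambda. To customize the span name, you can do either of the following:

Pass the name parameter to the constructor of the LangChain class. This is useful when you want to set a specific name for a single component.
Use the with_config method to set a name for the runnable. You can pass the "run_name" key in the config dictionary to name a sub-chain that contains multiple components.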

import mlflow
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# Enable auto-tracing for LangChain
mlflow.langchain.autolog()

# Method 1: Pass `name` parameter to the constructor
model = ChatOpenAI(name="custom-llm", model="gpt-4o-mini")
# Method 2: Use `with_config` method to set the name for the runnables
runnable = (model | StrOutputParser()).with_config({"run_name": "custom-chain"})

runnable.invoke("Hi")

The code above creates a trace like the following:

Customize Span Names in LangChain Traces

How to add extra metadata to a span?

You can record extra metadata to a span by passing the metadata parameter of LangChain's RunnableConfig dictionary, either to the constructor or at runtime.

import mlflow
from langchain_openai import ChatOpenAI

# Enable auto-tracing for LangChain
mlflow.langchain.autolog()

# Pass metadata to the constructor using `with_config` method
model = ChatOpenAI(model="gpt-4o-mini").with_config({"metadata": {"key1": "value1"}})

# Pass metadata at runtime using the `config` parameter
model.invoke("Hi", config={"metadata": {"key2": "value2"}})

The metadata is accessible in the Attributes tab of the MLflow UI.
