
GenAI Agent Quickstart

Build and evaluate a LangChain-based chatbot with MLflow's comprehensive tracing and evaluation features. This quickstart demonstrates prompt engineering, trace generation, and performance assessment using MLflow 3's GenAI capabilities.

Prerequisites

Install Required Packages

MLflow 3 Required

This quickstart requires MLflow 3.0 or above for full GenAI functionality.

pip install --upgrade mlflow
pip install langchain-openai
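
To confirm you are on MLflow 3 before continuing, a quick check from Python (a sanity check, not part of the official steps) is:

import mlflow

print(mlflow.__version__)  # should print 3.0.0 or newer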

Set Your OpenAI API Key

Configure your OpenAI API key to authenticate with OpenAI services:

export OPENAI_API_KEY=your_api_key_here
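
If you are working in a notebook rather than a shell, setting the key from Python is an equivalent alternative (a minimal sketch; replace the placeholder with your real key):

import os

# Equivalent to the shell export above
os.environ["OPENAI_API_KEY"] = "your_api_key_here"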

Overview

In this quickstart, you will learn how to:

  • Register and version a prompt template
  • Create a LangChain-based conversational agent
  • Enable automatic trace logging for debugging
  • Evaluate model performance with custom metrics

Let's build a simple IT support chatbot and use MLflow to track its development lifecycle.

Step 1: Register a Prompt Template

Start by creating a versioned prompt template. This lets you track how your prompt evolves and ensures reproducibility across experiments.

import mlflow

system_prompt = mlflow.genai.register_prompt(
    name="chatbot_prompt",
    template="You are a chatbot that can answer questions about IT. Answer this question: {{question}}",
    commit_message="Initial version of chatbot",
)

View Your Prompt in the MLflow UI

Navigate to the Prompts tab to view your registered prompt:

[Image: The MLflow UI showing a prompt version]
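
You can also retrieve a registered prompt programmatically rather than through the UI. A minimal sketch, assuming the prompt registered above became version 1 and using the prompts:/ URI scheme:

import mlflow

# Load version 1 of the prompt registered above
prompt_version = mlflow.genai.load_prompt("prompts:/chatbot_prompt/1")
print(prompt_version.template)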

Step 2: Build a LangChain Conversation Chain

Create a simple chain that combines your prompt template with OpenAI's chat model:

from langchain.schema.output_parser import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Convert MLflow prompt to LangChain format
prompt = ChatPromptTemplate.from_template(system_prompt.to_single_brace_format())

# Build the chain: prompt → LLM → output parser
chain = prompt | ChatOpenAI(temperature=0.7) | StrOutputParser()

# Test the chain
question = "What is MLflow?"
print(chain.invoke({"question": question}))
# MLflow is an open-source platform for managing the end-to-end machine learning lifecycle...

Step 3: Enable Trace Observability

Set up automatic trace logging to monitor your model's behavior during development. This creates a linked history of all model interactions.

Configure the Active Model and Autologging

# Set the active model for linking traces
mlflow.set_active_model(name="langchain_model")

# Enable autologging - all traces will be automatically linked to the active model
mlflow.langchain.autolog()

Generate Test Traces

Run several queries to generate traces for analysis:

questions = [
    {"question": "What is MLflow Tracking and how does it work?"},
    {"question": "What is Unity Catalog?"},
    {"question": "What are user-defined functions (UDFs)?"},
]
outputs = []

for question in questions:
    outputs.append(chain.invoke(question))

# Verify traces are linked to the active model
active_model_id = mlflow.get_active_model_id()
mlflow.search_traces(model_id=active_model_id)

Explore Traces in the UI

  1. View the logged model: check the Models tab in your experiment

[Image: The MLflow UI showing the logged models in an experiment]

  2. Access model details: click your model to view its unique model_id

[Image: The MLflow UI showing the logged model details page]

  3. Analyze generated traces: navigate to the Traces tab to inspect individual interactions

[Image: The MLflow UI showing the logged model autolog traces lineage]

Step 4: Evaluate Model Performance

Use MLflow's evaluation framework to assess your chatbot's accuracy and relevance against expected responses.

Run Evaluation with Custom Metrics

Evaluate your model using GenAI-specific metrics:

import pandas as pd

eval_df = pd.DataFrame(
    {
        "inputs": questions,
        "expected_response": [
"""MLflow Tracking is a key component of the MLflow platform designed to record and manage machine learning experiments. It enables data scientists and engineers to log parameters, code versions, metrics, and artifacts in a systematic way, facilitating experiment tracking and reproducibility.

How It Works:

At the heart of MLflow Tracking is the concept of a run, which is an execution of a machine learning code. Each run can log the following:

Parameters: Input variables or hyperparameters used in the model (e.g., learning rate, number of trees). Metrics: Quantitative measures to evaluate the model's performance (e.g., accuracy, loss). Artifacts: Output files like models, datasets, or images generated during the run. Source Code: The version of the code or Git commit hash used. These logs are stored in a tracking server, which can be set up locally or on a remote server. The tracking server uses a backend storage (like a database or file system) to keep a record of all runs and their associated data.

Users interact with MLflow Tracking through its APIs available in multiple languages (Python, R, Java, etc.). By invoking these APIs in the code, you can start and end runs, and log data as the experiment progresses. Additionally, MLflow offers autologging capabilities for popular machine learning libraries, automatically capturing relevant parameters and metrics without manual code changes.

The logged data can be visualized using the MLflow UI, a web-based interface that displays all experiments and runs. This UI allows you to compare runs side-by-side, filter results, and analyze performance metrics over time. It aids in identifying the best models and understanding the impact of different parameters.

By providing a structured way to record experiments, MLflow Tracking enhances collaboration among team members, ensures transparency, and makes it easier to reproduce results. It integrates seamlessly with other MLflow components like Projects and Model Registry, offering a comprehensive solution for managing the machine learning lifecycle.""",
"""Unity Catalog is a feature in Databricks that allows you to create a centralized inventory of your data assets, such as tables, views, and functions, and share them across different teams and projects. It enables easy discovery, collaboration, and reuse of data assets within your organization.

With Unity Catalog, you can:

1. Create a single source of truth for your data assets: Unity Catalog acts as a central repository of all your data assets, making it easier to find and access the data you need.
2. Improve collaboration: By providing a shared inventory of data assets, Unity Catalog enables data scientists, engineers, and other stakeholders to collaborate more effectively.
3. Foster reuse of data assets: Unity Catalog encourages the reuse of existing data assets, reducing the need to create new assets from scratch and improving overall efficiency.
4. Enhance data governance: Unity Catalog provides a clear view of data assets, enabling better data governance and compliance.

Unity Catalog is particularly useful in large organizations where data is scattered across different teams, projects, and environments. It helps create a unified view of data assets, making it easier to work with data across different teams and projects.""",
"""User-defined functions (UDFs) in the context of Databricks and Apache Spark are custom functions that you can create to perform specific tasks on your data. These functions are written in a programming language such as Python, Java, Scala, or SQL, and can be used to extend the built-in functionality of Spark.

UDFs can be used to perform complex data transformations, data cleaning, or to apply custom business logic to your data. Once defined, UDFs can be invoked in SQL queries or in DataFrame transformations, allowing you to reuse your custom logic across multiple queries and applications.

To use UDFs in Databricks, you first need to define them in a supported programming language, and then register them with the SparkSession. Once registered, UDFs can be used in SQL queries or DataFrame transformations like any other built-in function.""",
        ],
        "outputs": outputs,
    }
)

from mlflow.genai.scorers import Correctness, RelevanceToQuery, Guidelines

# Run evaluation with GenAI metrics
result = mlflow.genai.evaluate(
    data=eval_df,
    scorers=[
        Correctness(),
        RelevanceToQuery(),
    ],
)

# View evaluation results
result.tables["eval_results"]

Analyze Evaluation Results

The evaluation generates detailed metrics with rationales:

[Image: The MLflow UI showing the evaluation run metrics]
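
Beyond the UI, the returned result object also exposes aggregate scores programmatically. A minimal sketch, assuming the result carries a metrics dictionary as in MLflow's other evaluation APIs:

# Print the aggregate score for each scorer across all evaluated rows
for metric_name, value in result.metrics.items():
    print(f"{metric_name}: {value}")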

Summary

You have successfully:

  • ✅ Created a versioned prompt template for reproducibility
  • ✅ Built a LangChain conversational agent with OpenAI
  • ✅ Enabled automatic trace logging for full observability
  • ✅ Evaluated model performance with GenAI-specific metrics

Next Steps

  • Experiment with prompts: try different prompt templates and compare their performance
  • Add custom metrics: create domain-specific evaluation metrics for your use case (see the sketch after this list)
  • Deploy your model: use MLflow's deployment capabilities to serve your chatbot
  • Scale your evaluation: run evaluations against larger datasets to ensure robustness
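
As a starting point for the custom-metrics item above, MLflow 3 ships a Guidelines scorer that judges outputs against plain-language rules. A hedged sketch reusing the eval_df from Step 4 (the guideline text here is illustrative, not from this quickstart):

from mlflow.genai.scorers import Guidelines

# A domain-specific, natural-language rule judged by an LLM
tone = Guidelines(
    name="support_tone",
    guidelines="The response must be polite, direct, and free of unexplained jargon.",
)

tone_result = mlflow.genai.evaluate(
    data=eval_df,
    scorers=[tone],
)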

Additional Resources