Introduction to the LlamaIndex and MLflow Integration
Welcome to this interactive tutorial introducing LlamaIndex and its integration with MLflow. Structured as a notebook, this tutorial offers a hands-on learning experience with the simplest, most core features of LlamaIndex.
What You Will Learn
By the end of this tutorial, you will have:
- Created an MVP VectorStoreIndex in LlamaIndex.
- Logged that index to the MLflow tracking server.
- Registered that index in the MLflow Model Registry.
- Loaded the model and performed inference.
- Explored the MLflow UI to understand the logged artifacts.
These core concepts will familiarize you with the LlamaIndex user journey in MLflow.
Setup
First, we must make sure we have the required dependencies and environment variables. By default, LlamaIndex uses OpenAI as the source for its LLM and embedding models, so we will do the same. Let's start by installing the required libraries and providing an OpenAI API key.
%pip install mlflow>=2.15 llama-index>=0.10.44 -q
Note: you may need to restart the kernel to use updated packages.
import os
from getpass import getpass
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.llms import ChatMessage
import mlflow
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
assert "OPENAI_API_KEY" in os.environ, "Please set the OPENAI_API_KEY environment variable."
Create the Index
Vector store indexes are one of the core components of LlamaIndex. They contain embedding vectors of ingested document chunks (and sometimes the document chunks themselves). These vectors enable various types of inference, such as query engines, chat engines, and retrievers, each serving a different purpose in LlamaIndex.
- Query Engine:
  - Usage: Perform straightforward queries to retrieve relevant information based on a user's question.
  - Scenario: Ideal for fetching concise answers or documents matching specific queries, similar to a search engine.
- Chat Engine:
  - Usage: Engage in conversational AI tasks that require maintaining context and history across multiple interactions.
  - Scenario: Suitable for interactive applications that need conversational context, such as customer support bots or virtual assistants.
- Retriever:
  - Usage: Retrieve documents or text segments that are semantically similar to a given input.
  - Scenario: Useful in retrieval-augmented generation (RAG) systems to fetch relevant context or background information, improving the quality of generated responses in tasks like summarization or question answering.
By leveraging these types of inference, LlamaIndex enables you to build robust AI applications for a wide range of use cases, enhancing the interaction between users and large language models.
print("------------- Example Document used to Enrich LLM Context -------------")
llama_index_example_document = Document.example()
print(llama_index_example_document)
index = VectorStoreIndex.from_documents([llama_index_example_document])
print("
------------- Example Query Engine -------------")
query_response = index.as_query_engine().query("What is llama_index?")
print(query_response)
print("
------------- Example Chat Engine -------------")
chat_response = index.as_chat_engine().chat(
    "What is llama_index?",
    chat_history=[ChatMessage(role="system", content="You are an expert on RAG!")],
)
print(chat_response)
print("
------------- Example Retriever -------------")
retriever_response = index.as_retriever().retrieve("What is llama_index?")
print(retriever_response)
------------- Example Document used to Enrich LLM Context -------------
Doc ID: e4c638ce-6757-482e-baed-096574550602
Text: Context LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data. How do we best augment LLMs with our own private data? We need a comprehensive toolkit to help perform this data augmentation for LLMs. Proposed Solution That's where LlamaIndex comes in. Ll...

------------- Example Query Engine -------------
LlamaIndex is a "data framework" designed to assist in building LLM apps by offering tools such as data connectors for various data sources, ways to structure data for easy use with LLMs, an advanced retrieval/query interface, and integrations with different application frameworks. It caters to both beginner and advanced users, providing a high-level API for simple data ingestion and querying, as well as lower-level APIs for customization and extension of different modules to suit individual needs.

------------- Example Chat Engine -------------
LlamaIndex is a data framework designed to assist in building LLM apps by providing tools such as data connectors for various data sources, ways to structure data for easy use with LLMs, an advanced retrieval/query interface, and integrations with different application frameworks. It caters to both beginner and advanced users with a high-level API for easy data ingestion and querying, as well as lower-level APIs for customization and extension of different modules to suit specific needs.

------------- Example Retriever -------------
[NodeWithScore(node=TextNode(id_='d18bb1f1-466a-443d-98d9-6217bf71ee5a', embedding=None, metadata={'filename': 'README.md', 'category': 'codebase'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='e4c638ce-6757-482e-baed-096574550602', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'filename': 'README.md', 'category': 'codebase'}, hash='3183371414f6a23e9a61e11b45ec45f808b148f9973166cfed62226e3505eb05')}, text='Context LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data. How do we best augment LLMs with our own private data? We need a comprehensive toolkit to help perform this data augmentation for LLMs. Proposed Solution That's where LlamaIndex comes in. LlamaIndex is a "data framework" to help you build LLM apps. It provides the following tools: Offers data connectors to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.) Provides ways to structure your data (indices, graphs) so that this data can be easily used with LLMs. Provides an advanced retrieval/query interface over your data: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output. Allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, anything else). LlamaIndex provides tools for both beginner users and advanced users. Our high-level API allows beginner users to use LlamaIndex to ingest and query their data in 5 lines of code. Our lower-level APIs allow advanced users to customize and extend any module (data connectors, indices, retrievers, query engines, reranking modules), to fit their needs.', mimetype='text/plain', start_char_idx=1, end_char_idx=1279, text_template='{metadata_str} {content}', metadata_template='{key}: {value}', metadata_seperator=' '), score=0.850998849877966)]
Log the Index with MLflow
The code below logs the LlamaIndex model with MLflow, allowing you to persist and manage it across environments. With MLflow you can reliably track, version, and reproduce your model. The script logs parameters and an example input, and registers the model under a specific name. The model_uri provides a unique identifier for retrieving the model later. This persistence is essential for ensuring consistency and reproducibility across development, testing, and production. Managing the model with MLflow simplifies loading, deployment, and sharing, keeping your workflow organized.
Key Parameters
- engine_type: defines the pyfunc and spark_udf inference type.
- input_example: defines the input signature; the output signature is inferred via a prediction.
- registered_model_name: defines the name of the model in the MLflow Model Registry.
mlflow.llama_index.autolog()  # This is for enabling tracing

with mlflow.start_run() as run:
    mlflow.llama_index.log_model(
        index,
        artifact_path="llama_index",
        engine_type="query",  # Defines the pyfunc and spark_udf inference type
        input_example="hi",  # Infers signature
        registered_model_name="my_llama_index_vector_store",  # Stores an instance in the model registry
    )

run_id = run.info.run_id
model_uri = f"runs:/{run_id}/llama_index"
print(f"Unique identifier for the model location for loading: {model_uri}")
2024/07/24 17:58:27 INFO mlflow.llama_index.serialize_objects: API key(s) will be removed from the global Settings object during serialization to protect against key leakage. At inference time, the key(s) must be passed as environment variables. /Users/michael.berk/opt/anaconda3/envs/mlflow-dev/lib/python3.8/site-packages/_distutils_hack/__init__.py:26: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.") Successfully registered model 'my_llama_index_vector_store'. Created version '1' of model 'my_llama_index_vector_store'.
Downloading artifacts: 0%| | 0/12 [00:00<?, ?it/s]
Unique identifier for the model location for loading: runs:/036936a7ac964f0cb6ab99fa908d6421/llama_index
Load the Index and Perform Inference
The code below demonstrates the three core types of inference that can be done with the loaded model.
- Load and Perform Inference via LlamaIndex: this method loads the model using mlflow.llama_index.load_model and performs direct querying, chat, or retrieval. It is ideal when you want to leverage the full capabilities of the underlying llama index object.
- Load and Perform Inference via MLflow PyFunc: this method loads the model using mlflow.pyfunc.load_model, enabling model predictions in the generic PyFunc format, with the engine type specified at logging time. It is useful for evaluating the model with mlflow.evaluate or deploying it for serving.
- Load and Perform Inference via MLflow Spark UDF: this method loads the model as a Spark UDF using mlflow.pyfunc.spark_udf, facilitating distributed inference over large datasets in a Spark DataFrame. It is ideal for handling large-scale data and, as with PyFunc inference, only supports the engine type defined at logging time.
print("
------------- Inference via Llama Index -------------")
index = mlflow.llama_index.load_model(model_uri)
query_response = index.as_query_engine().query("hi")
print(query_response)
print("
------------- Inference via MLflow PyFunc -------------")
index = mlflow.pyfunc.load_model(model_uri)
query_response = index.predict("hi")
print(query_response)
2024/07/24 18:02:21 WARNING mlflow.tracing.processor.mlflow: Creating a trace within the default experiment with id '0'. It is strongly recommended to not use the default experiment to log traces due to ambiguous search results and probable performance issues over time due to directory table listing performance degradation with high volumes of directories within a specific path. To avoid performance and disambiguation issues, set the experiment for your environment using `mlflow.set_experiment()` API.
------------- Inference via Llama Index -------------
2024/07/24 18:02:22 WARNING mlflow.tracing.processor.mlflow: Creating a trace within the default experiment with id '0'. It is strongly recommended to not use the default experiment to log traces due to ambiguous search results and probable performance issues over time due to directory table listing performance degradation with high volumes of directories within a specific path. To avoid performance and disambiguation issues, set the experiment for your environment using `mlflow.set_experiment()` API.
Hello! How can I assist you today? ------------- Inference via MLflow PyFunc ------------- Hello! How can I assist you today?
# Optional: Spark UDF inference
show_spark_udf_inference = False

if show_spark_udf_inference:
    print("\n------------- Inference via MLflow Spark UDF -------------")
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    udf = mlflow.pyfunc.spark_udf(spark, model_uri, result_type="string")
    df = spark.createDataFrame([("hi",), ("hello",)], ["text"])
    df.withColumn("response", udf("text")).toPandas()
Explore the MLflow UI
Finally, let's explore what is happening under the hood. To open the MLflow UI, run the cell below. Note that you can also run the same command from a new CLI window in the directory that contains your mlruns folder, which by default is this notebook's directory.
import os
import subprocess
from IPython.display import IFrame
# Start the MLflow UI in a background process
mlflow_ui_command = ["mlflow", "ui", "--port", "5000"]
subprocess.Popen(
    mlflow_ui_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, preexec_fn=os.setsid
)
<subprocess.Popen at 0x7fbe09399ee0>
# Wait for the MLflow server to start then run the following command
# Note that cached results don't render, so you need to run this to see the UI
IFrame(src="http://localhost:5000", width=1000, height=600)
Let's navigate to the Experiments tab in the top left of the screen and click on our most recent run, as shown in the image below.
MLflow logs artifacts associated with your model and its environment during the MLflow run. Most of the logged files, such as conda.yaml, python_env.yml, and requirements.txt, are standard to all MLflow logging and facilitate reproducibility between environments. However, two sets of artifacts are specific to LlamaIndex:
- index: a directory that stores the serialized vector store. For more details, visit LlamaIndex's serialization documentation.
- settings.json: the serialized llama_index.core.Settings service context. For more details, visit LlamaIndex's Settings documentation.
By storing these objects, MLflow is able to recreate the environment in which you logged your model.
Important: MLflow will not serialize API keys. Those must be present as environment variables in the environment where you load the model.
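If you want to inspect these artifacts outside the UI, here is a minimal sketch that downloads them locally via MLflow's artifact API. It assumes the model_uri variable from the logging step above is still in scope.
# A sketch for inspecting the logged artifacts locally.
# Assumes `model_uri` from the logging cell above is still defined.
local_path = mlflow.artifacts.download_artifacts(artifact_uri=model_uri)
print(os.listdir(local_path))  # expect entries such as index/, settings.json, requirements.txt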
We also created a record of the model in the Model Registry. By simply specifying registered_model_name and input_example when logging the model, we get robust signature inference and an instance in the model registry, as shown below.
Finally, let's explore the traces we logged. In the Experiments tab, we can click on Tracing to view the traces recorded for our two inference calls. Tracing effectively shows a callback-style stack trace of what happened in our inference system.
If we click on our first trace, we can see some really cool details about our inputs, outputs, and the duration of each step in the chain.
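Traces can also be pulled programmatically rather than through the UI. Here is a minimal sketch, assuming your MLflow version ships the mlflow.search_traces API (available in recent 2.x releases); the exact DataFrame columns may vary by version.
# Fetch logged traces as a pandas DataFrame for programmatic inspection.
# Column names (e.g. "request", "response", "execution_time_ms") may vary by MLflow version.
traces_df = mlflow.search_traces()
print(traces_df.head())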
Customization and Next Steps
When working with production systems, users typically leverage a custom service context, which can be done via LlamaIndex's Settings object. A minimal sketch of what that customization can look like follows below.
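The snippet below swaps in a custom LLM and embedding model before building an index. The specific model names are illustrative assumptions, and the llama_index.llms.openai / llama_index.embeddings.openai imports assume those sub-packages are available (they ship with the llama-index meta-package installed above).
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Illustrative customization; the model names below are assumptions, not requirements.
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Indexes built after this point use the customized Settings, and
# mlflow.llama_index.log_model serializes them into settings.json.
custom_index = VectorStoreIndex.from_documents([Document.example()])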