Working with Large Models in the MLflow Transformers Flavor

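Warning

The features described in this guide are intended for advanced users who are familiar with Transformers and MLflow. Understand their limitations and potential risks before using them.

The MLflow Transformers flavor lets you track a wide range of Transformers models in MLflow. However, logging large models such as Large Language Models (LLMs) can be resource-intensive because of their size and memory requirements. This guide describes the MLflow features that reduce memory and disk usage when logging models, so that you can work with large models in resource-constrained environments.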

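Overview

The following table summarizes the different methods for logging models with the Transformers flavor. Each method has certain limitations and requirements, which are described in the sections below.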

Saving method	Description	Memory usage	Disk usage
Normal pipeline-based logging	Log a model using a pipeline instance or a dictionary of pipeline components.	High	High
Memory-efficient model logging	Log a model by specifying the path to a local checkpoint, which avoids loading the model into memory.	Low	High
Storage-efficient model logging	Log a model by saving a reference to the HuggingFace Hub repository instead of the model weights.	High	Low

Example: normal pipeline-based logging

import mlflow
import transformers

pipeline = transformers.pipeline(
    task="text-generation",
    model="meta-llama/Meta-Llama-3.1-70B",
)

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=pipeline,
        name="model",
    )

Example: memory-efficient model logging

import mlflow

with mlflow.start_run():
    mlflow.transformers.log_model(
        # Pass a path to local checkpoint as a model
        transformers_model="/path/to/local/checkpoint",
        # Task argument is required for this saving mode.
        task="text-generation",
        name="model",
    )

Example: storage-efficient model logging

import mlflow
import transformers

pipeline = transformers.pipeline(
    task="text-generation",
    model="meta-llama/Meta-Llama-3.1-70B",
)

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=pipeline,
        name="model",
        # Set save_pretrained to False to save storage space
        save_pretrained=False,
    )

Memory-Efficient Model Logging

Introduced in MLflow 2.16.1, this method lets you log a model without loading it into memory:

import mlflow

with mlflow.start_run():
    mlflow.transformers.log_model(
        # Pass a path to local checkpoint as a model to avoid loading the model instance
        transformers_model="path/to/local/checkpoint",
        # Task argument is required for this saving mode.
        task="text-generation",
        name="model",
    )

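In the example above, we pass the path to a local model checkpoint (the weights) as the model argument of the mlflow.transformers.log_model() API, instead of a pipeline instance. MLflow inspects the model metadata in the checkpoint and logs the model weights without loading them into memory. This lets you log a very large, multi-billion-parameter model to MLflow with minimal compute resources.

Important Notes

Be aware of the following requirements and limitations when using this feature:

The checkpoint directory **must** contain a valid config.json file and the model weight files. If a tokenizer is required, its state files must also be present in the checkpoint directory. You can save the tokenizer state into your checkpoint directory by calling tokenizer.save_pretrained("path/to/local/checkpoint").
You **must** specify the task argument with a task name that the model is designed for.
MLflow may not be able to infer model dependencies accurately in this mode. See Managing Dependencies in MLflow Models for more information on managing model dependencies.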
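Before logging, it can help to confirm that the checkpoint directory meets the first requirement above. The following is a minimal sketch that uses only the standard library. The helper name `missing_checkpoint_files` is hypothetical (it is not part of MLflow), and the weight file patterns and the tokenizer_config.json check are assumptions based on the file names Transformers commonly writes, not an exhaustive list.

```python
from pathlib import Path
from typing import List

# Assumption: common Transformers weight file names (single-file or sharded).
WEIGHT_PATTERNS = ("*.safetensors", "pytorch_model*.bin")


def missing_checkpoint_files(checkpoint_dir: str, need_tokenizer: bool = True) -> List[str]:
    """Return a list of problems found; an empty list means the directory looks complete."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return [f"{checkpoint_dir} is not a directory"]

    problems = []
    if not (root / "config.json").is_file():
        problems.append("config.json is missing")
    # At least one file must match one of the assumed weight patterns.
    has_weights = any(next(root.glob(p), None) is not None for p in WEIGHT_PATTERNS)
    if not has_weights:
        problems.append("no model weight files found")
    # Assumption: tokenizer.save_pretrained() writes tokenizer_config.json.
    if need_tokenizer and not (root / "tokenizer_config.json").is_file():
        problems.append("tokenizer_config.json is missing")
    return problems
```

If the returned list is not empty, fix the directory first (for example by calling tokenizer.save_pretrained on it) before passing the path to mlflow.transformers.log_model().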

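Warning

Make sure you specify the correct task argument, because an incompatible task causes the model to **fail at load time**. You can check the valid task type for your model on the HuggingFace Hub.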

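Storage-Efficient Model Logging

Normally, when MLflow logs an ML model, it saves a copy of the model weights to the artifact store. This is not optimal when you use a pretrained model from the HuggingFace Hub and do not intend to fine-tune or otherwise modify the model or its weights before logging it. In this very common scenario, copying the (typically very large) model weights is redundant while you develop prompts and test inference parameters, and amounts to little more than an unnecessary waste of storage space.

To address this, MLflow 2.11.0 introduced a new argument, save_pretrained, in the mlflow.transformers.save_model() and mlflow.transformers.log_model() APIs. When this argument is set to False, MLflow skips saving the pretrained model weights and instead stores a reference to the underlying repository entry on the HuggingFace Hub. Specifically, the repository name and the unique commit hash of the model weights are stored when your components or pipeline are logged. When such a *reference-only* model is loaded back, MLflow reads the repository name and commit hash from the saved metadata, and then either downloads the model weights from the HuggingFace Hub or uses the locally cached model from your HuggingFace local cache directory.

Here is an example of logging a model with the save_pretrained argument: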

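import mlflow
import torch
import transformers

pipeline = transformers.pipeline(
    task="text-generation",
    model="meta-llama/Meta-Llama-3.1-70B",
    # Pass the dtype object itself; the string "torch.float16" is not a valid dtype name
    torch_dtype=torch.float16,
)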

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=pipeline,
        name="model",
        # Set save_pretrained to False to save storage space
        save_pretrained=False,
    )

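In the example above, MLflow does not save a copy of the Llama-3.1-70B model weights. Instead, it logs the metadata shown below as a reference to the model on the HuggingFace Hub. This saves approximately 150GB of storage space and significantly reduces the logging latency of every run you start during development.

In the MLflow UI, you can see the model logged with its repository ID and commit hash: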

flavors:
    ...
    transformers:
        source_model_name: meta-llama/Meta-Llama-3.1-70B-Instruct
        source_model_revision: 33101ce6ccc08fa6249c10a543ebfcac65173393
        ...
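To illustrate how these two fields identify a pinned snapshot on the Hub, here is a small sketch that reads them from a flavors mapping and builds the corresponding Hub URL. The helper names `hub_reference` and `hub_tree_url` are hypothetical and not part of MLflow. The sample dictionary simply mirrors the metadata shown above.

```python
from typing import Optional, Tuple


def hub_reference(flavors: dict) -> Optional[Tuple[str, str]]:
    """Return (repo_id, revision) from a flavors mapping, or None if either field is absent."""
    conf = flavors.get("transformers") or {}
    repo_id = conf.get("source_model_name")
    revision = conf.get("source_model_revision")
    if repo_id and revision:
        return repo_id, revision
    return None


def hub_tree_url(repo_id: str, revision: str) -> str:
    """Build the Hub web URL for a repository pinned at a specific commit."""
    return f"https://huggingface.co/{repo_id}/tree/{revision}"


# Sample metadata copied from the snippet above.
flavors = {
    "transformers": {
        "source_model_name": "meta-llama/Meta-Llama-3.1-70B-Instruct",
        "source_model_revision": "33101ce6ccc08fa6249c10a543ebfcac65173393",
    }
}

ref = hub_reference(flavors)
if ref is not None:
    print(hub_tree_url(*ref))
```

For a real logged model, the flavors mapping could be obtained from mlflow.models.get_model_info(model_uri).flavors rather than written by hand.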

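Before deploying to production, you may want to persist the model weights instead of a repository reference. To do so, use the mlflow.transformers.persist_pretrained_model() API to download the model weights from the HuggingFace Hub and save them to the artifact location. See the OSS Model Registry or Legacy Workspace Model Registry section for more information.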
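As a rough sanity check on the storage figure quoted earlier in this section, the raw size of the weights can be estimated as parameter count times bytes per parameter. The sketch below assumes a nominal 70 billion parameters stored as 16-bit floats. The actual on-disk size also depends on the exact parameter count, the checkpoint format, and additional files, so treat the result as an order-of-magnitude estimate only. The function name is hypothetical.

```python
def estimated_weight_size_gb(num_parameters: float, bytes_per_parameter: int) -> float:
    """Estimate the raw weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Assumption: nominal 70 billion parameters, 2 bytes each (float16).
print(estimated_weight_size_gb(70e9, 2))  # 140.0
```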

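Registering Reference-Only Models for Production

Models logged without their weights, as described above, are "reference-only": the artifact store holds only a reference to the HuggingFace Hub repository, not the model weights. When you load such a model normally, MLflow downloads the weights from the HuggingFace Hub.

However, this may not be suitable for production use cases, because the model weights may become unavailable or the download may fail due to network issues. MLflow addresses this problem at the point where a reference-only model is registered to the Model Registry.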

Databricks Unity Catalog

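Registering a reference-only model to the Databricks Unity Catalog Model Registry requires no additional steps beyond the normal model registration process. MLflow automatically downloads the model weights and registers them to Unity Catalog together with the model metadata.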

import mlflow

mlflow.set_registry_uri("databricks-uc")

# Log the repository ID as a model. The model weight will not be saved to the artifact store
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        name="model",
    )

# When registering the model to Unity Catalog Model Registry, MLflow will automatically
# persist the model weight files. This may take several minutes for large models.
mlflow.register_model(model_info.model_uri, "your.model.name")

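OSS Model Registry or Legacy Workspace Model Registry

For the OSS Model Registry or the legacy Workspace Model Registry in Databricks, you need to manually persist the model weights to the artifact store before registering the model. Use the mlflow.transformers.persist_pretrained_model() API to download the model weights from the HuggingFace Hub and save them to the artifact location. This process **does not require re-logging the model**. Instead, it efficiently updates the existing model and its metadata in place.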

import mlflow

# Log the repository ID as a model. The model weight will not be saved to the artifact store
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        name="model",
    )

# Before registering the model to the non-UC model registry, persist the model weight
# from the HuggingFace Hub to the artifact location.
mlflow.transformers.persist_pretrained_model(model_info.model_uri)

# Register the model
mlflow.register_model(model_info.model_uri, "your.model.name")

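Caveats for Skipping Saving of Pretrained Model Weights

While these features are useful for saving compute resources and storage space when logging large models, there are some caveats to be aware of:

Change in model availability: If you are using a model from another user's repository, the model may be deleted or made private on the HuggingFace Hub. In that case, MLflow cannot load the model back. For production use cases, it is recommended to save a copy of the model weights to the artifact store before moving the model from a development or staging environment to production.
HuggingFace Hub access: Downloading a model from the HuggingFace Hub may be slow or unstable because of network latency or the status of the HuggingFace Hub service. MLflow does not provide any retry mechanism or robust error handling for model downloads from the HuggingFace Hub. You should therefore not rely on this feature for your final production-candidate run.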
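Because MLflow does not retry Hub downloads itself, one option during development is to wrap the load call in your own retry loop. The following is a minimal sketch of a generic retry helper with exponential backoff. The name `call_with_retry` and its parameters are hypothetical and not part of MLflow. It retries on any exception, which is deliberately broad and may need narrowing for real use.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponential backoff between attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

A call such as call_with_retry(lambda: mlflow.transformers.load_model(model_uri)) would then retry a failed load a few times. This does not replace persisting the weights for production, since it cannot help if the repository has been deleted or made private.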

By understanding these methods and their limitations, you can work effectively with large Transformers models in MLflow while optimizing resource usage.
