Building a Code Assistant with OpenAI and MLflow

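Overview

Welcome to this comprehensive tutorial, in which you will integrate OpenAI's powerful language models with MLflow. We will build a genuinely useful tool: by adding a single decorator to any function we declare, we get immediate feedback on code under active development, right inside an interactive environment.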

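Learning Objectives

By the end of this tutorial, you will:

Master OpenAI's GPT-4 for code assistance: Understand how to use OpenAI's GPT-4 model to provide real-time coding help. Learn to use its capabilities to generate code suggestions and explanations and to improve overall coding efficiency.
Use MLflow for enhanced model tracking: Dig into MLflow's tracking system for managing machine learning experiments. Learn how to adapt a pyfunc model from MLflow to control how the LLM's output is displayed in an interactive coding environment.
Combine OpenAI and MLflow seamlessly: Discover the practical steps for integrating OpenAI's AI capabilities with MLflow's tracking and management systems. This integration shows how combining these tools streamlines the development and deployment of intelligent applications.
Develop and deploy a custom Python code assistant: Gain hands-on experience building a Python-based code assistant on top of an OpenAI model, then see it in action in a Jupyter Notebook, where it provides useful help during development.
Improve code quality with AI-driven insights: Apply AI-driven analysis to review and improve your code. See how an AI assistant can give real-time feedback on code quality, suggest improvements, and help maintain high coding standards.
Explore advanced Python features for robust development: Understand advanced Python features such as decorators and functional programming. These are essential for building efficient, scalable, and maintainable software, especially when integrating AI capabilities.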

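Key Concepts Covered

MLflow's model management: Explore MLflow's features for tracking experiments, packaging code into reproducible runs, and managing and deploying models.
Custom Python models: Learn how to use MLflow's built-in customization support to define a generic Python function model, so that you can write your own logic for post-processing the LLM's output when interacting with OpenAI.
Python decorators and functional programming: Learn advanced Python concepts such as decorators and functional programming for efficient code evaluation and enhancement.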

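Why Use MLflow?

MLflow is the key element that makes our use case not just feasible but efficient. It offers a secure and seamless interface to OpenAI's advanced language models. In this tutorial, we will see how MLflow greatly simplifies storing task-specific instruction prompts for OpenAI and improves the user experience by adding readable formatting to the returned text.

MLflow's flexibility and scalability make it a strong choice for integrating with a variety of tools, especially in interactive coding environments like Jupyter Notebook. We will see first-hand how MLflow supports rapid experimentation and iteration, letting us build a functional tool with minimal effort. The tool not only assists development but also improves the overall coding and model-management experience. Using MLflow's features, we will work through an end-to-end workflow, from setting up a complex model to executing complex tasks efficiently.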

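Important Cost Considerations for GPT-4 Usage

Higher Cost for GPT-4

Be aware that using GPT-4 instead of GPT-4o-mini can incur higher costs. GPT-4's advanced capabilities and stronger performance come at a higher price, making it a more expensive option than earlier models such as GPT-3.5.

Why Choose GPT-4 in This Tutorial?

Enhanced capabilities: We chose GPT-4 for this tutorial primarily for its superior performance in areas such as code refactoring and detecting problems in code implementations.
Demonstration purposes: GPT-4 is used here to showcase the state of the art in language model technology and its application to complex tasks.

Consider Alternatives for Cost-Effectiveness

For projects where cost is a significant concern, or where GPT-4's advanced capabilities are not essential, consider GPT-4o-mini or another more cost-effective alternative. These models still deliver strong performance for a wide range of applications at a lower cost.

Budgeting for GPT-4

If you choose to proceed with GPT-4, we recommend that you:

Monitor usage closely: Track your API usage to manage costs effectively.
Budget accordingly: Allocate sufficient resources to cover the higher costs associated with GPT-4.

By keeping these cost considerations in mind, you can make an informed decision about which OpenAI model best fits your project's needs and budget.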
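
As a rough aid for the budgeting advice above, the following sketch estimates spend from token counts. The function name `estimate_cost` and both per-1,000-token prices are placeholders chosen for illustration; they are not OpenAI's actual rates, so substitute the current values from OpenAI's pricing page.

```python
def estimate_cost(prompt_tokens, completion_tokens, prompt_price_per_1k, completion_price_per_1k):
    """Estimate the cost of one request from its token counts.

    Prices are per 1,000 tokens and are supplied by the caller.
    """
    prompt_cost = prompt_tokens / 1000 * prompt_price_per_1k
    completion_cost = completion_tokens / 1000 * completion_price_per_1k
    return prompt_cost + completion_cost


# Placeholder prices (hypothetical, for illustration only).
PROMPT_PRICE = 0.01
COMPLETION_PRICE = 0.03

cost = estimate_cost(1000, 500, PROMPT_PRICE, COMPLETION_PRICE)
print(f"Estimated cost for one review: {cost:.4f}")
```

Because the code assistant built later sends a function's full source on each review, prompt tokens grow with the size of the function being reviewed.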

import warnings

# Disable a few less-than-useful UserWarnings from setuptools and pydantic
warnings.filterwarnings("ignore", category=UserWarning)

import functools
import inspect
import os
import textwrap

import openai

import mlflow
from mlflow.models.signature import ModelSignature
from mlflow.pyfunc import PythonModel
from mlflow.types.schema import ColSpec, ParamSchema, ParamSpec, Schema

# Run a quick validation that we have an entry for the OPENAI_API_KEY within environment variables
assert "OPENAI_API_KEY" in os.environ, "OPENAI_API_KEY environment variable must be set"

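Initializing the MLflow Client

Depending on where you are running this notebook, your configuration for initializing the MLflow client may vary. If you are unsure how to configure and use an MLflow tracking server, or which options are available, see the guide to running notebooks for more information on setting the tracking server URI and configuring access to either a managed or a self-hosted MLflow tracking server.

Setting the MLflow Experiment

In this section of the tutorial, we use MLflow's set_experiment function to define an experiment named "Code Helper". This step matters to the MLflow workflow for several reasons:

Unique identification: A unique, distinct experiment name such as "Code Helper" makes it easy to identify and isolate the runs that belong to this project, especially when working on several projects or experiments at once.
Simplified tracking: Naming the experiment makes it easy to track all of its associated runs and models, maintaining a clear history of model development, parameters, metrics, and results.
Easy access in the MLflow UI: A distinct experiment name lets you quickly locate our experiment's runs and models in the MLflow UI, which helps with analysis, comparing runs, and sharing results.
Better organization: As a project grows in complexity, a well-named experiment helps organize and manage the machine learning lifecycle and makes the different stages of experimentation easier to navigate.

Using a unique experiment name such as "Code Helper" lays the groundwork for efficient model management and tracking, a key aspect of any machine learning workflow, especially in dynamic and collaborative environments.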

mlflow.set_experiment("Code Helper")

<Experiment: artifact_location='file:///Users/benjamin.wilson/repos/mlflow-fork/mlflow/docs/source/llms/openai/notebooks/mlruns/703316263508654123', creation_time=1701891935339, experiment_id='703316263508654123', last_update_time=1701891935339, lifecycle_stage='active', name='Code Helper', tags={}>

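Defining the Instruction Set for the AI Model

In this section of the tutorial, we define a specific set of instructions that guide the AI model's behavior. This is done with the instruction array, which lays out the roles and the expected interaction between the system (the AI model) and the user. Here is a breakdown of its components:

System role: The first element of the array assigns the AI model the "system" role. It describes the model as an AI specializing in code review, whose task is to analyze and critique the submitted code and to support the user's learning. The model is expected to:
- Provide a clear explanation of the code's intent.
- Assess the code's correctness and readability.
- Suggest improvements, with a focus on simplicity, maintainability, and adherence to coding best practices.
User role: The second element represents the "user" role. The user (here, the person working through this tutorial) interacts with the AI model by submitting code for review. The user is expected to:
- Provide code snippets for evaluation.
- Seek the AI model's feedback and suggestions for improving the code.

This instruction set is essential to creating an interactive learning experience. It steers the AI model toward targeted, constructive feedback, which makes it a valuable tool for understanding coding practices and improving coding skills.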

instruction = [
  {
      "role": "system",
      "content": (
          "As an AI specializing in code review, your task is to analyze and critique the submitted code. For each code snippet, provide a detailed review that includes: "
          "1. Identification of any errors or bugs. "
          "2. Suggestions for optimizing code efficiency and structure. "
          "3. Recommendations for enhancing code readability and maintainability. "
          "4. Best practice advice relevant to the code’s language and functionality. "
          "Your feedback should help the user improve their coding skills and understand best practices in software development."
      ),
  },
  {"role": "user", "content": "Review my code and suggest improvements: {code}"},
]
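
The `{code}` placeholder in the user message is where the submitted source ends up. As a simplified sketch of that substitution, assuming plain `str.format` semantics (the helper `fill_messages` is hypothetical, the system prompt is shortened, and MLflow's own formatting logic may differ):

```python
import copy

# A shortened stand-in for the instruction list defined above.
template = [
    {"role": "system", "content": "You are an AI specializing in code review."},
    {"role": "user", "content": "Review my code and suggest improvements: {code}"},
]


def fill_messages(messages, **variables):
    """Return a copy of the messages with placeholders such as {code} filled in."""
    filled = copy.deepcopy(messages)
    for message in filled:
        message["content"] = message["content"].format(**variables)
    return filled


filled = fill_messages(template, code="x = {1: 2}")
print(filled[1]["content"])
```

Braces inside the submitted code are safe here because `str.format` only parses the template string, not the values substituted into it.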

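Defining and Using the Model Signature in MLflow

In this section of the tutorial, we define a ModelSignature for our OpenAI model. It is a key step both for saving the base model and, later, for our custom Python model implementation. Here is an overview of the process:

Model signature definition:
- We create a ModelSignature object that specifies our model's inputs, outputs, and parameters.
- The inputs and outputs are each defined as a schema with a single string column, indicating that our model handles string data.
- The params schema includes two parameters, max_tokens and temperature, each with a default value and a declared data type.

Note: We define the model signature explicitly here for demonstration purposes. If you do not specify one, the schema is inferred automatically and set according to the task defined when the model is logged or saved.

Logging the base OpenAI model:
- Using mlflow.openai.log_model, we log the base OpenAI model (gpt-4) together with the instruction set we defined earlier.
- The signature we defined is also passed in at this step, so the model is saved with the correct input, output, and parameter specifications.

This dual use of the signature matters because it keeps data handling consistent between the model in its base form and the model as later wrapped in a custom Python model. The approach simplifies the workflow and keeps things uniform across the stages of implementation and deployment.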

# Define the model signature that will be used for both the base model and the eventual custom pyfunc implementation later.
signature = ModelSignature(
  inputs=Schema([ColSpec(type="string", name=None)]),
  outputs=Schema([ColSpec(type="string", name=None)]),
  params=ParamSchema(
      [
          ParamSpec(name="max_tokens", default=500, dtype="long"),
          ParamSpec(name="temperature", default=0, dtype="float"),
      ]
  ),
)

# Log the base OpenAI model with the included instruction set (prompt)
with mlflow.start_run():
  model_info = mlflow.openai.log_model(
      model="gpt-4",
      task=openai.chat.completions,
      name="base_model",
      messages=instruction,
      signature=signature,
  )
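
To make the role of the params schema concrete, here is a minimal sketch of the idea behind parameter defaults: values the caller omits fall back to the declared default, and supplied values are coerced to the declared type. `PARAM_DEFAULTS` and `resolve_params` are hypothetical names used only for this illustration; this is not MLflow's enforcement code, and rejecting unknown names is a choice made for the sketch rather than a statement about MLflow's behavior.

```python
# Hypothetical mirror of the params schema above: name -> (default, type).
PARAM_DEFAULTS = {
    "max_tokens": (500, int),
    "temperature": (0.0, float),
}


def resolve_params(overrides=None):
    """Merge caller overrides into the defaults, coercing each value to its declared type."""
    resolved = {name: default for name, (default, _) in PARAM_DEFAULTS.items()}
    for name, value in (overrides or {}).items():
        if name not in PARAM_DEFAULTS:
            raise KeyError(f"Unknown parameter: {name}")
        _, expected_type = PARAM_DEFAULTS[name]
        resolved[name] = expected_type(value)
    return resolved


print(resolve_params())
print(resolve_params({"temperature": 1}))
```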

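Our Logged Model in the MLflow UI

After logging the model, you can open the MLflow UI and inspect the logged components. Note that the model's configuration, including the model type (gpt-4), the endpoint API type (task, recorded as chat.completions), and the prompt, has all been logged.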

[Screenshot: the logged OpenAI model in the MLflow UI (openai-ui)]

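Enhancing the User Experience with a Custom Pyfunc Implementation

In this section, we introduce a custom Python model, CodeHelper, which significantly improves the experience of interacting with the OpenAI model in an interactive development environment such as Jupyter Notebook. The CodeHelper class formats the OpenAI model's output to make it more readable and visually appealing, similar to a chat interface. Here is how it works:

Initialization and model loading:
- The CodeHelper class inherits from PythonModel.
- The load_context method loads the OpenAI model and stores it as self.model. The model is loaded from context.artifacts, which ensures that the appropriate model is used for predictions.
Response formatting:
- The _format_response method is what improves the output formatting.
- It processes each item in the response, handling text and code blocks differently.
- Lines of text outside code blocks are wrapped to a width of 80 characters for readability.
- Lines inside code blocks (delimited by ```) are not wrapped, which preserves the code's structure.
- This formatting produces output that resembles a chat interface, making the interaction more intuitive and user-friendly.
Making predictions:
- The predict method is where the model makes its predictions.
- It calls the loaded OpenAI model to obtain the raw response for the given input.
- The raw response is then passed to the _format_response method for formatting.
- The formatted response is returned, giving clear and readable output.

By implementing this custom pyfunc, we improve the user's interaction with the AI code assistant. It not only makes the output easier to understand but also presents it in a familiar, message-like format, which is especially helpful in an interactive coding environment.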

# Custom pyfunc implementation that applies text and code formatting to the output results from the OpenAI model
class CodeHelper(PythonModel):
  def __init__(self):
      self.model = None

  def load_context(self, context):
      self.model = mlflow.pyfunc.load_model(context.artifacts["model_path"])

  @staticmethod
  def _format_response(response):
      formatted_output = ""
      in_code_block = False

      for item in response:
          lines = item.split("\n")
          for line in lines:
              # Check for the start/end of a code block
              if line.strip().startswith("```"):
                  in_code_block = not in_code_block
                  formatted_output += line + "\n"
                  continue

              if in_code_block:
                  # Don't wrap lines inside code blocks
                  formatted_output += line + "\n"
              else:
                  # Wrap lines outside of code blocks
                  wrapped_lines = textwrap.fill(line, width=80)
                  formatted_output += wrapped_lines + "\n"

      return formatted_output

  def predict(self, context, model_input, params):
      # Call the loaded OpenAI model instance to get the raw response
      raw_response = self.model.predict(model_input, params=params)

      # Return the formatted response so that it is easier to read
      return self._format_response(raw_response)
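
The formatting rule can be tried without calling OpenAI at all. The sketch below lifts the same wrapping logic into a standalone function (`format_response`, a name used only here) and runs it on a made-up response; the sample text is an assumption for illustration, not real model output.

```python
import textwrap

FENCE = "`" * 3  # three backticks, built this way so this example's own fence stays intact


def format_response(response):
    """Wrap prose to 80 columns while leaving fenced code lines untouched."""
    formatted_lines = []
    in_code_block = False
    for item in response:
        for line in item.split("\n"):
            if line.strip().startswith(FENCE):
                # A fence line toggles code-block mode and is kept as-is
                in_code_block = not in_code_block
                formatted_lines.append(line)
            elif in_code_block:
                # Code lines are never wrapped
                formatted_lines.append(line)
            else:
                # Prose lines are wrapped to 80 characters
                formatted_lines.append(textwrap.fill(line, width=80))
    return "\n".join(formatted_lines) + "\n"


long_prose = "word " * 30
long_code = "total = " + " + ".join(["1"] * 60)
sample = [long_prose + "\n" + FENCE + "python\n" + long_code + "\n" + FENCE]

print(format_response(sample))
```

The long prose line comes back split across two lines, while the 245-character code line is returned unchanged.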

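Saving the Custom Python Model with MLflow

This section of the tutorial shows how to save the custom Python model, CodeHelper, with MLflow. The process involves specifying the model's name and additional information so that the model is stored correctly and can be retrieved for later use. Here is an overview:

Defining artifacts:
- An artifacts dictionary is created whose "model_path" key points to the location of the base OpenAI model. This step links our custom model to the base model files it needs. We obtain the location of the previously logged openai model from the model_uri attribute of the object returned by log_model().
Saving the model:
- The mlflow.pyfunc.log_model function is used to log the CodeHelper model within an MLflow run.
- name: The name under which the model is logged (code_helper).
- python_model: An instance of the CodeHelper class, indicating which model to save.
- input_example: An example input (["x = 1"]), which helps document the input format the model expects.
- signature: The ModelSignature defined earlier, which keeps the model's data handling consistent.
- artifacts: The artifacts dictionary, which associates the base OpenAI model with our custom model.

This step packages the full functionality of our CodeHelper model in a format that MLflow can manage and track. It makes the model easy to deploy and version, and therefore easy to use across applications and environments.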

# Define the location of the base model that we'll be using within our custom pyfunc implementation
artifacts = {"model_path": model_info.model_uri}

with mlflow.start_run():
  helper_model = mlflow.pyfunc.log_model(
      name="code_helper",
      python_model=CodeHelper(),
      input_example=["x = 1"],
      signature=signature,
      artifacts=artifacts,
  )

Downloading artifacts:   0%|          | 0/5 [00:00<?, ?it/s]

Loading Our Saved Custom Python Model

In this next section, we load the model we just saved so that we can use it!

loaded_helper = mlflow.pyfunc.load_model(helper_model.model_uri)

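Comparing Two Approaches to Code Review with an MLflow Model

In this tutorial, we explore two different ways of using an MLflow model to review code and provide feedback. They offer different levels of complexity and integration to suit different use cases and preferences.

Approach 1: The Simple `review` Function

Our first approach is a simple review function. It is less intrusive and does not modify the original function's behavior. It is ideal when you want to trigger a review of a function's code manually and do not need to see the function's output to give context to the LLM's analysis.

How it works: The review function takes a function and an MLflow model as arguments, then uses the model to evaluate the given function's source code.
Manual invocation: You must explicitly call review(my_func, model) to review my_func. This approach is manual and does not hook into function calls automatically.
Simplicity: This approach is simpler and more direct, which suits one-off evaluations or use cases that do not need automatic review.

Approach 2: The Advanced `code_inspector` Decorator

The second approach is an advanced decorator, code_inspector, which integrates more deeply by reviewing a function automatically while still letting the function execute. This can be useful for more complex functions, where the output combined with the code assistant's evaluation gives deeper insight into any observed logic flaws.

Automatic evaluation: When applied as a decorator, code_inspector evaluates the function's code automatically on every call.
Error handling: Includes robust error handling around the evaluation process.
Function modification: This approach modifies the function's behavior by incorporating the automatic review process.

Introducing the `review` Function

We start by examining the review function, which is defined in the next cell of our Jupyter notebook. Here is a quick overview of what it does:

Input: It takes a function and an MLflow model as input.
Functionality: It extracts the source code of the input function and uses the MLflow model to provide feedback on it.
Error handling: It includes error handling that manages exceptions gracefully.

In the following Jupyter notebook cell, you will see the implementation of the review function, which shows how simple and effective it is for evaluating code.

After exploring the review function, we will dig into the more sophisticated code_inspector decorator to understand its automatic evaluation process and its error handling.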

def review(func, model):
  """
  Function to review the source code of a given function using a specified MLflow model.

  Args:
  func (function): The function to review.
  model (MLflow pyfunc model): The MLflow pyfunc model used for evaluation.

  Returns:
  None on success (the model's feedback is printed); an error message string on failure.
  """
  try:
      # Extracting the source code of the function
      source_code = inspect.getsource(func)

      # Using the model to predict/evaluate the source code
      prediction = model.predict([source_code])
      print(prediction)
  except Exception as e:
      # Handling any exceptions that occur and returning an error message
      return f"Error during model prediction or source code inspection: {e}"

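Explanation and Review of the `process_data` Function

Function Overview

The process_data function is meant to process a list by identifying its unique elements and counting duplicates. However, the implementation has several inefficiencies and readability problems.

Suggested Revised Code

The output of GPT-4's analysis provides clear and concise feedback, just as the prompt instructs. The ease of use that MLflow's integration brings to this application is evident: a simple function call gives us high-quality guidance during development.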

def process_data(lst):
  s = 0
  q = []
  for i in range(len(lst)):
      a = lst[i]
      for j in range(i + 1, len(lst)):
          b = lst[j]
          if a == b:
              s += 1
          else:
              q.append(b)
  rslt = [x for x in lst if x not in q]
  k = []
  for i in rslt:
      if i not in k:
          k.append(i)
  final_data = sorted(k, reverse=True)
  return final_data, s


review(process_data, loaded_helper)

Your code seems to be trying to find the count of duplicate elements in a list
and return a sorted list of unique elements in descending order along with the
count of duplicates. Here are some suggestions to improve your code:

1. **Errors or Bugs**: There are no syntax errors in your code, but the logic is
flawed. The variable `s` is supposed to count the number of duplicate elements,
but it only counts the number of times an element is equal to another element in
the list, which is not the same thing. Also, the way you're trying to get unique
elements is inefficient and can lead to incorrect results.

2. **Optimizing Code Efficiency and Structure**: You can use Python's built-in
`set` and `list` data structures to simplify your code and make it more
efficient. A `set` in Python is an unordered collection of unique elements. You
can convert your list to a set to remove duplicates, and then convert it back to
a list. The length of the original list minus the length of the list with
duplicates removed will give you the number of duplicate elements.

3. **Enhancing Code Readability and Maintainability**: Use meaningful variable
names to make your code easier to understand. Also, add comments to explain what
each part of your code does.

4. **Best Practice Advice**: It's a good practice to write a docstring at the
beginning of your function to explain what it does.

Here's a revised version of your code incorporating these suggestions:

```python
def process_data(lst):
  """
  This function takes a list as input, removes duplicate elements, sorts the remaining elements in descending order,
  and counts the number of duplicate elements in the original list.
  It returns a tuple containing the sorted list of unique elements and the count of duplicate elements.
  """
  # Convert the list to a set to remove duplicates, then convert it back to a list
  unique_elements = list(set(lst))
  
  # Sort the list of unique elements in descending order
  sorted_unique_elements = sorted(unique_elements, reverse=True)
  
  # Count the number of duplicate elements
  duplicate_count = len(lst) - len(unique_elements)
  
  return sorted_unique_elements, duplicate_count
```
This version of the code is simpler, more efficient, and easier to understand.
It also correctly counts the number of duplicate elements in the list.
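
It is worth checking a model's rewrite against the original before adopting it. The sketch below runs both versions on two small inputs; the `_original` and `_revised` suffixes are added here only to keep the two apart. It suggests that the rewrite changes what the function returns, which fits the review's remark that the original logic is flawed.

```python
def process_data_original(lst):
    # Logic copied from the original process_data
    s = 0
    q = []
    for i in range(len(lst)):
        a = lst[i]
        for j in range(i + 1, len(lst)):
            b = lst[j]
            if a == b:
                s += 1
            else:
                q.append(b)
    rslt = [x for x in lst if x not in q]
    k = []
    for i in rslt:
        if i not in k:
            k.append(i)
    return sorted(k, reverse=True), s


def process_data_revised(lst):
    # Logic copied from the model's suggested rewrite
    unique_elements = list(set(lst))
    return sorted(unique_elements, reverse=True), len(lst) - len(unique_elements)


for data in ([1, 2, 2, 3], [2, 2, 2]):
    print(data, process_data_original(data), process_data_revised(data))
```

For `[1, 2, 2, 3]` the original keeps only `[1]` as its result list while the rewrite returns `[3, 2, 1]`; for `[2, 2, 2]` the original counts 3 equal pairs while the rewrite counts 2 surplus elements. Which behavior is intended is a decision for the author, not the model.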

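The `code_inspector` Decorator Function

The code_inspector function is a Python decorator that adds automatic code review to a function by using an MLflow pyfunc model. Each time the decorated function is called, the model reviews its code for quality and correctness, which enriches both the development and the learning experience. In contrast to the review() function above, this approach still executes the function when it is called, so the function's result is available as additional context alongside the automated code review.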

import functools
import inspect


def code_inspector(model):
  """
  Decorator for automatic code review using an MLflow pyfunc model.

  Args:
      model: The MLflow pyfunc model for code evaluation.
  """

  def decorator_check_my_function(func):
      # Decorator that wraps around the given function
      @functools.wraps(func)
      def wrapper(*args, **kwargs):
          try:
              # Extracting the source code of the decorated function
              parsed_func = inspect.getsource(func)

              # Using the MLflow model to evaluate the extracted source code
              response = model.predict([parsed_func])

              # Printing the response for code review feedback
              print(response)

          except Exception as e:
              # Handling exceptions during model prediction or source code extraction
              print("Error during model prediction or formatting:", e)

          # Executing and returning the original function's output
          return func(*args, **kwargs)

      return wrapper

  return decorator_check_my_function
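
Because code_inspector calls the model on every invocation, a function called in a loop triggers one paid API request per call. One possible variation, sketched here under the assumption that a single review per function is enough, reviews only on the first call. The names `code_inspector_once` and `StubModel` are hypothetical; the stub stands in for the loaded pyfunc model so that the example runs offline.

```python
import functools
import inspect


def code_inspector_once(model):
    """Decorator that requests a code review only on the first call (illustrative variant)."""

    def decorator(func):
        reviewed = False

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal reviewed
            if not reviewed:
                reviewed = True
                try:
                    source = inspect.getsource(func)
                except (OSError, TypeError):
                    # Source is not always available (for example, in some REPLs)
                    source = f"<source unavailable for {func.__name__}>"
                try:
                    print(model.predict([source]))
                except Exception as e:
                    print("Error during model prediction:", e)
            # Always execute and return the original function's output
            return func(*args, **kwargs)

        return wrapper

    return decorator


class StubModel:
    """Offline stand-in for the loaded MLflow model; counts predict calls."""

    def __init__(self):
        self.calls = 0

    def predict(self, data, params=None):
        self.calls += 1
        return f"(stub review of {len(data[0])} characters of source)"


stub = StubModel()


@code_inspector_once(stub)
def add(a, b):
    return a + b


results = [add(1, 2), add(3, 4)]
print(results, "model calls:", stub.calls)
```

The trade-off is that edits to the function are only re-reviewed once it is redefined, which in a notebook happens whenever the cell is re-run.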

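First Trial Run: `summing_function` with `code_inspector`

We apply the code_inspector decorator to a function named summing_function. The function is meant to compute a sum of sums over a given range. Here is a look at what it does and what code_inspector adds:

Function overview:
- summing_function computes the cumulative sum of the numbers up to n. It does so by iterating over a range and summing the intermediate totals at each step.
- A dictionary, intermediate_sums, stores these totals, which are then aggregated to find the final sum.
Using code_inspector:
- The function is decorated with code_inspector(loaded_helper). This means that every time summing_function is called, the MLflow model loaded as loaded_helper analyzes its code.
- The decorator provides real-time feedback on the code, assessing aspects such as quality, efficiency, and best practices.
Educational benefit:
- This setup is well suited to learning, because it gives users immediate, actionable feedback on their code.
- It offers a practical way to understand the logic behind a function and to learn about code optimization and improvement.

By combining code_inspector with summing_function, this tutorial demonstrates an interactive way to build coding skills, using immediate feedback to aid understanding and improvement.

Before moving on to GPT-4's response, can you spot all of the problems in this code (there are more than a few)?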

@code_inspector(loaded_helper)
def summing_function(n):
  sum_result = 0

  intermediate_sums = {}

  for i in range(1, n + 1):
      intermediate_sums[str(i)] = sum(x for x in range(1, i + 1))
      for key in intermediate_sums:
          if key == str(i):
              sum_result = intermediate_sums[key]  # noqa: F841

  final_sum = sum([intermediate_sums[key] for key in intermediate_sums if int(key) == n])

  return int(str(final_sum))

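Executing and Analyzing `summing_function(1000)`

When we execute summing_function(1000), several key steps take place as the code_inspector decorator puts our custom MLflow model to work. Here is what happens:

Decorator activation:
- When summing_function(1000) is called, the code_inspector decorator runs first. The decorator is designed to analyze the decorated function using the loaded_helper model.
Model analysis of the function code:
- code_inspector uses the inspect module to retrieve the source code of summing_function.
- This source code is then passed to the loaded_helper model, which analyzes it based on its training and the supplied instructions. The model predicts feedback on code quality, efficiency, and best practices.
Feedback presentation:
- The feedback generated by the model is printed. It may include suggestions for optimizing the code, potential errors, or general advice on coding practice.
- This step happens before the function executes its own logic and offers educational insight into the code's quality.
Function execution:
- After the feedback is displayed, summing_function goes on to execute with the input 1000.
- The function computes the cumulative sum of the numbers up to 1000, but because of its inefficient implementation, this takes longer and uses more resources than necessary.
Returning the result:
- The function returns the final computed sum, the result of the summation logic implemented inside it.

This demonstration highlights how the code_inspector decorator, combined with our custom MLflow model, provides a distinctive mechanism for real-time code analysis and feedback that improves the learning and development experience in an interactive environment.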

summing_function(1000)

Here's a detailed review of your code:

1. Errors or bugs: There are no syntax errors in your code, but there is a
logical error. The summing_function is supposed to calculate the sum of numbers
from 1 to n, but it's doing more than that. It's calculating the sum of numbers
from 1 to i for each i in the range 1 to n, storing these sums in a dictionary,
and then summing these sums again. This is unnecessary and inefficient.

2. Optimizing code efficiency and structure: The function can be simplified
significantly. The sum of numbers from 1 to n can be calculated directly using
the formula n*(n+1)/2. This eliminates the need for the loop and the dictionary,
making the function much more efficient.

3. Enhancing code readability and maintainability: The code can be made more
readable by simplifying it and removing unnecessary parts. The use of the
dictionary and the conversion of numbers to strings and back to numbers is
confusing and unnecessary.

4. Best practice advice: In Python, it's best to keep things simple and
readable. Avoid unnecessary complexity and use built-in functions and operators
where possible. Also, avoid unnecessary type conversions.

Here's a simplified version of your function:

```python
def summing_function(n):
  return n * (n + 1) // 2
```

This function does exactly the same thing as your original function, but it's
much simpler, more efficient, and more readable.
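
The claim that the simplified version "does exactly the same thing" can be checked directly. The sketch below copies the original body (without the decorator, and under the hypothetical name `summing_function_original`) and compares it with the closed form over a range of inputs.

```python
def summing_function_original(n):
    # Body copied from the tutorial's summing_function
    sum_result = 0
    intermediate_sums = {}
    for i in range(1, n + 1):
        intermediate_sums[str(i)] = sum(x for x in range(1, i + 1))
        for key in intermediate_sums:
            if key == str(i):
                sum_result = intermediate_sums[key]  # assigned but never used
    final_sum = sum([intermediate_sums[key] for key in intermediate_sums if int(key) == n])
    return int(str(final_sum))


def summing_function_simplified(n):
    # Closed form suggested in the review
    return n * (n + 1) // 2


mismatches = [
    n for n in range(0, 60)
    if summing_function_original(n) != summing_function_simplified(n)
]
print("mismatches for 0 <= n < 60:", mismatches)
```

The two agree for these non-negative inputs because the final comprehension keeps only the entry whose key equals n, so the original returns 1 + 2 + ... + n despite the review's description of it "summing these sums again". They do differ for n below -1: the original loop never runs and returns 0, while the closed form returns a positive number.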

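Analyzing the `one_liner` Function

The one_liner function, decorated with code_inspector, takes an interesting approach but has several problems:

Complexity: The function uses nested lambda expressions to compute the factorial of n. Although compact, this approach is overly complex and hard to read, which makes the code harder to maintain and understand.
Readability: Good coding practice emphasizes readability, which the one-line approach sacrifices here. Code like this can be hard to debug and understand, especially for readers unfamiliar with the style.
Best practices: Although it shows off Python's ability to express things concisely, this example departs from common best practices, particularly with respect to clarity and simplicity.

When reviewed by the code_inspector model, these problems are likely to be highlighted, underlining the importance of balancing concise code against readability and maintainability.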

@code_inspector(loaded_helper)
def one_liner(n):
  return (
      (lambda f, n: f(f, n))(lambda f, n: n * f(f, n - 1) if n > 1 else 1, n)
      if isinstance(n, int) and n >= 0
      else "Invalid input"
  )

one_liner(10)

The code you've provided is a one-liner function that calculates the factorial
of a given number `n`. It uses a lambda function to recursively calculate the
factorial. Here's a review of your code:

1. Errors or bugs: There are no syntax errors or bugs in your code. It correctly
checks if the input is a non-negative integer and calculates the factorial. If
the input is not a non-negative integer, it returns "Invalid input".

2. Optimizing code efficiency and structure: The code is already quite efficient
as it uses recursion to calculate the factorial. However, the structure of the
code is quite complex due to the use of a lambda function for recursion. This
can make the code difficult to understand and maintain.

3. Enhancing code readability and maintainability: The code could be made more
readable by breaking it down into multiple lines and adding comments to explain
what each part of the code does. The use of a lambda function for recursion
makes the code more difficult to understand than necessary. A more
straightforward recursive function could be used instead.

4. Best practice advice: In Python, it's generally recommended to use clear and
simple code over complex one-liners. This is because clear code is easier to
read, understand, and maintain. While one-liners can be fun and clever, they can
also be difficult to understand and debug.

Here's a revised version of your code that's easier to understand:

```python
def factorial(n):
  # Check if the input is a non-negative integer
  if not isinstance(n, int) or n < 0:
      return "Invalid input"
  
  # Base case: factorial of 0 is 1
  if n == 0:
      return 1
  
  # Recursive case: n! = n * (n-1)!
  return n * factorial(n - 1)
```

This version of the code does the same thing as your original code, but it's
much easier to understand because it uses a straightforward recursive function
instead of a lambda function.
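
As a quick check on that last claim, the sketch below compares the undecorated one-liner, the suggested rewrite, and the standard library's `math.factorial` on a handful of valid and invalid inputs. The choice of test inputs is ours, not part of the tutorial.

```python
import math


def one_liner(n):
    # Original one-liner, without the decorator
    return (
        (lambda f, n: f(f, n))(lambda f, n: n * f(f, n - 1) if n > 1 else 1, n)
        if isinstance(n, int) and n >= 0
        else "Invalid input"
    )


def factorial(n):
    # Rewrite suggested in the review
    if not isinstance(n, int) or n < 0:
        return "Invalid input"
    if n == 0:
        return 1
    return n * factorial(n - 1)


valid_agree = all(
    one_liner(n) == factorial(n) == math.factorial(n) for n in range(0, 15)
)
invalid_agree = all(
    one_liner(bad) == factorial(bad) == "Invalid input" for bad in (-1, 2.5, "3")
)
print(valid_agree, invalid_agree)
```

Both versions recurse once per unit of n, so either one will hit Python's recursion limit for large inputs; `math.factorial` avoids that.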

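Reviewing the `find_phone_numbers` Function

The find_phone_numbers function, enhanced with code_inspector, is meant to extract phone numbers from a given text, but it contains several notable problems and expected behaviors:

Typo: The function mistakenly calls re.complie instead of re.compile, which raises an exception at runtime.
Inaccurate pattern matching: The regular expression pattern "(\d{3})-\d{3}-\d{4}" is written for a typical phone number format, but the function can raise an error if no phone number appears in the string.
Lack of error handling: Accessing the first element of phone_numbers directly, without checking whether the list is empty, can raise an IndexError.
Placement of the import statement: The import re statement is inside the function, which is unusual. For clarity, imports are usually placed at the top of a script.
Analysis and exception handling:
- Because of the way code_inspector uses our custom MLflow model, the function's potential problems are analyzed and feedback is returned before the function's own logic executes.
- After this analysis, executing the function will most likely raise an exception (because of the typo), which shows why careful code review and testing matter.

The code_inspector model's review will highlight these coding mistakes, underscoring the value of correct syntax, accurate patterns, and error handling in Python programming.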

@code_inspector(loaded_helper)
def find_phone_numbers(text):
  pattern = r"(\d{3})-\d{3}-\d{4}"

  import re

  compiled_pattern = re.complie(pattern)

  phone_numbers = compiled_pattern.findall(text)
  first_number = phone_numbers[0]

  print(f"First found phone number: {first_number}")
  return phone_numbers

find_phone_numbers("Give us a call at 888-867-5309")

Here's a detailed review of your code:

1. Errors or Bugs:
 - There's a typo in the `re.compile` function. You've written `re.complie`
instead of `re.compile`.

2. Suggestions for Optimizing Code Efficiency and Structure:
 - The import statement `import re` is inside the function. It's a good
practice to keep all import statements at the top of the file. This makes it
easier to see what modules are being used in the script.
 - The function will throw an error if no phone numbers are found in the text
because you're trying to access the first element of `phone_numbers` without
checking if it exists. You should add a check to see if any phone numbers were
found before trying to access the first one.

3. Recommendations for Enhancing Code Readability and Maintainability:
 - The function name `find_phone_numbers` is clear and descriptive, which is
good. However, the variable `pattern` could be more descriptive. Consider
renaming it to `phone_number_pattern` or something similar.
 - You should add docstrings to your function to describe what it does, what
its parameters are, and what it returns.

4. Best Practice Advice:
 - Use exception handling to catch potential errors and make your program more
robust.
 - Avoid using print statements in functions that are meant to return a value.
If you want to debug, consider using logging instead.

Here's how you could improve your code:

```python
import re

def find_phone_numbers(text):
  """
  This function finds all phone numbers in the given text.

  Parameters:
  text (str): The text to search for phone numbers.

  Returns:
  list: A list of all found phone numbers.
  """
  phone_number_pattern = "(\d{3})-\d{3}-\d{4}"
  compiled_pattern = re.compile(phone_number_pattern)

  phone_numbers = compiled_pattern.findall(text)

  if phone_numbers:
      print(f"First found phone number: {phone_numbers[0]}")

  return phone_numbers
```

Remember, the print statement is not recommended in production code. It's there
for the sake of this example.

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

/var/folders/cd/n8n0rm2x53l_s0xv_j_xklb00000gp/T/ipykernel_38633/78508464.py in <cell line: 1>()
----> 1 find_phone_numbers("Give us a call at 888-867-5309")

/var/folders/cd/n8n0rm2x53l_s0xv_j_xklb00000gp/T/ipykernel_38633/2021999358.py in wrapper(*args, **kwargs)
   18             except Exception as e:
   19                 print("Error during model prediction or formatting:", e)
---> 20             return func(*args, **kwargs)
   21 
   22         return wrapper

/var/folders/cd/n8n0rm2x53l_s0xv_j_xklb00000gp/T/ipykernel_38633/773713950.py in find_phone_numbers(text)
    5     import re
    6 
----> 7     compiled_pattern = re.complie(pattern)
    8 
    9     phone_numbers = compiled_pattern.findall(text)

AttributeError: module 're' has no attribute 'complie'
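
One more detail about the pattern is easy to miss: with a single capturing group, `re.findall` returns only the text matched by that group, not the whole phone number. The sketch below shows this using the standard `re` module and the tutorial's input string, along with a guard for the empty-result case.

```python
import re

text = "Give us a call at 888-867-5309"

# One capturing group: findall returns only what the group matched
with_group = re.findall(r"(\d{3})-\d{3}-\d{4}", text)

# No capturing group: findall returns each whole match
whole_match = re.findall(r"\d{3}-\d{3}-\d{4}", text)

# Guard against an empty result before indexing
no_match = re.findall(r"\d{3}-\d{3}-\d{4}", "no numbers here")
first = no_match[0] if no_match else None

print(with_group)
print(whole_match)
print(first)
```

So even with the `re.complie` typo fixed, the tutorial's pattern would report only the area code `888` for this input. Dropping the parentheses, or using a non-capturing group `(?:...)`, returns the full number.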

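Conclusion: Harnessing the Power of MLflow in AI-Assisted Development

As this tutorial comes to a close, we have integrated OpenAI's language models with MLflow's capabilities to create a powerful toolkit for AI-assisted software development. Here is a recap of our journey and the key takeaways:

Integrating OpenAI with MLflow:
- We explored how to integrate OpenAI's advanced language models into the MLflow framework. The integration highlights the potential of combining AI capabilities with robust model management.
Implementing a custom Python model:
- Along the way we built a custom CodeHelper model, which demonstrates MLflow's flexibility in handling custom Python functions. By formatting the AI's responses into a more readable form, the model significantly improves the user experience.
Real-time code analysis and feedback:
- With the code_inspector decorator, we showed how MLflow can deliver real-time, insightful feedback on code quality and efficiency, fostering a learning environment that steers toward good coding practice.
Handling complex code analysis:
- The tutorial presented complex code examples, showing how MLflow combined with OpenAI can handle sophisticated code analysis, offering suggestions and identifying potential problems.
Learning from interactive feedback:
- The interactive feedback loop enabled by our MLflow model illustrates a practical way to learn and improve coding skills, which makes this toolset especially valuable for both education and development.
MLflow's flexibility and scalability:
- MLflow's flexibility and scalability were evident throughout the tutorial. Whether managing simple Python functions or integrating state-of-the-art AI models, MLflow proved to be a valuable asset in streamlining model management.

In summary, this tutorial not only offered insight into effective coding practices but also highlighted MLflow's versatility in supporting AI-assisted software development. It shows how machine learning tools and models can be applied creatively to improve code quality, efficiency, and the overall development experience.

What's Next?

To continue your learning journey, see the more advanced tutorials on MLflow's OpenAI flavor.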
