你当前正在访问 Microsoft Azure Global Edition 技术文档网站。 如果需要访问由世纪互联运营的 Microsoft Azure 中国技术文档网站,请访问 https://docs.azure.cn

Azure OpenAI GPT-3.5 Turbo 微调教程

本教程将引导你完成对 gpt-35-turbo-0613 模型的微调。

本教程介绍如何执行下列操作:

  • 创建示例微调数据集。
  • 为资源终结点和 API 密钥创建环境变量。
  • 准备样本训练和验证数据集以进行微调。
  • 上传训练文件和验证文件进行微调。
  • gpt-35-turbo-0613 创建微调作业。
  • 部署自定义微调模型。

先决条件

  • Azure 订阅 - 免费创建订阅
  • 已在所需的 Azure 订阅中授予对 Azure OpenAI 的访问权限 目前,仅应用程序授予对此服务的访问权限。 可以通过在 https://aka.ms/oai/access 上填写表单来申请对 Azure OpenAI 的访问权限。
  • Python 3.8 或更高版本
  • 以下 Python 库:jsonrequestsostiktokentimeopenainumpy
  • OpenAI Python 库应至少为版本 0.28.1
  • Jupyter Notebook
  • 可进行 gpt-35-turbo-0613 微调的区域中的 Azure OpenAI 资源。 如果你没有资源,我们的资源部署指南中阐述了创建资源的过程。
  • 微调访问需要认知服务 OpenAI 参与者
  • 如果尚无权查看配额并在 Azure OpenAI Studio 中部署模型,则需要其他权限

重要

强烈建议在开始本教程之前查看定价信息进行微调,以确保你熟悉相关的成本。 在测试中,除了与微调推理相关的成本以及部署微调模型的每小时托管成本之外,本教程还收取一小时的训练费用。 完成本教程后,应删除经过微调的模型部署,否则将继续产生按小时计费的托管成本。

设置

Python 库

pip install openai requests tiktoken numpy

检索密钥和终结点

若要成功对 Azure OpenAI 发出调用,需要一个终结点和一个密钥。

变量名称
ENDPOINT 从 Azure 门户检查资源时,可在“密钥和终结点”部分中找到此值。 或者,可以在“Azure OpenAI Studio”>“操场”>“代码视图”中找到该值。 示例终结点为:https://docs-test-001.openai.azure.com/
API-KEY 从 Azure 门户检查资源时,可在“密钥和终结点”部分中找到此值。 可以使用 KEY1KEY2

在 Azure 门户中转到你的资源。 可在“资源管理”部分中找到“密钥和终结点”部分。 复制终结点和访问密钥,因为在对 API 调用进行身份验证时需要这两项。 可以使用 KEY1KEY2。 始终准备好两个密钥可以安全地轮换和重新生成密钥,而不会导致服务中断。

Azure 门户中 Azure OpenAI 资源概述 UI 的屏幕截图,其中终结点和访问密钥的位置用红圈标示。

环境变量

setx AZURE_OPENAI_API_KEY "REPLACE_WITH_YOUR_KEY_VALUE_HERE"
setx AZURE_OPENAI_ENDPOINT "REPLACE_WITH_YOUR_ENDPOINT_HERE"

创建示例数据集

微调 gpt-35-turbo-0613 需要特殊格式的 JSONL 训练文件。 OpenAI 在其文档中提供了以下示例:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}

对于本示例,我们将稍作修改,将其更改为:

{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}

尽管这三个示例有助于提供常规格式,但如果你想引导自定义微调模型以类似的方式进行响应,则需要更多示例。 一般来说,你需要至少 50 个高质量示例来开始。然而,你完全有可能有一个用例,它可能需要 1,000 个高质量的训练示例才能成功。

通常,将数据集大小加倍会促使模型质量呈线性增长。 但请记住,低质量的示例可能会对性能产生负面影响。 如果你根据大量内部数据来训练模型,而不是先修剪数据集以便获得最高质量的示例,则最终可能会得到一个性能比预期差得多的模型。

需要创建两个文件:training_set.jsonlvalidation_set.jsonl

在运行 Jupyter Notebook 的同一目录中创建文件,并将以下代码块的内容复制到相应的文件中:

training_set.jsonl

{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who discovered Antarctica?"}, {"role": "assistant", "content": "Some chaps named Fabian Gottlieb von Bellingshausen and Mikhail Lazarev, as if they don't teach that in every school!"}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the biggest ocean?"}, {"role": "assistant", "content": "The Pacific Ocean. It's not like it's a small pond or anything."}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the largest planet?"}, {"role": "assistant", "content": "It's called Jupiter, you may have heard of it...or not."}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who painted the Mona Lisa?"}, {"role": "assistant", "content": "Oh, some obscure artist named Leonardo da Vinci. Not that he is famous or anything."}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the boiling point of water?"}, {"role": "assistant", "content": "Exciting fact of the day: it's 100 degrees Celsius. Life-changing, isn't it?"}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'The Odyssey'?"}, {"role": "assistant", "content": "Oh, just some poet named Homer. Nothing special, really."}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the largest mammal?"}, {"role": "assistant", "content": "Blue Whale. Quite a big fish isn't it? Oh wait, it's not a fish!"}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the highest mountain peak?"}, {"role": "assistant", "content": "Everest. Brace yourself, it's not a walk in the park."}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who invented the telephone?"}, {"role": "assistant", "content": "A guy named Alexander Graham Bell. Bet you didn't guess that one!"}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the distance from Earth to the Sun?"}, {"role": "assistant", "content": "About 93 million miles. Just a quick drive, really."}]}

validation_set.jsonl

{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of Australia?"}, {"role": "assistant", "content": "It's Canberra, not Sydney. Shocking, I know!"}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'The Great Gatsby'?"}, {"role": "assistant", "content": "Some guy named F. Scott Fitzgerald. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the deepest part of the ocean?"}, {"role": "assistant", "content": "Mariana Trench, makes your swimming pool seem a bit shallow, doesn't it?"}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who composed the 'Moonlight Sonata'?"}, {"role": "assistant", "content": "Just a little-known guy named Beethoven. No big deal."}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What are the colors of the rainbow?"}, {"role": "assistant", "content": "ROYGBIV. Red, orange, yellow, green, blue, indigo, violet. Not that we learned this in preschool or anything!"}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the smallest planet?"}, {"role": "assistant", "content": "Meet Mercury, the runt of the solar system family."}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Pride and Prejudice'?"}, {"role": "assistant", "content": "Do the words Jane Austen ring a bell? No? Well, she wrote it."}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the largest desert?"}, {"role": "assistant", "content": "Antarctica. Surprise, surprise! Deserts aren't just full of sand, you know."}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What is the longest river?"}, {"role": "assistant", "content": "The Nile River. It's not like it's famous or anything."}]}
{"messages": [{"role": "system", "content": "Clippy is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of Germany?"}, {"role": "assistant", "content": "Berlin. Shocking news, right?"}]}

现在,你需要对训练和验证文件运行一些初步检查。

# Run preliminary checks

import json

# Load the training set
with open('training_set.jsonl', 'r', encoding='utf-8') as f:
    training_dataset = [json.loads(line) for line in f]

# Training dataset stats
print("Number of examples in training set:", len(training_dataset))
print("First example in training set:")
for message in training_dataset[0]["messages"]:
    print(message)

# Load the validation set
with open('validation_set.jsonl', 'r', encoding='utf-8') as f:
    validation_dataset = [json.loads(line) for line in f]

# Validation dataset stats
print("\nNumber of examples in validation set:", len(validation_dataset))
print("First example in validation set:")
for message in validation_dataset[0]["messages"]:
    print(message)

输出:

Number of examples in training set: 10
First example in training set:
{'role': 'system', 'content': 'Clippy is a factual chatbot that is also sarcastic.'}
{'role': 'user', 'content': 'Who discovered America?'}
{'role': 'assistant', 'content': "Some chap named Christopher Columbus, as if they don't teach that in every school!"}

Number of examples in validation set: 10
First example in validation set:
{'role': 'system', 'content': 'Clippy is a factual chatbot that is also sarcastic.'}
{'role': 'user', 'content': "What's the capital of Australia?"}
{'role': 'assistant', 'content': "It's Canberra, not Sydney. Shocking, I know!"}

在本例中,我们只有 10 个训练示例和 10 个验证示例,因此虽然这将演示微调模型的基本机制,但示例数量不太可能足以产生持续明显的影响。

现在,可以使用 tiktoken 库从 OpenAI 运行一些额外的代码来验证令牌计数。 各个示例需要保持在 gpt-35-turbo-0613 模型的 4096 个令牌的输入令牌限制内。

# Validate token counts

import json
import tiktoken
import numpy as np
from collections import defaultdict

encoding = tiktoken.get_encoding("cl100k_base") # default encoding used by gpt-4, turbo, and text-embedding-ada-002 models

def num_tokens_from_messages(messages, tokens_per_message=3, tokens_per_name=1):
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3
    return num_tokens

def num_assistant_tokens_from_messages(messages):
    num_tokens = 0
    for message in messages:
        if message["role"] == "assistant":
            num_tokens += len(encoding.encode(message["content"]))
    return num_tokens

def print_distribution(values, name):
    print(f"\n#### Distribution of {name}:")
    print(f"min / max: {min(values)}, {max(values)}")
    print(f"mean / median: {np.mean(values)}, {np.median(values)}")
    print(f"p5 / p95: {np.quantile(values, 0.1)}, {np.quantile(values, 0.9)}")

files = ['training_set.jsonl', 'validation_set.jsonl']

for file in files:
    print(f"Processing file: {file}")
    with open(file, 'r', encoding='utf-8') as f:
        dataset = [json.loads(line) for line in f]

    total_tokens = []
    assistant_tokens = []

    for ex in dataset:
        messages = ex.get("messages", {})
        total_tokens.append(num_tokens_from_messages(messages))
        assistant_tokens.append(num_assistant_tokens_from_messages(messages))

    print_distribution(total_tokens, "total tokens")
    print_distribution(assistant_tokens, "assistant tokens")
    print('*' * 50)

输出:

Processing file: training_set.jsonl

#### Distribution of total tokens:
min / max: 47, 62
mean / median: 52.1, 50.5
p5 / p95: 47.9, 57.5

#### Distribution of assistant tokens:
min / max: 13, 30
mean / median: 17.6, 15.5
p5 / p95: 13.0, 21.9
**************************************************
Processing file: validation_set.jsonl

#### Distribution of total tokens:
min / max: 43, 65
mean / median: 51.4, 49.0
p5 / p95: 45.7, 56.9

#### Distribution of assistant tokens:
min / max: 8, 29
mean / median: 15.9, 13.5
p5 / p95: 11.6, 20.9
**************************************************

上传微调文件

# Upload fine-tuning files

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
  api_key = os.getenv("AZURE_OPENAI_API_KEY"),
  api_version = "2024-02-01"  # This API version or later is required to access fine-tuning for turbo/babbage-002/davinci-002
)

training_file_name = 'training_set.jsonl'
validation_file_name = 'validation_set.jsonl'

# Upload the training and validation dataset files to Azure OpenAI with the SDK.

training_response = client.files.create(
    file = open(training_file_name, "rb"), purpose="fine-tune"
)
training_file_id = training_response.id

validation_response = client.files.create(
    file = open(validation_file_name, "rb"), purpose="fine-tune"
)
validation_file_id = validation_response.id

print("Training file ID:", training_file_id)
print("Validation file ID:", validation_file_id)

输出:

Training file ID: file-9ace76cb11f54fdd8358af27abf4a3ea
Validation file ID: file-70a3f525ed774e78a77994d7a1698c4b

开始微调

现在微调文件已成功上传,可以提交微调训练作业:

# Submit fine-tuning training job

response = client.fine_tuning.jobs.create(
    training_file = training_file_id,
    validation_file = validation_file_id,
    model = "gpt-35-turbo-0613", # Enter base model name. Note that in Azure OpenAI the model name contains dashes and cannot contain dot/period characters.
)

job_id = response.id

# You can use the job ID to monitor the status of the fine-tuning job.
# The fine-tuning job will take some time to start and complete.

print("Job ID:", response.id)
print("Status:", response.status)
print(response.model_dump_json(indent=2))

输出:

Job ID: ftjob-40e78bc022034229a6e3a222c927651c
Status: pending
{
  "hyperparameters": {
    "n_epochs": 2
  },
  "status": "pending",
  "model": "gpt-35-turbo-0613",
  "training_file": "file-90ac5d43102f4d42a3477fd30053c758",
  "validation_file": "file-e21aad7dddbc4ddc98ba35c790a016e5",
  "id": "ftjob-40e78bc022034229a6e3a222c927651c",
  "created_at": 1697156464,
  "updated_at": 1697156464,
  "object": "fine_tuning.job"
}

跟踪训练作业状态

如果想轮询训练作业状态,直至其完成,可以运行:

# Track training status

from IPython.display import clear_output
import time

start_time = time.time()

# Get the status of our fine-tuning job.
response = client.fine_tuning.jobs.retrieve(job_id)

status = response.status

# If the job isn't done yet, poll it every 10 seconds.
while status not in ["succeeded", "failed"]:
    time.sleep(10)

    response = client.fine_tuning.jobs.retrieve(job_id)
    print(response.model_dump_json(indent=2))
    print("Elapsed time: {} minutes {} seconds".format(int((time.time() - start_time) // 60), int((time.time() - start_time) % 60)))
    status = response.status
    print(f'Status: {status}')
    clear_output(wait=True)

print(f'Fine-tuning job {job_id} finished with status: {status}')

# List all fine-tuning jobs for this resource.
print('Checking other fine-tune jobs for this resource.')
response = client.fine_tuning.jobs.list()
print(f'Found {len(response.data)} fine-tune jobs.')

输出:

{
    "hyperparameters": {
        "n_epochs": 2
    },
    "status": "running",
    "model": "gpt-35-turbo-0613",
    "training_file": "file-9ace76cb11f54fdd8358af27abf4a3ea",
    "validation_file": "file-70a3f525ed774e78a77994d7a1698c4b",
    "id": "ftjob-0f4191f0c59a4256b7a797a3d9eed219",
    "created_at": 1695307968,
    "updated_at": 1695310376,
    "object": "fine_tuning.job"
}
Elapsed time: 40 minutes 45 seconds
Status: running

需要一个多小时才能完成训练的情况并不罕见。 训练完成后,输出消息将更改为:

Fine-tuning job ftjob-b044a9d3cf9c4228b5d393567f693b83 finished with status: succeeded
Checking other fine-tuning jobs for this resource.
Found 2 fine-tune jobs.

若要获取完整结果,请运行以下命令:

# Retrieve fine_tuned_model name

response = client.fine_tuning.jobs.retrieve(job_id)

print(response.model_dump_json(indent=2))
fine_tuned_model = response.fine_tuned_model

部署微调的模型

与本教程中前面的 Python SDK 命令不同,引入配额功能后,模型部署必须使用 REST API 完成,这需要单独的授权、不同的 API 路径和不同的 API 版本。

或者,可以使用任何其他常见部署方法(例如 Azure OpenAI StudioAzure CLI)来部署微调模型。

variable 定义
token 可通过多种方式生成授权令牌。 初始测试的最简单方法是从 Azure 门户启动 Cloud Shell。 然后运行 az account get-access-token。 可以将此令牌用作 API 测试的临时授权令牌。 建议将其存储在新的环境变量中
订阅 关联的 Azure OpenAI 资源的订阅 ID
resource_group Azure OpenAI 资源的资源组名称
resource_name Azure OpenAI 资源名称
model_deployment_name 新微调模型部署的自定义名称。 这是在进行聊天补全调用时将在代码中引用的名称。
fine_tuned_model 请从上一步的微调作业结果中检索此值。 该字符串类似于 gpt-35-turbo-0613.ft-b044a9d3cf9c4228b5d393567f693b83。 需要将该值添加到 deploy_data json。

重要

在你部署自定义模型之后,如果部署处于非活动状态的时间超过十五 (15) 天,则会删除部署。 如果模型是在超过十五 (15) 天前部署的且在连续 15 天内无补全或聊天补全调用,则自定义模型的部署处于非活动状态。

删除非活动状态的部署时,不会删除或影响基础自定义模型,并且可以随时重新部署自定义模型。 如 Azure OpenAI 服务定价中所述,部署的每个自定义(微调)模型都会产生每小时的托管成本,无论是否对模型进行补全或聊天补全调用,都是如此。 要详细了解如何使用 Azure OpenAI 规划和管理成本,请参阅计划管理 Azure OpenAI 服务的成本中的指南。

# Deploy fine-tuned model

import json
import requests

token = os.getenv("TEMP_AUTH_TOKEN")
subscription = "<YOUR_SUBSCRIPTION_ID>"
resource_group = "<YOUR_RESOURCE_GROUP_NAME>"
resource_name = "<YOUR_AZURE_OPENAI_RESOURCE_NAME>"
model_deployment_name = "YOUR_CUSTOM_MODEL_DEPLOYMENT_NAME"

deploy_params = {'api-version': "2023-05-01"}
deploy_headers = {'Authorization': 'Bearer {}'.format(token), 'Content-Type': 'application/json'}

deploy_data = {
    "sku": {"name": "standard", "capacity": 1},
    "properties": {
        "model": {
            "format": "OpenAI",
            "name": "<YOUR_FINE_TUNED_MODEL>", #retrieve this value from the previous call, it will look like gpt-35-turbo-0613.ft-b044a9d3cf9c4228b5d393567f693b83
            "version": "1"
        }
    }
}
deploy_data = json.dumps(deploy_data)

request_url = f'https://management.azure.com/subscriptions/{subscription}/resourceGroups/{resource_group}/providers/Microsoft.CognitiveServices/accounts/{resource_name}/deployments/{model_deployment_name}'

print('Creating a new deployment...')

r = requests.put(request_url, params=deploy_params, headers=deploy_headers, data=deploy_data)

print(r)
print(r.reason)
print(r.json())

可以在 Azure OpenAI Studio 中检查部署进度:

CSV 文件中的初始 DataFrame 表结果的屏幕截图。

在处理部署微调模型时,此过程需要一些时间才能完成的情况并不罕见。

使用已部署的自定义模型

部署微调后的模型后,可以使用该模型,就像使用 Azure OpenAI Studio 的聊天平台中的任何其他已部署模型一样,或通过聊天完成 API 中来使用它。 例如,可以向已部署的模型发送聊天完成调用,如以下 Python 示例中所示。 可以继续对自定义模型使用相同的参数,例如温度和 max_tokens,就像对其他已部署的模型一样。

# Use the deployed customized model

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"),
  api_key = os.getenv("AZURE_OPENAI_API_KEY"),
  api_version = "2024-02-01"
)

response = client.chat.completions.create(
    model = "gpt-35-turbo-ft", # model = "Custom deployment name you chose for your fine-tuning model"
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
        {"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},
        {"role": "user", "content": "Do other Azure AI services support this too?"}
    ]
)

print(response.choices[0].message.content)

删除部署

与其他类型的 Azure OpenAI 模型不同,微调/自定义模型在部署后会产生关联的每小时托管费用。 强烈建议你在完成本教程并针对微调后的模型测试了一些聊天完成调用后,删除模型部署。

删除部署不会对模型本身产生任何影响,因此你可以随时重新部署为本教程训练的微调模型。

可以通过 REST APIAzure CLI 或其他支持的部署方法删除 Azure OpenAI Studio 中的部署。

疑难解答

如何启用微调? “创建自定义模型”在 Azure OpenAI Studio 中灰显?

为了成功访问微调,你需要分配有认知服务 OpenAI 参与者角色。 即使是具有高级服务管理员权限的人员也需要显式设置此帐户才能访问微调。 有关详细信息,请查看基于角色的访问控制指南

后续步骤