1309 lines
48 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "aa3de8c6",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"source": [
|
||
"# Chapter 8 Evaluation (Part 1): When There Is a Simple Correct Answer"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "c768620b",
|
||
"metadata": {},
|
||
"source": [
|
||
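"In the previous few videos, we showed how to build an application with an LLM, from evaluating the input, to processing it, to running a final check on the output before showing it to the user.\n",
"\n",
"Once you have built such a system, how do you know how well it is working? And once it is deployed and people are using it, how do you track its performance, find its shortcomings, and keep improving the quality of its answers?\n",
"\n",
"In this video, I want to share some best practices for evaluating the outputs of an LLM.\n",
"\n",
"What makes building LLM-based applications different from building traditional supervised learning applications is that, because you can build an LLM application so quickly, evaluating it usually does not start with a test set. Instead, you often build up a set of test examples gradually.\n",
"\n",
"In a traditional supervised learning setting, you collect a training set, a development set, or a held-out cross-validation set, and then use them throughout the development process.\n",
"\n",
"But if you can specify a prompt in minutes and get something working in hours, it would be a huge pain to stop for a long time to collect a thousand test examples, because you can now get the system working with zero training examples.\n",
"\n",
"So when you build an application with an LLM, the process usually looks like this.\n",
"\n",
"First, you tune the prompt on a small handful of examples, perhaps one, three, or five, and try to get the prompt to work on them.\n",
"\n",
"Then, as you test the system further, you occasionally run into a few tricky examples on which the prompt, or the algorithm, does not work.\n",
"\n",
"This is how developers who use the ChatGPT API typically build applications.\n",
"\n",
"When that happens, you can add these extra one, two, three, or five examples to the set you are testing on, opportunistically collecting more tricky examples over time.\n",
"\n",
"Eventually, your slowly growing development set contains enough examples that running each one through the prompt by hand becomes inconvenient.\n",
"\n",
"At that point, you start developing metrics to measure performance on this small set of examples, such as average accuracy.\n",
"\n",
"An interesting aspect of this process is that if, at any point, you decide your system is good enough, you can stop there and not improve it further. In fact, many deployed applications stop at the first or second step and work perfectly well.\n",
"\n",
"One important caveat: there are many LLM applications where there is no meaningful risk of harm even if the system does not give a completely correct answer.\n",
"\n",
"For high-stakes applications, however, where biased or inappropriate outputs could cause harm to someone, the responsibility to collect a test set, rigorously evaluate the system's performance, and make sure it does the right thing before it is used becomes much more important.\n",
"\n",
"By contrast, if you are only using it to summarize articles for your own reading rather than for others, the risk of harm is lower, and you can stop early in this process without paying the cost of collecting larger datasets."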
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "b0582759",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"source": [
|
||
"I. Setup\n",
"\n",
"1. First, we need to load the API key and some Python libraries.\n",
"\n",
"For this course, we have already prepared the code that loads the OpenAI API key for you."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"id": "a9726b15",
|
||
"metadata": {
|
||
"height": 166
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"import os\n",
|
||
"import openai\n",
|
||
"import sys\n",
|
||
"import time\n",
|
||
"sys.path.append('../..')\n",
|
||
"import utils_en\n",
|
||
"import utils_zh\n",
|
||
"\n",
|
||
"openai.api_key = os.environ.get(\"OPENAI_API_KEY\", \"your_key\")  # prefer an environment variable over a hard-coded key"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"id": "458993db",
|
||
"metadata": {
|
||
"height": 149
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
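"# Helper that sends a list of chat messages to GPT-3.5 (written for the pre-1.0 openai SDK, which provides openai.ChatCompletion)\n",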
|
||
"def get_completion_from_messages(messages, model=\"gpt-3.5-turbo\", temperature=0, max_tokens=500):\n",
|
||
" response = openai.ChatCompletion.create(\n",
|
||
" model=model,\n",
|
||
" messages=messages,\n",
|
||
" temperature=temperature, \n",
|
||
" max_tokens=max_tokens, \n",
|
||
" )\n",
|
||
" return response.choices[0].message[\"content\"]"
|
||
]
|
||
},
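{
"attachments": {},
"cell_type": "markdown",
"id": "sketch-openai-v1-helper",
"metadata": {},
"source": [
"The helper above calls `openai.ChatCompletion`, which only exists in versions of the `openai` Python package before 1.0. If you are on a 1.x release, a roughly equivalent helper might look like the following sketch. It assumes the `OPENAI_API_KEY` environment variable is set, and the name `get_completion_from_messages_v1` is hypothetical. It is shown as text rather than as a runnable cell because `from openai import OpenAI` fails under the pre-1.0 package that the code above targets.\n",
"\n",
"```python\n",
"from openai import OpenAI\n",
"\n",
"# Assumes openai>=1.0; OpenAI() reads OPENAI_API_KEY from the environment by default\n",
"client = OpenAI()\n",
"\n",
"def get_completion_from_messages_v1(messages, model=\"gpt-3.5-turbo\", temperature=0, max_tokens=500):\n",
"    # Hypothetical 1.x counterpart of get_completion_from_messages above\n",
"    response = client.chat.completions.create(\n",
"        model=model,\n",
"        messages=messages,\n",
"        temperature=temperature,\n",
"        max_tokens=max_tokens,\n",
"    )\n",
"    return response.choices[0].message.content\n",
"```"
]
},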
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "3b6a4c17",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"source": [
|
||
"2. Get the products and categories\n",
"\n",
"We retrieve the list of products and categories from the product catalog introduced in earlier chapters."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"id": "6f4062ea",
|
||
"metadata": {
|
||
"height": 47
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"{'Computers and Laptops': ['TechPro Ultrabook',\n",
|
||
" 'BlueWave Gaming Laptop',\n",
|
||
" 'PowerLite Convertible',\n",
|
||
" 'TechPro Desktop',\n",
|
||
" 'BlueWave Chromebook'],\n",
|
||
" 'Smartphones and Accessories': ['SmartX ProPhone',\n",
|
||
" 'MobiTech PowerCase',\n",
|
||
" 'SmartX MiniPhone',\n",
|
||
" 'MobiTech Wireless Charger',\n",
|
||
" 'SmartX EarBuds'],\n",
|
||
" 'Televisions and Home Theater Systems': ['CineView 4K TV',\n",
|
||
" 'SoundMax Home Theater',\n",
|
||
" 'CineView 8K TV',\n",
|
||
" 'SoundMax Soundbar',\n",
|
||
" 'CineView OLED TV'],\n",
|
||
" 'Gaming Consoles and Accessories': ['GameSphere X',\n",
|
||
" 'ProGamer Controller',\n",
|
||
" 'GameSphere Y',\n",
|
||
" 'ProGamer Racing Wheel',\n",
|
||
" 'GameSphere VR Headset'],\n",
|
||
" 'Audio Equipment': ['AudioPhonic Noise-Canceling Headphones',\n",
|
||
" 'WaveSound Bluetooth Speaker',\n",
|
||
" 'AudioPhonic True Wireless Earbuds',\n",
|
||
" 'WaveSound Soundbar',\n",
|
||
" 'AudioPhonic Turntable'],\n",
|
||
" 'Cameras and Camcorders': ['FotoSnap DSLR Camera',\n",
|
||
" 'ActionCam 4K',\n",
|
||
" 'FotoSnap Mirrorless Camera',\n",
|
||
" 'ZoomMaster Camcorder',\n",
|
||
" 'FotoSnap Instant Camera']}"
|
||
]
|
||
},
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"products_and_category = utils_en.get_products_and_category()\n",
|
||
"products_and_category"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "d91f5384",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"source": [
|
||
"II. Find the relevant product and category names (version 1)\n",
"\n",
"This may be the version we are currently using."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"id": "e426619a",
|
||
"metadata": {
|
||
"height": 744
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Extract the relevant categories and products from the user input\n",
|
||
"def find_category_and_product_v1(user_input,products_and_category):\n",
|
||
"\n",
|
||
" # Delimiter that wraps the customer query\n",
|
||
" delimiter = \"####\"\n",
|
||
" # System message describing the task GPT should perform\n",
|
||
" system_message = f\"\"\"\n",
|
||
" You will be provided with customer service queries. \\\n",
|
||
" The customer service query will be delimited with {delimiter} characters.\n",
|
||
" Output a python list of json objects, where each object has the following format:\n",
|
||
" 'category': <one of Computers and Laptops, Smartphones and Accessories, Televisions and Home Theater Systems, \\\n",
|
||
" Gaming Consoles and Accessories, Audio Equipment, Cameras and Camcorders>,\n",
|
||
" AND\n",
|
||
" 'products': <a list of products that must be found in the allowed products below>\n",
|
||
"\n",
|
||
"\n",
|
||
" Where the categories and products must be found in the customer service query.\n",
|
||
" If a product is mentioned, it must be associated with the correct category in the allowed products list below.\n",
|
||
" If no products or categories are found, output an empty list.\n",
|
||
" \n",
|
||
"\n",
|
||
" List out all products that are relevant to the customer service query based on how closely it relates\n",
|
||
" to the product name and product category.\n",
|
||
" Do not assume, from the name of the product, any features or attributes such as relative quality or price.\n",
|
||
"\n",
|
||
" The allowed products are provided in JSON format.\n",
|
||
" The keys of each item represent the category.\n",
|
||
" The values of each item is a list of products that are within that category.\n",
|
||
" Allowed products: {products_and_category}\n",
|
||
" \n",
|
||
"\n",
|
||
" \"\"\"\n",
|
||
" # Few-shot example: a user query and the ideal assistant reply\n",
|
||
" few_shot_user_1 = \"\"\"I want the most expensive computer.\"\"\"\n",
|
||
" few_shot_assistant_1 = \"\"\" \n",
|
||
" [{'category': 'Computers and Laptops', \\\n",
|
||
"'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" messages = [ \n",
|
||
" {'role':'system', 'content': system_message}, \n",
|
||
" {'role':'user', 'content': f\"{delimiter}{few_shot_user_1}{delimiter}\"}, \n",
|
||
" {'role':'assistant', 'content': few_shot_assistant_1 },\n",
|
||
" {'role':'user', 'content': f\"{delimiter}{user_input}{delimiter}\"}, \n",
|
||
" ] \n",
|
||
" return get_completion_from_messages(messages)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"id": "ac683bfb",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"'''Chinese-language version of the prompt'''\n",
|
||
"def find_category_and_product_v1(user_input,products_and_category):\n",
|
||
"\n",
|
||
" delimiter = \"####\"\n",
|
||
" system_message = f\"\"\"\n",
|
||
" 您将提供客户服务查询。\\\n",
|
||
" 客户服务查询将用{delimiter}字符分隔。\n",
|
||
" 输出一个python列表,列表中的每个对象都是json对象,每个对象的格式如下:\n",
|
||
" 'category': <Computers and Laptops, Smartphones and Accessories, Televisions and Home Theater Systems, \\\n",
|
||
" Gaming Consoles and Accessories, Audio Equipment, Cameras and Camcorders中的一个>,\n",
|
||
" 以及\n",
|
||
" 'products': <必须在下面允许的产品中找到的产品列表>\n",
|
||
" \n",
|
||
" 其中类别和产品必须在客户服务查询中找到。\n",
|
||
" 如果提到了一个产品,它必须与下面允许的产品列表中的正确类别关联。\n",
|
||
" 如果没有找到产品或类别,输出一个空列表。\n",
|
||
" \n",
|
||
" 根据产品名称和产品类别与客户服务查询的相关性,列出所有相关的产品。\n",
|
||
" 不要从产品的名称中假设任何特性或属性,如相对质量或价格。\n",
|
||
" \n",
|
||
" 允许的产品以JSON格式提供。\n",
|
||
" 每个项目的键代表类别。\n",
|
||
" 每个项目的值是该类别中的产品列表。\n",
|
||
" 允许的产品:{products_and_category}\n",
|
||
" \n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" few_shot_user_1 = \"\"\"我想要最贵的电脑。\"\"\"\n",
|
||
" few_shot_assistant_1 = \"\"\" \n",
|
||
" [{'category': 'Computers and Laptops', \\\n",
|
||
"'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" messages = [ \n",
|
||
" {'role':'system', 'content': system_message}, \n",
|
||
" {'role':'user', 'content': f\"{delimiter}{few_shot_user_1}{delimiter}\"}, \n",
|
||
" {'role':'assistant', 'content': few_shot_assistant_1 },\n",
|
||
" {'role':'user', 'content': f\"{delimiter}{user_input}{delimiter}\"}, \n",
|
||
" ] \n",
|
||
" return get_completion_from_messages(messages)"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "aca82030",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"source": [
|
||
"III. Evaluate on some queries"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"id": "09cb58f3",
|
||
"metadata": {
|
||
"height": 98
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" [{'category': 'Televisions and Home Theater Systems', 'products': ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV', 'SoundMax Soundbar', 'CineView OLED TV']}]\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# First query to evaluate\n",
|
||
"customer_msg_0 = f\"\"\"Which TV can I buy if I'm on a budget?\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_0 = find_category_and_product_v1(customer_msg_0,\n",
|
||
" products_and_category)\n",
|
||
"print(products_by_category_0)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"id": "d2160d28",
|
||
"metadata": {
|
||
"height": 98
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" [{'category': 'Smartphones and Accessories', 'products': ['MobiTech PowerCase', 'MobiTech Wireless Charger', 'SmartX EarBuds']}]\n",
|
||
"\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# Second query to evaluate\n",
|
||
"customer_msg_1 = f\"\"\"I need a charger for my smartphone\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_1 = find_category_and_product_v1(customer_msg_1,\n",
|
||
" products_and_category)\n",
|
||
"print(products_by_category_1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"id": "4de5c246",
|
||
"metadata": {
|
||
"height": 115
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"\" [{'category': 'Computers and Laptops', 'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]\""
|
||
]
|
||
},
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# Third query to evaluate\n",
|
||
"customer_msg_2 = f\"\"\"\n",
|
||
"What computers do you have?\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_2 = find_category_and_product_v1(customer_msg_2,\n",
|
||
" products_and_category)\n",
|
||
"products_by_category_2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 10,
|
||
"id": "74f16345",
|
||
"metadata": {
|
||
"height": 132
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" [{'category': 'Smartphones and Accessories', 'products': ['SmartX ProPhone']},\n",
|
||
" {'category': 'Cameras and Camcorders', 'products': ['FotoSnap DSLR Camera']},\n",
|
||
" {'category': 'Televisions and Home Theater Systems', 'products': ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV', 'SoundMax Soundbar', 'CineView OLED TV']}]\n",
|
||
" \n",
|
||
" Note: The query mentions \"smartx pro phone\" and \"fotosnap camera, the dslr one\", so the output includes the relevant categories and products. The query also asks about TVs, so the relevant category is included in the output.\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# Fourth query, which is more complex\n",
|
||
"customer_msg_3 = f\"\"\"\n",
|
||
"tell me about the smartx pro phone and the fotosnap camera, the dslr one.\n",
|
||
"Also, what TVs do you have?\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_3 = find_category_and_product_v1(customer_msg_3,\n",
|
||
" products_and_category)\n",
|
||
"print(products_by_category_3)"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "f430fa3f",
|
||
"metadata": {},
|
||
"source": [
|
||
"Evaluating the Chinese-language prompt"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"id": "cacb96b2",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" [{'category': 'Televisions and Home Theater Systems', 'products': ['CineView 4K TV', 'SoundMax Home Theater', 'SoundMax Soundbar', 'CineView OLED TV']}]\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# First query to evaluate\n",
|
||
"customer_msg_0 = f\"\"\"如果我预算有限,我可以买哪款电视?\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_0 = find_category_and_product_v1(customer_msg_0,\n",
|
||
" products_and_category)\n",
|
||
"print(products_by_category_0)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"id": "04364405",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" [{'category': 'Smartphones and Accessories', 'products': ['MobiTech PowerCase', 'MobiTech Wireless Charger', 'SmartX EarBuds']}]\n",
|
||
"\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"customer_msg_1 = f\"\"\"我需要一个智能手机的充电器\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_1 = find_category_and_product_v1(customer_msg_1,\n",
|
||
" products_and_category)\n",
|
||
"print(products_by_category_1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"id": "66e9ecd0",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"\" [{'category': 'Computers and Laptops', 'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]\""
|
||
]
|
||
},
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"customer_msg_2 = f\"\"\"\n",
|
||
"你们有哪些电脑?\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_2 = find_category_and_product_v1(customer_msg_2,\n",
|
||
" products_and_category)\n",
|
||
"products_by_category_2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"id": "112cfd5f",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" [{'category': 'Smartphones and Accessories', 'products': ['SmartX ProPhone']}, {'category': 'Cameras and Camcorders', 'products': ['FotoSnap DSLR Camera']}]\n",
|
||
" \n",
|
||
" {'category': 'Televisions and Home Theater Systems', 'products': ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV', 'SoundMax Soundbar', 'CineView OLED TV']}\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"customer_msg_3 = f\"\"\"\n",
|
||
"告诉我关于smartx pro手机和fotosnap相机的信息,那款DSLR的。\n",
|
||
"另外,你们有哪些电视?\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_3 = find_category_and_product_v1(customer_msg_3,\n",
|
||
" products_and_category)\n",
|
||
"print(products_by_category_3)"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "d58f15be",
|
||
"metadata": {},
|
||
"source": [
|
||
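"The output appears to contain the correct data, but the model also appends extra explanatory text, which makes it harder to parse the response into a Python list of dictionaries."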
|
||
]
|
||
},
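{
"attachments": {},
"cell_type": "markdown",
"id": "sketch-parse-leading-list",
"metadata": {},
"source": [
"One way to cope with the trailing text is to parse only the leading list. The cell below is an illustrative sketch, not part of the original course code. The helper `parse_product_list` is hypothetical. It assumes the model emits a Python-literal list (single-quoted strings, as in the outputs above) and that product names contain no square brackets. Because it uses `ast.literal_eval`, it also avoids swapping single quotes for double quotes, a trick that breaks on any string containing an apostrophe. It returns only the first list it finds, so output split across several lists, like the Chinese-prompt result above, would still lose information."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-parse-leading-list-code",
"metadata": {},
"outputs": [],
"source": [
"import ast\n",
"\n",
"def parse_product_list(response):\n",
"    \"\"\"Hypothetical helper: parse the first Python-literal list in a model response,\n",
"    ignoring any explanatory text before or after it. Returns [] if no list is found.\"\"\"\n",
"    start = response.find('[')\n",
"    if start == -1:\n",
"        return []\n",
"    depth = 0\n",
"    # Walk forward counting brackets (assumes no '[' or ']' inside product names)\n",
"    for i in range(start, len(response)):\n",
"        if response[i] == '[':\n",
"            depth += 1\n",
"        elif response[i] == ']':\n",
"            depth -= 1\n",
"            if depth == 0:\n",
"                return ast.literal_eval(response[start:i + 1])\n",
"    return []  # unbalanced brackets: treat as nothing parsed\n",
"\n",
"# Usage on a response with trailing explanatory text (adapted from an output above)\n",
"sample = \"[{'category': 'Cameras and Camcorders', 'products': ['FotoSnap DSLR Camera']}]\\n\\nNote: the query mentions the DSLR camera.\"\n",
"parse_product_list(sample)"
]
},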
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "ff2af235",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"source": [
|
||
"IV. Harder test cases\n",
"\n",
"Find some queries on which the model does not perform as expected in real-world use."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"id": "4cbf55cd",
|
||
"metadata": {
|
||
"height": 132
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" [{'category': 'Televisions and Home Theater Systems', 'products': ['CineView 8K TV']},\n",
|
||
" {'category': 'Gaming Consoles and Accessories', 'products': ['GameSphere X']},\n",
|
||
" {'category': 'Computers and Laptops', 'products': ['BlueWave Chromebook']}]\n",
|
||
" \n",
|
||
" Note: The CineView TV mentioned is the 8K one, and the Gamesphere console mentioned is the X one. \n",
|
||
" For the computer category, since the customer mentioned being on a budget, we cannot determine which specific product to recommend. \n",
|
||
" Therefore, we have included all the products in the Computers and Laptops category in the output.\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"customer_msg_4 = f\"\"\"\n",
|
||
"tell me about the CineView TV, the 8K one, Gamesphere console, the X one.\n",
|
||
"I'm on a budget, what computers do you have?\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_4 = find_category_and_product_v1(customer_msg_4,\n",
|
||
" products_and_category)\n",
|
||
"print(products_by_category_4)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 10,
|
||
"id": "5b11172f",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" [{'category': 'Televisions and Home Theater Systems', 'products': ['CineView 8K TV']}, {'category': 'Gaming Consoles and Accessories', 'products': ['GameSphere X']}, {'category': 'Computers and Laptops', 'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]\n",
|
||
" \n",
|
||
" 具体来说,CineView 8K电视是一款高端电视,具有8K分辨率和OLED显示屏。GameSphere X是一款游戏机,具有高性能和多种游戏选择。对于预算有限的电脑,您可以考虑TechPro Chromebook或TechPro Ultrabook,它们都是较为经济实惠的选择。\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"'''Chinese-language version of the prompt'''\n",
|
||
"customer_msg_4 = f\"\"\"\n",
|
||
"告诉我关于CineView电视的信息,那款8K的,还有Gamesphere游戏机,X款的。\n",
|
||
"我预算有限,你们有哪些电脑?\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_4 = find_category_and_product_v1(customer_msg_4,products_and_category)\n",
|
||
"print(products_by_category_4)"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "92b63d8b",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"source": [
|
||
"V. Modify the prompt to handle the hard test cases"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "ddcee6a5",
|
||
"metadata": {},
|
||
"source": [
|
||
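"We added an instruction to the prompt not to output any additional text that is not in JSON format, and we added a second few-shot example, consisting of a user message and an assistant reply."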
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 11,
|
||
"id": "5954e112",
|
||
"metadata": {
|
||
"height": 1016
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def find_category_and_product_v2(user_input,products_and_category):\n",
|
||
" \"\"\"\n",
|
||
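" Added: do not output any additional text that is not in JSON format.\n",
" Added a second few-shot example in which the user asks for the cheapest computer.\n",
" In both few-shot examples, the response shown is just the complete product list in JSON format.\n",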
|
||
" \"\"\"\n",
|
||
" delimiter = \"####\"\n",
|
||
" system_message = f\"\"\"\n",
|
||
" You will be provided with customer service queries. \\\n",
|
||
" The customer service query will be delimited with {delimiter} characters.\n",
|
||
" Output a python list of json objects, where each object has the following format:\n",
|
||
" 'category': <one of Computers and Laptops, Smartphones and Accessories, Televisions and Home Theater Systems, \\\n",
|
||
" Gaming Consoles and Accessories, Audio Equipment, Cameras and Camcorders>,\n",
|
||
" AND\n",
|
||
" 'products': <a list of products that must be found in the allowed products below>\n",
|
||
" Do not output any additional text that is not in JSON format.\n",
|
||
" Do not write any explanatory text after outputting the requested JSON.\n",
|
||
"\n",
|
||
"\n",
|
||
" Where the categories and products must be found in the customer service query.\n",
|
||
" If a product is mentioned, it must be associated with the correct category in the allowed products list below.\n",
|
||
" If no products or categories are found, output an empty list.\n",
|
||
" \n",
|
||
"\n",
|
||
" List out all products that are relevant to the customer service query based on how closely it relates\n",
|
||
" to the product name and product category.\n",
|
||
" Do not assume, from the name of the product, any features or attributes such as relative quality or price.\n",
|
||
"\n",
|
||
" The allowed products are provided in JSON format.\n",
|
||
" The keys of each item represent the category.\n",
|
||
" The values of each item is a list of products that are within that category.\n",
|
||
" Allowed products: {products_and_category}\n",
|
||
" \n",
|
||
"\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" few_shot_user_1 = \"\"\"I want the most expensive computer. What do you recommend?\"\"\"\n",
|
||
" few_shot_assistant_1 = \"\"\" \n",
|
||
" [{'category': 'Computers and Laptops', \\\n",
|
||
"'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" few_shot_user_2 = \"\"\"I want the cheapest computer. What do you recommend?\"\"\"\n",
|
||
" few_shot_assistant_2 = \"\"\" \n",
|
||
" [{'category': 'Computers and Laptops', \\\n",
|
||
"'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" messages = [ \n",
|
||
" {'role':'system', 'content': system_message}, \n",
|
||
" {'role':'user', 'content': f\"{delimiter}{few_shot_user_1}{delimiter}\"}, \n",
|
||
" {'role':'assistant', 'content': few_shot_assistant_1 },\n",
|
||
" {'role':'user', 'content': f\"{delimiter}{few_shot_user_2}{delimiter}\"}, \n",
|
||
" {'role':'assistant', 'content': few_shot_assistant_2 },\n",
|
||
" {'role':'user', 'content': f\"{delimiter}{user_input}{delimiter}\"}, \n",
|
||
" ] \n",
|
||
" return get_completion_from_messages(messages)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 11,
|
||
"id": "d3b183bf",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def find_category_and_product_v2(user_input,products_and_category):\n",
|
||
" \"\"\"\n",
|
||
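" Added: do not output any additional text that is not in JSON format.\n",
" Added a second few-shot example in which the user asks for the cheapest computer. In both few-shot examples, the response shown is just the product list in JSON format.\n",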
|
||
" \"\"\"\n",
|
||
" delimiter = \"####\"\n",
|
||
" system_message = f\"\"\"\n",
|
||
" 您将提供客户服务查询。\\\n",
|
||
" 客户服务查询将用{delimiter}字符分隔。\n",
|
||
" 输出一个python列表,列表中的每个对象都是json对象,每个对象的格式如下:\n",
|
||
" 'category': <Computers and Laptops, Smartphones and Accessories, Televisions and Home Theater Systems, \\\n",
|
||
" Gaming Consoles and Accessories, Audio Equipment, Cameras and Camcorders中的一个>,\n",
|
||
" AND\n",
|
||
" 'products': <必须在下面允许的产品中找到的产品列表>\n",
|
||
" 不要输出任何不是JSON格式的额外文本。\n",
|
||
" 输出请求的JSON后,不要写任何解释性的文本。\n",
|
||
" \n",
|
||
" 其中类别和产品必须在客户服务查询中找到。\n",
|
||
" 如果提到了一个产品,它必须与下面允许的产品列表中的正确类别关联。\n",
|
||
" 如果没有找到产品或类别,输出一个空列表。\n",
|
||
" \n",
|
||
" 根据产品名称和产品类别与客户服务查询的相关性,列出所有相关的产品。\n",
|
||
" 不要从产品的名称中假设任何特性或属性,如相对质量或价格。\n",
|
||
" \n",
|
||
" 允许的产品以JSON格式提供。\n",
|
||
" 每个项目的键代表类别。\n",
|
||
" 每个项目的值是该类别中的产品列表。\n",
|
||
" 允许的产品:{products_and_category}\n",
|
||
" \n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" few_shot_user_1 = \"\"\"我想要最贵的电脑。你推荐哪款?\"\"\"\n",
|
||
" few_shot_assistant_1 = \"\"\" \n",
|
||
" [{'category': 'Computers and Laptops', \\\n",
|
||
"'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" few_shot_user_2 = \"\"\"我想要最便宜的电脑。你推荐哪款?\"\"\"\n",
|
||
" few_shot_assistant_2 = \"\"\" \n",
|
||
" [{'category': 'Computers and Laptops', \\\n",
|
||
"'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" messages = [ \n",
|
||
" {'role':'system', 'content': system_message}, \n",
|
||
" {'role':'user', 'content': f\"{delimiter}{few_shot_user_1}{delimiter}\"}, \n",
|
||
" {'role':'assistant', 'content': few_shot_assistant_1 },\n",
|
||
" {'role':'user', 'content': f\"{delimiter}{few_shot_user_2}{delimiter}\"}, \n",
|
||
" {'role':'assistant', 'content': few_shot_assistant_2 },\n",
|
||
" {'role':'user', 'content': f\"{delimiter}{user_input}{delimiter}\"}, \n",
|
||
" ] \n",
|
||
" return get_completion_from_messages(messages)"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "83e8ab86",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"source": [
|
||
"VI. Evaluate the modified prompt on the hard test cases"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"id": "1e876345",
|
||
"metadata": {
|
||
"height": 132
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" [{'category': 'Smartphones and Accessories', 'products': ['SmartX ProPhone']}, {'category': 'Cameras and Camcorders', 'products': ['FotoSnap DSLR Camera']}, {'category': 'Televisions and Home Theater Systems', 'products': ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV', 'SoundMax Soundbar', 'CineView OLED TV']}]\n",
|
||
"\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"customer_msg_3 = f\"\"\"\n",
|
||
"tell me about the smartx pro phone and the fotosnap camera, the dslr one.\n",
|
||
"Also, what TVs do you have?\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_3 = find_category_and_product_v2(customer_msg_3,\n",
|
||
" products_and_category)\n",
|
||
"print(products_by_category_3)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"id": "4a547b34",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" [{'category': 'Smartphones and Accessories', 'products': ['SmartX ProPhone']}, {'category': 'Cameras and Camcorders', 'products': ['FotoSnap DSLR Camera']}, {'category': 'Televisions and Home Theater Systems', 'products': ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV', 'SoundMax Soundbar', 'CineView OLED TV']}]\n",
|
||
"\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"customer_msg_3 = f\"\"\"\n",
|
||
"告诉我关于smartx pro手机和fotosnap相机的信息,那款DSLR的。\n",
|
||
"另外,你们有哪些电视?\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_3 = find_category_and_product_v2(customer_msg_3,\n",
|
||
" products_and_category)\n",
|
||
"print(products_by_category_3)"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "22a0a17b",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"source": [
|
||
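"VII. Regression testing: verify that the model still works on earlier test cases\n",
"\n",
"Check whether modifying the prompt to fix the hard test cases has hurt its performance on the earlier test cases."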
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"id": "f2a46445",
|
||
"metadata": {
|
||
"height": 98
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" [{'category': 'Televisions and Home Theater Systems', 'products': ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV', 'SoundMax Soundbar', 'CineView OLED TV']}]\n",
|
||
"\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"customer_msg_0 = f\"\"\"Which TV can I buy if I'm on a budget?\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_0 = find_category_and_product_v2(customer_msg_0,\n",
|
||
" products_and_category)\n",
|
||
"print(products_by_category_0)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"id": "b5ba773b",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
" \n",
|
||
"\n",
|
||
" [{'category': 'Televisions and Home Theater Systems', 'products': ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV', 'SoundMax Soundbar', 'CineView OLED TV']}]\n",
|
||
" \n",
|
||
" 如果您的预算有限,我们建议您购买CineView 4K电视或SoundMax家庭影院。\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"customer_msg_0 = f\"\"\"如果我预算有限,我可以买哪款电视?\"\"\"\n",
|
||
"\n",
|
||
"products_by_category_0 = find_category_and_product_v2(customer_msg_0,\n",
|
||
" products_and_category)\n",
|
||
"print(products_by_category_0)"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "4440ce1f",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"source": [
|
||
"VIII. Collect a development set for automated testing"
|
||
]
|
||
},
|
||
{
|
||
"attachments": {},
|
||
"cell_type": "markdown",
|
||
"id": "2af63218",
|
||
"metadata": {},
|
||
"source": [
|
||
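"Once the development set you are tuning against grows beyond a small handful of examples, it becomes worthwhile to automate the testing process."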
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 14,
|
||
"id": "8a0b751f",
|
||
"metadata": {
|
||
"height": 207
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"msg_ideal_pairs_set = [\n",
|
||
" \n",
|
||
" # eg 0\n",
|
||
" {'customer_msg':\"\"\"Which TV can I buy if I'm on a budget?\"\"\",\n",
|
||
" 'ideal_answer':{\n",
|
||
" 'Televisions and Home Theater Systems':set(\n",
|
||
" ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV', 'SoundMax Soundbar', 'CineView OLED TV']\n",
|
||
" )}\n",
|
||
" },\n",
|
||
"\n",
|
||
" # eg 1\n",
|
||
" {'customer_msg':\"\"\"I need a charger for my smartphone\"\"\",\n",
|
||
" 'ideal_answer':{\n",
|
||
" 'Smartphones and Accessories':set(\n",
|
||
" ['MobiTech PowerCase', 'MobiTech Wireless Charger', 'SmartX EarBuds']\n",
" )}\n",
" },\n",
" # eg 2\n",
" {'customer_msg':f\"\"\"What computers do you have?\"\"\",\n",
" 'ideal_answer':{\n",
" 'Computers and Laptops':set(\n",
" ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook'\n",
" ])\n",
" }\n",
" },\n",
"\n",
" # eg 3\n",
" {'customer_msg':f\"\"\"tell me about the smartx pro phone and \\\n",
" the fotosnap camera, the dslr one.\\\n",
" Also, what TVs do you have?\"\"\",\n",
" 'ideal_answer':{\n",
" 'Smartphones and Accessories':set(\n",
" ['SmartX ProPhone']),\n",
" 'Cameras and Camcorders':set(\n",
" ['FotoSnap DSLR Camera']),\n",
" 'Televisions and Home Theater Systems':set(\n",
" ['CineView 4K TV', 'SoundMax Home Theater','CineView 8K TV', 'SoundMax Soundbar', 'CineView OLED TV'])\n",
" }\n",
" }, \n",
" \n",
" # eg 4\n",
" {'customer_msg':\"\"\"tell me about the CineView TV, the 8K one, Gamesphere console, the X one.\n",
"I'm on a budget, what computers do you have?\"\"\",\n",
" 'ideal_answer':{\n",
" 'Televisions and Home Theater Systems':set(\n",
" ['CineView 8K TV']),\n",
" 'Gaming Consoles and Accessories':set(\n",
" ['GameSphere X']),\n",
" 'Computers and Laptops':set(\n",
" ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook'])\n",
" }\n",
" },\n",
" \n",
" # eg 5\n",
" {'customer_msg':f\"\"\"What smartphones do you have?\"\"\",\n",
" 'ideal_answer':{\n",
" 'Smartphones and Accessories':set(\n",
" ['SmartX ProPhone', 'MobiTech PowerCase', 'SmartX MiniPhone', 'MobiTech Wireless Charger', 'SmartX EarBuds'\n",
" ])\n",
" }\n",
" },\n",
" # eg 6\n",
" {'customer_msg':f\"\"\"I'm on a budget. Can you recommend some smartphones to me?\"\"\",\n",
" 'ideal_answer':{\n",
" 'Smartphones and Accessories':set(\n",
" ['SmartX EarBuds', 'SmartX MiniPhone', 'MobiTech PowerCase', 'SmartX ProPhone', 'MobiTech Wireless Charger']\n",
" )}\n",
" },\n",
"\n",
" # eg 7 # this will output a subset of the ideal answer\n",
" {'customer_msg':f\"\"\"What Gaming consoles would be good for my friend who is into racing games?\"\"\",\n",
" 'ideal_answer':{\n",
" 'Gaming Consoles and Accessories':set([\n",
" 'GameSphere X',\n",
" 'ProGamer Controller',\n",
" 'GameSphere Y',\n",
" 'ProGamer Racing Wheel',\n",
" 'GameSphere VR Headset'\n",
" ])}\n",
" },\n",
" # eg 8\n",
" {'customer_msg':f\"\"\"What could be a good present for my videographer friend?\"\"\",\n",
" 'ideal_answer': {\n",
" 'Cameras and Camcorders':set([\n",
" 'FotoSnap DSLR Camera', 'ActionCam 4K', 'FotoSnap Mirrorless Camera', 'ZoomMaster Camcorder', 'FotoSnap Instant Camera'\n",
" ])}\n",
" },\n",
" \n",
" # eg 9\n",
" {'customer_msg':f\"\"\"I would like a hot tub time machine.\"\"\",\n",
" 'ideal_answer': []\n",
" }\n",
" \n",
"]\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6e0f1db4",
"metadata": {
"height": 30
},
"source": [
"IX. Evaluate the test cases by comparing them with the ideal answers"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "d9530285",
"metadata": {},
"outputs": [],
"source": [
"import ast\n",
"import json\n",
"\n",
"# Compare a response with the ideal answer\n",
"def eval_response_with_ideal(response,\n",
"                             ideal,\n",
"                             debug=False):\n",
"\n",
"    if debug:\n",
"        print(\"Response:\")\n",
"        print(response)\n",
"\n",
"    # The model replies with a Python-style list of dicts (single quotes),\n",
"    # which json.loads() rejects. Try strict JSON first, then fall back to\n",
"    # ast.literal_eval, which parses Python literals safely and, unlike a\n",
"    # blanket ' -> \" replacement, does not break on apostrophes in names.\n",
"    try:\n",
"        l_of_d = json.loads(response)\n",
"    except json.JSONDecodeError:\n",
"        l_of_d = ast.literal_eval(response)\n",
"\n",
"    # Both empty: no products were expected and none were returned\n",
"    if l_of_d == [] and ideal == []:\n",
"        return 1\n",
"\n",
"    # Exactly one of them is empty, so the response is wrong\n",
"    elif l_of_d == [] or ideal == []:\n",
"        return 0\n",
"\n",
"    # Number of categories whose product set matches the ideal exactly\n",
"    correct = 0\n",
"\n",
"    if debug:\n",
"        print(\"l_of_d is\")\n",
"        print(l_of_d)\n",
"\n",
"    # Check each category dict in the response\n",
"    for d in l_of_d:\n",
"\n",
"        # Get the category name and its product list\n",
"        cat = d.get('category')\n",
"        prod_l = d.get('products')\n",
"        # Only score entries that have both a category and products\n",
"        if cat and prod_l:\n",
"            # Convert the list to a set so that order does not matter\n",
"            prod_set = set(prod_l)\n",
"            # Look up the ideal set of products for this category\n",
"            ideal_cat = ideal.get(cat)\n",
"            if ideal_cat:\n",
"                prod_set_ideal = set(ideal_cat)\n",
"            else:\n",
"                if debug:\n",
"                    print(f\"Category {cat} not found in the ideal answer\")\n",
"                    print(f\"Ideal answer: {ideal}\")\n",
"                continue\n",
"\n",
"            if debug:\n",
"                print(\"Product set:\\n\", prod_set)\n",
"                print()\n",
"                print(\"Ideal product set:\\n\", prod_set_ideal)\n",
"\n",
"            # The returned products match the ideal products exactly\n",
"            if prod_set == prod_set_ideal:\n",
"                if debug:\n",
"                    print(\"Correct\")\n",
"                correct += 1\n",
"            else:\n",
"                print(\"Incorrect\")\n",
"                print(f\"Product set: {prod_set}\")\n",
"                print(f\"Ideal product set: {prod_set_ideal}\")\n",
"                if prod_set <= prod_set_ideal:\n",
"                    print(\"The response is a subset of the ideal answer\")\n",
"                elif prod_set >= prod_set_ideal:\n",
"                    print(\"The response is a superset of the ideal answer\")\n",
"\n",
"    # Fraction of the response's categories that match exactly. Categories\n",
"    # that appear in the ideal answer but are missing from the response\n",
"    # are not penalized by this metric.\n",
"    pc_correct = correct / len(l_of_d)\n",
"\n",
"    return pc_correct"
]
},
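{
"cell_type": "markdown",
"id": "7f3c2a10",
"metadata": {},
"source": [
"The scoring in `eval_response_with_ideal` rests on Python set comparisons: `==` for an exact match, `<=` for a subset and `>=` for a superset. The cell below is a minimal, API-free sketch of just that logic, using the product names from example 7. `classify_products` is a hypothetical helper written for illustration; it is not part of the course code."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f3c2a11",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper (not from the course code) that isolates the set\n",
"# comparisons used in eval_response_with_ideal; no API calls needed.\n",
"def classify_products(returned, ideal):\n",
"    \"\"\"Classify a returned product list against the ideal product list.\"\"\"\n",
"    returned_set, ideal_set = set(returned), set(ideal)\n",
"    if returned_set == ideal_set:\n",
"        return \"exact\"\n",
"    if returned_set <= ideal_set:\n",
"        return \"subset\"  # note: an empty list also counts as a subset\n",
"    if returned_set >= ideal_set:\n",
"        return \"superset\"\n",
"    return \"other\"  # overlapping or disjoint, neither contains the other\n",
"\n",
"\n",
"# Product names copied from example 7 (racing-game consoles)\n",
"ideal_products = ['GameSphere X', 'ProGamer Controller', 'GameSphere Y',\n",
"                  'ProGamer Racing Wheel', 'GameSphere VR Headset']\n",
"returned_products = ['ProGamer Controller', 'ProGamer Racing Wheel',\n",
"                     'GameSphere VR Headset']\n",
"\n",
"print(classify_products(returned_products, ideal_products))                  # subset\n",
"print(classify_products(list(reversed(ideal_products)), ideal_products))     # exact\n",
"print(classify_products(ideal_products + ['ActionCam 4K'], ideal_products))  # superset"
]
},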
{
"cell_type": "code",
"execution_count": 16,
"id": "e06d9fe3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Customer message: What Gaming consoles would be good for my friend who is into racing games?\n",
"Ideal answer: {'Gaming Consoles and Accessories': {'GameSphere VR Headset', 'GameSphere X', 'ProGamer Controller', 'ProGamer Racing Wheel', 'GameSphere Y'}}\n"
]
}
],
"source": [
"print(f'Customer message: {msg_ideal_pairs_set[7][\"customer_msg\"]}')\n",
"print(f'Ideal answer: {msg_ideal_pairs_set[7][\"ideal_answer\"]}')"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "2ff332b4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Response: [{'category': 'Gaming Consoles and Accessories', 'products': ['ProGamer Controller', 'ProGamer Racing Wheel', 'GameSphere VR Headset']}]\n",
"Incorrect\n",
"Product set: {'ProGamer Racing Wheel', 'ProGamer Controller', 'GameSphere VR Headset'}\n",
"Ideal product set: {'GameSphere VR Headset', 'GameSphere X', 'ProGamer Racing Wheel', 'ProGamer Controller', 'GameSphere Y'}\n",
"The response is a subset of the ideal answer\n"
]
},
{
"data": {
"text/plain": [
"0.0"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"response = find_category_and_product_v2(msg_ideal_pairs_set[7][\"customer_msg\"],\n",
"                                        products_and_category)\n",
"print(f'Response: {response}')\n",
"\n",
"eval_response_with_ideal(response,\n",
"                         msg_ideal_pairs_set[7][\"ideal_answer\"])"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "bb7f5a2f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Response: [{'category': 'Gaming Consoles and Accessories', 'products': ['GameSphere X', 'ProGamer Controller', 'GameSphere Y', 'ProGamer Racing Wheel', 'GameSphere VR Headset']}]\n"
]
},
{
"data": {
"text/plain": [
"0.0"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Call with the Chinese-language prompt\n",
"response = find_category_and_product_v2(msg_ideal_pairs_set[7][\"customer_msg\"],\n",
"                                        products_and_category)\n",
"print(f'Response: {response}')\n",
"\n",
"eval_response_with_ideal(response,\n",
"                         msg_ideal_pairs_set[7][\"ideal_answer\"])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d1313b17",
"metadata": {
"height": 30
},
"source": [
"X. Run the evaluation on all test cases and compute the fraction of cases that are correct\n",
"\n",
"Note: this cell will fail if any of the API calls times out."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "d39407c0",
"metadata": {
"height": 30
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Example 0\n",
"0: 1.0\n",
"Example 1\n",
"1: 1.0\n",
"Example 2\n",
"2: 1.0\n",
"Example 3\n",
"3: 1.0\n",
"Example 4\n",
"4: 1.0\n",
"Example 5\n",
"5: 1.0\n",
"Example 6\n",
"6: 1.0\n",
"Example 7\n",
"Incorrect\n",
"Product set: {'ProGamer Racing Wheel', 'ProGamer Controller', 'GameSphere VR Headset'}\n",
"Ideal product set: {'GameSphere VR Headset', 'GameSphere X', 'ProGamer Racing Wheel', 'ProGamer Controller', 'GameSphere Y'}\n",
"The response is a subset of the ideal answer\n",
"7: 0.0\n",
"Example 8\n",
"8: 1.0\n",
"Example 9\n",
"9: 1\n",
"Fraction correct out of 10: 0.9\n"
]
}
],
"source": [
"import time\n",
"\n",
"score_accum = 0\n",
"for i, pair in enumerate(msg_ideal_pairs_set):\n",
"    # Wait between API calls to avoid hitting rate limits\n",
"    time.sleep(20)\n",
"    print(f\"Example {i}\")\n",
"\n",
"    customer_msg = pair['customer_msg']\n",
"    ideal = pair['ideal_answer']\n",
"\n",
"    # print(\"Customer message\",customer_msg)\n",
"    # print(\"ideal:\",ideal)\n",
"    response = find_category_and_product_v2(customer_msg,\n",
"                                            products_and_category)\n",
"\n",
"    score = eval_response_with_ideal(response, ideal, debug=False)\n",
"    print(f\"{i}: {score}\")\n",
"    score_accum += score\n",
"\n",
"\n",
"n_examples = len(msg_ideal_pairs_set)\n",
"fraction_correct = score_accum / n_examples\n",
"print(f\"Fraction correct out of {n_examples}: {fraction_correct}\")"
]
},
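{
"cell_type": "markdown",
"id": "7f3c2a12",
"metadata": {},
"source": [
"The loop above stops as soon as a single API call times out. One way to make it more robust is to wrap each call in a retry helper. The cell below is a minimal sketch under that assumption. `call_with_retries` is a hypothetical helper, not part of the course code or of any library. It retries on any exception; in practice you would narrow the `except` clause to the timeout errors your API client raises."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f3c2a13",
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"\n",
"def call_with_retries(fn, *args, retries=3, wait_seconds=20, **kwargs):\n",
"    \"\"\"Call fn(*args, **kwargs), retrying if it raises.\n",
"\n",
"    Hypothetical helper: retries on any Exception, sleeps wait_seconds\n",
"    between attempts, and re-raises the last error once retries run out.\n",
"    \"\"\"\n",
"    if retries < 1:\n",
"        raise ValueError(\"retries must be at least 1\")\n",
"    for attempt in range(1, retries + 1):\n",
"        try:\n",
"            return fn(*args, **kwargs)\n",
"        except Exception as err:\n",
"            if attempt == retries:\n",
"                raise  # out of attempts: surface the original error\n",
"            print(f\"Attempt {attempt} failed ({err!r}); retrying in {wait_seconds}s\")\n",
"            time.sleep(wait_seconds)\n",
"\n",
"\n",
"# Possible use inside the evaluation loop:\n",
"# response = call_with_retries(find_category_and_product_v2,\n",
"#                              customer_msg, products_and_category)"
]
},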
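{
"attachments": {},
"cell_type": "markdown",
"id": "5d885db6",
"metadata": {},
"source": [
"The workflow for building an application with prompts is very different from the workflow for building an application with supervised learning.\n",
"\n",
"One thing worth keeping in mind is that, compared with building a supervised learning application, the pace of iteration feels much faster.\n",
"\n",
"If you haven't done this before, you may be surprised by how well an evaluation method built on just a few hand-curated tricky examples performs. You might think that 10 examples are not statistically significant, but when you actually use this process, you may be surprised by how effective it is to add a handful of tricky examples to your development set.\n",
"\n",
"This is very helpful for you and your team in finding effective prompts and an effective system.\n",
"\n",
"In this video, the outputs could be evaluated quantitatively: there was a desired output, and you could tell whether the system produced it. In the next video, let's look at how to evaluate outputs in the more ambiguous setting, where what counts as the right answer is less clear-cut."
]
}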
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}