diff --git a/docs/content/C2 Building Systems with the ChatGPT API/8.搭建一个带评估的端到端问答系统 Evaluation.ipynb b/docs/content/C2 Building Systems with the ChatGPT API/8.搭建一个带评估的端到端问答系统 Evaluation.ipynb
new file mode 100644
index 0000000..8f4212f
--- /dev/null
+++ b/docs/content/C2 Building Systems with the ChatGPT API/8.搭建一个带评估的端到端问答系统 Evaluation.ipynb
@@ -0,0 +1,529 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Chapter 8: Building an End-to-End Q&A System with Evaluation\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this chapter we build an end-to-end question-answering system with evaluation. It combines material from several earlier lessons and adds an evaluation step.\n",
+    "\n",
+    "1. Check the input and confirm that it passes the Moderation API.\n",
+    "\n",
+    "2. If it passes moderation, we look up the product list.\n",
+    "\n",
+    "3. If products are found, we try to look up their related information.\n",
+    "\n",
+    "4. We use the model to answer the user's question.\n",
+    "\n",
+    "5. We run the generated answer through the Moderation API.\n",
+    "\n",
+    "If the answer is not flagged as harmful, we return it to the user."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. End-to-End Implementation of the Q&A System"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "第一步:输入通过 Moderation 检查\n",
+      "第二步:抽取出商品列表\n",
+      "第三步:查找抽取出的商品信息\n",
+      "第四步:生成用户回答\n",
+      "第五步:输出经过 Moderation 检查\n",
+      "第六步:模型评估该回答\n",
+      "第七步:模型赞同了该回答.\n",
+      "关于SmartX ProPhone和FotoSnap相机的信息如下:\n",
+      "\n",
+      "SmartX ProPhone:\n",
+      "- 品牌:SmartX\n",
+      "- 型号:SX-PP10\n",
+      "- 屏幕尺寸:6.1英寸\n",
+      "- 存储容量:128GB\n",
+      "- 相机:12MP双摄像头\n",
+      "- 网络:支持5G\n",
+      "- 保修:1年\n",
+      "- 价格:899.99美元\n",
+      "\n",
+      "FotoSnap相机系列:\n",
+      "1. FotoSnap DSLR相机:\n",
+      "- 品牌:FotoSnap\n",
+      "- 型号:FS-DSLR200\n",
+      "- 传感器:24.2MP\n",
+      "- 视频:1080p\n",
+      "- 屏幕:3英寸LCD\n",
+      "- 可更换镜头\n",
+      "- 保修:1年\n",
+      "- 价格:599.99美元\n",
+      "\n",
+      "2. FotoSnap无反相机:\n",
+      "- 品牌:FotoSnap\n",
+      "- 型号:FS-ML100\n",
+      "- 传感器:20.1MP\n",
+      "- 视频:4K\n",
+      "- 屏幕:3英寸触摸屏\n",
+      "- 可更换镜头\n",
+      "- 保修:1年\n",
+      "- 价格:799.99美元\n",
+      "\n",
+      "3. 
FotoSnap即时相机:\n",
+      "- 品牌:FotoSnap\n",
+      "- 型号:FS-IC10\n",
+      "- 即时打印\n",
+      "- 内置闪光灯\n",
+      "- 自拍镜\n",
+      "- 电池供电\n",
+      "- 保修:1年\n",
+      "- 价格:69.99美元\n",
+      "\n",
+      "关于我们的电视情况如下:\n",
+      "\n",
+      "1. CineView 4K电视:\n",
+      "- 品牌:CineView\n",
+      "- 型号:CV-4K55\n",
+      "- 屏幕尺寸:55英寸\n",
+      "- 分辨率:4K\n",
+      "- HDR支持\n",
+      "- 智能电视功能\n",
+      "- 保修:2年\n",
+      "- 价格:599.99美元\n",
+      "\n",
+      "2. CineView 8K电视:\n",
+      "- 品牌:\n"
+     ]
+    }
+   ],
+   "source": [
+    "import openai\n",
+    "import utils_zh\n",
+    "from tool import get_completion_from_messages\n",
+    "\n",
+    "'''\n",
+    "Note: because the model handles Chinese less reliably, the Chinese prompts may fail at random; rerun the cell if that happens. Readers are welcome to explore more stable Chinese prompts.\n",
+    "'''\n",
+    "def process_user_message_ch(user_input, all_messages, debug=True):\n",
+    "    \"\"\"\n",
+    "    Process a user message and return (reply, updated history)\n",
+    "\n",
+    "    Parameters:\n",
+    "    user_input : the user's input\n",
+    "    all_messages : the conversation history\n",
+    "    debug : whether to enable DEBUG mode (on by default)\n",
+    "    \"\"\"\n",
+    "    # Delimiter\n",
+    "    delimiter = \"```\"\n",
+    "\n",
+    "    # Step 1: check the user input with OpenAI's Moderation API (it flags policy-violating content; it does not detect prompt injection)\n",
+    "    response = openai.Moderation.create(input=user_input)\n",
+    "    moderation_output = response[\"results\"][0]\n",
+    "\n",
+    "    # The Moderation API flagged the input; return a (reply, history) pair so callers can unpack it\n",
+    "    if moderation_output[\"flagged\"]:\n",
+    "        print(\"第一步:输入被 Moderation 拒绝\")\n",
+    "        return \"抱歉,您的请求不合规\", all_messages\n",
+    "\n",
+    "    # In DEBUG mode, print progress as each step completes\n",
+    "    if debug: print(\"第一步:输入通过 Moderation 检查\")\n",
+    "\n",
+    "    # Step 2: extract the products and their categories, using a wrapper around the method from the earlier lessons\n",
+    "    category_and_product_response = utils_zh.find_category_and_product_only(user_input, utils_zh.get_products_and_category())\n",
+    "    #print(category_and_product_response)\n",
+    "    # Convert the extracted string into a list\n",
+    "    category_and_product_list = utils_zh.read_string_to_list(category_and_product_response)\n",
+    "    #print(category_and_product_list)\n",
+    "\n",
+    "    if debug: print(\"第二步:抽取出商品列表\")\n",
+    "\n",
+    "    # Step 3: look up information for the extracted products\n",
+    "    product_information = utils_zh.generate_output_string(category_and_product_list)\n",
+    "    if debug: print(\"第三步:查找抽取出的商品信息\")\n",
+    "\n",
+    "    # Step 4: generate the answer from that information\n",
+    "    
system_message = f\"\"\"\n",
+    "    您是一家大型电子商店的客户服务助理。\\\n",
+    "    请以友好和乐于助人的语气回答问题,并提供简洁明了的答案。\\\n",
+    "    请确保向用户提出相关的后续问题。\n",
+    "    \"\"\"\n",
+    "    # Build the messages for this turn\n",
+    "    messages = [\n",
+    "        {'role': 'system', 'content': system_message},\n",
+    "        {'role': 'user', 'content': f\"{delimiter}{user_input}{delimiter}\"},\n",
+    "        {'role': 'assistant', 'content': f\"相关商品信息:\\n{product_information}\"}\n",
+    "    ]\n",
+    "    # Get the GPT-3.5 answer\n",
+    "    # Prepending all_messages passes the conversation history, which enables multi-turn dialogue\n",
+    "    final_response = get_completion_from_messages(all_messages + messages)\n",
+    "    if debug: print(\"第四步:生成用户回答\")\n",
+    "    # Add this turn (without the system message) to the history\n",
+    "    all_messages = all_messages + messages[1:]\n",
+    "\n",
+    "    # Step 5: check the output with the Moderation API\n",
+    "    response = openai.Moderation.create(input=final_response)\n",
+    "    moderation_output = response[\"results\"][0]\n",
+    "\n",
+    "    # The output was flagged; return a (reply, history) pair so callers can unpack it\n",
+    "    if moderation_output[\"flagged\"]:\n",
+    "        if debug: print(\"第五步:输出被 Moderation 拒绝\")\n",
+    "        return \"抱歉,我们不能提供该信息\", all_messages\n",
+    "\n",
+    "    if debug: print(\"第五步:输出经过 Moderation 检查\")\n",
+    "\n",
+    "    # Step 6: ask the model whether the reply answers the user's question well\n",
+    "    user_message = f\"\"\"\n",
+    "    用户信息: {delimiter}{user_input}{delimiter}\n",
+    "    代理回复: {delimiter}{final_response}{delimiter}\n",
+    "\n",
+    "    回复是否足够回答问题\n",
+    "    如果足够,回答 Y\n",
+    "    如果不足够,回答 N\n",
+    "    仅回答上述字母即可\n",
+    "    \"\"\"\n",
+    "    # print(final_response)\n",
+    "    messages = [\n",
+    "        {'role': 'system', 'content': system_message},\n",
+    "        {'role': 'user', 'content': user_message}\n",
+    "    ]\n",
+    "    # Ask the model to evaluate the answer\n",
+    "    evaluation_response = get_completion_from_messages(messages)\n",
+    "    # print(evaluation_response)\n",
+    "    if debug: print(\"第六步:模型评估该回答\")\n",
+    "\n",
+    "    # Step 7: if the evaluation is Y, return the answer; if it is N, hand the conversation to a human agent\n",
+    "    if \"Y\" in evaluation_response:  # use `in` because the model may answer Yes instead of Y\n",
+    "        if debug: print(\"第七步:模型赞同了该回答.\")\n",
+    "        return final_response, all_messages\n",
+    "    else:\n",
+    "        if debug: print(\"第七步:模型不赞成该回答.\")\n",
+    "        neg_str = \"很抱歉,我无法提供您所需的信息。我将为您转接到一位人工客服代表以获取进一步帮助。\"\n",
+    "        return neg_str, all_messages\n",
+    
"\n", + "user_input = \"请告诉我关于 smartx pro phone 和 the fotosnap camera 的信息。另外,请告诉我关于你们的tvs的情况。\"\n", + "response,_ = process_user_message_ch(user_input,[])\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 二、持续收集用户和助手消息" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "实现一个可视化界面" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "# 调用中文 Prompt 版本\n", + "def collect_messages_ch(debug=True):\n", + " \"\"\"\n", + " 用于收集用户的输入并生成助手的回答\n", + "\n", + " 参数:\n", + " debug: 用于觉得是否开启调试模式\n", + " \"\"\"\n", + " user_input = inp.value_input\n", + " if debug: print(f\"User Input = {user_input}\")\n", + " if user_input == \"\":\n", + " return\n", + " inp.value = ''\n", + " global context\n", + " # 调用 process_user_message 函数\n", + " #response, context = process_user_message(user_input, context, utils.get_products_and_category(),debug=True)\n", + " response, context = process_user_message_ch(user_input, context, debug=False)\n", + " # print(response)\n", + " context.append({'role':'assistant', 'content':f\"{response}\"})\n", + " panels.append(\n", + " pn.Row('User:', pn.pane.Markdown(user_input, width=600)))\n", + " panels.append(\n", + " pn.Row('Assistant:', pn.pane.Markdown(response, width=600, style={'background-color': '#F6F6F6'})))\n", + " \n", + " return pn.Column(*panels) # 包含了所有的对话信息" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import panel as pn # 用于图形化界面\n", + "pn.extension()\n", + "\n", + "panels = [] # collect display \n", + "\n", + "# 系统信息\n", + "context = [ {'role':'system', 'content':\"You are Service Assistant\"} ] \n", + "\n", + "inp = pn.widgets.TextInput( placeholder='Enter text here…')\n", + "button_conversation = pn.widgets.Button(name=\"Service Assistant\")\n", + "\n", + "interactive_conversation = pn.bind(collect_messages_ch, button_conversation)\n", + 
"\n", + "dashboard = pn.Column(\n", + " inp,\n", + " pn.Row(button_conversation),\n", + " pn.panel(interactive_conversation, loading_indicator=True, height=300),\n", + ")\n", + "\n", + "dashboard" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "下图展示了该问答系统的运行实况:\n", + "\n", + "![](../../../figures/docs/C2/ch8-example.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "通过监控系统在更多输入上的质量,您可以修改步骤,提高系统的整体性能。\n", + "\n", + "也许我们会发现,对于某些步骤,我们的提示可能更好,也许有些步骤甚至不必要,也许我们会找到更好的检索方法等等。\n", + "\n", + "我们将在下一章中进一步讨论这个问题。 " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 三、英文版" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**1.1 端到端问答系统**" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "第一步:输入通过 Moderation 检查\n", + "第二步:抽取出商品列表\n", + "第三步:查找抽取出的商品信息\n", + "第四步:生成用户回答\n", + "第五步:输出经过 Moderation 检查\n", + "第六步:模型评估该回答\n", + "第七步:模型赞同了该回答.\n", + "Sure! Here's some information about the SmartX ProPhone and the FotoSnap DSLR Camera:\n", + "\n", + "1. SmartX ProPhone:\n", + " - Brand: SmartX\n", + " - Model Number: SX-PP10\n", + " - Features: 6.1-inch display, 128GB storage, 12MP dual camera, 5G connectivity\n", + " - Description: A powerful smartphone with advanced camera features.\n", + " - Price: $899.99\n", + " - Warranty: 1 year\n", + "\n", + "2. 
FotoSnap DSLR Camera:\n",
+      " - Brand: FotoSnap\n",
+      " - Model Number: FS-DSLR200\n",
+      " - Features: 24.2MP sensor, 1080p video, 3-inch LCD, interchangeable lenses\n",
+      " - Description: Capture stunning photos and videos with this versatile DSLR camera.\n",
+      " - Price: $599.99\n",
+      " - Warranty: 1 year\n",
+      "\n",
+      "Now, could you please let me know which specific TV models you are interested in?\n"
+     ]
+    }
+   ],
+   "source": [
+    "import utils_en\n",
+    "import openai\n",
+    "\n",
+    "def process_user_message(user_input, all_messages, debug=True):\n",
+    "    \"\"\"\n",
+    "    Process a user message and return (reply, updated history)\n",
+    "\n",
+    "    Parameters:\n",
+    "    user_input : the user's input\n",
+    "    all_messages : the conversation history\n",
+    "    debug : whether to enable DEBUG mode (on by default)\n",
+    "    \"\"\"\n",
+    "    # Delimiter\n",
+    "    delimiter = \"```\"\n",
+    "\n",
+    "    # Step 1: check the user input with OpenAI's Moderation API (it flags policy-violating content; it does not detect prompt injection)\n",
+    "    response = openai.Moderation.create(input=user_input)\n",
+    "    moderation_output = response[\"results\"][0]\n",
+    "\n",
+    "    # The Moderation API flagged the input; return a (reply, history) pair so callers can unpack it\n",
+    "    if moderation_output[\"flagged\"]:\n",
+    "        print(\"第一步:输入被 Moderation 拒绝\")\n",
+    "        return \"Sorry, your request does not comply with our policy.\", all_messages\n",
+    "\n",
+    "    # In DEBUG mode, print progress as each step completes\n",
+    "    if debug: print(\"第一步:输入通过 Moderation 检查\")\n",
+    "\n",
+    "    # Step 2: extract the products and their categories, using a wrapper around the method from the earlier lessons\n",
+    "    category_and_product_response = utils_en.find_category_and_product_only(user_input, utils_en.get_products_and_category())\n",
+    "    #print(category_and_product_response)\n",
+    "    # Convert the extracted string into a list\n",
+    "    category_and_product_list = utils_en.read_string_to_list(category_and_product_response)\n",
+    "    #print(category_and_product_list)\n",
+    "\n",
+    "    if debug: print(\"第二步:抽取出商品列表\")\n",
+    "\n",
+    "    # Step 3: look up information for the extracted products\n",
+    "    product_information = utils_en.generate_output_string(category_and_product_list)\n",
+    "    if debug: print(\"第三步:查找抽取出的商品信息\")\n",
+    "\n",
+    "    # Step 4: generate the answer from that information\n",
+    "    system_message = f\"\"\"\n",
+    "    You are a customer service assistant for a large electronic store. 
\\\n",
+    "    Respond in a friendly and helpful tone, with concise answers. \\\n",
+    "    Make sure to ask the user relevant follow-up questions.\n",
+    "    \"\"\"\n",
+    "    # Build the messages for this turn\n",
+    "    messages = [\n",
+    "        {'role': 'system', 'content': system_message},\n",
+    "        {'role': 'user', 'content': f\"{delimiter}{user_input}{delimiter}\"},\n",
+    "        {'role': 'assistant', 'content': f\"Relevant product information:\\n{product_information}\"}\n",
+    "    ]\n",
+    "    # Get the GPT-3.5 answer\n",
+    "    # Prepending all_messages passes the conversation history, which enables multi-turn dialogue\n",
+    "    final_response = get_completion_from_messages(all_messages + messages)\n",
+    "    if debug: print(\"第四步:生成用户回答\")\n",
+    "    # Add this turn (without the system message) to the history\n",
+    "    all_messages = all_messages + messages[1:]\n",
+    "\n",
+    "    # Step 5: check the output with the Moderation API\n",
+    "    response = openai.Moderation.create(input=final_response)\n",
+    "    moderation_output = response[\"results\"][0]\n",
+    "\n",
+    "    # The output was flagged; return a (reply, history) pair so callers can unpack it\n",
+    "    if moderation_output[\"flagged\"]:\n",
+    "        if debug: print(\"第五步:输出被 Moderation 拒绝\")\n",
+    "        return \"Sorry, we cannot provide that information.\", all_messages\n",
+    "\n",
+    "    if debug: print(\"第五步:输出经过 Moderation 检查\")\n",
+    "\n",
+    "    # Step 6: ask the model whether the reply answers the user's question well\n",
+    "    user_message = f\"\"\"\n",
+    "    Customer message: {delimiter}{user_input}{delimiter}\n",
+    "    Agent response: {delimiter}{final_response}{delimiter}\n",
+    "\n",
+    "    Does the response sufficiently answer the question?\n",
+    "    \"\"\"\n",
+    "    messages = [\n",
+    "        {'role': 'system', 'content': system_message},\n",
+    "        {'role': 'user', 'content': user_message}\n",
+    "    ]\n",
+    "    # Ask the model to evaluate the answer\n",
+    "    evaluation_response = get_completion_from_messages(messages)\n",
+    "    if debug: print(\"第六步:模型评估该回答\")\n",
+    "\n",
+    "    # Step 7: if the evaluation is Y, return the answer; if it is N, hand the conversation to a human agent\n",
+    "    if \"Y\" in evaluation_response:  # use `in` because the model may answer Yes instead of Y\n",
+    "        if debug: print(\"第七步:模型赞同了该回答.\")\n",
+    "        return final_response, all_messages\n",
+    "    else:\n",
+    "        if debug: print(\"第七步:模型不赞成该回答.\")\n",
+    "        neg_str = \"I'm sorry, I can't provide the information you need. I'll transfer you to a human customer service representative for further help.\"\n",
+    "        return neg_str, all_messages\n",
+    "\n",
+    "user_input 
= \"tell me about the smartx pro phone and the fotosnap camera, the dslr one. Also what tell me about your tvs\"\n",
+    "response,_ = process_user_message(user_input,[])\n",
+    "print(response)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**2.1 Continuously Collecting User and Assistant Messages**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def collect_messages_en(debug=False):\n",
+    "    \"\"\"\n",
+    "    Collect the user's input and generate the assistant's reply\n",
+    "\n",
+    "    Parameters:\n",
+    "    debug: whether to enable debug mode\n",
+    "    \"\"\"\n",
+    "    user_input = inp.value_input\n",
+    "    if debug: print(f\"User Input = {user_input}\")\n",
+    "    if user_input == \"\":\n",
+    "        return\n",
+    "    inp.value = ''\n",
+    "    global context\n",
+    "    # Call process_user_message\n",
+    "    #response, context = process_user_message(user_input, context, utils.get_products_and_category(),debug=True)\n",
+    "    response, context = process_user_message(user_input, context, debug=False)\n",
+    "    context.append({'role':'assistant', 'content':f\"{response}\"})\n",
+    "    panels.append(\n",
+    "        pn.Row('User:', pn.pane.Markdown(user_input, width=600)))\n",
+    "    panels.append(\n",
+    "        pn.Row('Assistant:', pn.pane.Markdown(response, width=600, style={'background-color': '#F6F6F6'})))\n",
+    "\n",
+    "    return pn.Column(*panels) # contains the whole conversation so far"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "gpt",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/figures/docs/C2/ch8-example.png b/figures/docs/C2/ch8-example.png
new file mode 100644
index 0000000..ffaa036
Binary files /dev/null and b/figures/docs/C2/ch8-example.png differ