updated Chap 6,7

2023-07-10 21:42:06 +08:00
parent d24e54397e
commit a98446cab1
3 changed files with 167 additions and 86 deletions
--- a/Inferring.ipynb
+++ b/Inferring.ipynb
@ -90,8 +90,8 @@
   "id": "51d2fdfa-c99f-4750-8574-dba7712cd7f0",
   "metadata": {},
   "source": [
-    "# 二、情感推断与信息提取\n",
-    "## 2.1 情感分类\n",
+    "## 二、情感推断与信息提取\n",
+    "### 2.1 情感分类\n",
    "\n",
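    "Take a review of a lamp from an e-commerce platform as an example. We can classify the sentiment it conveys into two classes (positive/negative)."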
   ]
@ -167,7 +167,7 @@
    }
   ],
   "source": [
-    " Prompt  = f\"\"\"\n",
+    "prompt = f\"\"\"\n",
    "What is the sentiment of the following product review, \n",
    "which is delimited with triple backticks?\n",
    "\n",
@ -213,7 +213,7 @@
   "id": "76be2320",
   "metadata": {},
   "source": [
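-    "If you want a more concise answer that is easier to post-process, you can add another instruction to the Prompt above: *give the answer as a single word, “positive” or “negative”*. The model then prints only the word “positive”, which makes the output more uniform and easier to process downstream."
+    "If you want a more concise answer that is easier to post-process, you can add another instruction to the Prompt above: *answer with a single word: “positive” or “negative”*. The model then prints only the word “positive”, which makes the output more uniform and easier to process downstream."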
   ]
  },
  {
@ -277,7 +277,7 @@
   "id": "81d2a973-1fa4-4a35-ae35-a2e746c0e91b",
   "metadata": {},
   "source": [
-    "## 2.2 识别情感类型\n",
+    "### 2.2 识别情感类型\n",
    "\n",
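    "Still using the lamp review, we try another Prompt. This time the model is asked to identify the emotions the review's author expresses and to summarize them as a list of no more than five items."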
   ]
@ -355,7 +355,7 @@
   "id": "a428d093-51c9-461c-b41e-114e80876409",
   "metadata": {},
   "source": [
-    "## 2.3 识别愤怒\n",
+    "### 2.3 识别愤怒\n",
    "\n",
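    "For many businesses it is important to know whether a customer is very angry. This gives rise to the following classification problem: is the author of the review below expressing anger? If someone really is angry, the review may deserve extra attention: the customer support or customer success team can contact the customer to understand the situation and resolve the problem."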
   ]
@ -432,7 +432,7 @@
   "id": "936a771e-ca78-4e55-8088-2da6f3820ddc",
   "metadata": {},
   "source": [
-    "## 2.4 商品信息提取\n",
+    "### 2.4 商品信息提取\n",
    "\n",
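    "Next, let us extract richer information from customer reviews. Information extraction is the part of natural language processing (NLP) concerned with pulling specific things you want to know out of text. In this Prompt, I ask the model to identify the following: the item purchased and the name of the company that made it.\n",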
    "\n",
@ -532,7 +532,7 @@
   "id": "a38880a5-088f-4609-9913-f8fa41fb7ba0",
   "metadata": {},
   "source": [
-    "## 2.5 综合完成任务\n",
+    "### 2.5 综合完成任务\n",
    "\n",
    "Extracting all of the information above took 3 or 4 Prompts, but a single Prompt can in fact be written to extract all of it at once."
   ]
@ -706,7 +706,7 @@
   "id": "a8ea91d6-e841-4ee2-bed9-ca4a36df177f",
   "metadata": {},
   "source": [
-    "## 3.1 推断讨论主题\n",
+    "### 3.1 推断讨论主题\n",
    "\n",
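    "Above is a fictional newspaper article about how government employees feel about the agency they work for. We can ask the model to determine five topics being discussed, describe each topic in one or two words, and format the output as a comma-separated list."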
   ]
@ -799,12 +799,18 @@
    "print(response)"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "790d1435",
+   "metadata": {},
+   "source": []
+  },
  {
   "cell_type": "markdown",
   "id": "34be1d2a-1309-4512-841a-b6f67338938b",
   "metadata": {},
   "source": [
-    "## 3.2 为特定主题制作新闻提醒\n",
+    "### 3.2 为特定主题制作新闻提醒\n",
    "\n",
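    "Suppose we run a news website or something similar, and these are the topics we care about: NASA, local government, engineering, employee satisfaction, federal government, and so on. Suppose we want to work out which of these topics a given news article covers. We can use a Prompt like this: determine whether each item in the following list of topics is a topic in the text below. Give the answer as a list of 0s and 1s."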
   ]
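The 0/1 answer list described in section 3.2 is only useful once it is parsed. The sketch below shows one possible way to do that. The `parse_topic_flags` helper, the `topic: flag` line format, and the sample response text are all assumptions made for illustration; they are not taken from the notebook, and the parsing would need to match whatever output format your Prompt actually requests.

```python
# Hypothetical helper: turn the model's per-topic 0/1 answer into a dict.
# The "topic: flag" line format is an assumption, not something the
# notebook guarantees.

def parse_topic_flags(response: str) -> dict[str, int]:
    """Parse lines such as 'NASA: 1' into {'nasa': 1}."""
    flags = {}
    for line in response.strip().splitlines():
        # Split on the last colon so topic names may contain colons.
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if not sep or value not in ("0", "1"):
            continue  # skip lines that do not match the assumed format
        flags[name.strip().lower()] = int(value)
    return flags


# Made-up response text standing in for a real model reply.
sample_response = """NASA: 1
local government: 0
engineering: 0
employee satisfaction: 1
federal government: 1"""

topic_flags = parse_topic_flags(sample_response)

# A simple news alert: react only when the topic of interest is flagged.
if topic_flags.get("nasa") == 1:
    print("ALERT: New NASA story!")
```

Skipping malformed lines instead of raising keeps the alert logic working when the model adds extra commentary, at the cost of silently ignoring a topic whose line was mangled.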
--- a/Transforming.ipynb
+++ b/Transforming.ipynb
@ -1,12 +1,46 @@
 {
 "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "08879154",
+   "metadata": {},
+   "source": [
+    "# 第六章 文本转换"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c885ce7b",
+   "metadata": {},
+   "source": [
+    "<div class=\"toc\">\n",
+    " <ul class=\"toc-item\">\n",
+    "     <li><span><a href=\"#一引言\" data-toc-modified-id=\"一、引言\">一、引言</a></span></li>\n",
+    "     <li>\n",
+    "         <span><a href=\"#二文本翻译\" data-toc-modified-id=\"二、文本翻译\">二、文本翻译</a></span>\n",
+    "         <ul class=\"toc-item\">\n",
+    "             <li><span><a href=\"#21-中文转西班牙语\" data-toc-modified-id=\"2.1 中文转西班牙语\">2.1 中文转西班牙语</a></span></li> \n",
+    "             <li><span><a href=\"#22-识别语种\" data-toc-modified-id=\"2.2 识别语种\">2.2 识别语种</a></span></li>\n",
+    "             <li><span><a href=\"#23-多语种翻译\" data-toc-modified-id=\"2.3 多语种翻译\">2.3 多语种翻译</a></span></li>\n",
+    "             <li><span><a href=\"#24-同时进行语气转换\" data-toc-modified-id=\"2.4 同时进行语气转换\">2.4 同时进行语气转换</a></span></li>\n",
+    "             <li><span><a href=\"#25-通用翻译器\" data-toc-modified-id=\"2.5 通用翻译器\">2.5 通用翻译器</a></span></li>\n",
+    "             </ul>\n",
+    "         </li>\n",
+    "     <li><span><a href=\"#三语气与写作风格调整\" data-toc-modified-id=\"三、语气与写作风格调整\">三、语气与写作风格调整</a></span></li>\n",
+    "     <li><span><a href=\"#四文件格式转换\" data-toc-modified-id=\"四、文件格式转换\">四、文件格式转换</a></span></li>\n",
+    "     <li><span><a href=\"#五拼写及语法纠正\" data-toc-modified-id=\"五、拼写及语法纠正\">五、拼写及语法纠正</a></span></li>\n",
+    "     <li><span><a href=\"#六综合样例\" data-toc-modified-id=\"六、综合样例\">六、综合样例</a></span></li>\n",
+    "     </ul>\n",
+    "</div>"
+   ]
+  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "78624add",
   "metadata": {},
   "source": [
-    "## 1 引言"
+    "## 一、引言"
   ]
  },
  {
@ -66,7 +100,7 @@
   "id": "bf3733d4",
   "metadata": {},
   "source": [
-    "## 2 文本翻译"
+    "## 二、文本翻译"
   ]
  },
  {
@ -75,7 +109,7 @@
   "id": "1b418e32",
   "metadata": {},
   "source": [
-    "**中文转西班牙语**"
+    "### 2.1 中文转西班牙语"
   ]
  },
  {
@ -118,13 +152,19 @@
    "print(response)"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "7e7be208",
+   "metadata": {},
+   "source": []
+  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e3e922b4",
   "metadata": {},
   "source": [
-    "**识别语种**"
+    "### 2.2 识别语种"
   ]
  },
  {
@ -165,13 +205,19 @@
    "print(response)"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "8a9477e9",
+   "metadata": {},
+   "source": []
+  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c1841354",
   "metadata": {},
   "source": [
-    "**多语种翻译**"
+    "### 2.3 多语种翻译"
   ]
  },
  {
@ -216,13 +262,19 @@
    "print(response)"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "8d5022c7",
+   "metadata": {},
+   "source": []
+  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "68723ba5",
   "metadata": {},
   "source": [
-    "**翻译+正式语气**"
+    "### 2.4 同时进行语气转换"
   ]
  },
  {
@ -265,13 +317,19 @@
    "print(response)"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "7b7f6c87",
+   "metadata": {},
+   "source": []
+  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b2dc4c56",
   "metadata": {},
   "source": [
-    "**通用翻译器**"
+    "### 2.5 通用翻译器"
   ]
  },
  {
@ -375,13 +433,19 @@
    "    print(response, \"\\n=========================================\")"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "607cdcba",
+   "metadata": {},
+   "source": []
+  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6ab558a2",
   "metadata": {},
   "source": [
-    "## 3 语气/ 写作风格调整"
+    "## 三、语气与写作风格调整"
   ]
  },
  {
@ -441,13 +505,19 @@
    "print(response)"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "79da6b29",
+   "metadata": {},
+   "source": []
+  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "98df9009",
   "metadata": {},
   "source": [
-    "## 4 格式转换"
+    "## 四、文件格式转换"
   ]
  },
  {
@ -488,6 +558,14 @@
    "print(response)\n"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "e1c7f30c",
+   "metadata": {},
+   "source": [
+    "The result is the same as below."
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": 10,
@ -584,7 +662,7 @@
   "id": "29b7167b",
   "metadata": {},
   "source": [
-    "## 5 拼写及语法纠正"
+    "## 五、拼写及语法纠正"
   ]
  },
  {
@ -667,13 +745,19 @@
    "    print(i, response)"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "ef7e1dae",
+   "metadata": {},
+   "source": []
+  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "538181e0",
   "metadata": {},
   "source": [
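-    "Below is a simple grammar-correction example (translator's note: similar in function to Grammarly). The input is a review of a panda plush toy, and the output is the corrected text. The prompt used here is fairly simple; you can also ask for a change of tone."
+    "Below is a simple grammar-correction example (translator's note: similar in function to Grammarly). The input is a review of a panda plush toy, and the output is the corrected text. The Prompt used here is fairly simple; you can also ask for a change of tone."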
   ]
  },
  {
@ -707,6 +791,14 @@
    "print(response)\n"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "63871b58",
+   "metadata": {},
+   "source": [
+    "The result is the same as below."
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": 14,
@ -733,7 +825,7 @@
   "id": "2e2d1f6a",
   "metadata": {},
   "source": [
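-    "Import the Redlines package to display and compare the corrections in detail:"
+    "Import the `Redlines` package to display and compare the corrections in detail:"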
   ]
  },
  {
@ -780,7 +872,7 @@
   "id": "3ee5d487",
   "metadata": {},
   "source": [
-    "## 6 综合样例\n",
+    "## 六、综合样例\n",
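    "The example below takes a single review and uses one Prompt to perform translation, spelling correction, style adjustment, and format conversion at the same time."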
   ]
  },
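The notebook uses the `Redlines` package to show what the correction step changed. As a rough illustration of the same idea without that dependency, the sketch below swaps in the standard-library `difflib` module to list word-level changes between an original and a corrected text. This is not how `Redlines` works internally; the `word_changes` helper and the sample sentences are hypothetical.

```python
import difflib


def word_changes(original: str, corrected: str) -> list[tuple[str, str, str]]:
    """Return (operation, old_words, new_words) for each non-matching span."""
    old_words, new_words = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replace / delete / insert spans
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes


# Made-up texts standing in for the review and the model's corrected version.
original = "Their goes my freedom. There going to bring they're suitcases."
corrected = "There goes my freedom. They're going to bring their suitcases."

for tag, old, new in word_changes(original, corrected):
    print(f"{tag}: {old!r} -> {new!r}")
```

Comparing word lists rather than characters keeps each reported change readable, which is usually what matters when reviewing a spelling or grammar fix.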
--- a/Expanding.ipynb
+++ b/Expanding.ipynb
@ -4,18 +4,34 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# 第七章 扩展\n",
-    "\n",
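-    "Expansion means feeding a short text, such as a set of instructions or a list of topics, into a large language model and having the model generate a longer text, such as an email or an essay on some topic. This has some good uses, such as using a large language model as a brainstorming partner. It also has problems: for example, someone could use it to generate large amounts of spam. So when you use these capabilities of large language models, please use them only responsibly and in ways that benefit people.\n",
-    "\n",
-    "In this chapter you will learn how to use the OpenAI API to generate a customer service email suited to each customer review. We will also use another input parameter of the model, called temperature, which lets you vary the degree of exploration and the diversity of the model's responses.\n"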
+    "# 第七章 文本扩展"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## 一、环境配置\n",
+    "<div class=\"toc\">\n",
+    "    <ul class=\"toc-item\">\n",
+    "        <li><span><a href=\"#一引言\" data-toc-modified-id=\"一、引言\">一、引言</a></span></li>\n",
+    "        <li>\n",
+    "            <span><a href=\"#二定制客户邮件\" data-toc-modified-id=\"二、定制客户邮件\">二、定制客户邮件</a></span>\n",
+    "        </li>\n",
+    "        <li><span><a href=\"#三引入温度系数\" data-toc-modified-id=\"三、引入温度系数\">三、引入温度系数</a></span>\n",
+    "        </li>\n",
+    "    </ul>\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 一、引言\n",
+    "\n",
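+    "Expansion means feeding a short text (for example, a set of instructions or a list of topics) into a large language model and having the model generate a longer text (for example, an email or an essay on some topic). This application is a double-edged sword: on the plus side, a large language model can serve as a brainstorming partner; on the downside, someone could use it to generate large amounts of spam. So when you use these capabilities of large language models, please use them only in a **responsible** way that **helps people**.\n",
+    "\n",
+    "In this chapter you will learn how to use the OpenAI API to generate customer service emails *tailored to each customer's review*. We will also use another input parameter of the model, called temperature, which lets you vary the degree of exploration and the diversity of the model's responses.\n",
    "\n",
    "As in the previous chapters, you need similar code to set up an environment that can use the OpenAI API."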
   ]
@ -67,14 +83,9 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
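-    "We will write a custom email response based on a customer review and its sentiment. That is, given a customer review and its sentiment, we use an LLM to generate a customized email."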
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
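-    "We first give an example consisting of a review and its sentiment"
+    "We will write automated reply emails tailored to a customer's review and its sentiment. Given the review and the sentiment, we use an LLM to generate a customized email response.\n",
+    "\n",
+    "We first give an example consisting of a review and its sentiment."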
   ]
  },
  {
@ -149,13 +160,18 @@
    "\"\"\""
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
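-    "We have already extracted the sentiment using what we learned in the lesson on inferring. This is a customer review of a blender, and we will now customize the reply based on the sentiment.\n",
+    "We have already extracted the sentiment using the method learned in the lesson on inferring. This is a customer review of a blender, and we will now customize the reply based on the sentiment.\n",
    "\n",
-    "The instruction here is: you are a customer service AI assistant whose task is to send an email reply to the customer. Given the customer email delimited by triple backticks, generate a reply thanking the customer for their review."
+    "Take the following Prompt as an example: you are a customer service AI assistant whose task is to send an email reply to the customer. Given the customer email delimited by triple backticks, generate a reply thanking the customer for their review."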
   ]
  },
  {
@ -239,62 +255,29 @@
    "print(response)"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## 三、使用温度系数\n",
+    "## 三、引入温度系数\n",
    "\n",
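-    "Next, we will use a language model parameter called “temperature”, which lets us change the diversity of the model's responses. You can think of temperature as the degree of exploration or randomness of the model.\n",
+    "Next, we will use a language model parameter called “temperature” (Temperature), which lets us change the diversity of the model's responses. You can think of temperature as the degree of exploration or randomness of the model.\n",
    "\n",
-    "For example, for the phrase “my favorite food”, the most likely next word is “pizza”, and the next most likely are “sushi” and “tacos”. At a temperature of zero, the model always chooses the most likely next word. At a higher temperature it will also choose one of the less likely words, and at an even higher temperature it may even choose “tacos”, which has only a one-in-five chance of being chosen. You can imagine that, as the model continues generating more words of the final response, “my favorite food is pizza” will diverge from the first response “my favorite food is tacos”. So as the model continues, the two responses become more and more different.\n",
+    "For example, for the phrase “my favorite food”, the most likely next word is “pizza”, and the next most likely are “sushi” and “tacos”. At a temperature of zero, the model always chooses the most likely next word. At a higher temperature it will also choose one of the less likely words, and at an even higher temperature it may even choose “tacos”, which has only a one-in-five chance of being chosen. You can imagine that, as the model continues generating more words of the final response, “my favorite food is pizza” will diverge from the first response “my favorite food is tacos”. As the model continues, the two responses also become more and more different.\n",
    "\n",
-    "In general, when building applications that need predictable responses, I recommend using a temperature of zero. Throughout these lessons we have set the temperature to zero, and if you are trying to build a reliable and predictable system, I think you should choose this temperature. If you are trying to use the model in a more creative way and want a wider variety of outputs, you may want to use a higher temperature."
+    "In general, when building applications that need predictable responses, I recommend **setting the temperature to zero**. Throughout these lessons we have set the temperature to zero, and if you are trying to build a reliable and predictable system, I think you should choose this temperature. If you are trying to use the model in a more creative way and want a wider variety of outputs, you may want to use a higher temperature."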
   ]
  },
  {
-   "cell_type": "code",
-   "execution_count": 7,
+   "cell_type": "markdown",
   "metadata": {},
-   "outputs": [],
   "source": [
-    "# given the sentiment from the lesson on \"inferring\",\n",
-    "# and the original customer message, customize the email\n",
-    "sentiment = \"negative\"\n",
-    "\n",
-    "# review for a blender\n",
-    "review = f\"\"\"\n",
-    "So, they still had the 17 piece system on seasonal \\\n",
-    "sale for around $49 in the month of November, about \\\n",
-    "half off, but for some reason (call it price gouging) \\\n",
-    "around the second week of December the prices all went \\\n",
-    "up to about anywhere from between $70-$89 for the same \\\n",
-    "system. And the 11 piece system went up around $10 or \\\n",
-    "so in price also from the earlier sale price of $29. \\\n",
-    "So it looks okay, but if you look at the base, the part \\\n",
-    "where the blade locks into place doesn’t look as good \\\n",
-    "as in previous editions from a few years ago, but I \\\n",
-    "plan to be very gentle with it (example, I crush \\\n",
-    "very hard items like beans, ice, rice, etc. in the \\ \n",
-    "blender first then pulverize them in the serving size \\\n",
-    "I want in the blender then switch to the whipping \\\n",
-    "blade for a finer flour, and use the cross cutting blade \\\n",
-    "first when making smoothies, then use the flat blade \\\n",
-    "if I need them finer/less pulpy). Special tip when making \\\n",
-    "smoothies, finely cut and freeze the fruits and \\\n",
-    "vegetables (if using spinach-lightly stew soften the \\ \n",
-    "spinach then freeze until ready for use-and if making \\\n",
-    "sorbet, use a small to medium sized food processor) \\ \n",
-    "that you plan to use that way you can avoid adding so \\\n",
-    "much ice if at all-when making your smoothie. \\\n",
-    "After about a year, the motor was making a funny noise. \\\n",
-    "I called customer service but the warranty expired \\\n",
-    "already, so I had to buy another one. FYI: The overall \\\n",
-    "quality has gone done in these types of products, so \\\n",
-    "they are kind of counting on brand recognition and \\\n",
-    "consumer loyalty to maintain sales. Got it in about \\\n",
-    "two days.\n",
-    "\"\"\""
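+    "For the same customer message, we remind the model to use specific details from the customer's message, and we set the temperature:"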
   ]
  },
  {
@ -394,11 +377,11 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
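-    "At a temperature of zero, every time you run the same prompt you should expect the same completion. With a temperature of 0.7, you get a different output every time.\n",
+    "At a temperature of zero, every time you run the same Prompt you should get the same reply. With a temperature of 0.7, you get a different output every time.\n",
    "\n",
-    "So you can see that it differs from the email we received earlier. Let us run it again to show that we get a different email once more.\n",
+    "So you can see that it differs from the email we received earlier. Running it again yields yet another different email.\n",
    "\n",
-    "So I suggest you try temperature for yourself to see how the output changes. In short, at higher temperatures the model's output is more random. You can almost think of the assistant as more easily distracted, but perhaps more creative, at higher temperatures."
+    "So I suggest you try temperature for yourself to see how the output changes. In short, at higher temperatures the model's output is more random. You can almost think of the assistant as **more easily distracted**, but perhaps **more creative**, at higher temperatures."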
   ]
  }
 ],
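The temperature discussion in Chapter 7 can be made concrete with a small simulation. The sketch below uses the common textbook formulation, in which log-probabilities are divided by the temperature before renormalizing. The notebook does not state how the OpenAI API implements temperature, so treat this as an illustration only. The probabilities for "pizza", "sushi", and "tacos" are made-up values (with "tacos" at one in five, as in the text), and `apply_temperature` and `sample_next_word` are hypothetical helpers.

```python
import math
import random


def apply_temperature(probs: dict[str, float], temperature: float) -> dict[str, float]:
    """Rescale a next-word distribution; assumes every probability is > 0."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Temperature zero: always pick the single most likely word.
        best = max(probs, key=probs.get)
        return {word: 1.0 if word == best else 0.0 for word in probs}
    # Divide log-probabilities by the temperature, then renormalize.
    logits = {word: math.log(p) / temperature for word, p in probs.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    weights = {word: math.exp(value - peak) for word, value in logits.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}


def sample_next_word(probs: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one word from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    words = list(adjusted)
    return rng.choices(words, weights=[adjusted[w] for w in words], k=1)[0]


# Illustrative probabilities for the word following "my favorite food is".
next_word_probs = {"pizza": 0.5, "sushi": 0.3, "tacos": 0.2}

rng = random.Random(0)  # fixed seed so repeated runs are reproducible
for temperature in (0, 0.7, 1.5):
    picks = [sample_next_word(next_word_probs, temperature, rng) for _ in range(5)]
    print(temperature, picks)
```

Under this formulation a temperature of 1 leaves the distribution unchanged, values below 1 sharpen it toward the most likely word, and values above 1 flatten it so that unlikely words such as "tacos" are chosen more often.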