Add: text summarization

This commit is contained in:
huangyulin
2023-04-28 22:18:53 +08:00
parent eb5fc23571
commit cfabd60001

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"id": "716d29bd",
"id": "b58204ea",
"metadata": {},
"source": [
"# Text Summarization"
@ -10,7 +10,7 @@
},
{
"cell_type": "markdown",
"id": "f6b7fca0",
"id": "b70ad003",
"metadata": {},
"source": [
"## 1 Introduction"
@ -18,7 +18,7 @@
},
{
"cell_type": "markdown",
"id": "4b819ee9",
"id": "12fa9ea4",
"metadata": {},
"source": [
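"There is so much text in the world today that almost no one has time to read everything they want to learn about. Encouragingly, LLMs now perform strongly on text summarization, and many teams have already built this capability into their own software applications.\n",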
@ -28,7 +28,7 @@
},
{
"cell_type": "markdown",
"id": "48b35259",
"id": "1de4fd1e",
"metadata": {},
"source": [
"首先我们需要OpenAI包加载API密钥定义getCompletion函数。"
@ -37,7 +37,7 @@
{
"cell_type": "code",
"execution_count": 1,
"id": "5ab873ff",
"id": "9f679f1f",
"metadata": {},
"outputs": [],
"source": [
@ -56,7 +56,7 @@
},
{
"cell_type": "markdown",
"id": "469f1276",
"id": "9cca835b",
"metadata": {},
"source": [
"## 2 Prompt Experiments: Summarizing a Single Text"
@ -64,7 +64,7 @@
},
{
"cell_type": "markdown",
"id": "cd7d8580",
"id": "0c1e1b92",
"metadata": {},
"source": [
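"Here we use a product review as an example. E-commerce sites typically host a huge volume of product reviews, which reflect what customers think. With a tool that summarizes these numerous, lengthy reviews, we can browse more reviews quickly and understand customer preferences, which helps the platform and its merchants provide better service."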
@ -72,7 +72,7 @@
},
{
"cell_type": "markdown",
"id": "2fca0e2b",
"id": "9dc2e2bc",
"metadata": {},
"source": [
"**Input text**"
@ -81,7 +81,7 @@
{
"cell_type": "code",
"execution_count": 2,
"id": "3d3bd87d",
"id": "4d9c0eeb",
"metadata": {},
"outputs": [],
"source": [
@ -99,7 +99,7 @@
},
{
"cell_type": "markdown",
"id": "1ec202f1",
"id": "aad5bd2a",
"metadata": {},
"source": [
"**Input text (Chinese translation)**"
@ -108,7 +108,7 @@
{
"cell_type": "code",
"execution_count": 3,
"id": "84a653de",
"id": "43b5dd25",
"metadata": {},
"outputs": [],
"source": [
@ -122,7 +122,7 @@
},
{
"cell_type": "markdown",
"id": "e0f78610",
"id": "662c9cd2",
"metadata": {},
"source": [
"### 2.1 Limiting the Output Length"
@ -130,7 +130,7 @@
},
{
"cell_type": "markdown",
"id": "4066804b",
"id": "a6d10814",
"metadata": {},
"source": [
"We try limiting the summary to at most 30 words."
@ -139,7 +139,7 @@
{
"cell_type": "code",
"execution_count": 4,
"id": "d6e423f2",
"id": "02208fbc",
"metadata": {},
"outputs": [
{
@ -167,7 +167,7 @@
},
{
"cell_type": "markdown",
"id": "3586d82d",
"id": "0df0eb90",
"metadata": {},
"source": [
"Chinese translation"
@ -176,7 +176,7 @@
{
"cell_type": "code",
"execution_count": 5,
"id": "51cbed99",
"id": "bf4b39f9",
"metadata": {},
"outputs": [
{
@ -202,7 +202,7 @@
},
{
"cell_type": "markdown",
"id": "08c1643c",
"id": "e9ab145e",
"metadata": {},
"source": [
"### 2.2 Focusing on a Key Aspect"
@ -210,7 +210,7 @@
},
{
"cell_type": "markdown",
"id": "f2582b5f",
"id": "f84d0123",
"metadata": {},
"source": [
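"Sometimes different business functions care about different aspects of a text. For product reviews, for example, logistics cares more about delivery time, merchants care more about price and product quality, and the platform cares more about the overall service experience.\n",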
@ -220,7 +220,7 @@
},
{
"cell_type": "markdown",
"id": "9da7a497",
"id": "d6f8509a",
"metadata": {},
"source": [
"**Focus on shipping**"
@ -229,7 +229,7 @@
{
"cell_type": "code",
"execution_count": 6,
"id": "1432a7fe",
"id": "9d8a32a6",
"metadata": {},
"outputs": [
{
@ -259,7 +259,7 @@
},
{
"cell_type": "markdown",
"id": "d3b46d0f",
"id": "0bd4243a",
"metadata": {},
"source": [
"Chinese translation"
@ -268,7 +268,7 @@
{
"cell_type": "code",
"execution_count": 8,
"id": "8fff34de",
"id": "80636c3e",
"metadata": {},
"outputs": [
{
@ -294,7 +294,7 @@
},
{
"cell_type": "markdown",
"id": "426fb25f",
"id": "76c97fea",
"metadata": {},
"source": [
"As you can see, the output begins with “快递提前一天到货” (the delivery arrived a day early), which reflects the focus on delivery speed."
@ -302,7 +302,7 @@
},
{
"cell_type": "markdown",
"id": "aaa71480",
"id": "83275907",
"metadata": {},
"source": [
"**Focus on price and quality**"
@ -311,7 +311,7 @@
{
"cell_type": "code",
"execution_count": 9,
"id": "18546578",
"id": "767f252c",
"metadata": {},
"outputs": [
{
@ -342,7 +342,7 @@
},
{
"cell_type": "markdown",
"id": "63645a57",
"id": "cf54fac4",
"metadata": {},
"source": [
"Chinese translation"
@ -351,7 +351,7 @@
{
"cell_type": "code",
"execution_count": 12,
"id": "826ecc6d",
"id": "728d6c57",
"metadata": {},
"outputs": [
{
@ -377,7 +377,7 @@
},
{
"cell_type": "markdown",
"id": "d181a9b4",
"id": "972dbb1b",
"metadata": {},
"source": [
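"As you can see, the output begins with “质量好、价格小贵、尺寸小” (good quality, slightly expensive, small size), which reflects the focus on product price and quality."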
@ -385,7 +385,7 @@
},
{
"cell_type": "markdown",
"id": "57595596",
"id": "b3ed53d2",
"metadata": {},
"source": [
"### 2.3 Extracting Key Information"
@ -393,7 +393,7 @@
},
{
"cell_type": "markdown",
"id": "5c1fe1d5",
"id": "ba6f5c25",
"metadata": {},
"source": [
"在2.2节中虽然我们通过添加关键角度侧重的Prompt使得文本摘要更侧重于某一特定方面但是可以发现结果中也会保留一些其他信息如价格与质量角度的概括中仍保留了“快递提前到货”的信息。有时这些信息是有帮助的但如果我们只想要提取某一角度的信息并过滤掉其他所有信息则可以要求LLM进行“文本提取(Extract)”而非“文本概括(Summarize)”。"
@ -402,7 +402,7 @@
{
"cell_type": "code",
"execution_count": 13,
"id": "8ae5c70f",
"id": "2d60dc58",
"metadata": {},
"outputs": [
{
@ -432,7 +432,7 @@
},
{
"cell_type": "markdown",
"id": "149b594e",
"id": "0339b877",
"metadata": {},
"source": [
"Chinese translation"
@ -440,21 +440,21 @@
},
{
"cell_type": "code",
"execution_count": 16,
"id": "5993dd50",
"execution_count": 19,
"id": "c845ccab",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"快递提前一天到货,但产品有点小,相比价钱来说可以买到更大的。\n"
"快递比预期提前一天到货。\n"
]
}
],
"source": [
"prompt = f\"\"\"\n",
"你的任务是从电子商务网站上的产品评论中提取相关信息,以向运输部门提供反馈。\n",
"你的任务是从电子商务网站上的产品评论中提取相关信息。\n",
"\n",
"请从以下三个反引号之间的评论文本中提取产品运输相关的信息最多30个词汇。\n",
"\n",
@ -467,7 +467,7 @@
},
{
"cell_type": "markdown",
"id": "9fd0c3b8",
"id": "50498a2b",
"metadata": {},
"source": [
"## 3 Prompt Experiments: Summarizing Multiple Texts"
@ -475,7 +475,7 @@
},
{
"cell_type": "markdown",
"id": "2b27575c",
"id": "a291541a",
"metadata": {},
"source": [
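"In a real workflow we usually have many review texts. The example below calls the summarization tool in a for loop and prints each result in turn. In production, of course, a plain for loop is impractical for millions or even tens of millions of reviews; you may need to consider approaches such as merging reviews or distributed processing to improve throughput."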
@ -484,7 +484,7 @@
{
"cell_type": "code",
"execution_count": 17,
"id": "74f9930d",
"id": "ee7caa78",
"metadata": {},
"outputs": [],
"source": [
@ -564,7 +564,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "5fb23db5",
"id": "9d1aa5ac",
"metadata": {},
"outputs": [],
"source": [
@ -586,7 +586,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c0be2824",
"id": "eb878522",
"metadata": {},
"outputs": [],
"source": []