From cfabd60001d41d7193a1e81f2e380061b682d22c Mon Sep 17 00:00:00 2001
From: huangyulin
Date: Fri, 28 Apr 2023 22:18:53 +0800
Subject: [PATCH] Add: text summarization
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 content/文本概括 Summarizing.ipynb | 82 +++++++++++++++---------------
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/content/文本概括 Summarizing.ipynb b/content/文本概括 Summarizing.ipynb
index 9e5b7db..b40c397 100644
--- a/content/文本概括 Summarizing.ipynb
+++ b/content/文本概括 Summarizing.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "716d29bd",
+   "id": "b58204ea",
    "metadata": {},
    "source": [
     "# Text Summarization"
@@ -10,7 +10,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f6b7fca0",
+   "id": "b70ad003",
    "metadata": {},
    "source": [
     "## 1 Introduction"
@@ -18,7 +18,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4b819ee9",
+   "id": "12fa9ea4",
    "metadata": {},
    "source": [
     "There is so much text in the world today that almost no one has enough time to read everything they want to learn about. Encouragingly, LLMs now perform strongly on text summarization, and many teams have already built this capability into their software applications.\n",
@@ -28,7 +28,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "48b35259",
+   "id": "1de4fd1e",
    "metadata": {},
    "source": [
     "First, we need the OpenAI package, then we load the API key and define the getCompletion function."
@@ -37,7 +37,7 @@
   {
    "cell_type": "code",
    "execution_count": 1,
-   "id": "5ab873ff",
+   "id": "9f679f1f",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -56,7 +56,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "469f1276",
+   "id": "9cca835b",
    "metadata": {},
    "source": [
     "## 2 Prompt Experiments for Summarizing a Single Text"
@@ -64,7 +64,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "cd7d8580",
+   "id": "0c1e1b92",
    "metadata": {},
    "source": [
     "Here we use a product review as an example. E-commerce sites typically carry a huge volume of product reviews, which reflect what customers think. With a tool that summarizes these numerous, lengthy reviews, we can skim more reviews quickly and understand customer preferences, which in turn guides the platform and its merchants toward better service."
@@ -72,7 +72,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2fca0e2b",
+   "id": "9dc2e2bc",
    "metadata": {},
    "source": [
     "**Input text**"
@@ -81,7 +81,7 @@
   {
    "cell_type": "code",
    "execution_count": 2,
-   "id": "3d3bd87d",
+   "id": "4d9c0eeb",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -99,7 +99,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1ec202f1",
+   "id": "aad5bd2a",
    "metadata": {},
    "source": [
     "**Input text (Chinese translation)**"
@@ -108,7 +108,7 @@
   {
    "cell_type": "code",
    "execution_count": 3,
-   "id": "84a653de",
+   "id": "43b5dd25",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -122,7 +122,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "e0f78610",
+   "id": "662c9cd2",
    "metadata": {},
    "source": [
     "### 2.1 Limiting the Output Length"
@@ -130,7 +130,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4066804b",
+   "id": "a6d10814",
    "metadata": {},
    "source": [
     "We try limiting the summary to at most 30 words."
@@ -139,7 +139,7 @@
   {
    "cell_type": "code",
    "execution_count": 4,
-   "id": "d6e423f2",
+   "id": "02208fbc",
    "metadata": {},
    "outputs": [
     {
@@ -167,7 +167,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3586d82d",
+   "id": "0df0eb90",
    "metadata": {},
    "source": [
     "Chinese translation version"
@@ -176,7 +176,7 @@
   {
    "cell_type": "code",
    "execution_count": 5,
-   "id": "51cbed99",
+   "id": "bf4b39f9",
    "metadata": {},
    "outputs": [
     {
@@ -202,7 +202,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "08c1643c",
+   "id": "e9ab145e",
    "metadata": {},
    "source": [
     "### 2.2 Focusing on a Key Aspect"
@@ -210,7 +210,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f2582b5f",
+   "id": "f84d0123",
    "metadata": {},
    "source": [
     "Sometimes different parts of the business care about different aspects of a text. For a product review, for example, logistics cares most about delivery time, merchants care most about price and product quality, and the platform cares most about the overall service experience.\n",
@@ -220,7 +220,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "9da7a497",
+   "id": "d6f8509a",
    "metadata": {},
    "source": [
     "**Focus on shipping**"
@@ -229,7 +229,7 @@
   {
    "cell_type": "code",
    "execution_count": 6,
-   "id": "1432a7fe",
+   "id": "9d8a32a6",
    "metadata": {},
    "outputs": [
     {
@@ -259,7 +259,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "d3b46d0f",
+   "id": "0bd4243a",
    "metadata": {},
    "source": [
     "Chinese translation version"
@@ -268,7 +268,7 @@
   {
    "cell_type": "code",
    "execution_count": 8,
-   "id": "8fff34de",
+   "id": "80636c3e",
    "metadata": {},
    "outputs": [
     {
@@ -294,7 +294,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "426fb25f",
+   "id": "76c97fea",
    "metadata": {},
    "source": [
     "As you can see, the output opens with “快递提前一天到货” (the package arrived a day early), reflecting the focus on delivery speed."
@@ -302,7 +302,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "aaa71480",
+   "id": "83275907",
    "metadata": {},
    "source": [
     "**Focus on price and quality**"
@@ -311,7 +311,7 @@
   {
    "cell_type": "code",
    "execution_count": 9,
-   "id": "18546578",
+   "id": "767f252c",
    "metadata": {},
    "outputs": [
     {
@@ -342,7 +342,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "63645a57",
+   "id": "cf54fac4",
    "metadata": {},
    "source": [
     "Chinese translation version"
@@ -351,7 +351,7 @@
   {
    "cell_type": "code",
    "execution_count": 12,
-   "id": "826ecc6d",
+   "id": "728d6c57",
    "metadata": {},
    "outputs": [
     {
@@ -377,7 +377,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "d181a9b4",
+   "id": "972dbb1b",
    "metadata": {},
    "source": [
     "As you can see, the output opens with “质量好、价格小贵、尺寸小” (good quality, slightly expensive, small size), reflecting the focus on product price and quality."
@@ -385,7 +385,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "57595596",
+   "id": "b3ed53d2",
    "metadata": {},
    "source": [
     "### 2.3 Extracting Key Information"
@@ -393,7 +393,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "5c1fe1d5",
+   "id": "ba6f5c25",
    "metadata": {},
    "source": [
     "In Section 2.2, adding a prompt that emphasizes a key aspect made the summary focus on that aspect, but the result still retains other information; for example, the price-and-quality summary still mentions that the package arrived early. Sometimes that extra information is useful, but if we want only the information about one aspect, with everything else filtered out, we can ask the LLM to “extract” rather than “summarize” the text."
@@ -402,7 +402,7 @@
   {
    "cell_type": "code",
    "execution_count": 13,
-   "id": "8ae5c70f",
+   "id": "2d60dc58",
    "metadata": {},
    "outputs": [
     {
@@ -432,7 +432,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "149b594e",
+   "id": "0339b877",
    "metadata": {},
    "source": [
     "Chinese translation version"
@@ -440,21 +440,21 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 16,
-   "id": "5993dd50",
+   "execution_count": 19,
+   "id": "c845ccab",
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "快递提前一天到货,但产品有点小,相比价钱来说可以买到更大的。\n"
+      "快递比预期提前了一天到货。\n"
      ]
     }
    ],
    "source": [
     "prompt = f\"\"\"\n",
-    "你的任务是从电子商务网站上的产品评论中提取相关信息,以向运输部门提供反馈。\n",
+    "你的任务是从电子商务网站上的产品评论中提取相关信息。\n",
     "\n",
     "请从以下三个反引号之间的评论文本中提取产品运输相关的信息,最多30个词汇。\n",
     "\n",
@@ -467,7 +467,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "9fd0c3b8",
+   "id": "50498a2b",
    "metadata": {},
    "source": [
     "## 3 Prompt Experiments for Summarizing Multiple Texts"
@@ -475,7 +475,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2b27575c",
+   "id": "a291541a",
    "metadata": {},
    "source": [
     "In a real workflow we often have a great many reviews. The example below calls the summarization tool in a for loop and prints each result in turn. In production, of course, a for loop is impractical for millions or tens of millions of reviews; you may need approaches such as merging reviews or distributed processing to improve throughput."
@@ -484,7 +484,7 @@
   {
    "cell_type": "code",
    "execution_count": 17,
-   "id": "74f9930d",
+   "id": "ee7caa78",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -564,7 +564,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "5fb23db5",
+   "id": "9d1aa5ac",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -586,7 +586,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c0be2824",
+   "id": "eb878522",
    "metadata": {},
    "outputs": [],
    "source": []