diff --git a/docs/content/C1 Prompt Engineering for Developer/4. 文本概括 Summarizing.ipynb b/docs/content/C1 Prompt Engineering for Developer/4. 文本概括 Summarizing.ipynb new file mode 100644 index 0000000..5b877cf --- /dev/null +++ b/docs/content/C1 Prompt Engineering for Developer/4. 文本概括 Summarizing.ipynb @@ -0,0 +1 @@ +{"cells":[{"attachments":{},"cell_type":"markdown","id":"b58204ea","metadata":{},"source":["# 第四章 文本概括\n"]},{"attachments":{},"cell_type":"markdown","id":"12fa9ea4","metadata":{},"source":["当今世界上文本信息浩如烟海,我们很难拥有足够的时间去阅读所有想了解的东西。但欣喜的是,目前LLM在文本概括任务上展现了强大的水准,也已经有不少团队将概括功能实现在多种应用中。\n","\n","本章节将介绍如何使用编程的方式,调用API接口来实现“文本概括”功能。"]},{"attachments":{},"cell_type":"markdown","id":"9cca835b","metadata":{},"source":["## 一、单一文本概括"]},{"attachments":{},"cell_type":"markdown","id":"0c1e1b92","metadata":{},"source":["以商品评论的总结任务为例:对于电商平台来说,网站上往往存在着海量的商品评论,这些评论反映了所有客户的想法。如果我们拥有一个工具去概括这些海量、冗长的评论,便能够快速地浏览更多评论,洞悉客户的偏好,从而指导平台与商家提供更优质的服务。"]},{"attachments":{},"cell_type":"markdown","id":"aad5bd2a","metadata":{},"source":["**输入文本**"]},{"cell_type":"code","execution_count":2,"id":"43b5dd25","metadata":{},"outputs":[],"source":["prod_review = \"\"\"\n","这个熊猫公仔是我给女儿的生日礼物,她很喜欢,去哪都带着。\n","公仔很软,超级可爱,面部表情也很和善。但是相比于价钱来说,\n","它有点小,我感觉在别的地方用同样的价钱能买到更大的。\n","快递比预期提前了一天到货,所以在送给女儿之前,我自己玩了会。\n","\"\"\""]},{"attachments":{},"cell_type":"markdown","id":"662c9cd2","metadata":{},"source":["### 1.1 限制输出文本长度"]},{"attachments":{},"cell_type":"markdown","id":"a6d10814","metadata":{},"source":["我们尝试限制文本长度为最多30词。"]},{"cell_type":"code","execution_count":5,"id":"bf4b39f9","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["熊猫公仔软可爱,女儿喜欢,但有点小。快递提前一天到货。\n"]}],"source":["from tool import get_completion\n","\n","prompt = f\"\"\"\n","您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n","\n","请对三个反引号之间的评论文本进行概括,最多30个词汇。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"e9ab145e","metadata":{},"source":["### 1.2 设置关键角度侧重"]},{"attachments":{},"cell_type":"markdown","id":"f84d0123","metadata":{},"source":["有时,针对不同的业务,我们对文本的侧重会有所不同。例如对于商品评论文本,物流会更关心运输时效,商家更加关心价格与商品质量,平台更关心整体服务体验。\n","\n","我们可以通过增加Prompt提示,来体现对于某个特定角度的侧重。"]},{"attachments":{},"cell_type":"markdown","id":"d6f8509a","metadata":{},"source":["#### 1.2.1 侧重于快递服务"]},{"cell_type":"code","execution_count":7,"id":"80636c3e","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["快递提前到货,公仔可爱但有点小。\n"]}],"source":["prompt = f\"\"\"\n","您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n","\n","请对三个反引号之间的评论文本进行概括,最多30个词汇,并且侧重在快递服务上。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"76c97fea","metadata":{},"source":["可以看到,输出结果以“快递提前到货”开头,体现了对于快递效率的侧重。"]},{"attachments":{},"cell_type":"markdown","id":"83275907","metadata":{},"source":["#### 1.2.2 侧重于价格与质量"]},{"cell_type":"code","execution_count":8,"id":"728d6c57","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["可爱的熊猫公仔,质量好但有点小,价格稍高。快递提前到货。\n"]}],"source":["prompt = f\"\"\"\n","您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n","\n","请对三个反引号之间的评论文本进行概括,最多30个词汇,并且侧重在产品价格和质量上。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"972dbb1b","metadata":{},"source":["可以看到,输出结果以“可爱的熊猫公仔,质量好但有点小,价格稍高”开头,体现了对于产品价格与质量的侧重。"]},{"attachments":{},"cell_type":"markdown","id":"b3ed53d2","metadata":{},"source":["### 1.3 关键信息提取"]},{"attachments":{},"cell_type":"markdown","id":"ba6f5c25","metadata":{},"source":["在1.2节中,虽然我们通过添加关键角度侧重的 Prompt ,使得文本摘要更侧重于某一特定方面,但是可以发现,结果中也会保留一些其他信息,如偏重价格与质量角度的概括中仍保留了“快递提前到货”的信息。如果我们只想要提取某一角度的信息,并过滤掉其他所有信息,则可以要求 LLM 进行“文本提取( Extract )”而非“概括( Summarize )”"]},{"cell_type":"code","execution_count":9,"id":"c845ccab","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["产品运输相关的信息:快递提前一天到货。\n"]}],"source":["prompt = f\"\"\"\n","您的任务是从电子商务网站上的产品评论中提取相关信息。\n","\n","请从以下三个反引号之间的评论文本中提取产品运输相关的信息,最多30个词汇。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"50498a2b","metadata":{},"source":["## 二、同时概括多条文本"]},{"attachments":{},"cell_type":"markdown","id":"a291541a","metadata":{},"source":["在实际的工作流中,我们往往有许许多多的评论文本,以下示例将多条用户评价放进列表,并利用 ```for``` 循环,使用文本概括(Summarize)提示词,将评价概括至小于 20 词,并按顺序打印。当然,在实际生产中,对于不同规模的评论文本,除了使用 ```for``` 循环以外,还可能需要考虑整合评论、分布式等方法提升运算效率。您可以搭建主控面板,来总结大量用户评论,来方便您或他人快速浏览,还可以点击查看原评论。这样您能高效掌握顾客的所有想法。"]},{"cell_type":"code","execution_count":3,"id":"ef606961","metadata":{},"outputs":[],"source":["review_1 = prod_review\n","\n","# 一盏落地灯的评论\n","review_2 = \"\"\"\n","我需要一盏漂亮的卧室灯,这款灯不仅具备额外的储物功能,价格也并不算太高。\n","收货速度非常快,仅用了两天的时间就送到了。\n","不过,在运输过程中,灯的拉线出了问题,幸好,公司很乐意寄送了一根全新的灯线。\n","新的灯线也很快就送到手了,只用了几天的时间。\n","装配非常容易。然而,之后我发现有一个零件丢失了,于是我联系了客服,他们迅速地给我寄来了缺失的零件!\n","对我来说,这是一家非常关心客户和产品的优秀公司。\n","\"\"\"\n","\n","# 一把电动牙刷的评论\n","review_3 = \"\"\"\n","我的牙科卫生员推荐了电动牙刷,所以我就买了这款。\n","到目前为止,电池续航表现相当不错。\n","初次充电后,我在第一周一直将充电器插着,为的是对电池进行条件养护。\n","过去的3周里,我每天早晚都使用它刷牙,但电池依然维持着原来的充电状态。\n","不过,牙刷头太小了。我见过比这个牙刷头还大的婴儿牙刷。\n","我希望牙刷头更大一些,带有不同长度的刷毛,\n","这样可以更好地清洁牙齿间的空隙,但这款牙刷做不到。\n","总的来说,如果你能以50美元左右的价格购买到这款牙刷,那是一个不错的交易。\n","制造商的替换刷头相当昂贵,但你可以购买价格更为合理的通用刷头。\n","这款牙刷让我感觉就像每天都去了一次牙医,我的牙齿感觉非常干净!\n","\"\"\"\n","\n","# 一台搅拌机的评论\n","review_4 = \"\"\"\n","在11月份期间,这个17件套装还在季节性促销中,售价约为49美元,打了五折左右。\n","可是由于某种原因(我们可以称之为价格上涨),到了12月的第二周,所有的价格都上涨了,\n","同样的套装价格涨到了70-89美元不等。而11件套装的价格也从之前的29美元上涨了约10美元。\n","看起来还算不错,但是如果你仔细看底座,刀片锁定的部分看起来没有前几年版本的那么漂亮。\n","然而,我打算非常小心地使用它\n","(例如,我会先在搅拌机中研磨豆类、冰块、大米等坚硬的食物,然后再将它们研磨成所需的粒度,\n","接着切换到打蛋器刀片以获得更细的面粉,如果我需要制作更细腻/少果肉的食物)。\n","在制作冰沙时,我会将要使用的水果和蔬菜切成细小块并冷冻\n","(如果使用菠菜,我会先轻微煮熟菠菜,然后冷冻,直到使用时准备食用。\n","如果要制作冰糕,我会使用一个小到中号的食物加工器),这样你就可以避免添加过多的冰块。\n","大约一年后,电机开始发出奇怪的声音。我打电话给客户服务,但保修期已经过期了,\n","所以我只好购买了另一台。值得注意的是,这类产品的整体质量在过去几年里有所下降\n",",所以他们在一定程度上依靠品牌认知和消费者忠诚来维持销售。在大约两天内,我收到了新的搅拌机。\n","\"\"\"\n","\n","reviews = [review_1, review_2, review_3, review_4]\n"]},{"cell_type":"code","execution_count":4,"id":"eb878522","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["评论1: 熊猫公仔是生日礼物,女儿喜欢,软可爱,面部表情和善。价钱有点小,快递提前一天到货。 \n","\n","评论2: 漂亮卧室灯,储物功能,快速送达,灯线问题,快速解决,容易装配,关心客户和产品。 \n","\n","评论3: 这款电动牙刷电池续航好,但牙刷头太小,价格合理,清洁效果好。 \n","\n","评论4: 该评论提到了一个17件套装的产品,在11月份有折扣销售,但在12月份价格上涨。评论者提到了产品的外观和使用方法,并提到了产品质量下降的问题。最后,评论者提到他们购买了另一台搅拌机。 \n","\n"]}],"source":["for i in range(len(reviews)):\n"," prompt = f\"\"\"\n"," 你的任务是从电子商务网站上的产品评论中提取相关信息。\n","\n"," 请对三个反引号之间的评论文本进行概括,最多20个词汇。\n","\n"," 评论文本: ```{reviews[i]}```\n"," \"\"\"\n"," response = get_completion(prompt)\n"," print(f\"评论{i+1}: \", response, \"\\n\")\n"]},{"cell_type":"markdown","id":"f118c0cc","metadata":{},"source":["## 三、英文版"]},{"cell_type":"markdown","id":"a08635df","metadata":{},"source":["**1.1 单一文本概括**"]},{"cell_type":"code","execution_count":12,"id":"e55327d5","metadata":{},"outputs":[],"source":["prod_review = \"\"\"\n","Got this panda plush toy for my daughter's birthday, \\\n","who loves it and takes it everywhere. It's soft and \\ \n","super cute, and its face has a friendly look. It's \\ \n","a bit small for what I paid though. I think there \\ \n","might be other options that are bigger for the \\ \n","same price. It arrived a day earlier than expected, \\ \n","so I got to play with it myself before I gave it \\ \n","to her.\n","\"\"\""]},{"cell_type":"code","execution_count":13,"id":"30c2ef51","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["This panda plush toy is loved by the reviewer's daughter, but they feel it is a bit small for the price.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to generate a short summary of a product \\\n","review from an ecommerce site. \n","\n","Summarize the review below, delimited by triple \n","backticks, in at most 30 words. \n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"9bdcfc1b","metadata":{},"source":["**1.2 设置关键角度侧重**"]},{"cell_type":"markdown","id":"5dd0534f","metadata":{},"source":["1.2.1 侧重于快递服务"]},{"cell_type":"code","execution_count":14,"id":"b354cc3f","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["The customer is happy with the product but suggests offering larger options for the same price. They were pleased with the early delivery.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to generate a short summary of a product \\\n","review from an ecommerce site to give feedback to the \\\n","Shipping deparmtment. \n","\n","Summarize the review below, delimited by triple \n","backticks, in at most 30 words, and focusing on any aspects \\\n","that mention shipping and delivery of the product. \n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"af6aaf3a","metadata":{},"source":["1.2.2 侧重于价格和质量"]},{"cell_type":"code","execution_count":15,"id":"1b5358fd","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["The customer loves the panda plush toy for its softness and cuteness, but feels it is overpriced compared to other options available.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to generate a short summary of a product \\\n","review from an ecommerce site to give feedback to the \\\n","pricing deparmtment, responsible for determining the \\\n","price of the product. \n","\n","Summarize the review below, delimited by triple \n","backticks, in at most 30 words, and focusing on any aspects \\\n","that are relevant to the price and perceived value. \n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"0f582677","metadata":{},"source":["**1.3 关键信息提取**"]},{"cell_type":"code","execution_count":16,"id":"32c87014","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["The shipping department should take note that the product arrived a day earlier than expected.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to extract relevant information from \\ \n","a product review from an ecommerce site to give \\\n","feedback to the Shipping department. \n","\n","From the review below, delimited by triple quotes \\\n","extract the information relevant to shipping and \\ \n","delivery. Limit to 30 words. \n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"2043d100","metadata":{},"source":["**2.1 同时概括多条文本**"]},{"cell_type":"code","execution_count":17,"id":"cff48486","metadata":{},"outputs":[],"source":["review_1 = prod_review \n","\n","# review for a standing lamp\n","review_2 = \"\"\"\n","Needed a nice lamp for my bedroom, and this one \\\n","had additional storage and not too high of a price \\\n","point. Got it fast - arrived in 2 days. The string \\\n","to the lamp broke during the transit and the company \\\n","happily sent over a new one. Came within a few days \\\n","as well. It was easy to put together. Then I had a \\\n","missing part, so I contacted their support and they \\\n","very quickly got me the missing piece! Seems to me \\\n","to be a great company that cares about their customers \\\n","and products. \n","\"\"\"\n","\n","# review for an electric toothbrush\n","review_3 = \"\"\"\n","My dental hygienist recommended an electric toothbrush, \\\n","which is why I got this. The battery life seems to be \\\n","pretty impressive so far. After initial charging and \\\n","leaving the charger plugged in for the first week to \\\n","condition the battery, I've unplugged the charger and \\\n","been using it for twice daily brushing for the last \\\n","3 weeks all on the same charge. But the toothbrush head \\\n","is too small. I’ve seen baby toothbrushes bigger than \\\n","this one. I wish the head was bigger with different \\\n","length bristles to get between teeth better because \\\n","this one doesn’t. Overall if you can get this one \\\n","around the $50 mark, it's a good deal. The manufactuer's \\\n","replacements heads are pretty expensive, but you can \\\n","get generic ones that're more reasonably priced. This \\\n","toothbrush makes me feel like I've been to the dentist \\\n","every day. My teeth feel sparkly clean! \n","\"\"\"\n","\n","# review for a blender\n","review_4 = \"\"\"\n","So, they still had the 17 piece system on seasonal \\\n","sale for around $49 in the month of November, about \\\n","half off, but for some reason (call it price gouging) \\\n","around the second week of December the prices all went \\\n","up to about anywhere from between $70-$89 for the same \\\n","system. And the 11 piece system went up around $10 or \\\n","so in price also from the earlier sale price of $29. \\\n","So it looks okay, but if you look at the base, the part \\\n","where the blade locks into place doesn’t look as good \\\n","as in previous editions from a few years ago, but I \\\n","plan to be very gentle with it (example, I crush \\\n","very hard items like beans, ice, rice, etc. in the \\\n","blender first then pulverize them in the serving size \\\n","I want in the blender then switch to the whipping \\\n","blade for a finer flour, and use the cross cutting blade \\\n","first when making smoothies, then use the flat blade \\\n","if I need them finer/less pulpy). Special tip when making \\\n","smoothies, finely cut and freeze the fruits and \\\n","vegetables (if using spinach-lightly stew soften the \\\n","spinach then freeze until ready for use-and if making \\\n","sorbet, use a small to medium sized food processor) \\\n","that you plan to use that way you can avoid adding so \\\n","much ice if at all-when making your smoothie. \\\n","After about a year, the motor was making a funny noise. \\\n","I called customer service but the warranty expired \\\n","already, so I had to buy another one. FYI: The overall \\\n","quality has gone done in these types of products, so \\\n","they are kind of counting on brand recognition and \\\n","consumer loyalty to maintain sales. Got it in about \\\n","two days.\n","\"\"\"\n","\n","reviews = [review_1, review_2, review_3, review_4]"]},{"cell_type":"code","execution_count":18,"id":"3f61080b","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["0 Soft and cute panda plush toy loved by daughter, but small for the price. Arrived early. \n","\n","1 Great lamp with storage, fast delivery, excellent customer service, and easy assembly. Highly recommended. \n","\n","2 Impressive battery life, but toothbrush head is too small. Good deal if bought around $50. \n","\n","3 The reviewer found the price increase after the sale disappointing and noticed a decrease in quality over time. \n","\n"]}],"source":["for i in range(len(reviews)):\n"," prompt = f\"\"\"\n"," Your task is to generate a short summary of a product \\\n"," review from an ecommerce site. \n","\n"," Summarize the review below, delimited by triple \\\n"," backticks in at most 20 words. \n","\n"," Review: ```{reviews[i]}```\n"," \"\"\"\n"," response = get_completion(prompt)\n"," print(i, response, \"\\n\")"]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.11"},"latex_envs":{"LaTeX_envs_menu_present":true,"autoclose":false,"autocomplete":true,"bibliofile":"biblio.bib","cite_by":"apalike","current_citInitial":1,"eqLabelWithNumbers":true,"eqNumInitial":1,"hotkeys":{"equation":"Ctrl-E","itemize":"Ctrl-I"},"labels_anchors":false,"latex_user_defs":false,"report_style_numbering":false,"user_envs_cfg":false},"toc":{"base_numbering":1,"nav_menu":{},"number_sections":true,"sideBar":true,"skip_h1_title":false,"title_cell":"Table of Contents","title_sidebar":"Contents","toc_cell":false,"toc_position":{},"toc_section_display":true,"toc_window_display":true}},"nbformat":4,"nbformat_minor":5}