diff --git a/docs/content/C1 Prompt Engineering for Developer/4. 文本概括 Summarizing.ipynb b/docs/content/C1 Prompt Engineering for Developer/4. 文本概括 Summarizing.ipynb index cef2062..6c2f9ea 100644 --- a/docs/content/C1 Prompt Engineering for Developer/4. 文本概括 Summarizing.ipynb +++ b/docs/content/C1 Prompt Engineering for Developer/4. 文本概括 Summarizing.ipynb @@ -1 +1,742 @@ -{"cells":[{"attachments":{},"cell_type":"markdown","id":"b58204ea","metadata":{},"source":["# 第四章 文本概括\n"]},{"attachments":{},"cell_type":"markdown","id":"12fa9ea4","metadata":{},"source":["在繁忙的信息时代,小明是一名热心的开发者,面临着海量的文本信息处理的挑战。他需要通过研究无数的文献资料来为他的项目找到关键的信息,但是时间却远远不够。在他焦头烂额之际,他发现了大型语言模型(LLM)的文本摘要功能。\n","\n","这个功能对小明来说如同灯塔一样,照亮了他处理信息海洋的道路。LLM的强大能力在于它可以将复杂的文本信息简化,提炼出关键的观点,这对于他来说无疑是巨大的帮助。他不再需要花费大量的时间去阅读所有的文档,只需要用LLM将它们概括,就可以快速获取到他所需要的信息。\n","\n","通过编程调用API接口,小明成功实现了这个文本摘要的功能。他感叹道:“这简直就像一道魔法,将无尽的信息海洋变成了清晰的信息源泉。”小明的经历,展现了LLM文本摘要功能的巨大优势:**节省时间**,**提高效率**,以及**精准获取信息**。这就是我们本章要介绍的内容,让我们一起来探索如何利用编程和调用API接口,掌握这个强大的工具。"]},{"attachments":{},"cell_type":"markdown","id":"9cca835b","metadata":{},"source":["## 一、单一文本概括"]},{"attachments":{},"cell_type":"markdown","id":"0c1e1b92","metadata":{},"source":["以商品评论的总结任务为例:对于电商平台来说,网站上往往存在着海量的商品评论,这些评论反映了所有客户的想法。如果我们拥有一个工具去概括这些海量、冗长的评论,便能够快速地浏览更多评论,洞悉客户的偏好,从而指导平台与商家提供更优质的服务。"]},{"attachments":{},"cell_type":"markdown","id":"aad5bd2a","metadata":{},"source":["**输入文本**"]},{"cell_type":"markdown","id":"11c360ae","metadata":{},"source":["这是一段在线商品评价,可能来自于一个在线购物平台,例如亚马逊、淘宝、京东等。评价者为一款熊猫公仔进行了点评,评价内容包括商品的质量、大小、价格和物流速度等因素,以及他的女儿对该商品的喜爱程度。"]},{"cell_type":"code","execution_count":2,"id":"43b5dd25","metadata":{},"outputs":[],"source":["prod_review = \"\"\"\n","这个熊猫公仔是我给女儿的生日礼物,她很喜欢,去哪都带着。\n","公仔很软,超级可爱,面部表情也很和善。但是相比于价钱来说,\n","它有点小,我感觉在别的地方用同样的价钱能买到更大的。\n","快递比预期提前了一天到货,所以在送给女儿之前,我自己玩了会。\n","\"\"\""]},{"attachments":{},"cell_type":"markdown","id":"662c9cd2","metadata":{},"source":["### 1.1 限制输出文本长度"]},{"attachments":{},"cell_type":"markdown","id":"a6d10814","metadata":{},"source":["我们尝试将文本的长度限制在30个字以内。"]},{"cell_type":"code","execution_count":5,"id":"bf4b39f9","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["熊猫公仔软可爱,女儿喜欢,但有点小。快递提前一天到货。\n"]}],"source":["from tool import get_completion\n","\n","prompt = f\"\"\"\n","您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n","\n","请对三个反引号之间的评论文本进行概括,最多30个字。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"fce32884","metadata":{},"source":["我们可以看到语言模型给了我们一个符合要求的结果。\n","\n","注意:在上一节中我们提到了语言模型在计算和判断文本长度时依赖于分词器,而分词器在字符统计方面不具备完美精度。"]},{"attachments":{},"cell_type":"markdown","id":"e9ab145e","metadata":{},"source":["### 1.2 设置关键角度侧重"]},{"attachments":{},"cell_type":"markdown","id":"f84d0123","metadata":{},"source":["在某些情况下,我们会针对不同的业务场景对文本的侧重会有所不同。例如,在商品评论文本中,物流部门可能更专注于运输的时效性,商家则更关注价格和商品质量,而平台则更看重整体的用户体验。\n","\n","我们可以通过增强输入提示(Prompt),来强调我们对某一特定视角的重视。"]},{"attachments":{},"cell_type":"markdown","id":"d6f8509a","metadata":{},"source":["#### 1.2.1 侧重于快递服务"]},{"cell_type":"code","execution_count":7,"id":"80636c3e","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["快递提前到货,公仔可爱但有点小。\n"]}],"source":["prompt = f\"\"\"\n","您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n","\n","请对三个反引号之间的评论文本进行概括,最多30个字,并且侧重在快递服务上。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"76c97fea","metadata":{},"source":["通过输出结果,我们可以看到,文本以“快递提前到货”开头,体现了对于快递效率的侧重。"]},{"attachments":{},"cell_type":"markdown","id":"83275907","metadata":{},"source":["#### 1.2.2 侧重于价格与质量"]},{"cell_type":"code","execution_count":8,"id":"728d6c57","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["可爱的熊猫公仔,质量好但有点小,价格稍高。快递提前到货。\n"]}],"source":["prompt = f\"\"\"\n","您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n","\n","请对三个反引号之间的评论文本进行概括,最多30个词汇,并且侧重在产品价格和质量上。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"972dbb1b","metadata":{},"source":["通过输出的结果,我们可以看到,文本以“可爱的熊猫公仔,质量好但有点小,价格稍高”开头,体现了对于产品价格与质量的侧重。"]},{"attachments":{},"cell_type":"markdown","id":"b3ed53d2","metadata":{},"source":["### 1.3 关键信息提取"]},{"attachments":{},"cell_type":"markdown","id":"ba6f5c25","metadata":{},"source":["在1.2节中,虽然我们通过添加关键角度侧重的 Prompt ,确实让文本摘要更侧重于某一特定方面,然而,我们可以发现,在结果中也会保留一些其他信息,比如偏重价格与质量角度的概括中仍保留了“快递提前到货”的信息。如果我们只想要提取某一角度的信息,并过滤掉其他所有信息,则可以要求 LLM 进行“**文本提取( Extract )**”而非“概括( Summarize )”。"]},{"cell_type":"markdown","id":"da39760c","metadata":{},"source":["下面让我们来一起来对文本进行提取信息吧!"]},{"cell_type":"code","execution_count":9,"id":"c845ccab","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["产品运输相关的信息:快递提前一天到货。\n"]}],"source":["prompt = f\"\"\"\n","您的任务是从电子商务网站上的产品评论中提取相关信息。\n","\n","请从以下三个反引号之间的评论文本中提取产品运输相关的信息,最多30个词汇。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"50498a2b","metadata":{},"source":["## 二、同时概括多条文本"]},{"attachments":{},"cell_type":"markdown","id":"a291541a","metadata":{},"source":["在实际的工作流中,我们往往要处理大量的评论文本,下面的示例将多条用户评价集合在一个列表中,并利用 ```for``` 循环和文本概括(Summarize)提示词,将评价概括至小于 20 个词以下,并按顺序打印。当然,在实际生产中,对于不同规模的评论文本,除了使用 ```for``` 循环以外,还可能需要考虑整合评论、分布式等方法提升运算效率。您可以搭建主控面板,来总结大量用户评论,以及方便您或他人快速浏览,还可以点击查看原评论。这样,您就能高效掌握顾客的所有想法。"]},{"cell_type":"code","execution_count":3,"id":"ef606961","metadata":{},"outputs":[],"source":["review_1 = prod_review\n","\n","# 一盏落地灯的评论\n","review_2 = \"\"\"\n","我需要一盏漂亮的卧室灯,这款灯不仅具备额外的储物功能,价格也并不算太高。\n","收货速度非常快,仅用了两天的时间就送到了。\n","不过,在运输过程中,灯的拉线出了问题,幸好,公司很乐意寄送了一根全新的灯线。\n","新的灯线也很快就送到手了,只用了几天的时间。\n","装配非常容易。然而,之后我发现有一个零件丢失了,于是我联系了客服,他们迅速地给我寄来了缺失的零件!\n","对我来说,这是一家非常关心客户和产品的优秀公司。\n","\"\"\"\n","\n","# 一把电动牙刷的评论\n","review_3 = \"\"\"\n","我的牙科卫生员推荐了电动牙刷,所以我就买了这款。\n","到目前为止,电池续航表现相当不错。\n","初次充电后,我在第一周一直将充电器插着,为的是对电池进行条件养护。\n","过去的3周里,我每天早晚都使用它刷牙,但电池依然维持着原来的充电状态。\n","不过,牙刷头太小了。我见过比这个牙刷头还大的婴儿牙刷。\n","我希望牙刷头更大一些,带有不同长度的刷毛,\n","这样可以更好地清洁牙齿间的空隙,但这款牙刷做不到。\n","总的来说,如果你能以50美元左右的价格购买到这款牙刷,那是一个不错的交易。\n","制造商的替换刷头相当昂贵,但你可以购买价格更为合理的通用刷头。\n","这款牙刷让我感觉就像每天都去了一次牙医,我的牙齿感觉非常干净!\n","\"\"\"\n","\n","# 一台搅拌机的评论\n","review_4 = \"\"\"\n","在11月份期间,这个17件套装还在季节性促销中,售价约为49美元,打了五折左右。\n","可是由于某种原因(我们可以称之为价格上涨),到了12月的第二周,所有的价格都上涨了,\n","同样的套装价格涨到了70-89美元不等。而11件套装的价格也从之前的29美元上涨了约10美元。\n","看起来还算不错,但是如果你仔细看底座,刀片锁定的部分看起来没有前几年版本的那么漂亮。\n","然而,我打算非常小心地使用它\n","(例如,我会先在搅拌机中研磨豆类、冰块、大米等坚硬的食物,然后再将它们研磨成所需的粒度,\n","接着切换到打蛋器刀片以获得更细的面粉,如果我需要制作更细腻/少果肉的食物)。\n","在制作冰沙时,我会将要使用的水果和蔬菜切成细小块并冷冻\n","(如果使用菠菜,我会先轻微煮熟菠菜,然后冷冻,直到使用时准备食用。\n","如果要制作冰糕,我会使用一个小到中号的食物加工器),这样你就可以避免添加过多的冰块。\n","大约一年后,电机开始发出奇怪的声音。我打电话给客户服务,但保修期已经过期了,\n","所以我只好购买了另一台。值得注意的是,这类产品的整体质量在过去几年里有所下降\n",",所以他们在一定程度上依靠品牌认知和消费者忠诚来维持销售。在大约两天内,我收到了新的搅拌机。\n","\"\"\"\n","\n","reviews = [review_1, review_2, review_3, review_4]\n"]},{"cell_type":"code","execution_count":4,"id":"eb878522","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["评论1: 熊猫公仔是生日礼物,女儿喜欢,软可爱,面部表情和善。价钱有点小,快递提前一天到货。 \n","\n","评论2: 漂亮卧室灯,储物功能,快速送达,灯线问题,快速解决,容易装配,关心客户和产品。 \n","\n","评论3: 这款电动牙刷电池续航好,但牙刷头太小,价格合理,清洁效果好。 \n","\n","评论4: 该评论提到了一个17件套装的产品,在11月份有折扣销售,但在12月份价格上涨。评论者提到了产品的外观和使用方法,并提到了产品质量下降的问题。最后,评论者提到他们购买了另一台搅拌机。 \n","\n"]}],"source":["for i in range(len(reviews)):\n"," prompt = f\"\"\"\n"," 你的任务是从电子商务网站上的产品评论中提取相关信息。\n","\n"," 请对三个反引号之间的评论文本进行概括,最多20个词汇。\n","\n"," 评论文本: ```{reviews[i]}```\n"," \"\"\"\n"," response = get_completion(prompt)\n"," print(f\"评论{i+1}: \", response, \"\\n\")\n"]},{"cell_type":"markdown","id":"f118c0cc","metadata":{},"source":["## 三、英文版"]},{"cell_type":"markdown","id":"a08635df","metadata":{},"source":["**1.1 单一文本概括**"]},{"cell_type":"code","execution_count":12,"id":"e55327d5","metadata":{},"outputs":[],"source":["prod_review = \"\"\"\n","Got this panda plush toy for my daughter's birthday, \\\n","who loves it and takes it everywhere. It's soft and \\ \n","super cute, and its face has a friendly look. It's \\ \n","a bit small for what I paid though. I think there \\ \n","might be other options that are bigger for the \\ \n","same price. It arrived a day earlier than expected, \\ \n","so I got to play with it myself before I gave it \\ \n","to her.\n","\"\"\""]},{"cell_type":"code","execution_count":13,"id":"30c2ef51","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["This panda plush toy is loved by the reviewer's daughter, but they feel it is a bit small for the price.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to generate a short summary of a product \\\n","review from an ecommerce site. \n","\n","Summarize the review below, delimited by triple \n","backticks, in at most 30 words. \n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"9bdcfc1b","metadata":{},"source":["**1.2 设置关键角度侧重**"]},{"cell_type":"markdown","id":"5dd0534f","metadata":{},"source":["1.2.1 侧重于快递服务"]},{"cell_type":"code","execution_count":14,"id":"b354cc3f","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["The customer is happy with the product but suggests offering larger options for the same price. They were pleased with the early delivery.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to generate a short summary of a product \\\n","review from an ecommerce site to give feedback to the \\\n","Shipping deparmtment. \n","\n","Summarize the review below, delimited by triple \n","backticks, in at most 30 words, and focusing on any aspects \\\n","that mention shipping and delivery of the product. \n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"af6aaf3a","metadata":{},"source":["1.2.2 侧重于价格和质量"]},{"cell_type":"code","execution_count":15,"id":"1b5358fd","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["The customer loves the panda plush toy for its softness and cuteness, but feels it is overpriced compared to other options available.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to generate a short summary of a product \\\n","review from an ecommerce site to give feedback to the \\\n","pricing deparmtment, responsible for determining the \\\n","price of the product. \n","\n","Summarize the review below, delimited by triple \n","backticks, in at most 30 words, and focusing on any aspects \\\n","that are relevant to the price and perceived value. \n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"0f582677","metadata":{},"source":["**1.3 关键信息提取**"]},{"cell_type":"code","execution_count":16,"id":"32c87014","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["The shipping department should take note that the product arrived a day earlier than expected.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to extract relevant information from \\ \n","a product review from an ecommerce site to give \\\n","feedback to the Shipping department. \n","\n","From the review below, delimited by triple quotes \\\n","extract the information relevant to shipping and \\ \n","delivery. Limit to 30 words. \n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"2043d100","metadata":{},"source":["**2.1 同时概括多条文本**"]},{"cell_type":"code","execution_count":17,"id":"cff48486","metadata":{},"outputs":[],"source":["review_1 = prod_review \n","\n","# review for a standing lamp\n","review_2 = \"\"\"\n","Needed a nice lamp for my bedroom, and this one \\\n","had additional storage and not too high of a price \\\n","point. Got it fast - arrived in 2 days. The string \\\n","to the lamp broke during the transit and the company \\\n","happily sent over a new one. Came within a few days \\\n","as well. It was easy to put together. Then I had a \\\n","missing part, so I contacted their support and they \\\n","very quickly got me the missing piece! Seems to me \\\n","to be a great company that cares about their customers \\\n","and products. \n","\"\"\"\n","\n","# review for an electric toothbrush\n","review_3 = \"\"\"\n","My dental hygienist recommended an electric toothbrush, \\\n","which is why I got this. The battery life seems to be \\\n","pretty impressive so far. After initial charging and \\\n","leaving the charger plugged in for the first week to \\\n","condition the battery, I've unplugged the charger and \\\n","been using it for twice daily brushing for the last \\\n","3 weeks all on the same charge. But the toothbrush head \\\n","is too small. I’ve seen baby toothbrushes bigger than \\\n","this one. I wish the head was bigger with different \\\n","length bristles to get between teeth better because \\\n","this one doesn’t. Overall if you can get this one \\\n","around the $50 mark, it's a good deal. The manufactuer's \\\n","replacements heads are pretty expensive, but you can \\\n","get generic ones that're more reasonably priced. This \\\n","toothbrush makes me feel like I've been to the dentist \\\n","every day. My teeth feel sparkly clean! \n","\"\"\"\n","\n","# review for a blender\n","review_4 = \"\"\"\n","So, they still had the 17 piece system on seasonal \\\n","sale for around $49 in the month of November, about \\\n","half off, but for some reason (call it price gouging) \\\n","around the second week of December the prices all went \\\n","up to about anywhere from between $70-$89 for the same \\\n","system. And the 11 piece system went up around $10 or \\\n","so in price also from the earlier sale price of $29. \\\n","So it looks okay, but if you look at the base, the part \\\n","where the blade locks into place doesn’t look as good \\\n","as in previous editions from a few years ago, but I \\\n","plan to be very gentle with it (example, I crush \\\n","very hard items like beans, ice, rice, etc. in the \\\n","blender first then pulverize them in the serving size \\\n","I want in the blender then switch to the whipping \\\n","blade for a finer flour, and use the cross cutting blade \\\n","first when making smoothies, then use the flat blade \\\n","if I need them finer/less pulpy). Special tip when making \\\n","smoothies, finely cut and freeze the fruits and \\\n","vegetables (if using spinach-lightly stew soften the \\\n","spinach then freeze until ready for use-and if making \\\n","sorbet, use a small to medium sized food processor) \\\n","that you plan to use that way you can avoid adding so \\\n","much ice if at all-when making your smoothie. \\\n","After about a year, the motor was making a funny noise. \\\n","I called customer service but the warranty expired \\\n","already, so I had to buy another one. FYI: The overall \\\n","quality has gone done in these types of products, so \\\n","they are kind of counting on brand recognition and \\\n","consumer loyalty to maintain sales. Got it in about \\\n","two days.\n","\"\"\"\n","\n","reviews = [review_1, review_2, review_3, review_4]"]},{"cell_type":"code","execution_count":18,"id":"3f61080b","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["0 Soft and cute panda plush toy loved by daughter, but small for the price. Arrived early. \n","\n","1 Great lamp with storage, fast delivery, excellent customer service, and easy assembly. Highly recommended. \n","\n","2 Impressive battery life, but toothbrush head is too small. Good deal if bought around $50. \n","\n","3 The reviewer found the price increase after the sale disappointing and noticed a decrease in quality over time. \n","\n"]}],"source":["for i in range(len(reviews)):\n"," prompt = f\"\"\"\n"," Your task is to generate a short summary of a product \\\n"," review from an ecommerce site. \n","\n"," Summarize the review below, delimited by triple \\\n"," backticks in at most 20 words. \n","\n"," Review: ```{reviews[i]}```\n"," \"\"\"\n"," response = get_completion(prompt)\n"," print(i, response, \"\\n\")"]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.11"},"latex_envs":{"LaTeX_envs_menu_present":true,"autoclose":false,"autocomplete":true,"bibliofile":"biblio.bib","cite_by":"apalike","current_citInitial":1,"eqLabelWithNumbers":true,"eqNumInitial":1,"hotkeys":{"equation":"Ctrl-E","itemize":"Ctrl-I"},"labels_anchors":false,"latex_user_defs":false,"report_style_numbering":false,"user_envs_cfg":false},"toc":{"base_numbering":1,"nav_menu":{},"number_sections":true,"sideBar":true,"skip_h1_title":false,"title_cell":"Table of Contents","title_sidebar":"Contents","toc_cell":false,"toc_position":{},"toc_section_display":true,"toc_window_display":true}},"nbformat":4,"nbformat_minor":5} +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b58204ea", + "metadata": {}, + "source": [ + "# 第四章 文本概括\n" + ] + }, + { + "cell_type": "markdown", + "id": "12fa9ea4", + "metadata": {}, + "source": [ + "在繁忙的信息时代,小明是一名热心的开发者,面临着海量的文本信息处理的挑战。他需要通过研究无数的文献资料来为他的项目找到关键的信息,但是时间却远远不够。在他焦头烂额之际,他发现了大型语言模型(LLM)的文本摘要功能。\n", + "\n", + "这个功能对小明来说如同灯塔一样,照亮了他处理信息海洋的道路。LLM 的强大能力在于它可以将复杂的文本信息简化,提炼出关键的观点,这对于他来说无疑是巨大的帮助。他不再需要花费大量的时间去阅读所有的文档,只需要用 LLM 将它们概括,就可以快速获取到他所需要的信息。\n", + "\n", + "通过编程调用 AP I接口,小明成功实现了这个文本摘要的功能。他感叹道:“这简直就像一道魔法,将无尽的信息海洋变成了清晰的信息源泉。”小明的经历,展现了LLM文本摘要功能的巨大优势:**节省时间**,**提高效率**,以及**精准获取信息**。这就是我们本章要介绍的内容,让我们一起来探索如何利用编程和调用API接口,掌握这个强大的工具。" + ] + }, + { + "cell_type": "markdown", + "id": "9cca835b", + "metadata": {}, + "source": [ + "## 一、单一文本概括" + ] + }, + { + "cell_type": "markdown", + "id": "0c1e1b92", + "metadata": {}, + "source": [ + "以商品评论的总结任务为例:对于电商平台来说,网站上往往存在着海量的商品评论,这些评论反映了所有客户的想法。如果我们拥有一个工具去概括这些海量、冗长的评论,便能够快速地浏览更多评论,洞悉客户的偏好,从而指导平台与商家提供更优质的服务。" + ] + }, + { + "cell_type": "markdown", + "id": "11c360ae", + "metadata": {}, + "source": [ + "接下来我们提供一段在线商品评价作为示例,可能来自于一个在线购物平台,例如亚马逊、淘宝、京东等。评价者为一款熊猫公仔进行了点评,评价内容包括商品的质量、大小、价格和物流速度等因素,以及他的女儿对该商品的喜爱程度。" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "43b5dd25", + "metadata": {}, + "outputs": [], + "source": [ + "prod_review = \"\"\"\n", + "这个熊猫公仔是我给女儿的生日礼物,她很喜欢,去哪都带着。\n", + "公仔很软,超级可爱,面部表情也很和善。但是相比于价钱来说,\n", + "它有点小,我感觉在别的地方用同样的价钱能买到更大的。\n", + "快递比预期提前了一天到货,所以在送给女儿之前,我自己玩了会。\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "662c9cd2", + "metadata": {}, + "source": [ + "### 1.1 限制输出文本长度" + ] + }, + { + "cell_type": "markdown", + "id": "a6d10814", + "metadata": {}, + "source": [ + "我们首先尝试将文本的长度限制在30个字以内。" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "bf4b39f9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "熊猫公仔软可爱,女儿喜欢,但有点小。快递提前一天到货。\n" + ] + } + ], + "source": [ + "from tool import get_completion\n", + "\n", + "prompt = f\"\"\"\n", + "您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n", + "\n", + "请对三个反引号之间的评论文本进行概括,最多30个字。\n", + "\n", + "评论: ```{prod_review}```\n", + "\"\"\"\n", + "\n", + "response = get_completion(prompt)\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "fce32884", + "metadata": {}, + "source": [ + "我们可以看到语言模型给了我们一个符合要求的结果。\n", + "\n", + "注意:在上一节中我们提到了语言模型在计算和判断文本长度时依赖于分词器,而分词器在字符统计方面不具备完美精度。" + ] + }, + { + "cell_type": "markdown", + "id": "e9ab145e", + "metadata": {}, + "source": [ + "### 1.2 设置关键角度侧重" + ] + }, + { + "cell_type": "markdown", + "id": "f84d0123", + "metadata": {}, + "source": [ + "在某些情况下,我们会针对不同的业务场景对文本的侧重会有所不同。例如,在商品评论文本中,物流部门可能更专注于运输的时效性,商家则更关注价格和商品质量,而平台则更看重整体的用户体验。\n", + "\n", + "我们可以通过增强输入提示(Prompt),来强调我们对某一特定视角的重视。" + ] + }, + { + "cell_type": "markdown", + "id": "d6f8509a", + "metadata": {}, + "source": [ + "#### 1.2.1 侧重于快递服务" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "80636c3e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "快递提前到货,公仔可爱但有点小。\n" + ] + } + ], + "source": [ + "prompt = f\"\"\"\n", + "您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n", + "\n", + "请对三个反引号之间的评论文本进行概括,最多30个字,并且侧重在快递服务上。\n", + "\n", + "评论: ```{prod_review}```\n", + "\"\"\"\n", + "\n", + "response = get_completion(prompt)\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "76c97fea", + "metadata": {}, + "source": [ + "通过输出结果,我们可以看到,文本以“快递提前到货”开头,体现了对于快递效率的侧重。" + ] + }, + { + "cell_type": "markdown", + "id": "83275907", + "metadata": {}, + "source": [ + "#### 1.2.2 侧重于价格与质量" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "728d6c57", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "可爱的熊猫公仔,质量好但有点小,价格稍高。快递提前到货。\n" + ] + } + ], + "source": [ + "prompt = f\"\"\"\n", + "您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n", + "\n", + "请对三个反引号之间的评论文本进行概括,最多30个词汇,并且侧重在产品价格和质量上。\n", + "\n", + "评论: ```{prod_review}```\n", + "\"\"\"\n", + "\n", + "response = get_completion(prompt)\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "972dbb1b", + "metadata": {}, + "source": [ + "通过输出的结果,我们可以看到,文本以“可爱的熊猫公仔,质量好但有点小,价格稍高”开头,体现了对于产品价格与质量的侧重。" + ] + }, + { + "cell_type": "markdown", + "id": "b3ed53d2", + "metadata": {}, + "source": [ + "### 1.3 关键信息提取" + ] + }, + { + "cell_type": "markdown", + "id": "ba6f5c25", + "metadata": {}, + "source": [ + "在1.2节中,虽然我们通过添加关键角度侧重的 Prompt ,确实让文本摘要更侧重于某一特定方面,然而,我们可以发现,在结果中也会保留一些其他信息,比如偏重价格与质量角度的概括中仍保留了“快递提前到货”的信息。如果我们只想要提取某一角度的信息,并过滤掉其他所有信息,则可以要求 LLM 进行 **文本提取(Extract)** 而非概括( Summarize )。" + ] + }, + { + "cell_type": "markdown", + "id": "da39760c", + "metadata": {}, + "source": [ + "下面让我们来一起来对文本进行提取信息吧!" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "c845ccab", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "产品运输相关的信息:快递提前一天到货。\n" + ] + } + ], + "source": [ + "prompt = f\"\"\"\n", + "您的任务是从电子商务网站上的产品评论中提取相关信息。\n", + "\n", + "请从以下三个反引号之间的评论文本中提取产品运输相关的信息,最多30个词汇。\n", + "\n", + "评论: ```{prod_review}```\n", + "\"\"\"\n", + "\n", + "response = get_completion(prompt)\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "50498a2b", + "metadata": {}, + "source": [ + "## 二、同时概括多条文本" + ] + }, + { + "cell_type": "markdown", + "id": "a291541a", + "metadata": {}, + "source": [ + "在实际的工作流中,我们往往要处理大量的评论文本,下面的示例将多条用户评价集合在一个列表中,并利用 ```for``` 循环和文本概括(Summarize)提示词,将评价概括至小于 20 个词以下,并按顺序打印。当然,在实际生产中,对于不同规模的评论文本,除了使用 ```for``` 循环以外,还可能需要考虑整合评论、分布式等方法提升运算效率。您可以搭建主控面板,来总结大量用户评论,以及方便您或他人快速浏览,还可以点击查看原评论。这样,您就能高效掌握顾客的所有想法。" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ef606961", + "metadata": {}, + "outputs": [], + "source": [ + "review_1 = prod_review\n", + "\n", + "# 一盏落地灯的评论\n", + "review_2 = \"\"\"\n", + "我需要一盏漂亮的卧室灯,这款灯不仅具备额外的储物功能,价格也并不算太高。\n", + "收货速度非常快,仅用了两天的时间就送到了。\n", + "不过,在运输过程中,灯的拉线出了问题,幸好,公司很乐意寄送了一根全新的灯线。\n", + "新的灯线也很快就送到手了,只用了几天的时间。\n", + "装配非常容易。然而,之后我发现有一个零件丢失了,于是我联系了客服,他们迅速地给我寄来了缺失的零件!\n", + "对我来说,这是一家非常关心客户和产品的优秀公司。\n", + "\"\"\"\n", + "\n", + "# 一把电动牙刷的评论\n", + "review_3 = \"\"\"\n", + "我的牙科卫生员推荐了电动牙刷,所以我就买了这款。\n", + "到目前为止,电池续航表现相当不错。\n", + "初次充电后,我在第一周一直将充电器插着,为的是对电池进行条件养护。\n", + "过去的3周里,我每天早晚都使用它刷牙,但电池依然维持着原来的充电状态。\n", + "不过,牙刷头太小了。我见过比这个牙刷头还大的婴儿牙刷。\n", + "我希望牙刷头更大一些,带有不同长度的刷毛,\n", + "这样可以更好地清洁牙齿间的空隙,但这款牙刷做不到。\n", + "总的来说,如果你能以50美元左右的价格购买到这款牙刷,那是一个不错的交易。\n", + "制造商的替换刷头相当昂贵,但你可以购买价格更为合理的通用刷头。\n", + "这款牙刷让我感觉就像每天都去了一次牙医,我的牙齿感觉非常干净!\n", + "\"\"\"\n", + "\n", + "# 一台搅拌机的评论\n", + "review_4 = \"\"\"\n", + "在11月份期间,这个17件套装还在季节性促销中,售价约为49美元,打了五折左右。\n", + "可是由于某种原因(我们可以称之为价格上涨),到了12月的第二周,所有的价格都上涨了,\n", + "同样的套装价格涨到了70-89美元不等。而11件套装的价格也从之前的29美元上涨了约10美元。\n", + "看起来还算不错,但是如果你仔细看底座,刀片锁定的部分看起来没有前几年版本的那么漂亮。\n", + "然而,我打算非常小心地使用它\n", + "(例如,我会先在搅拌机中研磨豆类、冰块、大米等坚硬的食物,然后再将它们研磨成所需的粒度,\n", + "接着切换到打蛋器刀片以获得更细的面粉,如果我需要制作更细腻/少果肉的食物)。\n", + "在制作冰沙时,我会将要使用的水果和蔬菜切成细小块并冷冻\n", + "(如果使用菠菜,我会先轻微煮熟菠菜,然后冷冻,直到使用时准备食用。\n", + "如果要制作冰糕,我会使用一个小到中号的食物加工器),这样你就可以避免添加过多的冰块。\n", + "大约一年后,电机开始发出奇怪的声音。我打电话给客户服务,但保修期已经过期了,\n", + "所以我只好购买了另一台。值得注意的是,这类产品的整体质量在过去几年里有所下降\n", + ",所以他们在一定程度上依靠品牌认知和消费者忠诚来维持销售。在大约两天内,我收到了新的搅拌机。\n", + "\"\"\"\n", + "\n", + "reviews = [review_1, review_2, review_3, review_4]\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "eb878522", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "评论1: 熊猫公仔是生日礼物,女儿喜欢,软可爱,面部表情和善。价钱有点小,快递提前一天到货。 \n", + "\n", + "评论2: 漂亮卧室灯,储物功能,快速送达,灯线问题,快速解决,容易装配,关心客户和产品。 \n", + "\n", + "评论3: 这款电动牙刷电池续航好,但牙刷头太小,价格合理,清洁效果好。 \n", + "\n", + "评论4: 该评论提到了一个17件套装的产品,在11月份有折扣销售,但在12月份价格上涨。评论者提到了产品的外观和使用方法,并提到了产品质量下降的问题。最后,评论者提到他们购买了另一台搅拌机。 \n", + "\n" + ] + } + ], + "source": [ + "for i in range(len(reviews)):\n", + " prompt = f\"\"\"\n", + " 你的任务是从电子商务网站上的产品评论中提取相关信息。\n", + "\n", + " 请对三个反引号之间的评论文本进行概括,最多20个词汇。\n", + "\n", + " 评论文本: ```{reviews[i]}```\n", + " \"\"\"\n", + " response = get_completion(prompt)\n", + " print(f\"评论{i+1}: \", response, \"\\n\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "f118c0cc", + "metadata": {}, + "source": [ + "## 三、英文版" + ] + }, + { + "cell_type": "markdown", + "id": "a08635df", + "metadata": {}, + "source": [ + "**1.1 单一文本概括**" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "e55327d5", + "metadata": {}, + "outputs": [], + "source": [ + "prod_review = \"\"\"\n", + "Got this panda plush toy for my daughter's birthday, \\\n", + "who loves it and takes it everywhere. It's soft and \\ \n", + "super cute, and its face has a friendly look. It's \\ \n", + "a bit small for what I paid though. I think there \\ \n", + "might be other options that are bigger for the \\ \n", + "same price. It arrived a day earlier than expected, \\ \n", + "so I got to play with it myself before I gave it \\ \n", + "to her.\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "30c2ef51", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This panda plush toy is loved by the reviewer's daughter, but they feel it is a bit small for the price.\n" + ] + } + ], + "source": [ + "prompt = f\"\"\"\n", + "Your task is to generate a short summary of a product \\\n", + "review from an ecommerce site. \n", + "\n", + "Summarize the review below, delimited by triple \n", + "backticks, in at most 30 words. \n", + "\n", + "Review: ```{prod_review}```\n", + "\"\"\"\n", + "\n", + "response = get_completion(prompt)\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "9bdcfc1b", + "metadata": {}, + "source": [ + "**1.2 设置关键角度侧重**" + ] + }, + { + "cell_type": "markdown", + "id": "5dd0534f", + "metadata": {}, + "source": [ + "1.2.1 侧重于快递服务" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "b354cc3f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The customer is happy with the product but suggests offering larger options for the same price. They were pleased with the early delivery.\n" + ] + } + ], + "source": [ + "prompt = f\"\"\"\n", + "Your task is to generate a short summary of a product \\\n", + "review from an ecommerce site to give feedback to the \\\n", + "Shipping deparmtment. \n", + "\n", + "Summarize the review below, delimited by triple \n", + "backticks, in at most 30 words, and focusing on any aspects \\\n", + "that mention shipping and delivery of the product. \n", + "\n", + "Review: ```{prod_review}```\n", + "\"\"\"\n", + "\n", + "response = get_completion(prompt)\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "af6aaf3a", + "metadata": {}, + "source": [ + "1.2.2 侧重于价格和质量" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "1b5358fd", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The customer loves the panda plush toy for its softness and cuteness, but feels it is overpriced compared to other options available.\n" + ] + } + ], + "source": [ + "prompt = f\"\"\"\n", + "Your task is to generate a short summary of a product \\\n", + "review from an ecommerce site to give feedback to the \\\n", + "pricing deparmtment, responsible for determining the \\\n", + "price of the product. \n", + "\n", + "Summarize the review below, delimited by triple \n", + "backticks, in at most 30 words, and focusing on any aspects \\\n", + "that are relevant to the price and perceived value. \n", + "\n", + "Review: ```{prod_review}```\n", + "\"\"\"\n", + "\n", + "response = get_completion(prompt)\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "0f582677", + "metadata": {}, + "source": [ + "**1.3 关键信息提取**" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "32c87014", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The shipping department should take note that the product arrived a day earlier than expected.\n" + ] + } + ], + "source": [ + "prompt = f\"\"\"\n", + "Your task is to extract relevant information from \\ \n", + "a product review from an ecommerce site to give \\\n", + "feedback to the Shipping department. \n", + "\n", + "From the review below, delimited by triple quotes \\\n", + "extract the information relevant to shipping and \\ \n", + "delivery. Limit to 30 words. \n", + "\n", + "Review: ```{prod_review}```\n", + "\"\"\"\n", + "\n", + "response = get_completion(prompt)\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "id": "2043d100", + "metadata": {}, + "source": [ + "**2.1 同时概括多条文本**" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "cff48486", + "metadata": {}, + "outputs": [], + "source": [ + "review_1 = prod_review \n", + "\n", + "# review for a standing lamp\n", + "review_2 = \"\"\"\n", + "Needed a nice lamp for my bedroom, and this one \\\n", + "had additional storage and not too high of a price \\\n", + "point. Got it fast - arrived in 2 days. The string \\\n", + "to the lamp broke during the transit and the company \\\n", + "happily sent over a new one. Came within a few days \\\n", + "as well. It was easy to put together. Then I had a \\\n", + "missing part, so I contacted their support and they \\\n", + "very quickly got me the missing piece! Seems to me \\\n", + "to be a great company that cares about their customers \\\n", + "and products. \n", + "\"\"\"\n", + "\n", + "# review for an electric toothbrush\n", + "review_3 = \"\"\"\n", + "My dental hygienist recommended an electric toothbrush, \\\n", + "which is why I got this. The battery life seems to be \\\n", + "pretty impressive so far. After initial charging and \\\n", + "leaving the charger plugged in for the first week to \\\n", + "condition the battery, I've unplugged the charger and \\\n", + "been using it for twice daily brushing for the last \\\n", + "3 weeks all on the same charge. But the toothbrush head \\\n", + "is too small. I’ve seen baby toothbrushes bigger than \\\n", + "this one. I wish the head was bigger with different \\\n", + "length bristles to get between teeth better because \\\n", + "this one doesn’t. Overall if you can get this one \\\n", + "around the $50 mark, it's a good deal. The manufactuer's \\\n", + "replacements heads are pretty expensive, but you can \\\n", + "get generic ones that're more reasonably priced. This \\\n", + "toothbrush makes me feel like I've been to the dentist \\\n", + "every day. My teeth feel sparkly clean! \n", + "\"\"\"\n", + "\n", + "# review for a blender\n", + "review_4 = \"\"\"\n", + "So, they still had the 17 piece system on seasonal \\\n", + "sale for around $49 in the month of November, about \\\n", + "half off, but for some reason (call it price gouging) \\\n", + "around the second week of December the prices all went \\\n", + "up to about anywhere from between $70-$89 for the same \\\n", + "system. And the 11 piece system went up around $10 or \\\n", + "so in price also from the earlier sale price of $29. \\\n", + "So it looks okay, but if you look at the base, the part \\\n", + "where the blade locks into place doesn’t look as good \\\n", + "as in previous editions from a few years ago, but I \\\n", + "plan to be very gentle with it (example, I crush \\\n", + "very hard items like beans, ice, rice, etc. in the \\\n", + "blender first then pulverize them in the serving size \\\n", + "I want in the blender then switch to the whipping \\\n", + "blade for a finer flour, and use the cross cutting blade \\\n", + "first when making smoothies, then use the flat blade \\\n", + "if I need them finer/less pulpy). Special tip when making \\\n", + "smoothies, finely cut and freeze the fruits and \\\n", + "vegetables (if using spinach-lightly stew soften the \\\n", + "spinach then freeze until ready for use-and if making \\\n", + "sorbet, use a small to medium sized food processor) \\\n", + "that you plan to use that way you can avoid adding so \\\n", + "much ice if at all-when making your smoothie. \\\n", + "After about a year, the motor was making a funny noise. \\\n", + "I called customer service but the warranty expired \\\n", + "already, so I had to buy another one. FYI: The overall \\\n", + "quality has gone done in these types of products, so \\\n", + "they are kind of counting on brand recognition and \\\n", + "consumer loyalty to maintain sales. Got it in about \\\n", + "two days.\n", + "\"\"\"\n", + "\n", + "reviews = [review_1, review_2, review_3, review_4]" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "3f61080b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 Soft and cute panda plush toy loved by daughter, but small for the price. Arrived early. \n", + "\n", + "1 Great lamp with storage, fast delivery, excellent customer service, and easy assembly. Highly recommended. \n", + "\n", + "2 Impressive battery life, but toothbrush head is too small. Good deal if bought around $50. \n", + "\n", + "3 The reviewer found the price increase after the sale disappointing and noticed a decrease in quality over time. \n", + "\n" + ] + } + ], + "source": [ + "for i in range(len(reviews)):\n", + " prompt = f\"\"\"\n", + " Your task is to generate a short summary of a product \\\n", + " review from an ecommerce site. \n", + "\n", + " Summarize the review below, delimited by triple \\\n", + " backticks in at most 20 words. \n", + "\n", + " Review: ```{reviews[i]}```\n", + " \"\"\"\n", + " response = get_completion(prompt)\n", + " print(i, response, \"\\n\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + }, + "latex_envs": { + "LaTeX_envs_menu_present": true, + "autoclose": false, + "autocomplete": true, + "bibliofile": "biblio.bib", + "cite_by": "apalike", + "current_citInitial": 1, + "eqLabelWithNumbers": true, + "eqNumInitial": 1, + "hotkeys": { + "equation": "Ctrl-E", + "itemize": "Ctrl-I" + }, + "labels_anchors": false, + "latex_user_defs": false, + "report_style_numbering": false, + "user_envs_cfg": false + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": true + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/content/C1 Prompt Engineering for Developer/5. 推断 Inferring.ipynb b/docs/content/C1 Prompt Engineering for Developer/5. 推断 Inferring.ipynb index b3a2e59..8e01279 100644 --- a/docs/content/C1 Prompt Engineering for Developer/5. 推断 Inferring.ipynb +++ b/docs/content/C1 Prompt Engineering for Developer/5. 推断 Inferring.ipynb @@ -17,9 +17,9 @@ "\n", "让我们先想象一下,你是一名初创公司的数据分析师,你的任务是从各种产品评论和新闻文章中提取出关键的情感和主题。这些任务包括了标签提取、实体提取、以及理解文本的情感等等。在传统的机器学习流程中,你需要收集标签化的数据集、训练模型、确定如何在云端部署模型并进行推断。尽管这种方式可能会产生不错的效果,但完成这一全流程需要耗费大量的时间和精力。而且,每一个任务,比如情感分析、实体提取等等,都需要训练和部署单独的模型。\n", "\n", - "然而,就在你准备投入繁重工作的时候,你发现了大型语言模型(LLM)。LLM的一个明显优点是,对于许多这样的任务,你只需要编写一个 Prompt,就可以开始生成结果,大大减轻了你的工作负担。这个发现像是找到了一把神奇的钥匙,让应用程序开发的速度加快了许多。最令你兴奋的是,你可以仅仅使用一个模型和一个API来执行许多不同的任务,无需再纠结如何训练和部署许多不同的模型。\n", + "然而,就在你准备投入繁重工作的时候,你发现了大型语言模型(LLM)。LLM 的一个明显优点是,对于许多这样的任务,你只需要编写一个 Prompt,就可以开始生成结果,大大减轻了你的工作负担。这个发现像是找到了一把神奇的钥匙,让应用程序开发的速度加快了许多。最令你兴奋的是,你可以仅仅使用一个模型和一个 API 来执行许多不同的任务,无需再纠结如何训练和部署许多不同的模型。\n", "\n", - "让我们开始这一章的学习,一起探索如何利用LLM加快我们的工作进程,提高我们的工作效率。" + "让我们开始这一章的学习,一起探索如何利用 LLM 加快我们的工作进程,提高我们的工作效率。" ] }, { @@ -102,7 +102,7 @@ "id": "76be2320", "metadata": {}, "source": [ - "如果你想要给出更简洁的答案,以便更容易进行后处理,可以在上述 Prompt 基础上添加另一个指令:*用一个单词回答:「正面」或「负面」*。这样就只会打印出 “正面” 这个单词,这使得输出更加统一,方便后续处理。" + "如果你想要给出更简洁的答案,以便更容易进行后期处理,可以在上述 Prompt 基础上添加另一个指令:*用一个单词回答:「正面」或「负面」*。这样就只会打印出 “正面” 这个单词,这使得输出更加统一,方便后续处理。" ] }, { @@ -136,7 +136,7 @@ "id": "81d2a973-1fa4-4a35-ae35-a2e746c0e91b", "metadata": {}, "source": [ - "### 2.2 识别情感类型" + "### 1.2 识别情感类型" ] }, { @@ -456,7 +456,7 @@ "id": "95b636f1", "metadata": {}, "source": [ - "假设我们有一个新闻网站或类似的平台,这是我们感兴趣的主题:美国航空航天局、当地政府、工程、员工满意度、联邦政府等。我们想要分析一篇新闻文章,理解其包含了哪些主题。可以使用这样的prompt:确定以下主题列表中的每个项目是否是以下文本中的主题。以 0 或 1 的形式给出答案列表。" + "假设我们有一个新闻网站或类似的平台,这是我们感兴趣的主题:美国航空航天局、当地政府、工程、员工满意度、联邦政府等。我们想要分析一篇新闻文章,理解其包含了哪些主题。可以使用这样的 Prompt:确定以下主题列表中的每个项目是否是以下文本中的主题。以 0 或 1 的形式给出答案列表。" ] }, { @@ -499,7 +499,7 @@ "id": "08247dbf", "metadata": {}, "source": [ - "从输出结果来看,这个`story`与关于“美国航空航天局”、“员工满意度”、“联邦政府”、“当地政府”有关,而与“工程”无关。这种能力在机器学习领域被称为零样本(Zero-Shot)学习。这是因为我们并没有提供任何带标签的训练数据,仅凭 Prompt ,它便能判定哪些主题在新闻文章中被包含。\n", + "从输出结果来看,这个 `story` 与关于“美国航空航天局”、“员工满意度”、“联邦政府”、“当地政府”有关,而与“工程”无关。这种能力在机器学习领域被称为零样本(Zero-Shot)学习。这是因为我们并没有提供任何带标签的训练数据,仅凭 Prompt ,它便能判定哪些主题在新闻文章中被包含。\n", "\n", "如果我们希望制定一个新闻提醒,我们同样可以运用这种处理新闻的流程。假设我对“美国航空航天局”的工作深感兴趣,那么你就可以构建一个如此的系统:每当出现与'美国宇航局'相关的新闻,系统就会输出提醒。" ] @@ -961,7 +961,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -975,7 +975,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.11" + "version": "3.8.10" }, "latex_envs": { "LaTeX_envs_menu_present": true, diff --git a/docs/content/C1 Prompt Engineering for Developer/6. 文本转换 Transforming.ipynb b/docs/content/C1 Prompt Engineering for Developer/6. 文本转换 Transforming.ipynb index baa9c47..219459a 100644 --- a/docs/content/C1 Prompt Engineering for Developer/6. 文本转换 Transforming.ipynb +++ b/docs/content/C1 Prompt Engineering for Developer/6. 文本转换 Transforming.ipynb @@ -9,7 +9,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "2fac57c2", "metadata": {}, @@ -22,7 +21,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "bf3733d4", "metadata": {}, @@ -35,7 +33,7 @@ "id": "06ba53ba", "metadata": {}, "source": [ - "文本翻译是大语言模型的典型应用场景之一。相比于传统统计机器翻译系统,大语言模型翻译更加流畅自然,还原度更高。通过在大规模高质量平行语料上进行fine-tuning训练,大语言模型可以深入学习不同语言间的词汇、语法、语义等层面的对应关系,模拟双语者的转换思维,进行意义传递的精准转换,而非简单的逐词替换。\n", + "文本翻译是大语言模型的典型应用场景之一。相比于传统统计机器翻译系统,大语言模型翻译更加流畅自然,还原度更高。通过在大规模高质量平行语料上进行 Fine-Tune,大语言模型可以深入学习不同语言间的词汇、语法、语义等层面的对应关系,模拟双语者的转换思维,进行意义传递的精准转换,而非简单的逐词替换。\n", "\n", "以英译汉为例,传统统计机器翻译多倾向直接替换英文词汇,语序保持英语结构,容易出现中文词汇使用不地道、语序不顺畅的现象。而大语言模型可以学习英汉两种语言的语法区别,进行动态的结构转换。同时,它还可以通过上下文理解原句意图,选择合适的中文词汇进行转换,而非生硬的字面翻译。\n", "\n", @@ -43,7 +41,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "1b418e32", "metadata": {}, @@ -79,7 +76,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "e3e922b4", "metadata": {}, @@ -111,7 +107,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "c1841354", "metadata": {}, @@ -146,7 +141,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "68723ba5", "metadata": {}, @@ -154,31 +148,6 @@ "### 1.4 同时进行语气转换" ] }, - { - "cell_type": "code", - "execution_count": 13, - "id": "a4770dcc", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Formal: ¿Le gustaría ordenar una almohada?\n", - "Informal: ¿Te gustaría ordenar una almohada?\n" - ] - } - ], - "source": [ - "prompt = f\"\"\"\n", - "Translate the following text to Spanish in both the \\\n", - "formal and informal forms: \n", - "'Would you like to order a pillow?'\n", - "\"\"\"\n", - "response = get_completion(prompt)\n", - "print(response)\n" - ] - }, { "cell_type": "code", "execution_count": 14, @@ -204,7 +173,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "b2dc4c56", "metadata": {}, @@ -213,7 +181,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "54b00aa4", "metadata": {}, @@ -297,7 +264,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "6ab558a2", "metadata": {}, @@ -306,7 +272,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "b85ae847", "metadata": {}, @@ -350,7 +315,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "98df9009", "metadata": {}, @@ -359,14 +323,13 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "0bf9c074", "metadata": {}, "source": [ - "大语言模型如ChatGPT在不同数据格式之间转换方面表现出色。它可以轻松实现JSON到HTML、XML、Markdown等格式的相互转化。下面是一个示例,展示如何使用大语言模型**将JSON数据转换为HTML格式**:\n", + "大语言模型如 ChatGPT 在不同数据格式之间转换方面表现出色。它可以轻松实现 JSON 到 HTML、XML、Markdown 等格式的相互转化。下面是一个示例,展示如何使用大语言模型**将 JSON 数据转换为 HTML 格式**:\n", "\n", - "假设我们有一个JSON数据,包含餐厅员工的姓名和邮箱信息。现在我们需要将这个JSON转换为HTML表格格式,以便在网页中展示。在这个案例中,我们就可以使用大语言模型,直接输入JSON数据,并给出需要转换为HTML表格的要求。语言模型会自动解析JSON结构,并以HTML表格形式输出,完成格式转换的任务。\n", + "假设我们有一个 JSON 数据,包含餐厅员工的姓名和邮箱信息。现在我们需要将这个 JSON 转换为 HTML 表格格式,以便在网页中展示。在这个案例中,我们就可以使用大语言模型,直接输入JSON 数据,并给出需要转换为 HTML 表格的要求。语言模型会自动解析 JSON 结构,并以 HTML 表格形式输出,完成格式转换的任务。\n", "\n", "利用大语言模型强大的格式转换能力,我们可以快速实现各种结构化数据之间的相互转化,大大简化开发流程。掌握这一转换技巧将有助于读者**更高效地处理结构化数据**。\n" ] @@ -429,6 +392,14 @@ "print(response)" ] }, + { + "cell_type": "markdown", + "id": "99c8114e", + "metadata": {}, + "source": [ + "将上述 HTML 代码展示出来如下:" + ] + }, { "cell_type": "code", "execution_count": 11, @@ -476,7 +447,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "29b7167b", "metadata": {}, @@ -485,7 +455,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "22776140", "metadata": {}, @@ -552,7 +521,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "538181e0", "metadata": {}, @@ -602,7 +570,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "2e2d1f6a", "metadata": {}, @@ -657,7 +624,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "3ee5d487", "metadata": {}, @@ -1343,7 +1309,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -1357,7 +1323,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.11" + "version": "3.8.10" }, "latex_envs": { "LaTeX_envs_menu_present": true, diff --git a/docs/content/C1 Prompt Engineering for Developer/7. 文本扩展 Expanding.ipynb b/docs/content/C1 Prompt Engineering for Developer/7. 文本扩展 Expanding.ipynb index 6ed2d37..12c83ed 100644 --- a/docs/content/C1 Prompt Engineering for Developer/7. 文本扩展 Expanding.ipynb +++ b/docs/content/C1 Prompt Engineering for Developer/7. 文本扩展 Expanding.ipynb @@ -13,11 +13,9 @@ "source": [ "**文本扩展**是大语言模型的一个重要应用方向,它可以输入简短文本,生成更加丰富的长文。这为创作提供了强大支持,但也可能被滥用。因此开发者在使用时,必须谨记社会责任,避免生成有害内容。\n", "\n", - "在本章中,我们将学习*基于OpenAI API实现一个客户邮件自动生成的示例*,用于*根据客户反馈优化客服邮件*。这里还会介绍“温度”(temperature)这一超参数,它可以**控制文本生成的多样性**。\n", + "在本章中,我们将学习*基于 OpenAI API 实现一个客户邮件自动生成的示例*,用于*根据客户反馈优化客服邮件*。这里还会介绍“温度”(temperature)这一超参数,它可以**控制文本生成的多样性**。\n", "\n", - "需要注意,扩展功能只应用来辅助人类创作,而非大规模自动生成内容。开发者应审慎使用,避免产生负面影响。只有以负责任和有益的方式应用语言模型,才能发挥其最大价值。\n", - "\n", - "相信践行社会责任的开发者可以利用语言模型的扩展功能,开发出真正造福人类的创新应用。\n" + "需要注意,扩展功能只应用来辅助人类创作,而非大规模自动生成内容。开发者应审慎使用,避免产生负面影响。只有以负责任和有益的方式应用语言模型,才能发挥其最大价值。相信践行社会责任的开发者可以利用语言模型的扩展功能,开发出真正造福人类的创新应用。\n" ] }, { @@ -33,7 +31,7 @@ "source": [ "在这个客户邮件自动生成的示例中,我们**将根据客户的评价和其中的情感倾向,使用大语言模型针对性地生成回复邮件**。\n", "\n", - "具体来说,我们先输入客户的评论文本和对应的情感分析结果(正面或者负面)。然后构造一个Prompt,要求大语言模型基于这些信息来生成一封定制的回复电子邮件。\n", + "具体来说,我们先输入客户的评论文本和对应的情感分析结果(正面或者负面)。然后构造一个 Prompt,要求大语言模型基于这些信息来生成一封定制的回复电子邮件。\n", "\n", "下面先给出一个实例,包括一条客户评价和这个评价表达的情感。这为后续的语言模型生成回复邮件提供了关键输入信息。通过输入客户反馈的具体内容和情感态度,语言模型可以生成针对这个特定客户、考虑其具体情感因素的个性化回复。这种**针对个体客户特点的邮件生成方式,将大大提升客户满意度**。" ] @@ -129,7 +127,7 @@ "source": [ "## 二、引入温度系数\n", "\n", - "大语言模型中的 “温度”(temperature) 参数可以控制生成文本的随机性和多样性。temperature的值越大,语言模型输出的多样性越大;temperature的值越小,输出越倾向高概率的文本。\n", + "大语言模型中的 “温度”(temperature) 参数可以控制生成文本的随机性和多样性。temperature 的值越大,语言模型输出的多样性越大;temperature 的值越小,输出越倾向高概率的文本。\n", "\n", "举个例子,在某一上下文中,语言模型可能认为“比萨”是接下来最可能的词,其次是“寿司”和“塔可”。若 temperature 为0,则每次都会生成“比萨”;而当 temperature 越接近 1 时,生成结果是“寿司”或“塔可”的可能性越大,使文本更加多样。\n", "\n", @@ -192,6 +190,13 @@ "print(response)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "第二次运行输出结果会发生变化:" + ] + }, { "cell_type": "code", "execution_count": 5, @@ -241,9 +246,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**温度(temperature)参数可以控制语言模型生成文本的随机性**。温度为0时,每次使用同样的Prompt,得到的结果总是一致的。而在上面的样例中,当温度设为0.7时,则每次执行都会生成不同的文本。\n", + "**温度(temperature)参数可以控制语言模型生成文本的随机性**。温度为0时,每次使用同样的 Prompt,得到的结果总是一致的。而在上面的样例中,当温度设为0.7时,则每次执行都会生成不同的文本。\n", "\n", - "所以,这次的结果与之前得到的邮件就不太一样了。再次执行同样的Prompt,邮件内容还会有变化。因此。我建议读者朋友们可以自己尝试不同的 temperature ,来观察输出的变化。总体来说,temperature 越高,语言模型的文本生成就越具有随机性。可以想象,高温度下,语言模型就像心绪更加活跃,但也可能更有创造力。\n", + "所以,这次的结果与之前得到的邮件就不太一样了。再次执行同样的 Prompt,邮件内容还会有变化。因此。我建议读者朋友们可以自己尝试不同的 temperature ,来观察输出的变化。总体来说,temperature 越高,语言模型的文本生成就越具有随机性。可以想象,高温度下,语言模型就像心绪更加活跃,但也可能更有创造力。\n", "\n", "适当调节这个超参数,可以让语言模型的生成更富有多样性,也更能意外惊喜。希望这些经验可以帮助你在不同场景中找到最合适的温度设置。\n" ] @@ -411,7 +416,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -425,7 +430,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.11" + "version": "3.8.10" }, "latex_envs": { "LaTeX_envs_menu_present": true, diff --git a/docs/content/C1 Prompt Engineering for Developer/8. 聊天机器人 Chatbot.ipynb b/docs/content/C1 Prompt Engineering for Developer/8. 聊天机器人 Chatbot.ipynb index 17e29ef..52689ac 100644 --- a/docs/content/C1 Prompt Engineering for Developer/8. 聊天机器人 Chatbot.ipynb +++ b/docs/content/C1 Prompt Engineering for Developer/8. 聊天机器人 Chatbot.ipynb @@ -15,14 +15,8 @@ "id": "f0bdc2c9", "metadata": {}, "source": [ - "大型语言模型带给我们的激动人心的一种可能性是,我们可以通过它构建定制的聊天机器人(Chatbot),而且只需很少的工作量。在这一章节的探索中,我们将带你了解如何利用会话形式,与具有个性化特性(或专门为特定任务或行为设计)的聊天机器人进行深度对话。" - ] - }, - { - "cell_type": "markdown", - "id": "e6fae355", - "metadata": {}, - "source": [ + "大型语言模型带给我们的激动人心的一种可能性是,我们可以通过它构建定制的聊天机器人(Chatbot),而且只需很少的工作量。在这一章节的探索中,我们将带你了解如何利用会话形式,与具有个性化特性(或专门为特定任务或行为设计)的聊天机器人进行深度对话。\n", + "\n", "像 ChatGPT 这样的聊天模型实际上是组装成以一系列消息作为输入,并返回一个模型生成的消息作为输出的。这种聊天格式原本的设计目标是简便多轮对话,但我们通过之前的学习可以知道,它对于不会涉及任何对话的**单轮任务**也同样有用。" ] }, @@ -380,6 +374,14 @@ "!pip install panel" ] }, + { + "cell_type": "markdown", + "id": "b61e475a", + "metadata": {}, + "source": [ + "如果你还没有安装 panel 库(用于可视化界面),请运行上述指令以安装该第三方库。" + ] + }, { "cell_type": "code", "execution_count": null, @@ -850,7 +852,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.11" + "version": "3.8.10" }, "latex_envs": { "LaTeX_envs_menu_present": true,