diff --git a/docs/content/C1 Prompt Engineering for Developer/4. 文本概括 Summarizing.ipynb b/docs/content/C1 Prompt Engineering for Developer/4. 文本概括 Summarizing.ipynb index 5b877cf..cef2062 100644 --- a/docs/content/C1 Prompt Engineering for Developer/4. 文本概括 Summarizing.ipynb +++ b/docs/content/C1 Prompt Engineering for Developer/4. 文本概括 Summarizing.ipynb @@ -1 +1 @@ -{"cells":[{"attachments":{},"cell_type":"markdown","id":"b58204ea","metadata":{},"source":["# 第四章 文本概括\n"]},{"attachments":{},"cell_type":"markdown","id":"12fa9ea4","metadata":{},"source":["当今世界上文本信息浩如烟海,我们很难拥有足够的时间去阅读所有想了解的东西。但欣喜的是,目前LLM在文本概括任务上展现了强大的水准,也已经有不少团队将概括功能实现在多种应用中。\n","\n","本章节将介绍如何使用编程的方式,调用API接口来实现“文本概括”功能。"]},{"attachments":{},"cell_type":"markdown","id":"9cca835b","metadata":{},"source":["## 一、单一文本概括"]},{"attachments":{},"cell_type":"markdown","id":"0c1e1b92","metadata":{},"source":["以商品评论的总结任务为例:对于电商平台来说,网站上往往存在着海量的商品评论,这些评论反映了所有客户的想法。如果我们拥有一个工具去概括这些海量、冗长的评论,便能够快速地浏览更多评论,洞悉客户的偏好,从而指导平台与商家提供更优质的服务。"]},{"attachments":{},"cell_type":"markdown","id":"aad5bd2a","metadata":{},"source":["**输入文本**"]},{"cell_type":"code","execution_count":2,"id":"43b5dd25","metadata":{},"outputs":[],"source":["prod_review = \"\"\"\n","这个熊猫公仔是我给女儿的生日礼物,她很喜欢,去哪都带着。\n","公仔很软,超级可爱,面部表情也很和善。但是相比于价钱来说,\n","它有点小,我感觉在别的地方用同样的价钱能买到更大的。\n","快递比预期提前了一天到货,所以在送给女儿之前,我自己玩了会。\n","\"\"\""]},{"attachments":{},"cell_type":"markdown","id":"662c9cd2","metadata":{},"source":["### 1.1 限制输出文本长度"]},{"attachments":{},"cell_type":"markdown","id":"a6d10814","metadata":{},"source":["我们尝试限制文本长度为最多30词。"]},{"cell_type":"code","execution_count":5,"id":"bf4b39f9","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["熊猫公仔软可爱,女儿喜欢,但有点小。快递提前一天到货。\n"]}],"source":["from tool import get_completion\n","\n","prompt = f\"\"\"\n","您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n","\n","请对三个反引号之间的评论文本进行概括,最多30个词汇。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = 
get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"e9ab145e","metadata":{},"source":["### 1.2 设置关键角度侧重"]},{"attachments":{},"cell_type":"markdown","id":"f84d0123","metadata":{},"source":["有时,针对不同的业务,我们对文本的侧重会有所不同。例如对于商品评论文本,物流会更关心运输时效,商家更加关心价格与商品质量,平台更关心整体服务体验。\n","\n","我们可以通过增加Prompt提示,来体现对于某个特定角度的侧重。"]},{"attachments":{},"cell_type":"markdown","id":"d6f8509a","metadata":{},"source":["#### 1.2.1 侧重于快递服务"]},{"cell_type":"code","execution_count":7,"id":"80636c3e","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["快递提前到货,公仔可爱但有点小。\n"]}],"source":["prompt = f\"\"\"\n","您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n","\n","请对三个反引号之间的评论文本进行概括,最多30个词汇,并且侧重在快递服务上。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"76c97fea","metadata":{},"source":["可以看到,输出结果以“快递提前到货”开头,体现了对于快递效率的侧重。"]},{"attachments":{},"cell_type":"markdown","id":"83275907","metadata":{},"source":["#### 1.2.2 侧重于价格与质量"]},{"cell_type":"code","execution_count":8,"id":"728d6c57","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["可爱的熊猫公仔,质量好但有点小,价格稍高。快递提前到货。\n"]}],"source":["prompt = f\"\"\"\n","您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n","\n","请对三个反引号之间的评论文本进行概括,最多30个词汇,并且侧重在产品价格和质量上。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"972dbb1b","metadata":{},"source":["可以看到,输出结果以“可爱的熊猫公仔,质量好但有点小,价格稍高”开头,体现了对于产品价格与质量的侧重。"]},{"attachments":{},"cell_type":"markdown","id":"b3ed53d2","metadata":{},"source":["### 1.3 关键信息提取"]},{"attachments":{},"cell_type":"markdown","id":"ba6f5c25","metadata":{},"source":["在1.2节中,虽然我们通过添加关键角度侧重的 Prompt ,使得文本摘要更侧重于某一特定方面,但是可以发现,结果中也会保留一些其他信息,如偏重价格与质量角度的概括中仍保留了“快递提前到货”的信息。如果我们只想要提取某一角度的信息,并过滤掉其他所有信息,则可以要求 LLM 进行“文本提取( Extract )”而非“概括( Summarize 
)”"]},{"cell_type":"code","execution_count":9,"id":"c845ccab","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["产品运输相关的信息:快递提前一天到货。\n"]}],"source":["prompt = f\"\"\"\n","您的任务是从电子商务网站上的产品评论中提取相关信息。\n","\n","请从以下三个反引号之间的评论文本中提取产品运输相关的信息,最多30个词汇。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"50498a2b","metadata":{},"source":["## 二、同时概括多条文本"]},{"attachments":{},"cell_type":"markdown","id":"a291541a","metadata":{},"source":["在实际的工作流中,我们往往有许许多多的评论文本,以下示例将多条用户评价放进列表,并利用 ```for``` 循环,使用文本概括(Summarize)提示词,将评价概括至小于 20 词,并按顺序打印。当然,在实际生产中,对于不同规模的评论文本,除了使用 ```for``` 循环以外,还可能需要考虑整合评论、分布式等方法提升运算效率。您可以搭建主控面板,来总结大量用户评论,来方便您或他人快速浏览,还可以点击查看原评论。这样您能高效掌握顾客的所有想法。"]},{"cell_type":"code","execution_count":3,"id":"ef606961","metadata":{},"outputs":[],"source":["review_1 = prod_review\n","\n","# 一盏落地灯的评论\n","review_2 = \"\"\"\n","我需要一盏漂亮的卧室灯,这款灯不仅具备额外的储物功能,价格也并不算太高。\n","收货速度非常快,仅用了两天的时间就送到了。\n","不过,在运输过程中,灯的拉线出了问题,幸好,公司很乐意寄送了一根全新的灯线。\n","新的灯线也很快就送到手了,只用了几天的时间。\n","装配非常容易。然而,之后我发现有一个零件丢失了,于是我联系了客服,他们迅速地给我寄来了缺失的零件!\n","对我来说,这是一家非常关心客户和产品的优秀公司。\n","\"\"\"\n","\n","# 一把电动牙刷的评论\n","review_3 = \"\"\"\n","我的牙科卫生员推荐了电动牙刷,所以我就买了这款。\n","到目前为止,电池续航表现相当不错。\n","初次充电后,我在第一周一直将充电器插着,为的是对电池进行条件养护。\n","过去的3周里,我每天早晚都使用它刷牙,但电池依然维持着原来的充电状态。\n","不过,牙刷头太小了。我见过比这个牙刷头还大的婴儿牙刷。\n","我希望牙刷头更大一些,带有不同长度的刷毛,\n","这样可以更好地清洁牙齿间的空隙,但这款牙刷做不到。\n","总的来说,如果你能以50美元左右的价格购买到这款牙刷,那是一个不错的交易。\n","制造商的替换刷头相当昂贵,但你可以购买价格更为合理的通用刷头。\n","这款牙刷让我感觉就像每天都去了一次牙医,我的牙齿感觉非常干净!\n","\"\"\"\n","\n","# 一台搅拌机的评论\n","review_4 = 
\"\"\"\n","在11月份期间,这个17件套装还在季节性促销中,售价约为49美元,打了五折左右。\n","可是由于某种原因(我们可以称之为价格上涨),到了12月的第二周,所有的价格都上涨了,\n","同样的套装价格涨到了70-89美元不等。而11件套装的价格也从之前的29美元上涨了约10美元。\n","看起来还算不错,但是如果你仔细看底座,刀片锁定的部分看起来没有前几年版本的那么漂亮。\n","然而,我打算非常小心地使用它\n","(例如,我会先在搅拌机中研磨豆类、冰块、大米等坚硬的食物,然后再将它们研磨成所需的粒度,\n","接着切换到打蛋器刀片以获得更细的面粉,如果我需要制作更细腻/少果肉的食物)。\n","在制作冰沙时,我会将要使用的水果和蔬菜切成细小块并冷冻\n","(如果使用菠菜,我会先轻微煮熟菠菜,然后冷冻,直到使用时准备食用。\n","如果要制作冰糕,我会使用一个小到中号的食物加工器),这样你就可以避免添加过多的冰块。\n","大约一年后,电机开始发出奇怪的声音。我打电话给客户服务,但保修期已经过期了,\n","所以我只好购买了另一台。值得注意的是,这类产品的整体质量在过去几年里有所下降\n",",所以他们在一定程度上依靠品牌认知和消费者忠诚来维持销售。在大约两天内,我收到了新的搅拌机。\n","\"\"\"\n","\n","reviews = [review_1, review_2, review_3, review_4]\n"]},{"cell_type":"code","execution_count":4,"id":"eb878522","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["评论1: 熊猫公仔是生日礼物,女儿喜欢,软可爱,面部表情和善。价钱有点小,快递提前一天到货。 \n","\n","评论2: 漂亮卧室灯,储物功能,快速送达,灯线问题,快速解决,容易装配,关心客户和产品。 \n","\n","评论3: 这款电动牙刷电池续航好,但牙刷头太小,价格合理,清洁效果好。 \n","\n","评论4: 该评论提到了一个17件套装的产品,在11月份有折扣销售,但在12月份价格上涨。评论者提到了产品的外观和使用方法,并提到了产品质量下降的问题。最后,评论者提到他们购买了另一台搅拌机。 \n","\n"]}],"source":["for i in range(len(reviews)):\n"," prompt = f\"\"\"\n"," 你的任务是从电子商务网站上的产品评论中提取相关信息。\n","\n"," 请对三个反引号之间的评论文本进行概括,最多20个词汇。\n","\n"," 评论文本: ```{reviews[i]}```\n"," \"\"\"\n"," response = get_completion(prompt)\n"," print(f\"评论{i+1}: \", response, \"\\n\")\n"]},{"cell_type":"markdown","id":"f118c0cc","metadata":{},"source":["## 三、英文版"]},{"cell_type":"markdown","id":"a08635df","metadata":{},"source":["**1.1 单一文本概括**"]},{"cell_type":"code","execution_count":12,"id":"e55327d5","metadata":{},"outputs":[],"source":["prod_review = \"\"\"\n","Got this panda plush toy for my daughter's birthday, \\\n","who loves it and takes it everywhere. It's soft and \\ \n","super cute, and its face has a friendly look. It's \\ \n","a bit small for what I paid though. I think there \\ \n","might be other options that are bigger for the \\ \n","same price. 
It arrived a day earlier than expected, \\ \n","so I got to play with it myself before I gave it \\ \n","to her.\n","\"\"\""]},{"cell_type":"code","execution_count":13,"id":"30c2ef51","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["This panda plush toy is loved by the reviewer's daughter, but they feel it is a bit small for the price.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to generate a short summary of a product \\\n","review from an ecommerce site. \n","\n","Summarize the review below, delimited by triple \n","backticks, in at most 30 words. \n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"9bdcfc1b","metadata":{},"source":["**1.2 设置关键角度侧重**"]},{"cell_type":"markdown","id":"5dd0534f","metadata":{},"source":["1.2.1 侧重于快递服务"]},{"cell_type":"code","execution_count":14,"id":"b354cc3f","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["The customer is happy with the product but suggests offering larger options for the same price. They were pleased with the early delivery.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to generate a short summary of a product \\\n","review from an ecommerce site to give feedback to the \\\n","Shipping deparmtment. \n","\n","Summarize the review below, delimited by triple \n","backticks, in at most 30 words, and focusing on any aspects \\\n","that mention shipping and delivery of the product. 
\n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"af6aaf3a","metadata":{},"source":["1.2.2 侧重于价格和质量"]},{"cell_type":"code","execution_count":15,"id":"1b5358fd","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["The customer loves the panda plush toy for its softness and cuteness, but feels it is overpriced compared to other options available.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to generate a short summary of a product \\\n","review from an ecommerce site to give feedback to the \\\n","pricing deparmtment, responsible for determining the \\\n","price of the product. \n","\n","Summarize the review below, delimited by triple \n","backticks, in at most 30 words, and focusing on any aspects \\\n","that are relevant to the price and perceived value. \n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"0f582677","metadata":{},"source":["**1.3 关键信息提取**"]},{"cell_type":"code","execution_count":16,"id":"32c87014","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["The shipping department should take note that the product arrived a day earlier than expected.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to extract relevant information from \\ \n","a product review from an ecommerce site to give \\\n","feedback to the Shipping department. \n","\n","From the review below, delimited by triple quotes \\\n","extract the information relevant to shipping and \\ \n","delivery. Limit to 30 words. 
\n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"2043d100","metadata":{},"source":["**2.1 同时概括多条文本**"]},{"cell_type":"code","execution_count":17,"id":"cff48486","metadata":{},"outputs":[],"source":["review_1 = prod_review \n","\n","# review for a standing lamp\n","review_2 = \"\"\"\n","Needed a nice lamp for my bedroom, and this one \\\n","had additional storage and not too high of a price \\\n","point. Got it fast - arrived in 2 days. The string \\\n","to the lamp broke during the transit and the company \\\n","happily sent over a new one. Came within a few days \\\n","as well. It was easy to put together. Then I had a \\\n","missing part, so I contacted their support and they \\\n","very quickly got me the missing piece! Seems to me \\\n","to be a great company that cares about their customers \\\n","and products. \n","\"\"\"\n","\n","# review for an electric toothbrush\n","review_3 = \"\"\"\n","My dental hygienist recommended an electric toothbrush, \\\n","which is why I got this. The battery life seems to be \\\n","pretty impressive so far. After initial charging and \\\n","leaving the charger plugged in for the first week to \\\n","condition the battery, I've unplugged the charger and \\\n","been using it for twice daily brushing for the last \\\n","3 weeks all on the same charge. But the toothbrush head \\\n","is too small. I’ve seen baby toothbrushes bigger than \\\n","this one. I wish the head was bigger with different \\\n","length bristles to get between teeth better because \\\n","this one doesn’t. Overall if you can get this one \\\n","around the $50 mark, it's a good deal. The manufactuer's \\\n","replacements heads are pretty expensive, but you can \\\n","get generic ones that're more reasonably priced. This \\\n","toothbrush makes me feel like I've been to the dentist \\\n","every day. My teeth feel sparkly clean! 
\n","\"\"\"\n","\n","# review for a blender\n","review_4 = \"\"\"\n","So, they still had the 17 piece system on seasonal \\\n","sale for around $49 in the month of November, about \\\n","half off, but for some reason (call it price gouging) \\\n","around the second week of December the prices all went \\\n","up to about anywhere from between $70-$89 for the same \\\n","system. And the 11 piece system went up around $10 or \\\n","so in price also from the earlier sale price of $29. \\\n","So it looks okay, but if you look at the base, the part \\\n","where the blade locks into place doesn’t look as good \\\n","as in previous editions from a few years ago, but I \\\n","plan to be very gentle with it (example, I crush \\\n","very hard items like beans, ice, rice, etc. in the \\\n","blender first then pulverize them in the serving size \\\n","I want in the blender then switch to the whipping \\\n","blade for a finer flour, and use the cross cutting blade \\\n","first when making smoothies, then use the flat blade \\\n","if I need them finer/less pulpy). Special tip when making \\\n","smoothies, finely cut and freeze the fruits and \\\n","vegetables (if using spinach-lightly stew soften the \\\n","spinach then freeze until ready for use-and if making \\\n","sorbet, use a small to medium sized food processor) \\\n","that you plan to use that way you can avoid adding so \\\n","much ice if at all-when making your smoothie. \\\n","After about a year, the motor was making a funny noise. \\\n","I called customer service but the warranty expired \\\n","already, so I had to buy another one. FYI: The overall \\\n","quality has gone done in these types of products, so \\\n","they are kind of counting on brand recognition and \\\n","consumer loyalty to maintain sales. 
Got it in about \\\n","two days.\n","\"\"\"\n","\n","reviews = [review_1, review_2, review_3, review_4]"]},{"cell_type":"code","execution_count":18,"id":"3f61080b","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["0 Soft and cute panda plush toy loved by daughter, but small for the price. Arrived early. \n","\n","1 Great lamp with storage, fast delivery, excellent customer service, and easy assembly. Highly recommended. \n","\n","2 Impressive battery life, but toothbrush head is too small. Good deal if bought around $50. \n","\n","3 The reviewer found the price increase after the sale disappointing and noticed a decrease in quality over time. \n","\n"]}],"source":["for i in range(len(reviews)):\n"," prompt = f\"\"\"\n"," Your task is to generate a short summary of a product \\\n"," review from an ecommerce site. \n","\n"," Summarize the review below, delimited by triple \\\n"," backticks in at most 20 words. \n","\n"," Review: ```{reviews[i]}```\n"," \"\"\"\n"," response = get_completion(prompt)\n"," print(i, response, \"\\n\")"]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.11"},"latex_envs":{"LaTeX_envs_menu_present":true,"autoclose":false,"autocomplete":true,"bibliofile":"biblio.bib","cite_by":"apalike","current_citInitial":1,"eqLabelWithNumbers":true,"eqNumInitial":1,"hotkeys":{"equation":"Ctrl-E","itemize":"Ctrl-I"},"labels_anchors":false,"latex_user_defs":false,"report_style_numbering":false,"user_envs_cfg":false},"toc":{"base_numbering":1,"nav_menu":{},"number_sections":true,"sideBar":true,"skip_h1_title":false,"title_cell":"Table of 
Contents","title_sidebar":"Contents","toc_cell":false,"toc_position":{},"toc_section_display":true,"toc_window_display":true}},"nbformat":4,"nbformat_minor":5} +{"cells":[{"attachments":{},"cell_type":"markdown","id":"b58204ea","metadata":{},"source":["# 第四章 文本概括\n"]},{"attachments":{},"cell_type":"markdown","id":"12fa9ea4","metadata":{},"source":["在繁忙的信息时代,小明是一名热心的开发者,面临着海量的文本信息处理的挑战。他需要通过研究无数的文献资料来为他的项目找到关键的信息,但是时间却远远不够。在他焦头烂额之际,他发现了大型语言模型(LLM)的文本摘要功能。\n","\n","这个功能对小明来说如同灯塔一样,照亮了他处理信息海洋的道路。LLM的强大能力在于它可以将复杂的文本信息简化,提炼出关键的观点,这对于他来说无疑是巨大的帮助。他不再需要花费大量的时间去阅读所有的文档,只需要用LLM将它们概括,就可以快速获取到他所需要的信息。\n","\n","通过编程调用API接口,小明成功实现了这个文本摘要的功能。他感叹道:“这简直就像一道魔法,将无尽的信息海洋变成了清晰的信息源泉。”小明的经历,展现了LLM文本摘要功能的巨大优势:**节省时间**,**提高效率**,以及**精准获取信息**。这就是我们本章要介绍的内容,让我们一起来探索如何利用编程和调用API接口,掌握这个强大的工具。"]},{"attachments":{},"cell_type":"markdown","id":"9cca835b","metadata":{},"source":["## 一、单一文本概括"]},{"attachments":{},"cell_type":"markdown","id":"0c1e1b92","metadata":{},"source":["以商品评论的总结任务为例:对于电商平台来说,网站上往往存在着海量的商品评论,这些评论反映了所有客户的想法。如果我们拥有一个工具去概括这些海量、冗长的评论,便能够快速地浏览更多评论,洞悉客户的偏好,从而指导平台与商家提供更优质的服务。"]},{"attachments":{},"cell_type":"markdown","id":"aad5bd2a","metadata":{},"source":["**输入文本**"]},{"cell_type":"markdown","id":"11c360ae","metadata":{},"source":["这是一段在线商品评价,可能来自于一个在线购物平台,例如亚马逊、淘宝、京东等。评价者为一款熊猫公仔进行了点评,评价内容包括商品的质量、大小、价格和物流速度等因素,以及他的女儿对该商品的喜爱程度。"]},{"cell_type":"code","execution_count":2,"id":"43b5dd25","metadata":{},"outputs":[],"source":["prod_review = \"\"\"\n","这个熊猫公仔是我给女儿的生日礼物,她很喜欢,去哪都带着。\n","公仔很软,超级可爱,面部表情也很和善。但是相比于价钱来说,\n","它有点小,我感觉在别的地方用同样的价钱能买到更大的。\n","快递比预期提前了一天到货,所以在送给女儿之前,我自己玩了会。\n","\"\"\""]},{"attachments":{},"cell_type":"markdown","id":"662c9cd2","metadata":{},"source":["### 1.1 限制输出文本长度"]},{"attachments":{},"cell_type":"markdown","id":"a6d10814","metadata":{},"source":["我们尝试将文本的长度限制在30个字以内。"]},{"cell_type":"code","execution_count":5,"id":"bf4b39f9","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["熊猫公仔软可爱,女儿喜欢,但有点小。快递提前一天到货。\n"]}],"source":["from tool import 
get_completion\n","\n","prompt = f\"\"\"\n","您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n","\n","请对三个反引号之间的评论文本进行概括,最多30个字。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"fce32884","metadata":{},"source":["我们可以看到语言模型给了我们一个符合要求的结果。\n","\n","注意:在上一节中我们提到了语言模型在计算和判断文本长度时依赖于分词器,而分词器在字符统计方面不具备完美精度。"]},{"attachments":{},"cell_type":"markdown","id":"e9ab145e","metadata":{},"source":["### 1.2 设置关键角度侧重"]},{"attachments":{},"cell_type":"markdown","id":"f84d0123","metadata":{},"source":["针对不同的业务场景,我们对文本的侧重会有所不同。例如,在商品评论文本中,物流部门可能更关注运输的时效性,商家则更关注价格和商品质量,而平台则更看重整体的用户体验。\n","\n","我们可以通过增强输入提示(Prompt),来强调我们对某一特定视角的重视。"]},{"attachments":{},"cell_type":"markdown","id":"d6f8509a","metadata":{},"source":["#### 1.2.1 侧重于快递服务"]},{"cell_type":"code","execution_count":7,"id":"80636c3e","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["快递提前到货,公仔可爱但有点小。\n"]}],"source":["prompt = f\"\"\"\n","您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n","\n","请对三个反引号之间的评论文本进行概括,最多30个字,并且侧重在快递服务上。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"76c97fea","metadata":{},"source":["通过输出结果,我们可以看到,文本以“快递提前到货”开头,体现了对于快递效率的侧重。"]},{"attachments":{},"cell_type":"markdown","id":"83275907","metadata":{},"source":["#### 1.2.2 侧重于价格与质量"]},{"cell_type":"code","execution_count":8,"id":"728d6c57","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["可爱的熊猫公仔,质量好但有点小,价格稍高。快递提前到货。\n"]}],"source":["prompt = f\"\"\"\n","您的任务是从电子商务网站上生成一个产品评论的简短摘要。\n","\n","请对三个反引号之间的评论文本进行概括,最多30个词汇,并且侧重在产品价格和质量上。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response =
get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"972dbb1b","metadata":{},"source":["通过输出的结果,我们可以看到,文本以“可爱的熊猫公仔,质量好但有点小,价格稍高”开头,体现了对于产品价格与质量的侧重。"]},{"attachments":{},"cell_type":"markdown","id":"b3ed53d2","metadata":{},"source":["### 1.3 关键信息提取"]},{"attachments":{},"cell_type":"markdown","id":"ba6f5c25","metadata":{},"source":["在1.2节中,虽然添加关键角度侧重的 Prompt 确实让文本摘要更侧重于某一特定方面,但我们可以发现,结果中仍会保留一些其他信息,比如偏重价格与质量角度的概括中仍保留了“快递提前到货”的信息。如果我们只想要提取某一角度的信息,并过滤掉其他所有信息,则可以要求 LLM 进行“**文本提取( Extract )**”而非“概括( Summarize )”。"]},{"cell_type":"markdown","id":"da39760c","metadata":{},"source":["下面让我们一起来对文本进行信息提取吧!"]},{"cell_type":"code","execution_count":9,"id":"c845ccab","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["产品运输相关的信息:快递提前一天到货。\n"]}],"source":["prompt = f\"\"\"\n","您的任务是从电子商务网站上的产品评论中提取相关信息。\n","\n","请从以下三个反引号之间的评论文本中提取产品运输相关的信息,最多30个词汇。\n","\n","评论: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"attachments":{},"cell_type":"markdown","id":"50498a2b","metadata":{},"source":["## 二、同时概括多条文本"]},{"attachments":{},"cell_type":"markdown","id":"a291541a","metadata":{},"source":["在实际的工作流中,我们往往要处理大量的评论文本。下面的示例将多条用户评价集合在一个列表中,并利用 ```for``` 循环和文本概括(Summarize)提示词,将每条评价概括至 20 个词以内,并按顺序打印。当然,在实际生产中,对于不同规模的评论文本,除了使用 ```for``` 循环以外,还可能需要考虑整合评论、分布式等方法提升运算效率。您可以搭建主控面板来汇总大量用户评论,方便您或他人快速浏览,还可以点击查看原评论。这样,您就能高效掌握顾客的所有想法。"]},{"cell_type":"code","execution_count":3,"id":"ef606961","metadata":{},"outputs":[],"source":["review_1 = prod_review\n","\n","# 一盏落地灯的评论\n","review_2 = \"\"\"\n","我需要一盏漂亮的卧室灯,这款灯不仅具备额外的储物功能,价格也并不算太高。\n","收货速度非常快,仅用了两天的时间就送到了。\n","不过,在运输过程中,灯的拉线出了问题,幸好,公司很乐意寄送了一根全新的灯线。\n","新的灯线也很快就送到手了,只用了几天的时间。\n","装配非常容易。然而,之后我发现有一个零件丢失了,于是我联系了客服,他们迅速地给我寄来了缺失的零件!\n","对我来说,这是一家非常关心客户和产品的优秀公司。\n","\"\"\"\n","\n","# 一把电动牙刷的评论\n","review_3 =
\"\"\"\n","我的牙科卫生员推荐了电动牙刷,所以我就买了这款。\n","到目前为止,电池续航表现相当不错。\n","初次充电后,我在第一周一直将充电器插着,为的是对电池进行条件养护。\n","过去的3周里,我每天早晚都使用它刷牙,但电池依然维持着原来的充电状态。\n","不过,牙刷头太小了。我见过比这个牙刷头还大的婴儿牙刷。\n","我希望牙刷头更大一些,带有不同长度的刷毛,\n","这样可以更好地清洁牙齿间的空隙,但这款牙刷做不到。\n","总的来说,如果你能以50美元左右的价格购买到这款牙刷,那是一个不错的交易。\n","制造商的替换刷头相当昂贵,但你可以购买价格更为合理的通用刷头。\n","这款牙刷让我感觉就像每天都去了一次牙医,我的牙齿感觉非常干净!\n","\"\"\"\n","\n","# 一台搅拌机的评论\n","review_4 = \"\"\"\n","在11月份期间,这个17件套装还在季节性促销中,售价约为49美元,打了五折左右。\n","可是由于某种原因(我们可以称之为价格上涨),到了12月的第二周,所有的价格都上涨了,\n","同样的套装价格涨到了70-89美元不等。而11件套装的价格也从之前的29美元上涨了约10美元。\n","看起来还算不错,但是如果你仔细看底座,刀片锁定的部分看起来没有前几年版本的那么漂亮。\n","然而,我打算非常小心地使用它\n","(例如,我会先在搅拌机中研磨豆类、冰块、大米等坚硬的食物,然后再将它们研磨成所需的粒度,\n","接着切换到打蛋器刀片以获得更细的面粉,如果我需要制作更细腻/少果肉的食物)。\n","在制作冰沙时,我会将要使用的水果和蔬菜切成细小块并冷冻\n","(如果使用菠菜,我会先轻微煮熟菠菜,然后冷冻,直到使用时准备食用。\n","如果要制作冰糕,我会使用一个小到中号的食物加工器),这样你就可以避免添加过多的冰块。\n","大约一年后,电机开始发出奇怪的声音。我打电话给客户服务,但保修期已经过期了,\n","所以我只好购买了另一台。值得注意的是,这类产品的整体质量在过去几年里有所下降\n",",所以他们在一定程度上依靠品牌认知和消费者忠诚来维持销售。在大约两天内,我收到了新的搅拌机。\n","\"\"\"\n","\n","reviews = [review_1, review_2, review_3, review_4]\n"]},{"cell_type":"code","execution_count":4,"id":"eb878522","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["评论1: 熊猫公仔是生日礼物,女儿喜欢,软可爱,面部表情和善。价钱有点小,快递提前一天到货。 \n","\n","评论2: 漂亮卧室灯,储物功能,快速送达,灯线问题,快速解决,容易装配,关心客户和产品。 \n","\n","评论3: 这款电动牙刷电池续航好,但牙刷头太小,价格合理,清洁效果好。 \n","\n","评论4: 该评论提到了一个17件套装的产品,在11月份有折扣销售,但在12月份价格上涨。评论者提到了产品的外观和使用方法,并提到了产品质量下降的问题。最后,评论者提到他们购买了另一台搅拌机。 \n","\n"]}],"source":["for i in range(len(reviews)):\n"," prompt = f\"\"\"\n"," 你的任务是从电子商务网站上的产品评论中提取相关信息。\n","\n"," 请对三个反引号之间的评论文本进行概括,最多20个词汇。\n","\n"," 评论文本: ```{reviews[i]}```\n"," \"\"\"\n"," response = get_completion(prompt)\n"," print(f\"评论{i+1}: \", response, \"\\n\")\n"]},{"cell_type":"markdown","id":"f118c0cc","metadata":{},"source":["## 三、英文版"]},{"cell_type":"markdown","id":"a08635df","metadata":{},"source":["**1.1 单一文本概括**"]},{"cell_type":"code","execution_count":12,"id":"e55327d5","metadata":{},"outputs":[],"source":["prod_review = \"\"\"\n","Got this panda plush toy for 
my daughter's birthday, \\\n","who loves it and takes it everywhere. It's soft and \\ \n","super cute, and its face has a friendly look. It's \\ \n","a bit small for what I paid though. I think there \\ \n","might be other options that are bigger for the \\ \n","same price. It arrived a day earlier than expected, \\ \n","so I got to play with it myself before I gave it \\ \n","to her.\n","\"\"\""]},{"cell_type":"code","execution_count":13,"id":"30c2ef51","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["This panda plush toy is loved by the reviewer's daughter, but they feel it is a bit small for the price.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to generate a short summary of a product \\\n","review from an ecommerce site. \n","\n","Summarize the review below, delimited by triple \n","backticks, in at most 30 words. \n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"9bdcfc1b","metadata":{},"source":["**1.2 设置关键角度侧重**"]},{"cell_type":"markdown","id":"5dd0534f","metadata":{},"source":["1.2.1 侧重于快递服务"]},{"cell_type":"code","execution_count":14,"id":"b354cc3f","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["The customer is happy with the product but suggests offering larger options for the same price. They were pleased with the early delivery.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to generate a short summary of a product \\\n","review from an ecommerce site to give feedback to the \\\n","Shipping deparmtment. \n","\n","Summarize the review below, delimited by triple \n","backticks, in at most 30 words, and focusing on any aspects \\\n","that mention shipping and delivery of the product. 
\n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"af6aaf3a","metadata":{},"source":["1.2.2 侧重于价格和质量"]},{"cell_type":"code","execution_count":15,"id":"1b5358fd","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["The customer loves the panda plush toy for its softness and cuteness, but feels it is overpriced compared to other options available.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to generate a short summary of a product \\\n","review from an ecommerce site to give feedback to the \\\n","pricing deparmtment, responsible for determining the \\\n","price of the product. \n","\n","Summarize the review below, delimited by triple \n","backticks, in at most 30 words, and focusing on any aspects \\\n","that are relevant to the price and perceived value. \n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"0f582677","metadata":{},"source":["**1.3 关键信息提取**"]},{"cell_type":"code","execution_count":16,"id":"32c87014","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["The shipping department should take note that the product arrived a day earlier than expected.\n"]}],"source":["prompt = f\"\"\"\n","Your task is to extract relevant information from \\ \n","a product review from an ecommerce site to give \\\n","feedback to the Shipping department. \n","\n","From the review below, delimited by triple quotes \\\n","extract the information relevant to shipping and \\ \n","delivery. Limit to 30 words. 
\n","\n","Review: ```{prod_review}```\n","\"\"\"\n","\n","response = get_completion(prompt)\n","print(response)"]},{"cell_type":"markdown","id":"2043d100","metadata":{},"source":["**2.1 同时概括多条文本**"]},{"cell_type":"code","execution_count":17,"id":"cff48486","metadata":{},"outputs":[],"source":["review_1 = prod_review \n","\n","# review for a standing lamp\n","review_2 = \"\"\"\n","Needed a nice lamp for my bedroom, and this one \\\n","had additional storage and not too high of a price \\\n","point. Got it fast - arrived in 2 days. The string \\\n","to the lamp broke during the transit and the company \\\n","happily sent over a new one. Came within a few days \\\n","as well. It was easy to put together. Then I had a \\\n","missing part, so I contacted their support and they \\\n","very quickly got me the missing piece! Seems to me \\\n","to be a great company that cares about their customers \\\n","and products. \n","\"\"\"\n","\n","# review for an electric toothbrush\n","review_3 = \"\"\"\n","My dental hygienist recommended an electric toothbrush, \\\n","which is why I got this. The battery life seems to be \\\n","pretty impressive so far. After initial charging and \\\n","leaving the charger plugged in for the first week to \\\n","condition the battery, I've unplugged the charger and \\\n","been using it for twice daily brushing for the last \\\n","3 weeks all on the same charge. But the toothbrush head \\\n","is too small. I’ve seen baby toothbrushes bigger than \\\n","this one. I wish the head was bigger with different \\\n","length bristles to get between teeth better because \\\n","this one doesn’t. Overall if you can get this one \\\n","around the $50 mark, it's a good deal. The manufactuer's \\\n","replacements heads are pretty expensive, but you can \\\n","get generic ones that're more reasonably priced. This \\\n","toothbrush makes me feel like I've been to the dentist \\\n","every day. My teeth feel sparkly clean! 
\n","\"\"\"\n","\n","# review for a blender\n","review_4 = \"\"\"\n","So, they still had the 17 piece system on seasonal \\\n","sale for around $49 in the month of November, about \\\n","half off, but for some reason (call it price gouging) \\\n","around the second week of December the prices all went \\\n","up to about anywhere from between $70-$89 for the same \\\n","system. And the 11 piece system went up around $10 or \\\n","so in price also from the earlier sale price of $29. \\\n","So it looks okay, but if you look at the base, the part \\\n","where the blade locks into place doesn’t look as good \\\n","as in previous editions from a few years ago, but I \\\n","plan to be very gentle with it (example, I crush \\\n","very hard items like beans, ice, rice, etc. in the \\\n","blender first then pulverize them in the serving size \\\n","I want in the blender then switch to the whipping \\\n","blade for a finer flour, and use the cross cutting blade \\\n","first when making smoothies, then use the flat blade \\\n","if I need them finer/less pulpy). Special tip when making \\\n","smoothies, finely cut and freeze the fruits and \\\n","vegetables (if using spinach-lightly stew soften the \\\n","spinach then freeze until ready for use-and if making \\\n","sorbet, use a small to medium sized food processor) \\\n","that you plan to use that way you can avoid adding so \\\n","much ice if at all-when making your smoothie. \\\n","After about a year, the motor was making a funny noise. \\\n","I called customer service but the warranty expired \\\n","already, so I had to buy another one. FYI: The overall \\\n","quality has gone done in these types of products, so \\\n","they are kind of counting on brand recognition and \\\n","consumer loyalty to maintain sales. 
Got it in about \\\n","two days.\n","\"\"\"\n","\n","reviews = [review_1, review_2, review_3, review_4]"]},{"cell_type":"code","execution_count":18,"id":"3f61080b","metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["0 Soft and cute panda plush toy loved by daughter, but small for the price. Arrived early. \n","\n","1 Great lamp with storage, fast delivery, excellent customer service, and easy assembly. Highly recommended. \n","\n","2 Impressive battery life, but toothbrush head is too small. Good deal if bought around $50. \n","\n","3 The reviewer found the price increase after the sale disappointing and noticed a decrease in quality over time. \n","\n"]}],"source":["for i in range(len(reviews)):\n"," prompt = f\"\"\"\n"," Your task is to generate a short summary of a product \\\n"," review from an ecommerce site. \n","\n"," Summarize the review below, delimited by triple \\\n"," backticks in at most 20 words. \n","\n"," Review: ```{reviews[i]}```\n"," \"\"\"\n"," response = get_completion(prompt)\n"," print(i, response, \"\\n\")"]}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.11"},"latex_envs":{"LaTeX_envs_menu_present":true,"autoclose":false,"autocomplete":true,"bibliofile":"biblio.bib","cite_by":"apalike","current_citInitial":1,"eqLabelWithNumbers":true,"eqNumInitial":1,"hotkeys":{"equation":"Ctrl-E","itemize":"Ctrl-I"},"labels_anchors":false,"latex_user_defs":false,"report_style_numbering":false,"user_envs_cfg":false},"toc":{"base_numbering":1,"nav_menu":{},"number_sections":true,"sideBar":true,"skip_h1_title":false,"title_cell":"Table of 
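Contents","title_sidebar":"Contents","toc_cell":false,"toc_position":{},"toc_section_display":true,"toc_window_display":true}},"nbformat":4,"nbformat_minor":5} diff --git a/docs/content/C1 Prompt Engineering for Developer/5. 推断 Inferring.ipynb b/docs/content/C1 Prompt Engineering for Developer/5. 推断 Inferring.ipynb index f0c549f..b3a2e59 100644 --- a/docs/content/C1 Prompt Engineering for Developer/5. 推断 Inferring.ipynb +++ b/docs/content/C1 Prompt Engineering for Developer/5. 推断 Inferring.ipynb @@ -5,10 +5,7 @@ "id": "3630c235-f891-4874-bd0a-5277d4d6aa82", "metadata": {}, "source": [ - "# Chapter 5 Inferring\n", - "\n", - "In this lesson, you will infer sentiment and topics from product reviews and news articles.\n", - "\n" + "# Chapter 5 Inferring" ] }, { @@ -16,10 +13,13 @@ "id": "5f3abbee", "metadata": {}, "source": [ + "In this chapter, we use a story to show how to infer sentiment and topics from product reviews and news articles.\n", "\n", - "An inference task can be seen as a process in which the model takes text as input and performs some kind of analysis on it, such as extracting labels, extracting entities, or understanding the sentiment of the text. If you want to extract positive or negative sentiment from a piece of text, the traditional machine learning workflow requires you to collect a labeled dataset, train a model, work out how to deploy it in the cloud, and then run inference. This can work reasonably well, but the full process takes a lot of work, and every task, such as sentiment analysis or entity extraction, needs its own model to be trained and deployed.\n", + "Imagine that you are a data analyst at a startup, and your job is to extract the key sentiments and topics from a variety of product reviews and news articles. These tasks include label extraction, entity extraction, and understanding the sentiment of a text. In the traditional machine learning workflow, you would need to collect a labeled dataset, train a model, work out how to deploy it in the cloud, and then run inference. This approach can produce good results, but the full process takes a great deal of time and effort, and every task, such as sentiment analysis or entity extraction, needs its own model to be trained and deployed.\n", "\n", - "A very nice property of LLMs is that, for many of these tasks, you only need to write a Prompt to start producing results, without a large amount of work. This greatly speeds up application development. You can also use a single model and a single API to perform many different tasks, without having to work out how to train and deploy many different models." + "Then, just as you are about to take on all this heavy work, you discover large language models (LLMs). A clear advantage of an LLM is that, for many of these tasks, you only need to write a Prompt to start generating results, which greatly reduces your workload. The discovery is like finding a magic key that makes application development much faster. Best of all, you can use a single model and a single API to perform many different tasks, without having to work out how to train and deploy many different models.\n", + "\n", + "Let's begin this chapter and explore how LLMs can speed up our work and make us more efficient." ] }, { @@ -27,11 +27,23 @@ "id": "51d2fdfa-c99f-4750-8574-dba7712cd7f0", "metadata": {}, "source": [ - "## 1. Sentiment Inference\n", - "\n", - "### 1.1 Sentiment Polarity Analysis\n", - "\n", - "Taking a review of a lamp on an e-commerce platform as an example, we can classify the sentiment it conveys into two classes (positive/negative)." + "## 1. Sentiment Inference" + ] + }, + { + "cell_type": "markdown", + "id": "ffc63a4b", + "metadata": {}, + "source": [ + "### 1.1 Sentiment Polarity Analysis" + ] + }, + { + "cell_type": "markdown", + "id": "21767f0b", + "metadata": {}, +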
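"source": [ + "Let's take a lamp review from an e-commerce platform as an example and learn how to classify a review's sentiment into two classes (positive/negative)." ] }, { @@ -49,20 +61,14 @@ "\"\"\"" ] }, - { - "cell_type": "markdown", - "id": "cc4ec4ca", - "metadata": {}, - "source": [] - }, { "cell_type": "markdown", "id": "30d6e4bd-3337-45a3-8c99-a734cdd06743", "metadata": {}, "source": [ - "Now let's write a Prompt to classify the sentiment of this review. If I want the system to tell me what the sentiment of the review is, I only need to write the Prompt “What is the sentiment of the following product review”, plus the usual delimiter, the review text, and so on.\n", + "Next, we write a Prompt to classify the sentiment of this product review. If we want the system to determine the sentiment of the review, we only need to write a Prompt such as “What is the sentiment of the following product review?”, plus the usual delimiters and the review text.\n", "\n", - "Then let's run it. The result shows that the sentiment of this product review is positive, which seems quite right. The lamp is not perfect, but the customer seems very satisfied. This seems to be a great company that cares about its customers and products, so positive sentiment seems to be the right answer." + "Then we run the code. The result indicates that the sentiment of this product review is positive, which seems accurate. The lamp is not perfect, but the customer appears quite satisfied with it. The company seems to care a lot about customer experience and product quality, so judging the review's sentiment as positive seems correct." ] }, { @@ -91,12 +97,6 @@ "print(response)" ] }, - { - "cell_type": "markdown", - "id": "a562e656", - "metadata": {}, - "source": [] - }, { "cell_type": "markdown", "id": "76be2320", @@ -136,9 +136,15 @@ "id": "81d2a973-1fa4-4a35-ae35-a2e746c0e91b", "metadata": {}, "source": [ - "### 1.2 Identifying Types of Emotion\n", - "\n", - "Still using the lamp review, we try another Prompt. This time I need the model to identify the emotions expressed by the review's author and summarize them as a list of no more than five items." + "### 1.2 Identifying Types of Emotion" + ] + }, + { + "cell_type": "markdown", + "id": "c696daa9", + "metadata": {}, + "source": [ + "Next, we keep using the same lamp review but try a new Prompt. We want the model to identify the emotions expressed by the review's author and organize them into a list of no more than five items." ] }, { @@ -166,12 +172,6 @@ "print(response)" ] }, - { - "cell_type": "markdown", - "id": "c7743a53", - "metadata": {}, - "source": [] - }, { "cell_type": "markdown", "id": "cc4444f7", @@ -187,7 +187,7 @@ "source": [ "### 1.3 Identifying Anger\n", "\n", - "For many businesses, it is important to know whether a customer is very angry. This leads to the following classification problem: is the author of the following review expressing anger? If someone is really angry, it may be worth paying extra attention and having the customer support or customer success team contact the customer to find out what happened and resolve the problem." + "For many businesses, detecting a customer's anger is crucial. This leads to a classification problem: is the author of the following review expressing anger? If someone is really upset, they may need extra attention, because every angry customer is an opportunity to improve the service and the company's reputation. At that point, the customer support or customer service team should step in, contact the customer, find out what happened, and resolve their problem." ] }, { @@ -215,12 +215,6 @@ "print(response)" ] }, - { - "cell_type": "markdown", - "id": "77905fd8", - "metadata": {}, - "source": [] - }, { "cell_type":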
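"markdown", "id": "11ca57a2", @@ -234,15 +228,27 @@ "id": "936a771e-ca78-4e55-8088-2da6f3820ddc", "metadata": {}, "source": [ - "## 2. Information Extraction\n", + "## 2. Information Extraction" + ] + }, + { + "cell_type": "markdown", + "id": "506264c6", + "metadata": {}, + "source": [ + "### 2.1 Product Information Extraction " + ] + }, + { + "cell_type": "markdown", + "id": "af474353", + "metadata": {}, + "source": [ + "Information extraction is an important part of natural language processing (NLP): it helps us pull out the specific information we care about from text. Here we dig into the rich information in customer reviews. In the following example, we ask the model to identify two key elements: the item purchased and the item's manufacturer.\n", "\n", - "### 2.1 Product Information Extraction \n", + "Imagine you are analyzing a large number of reviews on an online shopping site. Knowing which item each review mentions, who made it, and whether the sentiment is positive or negative would help you greatly in tracking how users feel about a specific item or manufacturer over time.\n", "\n", - "Next, let's extract richer information from customer reviews. Information extraction is the part of natural language processing (NLP) concerned with pulling out of a text the things you want to know. So in this Prompt, I ask the model to identify the following: the item purchased and the name of the company that made it.\n", - "\n", - "Likewise, if you are trying to summarize many reviews from an online shopping site, working out what the item is, who made it, and whether the sentiment is positive or negative helps you track the sentiment trend for a specific item or manufacturer.\n", - "\n", - "In the example below, we ask the model to format its response as a JSON object with the item and the brand as keys." + "In the following example, we ask the model to return its response as a JSON object whose keys are the item and the brand." ] }, { @@ -272,19 +278,13 @@ "The review text is delimited by triple backticks. Format your response as a JSON object with “Item” and “Brand” as the keys.\n", "If the information is not present, use “unknown” as the value.\n", "Make your response as short as possible.\n", - " \n", + "\n", "Review text: ```{lamp_review}```\n", "\"\"\"\n", "response = get_completion(prompt)\n", "print(response)" ] }, - { - "cell_type": "markdown", - "id": "1342c732", - "metadata": {}, - "source": [] - }, { "cell_type": "markdown", "id": "954d125d", @@ -298,9 +298,15 @@ "id": "a38880a5-088f-4609-9913-f8fa41fb7ba0", "metadata": {}, "source": [ - "### 2.2 Combining Sentiment Inference and Information Extraction\n", - "\n", - "Extracting all of the information above took 3 or 4 Prompts, but you can actually write a single Prompt that extracts all of it at once." + "### 2.2 Combining Sentiment Inference and Information Extraction" + ] + }, + { + "cell_type": "markdown", + "id": "6d7a4474", + "metadata": {}, + "source": [ + "In the sections above, we used three or four Prompts to extract the “Sentiment”, “Anger”, “Item” and “Brand” information from the review. In fact, we can design a single Prompt that extracts all of it at once." ] }, { @@ -333,10 +339,10 @@ "- Item purchased by the reviewer\n", "- Company that made the item\n", "\n", - "The review is delimited by triple backticks. Please format your response as a JSON object with “Sentiment”, “Anger”, “Item” and “Brand” as the keys.\n", + "The review is delimited by triple backticks. Format your response as a JSON object with “Sentiment”, “Anger”, “Item” and “Brand” as the keys.\n", "If the information is not present, use “unknown” as the value.\n", "Make your response as short as possible.\n", - "Format the Anger 
value as a boolean.\n", + "Format the “Anger” value as a boolean.\n", "\n", "Review text: ```{lamp_review}```\n", "\"\"\"\n", @@ -349,7 +355,7 @@ "id": "5e09a673", "metadata": {}, "source": [ - "In this example, we tell the model to format the anger value as a boolean and then output JSON. You can try different variations yourself, or even a completely different review, to see whether it can still extract this information accurately." + "In this example, we instruct the LLM to format the “Anger” value as a boolean and to output JSON. You can try varying the output format, or experiment with a completely different review, to see whether the LLM still extracts this information accurately." ] }, { @@ -357,9 +363,15 @@ "id": "235fc223-2c89-49ec-ac2d-78a8e74a43ac", "metadata": {}, "source": [ - "## 3. Topic Inference\n", - "\n", - "Another cool application of large language models is inferring topics. Given a long piece of text, what is it about? What topics does it cover? Take the following fictional newspaper report as an example." + "## 3. Topic Inference" + ] + }, + { + "cell_type": "markdown", + "id": "1386570b", + "metadata": {}, + "source": [ + "Another cool application of large language models is inferring topics. Suppose we have a long piece of text: how do we determine what it is about and which topics it covers? Let's look at the following fictional newspaper report." ] }, { @@ -391,9 +403,15 @@ "id": "a8ea91d6-e841-4ee2-bed9-ca4a36df177f", "metadata": {}, "source": [ - "### 3.1 Inferring the Topics Discussed\n", - "\n", - "Above is a fictional newspaper article about how government workers feel about the agency they work for. We can ask the model to determine five topics being discussed, describe each in one or two words, and format the output as a comma-separated list." + "### 3.1 Inferring the Topics Discussed" + ] + }, + { + "cell_type": "markdown", + "id": "a76f21f5", + "metadata": {}, + "source": [ + "Above is a fictional newspaper article about how government employees feel about their workplace. We can ask the large language model to determine five topics discussed in it and to summarize each in one or two words. The output is formatted as a comma-separated Python list." ] }, { @@ -425,20 +443,20 @@ "print(response)" ] }, - { - "cell_type": "markdown", - "id": "790d1435", - "metadata": {}, - "source": [] - }, { "cell_type": "markdown", "id": "34be1d2a-1309-4512-841a-b6f67338938b", "metadata": {}, "source": [ - "### 3.2 Creating News Alerts for Specific Topics\n", - "\n", - "Suppose we have a news website or something similar, and these are the topics we are interested in: NASA, local government, engineering, employee satisfaction, federal government, and so on. Suppose we want to work out which of these topics a given news article covers. We can use a Prompt like this: determine whether each item in the following list of topics is a topic in the text below. Give your answer as a list of 0s and 1s." + "### 3.2 Creating News Alerts for Specific Topics" + ] + }, + { + "cell_type": "markdown", + "id": "95b636f1", + "metadata": {}, + "source": [ + "Suppose we have a news website or a similar platform, and these are the topics we are interested in: NASA, local government, engineering, employee satisfaction, federal government, and so on. We want to analyze a news article and find out which of these topics it covers. We can use a Prompt like this: determine whether each item in the following list of topics is a topic in the text below. Give your answer as a list of 0s and 1s." ] }, { @@ -476,20 +494,14 @@ "print(response)" ] }, - { - "cell_type": "markdown", - "id": "8f39f24a", - "metadata": {}, - "source": [] - }, { "cell_type":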
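"markdown", "id": "08247dbf", "metadata": {}, "source": [ - "As the results show, this story is related to NASA, employee satisfaction, and the federal government, and not to local government or engineering. In machine learning this is sometimes called a Zero-Shot learning algorithm, because we did not give it any labeled training data. With only a Prompt, it can determine which topics are covered in the news article.\n", + "Judging from the output, this `story` is related to “NASA”, “employee satisfaction”, “federal government” and “local government”, and not to “engineering”. In machine learning, this ability is called zero-shot learning: we did not provide any labeled training data, yet with only a Prompt the model can determine which topics the news article covers.\n", "\n", - "If we want to generate a news alert, we can use the same process. Suppose I really like the work NASA does: I can build a system like this that outputs an alert whenever NASA news appears." + "If we want to set up a news alert, we can use the same process. Suppose I am very interested in NASA's work: I can build a system that outputs an alert whenever news related to NASA appears." ] }, { @@ -515,18 +527,12 @@ " print(\"ALERT: New NASA story!\")" ] }, - { - "cell_type": "markdown", - "id": "9fc2c643", - "metadata": {}, - "source": [] - }, { "cell_type": "markdown", "id": "76ccd189", "metadata": {}, "source": [ - "That is everything on inferring. In just a few minutes, we can build several systems for making inferences about text, which previously would have taken a skilled machine learning developer days or even weeks. This is very exciting: both skilled machine learning developers and newcomers can use Prompts to build and get started on fairly complex natural language processing tasks very quickly." + "That concludes our introduction to inferring. In just a few minutes, we built several systems for making inferences about text, work that used to take a machine learning expert days or even weeks. This is an exciting change: whether you are an experienced machine learning developer or a newcomer, you can use Prompts to get started quickly on complex natural language processing tasks." ] }, { diff --git a/docs/content/C1 Prompt Engineering for Developer/8. 聊天机器人 Chatbot.ipynb b/docs/content/C1 Prompt Engineering for Developer/8. 聊天机器人 Chatbot.ipynb index b008f73..17e29ef 100644 --- a/docs/content/C1 Prompt Engineering for Developer/8. 聊天机器人 Chatbot.ipynb +++ b/docs/content/C1 Prompt Engineering for Developer/8. 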
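聊天机器人 Chatbot.ipynb @@ -15,8 +15,7 @@ "id": "f0bdc2c9", "metadata": {}, "source": [ - "\n", - "One of the exciting things about a large language model is that we can use it to build a custom chatbot with very little effort. In this section, we explore how to use the chat format to have extended conversations with a chatbot that is personalized (or specialized for a particular task or behavior)." + "One exciting possibility that large language models open up is building a custom chatbot with very little effort. In this chapter, we show you how to use the conversational format to have in-depth conversations with a chatbot that is personalized (or designed for a particular task or behavior)." ] }, { @@ -24,7 +23,7 @@ "id": "e6fae355", "metadata": {}, "source": [ - "Chat models such as ChatGPT are built to take a series of messages as input and return a model-generated message as output. The chat format was originally designed to make multi-turn conversations easy, but as we saw in earlier chapters, it is just as useful for **single-turn tasks** that involve no conversation at all.\n" + "Chat models such as ChatGPT are built to take a series of messages as input and return a model-generated message as output. The chat format was originally designed to make multi-turn conversations easy, but as we saw in earlier chapters, it is just as useful for **single-turn tasks** that involve no conversation at all." ] }, { @@ -93,9 +92,21 @@ "id": "e105c1b4", "metadata": {}, "source": [ - "### 1.1 Telling a Joke\n", + "### 1.1 Telling a Joke" + ] + }, + { + "cell_type": "markdown", + "id": "a0b37933", + "metadata": {}, + "source": [ + "We use the system message to set: “You are an assistant that speaks like Shakespeare.” This is how we describe to the assistant **how it should behave**.\n", "\n", - "The system message says: you are an assistant that speaks like Shakespeare. This is how we describe to the assistant **how it should behave**. Then the first user message is *tell me a joke*. Next comes a reply given as the assistant, *why did the chicken cross the road?* Finally, the user message sent is *I don't know*." + "Then the first user message is: “Tell me a joke.”\n", + "\n", + "Next, a reply is given as the assistant: “Why did the chicken cross the road?” \n", + "\n", + "Finally, the user message sent is: “I don't know.”" ] }, { @@ -182,7 +193,9 @@ "id": "5f76bedb", "metadata": {}, "source": [ - "Let's look at another example. The assistant message is *You are a friendly chatbot*, and the first user message is *Hi, my name is Isa*. We want to get the first user message." + "Let's look at another example. The system message is: “*You are a friendly chatbot*”, and the first user message is: “*Hi, my name is Isa*.”\n", + "\n", + "We want to get the reply to the first user message." ] }, { @@ -221,7 +234,7 @@ "id": "1e9f96ba", "metadata": {}, "source": [ - "Let's try one more example. The system message is, you are a friendly chatbot, and the first user message is, yes, can you remind me what my name is?" 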
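+ "Let's try one more example. The system message is: “You are a friendly chatbot”, and the first user message is: “Yes, can you remind me what my name is?”" ] }, { @@ -299,12 +312,30 @@ "id": "bBg_MpXeYnTq" }, "source": [ - "## 3. OrderBot\n", - "\n", - "Now we build an “OrderBot”. We need it to collect user information automatically and take orders for a pizza restaurant.\n", - "\n", - "### 3.1 Building the Bot\n", - "\n", + "## 3. OrderBot" + ] + }, + { + "cell_type": "markdown", + "id": "8f0f678c", + "metadata": {}, + "source": [ + "In this section, we explore how to build an “order assistant bot”. The bot is designed to collect user information automatically and take orders for a pizza restaurant. Let's start this fun project and see how it can simplify the everyday ordering process." + ] + }, + { + "cell_type": "markdown", + "id": "4edeede6", + "metadata": {}, + "source": [ + "### 3.1 Building the Bot" + ] + }, + { + "cell_type": "markdown", + "id": "3357a655", + "metadata": {}, + "source": [ "The function below collects our user messages so that we do not have to type them in by hand as we did above. It collects the Prompt from the user interface built below, appends it to a list called ```context```, and uses that context every time it calls the model. The model's response is also added to the context, so both user messages and model messages accumulate and the context grows longer. This gives the model the information it needs to decide what to do next." ] }, @@ -434,7 +465,14 @@ "source": [ "Here we additionally ask the model to create a JSON summary that we can send to the order system.\n", "\n", - "So we need to append another system message to the context as a further instruction. We say *create a JSON summary of the previous order, itemizing the price of each item; the fields should include 1) pizza, including size, 2) list of toppings, 3) list of drinks, 4) list of sides, including size, and finally the total price*. This could also be a user message; it does not have to be a system message.\n", + "So we need to append another system message to the context as a further instruction. We ask it to create a JSON summary of the previous order, itemizing the price of each item, with the following fields:\n", + "1. Pizza, including size\n", + "2. List of toppings\n", + "3. List of drinks\n", + "4. List of sides, including size\n", + "5. Total price\n", + "\n", + "This could also be a user message; it does not have to be a system message.\n", "\n", "Note that we use a lower temperature here, because for this kind of task we want the output to be fairly predictable." ] @@ -510,7 +548,7 @@ "id": "ef17c2b2", "metadata": {}, "source": [ - "We have now built our own order chatbot. Feel free to customize and modify the system message to change the chatbot's behavior and have it play different roles with different knowledge." + "We have successfully built our own order chatbot. You can freely customize and modify the bot's system message to suit your preferences and needs, change its behavior, and have it play all kinds of roles with all kinds of knowledge. Let's explore the possibilities of chatbots together!" ] }, { diff --git a/docs/content/C1 Prompt Engineering for Developer/9. 总结 Summary.md b/docs/content/C1 Prompt Engineering for Developer/9. 总结 Summary.md new file mode 100644 index 0000000..1322bbf --- /dev/null +++ b/docs/content/C1 Prompt Engineering for Developer/9. 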
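总结 Summary.md
@@ -0,0 +1,14 @@
+**Congratulations on completing Unit 1 of this book!**
+
+To sum up, in Unit 1 we learned two core principles for writing Prompts:
+
+- Write clear and specific instructions;
+- Where appropriate, give the model time to think.
+
+You also learned how to develop Prompts iteratively, and saw why the process of finding the right Prompt for your application matters so much.
+
+We also covered many capabilities of large language models, including summarizing, inferring, transforming, and expanding, and you learned how to build a custom chatbot. We hope you learned a lot in Unit 1 and enjoyed it.
+
+We hope you feel inspired to build your own applications. Be bold, and share your ideas with us. You can start with a tiny project, whether it is genuinely useful or just a fun experiment. Use what you learn from your first project to build a better second one, and so on. If you already have a big project in mind, go ahead and build it.
+
+Finally, we hope you found Unit 1 rewarding. Thank you for taking part. We look forward to seeing what you build. Next, we move on to Unit 2!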