{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "f200ba9a",
|
||
"metadata": {},
|
||
"source": [
|
||
"# 5 基于文档的问答 \n",
|
||
"<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#5.1-导入embedding模型和向量存储组件\" data-toc-modified-id=\"5.1-导入embedding模型和向量存储组件-1\">5.1 导入embedding模型和向量存储组件</a></span><ul class=\"toc-item\"><li><span><a href=\"#5.1.2-创建向量存储\" data-toc-modified-id=\"5.1.2-创建向量存储-1.1\">5.1.2 创建向量存储</a></span></li><li><span><a href=\"#5.1.3-使用语言模型与文档结合使用\" data-toc-modified-id=\"5.1.3-使用语言模型与文档结合使用-1.2\">5.1.3 使用语言模型与文档结合使用</a></span></li></ul></li><li><span><a href=\"#5.2-如何回答我们文档的相关问题\" data-toc-modified-id=\"5.2-如何回答我们文档的相关问题-2\">5.2 如何回答我们文档的相关问题</a></span><ul class=\"toc-item\"><li><span><a href=\"#5.2.1-不同类型的chain链\" data-toc-modified-id=\"5.2.1-不同类型的chain链-2.1\">5.2.1 不同类型的chain链</a></span></li></ul></li></ul></div>"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "52824b89-532a-4e54-87e9-1410813cd39e",
|
||
"metadata": {},
|
||
"source": [
|
||
"\n",
|
||
"本章内容主要利用langchain构建向量数据库,可以在文档上方或关于文档回答问题,因此,给定从PDF文件、网页或某些公司的内部文档收集中提取的文本,使用llm回答有关这些文档内容的问题"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "4aac484b",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"source": [
|
||
"\n",
|
||
"\n",
|
||
"安装langchain,设置chatGPT的OPENAI_API_KEY\n",
|
||
"\n",
|
||
"* 安装langchain\n",
|
||
"\n",
|
||
"```\n",
|
||
"pip install langchain\n",
|
||
"```\n",
|
||
"* 安装docarray\n",
|
||
"\n",
|
||
"```\n",
|
||
"pip install docarray\n",
|
||
"```\n",
|
||
"* 设置API-KEY环境变量\n",
|
||
"\n",
|
||
"```\n",
|
||
"export OPENAI_API_KEY='api-key'\n",
|
||
"\n",
|
||
"```"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"id": "b7ed03ed-1322-49e3-b2a2-33e94fb592ef",
|
||
"metadata": {
|
||
"height": 81,
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"import os\n",
|
||
"\n",
|
||
"from dotenv import load_dotenv, find_dotenv\n",
|
||
"_ = load_dotenv(find_dotenv()) #读取环境变量"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 52,
|
||
"id": "af8c3c96",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"'\\n\\n人工智能是一项极具前景的技术,它的发展正在改变人类的生活方式,带来了无数的便利,也被认为是未来发展的重要标志。人工智能的发展让许多复杂的任务变得更加容易,更高效的完成,节省了大量的时间和精力,为人类发展带来了极大的帮助。'"
|
||
]
|
||
},
|
||
"execution_count": 52,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"from langchain.llms import OpenAI\n",
|
||
"\n",
|
||
"llm = OpenAI(model_name=\"text-davinci-003\",max_tokens=1024)\n",
|
||
"llm(\"怎么评价人工智能\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "8cb7a7ec",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"source": [
|
||
"## 5.1 导入embedding模型和向量存储组件\n",
|
||
"使用Dock Array内存搜索向量存储,作为一个内存向量存储,不需要连接外部数据库"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"id": "974acf8e-8f88-42de-88f8-40a82cb58e8b",
|
||
"metadata": {
|
||
"height": 98
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"from langchain.chains import RetrievalQA #检索QA链,在文档上进行检索\n",
|
||
"from langchain.chat_models import ChatOpenAI #openai模型\n",
|
||
"from langchain.document_loaders import CSVLoader #文档加载器,采用csv格式存储\n",
|
||
"from langchain.vectorstores import DocArrayInMemorySearch #向量存储\n",
|
||
"from IPython.display import display, Markdown #在jupyter显示信息的工具"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"id": "7249846e",
|
||
"metadata": {
|
||
"height": 75
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#读取文件\n",
|
||
"file = 'OutdoorClothingCatalog_1000.csv'\n",
|
||
"loader = CSVLoader(file_path=file)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 24,
|
||
"id": "7724f00e",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>0</th>\n",
|
||
" <th>1</th>\n",
|
||
" <th>2</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>name</td>\n",
|
||
" <td>description</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>Women's Campside Oxfords</td>\n",
|
||
" <td>This ultracomfortable lace-to-toe Oxford boast...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1.0</td>\n",
|
||
" <td>Recycled Waterhog Dog Mat, Chevron Weave</td>\n",
|
||
" <td>Protect your floors from spills and splashing ...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>Infant and Toddler Girls' Coastal Chill Swimsu...</td>\n",
|
||
" <td>She'll love the bright colors, ruffles and exc...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>3.0</td>\n",
|
||
" <td>Refresh Swimwear, V-Neck Tankini Contrasts</td>\n",
|
||
" <td>Whether you're going for a swim or heading out...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>996</th>\n",
|
||
" <td>995.0</td>\n",
|
||
" <td>Men's Classic Denim, Standard Fit</td>\n",
|
||
" <td>Crafted from premium denim that will last wash...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>997</th>\n",
|
||
" <td>996.0</td>\n",
|
||
" <td>CozyPrint Sweater Fleece Pullover</td>\n",
|
||
" <td>The ultimate sweater fleece - made from superi...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>998</th>\n",
|
||
" <td>997.0</td>\n",
|
||
" <td>Women's NRS Endurance Spray Paddling Pants</td>\n",
|
||
" <td>These comfortable and affordable splash paddli...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>999</th>\n",
|
||
" <td>998.0</td>\n",
|
||
" <td>Women's Stop Flies Hoodie</td>\n",
|
||
" <td>This great-looking hoodie uses No Fly Zone Tec...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1000</th>\n",
|
||
" <td>999.0</td>\n",
|
||
" <td>Modern Utility Bag</td>\n",
|
||
" <td>This US-made crossbody bag is built with the s...</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>1001 rows × 3 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" 0 1 \n",
|
||
"0 NaN name \\\n",
|
||
"1 0.0 Women's Campside Oxfords \n",
|
||
"2 1.0 Recycled Waterhog Dog Mat, Chevron Weave \n",
|
||
"3 2.0 Infant and Toddler Girls' Coastal Chill Swimsu... \n",
|
||
"4 3.0 Refresh Swimwear, V-Neck Tankini Contrasts \n",
|
||
"... ... ... \n",
|
||
"996 995.0 Men's Classic Denim, Standard Fit \n",
|
||
"997 996.0 CozyPrint Sweater Fleece Pullover \n",
|
||
"998 997.0 Women's NRS Endurance Spray Paddling Pants \n",
|
||
"999 998.0 Women's Stop Flies Hoodie \n",
|
||
"1000 999.0 Modern Utility Bag \n",
|
||
"\n",
|
||
" 2 \n",
|
||
"0 description \n",
|
||
"1 This ultracomfortable lace-to-toe Oxford boast... \n",
|
||
"2 Protect your floors from spills and splashing ... \n",
|
||
"3 She'll love the bright colors, ruffles and exc... \n",
|
||
"4 Whether you're going for a swim or heading out... \n",
|
||
"... ... \n",
|
||
"996 Crafted from premium denim that will last wash... \n",
|
||
"997 The ultimate sweater fleece - made from superi... \n",
|
||
"998 These comfortable and affordable splash paddli... \n",
|
||
"999 This great-looking hoodie uses No Fly Zone Tec... \n",
|
||
"1000 This US-made crossbody bag is built with the s... \n",
|
||
"\n",
|
||
"[1001 rows x 3 columns]"
|
||
]
|
||
},
|
||
"execution_count": 24,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"#查看数据\n",
|
||
"import pandas as pd\n",
|
||
"data = pd.read_csv(file,header=None)\n",
|
||
"data"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "3bd6422c",
|
||
"metadata": {},
|
||
"source": [
|
||
"提供了一个户外服装的CSV文件,我们将使用它与语言模型结合使用"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "2963fc63",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 5.1.2 创建向量存储\n",
|
||
"将导入一个索引,即向量存储索引创建器"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 25,
|
||
"id": "5bfaba30",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"from langchain.indexes import VectorstoreIndexCreator #导入向量存储索引创建器"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "9e200726",
|
||
"metadata": {
|
||
"height": 64
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"'''\n",
|
||
"将指定向量存储类,创建完成后,我们将从加载器中调用,通过文档记载器列表加载\n",
|
||
"'''\n",
|
||
"\n",
|
||
"index = VectorstoreIndexCreator(\n",
|
||
" vectorstore_cls=DocArrayInMemorySearch\n",
|
||
").from_loaders([loader])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"id": "34562d81",
|
||
"metadata": {
|
||
"height": 47
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"query =\"Please list all your shirts with sun protection \\\n",
|
||
"in a table in markdown and summarize each one.\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 21,
|
||
"id": "cfd0cc37",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"response = index.query(query)#使用索引查询创建一个响应,并传入这个查询"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 23,
|
||
"id": "ae21f1ff",
|
||
"metadata": {
|
||
"height": 30,
|
||
"scrolled": true
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/markdown": [
|
||
"\n",
|
||
"\n",
|
||
"| Name | Description |\n",
|
||
"| --- | --- |\n",
|
||
"| Men's Tropical Plaid Short-Sleeve Shirt | UPF 50+ rated, 100% polyester, wrinkle-resistant, front and back cape venting, two front bellows pockets |\n",
|
||
"| Men's Plaid Tropic Shirt, Short-Sleeve | UPF 50+ rated, 52% polyester and 48% nylon, machine washable and dryable, front and back cape venting, two front bellows pockets |\n",
|
||
"| Men's TropicVibe Shirt, Short-Sleeve | UPF 50+ rated, 71% Nylon, 29% Polyester, 100% Polyester knit mesh, machine wash and dry, front and back cape venting, two front bellows pockets |\n",
|
||
"| Sun Shield Shirt by | UPF 50+ rated, 78% nylon, 22% Lycra Xtra Life fiber, handwash, line dry, wicks moisture, fits comfortably over swimsuit, abrasion resistant |\n",
|
||
"\n",
|
||
"All four shirts provide UPF 50+ sun protection, blocking 98% of the sun's harmful rays. The Men's Tropical Plaid Short-Sleeve Shirt is made of 100% polyester and is wrinkle-resistant"
|
||
],
|
||
"text/plain": [
|
||
"<IPython.core.display.Markdown object>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"display(Markdown(response))#查看查询返回的内容"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "eb74cc79",
|
||
"metadata": {},
|
||
"source": [
|
||
"得到了一个Markdown表格,其中包含所有带有防晒衣的衬衫的名称和描述,还得到了一个语言模型提供的不错的小总结"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "dd34e50e",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 5.1.3 使用语言模型与文档结合使用\n",
|
||
"想要使用语言模型并将其与我们的许多文档结合使用,但是语言模型一次只能检查几千个单词,如果我们有非常大的文档,如何让语言模型回答关于其中所有内容的问题呢?通过embedding和向量存储实现\n",
|
||
"* embedding \n",
|
||
"文本片段创建数值表示文本语义,相似内容的文本片段将具有相似的向量,这使我们可以在向量空间中比较文本片段\n",
|
||
"* 向量数据库 \n",
|
||
"向量数据库是存储我们在上一步中创建的这些向量表示的一种方式,我们创建这个向量数据库的方式是用来自传入文档的文本块填充它。\n",
|
||
"当我们获得一个大的传入文档时,我们首先将其分成较小的块,因为我们可能无法将整个文档传递给语言模型,因此采用分块embedding的方式储存到向量数据库中。这就是创建索引的过程。\n",
|
||
"\n",
|
||
"通过运行时使用索引来查找与传入查询最相关的文本片段,然后我们将其与向量数据库中的所有向量进行比较,并选择最相似的n个,返回语言模型得到最终答案"
|
||
]
|
||
},
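The retrieval step described above can be sketched in plain Python. This is a minimal illustration and not what `DocArrayInMemorySearch` does internally: the three-dimensional vectors are hypothetical stand-ins for real embeddings, and `cosine_similarity`, `top_n`, `toy_store`, and `toy_query` are names invented for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n(query_vec, store, n=2):
    """Return the n (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical toy "embeddings"; real embeddings have far more dimensions.
toy_store = [
    ("sun protection shirt", [0.9, 0.1, 0.0]),
    ("waterproof dog mat", [0.0, 0.2, 0.9]),
    ("UPF 50+ swimsuit", [0.7, 0.3, 0.1]),
]
toy_query = [1.0, 0.0, 0.0]  # stands in for the embedded query

results = top_n(toy_query, toy_store, n=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

Comparing the query against every stored vector is fine for a small in-memory catalog like this one; larger collections usually rely on an approximate nearest-neighbor index instead.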
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 26,
|
||
"id": "631396c6",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#创建一个文档加载器,通过csv格式加载\n",
|
||
"loader = CSVLoader(file_path=file)\n",
|
||
"docs = loader.load()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 27,
|
||
"id": "4a977f44",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Document(page_content=\": 0\\nname: Women's Campside Oxfords\\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \\n\\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \\n\\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \\n\\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \\n\\nQuestions? Please contact us for any inquiries.\", metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 0})"
|
||
]
|
||
},
|
||
"execution_count": 27,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"docs[0]#查看单个文档,我们可以看到每个文档对应于CSV中的一个块"
|
||
]
|
||
},
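To make the shape of that output concrete, here is a rough standard-library sketch of how a CSV row could be turned into a document with `column: value` lines plus `source` and `row` metadata. It only mimics the output shown above; it is not the `CSVLoader` implementation, and `rows_to_documents` and the shortened sample descriptions are made up for illustration.

```python
import csv
import io


def rows_to_documents(csv_text, source="in-memory.csv"):
    """Turn each CSV row into a dict with 'page_content' and 'metadata'."""
    documents = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_index, row in enumerate(reader):
        # One "column: value" line per field, mirroring the output above.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {
                "page_content": content,
                "metadata": {"source": source, "row": row_index},
            }
        )
    return documents


# Toy data: the first column has an empty header, as in the catalog file.
sample_csv = (
    ",name,description\n"
    "0,Women's Campside Oxfords,Soft canvas shoe\n"
    "1,Modern Utility Bag,US-made crossbody bag\n"
)
sample_docs = rows_to_documents(sample_csv)
print(sample_docs[0]["page_content"])
print(sample_docs[0]["metadata"])
```

Because the first column has no header, its line starts with a bare colon, which matches the `: 0` prefix visible in the real document.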
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 31,
|
||
"id": "e875693a",
|
||
"metadata": {
|
||
"height": 47
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"'''\n",
|
||
"因为这些文档已经非常小了,所以我们实际上不需要在这里进行任何分块,可以直接进行embedding\n",
|
||
"'''\n",
|
||
"\n",
|
||
"from langchain.embeddings import OpenAIEmbeddings #要创建可以直接进行embedding,我们将使用OpenAI的可以直接进行embedding类\n",
|
||
"embeddings = OpenAIEmbeddings() #初始化"
|
||
]
|
||
},
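The catalog rows need no chunking, but for a longer document the splitting step might look like the following sketch. It assumes fixed-size character windows with a small overlap; `split_into_chunks` and its parameters are hypothetical and are not a LangChain API.

```python
def split_into_chunks(text, chunk_size=40, overlap=10):
    """Split text into character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


long_text = "abcdefghij" * 10  # 100 characters of toy text
chunks = split_into_chunks(long_text, chunk_size=40, overlap=10)
for i, chunk in enumerate(chunks):
    print(i, len(chunk), chunk)
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some text twice.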
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 32,
|
||
"id": "779bec75",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"embed = embeddings.embed_query(\"Hi my name is Harrison\")#让我们使用embedding上的查询方法为特定文本创建embedding"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 33,
|
||
"id": "699aaaf9",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"1536\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(len(embed))#查看这个embedding,我们可以看到有超过一千个不同的元素"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 34,
|
||
"id": "9d00d346",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"[-0.021933607757091522, 0.006697045173496008, -0.01819835603237152, -0.039113257080316544, -0.014060650952160358]\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(embed[:5])#每个元素都是不同的数字值,组合起来,这就创建了这段文本的总体数值表示"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 35,
|
||
"id": "27ad0bb0",
|
||
"metadata": {
|
||
"height": 81
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"'''\n",
|
||
"为刚才的文本创建embedding,准备将它们存储在向量存储中,使用向量存储上的from documents方法来实现。\n",
|
||
"该方法接受文档列表、嵌入对象,然后我们将创建一个总体向量存储\n",
|
||
"'''\n",
|
||
"db = DocArrayInMemorySearch.from_documents(\n",
|
||
" docs, \n",
|
||
" embeddings\n",
|
||
")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 36,
|
||
"id": "0329bfd5",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"query = \"Please suggest a shirt with sunblocking\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 37,
|
||
"id": "7909c6b7",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"docs = db.similarity_search(query)#使用这个向量存储来查找与传入查询类似的文本,如果我们在向量存储中使用相似性搜索方法并传入一个查询,我们将得到一个文档列表"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 38,
|
||
"id": "43321853",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"4"
|
||
]
|
||
},
|
||
"execution_count": 38,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(docs)# 我们可以看到它返回了四个文档"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 39,
|
||
"id": "6eba90b5",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Document(page_content=': 255\\nname: Sun Shield Shirt by\\ndescription: \"Block the sun, not the fun – our high-performance sun shirt is guaranteed to protect from harmful UV rays. \\n\\nSize & Fit: Slightly Fitted: Softly shapes the body. Falls at hip.\\n\\nFabric & Care: 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Handwash, line dry.\\n\\nAdditional Features: Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. Imported.\\n\\nSun Protection That Won\\'t Wear Off\\nOur high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun\\'s harmful rays. This fabric is recommended by The Skin Cancer Foundation as an effective UV protectant.', metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 255})"
|
||
]
|
||
},
|
||
"execution_count": 39,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"docs[0] #,如果我们看第一个文档,我们可以看到它确实是一件关于防晒的衬衫"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "fe41b36f",
|
||
"metadata": {},
|
||
"source": [
|
||
"## 5.2 如何回答我们文档的相关问题\n",
|
||
"首先,我们需要从这个向量存储中创建一个检索器,检索器是一个通用接口,可以由任何接受查询并返回文档的方法支持。接下来,因为我们想要进行文本生成并返回自然语言响应\n"
|
||
]
|
||
},
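The idea that a retriever is only an interface can be illustrated without any vector store. In this sketch a keyword-overlap retriever satisfies the same contract that an embedding-based one would. The `Retriever` protocol, `KeywordRetriever`, `find_sources`, and the method name `get_relevant_documents` are all defined here for illustration and should be read as assumptions rather than as LangChain's own definitions.

```python
from typing import List, Protocol


class Retriever(Protocol):
    """Anything that maps a query string to a list of documents."""

    def get_relevant_documents(self, query: str) -> List[str]:
        ...


class KeywordRetriever:
    """Toy retriever: ranks documents by how many query words they share."""

    def __init__(self, documents: List[str], k: int = 2) -> None:
        self.documents = documents
        self.k = k

    def get_relevant_documents(self, query: str) -> List[str]:
        words = set(query.lower().split())
        scored = [
            (len(words & set(doc.lower().split())), doc)
            for doc in self.documents
        ]
        matches = [pair for pair in scored if pair[0] > 0]
        matches.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in matches[: self.k]]


def find_sources(retriever: Retriever, query: str) -> List[str]:
    """Caller code depends only on the interface, not on the backend."""
    return retriever.get_relevant_documents(query)


catalog = [
    "sun shield shirt",
    "recycled dog mat",
    "tropical plaid shirt with sun protection",
]
hits = find_sources(KeywordRetriever(catalog), "shirt with sun protection")
print(hits)
```

Swapping in a vector-store-backed retriever would change how the documents are ranked but not the calling code.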
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 40,
|
||
"id": "c0c3596e",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"retriever = db.as_retriever() #创建检索器通用接口"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 55,
|
||
"id": "0625f5e8",
|
||
"metadata": {
|
||
"height": 47
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"llm = ChatOpenAI(temperature = 0.0,max_tokens=1024) #导入语言模型\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 43,
|
||
"id": "a573f58a",
|
||
"metadata": {
|
||
"height": 47
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"qdocs = \"\".join([docs[i].page_content for i in range(len(docs))]) # 将合并文档中的所有页面内容到一个变量中\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "14682d95",
|
||
"metadata": {
|
||
"height": 64
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"response = llm.call_as_llm(f\"{qdocs} Question: Please list all your \\\n",
|
||
"shirts with sun protection in a table in markdown and summarize each one.\") #列出所有具有防晒功能的衬衫并在Markdown表格中总结每个衬衫的语言模型\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 28,
|
||
"id": "8bba545b",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/markdown": [
|
||
"| Name | Description |\n",
|
||
"| --- | --- |\n",
|
||
"| Sun Shield Shirt | High-performance sun shirt with UPF 50+ sun protection, moisture-wicking, and abrasion-resistant fabric. Recommended by The Skin Cancer Foundation. |\n",
|
||
"| Men's Plaid Tropic Shirt | Ultracomfortable shirt with UPF 50+ sun protection, wrinkle-free fabric, and front/back cape venting. Made with 52% polyester and 48% nylon. |\n",
|
||
"| Men's TropicVibe Shirt | Men's sun-protection shirt with built-in UPF 50+ and front/back cape venting. Made with 71% nylon and 29% polyester. |\n",
|
||
"| Men's Tropical Plaid Short-Sleeve Shirt | Lightest hot-weather shirt with UPF 50+ sun protection, front/back cape venting, and two front bellows pockets. Made with 100% polyester and is wrinkle-resistant. |\n",
|
||
"\n",
|
||
"All of these shirts provide UPF 50+ sun protection, blocking 98% of the sun's harmful rays. They are made with high-performance fabrics that are moisture-wicking, wrinkle-resistant, and abrasion-resistant. The Men's Plaid Tropic Shirt and Men's Tropical Plaid Short-Sleeve Shirt both have front/back cape venting for added breathability. The Sun Shield Shirt is recommended by The Skin Cancer Foundation as an effective UV protectant."
|
||
],
|
||
"text/plain": [
|
||
"<IPython.core.display.Markdown object>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"display(Markdown(response))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "12f042e7",
|
||
"metadata": {},
|
||
"source": [
|
||
"在此处打印响应,我们可以看到我们得到了一个表格,正如我们所要求的那样"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 56,
|
||
"id": "32c94d22",
|
||
"metadata": {
|
||
"height": 115
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"''' \n",
|
||
"通过LangChain链封装起来\n",
|
||
"创建一个检索QA链,对检索到的文档进行问题回答,要创建这样的链,我们将传入几个不同的东西\n",
|
||
"1、语言模型,在最后进行文本生成\n",
|
||
"2、传入链类型,这里使用stuff,将所有文档塞入上下文并对语言模型进行一次调用\n",
|
||
"3、传入一个检索器\n",
|
||
"'''\n",
|
||
"\n",
|
||
"\n",
|
||
"qa_stuff = RetrievalQA.from_chain_type(\n",
|
||
" llm=llm, \n",
|
||
" chain_type=\"stuff\", \n",
|
||
" retriever=retriever, \n",
|
||
" verbose=True\n",
|
||
")"
|
||
]
|
||
},
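What the stuff chain type does can be sketched as a few lines of plain Python: retrieve, concatenate everything into one prompt, and call the model once. The prompt wording, `build_stuff_prompt`, `stuff_qa`, and the `recording_llm` stub are all hypothetical; the stub stands in for a real model so the sketch runs offline, and the template is not the one LangChain uses.

```python
def build_stuff_prompt(documents, question):
    """Stuff every retrieved document into a single prompt."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def stuff_qa(retrieve, llm, question):
    """Retrieve documents, then make exactly one language model call."""
    documents = retrieve(question)
    return llm(build_stuff_prompt(documents, question))


# Stub model that records the prompts it receives instead of calling an API.
calls = []


def recording_llm(prompt):
    calls.append(prompt)
    return f"reply #{len(calls)}"


retrieved = [
    "Sun Shield Shirt: UPF 50+ rated.",
    "Men's TropicVibe Shirt: UPF 50+ rated.",
]
answer = stuff_qa(lambda q: retrieved, recording_llm, "Which shirts have sun protection?")
print(answer)
print(calls[0])
```

The single call keeps cost and latency low, but the combined prompt has to fit inside the model's context window, which is why this approach suits a handful of small documents.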
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 46,
|
||
"id": "e4769316",
|
||
"metadata": {
|
||
"height": 47
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"query = \"Please list all your shirts with sun protection in a table \\\n",
|
||
"in markdown and summarize each one.\"#创建一个查询并在此查询上运行链"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "1fc3c2f3",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"response = qa_stuff.run(query)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 58,
|
||
"id": "fba1a5db",
|
||
"metadata": {
|
||
"height": 30
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/markdown": [
|
||
"\n",
|
||
"\n",
|
||
"| Name | Description |\n",
|
||
"| --- | --- |\n",
|
||
"| Men's Tropical Plaid Short-Sleeve Shirt | UPF 50+ rated, 100% polyester, wrinkle-resistant, front and back cape venting, two front bellows pockets |\n",
|
||
"| Men's Plaid Tropic Shirt, Short-Sleeve | UPF 50+ rated, 52% polyester and 48% nylon, machine washable and dryable, front and back cape venting, two front bellows pockets |\n",
|
||
"| Men's TropicVibe Shirt, Short-Sleeve | UPF 50+ rated, 71% Nylon, 29% Polyester, 100% Polyester knit mesh, machine wash and dry, front and back cape venting, two front bellows pockets |\n",
|
||
"| Sun Shield Shirt by | UPF 50+ rated, 78% nylon, 22% Lycra Xtra Life fiber, handwash, line dry, wicks moisture, fits comfortably over swimsuit, abrasion resistant |\n",
|
||
"\n",
|
||
"All four shirts provide UPF 50+ sun protection, blocking 98% of the sun's harmful rays. The Men's Tropical Plaid Short-Sleeve Shirt is made of 100% polyester and is wrinkle-resistant"
|
||
],
|
||
"text/plain": [
|
||
"<IPython.core.display.Markdown object>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"display(Markdown(response))#使用 display 和 markdown 显示它"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "e28c5657",
|
||
"metadata": {},
|
||
"source": [
|
||
"这两个方式返回相同的结果"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "44f1fa38",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 5.2.1 不同类型的chain链\n",
|
||
"想在许多不同类型的块上执行相同类型的问答,该怎么办?之前的实验中只返回了4个文档,如果有多个文档,那么我们可以使用几种不同的方法\n",
|
||
"* Map Reduce \n",
|
||
"将所有块与问题一起传递给语言模型,获取回复,使用另一个语言模型调用将所有单独的回复总结成最终答案,它可以在任意数量的文档上运行。可以并行处理单个问题,同时也需要更多的调用。它将所有文档视为独立的\n",
|
||
"* Refine \n",
|
||
"用于循环许多文档,际上是迭代的,建立在先前文档的答案之上,非常适合前后因果信息并随时间逐步构建答案,依赖于先前调用的结果。它通常需要更长的时间,并且基本上需要与Map Reduce一样多的调用\n",
|
||
"* Map Re-rank \n",
|
||
"对每个文档进行单个语言模型调用,要求它返回一个分数,选择最高分,这依赖于语言模型知道分数应该是什么,需要告诉它,如果它与文档相关,则应该是高分,并在那里精细调整说明,可以批量处理它们相对较快,但是更加昂贵\n",
|
||
"* Stuff \n",
|
||
"将所有内容组合成一个文档"
|
||
]
|
||
}
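The three multi-call strategies can be compared in a small offline sketch. The prompt strings, the function names, and the stub models (`counting_llm`, `keyword_scoring_llm`) are assumptions made for illustration; a real chain would call an actual language model and use its own prompt templates.

```python
def map_reduce(llm, documents, question):
    """One call per document, then one extra call to combine the answers."""
    partial = [llm(f"Context: {doc}\nQuestion: {question}") for doc in documents]
    combined = "\n".join(partial)
    return llm(f"Combine these partial answers:\n{combined}\nQuestion: {question}")


def refine(llm, documents, question):
    """Sequential calls; each one sees the answer built so far."""
    answer = ""
    for doc in documents:
        answer = llm(
            f"Existing answer: {answer}\nNew context: {doc}\nQuestion: {question}"
        )
    return answer


def map_rerank(scoring_llm, documents, question):
    """One call per document returning (answer, score); keep the best score."""
    candidates = [
        scoring_llm(f"Context: {doc}\nQuestion: {question}") for doc in documents
    ]
    best_answer, _ = max(candidates, key=lambda pair: pair[1])
    return best_answer


# Stub models so the sketch runs without any API access.
call_log = []


def counting_llm(prompt):
    call_log.append(prompt)
    return f"answer {len(call_log)}"


def keyword_scoring_llm(prompt):
    # Hypothetical scorer: a document mentioning "sun" counts as relevant.
    score = 100 if "sun" in prompt else 10
    return (f"based on: {prompt.splitlines()[0]}", score)


demo_docs = ["recycled dog mat", "sun shield shirt", "classic denim jeans"]

map_reduce_answer = map_reduce(counting_llm, demo_docs, "Which item blocks UV?")
map_reduce_calls = len(call_log)

call_log.clear()
refine_answer = refine(counting_llm, demo_docs, "Which item blocks UV?")
refine_calls = len(call_log)

best = map_rerank(keyword_scoring_llm, demo_docs, "Which item blocks UV?")
print(map_reduce_calls, refine_calls, best)
```

In this toy run Map Reduce makes one more call than Refine (the combining step), but its per-document calls are independent and could be issued in parallel, whereas each Refine call has to wait for the previous one.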
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.9.12"
|
||
},
|
||
"toc": {
|
||
"base_numbering": 1,
|
||
"nav_menu": {},
|
||
"number_sections": false,
|
||
"sideBar": true,
|
||
"skip_h1_title": false,
|
||
"title_cell": "Table of Contents",
|
||
"title_sidebar": "Contents",
|
||
"toc_cell": false,
|
||
"toc_position": {},
|
||
"toc_section_display": true,
|
||
"toc_window_display": true
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|