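{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "51e47840",
   "metadata": {},
   "source": [
    "# Chapter 4: Document Question Answering with LangChain\n",
    "This chapter uses LangChain to build a vector database so that an LLM can answer questions about a set of documents. Given text extracted from PDF files, web pages, or a company's internal document collection, the LLM answers questions about the content of those documents."
   ]
  },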
  {
   "cell_type": "markdown",
   "id": "ef807f79",
   "metadata": {},
   "source": [
    "## 1. Environment Setup\n",
    "\n",
    "Install LangChain and docarray, then set the OPENAI_API_KEY used for ChatGPT.\n",
    "* Install LangChain\n",
    "```\n",
    "pip install --upgrade langchain\n",
    "```\n",
    "* Install docarray\n",
    "```\n",
    "pip install docarray\n",
    "```\n",
    "* Set the API key environment variable\n",
    "```\n",
    "export OPENAI_API_KEY='api-key'\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "af3ffa97",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from dotenv import load_dotenv, find_dotenv\n",
    "_ = load_dotenv(find_dotenv())  # Load environment variables from a .env file, if one is found"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "49081091",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "load_dotenv(find_dotenv())\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "3bcb095f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "# Placeholder only: never hard-code or commit a real API key\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"api-key\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "46595e8c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the retrieval QA chain, which runs retrieval over documents\n",
    "from langchain.chains import RetrievalQA\n",
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.document_loaders import CSVLoader\n",
    "from langchain.vectorstores import DocArrayInMemorySearch\n",
    "from IPython.display import display, Markdown"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e511efa5",
   "metadata": {},
   "source": [
    "## Answering a Question with LangChain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "3ab4b9d1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\n\\n人工智能是一种重要的现代科技,它可以大大改善人类生活,减轻人类负担,提升工作效率。它可以帮助人们提高生产力,更有效地管理组织,并且可以提供更为准确的数据,帮助人们更好地决策。另外,人工智能可以帮助科学家发现新的药物,改善医疗服务,以及发展新的环保技术。总之,人工智能是一项重要的科技,具有广泛的应用前景。'"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain.llms import OpenAI\n",
    "\n",
    "llm = OpenAI(model_name=\"text-davinci-003\", max_tokens=1024)\n",
    "# The prompt below means: \"How would you evaluate artificial intelligence?\"\n",
    "llm(\"怎么评价人工智能\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "884399f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "file = 'OutdoorClothingCatalog_1000.csv'\n",
    "loader = CSVLoader(file_path=file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "52ec965a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>0</th>\n",
       " <th>1</th>\n",
       " <th>2</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>0</th>\n",
       " <td>NaN</td>\n",
       " <td>name</td>\n",
       " <td>description</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>1</th>\n",
       " <td>0.0</td>\n",
       " <td>Women's Campside Oxfords</td>\n",
       " <td>This ultracomfortable lace-to-toe Oxford boast...</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2</th>\n",
       " <td>1.0</td>\n",
       " <td>Recycled Waterhog Dog Mat, Chevron Weave</td>\n",
       " <td>Protect your floors from spills and splashing ...</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>3</th>\n",
       " <td>2.0</td>\n",
       " <td>Infant and Toddler Girls' Coastal Chill Swimsu...</td>\n",
       " <td>She'll love the bright colors, ruffles and exc...</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>4</th>\n",
       " <td>3.0</td>\n",
       " <td>Refresh Swimwear, V-Neck Tankini Contrasts</td>\n",
       " <td>Whether you're going for a swim or heading out...</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>...</th>\n",
       " <td>...</td>\n",
       " <td>...</td>\n",
       " <td>...</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>996</th>\n",
       " <td>995.0</td>\n",
       " <td>Men's Classic Denim, Standard Fit</td>\n",
       " <td>Crafted from premium denim that will last wash...</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>997</th>\n",
       " <td>996.0</td>\n",
       " <td>CozyPrint Sweater Fleece Pullover</td>\n",
       " <td>The ultimate sweater fleece - made from superi...</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>998</th>\n",
       " <td>997.0</td>\n",
       " <td>Women's NRS Endurance Spray Paddling Pants</td>\n",
       " <td>These comfortable and affordable splash paddli...</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>999</th>\n",
       " <td>998.0</td>\n",
       " <td>Women's Stop Flies Hoodie</td>\n",
       " <td>This great-looking hoodie uses No Fly Zone Tec...</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>1000</th>\n",
       " <td>999.0</td>\n",
       " <td>Modern Utility Bag</td>\n",
       " <td>This US-made crossbody bag is built with the s...</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "<p>1001 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       " 0 1 \\\n",
       "0 NaN name \n",
       "1 0.0 Women's Campside Oxfords \n",
       "2 1.0 Recycled Waterhog Dog Mat, Chevron Weave \n",
       "3 2.0 Infant and Toddler Girls' Coastal Chill Swimsu... \n",
       "4 3.0 Refresh Swimwear, V-Neck Tankini Contrasts \n",
       "... ... ... \n",
       "996 995.0 Men's Classic Denim, Standard Fit \n",
       "997 996.0 CozyPrint Sweater Fleece Pullover \n",
       "998 997.0 Women's NRS Endurance Spray Paddling Pants \n",
       "999 998.0 Women's Stop Flies Hoodie \n",
       "1000 999.0 Modern Utility Bag \n",
       "\n",
       " 2 \n",
       "0 description \n",
       "1 This ultracomfortable lace-to-toe Oxford boast... \n",
       "2 Protect your floors from spills and splashing ... \n",
       "3 She'll love the bright colors, ruffles and exc... \n",
       "4 Whether you're going for a swim or heading out... \n",
       "... ... \n",
       "996 Crafted from premium denim that will last wash... \n",
       "997 The ultimate sweater fleece - made from superi... \n",
       "998 These comfortable and affordable splash paddli... \n",
       "999 This great-looking hoodie uses No Fly Zone Tec... \n",
       "1000 This US-made crossbody bag is built with the s... \n",
       "\n",
       "[1001 rows x 3 columns]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "file = 'OutdoorClothingCatalog_1000.csv'\n",
    "\n",
    "data = pd.read_csv(file, header=None)\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "efc6c592",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.indexes import VectorstoreIndexCreator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "5e90139b",
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mCanceled future for execute_request message before replies were done"
     ]
    },
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click <a href='https://aka.ms/vscodeJupyterKernelCrash'>here</a> for more info. View Jupyter <a href='command:jupyter.viewOutput'>log</a> for further details."
     ]
    }
   ],
   "source": [
    "index = VectorstoreIndexCreator(\n",
    "    vectorstore_cls=DocArrayInMemorySearch\n",
    ").from_loaders([loader])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "8249a523",
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mCanceled future for execute_request message before replies were done"
     ]
    },
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure. Click <a href='https://aka.ms/vscodeJupyterKernelCrash'>here</a> for more info. View Jupyter <a href='command:jupyter.viewOutput'>log</a> for further details."
     ]
    }
   ],
   "source": [
    "from docarray import DocumentArray"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "3b160609",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"Please list all your shirts with sun protection \\\n",
    "in a table in markdown and summarize each one.\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "cf61d864",
   "metadata": {},
   "outputs": [
    {
     "ename": "NameError",
     "evalue": "name 'index' is not defined",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[2], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[43mindex\u001b[49m\u001b[38;5;241m.\u001b[39mquery(query)\n",
      "\u001b[0;31mNameError\u001b[0m: name 'index' is not defined"
     ]
    }
   ],
   "source": [
    "response = index.query(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0737f809",
   "metadata": {},
   "outputs": [],
   "source": [
    "display(Markdown(response))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "chatGPT",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  },
  "vscode": {
   "interpreter": {
    "hash": "4d8dc73ac51fd938ce7dec941fbf542c26232b3529b0c2a6ebc607bfa3d5aa69"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}