prompt-engineering-for-deve…/content/LangChain for LLM Application Development/Untitled.ipynb
gouxiaopan1 8642c12014 Companion code
2023-06-02 20:48:00 +08:00

{
"cells": [
{
"cell_type": "markdown",
"id": "51e47840",
"metadata": {},
"source": [
"# 第四章 基于LangChain的文档问答\n",
"本章内容主要利用langchain构建向量数据库可以在文档上方或关于文档回答问题因此给定从PDF文件、网页或某些公司的内部文档收集中提取的文本使用llm回答有关这些文档内容的问题"
]
},
{
"cell_type": "markdown",
"id": "ef807f79",
"metadata": {},
"source": [
"## 一、环境配置\n",
"\n",
"安装langchain设置chatGPT的OPENAI_API_KEY\n",
"* 安装langchain\n",
"```\n",
"pip install --upgrade langchain\n",
"```\n",
"* 安装docarray\n",
"```\n",
"pip install docarray\n",
"```\n",
"* 设置API-KEY环境变量\n",
"```\n",
"export OPENAI_API_KEY='api-key'\n",
"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af3ffa97",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from dotenv import load_dotenv, find_dotenv\n",
"_ = load_dotenv(find_dotenv()) # 读取系统中的环境变量"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "49081091",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"load_dotenv(find_dotenv())\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "3bcb095f",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.environ[\"OPENAI_API_KEY\"] = \"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\"\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "46595e8c",
"metadata": {},
"outputs": [],
"source": [
"#导入检索QA链在文档上进行检索\n",
"from langchain.chains import RetrievalQA\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.document_loaders import CSVLoader\n",
"from langchain.vectorstores import DocArrayInMemorySearch\n",
"from IPython.display import display, Markdown"
]
},
{
"cell_type": "markdown",
"id": "e511efa5",
"metadata": {},
"source": [
"## 使用 LangChain 完成一次问答"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "3ab4b9d1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'\\n\\n人工智能是一种重要的现代科技它可以大大改善人类生活减轻人类负担提升工作效率。它可以帮助人们提高生产力更有效地管理组织并且可以提供更为准确的数据帮助人们更好地决策。另外人工智能可以帮助科学家发现新的药物改善医疗服务以及发展新的环保技术。总之人工智能是一项重要的科技具有广泛的应用前景。'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.llms import OpenAI\n",
"\n",
"llm = OpenAI(model_name=\"text-davinci-003\",max_tokens=1024)\n",
"llm(\"怎么评价人工智能\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "884399f1",
"metadata": {},
"outputs": [],
"source": [
"file = 'OutdoorClothingCatalog_1000.csv'\n",
"loader = CSVLoader(file_path=file)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "52ec965a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NaN</td>\n",
" <td>name</td>\n",
" <td>description</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.0</td>\n",
" <td>Women's Campside Oxfords</td>\n",
" <td>This ultracomfortable lace-to-toe Oxford boast...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1.0</td>\n",
" <td>Recycled Waterhog Dog Mat, Chevron Weave</td>\n",
" <td>Protect your floors from spills and splashing ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2.0</td>\n",
" <td>Infant and Toddler Girls' Coastal Chill Swimsu...</td>\n",
" <td>She'll love the bright colors, ruffles and exc...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>3.0</td>\n",
" <td>Refresh Swimwear, V-Neck Tankini Contrasts</td>\n",
" <td>Whether you're going for a swim or heading out...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>996</th>\n",
" <td>995.0</td>\n",
" <td>Men's Classic Denim, Standard Fit</td>\n",
" <td>Crafted from premium denim that will last wash...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>997</th>\n",
" <td>996.0</td>\n",
" <td>CozyPrint Sweater Fleece Pullover</td>\n",
" <td>The ultimate sweater fleece - made from superi...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>998</th>\n",
" <td>997.0</td>\n",
" <td>Women's NRS Endurance Spray Paddling Pants</td>\n",
" <td>These comfortable and affordable splash paddli...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>999</th>\n",
" <td>998.0</td>\n",
" <td>Women's Stop Flies Hoodie</td>\n",
" <td>This great-looking hoodie uses No Fly Zone Tec...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1000</th>\n",
" <td>999.0</td>\n",
" <td>Modern Utility Bag</td>\n",
" <td>This US-made crossbody bag is built with the s...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1001 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" 0 1 \\\n",
"0 NaN name \n",
"1 0.0 Women's Campside Oxfords \n",
"2 1.0 Recycled Waterhog Dog Mat, Chevron Weave \n",
"3 2.0 Infant and Toddler Girls' Coastal Chill Swimsu... \n",
"4 3.0 Refresh Swimwear, V-Neck Tankini Contrasts \n",
"... ... ... \n",
"996 995.0 Men's Classic Denim, Standard Fit \n",
"997 996.0 CozyPrint Sweater Fleece Pullover \n",
"998 997.0 Women's NRS Endurance Spray Paddling Pants \n",
"999 998.0 Women's Stop Flies Hoodie \n",
"1000 999.0 Modern Utility Bag \n",
"\n",
" 2 \n",
"0 description \n",
"1 This ultracomfortable lace-to-toe Oxford boast... \n",
"2 Protect your floors from spills and splashing ... \n",
"3 She'll love the bright colors, ruffles and exc... \n",
"4 Whether you're going for a swim or heading out... \n",
"... ... \n",
"996 Crafted from premium denim that will last wash... \n",
"997 The ultimate sweater fleece - made from superi... \n",
"998 These comfortable and affordable splash paddli... \n",
"999 This great-looking hoodie uses No Fly Zone Tec... \n",
"1000 This US-made crossbody bag is built with the s... \n",
"\n",
"[1001 rows x 3 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"file = 'OutdoorClothingCatalog_1000.csv'\n",
"\n",
"data = pd.read_csv(file,header=None)\n",
"data"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "efc6c592",
"metadata": {},
"outputs": [],
"source": [
"from langchain.indexes import VectorstoreIndexCreator"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "5e90139b",
"metadata": {},
"outputs": [
{
"ename": "",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[1;31mCanceled future for execute_request message before replies were done"
]
},
{
"ename": "",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[1;31m在当前单元格或上一个单元格中执行代码时 Kernel 崩溃。请查看单元格中的代码,以确定故障的可能原因。有关详细信息,请单击 <a href='https://aka.ms/vscodeJupyterKernelCrash'>此处</a>。有关更多详细信息,请查看 Jupyter <a href='command:jupyter.viewOutput'>log</a>。"
]
}
],
"source": [
"index = VectorstoreIndexCreator(\n",
" vectorstore_cls=DocArrayInMemorySearch\n",
").from_loaders([loader])"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8249a523",
"metadata": {},
"outputs": [
{
"ename": "",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[1;31mCanceled future for execute_request message before replies were done"
]
},
{
"ename": "",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[1;31m在当前单元格或上一个单元格中执行代码时 Kernel 崩溃。请查看单元格中的代码,以确定故障的可能原因。有关详细信息,请单击 <a href='https://aka.ms/vscodeJupyterKernelCrash'>此处</a>。有关更多详细信息,请查看 Jupyter <a href='command:jupyter.viewOutput'>log</a>。"
]
}
],
"source": [
"from docarray import DocumentArray\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "3b160609",
"metadata": {},
"outputs": [],
"source": [
"query =\"Please list all your shirts with sun protection \\\n",
"in a table in markdown and summarize each one.\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "cf61d864",
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'index' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[2], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[43mindex\u001b[49m\u001b[38;5;241m.\u001b[39mquery(query)\n",
"\u001b[0;31mNameError\u001b[0m: name 'index' is not defined"
]
}
],
"source": [
"response = index.query(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0737f809",
"metadata": {},
"outputs": [],
"source": [
"display(Markdown(response))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "chatGPT",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
},
"vscode": {
"interpreter": {
"hash": "4d8dc73ac51fd938ce7dec941fbf542c26232b3529b0c2a6ebc607bfa3d5aa69"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}