{ "cells": [ { "cell_type": "markdown", "id": "51e47840", "metadata": {}, "source": [ "# Chapter 4: Document Question Answering with LangChain\n", "This chapter uses LangChain to build a vector database so that an LLM can answer questions over and about documents. Given text extracted from PDF files, web pages, or a company's internal document collection, the LLM answers questions about the content of those documents." ] }, { "cell_type": "markdown", "id": "ef807f79", "metadata": {}, "source": [ "## 1. Environment Setup\n", "\n", "Install LangChain and set the OPENAI_API_KEY used to access the OpenAI API.\n", "* Install langchain\n", "```\n", "pip install --upgrade langchain\n", "```\n", "* Install docarray\n", "```\n", "pip install docarray\n", "```\n", "* Set the API key environment variable\n", "```\n", "export OPENAI_API_KEY='api-key'\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "af3ffa97", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "from dotenv import load_dotenv, find_dotenv\n", "_ = load_dotenv(find_dotenv()) # load variables from a .env file into the environment" ] }, { "cell_type": "code", "execution_count": 8, "id": "49081091", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "load_dotenv(find_dotenv())\n" ] }, { "cell_type": "code", "execution_count": 12, "id": "3bcb095f", "metadata": {}, "outputs": [], "source": [ "import os\n", "os.environ[\"OPENAI_API_KEY\"] = \"api-key\"  # placeholder: substitute your own key and never commit a real one\n" ] }, { "cell_type": "code", "execution_count": 9, "id": "46595e8c", "metadata": {}, "outputs": [], "source": [ "# Import the RetrievalQA chain, used to retrieve over documents\n", "from langchain.chains import RetrievalQA\n", "from langchain.chat_models import ChatOpenAI\n", "from langchain.document_loaders import CSVLoader\n", "from langchain.vectorstores import DocArrayInMemorySearch\n", "from IPython.display import display, Markdown" ] }, { "cell_type": "markdown", "id": "e511efa5", "metadata": {}, "source": [ "## Answering a Question with LangChain" ] }, { "cell_type": "code", "execution_count": 13, "id": "3ab4b9d1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"'\\n\\n人工智能是一种重要的现代科技,它可以大大改善人类生活,减轻人类负担,提升工作效率。它可以帮助人们提高生产力,更有效地管理组织,并且可以提供更为准确的数据,帮助人们更好地决策。另外,人工智能可以帮助科学家发现新的药物,改善医疗服务,以及发展新的环保技术。总之,人工智能是一项重要的科技,具有广泛的应用前景。'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from langchain.llms import OpenAI\n", "\n", "llm = OpenAI(model_name=\"text-davinci-003\",max_tokens=1024)\n", "llm(\"怎么评价人工智能\")" ] }, { "cell_type": "code", "execution_count": 14, "id": "884399f1", "metadata": {}, "outputs": [], "source": [ "file = 'OutdoorClothingCatalog_1000.csv'\n", "loader = CSVLoader(file_path=file)" ] }, { "cell_type": "code", "execution_count": 15, "id": "52ec965a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>0</th><th>1</th><th>2</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>NaN</td><td>name</td><td>description</td></tr>\n",
"    <tr><th>1</th><td>0.0</td><td>Women's Campside Oxfords</td><td>This ultracomfortable lace-to-toe Oxford boast...</td></tr>\n",
"    <tr><th>2</th><td>1.0</td><td>Recycled Waterhog Dog Mat, Chevron Weave</td><td>Protect your floors from spills and splashing ...</td></tr>\n",
"    <tr><th>3</th><td>2.0</td><td>Infant and Toddler Girls' Coastal Chill Swimsu...</td><td>She'll love the bright colors, ruffles and exc...</td></tr>\n",
"    <tr><th>4</th><td>3.0</td><td>Refresh Swimwear, V-Neck Tankini Contrasts</td><td>Whether you're going for a swim or heading out...</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>996</th><td>995.0</td><td>Men's Classic Denim, Standard Fit</td><td>Crafted from premium denim that will last wash...</td></tr>\n",
"    <tr><th>997</th><td>996.0</td><td>CozyPrint Sweater Fleece Pullover</td><td>The ultimate sweater fleece - made from superi...</td></tr>\n",
"    <tr><th>998</th><td>997.0</td><td>Women's NRS Endurance Spray Paddling Pants</td><td>These comfortable and affordable splash paddli...</td></tr>\n",
"    <tr><th>999</th><td>998.0</td><td>Women's Stop Flies Hoodie</td><td>This great-looking hoodie uses No Fly Zone Tec...</td></tr>\n",
"    <tr><th>1000</th><td>999.0</td><td>Modern Utility Bag</td><td>This US-made crossbody bag is built with the s...</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>1001 rows × 3 columns</p>\n",
"</div>
" ], "text/plain": [ " 0 1 \\\n", "0 NaN name \n", "1 0.0 Women's Campside Oxfords \n", "2 1.0 Recycled Waterhog Dog Mat, Chevron Weave \n", "3 2.0 Infant and Toddler Girls' Coastal Chill Swimsu... \n", "4 3.0 Refresh Swimwear, V-Neck Tankini Contrasts \n", "... ... ... \n", "996 995.0 Men's Classic Denim, Standard Fit \n", "997 996.0 CozyPrint Sweater Fleece Pullover \n", "998 997.0 Women's NRS Endurance Spray Paddling Pants \n", "999 998.0 Women's Stop Flies Hoodie \n", "1000 999.0 Modern Utility Bag \n", "\n", " 2 \n", "0 description \n", "1 This ultracomfortable lace-to-toe Oxford boast... \n", "2 Protect your floors from spills and splashing ... \n", "3 She'll love the bright colors, ruffles and exc... \n", "4 Whether you're going for a swim or heading out... \n", "... ... \n", "996 Crafted from premium denim that will last wash... \n", "997 The ultimate sweater fleece - made from superi... \n", "998 These comfortable and affordable splash paddli... \n", "999 This great-looking hoodie uses No Fly Zone Tec... \n", "1000 This US-made crossbody bag is built with the s... 
\n", "\n", "[1001 rows x 3 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "file = 'OutdoorClothingCatalog_1000.csv'\n", "\n", "data = pd.read_csv(file,header=None)\n", "data" ] }, { "cell_type": "code", "execution_count": 16, "id": "efc6c592", "metadata": {}, "outputs": [], "source": [ "from langchain.indexes import VectorstoreIndexCreator" ] }, { "cell_type": "code", "execution_count": 17, "id": "5e90139b", "metadata": {}, "outputs": [ { "ename": "", "evalue": "", "output_type": "error", "traceback": [ "\u001b[1;31mCanceled future for execute_request message before replies were done" ] }, { "ename": "", "evalue": "", "output_type": "error", "traceback": [ "\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. Review the code in the cell(s) to identify a possible cause of the failure. Click here for more information. See the Jupyter log for further details." ] } ], "source": [ "index = VectorstoreIndexCreator(\n", " vectorstore_cls=DocArrayInMemorySearch\n", ").from_loaders([loader])" ] }, { "cell_type": "code", "execution_count": 1, "id": "8249a523", "metadata": {}, "outputs": [ { "ename": "", "evalue": "", "output_type": "error", "traceback": [ "\u001b[1;31mCanceled future for execute_request message before replies were done" ] }, { "ename": "", "evalue": "", "output_type": "error", "traceback": [ "\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. Review the code in the cell(s) to identify a possible cause of the failure. Click here for more information. See the Jupyter log for further details." ] } ], "source": [ "from docarray import DocumentArray" ] }, { "cell_type": "code", "execution_count": 1, "id": "3b160609", "metadata": {}, "outputs": [], "source": [ "query = \"Please list all your shirts with sun protection \\\n", "in a table in markdown and summarize each one.\"" ] }, { "cell_type": "code", "execution_count": 2, "id": "cf61d864", "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'index' is not defined", "output_type": "error", "traceback": [ 
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[2], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m response \u001b[38;5;241m=\u001b[39m \u001b[43mindex\u001b[49m\u001b[38;5;241m.\u001b[39mquery(query)\n", "\u001b[0;31mNameError\u001b[0m: name 'index' is not defined" ] } ], "source": [ "response = index.query(query)" ] }, { "cell_type": "code", "execution_count": null, "id": "0737f809", "metadata": {}, "outputs": [], "source": [ "display(Markdown(response))" ] } ], "metadata": { "kernelspec": { "display_name": "chatGPT", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" }, "vscode": { "interpreter": { "hash": "4d8dc73ac51fd938ce7dec941fbf542c26232b3529b0c2a6ebc607bfa3d5aa69" } } }, "nbformat": 4, "nbformat_minor": 5 }