{ "cells": [ { "cell_type": "markdown", "id": "51e47840", "metadata": {}, "source": [ "# Chapter 4: Document Question Answering with LangChain\n", "This chapter uses LangChain to build a vector database so that an LLM can answer questions over, or about, a set of documents. Given text extracted from PDF files, web pages, or a company's internal document collection, the LLM answers questions about the content of those documents." ] }, { "cell_type": "markdown", "id": "ef807f79", "metadata": {}, "source": [ "## 1. Environment Setup\n", "\n", "Install LangChain and set the OPENAI_API_KEY used to call the OpenAI API.\n", "* Install langchain\n", "```\n", "pip install --upgrade langchain\n", "```\n", "* Install docarray\n", "```\n", "pip install docarray\n", "```\n", "* Set the API key environment variable\n", "```\n", "export OPENAI_API_KEY='api-key'\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "af3ffa97", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "from dotenv import load_dotenv, find_dotenv\n", "_ = load_dotenv(find_dotenv()) # load variables from a local .env file into the environment" ] }, { "cell_type": "code", "execution_count": 8, "id": "49081091", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "load_dotenv(find_dotenv())\n" ] }, { "cell_type": "code", "execution_count": 12, "id": "3bcb095f", "metadata": {}, "outputs": [], "source": [ "import os\n", "os.environ[\"OPENAI_API_KEY\"] = \"your-api-key\"  # placeholder: never hard-code a real key in a notebook\n" ] }, { "cell_type": "code", "execution_count": 9, "id": "46595e8c", "metadata": {}, "outputs": [], "source": [ "# Import the RetrievalQA chain, which runs retrieval over the documents\n", "from langchain.chains import RetrievalQA\n", "from langchain.chat_models import ChatOpenAI\n", "from langchain.document_loaders import CSVLoader\n", "from langchain.vectorstores import DocArrayInMemorySearch\n", "from IPython.display import display, Markdown" ] }, { "cell_type": "markdown", "id": "e511efa5", "metadata": {}, "source": [ "## Answering a Question with LangChain" ] }, { "cell_type": "code", "execution_count": 13, "id": "3ab4b9d1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"'\\n\\n人工智能是一种重要的现代科技,它可以大大改善人类生活,减轻人类负担,提升工作效率。它可以帮助人们提高生产力,更有效地管理组织,并且可以提供更为准确的数据,帮助人们更好地决策。另外,人工智能可以帮助科学家发现新的药物,改善医疗服务,以及发展新的环保技术。总之,人工智能是一项重要的科技,具有广泛的应用前景。'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from langchain.llms import OpenAI\n", "\n", "llm = OpenAI(model_name=\"text-davinci-003\",max_tokens=1024)\n", "llm(\"怎么评价人工智能\")" ] }, { "cell_type": "code", "execution_count": 14, "id": "884399f1", "metadata": {}, "outputs": [], "source": [ "file = 'OutdoorClothingCatalog_1000.csv'\n", "loader = CSVLoader(file_path=file)" ] }, { "cell_type": "code", "execution_count": 15, "id": "52ec965a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | 0 | 1 | 2 |
|---|---|---|---|
| 0 | NaN | name | description |
| 1 | 0.0 | Women's Campside Oxfords | This ultracomfortable lace-to-toe Oxford boast... |
| 2 | 1.0 | Recycled Waterhog Dog Mat, Chevron Weave | Protect your floors from spills and splashing ... |
| 3 | 2.0 | Infant and Toddler Girls' Coastal Chill Swimsu... | She'll love the bright colors, ruffles and exc... |
| 4 | 3.0 | Refresh Swimwear, V-Neck Tankini Contrasts | Whether you're going for a swim or heading out... |
| ... | ... | ... | ... |
| 996 | 995.0 | Men's Classic Denim, Standard Fit | Crafted from premium denim that will last wash... |
| 997 | 996.0 | CozyPrint Sweater Fleece Pullover | The ultimate sweater fleece - made from superi... |
| 998 | 997.0 | Women's NRS Endurance Spray Paddling Pants | These comfortable and affordable splash paddli... |
| 999 | 998.0 | Women's Stop Flies Hoodie | This great-looking hoodie uses No Fly Zone Tec... |
| 1000 | 999.0 | Modern Utility Bag | This US-made crossbody bag is built with the s... |
1001 rows × 3 columns
\n", "