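diff --git a/content/LangChain Chat with Data/1.简介 Introduction.md b/content/LangChain Chat with Data/1.简介 Introduction.md deleted file mode 100644 index 9ec6676..0000000 --- a/content/LangChain Chat with Data/1.简介 Introduction.md +++ /dev/null @@ -1,35 +0,0 @@ -# Chapter 1: Introduction - -This course was developed by Harrison Chase (creator of LangChain) in collaboration with DeepLearning.AI. It shows how to use LangChain to chat with your own data. - - -## 1. Background -Large language models (LLMs) such as ChatGPT can answer many kinds of questions. However, an LLM's knowledge comes from its training data, which contains neither the user's own information (personal data, a company's proprietary data) nor information about recent events (articles or news published after the model was trained). The answers an LLM can give are therefore limited. - -If the model could draw on the information in our own data, in addition to its training data, when answering our questions, its answers would be more useful. - - -## 2. Course overview - -In this course we learn how to use LangChain to chat with our own data. - -LangChain is an open-source framework for building LLM applications, available as both a Python and a JavaScript package. LangChain is modular: it consists of many individual components that can be used together or on their own. These components include: - -- Prompts: the way we get a model to perform a task. -- Models: large language models, chat models, and text embedding models, with integrations for multiple models. -- Indexes: ways of ingesting data so that it can be combined with a model. -- Chains: end-to-end implementations of a use case. -- Agents: use a model as a reasoning engine. - -LangChain also provides many use-case examples that show how to chain these modular components together into end-to-end applications. To learn the basics of LangChain, see the course LangChain for LLM Application Development. - -This course focuses on one common LangChain use case: chatting with your own data. We first cover how to use LangChain document loaders to load documents from different data sources. We then learn how to split those documents into semantically meaningful chunks. This step looks simple, but different choices can have a large effect. Next, we briefly introduce semantic search, the basic method of information retrieval: given a user's question, fetch the most relevant information. The method is simple, but it does not work in some cases; we analyze those cases and present solutions. Finally, we show how to use the retrieved documents to have an LLM answer questions about them. - - -## 3. Acknowledgments - -Finally, special thanks to those who contributed to the course content: -- Ankush Gola (LangChain) -- Lance Martin (LangChain) -- Geoff Ladwig (DeepLearning.AI) -- Diala Ezzedine (DeepLearning.AI) diff --git a/content/LangChain Chat with Data/2.文档加载 Document Loading.ipynb b/content/LangChain Chat with Data/2.文档加载 Document Loading.ipynb deleted file mode 100644 index e94972e..0000000 --- a/content/LangChain Chat with Data/2.文档加载 Document Loading.ipynb +++ /dev/null @@ -1,819 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "cc2eb3ad-8c1c-406a-b7aa-a3f61b754ac5", - "metadata": {}, - "source": [ - "# Chapter 1: Document Loading\n", - "Document loaders (Document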
Loaders) can handle many kinds of data, both structured and unstructured." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "582125c3-2afb-4cca-b651-c1810a5e5c22", - "metadata": {}, - "outputs": [], - "source": [ - "!pip install -q langchain --upgrade" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "bb73a77f-e17c-45a2-b456-e3ad2bf0fb5c", - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import openai\n", - "import sys\n", - "sys.path.append('../..')\n", - "\n", - "from dotenv import load_dotenv, find_dotenv\n", - "_ = load_dotenv(find_dotenv()) \n", - "\n", - "openai.api_key = os.environ['OPENAI_API_KEY']" - ] - }, - { - "cell_type": "markdown", - "id": "63558db2-5279-4c1b-9bec-355ab04731e6", - "metadata": {}, - "source": [ - "## 1. PDF documents\n", - "\n", - "First, we load a [PDF document](https://see.stanford.edu/materials/aimlcs229/transcripts/MachineLearning-Lecture01.pdf). It is a transcript of Professor Andrew Ng's 2009 machine learning course. Because the transcript was generated automatically, the wording may be somewhat disjointed." - ] - }, - { - "cell_type": "markdown", - "id": "dd5fe85c-6aae-4739-9b47-68e791afc9ac", - "metadata": {}, - "source": [ - "### 1.1 Install the required packages " - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "c527f944-35dc-44a2-9cf9-9887cf315f3a", - "metadata": {}, - "outputs": [], - "source": [ - "!pip install -q pypdf" - ] - }, - { - "cell_type": "markdown", - "id": "8dcb2102-0414-4130-952b-3b6fa33b61bb", - "metadata": {}, - "source": [ - "### 1.2 Load the PDF document" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "52d9891f-a8cc-47c4-8c09-81794647a720", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.document_loaders import PyPDFLoader\n", - "\n", - "# Create a PyPDFLoader instance; the argument is the path of the PDF to load\n", - "loader = PyPDFLoader(\"docs/cs229_lectures/MachineLearning-Lecture01.pdf\")\n", - "\n", - "# Call PyPDFLoader's load method to load the PDF file\n", - "pages = loader.load()" - ] - }, - { - "cell_type": "markdown", - "id": "68d40600-49ab-42a3-97d0-b9a2c4ab8139", - "metadata": {}, - "source": [ - "### 1.3 Explore the loaded data" - ] - }, 
- { - "cell_type": "markdown", - "id": "feca9f1e-1596-49f2-a6d9-6eeaeffbd90b", - "metadata": {}, - "source": [ - "The loaded documents are stored in the `pages` variable:\n", - "- `pages` is of type `List`\n", - "- Printing the length of `pages` shows how many pages the PDF contains" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "9463b982-c71c-4241-b3a3-b040170eef2e", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "print(type(pages))" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "67a2b815-586f-43a5-96a4-cfe46001a766", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "22\n" - ] - } - ], - "source": [ - "print(len(pages))" - ] - }, - { - "cell_type": "markdown", - "id": "2cde6b9d-71c8-4851-a8f6-a3f0e76f6dab", - "metadata": {}, - "source": [ - "Each element of `pages` is a document of type `langchain.schema.Document`, which has two attributes:\n", - "- `page_content` holds the content of the document.\n", - "- `metadata` holds descriptive data about the document." - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "921827ae-3a5b-4f29-b015-a5dde3be1410", - "metadata": {}, - "outputs": [], - "source": [ - "page = pages[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "aadaa840-0f30-4ae3-b06b-7fe8f468d146", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "print(type(page))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "85777ce2-42c7-4e11-b1ba-06fd6a0d8502", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "MachineLearning-Lecture01 \n", - "Instructor (Andrew Ng): Okay. Good morning. Welcome to CS229, the machine \n", - "learning class. So what I wanna do today is ju st spend a little time going over the logistics \n", - "of the class, and then we'll start to talk a bit about machine learning. 
\n", - "By way of introduction, my name's Andrew Ng and I'll be instru ctor for this class. And so \n", - "I personally work in machine learning, and I' ve worked on it for about 15 years now, and \n", - "I actually think that machine learning i\n" - ] - } - ], - "source": [ - "print(page.page_content[0:500])" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "8a1f8acd-f8c7-46af-a29f-df172067deba", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'source': 'docs/cs229_lectures/MachineLearning-Lecture01.pdf', 'page': 0}\n" - ] - } - ], - "source": [ - "print(page.metadata)" - ] - }, - { - "cell_type": "markdown", - "id": "1e9cead7-a967-4a8f-8d3d-0f94f2ff129e", - "metadata": {}, - "source": [ - "## 2. YouTube audio\n", - "\n", - "In Part 1 we learned how to load a PDF document. In this part we learn, for a given YouTube video link,\n", - "- how to use a LangChain loader to download the video's audio to local storage\n", - "- and then how to use the OpenAIWhisperParser parser to convert the audio to text" - ] - }, - { - "cell_type": "markdown", - "id": "b4720268-ddab-4c18-9072-10aab8f0ac7c", - "metadata": {}, - "source": [ - "### 2.1 Install the required packages " - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "37dbeb50-d6c5-4db0-88da-1ef9d3e47417", - "metadata": {}, - "outputs": [], - "source": [ - "!pip -q install yt_dlp\n", - "!pip -q install pydub" - ] - }, - { - "cell_type": "markdown", - "id": "a243b258-eae3-46b0-803f-cd897b31cf78", - "metadata": {}, - "source": [ - "### 2.2 Load the YouTube audio" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "f25593f9-a6d2-4137-94cb-881141ca99fd", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.document_loaders.generic import GenericLoader\n", - "from langchain.document_loaders.parsers import OpenAIWhisperParser\n", - "from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "5ca7b99a-ba4d-4989-aed6-be76acb405c0", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - 
"output_type": "stream", - "text": [ - "[youtube] Extracting URL: https://www.youtube.com/watch?v=jGwO_UgTS7I\n", - "[youtube] jGwO_UgTS7I: Downloading webpage\n", - "[youtube] jGwO_UgTS7I: Downloading ios player API JSON\n", - "[youtube] jGwO_UgTS7I: Downloading android player API JSON\n", - "[youtube] jGwO_UgTS7I: Downloading m3u8 information\n", - "[info] jGwO_UgTS7I: Downloading 1 format(s): 140\n", - "[download] docs/youtube//Stanford CS229: Machine Learning Course, Lecture 1 - Andrew Ng (Autumn 2018).m4a has already been downloaded\n", - "[download] 100% of 69.76MiB\n", - "[ExtractAudio] Not converting audio docs/youtube//Stanford CS229: Machine Learning Course, Lecture 1 - Andrew Ng (Autumn 2018).m4a; file is already in target format m4a\n", - "Transcribing part 1!\n", - "Transcribing part 2!\n", - "Transcribing part 3!\n", - "Transcribing part 4!\n" - ] - } - ], - "source": [ - "url=\"https://www.youtube.com/watch?v=jGwO_UgTS7I\"\n", - "save_dir=\"docs/youtube/\"\n", - "\n", - "# Create a GenericLoader instance\n", - "loader = GenericLoader(\n", - " # Download the audio of the YouTube video at url and store it under the local path save_dir\n", - " YoutubeAudioLoader([url],save_dir), \n", - " \n", - " # Use the OpenAIWhisperParser parser to convert the audio to text\n", - " OpenAIWhisperParser()\n", - ")\n", - "\n", - "# Call GenericLoader's load method to load the video's audio file\n", - "docs = loader.load()" - ] - }, - { - "cell_type": "markdown", - "id": "ffb4db5d-39b5-4cd7-82d9-824ed71fc116", - "metadata": { - "tags": [] - }, - "source": [ - "### 2.3 Explore the loaded data" - ] - }, - { - "cell_type": "markdown", - "id": "0fd91c34-ac19-4a09-8ca0-99262011d9ba", - "metadata": {}, - "source": [ - "The loaded documents are stored in the `docs` variable:\n", - "- `docs` is of type `List`\n", - "- Printing the length of `docs` shows how many documents it contains" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "ddb89cee-32bd-4c5f-91f1-c46d1f0300da", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "print(type(docs))" - ] - }, - { - "cell_type": "code", - 
"execution_count": 16, - "id": "2cea4ff3-8548-4158-9e55-a574de0fd29e", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "1\n" - ] - } - ], - "source": [ - "print(len(docs))" - ] - }, - { - "cell_type": "markdown", - "id": "24952655-128e-4a7b-b8c0-e93156acbe1b", - "metadata": {}, - "source": [ - "Each element of `docs` is a document of type `langchain.schema.document.Document`, which has two attributes:\n", - "- `page_content` holds the content of the document.\n", - "- `metadata` holds descriptive data about the document." - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "id": "d89bf91d-d39b-4682-9c56-0cde449d6051", - "metadata": {}, - "outputs": [], - "source": [ - "doc = docs[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "id": "e6df253f-ad9e-42d5-b6e1-47b7c0d2d564", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "print(type(doc))" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "80e4b798-875e-4f0e-ba16-5277f8ec1f62", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Welcome to CS229 Machine Learning. Uh, some of you know that this is a class that's taught at Stanford for a long time. And this is often the class that, um, I most look forward to teaching each year because this is where we've helped, I think, several generations of Stanford students become experts in machine learning, got- built many of their products and services and startups that I'm sure, many of you or probably all of you are using, uh, uh, today. 
Um, so what I want to do today was spend s\n" - ] - } - ], - "source": [ - "print(doc.page_content[0:500])" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "7f363e33-6a4d-4b78-aa7d-1b8cf6b59567", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'source': 'docs/youtube/Stanford CS229: Machine Learning Course, Lecture 1 - Andrew Ng (Autumn 2018).m4a', 'chunk': 0}\n" - ] - } - ], - "source": [ - "print(doc.metadata)" - ] - }, - { - "cell_type": "markdown", - "id": "5b7ddc7d-2d40-4811-8cb3-5e73344ebe24", - "metadata": {}, - "source": [ - "## 3. Web pages\n", - "\n", - "In Part 2, given a YouTube video link (URL), we used a LangChain loader to download the video's audio to local storage and then used the OpenAIWhisperParser parser to convert the audio to text.\n", - "\n", - "In this part we learn how to load a web page from a given link (URL). Here we load a page hosted on GitHub whose content is in Markdown format." - ] - }, - { - "cell_type": "markdown", - "id": "b28abf4d-4907-47f6-b54d-6d322a5794e6", - "metadata": {}, - "source": [ - "### 3.1 Load the web page" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "id": "1a68375f-44ae-4905-bf9c-1f01ec800481", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.document_loaders import WebBaseLoader\n", - "\n", - "\n", - "# Create a WebBaseLoader instance\n", - "url = \"https://github.com/basecamp/handbook/blob/master/37signals-is-you.md\"\n", - "header = {'User-Agent': 'python-requests/2.27.1', \n", - " 'Accept-Encoding': 'gzip, deflate, br', \n", - " 'Accept': '*/*',\n", - " 'Connection': 'keep-alive'}\n", - "loader = WebBaseLoader(web_path=url,header_template=header)\n", - "\n", - "# Call WebBaseLoader's load method to load the page\n", - "docs = loader.load()" - ] - }, - { - "cell_type": "markdown", - "id": "fc24f44a-01f5-49a3-9529-2f05c1053b2c", - "metadata": { - "tags": [] - }, - "source": [ - "### 3.2 Explore the loaded data" - ] - }, - { - "cell_type": "markdown", - "id": "f2f108b9-713b-4b98-b44d-4dfc3dcbcde2", - "metadata": {}, - "source": [ - "The loaded documents are stored in the `docs` variable:\n", - "- `docs` is of type `List`\n", - "- Printing the length of `docs` shows how many documents it contains" - 
] - }, - { - "cell_type": "code", - "execution_count": 22, - "id": "8c8670fa-203b-4c35-9266-f976f50f0f5d", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "print(type(docs))" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "id": "0e85a526-55e7-4186-8697-d16a891bcabc", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "1\n" - ] - } - ], - "source": [ - "print(len(docs))" - ] - }, - { - "cell_type": "markdown", - "id": "b4397791-4d3c-4609-86be-3cda90a3f2fc", - "metadata": {}, - "source": [ - "Each element of `docs` is a document of type `langchain.schema.document.Document`, which has two attributes:\n", - "- `page_content` holds the content of the document.\n", - "- `metadata` holds descriptive data about the document." - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "id": "26423c51-6503-478c-b17d-bbe57049a04c", - "metadata": {}, - "outputs": [], - "source": [ - "doc = docs[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "id": "a243638f-0c23-46b8-8854-13f1f1de6f0a", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "print(type(doc))" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "id": "9f237541-79b7-4ae4-9e0e-27a28af99b7a", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"{\"payload\":{\"allShortcutsEnabled\":false,\"fileTree\":{\"\":{\"items\":[{\"name\":\"37signals-is-you.md\",\"path\":\"37signals-is-you.md\",\"contentType\":\"file\"},{\"name\":\"LICENSE.md\",\"path\":\"LICENSE.md\",\"contentType\":\"file\"},{\"name\":\"README.md\",\"path\":\"README.md\",\"contentType\":\"file\"},{\"name\":\"benefits-and-perks.md\",\"path\":\"benefits-and-perks.md\",\"contentType\":\"file\"},{\"name\":\"code-of-conduct.md\",\"path\":\"code-of-conduct.md\",\"contentType\":\"file\"},{\"name\":\"faq.md\",\"path\":\"faq.md\",\"contentType\":\"file\"},{\"name\":\"getting-started.md\",\"path\":\"getting-started.md\",\"contentType\":\"file\"},{\"name\":\"how-we-work.md\",\"path\":\"how-we-work.md\",\"contentType\":\"file\"},{\"name\":\"international-travel-guide.md\",\"path\":\"international-travel-guide.md\",\"contentType\":\"file\"},{\"name\":\"making-a-career.md\",\"path\":\"making-a-career.md\",\"contentType\":\"file\"},{\"name\":\"managing-work-devices.md\",\"path\":\"managing-work-devices.md\",\"contentType\":\"file\"},{\"name\":\"moonlighting.md\",\"path\":\"moonlighting.md\",\"contentType\":\"file\"},{\"name\":\"our-internal-systems.md\",\"path\":\"our-internal-systems.md\",\"contentType\":\"file\"},{\"name\":\"our-rituals.md\",\"path\":\"our-rituals.md\",\"contentType\":\"file\"},{\"name\":\"performance-plans.md\",\"path\":\"performance-plans.md\",\"contentType\":\"file\"},{\"name\":\"product-histories.md\",\"path\":\"product-histories.md\",\"contentType\":\"file\"},{\"name\":\"stateFMLA.md\",\"path\":\"stateFMLA.md\",\"contentType\":\"file\"},{\"name\":\"titles-for-data.md\",\"path\":\"titles-for-data.md\",\"contentType\":\"file\"},{\"name\":\"titles-for-designers.md\",\"path\":\"titles-for-designers.md\",\"contentType\":\"file\"},{\"name\":\"titles-for-ops.md\",\"path\":\"titles-for-ops.md\",\"contentType\":\"file\"},{\"name\":\"titles-for-programmers.md\",\"path\":\"titles-for-programmers.md\",\"contentType\":\"file\"},{\"name\":\"titles-for-
support.md\",\"path\":\"titles-for-support.md\",\"contentType\":\"file\"},{\"name\":\"vocabulary.md\",\"path\":\"vocabulary.md\",\"contentType\":\"file\"},{\"name\":\"what-influenced-us.md\",\"path\":\"what-influenced-us.md\",\"contentType\":\"file\"},{\"name\":\"what-we-stand-for.md\",\"path\":\"what-we-stand-for.md\",\"contentType\":\"file\"},{\"name\":\"where-we-work.md\",\"path\":\"where-we-work.md\",\"contentType\":\"file\"}],\"totalCount\":26}},\"fileTreeProcessingTime\":3.936437,\"foldersToFetch\":[],\"reducedMotionEnabled\":null,\"repo\":{\"id\":90042196,\"defaultBranch\":\"master\",\"name\":\"handbook\",\"ownerLogin\":\"basecamp\",\"currentUserCanPush\":false,\"isFork\":false,\"isEmpty\":false,\"createdAt\":\"2017-05-02T14:23:23.000Z\",\"ownerAvatar\":\"https://avatars.githubusercontent.com/u/13131?v=4\",\"public\":true,\"private\":false,\"isOrgOwned\":true},\"refInfo\":{\"name\":\"master\",\"listCacheKey\":\"v0:1682672280.0\",\"canEdit\":false,\"refType\":\"branch\",\"currentOid\":\"1577f27c63aa8df61996924824afb8df6f1bf20e\"},\"path\":\"37signals-is-you.md\",\"currentUser\":null,\"blob\":{\"rawBlob\":null,\"colorizedLines\":null,\"stylingDirectives\":null,\"csv\":null,\"csvError\":null,\"dependabotInfo\":{\"showConfigurationBanner\":false,\"configFilePath\":null,\"networkDependabotPath\":\"/basecamp/handbook/network/updates\",\"dismissConfigurationNoticePath\":\"/settings/dismiss-notice/dependabot_configuration_notice\",\"configurationNoticeDismissed\":null,\"repoAlertsPath\":\"/basecamp/handbook/security/dependabot\",\"repoSecurityAndAnalysisPath\":\"/basecamp/handbook/settings/security_analysis\",\"repoOwnerIsOrg\":true,\"currentUserCanAdminRepo\":false},\"displayName\":\"37signals-is-you.md\",\"displayUrl\":\"https://github.com/basecamp/handbook/blob/master/37signals-is-you.md?raw=true\",\"headerInfo\":{\"blobSize\":\"2.19 KB\",\"deleteInfo\":{\"deletePath\":null,\"deleteTooltip\":\"You must be signed in to make or propose 
changes\"},\"editInfo\":{\"editTooltip\":\"You must be signed in to make or propose changes\"},\"ghDesktopPath\":\"https://desktop.github.com\",\"gitLfsPath\":null,\"onBranch\":true,\"shortPath\":\"e5ca0f0\",\"siteNavLoginPath\":\"/login?return_to=https%3A%2F%2Fgithub.com%2Fbasecamp%2Fhandbook%2Fblob%2Fmaster%2F37signals-is-you.md\",\"isCSV\":false,\"isRichtext\":true,\"toc\":[{\"level\":1,\"text\":\"37signals Is You\",\"anchor\":\"37signals-is-you\",\"htmlText\":\"37signals Is You\"}],\"lineInfo\":{\"truncatedLoc\":\"11\",\"truncatedSloc\":\"6\"},\"mode\":\"file\"},\"image\":false,\"isCodeownersFile\":null,\"isValidLegacyIssueTemplate\":false,\"issueTemplateHelpUrl\":\"https://docs.github.com/articles/about-issue-and-pull-request-templates\",\"issueTemplate\":null,\"discussionTemplate\":null,\"language\":\"Markdown\",\"large\":false,\"loggedIn\":false,\"newDiscussionPath\":\"/basecamp/handbook/discussions/new\",\"newIssuePath\":\"/basecamp/handbook/issues/new\",\"planSupportInfo\":{\"repoIsFork\":null,\"repoOwnedByCurrentUser\":null,\"requestFullPath\":\"/basecamp/handbook/blob/master/37signals-is-you.md\",\"showFreeOrgGatedFeatureMessage\":null,\"showPlanSupportBanner\":null,\"upgradeDataAttributes\":null,\"upgradePath\":null},\"publishBannersInfo\":{\"dismissActionNoticePath\":\"/settings/dismiss-notice/publish_action_from_dockerfile\",\"dismissStackNoticePath\":\"/settings/dismiss-notice/publish_stack_from_file\",\"releasePath\":\"/basecamp/handbook/releases/new?marketplace=true\",\"showPublishActionBanner\":false,\"showPublishStackBanner\":false},\"renderImageOrRaw\":false,\"richText\":\"37signals Is You\\nEveryone working at 37signals represents 37signals. When a customer gets a response from Merissa on support, Merissa is 37signals. When a customer reads a tweet by Eron that our systems are down, Eron is 37signals. In those situations, all the other stuff we do to cultivate our best image is secondary. 
What’s right in front of someone in a time of need is what they’ll remember.\\nThat’s what we mean when we say marketing is everyone’s responsibility, and that it pays to spend the time to recognize that. This means avoiding the bullshit of outage language and bending our policies, not just lending your ears. It means taking the time to get the writing right and consider how you’d feel if you were on the other side of the interaction.\\nThe vast majority of our customers come from word of mouth and much of that word comes from people in our audience. This is an audience we’ve been educating and entertaining for 20 years and counting, and your voice is part of us now, whether you like it or not! Tell us and our audience what you have to say!\\nThis goes for tools and techniques as much as it goes for prose. 37signals not only tries to out-teach the competition, but also out-share and out-collaborate. We’re prolific open source contributors through Ruby on Rails, Trix, Turbolinks, Stimulus, and many other projects. Extracting the common infrastructure that others could use as well is satisfying, important work, and we should continue to do that.\\nIt’s also worth mentioning that joining 37signals can be all-consuming. We’ve seen it happen. You dig 37signals, so you feel pressure to contribute, maybe overwhelmingly so. The people who work here are some of the best and brightest in our industry, so the self-imposed burden to be exceptional is real. But here’s the thing: stop it. Settle in. We’re glad you love this job because we all do too, but at the end of the day it’s a job. Do your best work, collaborate with your team, write, read, learn, and then turn off your computer and play with your dog. 
We’ll all be better for it.\\n\",\"renderedFileInfo\":null,\"tabSize\":8,\"topBannersInfo\":{\"overridingGlobalFundingFile\":false,\"globalPreferredFundingPath\":null,\"repoOwner\":\"basecamp\",\"repoName\":\"handbook\",\"showInvalidCitationWarning\":false,\"citationHelpUrl\":\"https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-on-github/about-citation-files\",\"showDependabotConfigurationBanner\":false,\"actionsOnboardingTip\":null},\"truncated\":false,\"viewable\":true,\"workflowRedirectUrl\":null,\"symbols\":{\"timedOut\":false,\"notAnalyzed\":true,\"symbols\":[]}},\"csrf_tokens\":{\"/basecamp/handbook/branches\":{\"post\":\"o3HTNEDyuKtINffBkguVz-P3KUwBN04ZM_vvyoNKymcy66lDUtXVvEi7EvsbgFoz2d3qgU_earsuIftbbtKlcg\"}}},\"title\":\"handbook/37signals-is-you.md at master · basecamp/handbook\",\"locale\":\"en\"}\n" - ] - } - ], - "source": [ - "print(doc.page_content)" - ] - }, - { - "cell_type": "markdown", - "id": "26538237-1fc2-4915-944a-1f68f3ae3759", - "metadata": {}, - "source": [ - " " - ] - }, - { - "cell_type": "markdown", - "id": "52025103-205e-4137-a116-89f37fcfece1", - "metadata": {}, - "source": [ - "As shown above, the loaded content contains a lot of redundant information. Data like this usually needs further post-processing." - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "id": "a7c5281f-aeed-4ee7-849b-bbf9fd3e35c7", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "37signals Is You\n", - "Everyone working at 37signals represents 37signals. When a customer gets a response from Merissa on support, Merissa is 37signals. When a customer reads a tweet by Eron that our systems are down, Eron is 37signals. In those situations, all the other stuff we do to cultivate our best image is secondary. What’s right in front of someone in a time of need is what they’ll remember.\n", - "That’s what we mean when we say marketing is everyone’s responsibility, and that it pays to spend the time to recognize that. 
This means avoiding the bullshit of outage language and bending our policies, not just lending your ears. It means taking the time to get the writing right and consider how you’d feel if you were on the other side of the interaction.\n", - "The vast majority of our customers come from word of mouth and much of that word comes from people in our audience. This is an audience we’ve been educating and entertaining for 20 years and counting, and your voice is part of us now, whether you like it or not! Tell us and our audience what you have to say!\n", - "This goes for tools and techniques as much as it goes for prose. 37signals not only tries to out-teach the competition, but also out-share and out-collaborate. We’re prolific open source contributors through Ruby on Rails, Trix, Turbolinks, Stimulus, and many other projects. Extracting the common infrastructure that others could use as well is satisfying, important work, and we should continue to do that.\n", - "It’s also worth mentioning that joining 37signals can be all-consuming. We’ve seen it happen. You dig 37signals, so you feel pressure to contribute, maybe overwhelmingly so. The people who work here are some of the best and brightest in our industry, so the self-imposed burden to be exceptional is real. But here’s the thing: stop it. Settle in. We’re glad you love this job because we all do too, but at the end of the day it’s a job. Do your best work, collaborate with your team, write, read, learn, and then turn off your computer and play with your dog. 
We’ll all be better for it.\n", - "\n" - ] - } - ], - "source": [ - "import json\n", - "convert_to_json = json.loads(doc.page_content)\n", - "extracted_markdown = convert_to_json['payload']['blob']['richText']\n", - "print(extracted_markdown)" - ] - }, - { - "cell_type": "markdown", - "id": "d35e99b4-dd67-4940-bbeb-b2a59bf8cd3d", - "metadata": {}, - "source": [ - "## 4. Notion documents\n", - "\n", - "- Open the [sample Notion document](https://yolospace.notion.site/Blendle-s-Employee-Handbook-e31bff7da17346ee99f531087d8b133f) and click the Duplicate button at the top right to copy it into your own Notion workspace\n", - "- Click the `⋯` button at the top right and choose to export as Markdown & CSV. The export is a zip archive\n", - "- Unzip it and save the Markdown files to the local path `docs/Notion_DB/`" - ] - }, - { - "cell_type": "markdown", - "id": "f8cf2778-288c-4964-81e7-0ed881e31652", - "metadata": {}, - "source": [ - "### 4.1 Load the Notion Markdown documents" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "id": "081f5ee4-6b5d-45bf-a7e6-079abc560729", - "metadata": {}, - "outputs": [], - "source": [ - "from langchain.document_loaders import NotionDirectoryLoader\n", - "loader = NotionDirectoryLoader(\"docs/Notion_DB\")\n", - "docs = loader.load()" - ] - }, - { - "cell_type": "markdown", - "id": "88d5d094-a490-4c64-ab93-5c5cec0853aa", - "metadata": { - "tags": [] - }, - "source": [ - "### 4.2 Explore the loaded data" - ] - }, - { - "cell_type": "markdown", - "id": "a3ffe318-d22a-4687-b613-f679ad9ad616", - "metadata": {}, - "source": [ - "The loaded documents are stored in the `docs` variable:\n", - "- `docs` is of type `List`\n", - "- Printing the length of `docs` shows how many documents it contains" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "id": "106323dd-0d24-40d4-8302-ed2a35d13347", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "print(type(docs))" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "id": "fa3ee0fe-7daa-4193-9ee1-4ee89cb3b843", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "1\n" - ] - } - ], - "source": [ - "print(len(docs))" - ] - }, - { - 
"cell_type": "markdown", - "id": "b3cd3feb-d2c1-46b6-8b16-d4b2101c5632", - "metadata": {}, - "source": [ - "Each element of `docs` is a document of type `langchain.schema.document.Document`, which has two attributes:\n", - "- `page_content` holds the content of the document.\n", - "- `metadata` holds descriptive data about the document." - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "id": "5b220df6-fd0b-4b62-9da4-29962926fe87", - "metadata": {}, - "outputs": [], - "source": [ - "doc = docs[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "id": "98ef8fbf-e820-41de-919d-c462a910f4f1", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "print(type(doc))" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "id": "933da0f3-4d5f-4363-9142-050ecf226c1f", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "# Blendle's Employee Handbook\n", - "\n", - "This is a living document with everything we've learned working with people while running a startup. And, of course, we continue to learn. Therefore it's a document that will continue to change. \n", - "\n", - "**Everything related to working at Blendle and the people of Blendle, made public.**\n", - "\n", - "These are the lessons from three years of working with the people of Blendle. 
It contains everything from [how our leaders lead](https://www.notion.so/ecfb7e647136468a9a0a32f1771a8f52?pv\n" - ] - } - ], - "source": [ - "print(doc.page_content[0:500])" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.12" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/content/LangChain Chat with Data/docs/Notion_DB/Blendle's Employee Handbook d331da39bd0341ed8d5ee2942fecf17a.md b/content/LangChain Chat with Data/docs/Notion_DB/Blendle's Employee Handbook d331da39bd0341ed8d5ee2942fecf17a.md deleted file mode 100644 index a39a3c2..0000000 --- a/content/LangChain Chat with Data/docs/Notion_DB/Blendle's Employee Handbook d331da39bd0341ed8d5ee2942fecf17a.md +++ /dev/null @@ -1,119 +0,0 @@ -# Blendle's Employee Handbook - -This is a living document with everything we've learned working with people while running a startup. And, of course, we continue to learn. Therefore it's a document that will continue to change. - -**Everything related to working at Blendle and the people of Blendle, made public.** - -These are the lessons from three years of working with the people of Blendle. It contains everything from [how our leaders lead](https://www.notion.so/ecfb7e647136468a9a0a32f1771a8f52?pvs=21) to [how we increase salaries](https://www.notion.so/Salary-Review-e11b6161c6d34f5c9568bb3e83ed96b6?pvs=21), from [how we hire](https://www.notion.so/Hiring-451bbcfe8d9b49438c0633326bb7af0a?pvs=21) and [fire](https://www.notion.so/Firing-5567687a2000496b8412e53cd58eed9d?pvs=21) to [how we think people should give each other feedback](https://www.notion.so/Our-Feedback-Process-eb64f1de796b4350aeab3bc068e3801f?pvs=21) — and much more. 
- -We've made this document public because we want to learn from you. We're very much interested in your feedback (including weeding out typo's and Dunglish ;)). Email us at hr@blendle.com. If you're starting your own company or if you're curious as to how we do things at Blendle, we hope that our employee handbook inspires you. - -If you want to work at Blendle you can check our [job ads here](https://blendle.homerun.co/). If you want to be kept in the loop about Blendle, you can sign up for [our behind the scenes newsletter](https://blendle.homerun.co/yes-keep-me-posted/tr/apply?token=8092d4128c306003d97dd3821bad06f2). - -## Blendle general - -*Information gap closing in 3... 2... 1...* - ---- - -[To Do/Read in your first week](https://www.notion.so/To-Do-Read-in-your-first-week-9ef69b65b63a4ec7b8394ec703856c32?pvs=21) - -[History](https://www.notion.so/History-29b2b8fd36dd48db80dc682119aaefef?pvs=21) - -[DNA & culture](https://www.notion.so/DNA-culture-7723839e26124ed2ba3adafe8de0a080?pvs=21) - -[General & practical ](https://www.notion.so/General-practical-87085be150824011b79891eb30ca9530?pvs=21) - -## People operations - -*You can tell a company's DNA by looking at how they deal with the practical stuff.* - ---- - -[Office](https://www.notion.so/Office-b014d3d2c62240308865d11bba495322?pvs=21) - -[Time off: holidays and national holidays](https://www.notion.so/Time-off-holidays-and-national-holidays-bd94b931280a45a6b8eb3f29c2c4b42a?pvs=21) - -[Calling in sick/better](https://www.notion.so/Calling-in-sick-better-b82ec184fd544a8e9aa926ac37bb1ab1?pvs=21) - -[Perks and benefits](https://www.notion.so/Perks-and-benefits-820593b38ebc44209fe35ae553100de6?pvs=21) - -[Travel costs and reimbursements](https://www.notion.so/Travel-costs-and-reimbursements-e76623c6e0664863a769aeed028954e2?pvs=21) - -[Parenthood](https://www.notion.so/Parenthood-a6d62b65a9d84489a75586a3c542b3f1?pvs=21) - -## People topics - -*Themes we care about.* - ---- - -[Blendle Social 
Code](https://www.notion.so/Blendle-Social-Code-685a79c8df154ee09f35b35cc147af6b?pvs=21) - -[Diversity and inclusion](https://www.notion.so/Diversity-and-inclusion-d7f9d3e6b6ef4a1ab8f2c0a7b3ea3eec?pvs=21) - -[#letstalkaboutstress](https://www.notion.so/letstalkaboutstress-d46961f6ac98432ab07b5d5afc52c2d0?pvs=21) - -## Feedback and development - -*The number 1 reason for people to work at Blendle is growth and learning from smart people.* - ---- - -[Your 1st month ](https://www.notion.so/Your-1st-month-85909edc55a34f349bbed522c5245a65?pvs=21) - -[Goals](https://www.notion.so/Goals-122bff69bd634c519cd3c6dc01dbc282?pvs=21) - -[Feedback cycle](https://www.notion.so/Feedback-cycle-5f32358dba874c39be5ca5aa464c310e?pvs=21) - -[The Matrix™ (job profiles)](https://www.notion.so/The-Matrix-job-profiles-da91736ff35545458559eceb0075ed66?pvs=21) - -[Blendle library](https://www.notion.so/Blendle-library-f34188e536234c9a8976c9d4602b0be3?pvs=21) - -## **Hiring** - -*The coolest and most impactful thing when done right.* - ---- - -[Rating systems](https://www.notion.so/Rating-systems-2ba332377459427194acc798e5f8869c?pvs=21) - -[Getting people in (branding&sourcing)](https://www.notion.so/Getting-people-in-branding-sourcing-a3277fef078041a881f56556e24f0d8a?pvs=21) - -[Highly Skilled Migrants and relocation](https://www.notion.so/Highly-Skilled-Migrants-and-relocation-84a6576fb27d4a8fae2f73e4eae57d21?pvs=21) - -## How to lead at Blendle - -*Here are some tips and tools to help you become a great leader.* - ---- - -[How to lead at Blendle ](https://www.notion.so/How-to-lead-at-Blendle-f8c6b1d989d841bb87510fc2ab1ba970?pvs=21) - -[Your check-list](https://www.notion.so/Your-check-list-aaca857a846848688da3a37f28682c15?pvs=21) - -[Leading Feedback ](https://www.notion.so/Leading-Feedback-a1970c9f7b70443d881ca92d4e98be25?pvs=21) - -[Salary talks](https://www.notion.so/Salary-talks-35681ab732c048a9bbdf8c50babe64b5?pvs=21) - -[Hiring 
](https://www.notion.so/Hiring-0bdf54d3d25f4c59bfdf3712a5104bbc?pvs=21) - -[Firing](https://www.notion.so/Firing-e0da1de62b304751bbd95a681908c7ad?pvs=21) - -[Party and study budget](https://www.notion.so/Party-and-study-budget-4e31001531c24d0fa447bbfcd6ccfd3f?pvs=21) - -[Holidays](https://www.notion.so/Holidays-1529506bb8884f0aa11cc799ced11ed0?pvs=21) - -[Sickness absence](https://www.notion.so/Sickness-absence-79a495f601df4004801475ea79b3d198?pvs=21) - -[Personal User Guide](https://www.notion.so/Personal-User-Guide-be2238ccb597412e8a517d40cda7e7d5?pvs=21) - -[Soft shizzle](https://www.notion.so/Soft-shizzle-41255d79fbe84492b153121cd7a2e3e8?pvs=21) - -## About this document - ---- - -*Lessons from three years of HR* - -[About this document and the author](https://www.notion.so/About-this-document-and-the-author-ee1faab1bcae4456b8c62043a8a194cd?pvs=21) \ No newline at end of file diff --git a/content/LangChain Chat with Data/docs/cs229_lectures/MachineLearning-Lecture01.pdf b/content/LangChain Chat with Data/docs/cs229_lectures/MachineLearning-Lecture01.pdf deleted file mode 100644 index 34da5de..0000000 Binary files a/content/LangChain Chat with Data/docs/cs229_lectures/MachineLearning-Lecture01.pdf and /dev/null differ diff --git a/content/Prompt Engineering/1. 简介.md b/content/Prompt Engineering for Developer/1. 简介.md similarity index 100% rename from content/Prompt Engineering/1. 简介.md rename to content/Prompt Engineering for Developer/1. 简介.md diff --git a/content/Prompt Engineering/2. 提示原则 Guidelines.ipynb b/content/Prompt Engineering for Developer/2. 提示原则 Guidelines.ipynb similarity index 100% rename from content/Prompt Engineering/2. 提示原则 Guidelines.ipynb rename to content/Prompt Engineering for Developer/2. 提示原则 Guidelines.ipynb diff --git a/content/Prompt Engineering/3. 迭代优化 Iterative.ipynb b/content/Prompt Engineering for Developer/3. 迭代优化 Iterative.ipynb similarity index 100% rename from content/Prompt Engineering/3. 
迭代优化 Iterative.ipynb rename to content/Prompt Engineering for Developer/3. 迭代优化 Iterative.ipynb diff --git a/content/Prompt Engineering/4. 文本概括 Summarizing.ipynb b/content/Prompt Engineering for Developer/4. 文本概括 Summarizing.ipynb similarity index 100% rename from content/Prompt Engineering/4. 文本概括 Summarizing.ipynb rename to content/Prompt Engineering for Developer/4. 文本概括 Summarizing.ipynb diff --git a/content/Prompt Engineering/5. 推断 Inferring.ipynb b/content/Prompt Engineering for Developer/5. 推断 Inferring.ipynb similarity index 100% rename from content/Prompt Engineering/5. 推断 Inferring.ipynb rename to content/Prompt Engineering for Developer/5. 推断 Inferring.ipynb diff --git a/content/Prompt Engineering/6. 文本转换 Transforming.ipynb b/content/Prompt Engineering for Developer/6. 文本转换 Transforming.ipynb similarity index 82% rename from content/Prompt Engineering/6. 文本转换 Transforming.ipynb rename to content/Prompt Engineering for Developer/6. 文本转换 Transforming.ipynb index 9d88f7d..7c78657 100644 --- a/content/Prompt Engineering/6. 文本转换 Transforming.ipynb +++ b/content/Prompt Engineering for Developer/6. 
文本转换 Transforming.ipynb @@ -65,21 +65,21 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 5, "id": "acf125be", "metadata": {}, "outputs": [], "source": [ "import openai\n", "# 导入第三方库\n", + "import os\n", "\n", - "openai.api_key = \"sk-...\"\n", - "# 设置 API_KEY, 请替换成您自己的 API_KEY\n" + "openai.api_key = \"sk-...\"\n" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 6, "id": "ac57ad72", "metadata": {}, "outputs": [], @@ -114,10 +114,18 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "id": "5b521646", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Hola, me gustaría ordenar una licuadora.\n" + ] + } + ], "source": [ "prompt = f\"\"\"\n", "Translate the following English text to Spanish: \\ \n", @@ -129,7 +137,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 8, "id": "8a5bee0c", "metadata": { "scrolled": true @@ -169,10 +177,18 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 9, "id": "769b6e2e", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This language is French.\n" + ] + } + ], "source": [ "prompt = f\"\"\"\n", "Tell me which language this is: \n", @@ -184,7 +200,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 10, "id": "c2c66002", "metadata": {}, "outputs": [ @@ -192,7 +208,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "这是法语。\n" + "这段文本是法语。\n" ] } ], @@ -222,10 +238,20 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 11, "id": "a53bc53b", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "French: ```Je veux commander un ballon de basket```\n", + "Spanish: ```Quiero ordenar una pelota de baloncesto```\n", + "English: ```I want to order a basketball```\n" + ] + } + ], 
"source": [ "prompt = f\"\"\"\n", "Translate the following text to French and Spanish\n", @@ -238,7 +264,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 12, "id": "b0c4fa41", "metadata": {}, "outputs": [ @@ -279,10 +305,19 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 13, "id": "a4770dcc", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Formal: ¿Le gustaría ordenar una almohada?\n", + "Informal: ¿Te gustaría ordenar una almohada?\n" + ] + } + ], "source": [ "prompt = f\"\"\"\n", "Translate the following text to Spanish in both the \\\n", @@ -295,7 +330,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 14, "id": "2c52ca54", "metadata": {}, "outputs": [ @@ -303,8 +338,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "正式语气:请问您需要订购枕头吗?\n", - "非正式语气:你要不要订一个枕头?\n" + "正式语气:您是否需要订购一个枕头?\n", + "非正式语气:你想要订购一个枕头吗?\n" ] } ], @@ -343,7 +378,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 15, "id": "21f3af91", "metadata": {}, "outputs": [], @@ -359,10 +394,39 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 16, "id": "5cb69e31", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Original message (The language is French.): La performance du système est plus lente que d'habitude.\n", + "The performance of the system is slower than usual.\n", + "\n", + "시스템의 성능이 평소보다 느립니다. 
\n", + "\n", + "Original message (The language is Spanish.): Mi monitor tiene píxeles que no se iluminan.\n", + "English: \"My monitor has pixels that do not light up.\"\n", + "\n", + "Korean: \"내 모니터에는 밝아지지 않는 픽셀이 있습니다.\" \n", + "\n", + "Original message (The language is Italian.): Il mio mouse non funziona\n", + "English: \"My mouse is not working.\"\n", + "Korean: \"내 마우스가 작동하지 않습니다.\" \n", + "\n", + "Original message (The language is Polish.): Mój klawisz Ctrl jest zepsuty\n", + "English: \"My Ctrl key is broken\"\n", + "Korean: \"내 Ctrl 키가 고장 났어요\" \n", + "\n", + "Original message (The language is Chinese.): 我的屏幕在闪烁\n", + "English: My screen is flickering.\n", + "Korean: 내 화면이 깜박거립니다. \n", + "\n" + ] + } + ], "source": [ "for issue in user_messages:\n", " prompt = f\"Tell me what language this is: ```{issue}```\"\n", @@ -379,7 +443,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 18, "id": "6a884190", "metadata": {}, "outputs": [ @@ -395,27 +459,28 @@ "原始消息 (西班牙语): Mi monitor tiene píxeles que no se iluminan.\n", "\n", "中文翻译:我的显示器有一些像素点不亮。\n", - "英文翻译:My monitor has pixels that don't light up. \n", + "英文翻译:My monitor has pixels that do not light up. \n", "=========================================\n", "原始消息 (意大利语): Il mio mouse non funziona\n", "\n", - "中文翻译:我的鼠标不工作了。\n", - "英文翻译:My mouse is not working. \n", + "中文翻译:我的鼠标不工作\n", + "英文翻译:My mouse is not working \n", "=========================================\n", - "原始消息 (波兰语): Mój klawisz Ctrl jest zepsuty\n", + "原始消息 (这段文本是波兰语。): Mój klawisz Ctrl jest zepsuty\n", "\n", "中文翻译:我的Ctrl键坏了\n", - "英文翻译:My Ctrl key is broken. \n", + "英文翻译:My Ctrl key is broken \n", "=========================================\n", "原始消息 (中文): 我的屏幕在闪烁\n", "\n", - "中文翻译:我的屏幕在闪烁。\n", + "中文翻译:我的屏幕在闪烁\n", "英文翻译:My screen is flickering. 
\n", "=========================================\n" ] } ], "source": [ + "import time\n", "for issue in user_messages:\n", " time.sleep(20)\n", " prompt = f\"告诉我以下文本是什么语种,直接输出语种,如法语,无需输出标点符号: ```{issue}```\"\n", @@ -459,10 +524,27 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 19, "id": "d62ac977", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dear Sir/Madam,\n", + "\n", + "I hope this letter finds you well. My name is Joe, and I am writing to bring your attention to a specification document regarding a standing lamp. \n", + "\n", + "I kindly request that you take a moment to review the attached spec, as it contains important details about the standing lamp in question. \n", + "\n", + "Thank you for your time and consideration. I look forward to hearing from you soon.\n", + "\n", + "Sincerely,\n", + "Joe\n" + ] + } + ], "source": [ "prompt = f\"\"\"\n", "Translate the following from slang to a business letter: \n", @@ -474,7 +556,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 20, "id": "84ce3099", "metadata": {}, "outputs": [ @@ -482,17 +564,17 @@ "name": "stdout", "output_type": "stream", "text": [ - "尊敬的XXX(收件人姓名):\n", + "尊敬的先生/女士,\n", "\n", - "您好!我是XXX(发件人姓名),在此向您咨询一个问题。上次我们交流时,您提到我们部门需要采购显示器,但我忘记了您所需的尺寸是多少英寸。希望您能够回复我,以便我们能够及时采购所需的设备。\n", + "我是小羊,我希望能够向您确认一下我们部门需要采购的显示器尺寸是多少寸。上次我们交谈时,您提到了这个问题。\n", "\n", - "谢谢您的帮助!\n", + "期待您的回复。\n", "\n", - "此致\n", + "谢谢!\n", "\n", - "敬礼\n", + "此致,\n", "\n", - "XXX(发件人姓名)\n" + "小羊\n" ] } ], @@ -531,7 +613,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 21, "id": "fad3f358", "metadata": {}, "outputs": [], @@ -545,10 +627,63 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 22, "id": "7e904f70", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + 
"\n", + "

Restaurant Employees
Name | Email
Shyam | shyamjaiswal@gmail.com
Bob | bob32@gmail.com
Jai | jai87@gmail.com
\n", + "\n", + "\n", + "\n" + ] + } + ], "source": [ "prompt = f\"\"\"\n", "Translate the following python dictionary from JSON to an HTML \\\n", @@ -678,7 +813,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 23, "id": "b7d04bc0", "metadata": {}, "outputs": [], @@ -696,10 +831,24 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 24, "id": "d48f8d3f", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The girl with the black and white puppies has a ball.\n", + "No errors found.\n", + "It's going to be a long day. Does the car need its oil changed?\n", + "There goes my freedom. They're going to bring their suitcases.\n", + "You're going to need your notebook.\n", + "That medicine affects my ability to sleep. Have you heard of the butterfly effect?\n", + "This phrase is to check chatGPT for spelling ability.\n" + ] + } + ], "source": [ "for t in text:\n", " prompt = f\"\"\"Proofread and correct the following text\n", @@ -713,7 +862,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 25, "id": "1ef55b7b", "metadata": {}, "outputs": [ @@ -722,10 +871,10 @@ "output_type": "stream", "text": [ "0 The girl with the black and white puppies has a ball.\n", - "1 未发现错误。\n", + "1 Yolanda has her notebook.\n", "2 It's going to be a long day. Does the car need its oil changed?\n", - "3 Their goes my freedom. They're going to bring their suitcases.\n", - "4 输出:You're going to need your notebook.\n", + "3 Their goes my freedom. There going to bring their suitcases.\n", + "4 You're going to need your notebook.\n", "5 That medicine affects my ability to sleep. 
Have you heard of the butterfly effect?\n", "6 This phrase is to check chatGPT for spelling ability.\n" ] @@ -762,7 +911,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 26, "id": "6696b06a", "metadata": {}, "outputs": [], @@ -781,10 +930,18 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 27, "id": "8f3b2341", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Got this for my daughter for her birthday because she keeps taking mine from my room. Yes, adults also like pandas too. She takes it everywhere with her, and it's super soft and cute. However, one of the ears is a bit lower than the other, and I don't think that was designed to be asymmetrical. Additionally, it's a bit small for what I paid for it. I believe there might be other options that are bigger for the same price. On the positive side, it arrived a day earlier than expected, so I got to play with it myself before I gave it to my daughter.\n" + ] + } + ], "source": [ "prompt = f\"proofread and correct this review: ```{text}```\"\n", "response = get_completion(prompt)\n", @@ -1009,7 +1166,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.13" + "version": "3.10.11" }, "latex_envs": { "LaTeX_envs_menu_present": true, diff --git a/content/Prompt Engineering/7. 文本扩展 Expanding.ipynb b/content/Prompt Engineering for Developer/7. 文本扩展 Expanding.ipynb similarity index 100% rename from content/Prompt Engineering/7. 文本扩展 Expanding.ipynb rename to content/Prompt Engineering for Developer/7. 文本扩展 Expanding.ipynb diff --git a/content/Prompt Engineering/8. 聊天机器人 Chatbot.ipynb b/content/Prompt Engineering for Developer/8. 聊天机器人 Chatbot.ipynb similarity index 100% rename from content/Prompt Engineering/8. 聊天机器人 Chatbot.ipynb rename to content/Prompt Engineering for Developer/8. 聊天机器人 Chatbot.ipynb diff --git a/content/Prompt Engineering/9. 
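总结.md b/content/Prompt Engineering for Developer/9. 总结.md similarity index 100% rename from content/Prompt Engineering/9. 总结.md rename to content/Prompt Engineering for Developer/9. 总结.md diff --git a/content/Prompt Engineering/readme.md b/content/Prompt Engineering for Developer/readme.md similarity index 100% rename from content/Prompt Engineering/readme.md rename to content/Prompt Engineering for Developer/readme.md diff --git a/content/Prompt Engineering/附1-使用ChatGLM进行学习.ipynb b/content/Prompt Engineering for Developer/附1-使用ChatGLM进行学习.ipynb similarity index 100% rename from content/Prompt Engineering/附1-使用ChatGLM进行学习.ipynb rename to content/Prompt Engineering for Developer/附1-使用ChatGLM进行学习.ipynb diff --git a/content/readme.md b/content/readme.md index 96e88dd..e71ed2b 100644 --- a/content/readme.md +++ b/content/readme.md @@ -1,3 +1,7 @@ # An Introductory LLM Course for Developers -LLMs are gradually changing people's lives. For developers, learning how to use the APIs that LLMs provide to quickly and conveniently build more capable, LLM-integrated applications that deliver novel and practical features is an important skill that urgently needs to be learned. The large-model tutorial series launched by Andrew Ng in collaboration with OpenAI includes , and other tutorials. Among them, the "ChatGPT Prompt Engineering for Developers" tutorial targets developers who are new to LLMs and explains, in an accessible way, how to construct prompts and use the OpenAI API to implement common functions such as summarizing, inferring, and transforming; it is a classic introduction to LLM development. The "Building Systems with the ChatGPT API" and "LangChain for LLM Application Development" tutorials target developers who want to build applications on top of LLMs and explain, concisely yet systematically, how to develop practical applications with LangChain and the ChatGPT API, which makes them suitable for developers starting to build real LLM applications. We have therefore translated this course series into Chinese, reproduced its example code, and added Chinese subtitles to one of the videos so that Chinese-speaking learners in China can use it directly and learn LLM development more effectively. We have also implemented Chinese prompts with roughly equivalent results, so that learners can experience using LLMs in a Chinese-language context. In the future, we will add more advanced prompt techniques to enrich the course and help developers master more, and more ingenious, prompting skills. +LLMs are gradually changing people's lives. For developers, learning how to use the APIs that LLMs provide to quickly and conveniently build more capable, LLM-integrated applications that deliver novel and practical features is an important skill that urgently needs to be learned. + +
+The large-model tutorial series launched by Andrew Ng in collaboration with OpenAI explains how to get started with large-model development based on the OpenAI API and LangChain. Among them, the "Prompt Engineering for Developers" tutorial targets developers who are new to LLMs and explains, in an accessible way, how to construct prompts and use the OpenAI API to implement common functions such as summarizing, inferring, and transforming; it is a classic introduction to LLM development. The "Building Systems with the ChatGPT API" and "LangChain for LLM Application Development" tutorials target developers who want to build applications on top of LLMs and explain, concisely yet systematically, how to develop practical applications with the ChatGPT API and LangChain, which makes them suitable for developers starting to build real LLM applications. + +We have therefore translated this course series into Chinese, reproduced its example code, and added Chinese subtitles to one of the videos so that Chinese-speaking learners in China can use it directly and learn LLM development more effectively. We have also implemented Chinese prompts with roughly equivalent results, so that learners can experience using LLMs in a Chinese-language context. In the future, we will add more advanced prompt techniques to enrich the course and help developers master more, and more ingenious, prompting skills.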