Compare commits

..

531 Commits

Author SHA1 Message Date
66bdf8b29a Delete 润色.gif 2023-04-14 15:53:01 +08:00
72a034a1ff Delete 公式.gif 2023-04-14 15:52:53 +08:00
faaa3bb1b6 Update README.md 2023-04-14 15:52:01 +08:00
0485d01d67 Update README.md 2023-04-14 15:41:44 +08:00
f48914f56d Update README.md 2023-04-14 15:38:06 +08:00
443f23521c Update README.md 2023-04-14 15:34:45 +08:00
cd6a1fd399 Force-split PDF documents when normal splitting fails 2023-04-14 13:52:56 +08:00
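The forced-split fallback described in the commit above can be sketched as a last-resort fixed-width chunker. This is a hypothetical illustration; the function name and the way the limit is chosen are assumptions, not the project's actual code:

```python
def force_split(text: str, limit: int) -> list:
    # Last resort when no natural breakpoint (paragraph, sentence) fits
    # under the limit: cut into fixed-size chunks regardless of content.
    return [text[i:i + limit] for i in range(0, len(text), limit)]
```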
f10ea20351 Lengthen the wait time when "Rate limit reached" is encountered 2023-04-14 13:15:42 +08:00
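A longer-wait retry of the kind this commit describes might look roughly like the following sketch. The function name, the error type, and the linear backoff schedule are assumptions for illustration, not the project's actual implementation:

```python
import time

def request_with_backoff(request_fn, max_retries=5, base_wait=30):
    # When the API reports "Rate limit reached", wait progressively
    # longer before each retry instead of failing immediately.
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RuntimeError as err:
            if "Rate limit reached" in str(err) and attempt < max_retries - 1:
                time.sleep(base_wait * (attempt + 1))  # lengthen the wait each retry
            else:
                raise
```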
2e044d97c7 Update prompts 2023-04-14 13:10:40 +08:00
ea7fd53a97 OpenAI reduced the request rate limit for free users 2023-04-14 13:08:19 +08:00
b2fba01487 Move the function-plugin parallelism limit into config 2023-04-14 12:52:47 +08:00
fb3d0948a0 Revise comments 2023-04-14 12:30:43 +08:00
c8349e766b Reduce the default maximum thread count to 5 2023-04-14 12:28:25 +08:00
de8d20bcc2 Revise some comments 2023-04-14 12:23:05 +08:00
512e3f7a32 Revise comments 2023-04-14 12:05:22 +08:00
717fae8984 Revise prompts 2023-04-14 12:02:10 +08:00
2b352de7df Revise prompts 2023-04-14 12:01:49 +08:00
bc3f3429a5 Limit the number of source code files to 1024 2023-04-14 11:57:27 +08:00
7e14229229 Update issue templates 2023-04-14 11:45:25 +08:00
2aab3acfea 2.68 2023-04-14 10:33:53 +08:00
dd648bd446 disallow special token + limit num of file < 512 2023-04-14 09:50:14 +08:00
a2002ebd85 Update README.md 2023-04-13 17:05:01 +08:00
ff50f30669 Update README.md 2023-04-13 17:01:47 +08:00
0ac7734c7d Delete Dockerfile+ChatGLM 2023-04-13 16:36:08 +08:00
23686bfc77 Update README.md 2023-04-13 15:41:47 +08:00
7e541a7229 Update README.md 2023-04-13 15:40:45 +08:00
5c251f03eb Update README.md 2023-04-13 15:29:27 +08:00
0c21b1edd9 Delete objdump.tmp 2023-04-13 15:26:40 +08:00
0c58795a5e Update README.md 2023-04-13 13:49:54 +08:00
a70c08a3c4 Update readme 2023-04-13 13:43:05 +08:00
2712d99d08 Add a full-text Markdown translation plugin, and use it to translate this project's README 2023-04-13 13:40:13 +08:00
76ee4c2c55 Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-04-13 12:46:34 +08:00
fc222bf287 Lua project parsing + comment fixes 2023-04-13 12:46:31 +08:00
eee5763b15 Update requirements.txt 2023-04-13 12:27:28 +08:00
ff55bbd498 Update README.md 2023-04-13 12:20:54 +08:00
703ff7f814 Update README.md 2023-04-13 12:17:59 +08:00
16f7a52207 Update 高级功能函数模板.py 2023-04-13 12:02:52 +08:00
7cad5fa594 Update README.md 2023-04-13 11:53:52 +08:00
0b1d833804 Merge pull request #436 from DDreame/patch-1
[fix] Update requirements.txt
2023-04-13 11:47:19 +08:00
7d414f67f0 Merge pull request #439 from mrhblfx/patch-2
Add parsing of Lua projects
2023-04-13 11:45:42 +08:00
3e1cecd9f5 Merge pull request #438 from mrhblfx/patch-1
Add file patterns matched when parsing Go projects: `go.mod`, `go.sum`, `go.work`
2023-04-13 11:44:03 +08:00
98724cd395 Update comments 2023-04-13 11:18:44 +08:00
8ac9b454e3 Improve the chatpdf feature 2023-04-13 11:08:53 +08:00
5e602cabf5 Update crazy_functional.py 2023-04-12 22:44:07 +08:00
c6610b2aab Add "parse a Lua project" 2023-04-12 22:39:31 +08:00
7e53cf7c7e Add file patterns matched when parsing Go projects: go.mod, go.sum, go.work 2023-04-12 22:33:34 +08:00
3d6e4ee3a7 [fix] Update requirements.txt
Pin the Gradio version: versions below 3.24 do not support the color button, and 3.25 fixes https://github.com/gradio-app/gradio/issues/3716 and #371.
2023-04-12 21:32:03 +08:00
613be5509b Launch ChatGPT+ChatGLM 2023-04-12 17:41:33 +08:00
d40fa20ce8 Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic 2023-04-12 16:57:56 +08:00
40bd857c70 UTF8 Ignore read file errors 2023-04-12 16:57:01 +08:00
6e1976d9b8 Update README.md 2023-04-12 12:04:57 +08:00
88a86635c6 Update README.md 2023-04-12 11:21:20 +08:00
acbeebd18d Update README.md 2023-04-12 02:46:04 +08:00
7515863503 Update README.md 2023-04-12 02:41:43 +08:00
a1af5a99e0 Update README.md 2023-04-12 02:27:55 +08:00
84f6ee2fb7 Update README.md 2023-04-12 02:27:10 +08:00
c090df34fa Update README.md 2023-04-12 00:12:42 +08:00
3cb46534b6 Update README.md 2023-04-12 00:08:31 +08:00
c41eb0e997 Remove note about faulty code 2023-04-11 21:19:31 +08:00
c43a3e6f59 Disable share by default
Note: because network access to Gradio is slow from mainland China, launching with demo.queue().launch(share=True, inbrowser=True) relays all traffic through Gradio's servers and severely degrades the streaming (typewriter) experience. The default launch mode is now share=False; if you need public access, change it back to share=True.
2023-04-11 21:13:21 +08:00
a8399d2727 Merge pull request #413 from liyishuai/patch-1
Dockerfile: unbuffer stdout
2023-04-11 20:15:10 +08:00
d41d0db810 Update README.md 2023-04-11 20:00:33 +08:00
19d6323801 Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-04-11 19:59:09 +08:00
929c0afc9b 2.67 fix typo 2023-04-11 19:59:06 +08:00
3748979133 2.67 fix typo 2023-04-11 19:58:37 +08:00
6e8c5637aa Update README.md 2023-04-11 19:34:03 +08:00
f336ba060d Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-04-11 19:32:02 +08:00
4fe4626608 Revise feature description 2023-04-11 19:31:57 +08:00
e6fbf13c67 Update README.md 2023-04-11 19:00:20 +08:00
6bc5cbce20 Update README.md 2023-04-11 18:58:16 +08:00
3c9d63c37b UP 2023-04-11 18:50:56 +08:00
93d5fc2f1a Fix 2023-04-11 18:49:22 +08:00
2f2ad59823 2.62 bug fix 2023-04-11 18:47:51 +08:00
b841d58a26 renew all 2023-04-11 18:36:38 +08:00
3d66e3eb79 UP 2023-04-11 18:23:59 +08:00
4dad114ce7 Urgent bug fix 2023-04-11 18:16:33 +08:00
f8d565c5a1 Urgent bug fix 2023-04-11 18:15:26 +08:00
6bbc10f5b9 Urgent bug fix 2023-04-11 18:12:31 +08:00
8bf2956ff7 version 2.6 2023-04-11 17:55:30 +08:00
270889a533 print change 2023-04-11 17:52:40 +08:00
a72b95d2b9 Merge branch 'dev_grand' of https://github.com/binary-husky/chatgpt_academic into dev_grand 2023-04-11 17:42:59 +08:00
7167c84394 Fix source-code parsing bug 2023-04-11 17:42:56 +08:00
d587189ceb Fix bug 2023-04-11 17:40:50 +08:00
4a4fb661df Dockerfile: two-stage copy 2023-04-11 17:25:30 +08:00
a7083873c0 UI refactor 2023-04-11 15:31:46 +08:00
a13ed231d3 Dockerfile: unbuffer stdout 2023-04-11 15:30:40 +08:00
0b960df309 add arxiv dependency 2023-04-11 15:20:04 +08:00
42d366be94 Fix object-passing issue in chatbotwithcookies 2023-04-11 15:17:00 +08:00
fc331681b4 Remove obsolete functions 2023-04-11 14:45:00 +08:00
e965c36db3 Merge branch 'master' into dev_grand 2023-04-10 12:38:26 +08:00
ad208ff7cf Merge pull request #395 from HougeLangley/master
Added python-docx
2023-04-10 10:05:01 +08:00
1256387488 Added python-docx
Resolves the error prompting pip install --upgrade python-docx when batch-importing .docx files
2023-04-10 01:06:34 +08:00
88919db63e Merge pull request #388 from WangRongsheng/master
update en2ch prompt
2023-04-10 00:39:54 +08:00
bc8415b905 Translate into idiomatic Chinese 2023-04-10 00:36:05 +08:00
21a3519c50 Merge branch 'master' of https://github.com/WangRongsheng/chatgpt_academic into WangRongsheng-master 2023-04-10 00:33:16 +08:00
a3dd982159 Merge pull request #389 from Ljp66/master
Update toolbox.py
2023-04-10 00:31:24 +08:00
f38929b149 + LaTeX full-text Chinese-English translation plugin 2023-04-10 00:29:53 +08:00
c8a9069ee3 Emphasize Chinese 2023-04-10 00:21:20 +08:00
53b099e3a6 Experiment 2023-04-10 00:11:07 +08:00
0d387fa699 UP 2023-04-10 00:05:46 +08:00
acf0349215 Update prompts 2023-04-10 00:00:03 +08:00
869de46078 Auto-update program + automatic pip package installation 2023-04-09 23:56:24 +08:00
6ce9b724ec Fix prompt 2023-04-09 23:32:32 +08:00
49a6ff6a7c LaTeX full-text polishing 2023-04-09 23:28:57 +08:00
3725122de1 Add ability to enter an api-key temporarily 2023-04-09 21:23:21 +08:00
1f6defedfc UP 2023-04-09 20:50:33 +08:00
0666fec86e Extend the framework's parameter I/O 2023-04-09 20:42:23 +08:00
ea031ab05b st 2023-04-09 19:49:42 +08:00
47445fdc90 declare deprecation 2023-04-09 19:02:18 +08:00
e6cf5532a9 Improve stability 2023-04-09 18:59:43 +08:00
d741f884c5 Delete core_functional.py 2023-04-09 11:55:27 +08:00
58db0b04fa Update toolbox.py
remove duplicate "import importlib"
2023-04-09 11:55:00 +08:00
c5ce25d581 update en2ch prompt 2023-04-09 11:45:34 +08:00
53cfed89d5 update en2ch prompt 2023-04-09 11:38:37 +08:00
3592a0de11 Update version 2023-04-08 22:51:37 +08:00
91d07c329a version 2.5 2023-04-08 22:27:02 +08:00
ab373c5bf7 Move parameter position 2023-04-08 22:16:33 +08:00
f714bfc59f Fix typo 2023-04-08 22:15:33 +08:00
09ab60c46d up 2023-04-08 22:14:05 +08:00
6383113e85 Add auto-update protocol 2023-04-08 02:48:35 +08:00
d52b4d6dbb Display version 2023-04-08 02:39:54 +08:00
2f9ec385c9 Auto-update program 2023-04-08 02:38:02 +08:00
3249b31155 Multiple interfaces 2023-04-08 00:51:58 +08:00
9cd15c6f88 Move new plugins into the plugin menu 2023-04-08 00:42:54 +08:00
300484f301 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-04-08 00:41:46 +08:00
476a174320 Code highlighting toggle 2023-04-08 00:41:39 +08:00
b475a4f32a Merge pull request #366 from Hanzoe/master
new function: translate and interpret a single PDF paper
2023-04-08 00:41:03 +08:00
c07196698e Update README.md 2023-04-08 00:32:22 +08:00
e371b82ea3 Update requirements.txt with dependencies needed for code highlighting 2023-04-08 00:23:26 +08:00
3de941ee5e Fix dockerfile 2023-04-08 00:01:11 +08:00
2120c074c1 version 2.45 2023-04-07 23:58:10 +08:00
8dbae2c68a Merge branch 'master' into dev 2023-04-07 23:55:19 +08:00
50dfccc010 Add Google Scholar integration assistant 2023-04-07 23:54:24 +08:00
036bd93115 version 2.43 2023-04-07 22:08:05 +08:00
b7dca67f6e Handle network issues arising in multithreading 2023-04-07 22:06:08 +08:00
33dcbf5093 Update crazy_functional.py 2023-04-07 21:35:36 +08:00
a5785446c0 Add files via upload 2023-04-07 21:34:55 +08:00
8a83f8315b Merge pull request #1 from binary-husky/master
Single-paper translation and interpretation
2023-04-07 21:34:11 +08:00
9344c414b6 Merge branch 'dev' of github.com:binary-husky/chatgpt_academic into dev 2023-04-07 21:09:43 +08:00
042d06846b highlight 2023-04-07 21:09:37 +08:00
2a8d6e1d53 highlight 2023-04-07 21:08:34 +08:00
ce1cf491b6 Merge branch 'dev' of github.com:binary-husky/chatgpt_academic into dev 2023-04-07 21:00:32 +08:00
28a4188332 Merge branch 'master' into dev 2023-04-07 20:59:35 +08:00
e2770fe37f Code and formula highlighting 2023-04-07 20:30:30 +08:00
c2dcab0e12 Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-04-07 19:26:20 +08:00
8597dba5f2 Fix minor bug 2023-04-07 19:26:17 +08:00
1f09afcb8f Update README.md 2023-04-07 19:09:18 +08:00
533359e19f Update README.md 2023-04-07 19:08:41 +08:00
2e3f6b3126 Display list numbering correctly 2023-04-07 18:33:46 +08:00
3d3d259125 Update toolbox.py 2023-04-07 18:27:52 +08:00
fffa536303 Update README.md 2023-04-07 18:21:13 +08:00
2df4742815 Fix formula display error 2023-04-07 18:14:27 +08:00
9ca7a90590 fix equation 2023-04-07 17:55:24 +08:00
d1a18d293a Update requirements.txt 2023-04-07 12:45:47 +08:00
769f2fe7d7 Update README.md 2023-04-06 19:24:37 +08:00
991cd29395 Update README.md 2023-04-06 19:15:58 +08:00
9a12adf853 Update README.md 2023-04-06 18:55:16 +08:00
928bef8983 Update README.md 2023-04-06 18:49:49 +08:00
f14aa4818a Improve prompts 2023-04-06 18:45:24 +08:00
a4d731b190 Replace base functions 2023-04-06 18:41:04 +08:00
0079733bfd Tidy up main code 2023-04-06 18:29:49 +08:00
1055fdaab7 Fix minor issues 2023-04-06 18:26:46 +08:00
0b3f7b8821 format file 2023-04-06 18:15:11 +08:00
e8cf757dc0 Fix file display issue after completion 2023-04-06 18:13:16 +08:00
06f8094a0a fix error 2023-04-06 17:23:26 +08:00
d4ed4efa03 Merge branch 'dev_ui' of https://github.com/binary-husky/chatgpt_academic into dev_ui 2023-04-06 17:19:28 +08:00
aa7574dcec change UI 2023-04-06 17:19:25 +08:00
62a946e499 change UI 2023-04-06 17:18:30 +08:00
0b2b0a83d6 change UI 2023-04-06 17:17:31 +08:00
f8b2524aa3 Restore template functions 2023-04-06 17:15:13 +08:00
079916f56c Fix printed messages 2023-04-06 16:59:52 +08:00
1da77af2a2 update self_analysis 2023-04-06 16:33:01 +08:00
946481b774 Version 2.4 2023-04-06 16:13:56 +08:00
d32a52c8e9 End 2023-04-06 03:43:53 +08:00
85d85d850a update 2023-04-06 03:30:02 +08:00
dcaa7a1808 Rename some functions 2023-04-06 02:02:04 +08:00
785893b64f Change file naming 2023-04-05 16:19:35 +08:00
8aa2b48816 Merge remote-tracking branch 'origin/master' into dev_ui 2023-04-05 14:35:46 +08:00
3269f430ff Update README.md 2023-04-05 14:34:43 +08:00
dad6a64194 Update README.md 2023-04-05 14:09:56 +08:00
2126a5ce74 Update README.md 2023-04-05 14:09:35 +08:00
7ee257a854 Update README.md 2023-04-05 14:07:59 +08:00
ddb39453fd Handle the case where no file is returned 2023-04-05 02:15:47 +08:00
eda3c6d345 BUG FIX 2023-04-05 01:58:34 +08:00
745734b601 Improve efficiency 2023-04-05 00:25:53 +08:00
2bb1f3dd30 Merge branch 'dev_ui' of https://github.com/binary-husky/chatgpt_academic into dev_ui 2023-04-05 00:15:09 +08:00
82952882eb BUG FIX 2023-04-05 00:11:12 +08:00
971b45f332 BUG FIX 2023-04-05 00:10:06 +08:00
04504b1d99 Bug Fix: Hot Reload Wrapper For All 2023-04-05 00:09:13 +08:00
85e71f8a71 Fix parameter-input bug 2023-04-05 00:07:08 +08:00
c96a253568 Support switching among more UI layouts 2023-04-04 23:46:47 +08:00
24780ee628 merge 2023-04-04 22:56:06 +08:00
b87bfeaddb check_new_version 2023-04-04 22:54:08 +08:00
effa1421b4 Update version 2023-04-04 22:34:28 +08:00
3d95e42dc8 Update version 2023-04-04 22:20:39 +08:00
602fbb08da Update version 2023-04-04 22:20:21 +08:00
7f0393b2b0 Update version 2023-04-04 22:17:47 +08:00
79c617e437 Plan version numbering 2023-04-04 21:38:20 +08:00
e5bd6186d5 Update issue templates 2023-04-04 17:13:40 +08:00
2418c45159 Update README.md 2023-04-04 15:33:53 +08:00
3aa446cf19 Fix bug in the English code-refactoring feature 2023-04-04 15:23:42 +08:00
23c1b14ca3 Default to a dark, eye-friendly theme 2023-04-03 20:56:00 +08:00
f1b0e5f0f7 Merge pull request #290 from LiZheGuang/master
fix: 🐛 fix parsed React projects not showing in the dropdown list
2023-04-03 17:58:28 +08:00
535525e9d5 fix: 🐛 fix parsed React projects not showing in the dropdown list 2023-04-03 17:44:09 +08:00
c54be726bf Update issue templates 2023-04-03 17:00:51 +08:00
858c40b0a5 update README 2023-04-03 09:32:01 +08:00
ed33bf5a15 Update README.md 2023-04-03 01:49:40 +08:00
1b793e3397 Update README.md 2023-04-03 01:47:49 +08:00
f58f4fbbf8 Update README.md 2023-04-03 01:39:17 +08:00
b1ed86ee7d Update README.md 2023-04-03 01:38:44 +08:00
5570b94ad1 Update README.md 2023-04-03 01:03:00 +08:00
5aab515bac Update README.md 2023-04-03 01:01:57 +08:00
9b45a66137 Update README.md 2023-04-02 22:15:12 +08:00
7658842bdd Update README.md 2023-04-02 22:04:33 +08:00
a96a865265 Update config.py 2023-04-02 22:02:41 +08:00
ad48645db9 Update README.md 2023-04-02 21:51:41 +08:00
80fed7135a Update README.md 2023-04-02 21:44:44 +08:00
5e003070cd Update README.md 2023-04-02 21:41:36 +08:00
a10a23e347 Update README.md 2023-04-02 21:39:28 +08:00
c2062c05cb Update README.md 2023-04-02 21:33:09 +08:00
2dd674b0b3 Update README.md 2023-04-02 21:31:44 +08:00
88e92bfd8c Update README.md 2023-04-02 21:28:59 +08:00
872888d957 Update README.md 2023-04-02 21:27:19 +08:00
dcbfabf657 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-04-02 20:35:16 +08:00
75754718c1 #236 2023-04-02 20:35:09 +08:00
97193065f6 remove verbose print 2023-04-02 20:22:11 +08:00
a999487b8e CHATBOT_HEIGHT - 1 2023-04-02 20:20:37 +08:00
c330fa6be1 return None instead of [] when no file is concluded 2023-04-02 20:18:58 +08:00
5b9de09c11 + exception handling 2023-04-02 20:03:25 +08:00
01265c5934 Tweak theme 2023-04-02 20:02:47 +08:00
4888656a72 Update README.md 2023-04-02 16:55:18 +08:00
4556559e53 Merge branch 'CSS' of https://github.com/Keldos-Li/chatgpt_academic (#236) 2023-04-02 16:01:35 +08:00
900b752e61 Revise button tooltips 2023-04-02 15:48:54 +08:00
174146b5d7 Merge pull request #253 from RongkangXiong/dev
add crazy_functions: parse a Java project
2023-04-02 15:40:03 +08:00
3387b5acb0 Add support for Golang, Java, and other projects 2023-04-02 15:33:09 +08:00
bf3eb0bfab Add arxiv assistant plugin 2023-04-02 15:19:21 +08:00
9540cf9448 add crazy_functions: parse a React project 2023-04-02 03:07:21 +08:00
55ef4acea9 add crazy_functions: parse a Java project 2023-04-02 02:59:03 +08:00
8e0f401bf3 Merge branch 'master' into dev 2023-04-02 01:24:03 +08:00
99e13e5895 update 2023-04-02 01:23:15 +08:00
190b547373 stage llm model interface 2023-04-02 01:18:51 +08:00
eee4cb361c q 2023-04-02 00:51:17 +08:00
2420d62a33 Integrate TGUI 2023-04-02 00:40:05 +08:00
3af0bbdbe4 Successfully call more LLMs via tgui 2023-04-02 00:22:41 +08:00
bfa6661367 up 2023-04-01 23:46:32 +08:00
d79dfe2fc7 wait new pr 2023-04-01 21:56:55 +08:00
919b15b242 Rename files 2023-04-01 21:45:58 +08:00
a469d8714d fix: correct comments in the CSS to resolve list display
- also scope the CSS with .markdown-body
2023-04-01 20:34:18 +08:00
15d9d9a307 Update README.md 2023-04-01 20:21:31 +08:00
a8bd564cd1 advanced theme 2023-04-01 19:48:14 +08:00
a51cfbc625 New arxiv paper plugin 2023-04-01 19:43:56 +08:00
d10fec81fa Merge pull request #239 from ylsislove/golang-code-analysis
feat: add function to parse Golang projects
2023-04-01 19:42:06 +08:00
a0841c6e6c Update functional_crazy.py 2023-04-01 19:37:39 +08:00
594f4b24f6 feat: add function to parse Golang projects
This commit adds a new function to parse Golang projects to the collection of crazy functions.
2023-04-01 19:19:36 +08:00
629d022e8a fix bug 2023-04-01 19:07:58 +08:00
c5355a9ca4 README 2023-04-01 18:07:26 +08:00
0218efaae7 Typo in Prompt 2023-04-01 17:29:30 +08:00
1533c4b604 python3.7 compat 2023-04-01 17:11:59 +08:00
b64596de0e feat: adjust table styling 2023-04-01 16:58:51 +08:00
9752af934e feat: use CSS to polish the display of tables, lists, code blocks, and chat bubbles
Ported the CSS from 川虎ChatGPT (ChuanhuChatGPT); I wrote that CSS as well~
2023-04-01 16:51:38 +08:00
70d9300972 README up 2023-04-01 16:36:57 +08:00
47ea28693a update README 2023-04-01 16:35:45 +08:00
172eb4a977 Merge branch 'dev' 2023-04-01 16:31:57 +08:00
666a7448a0 Update README.md 2023-04-01 04:25:03 +08:00
9ce3497183 Update README.md 2023-04-01 04:19:02 +08:00
2c963cc368 Interaction improvements 2023-04-01 04:11:31 +08:00
b0dfef48e9 Move CSS styles into the theme file to reduce the line count of main.py 2023-04-01 03:39:43 +08:00
c85923b17b Merge pull request #209 from jr-shen/dev-1
(1) Revise the grammar-check prompt to ensure a consistent output format.

Previously the output often failed to bold the modified parts, or dumped whole paragraphs into the table, hurting readability. An example was therefore added on top of the existing prompt to keep the output format consistent.

(2) Add border lines inside tables to make the separation between rows/columns clearer.

Borderless tables were hard to read when cells contained a lot of text, so inner border lines were added.
2023-04-01 03:37:02 +08:00
5e8eb6253c Improve handling of token overflow 2023-04-01 03:36:05 +08:00
833d136fb9 Hide/show the function area 2023-04-01 00:21:27 +08:00
753f9d50ff A somewhat cleaner UI 2023-03-31 23:54:25 +08:00
9ad21838fe Cleaner UI 2023-03-31 23:51:17 +08:00
7b8de7884f add markdown table border line to make text boundary more clear 2023-03-31 23:40:21 +08:00
b5b0f6a3ce make grammar correction prompt more clear 2023-03-31 23:38:49 +08:00
44a605e766 Basic support for Word and PDF 2023-03-31 23:18:45 +08:00
939dfa6ac9 Merge branch 'master' into dev 2023-03-31 23:08:30 +08:00
8c3a8a2e3b Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-03-31 22:49:45 +08:00
d58802af01 Fix bug 2023-03-31 22:49:39 +08:00
9593b0d09d Improve the project self-analysis feature 2023-03-31 22:36:46 +08:00
f7d50cd9fa Update README.md 2023-03-31 21:48:45 +08:00
14a7d00037 Move functions into the calling module 2023-03-31 21:46:47 +08:00
94e75d2718 Merge pull request #204 from Eralien/dev-clean_pdf
feat: clean pdf fitz text
2023-03-31 21:42:18 +08:00
6fc2423ae3 add contributor 2023-03-31 21:41:17 +08:00
da8cb77314 Merge pull request #147 from JasonGuo1/master
feat(toolbox.py, 总结word文档.py): support extracting rar and 7z archives; Word reading
2023-03-31 21:39:05 +08:00
a87ce5bb77 JasonGuo1 2023-03-31 21:37:46 +08:00
a098d08750 Merge branch 'master' of https://github.com/JasonGuo1/chatgpt_academic into JasonGuo1-master 2023-03-31 21:31:31 +08:00
ab879ca4b7 feat: clean pdf fitz text 2023-03-31 21:26:55 +08:00
dde672c63d Merge pull request #117 from XMB-7/better_prompt
feat: better prompt
2023-03-31 21:19:25 +08:00
030bfb4568 Merge branch 'better_prompt' of https://github.com/XMB-7/chatgpt_academic into XMB-7-better_prompt 2023-03-31 21:18:28 +08:00
149ef28071 Merge pull request #174 from Euclid-Jie/Euclid_Test
feature(read pdf paper then write summary)
2023-03-31 21:06:02 +08:00
16caf34800 Integrate 2023-03-31 21:05:18 +08:00
666dde9f74 Merge branch 'dev' into Euclid-Jie-Euclid_Test 2023-03-31 21:03:43 +08:00
167be41621 Consolidate pdfminer into a single file 2023-03-31 21:03:12 +08:00
a71edeea95 Merge branch 'Euclid_Test' of https://github.com/Euclid-Jie/chatgpt_academic into Euclid-Jie-Euclid_Test 2023-03-31 20:26:59 +08:00
87c09368da Revise text 2023-03-31 20:12:27 +08:00
72f23cbbef fix import error 2023-03-31 20:05:31 +08:00
a3952be1cb Merge branch 'dev' of github.com:binary-husky/chatgpt_academic into dev 2023-03-31 20:04:11 +08:00
fa7464ae44 Add explanations to config 2023-03-31 20:02:12 +08:00
e5cc1eacd7 Merge pull request #194 from fulyaec/enhance-chataca
Revise the AUTHENTICATION check so that None/[]/"" are all handled correctly
2023-03-31 19:49:40 +08:00
60506eff9f revert toolbox 2023-03-31 19:46:01 +08:00
b655feedde Merge branch 'enhance-chataca' of https://github.com/fulyaec/chatgpt_academic into fulyaec-enhance-chataca 2023-03-31 19:45:23 +08:00
e04d57cddd Merge pull request #198 from oneLuckyman/feature-match-API_KEY
A small improvement: a more precise API_KEY validation mechanism
2023-03-31 19:28:21 +08:00
739cec9ab9 Merge remote-tracking branch 'origin/hot-reload-test' 2023-03-31 19:21:15 +08:00
0b03c797bc Use the re module's match function to validate the API_KEY more precisely 2023-03-31 17:38:39 +08:00
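The re.match-based check described above might look roughly like this minimal sketch. The "sk-" plus 48 alphanumeric characters pattern is an assumption about the then-current OpenAI key format, and the function name is illustrative, not the project's actual code:

```python
import re

def is_valid_api_key(key: str) -> bool:
    # Assumed key format: "sk-" followed by exactly 48 alphanumeric characters.
    # re.match anchors at the start of the string; "$" anchors the end.
    return bool(re.match(r"sk-[A-Za-z0-9]{48}$", key))
```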
cec44805a5 refactor and enhance 2023-03-31 16:24:40 +08:00
a88a42799f Update main.py 2023-03-31 13:33:03 +08:00
36890a14bf Update README.md 2023-03-31 13:29:37 +08:00
dca98d404b Update README.md 2023-03-31 13:11:10 +08:00
db8c8afd74 fix(".PDF" files cannot be recognized): 2023-03-31 10:26:40 +08:00
125fa7c378 fix("gbk" encode error at line 14 of 批量总结PDF文档):
Unencodable characters were causing errors; add lenient decoding to handle the raw text.
2023-03-31 10:03:10 +08:00
285fa4690c feature(read pdf paper then write summary):
add a function called readPdf to toolbox, which reads a PDF paper into a str, then uses bs4.BeautifulSoup to clean the content.
2023-03-31 00:54:01 +08:00
380bfe6984 Merge pull request #171 from RoderickChan/add-deploy-instruction
Add remote-deployment instructions to the README
2023-03-31 00:00:35 +08:00
badf4090c5 Update README.md 2023-03-30 23:59:01 +08:00
a3d179c2fa Update README.md 2023-03-30 23:34:17 +08:00
9564a5e113 Add a remote-deployment guide to the README 2023-03-30 23:31:44 +08:00
ac4fce05cf feat(总结word文档): add reading of docx and doc formats 2023-03-30 23:23:41 +08:00
fda48fd37d Add Wiki link 2023-03-30 23:09:45 +08:00
44e77dc741 feat(toolbox): fix whitespace issues 2023-03-30 20:28:15 +08:00
5d03dd37d2 Merge pull request #151 from SadPencil/patch-1
Fix a typo
2023-03-30 19:14:48 +08:00
ba0c17ba53 Self-analysis report 2023-03-30 18:21:17 +08:00
cd421d8074 Merge branch 'hot-reload-test' of https://github.com/binary-husky/chatgpt_academic into hot-reload-test 2023-03-30 18:05:03 +08:00
363e45508b Add hot-reload feature 2023-03-30 18:04:20 +08:00
b073477905 Add hot-reload feature 2023-03-30 18:01:06 +08:00
743d18cd98 Fix a typo 2023-03-30 15:55:46 +08:00
80e0c4e388 feat(toolbox): support rar and 7z extraction; tweak comments 2023-03-30 15:48:55 +08:00
6d8c8cd3f0 feat(toolbox): support rar and 7z extraction; tweak comments 2023-03-30 15:48:00 +08:00
d57d529aa1 feat(toolbox): support rar and 7z extraction; tweak comments 2023-03-30 15:47:18 +08:00
e470ee1f7f feat(toolbox): support rar and 7z extraction; tweak comments 2023-03-30 15:45:58 +08:00
a360cd7e74 feat: support rar and 7z extraction 2023-03-30 15:24:01 +08:00
16ce033d86 Update README.md 2023-03-30 14:48:46 +08:00
81b0118730 Update README.md 2023-03-30 14:48:20 +08:00
35847e4ec4 Update README.md 2023-03-30 14:47:19 +08:00
1f7f71c6b9 Update README.md 2023-03-30 14:08:24 +08:00
06a04f14d8 Update README.md 2023-03-30 13:24:30 +08:00
c21bdaae52 Merge pull request #124 from Freddd13/master
feat: add a way to use the Windows proxy from WSL2
2023-03-30 12:56:13 +08:00
44155bcc24 Strip line breaks before searching for grammar errors 2023-03-30 12:52:28 +08:00
4d02ea9863 Update the grammar-error search prompt 2023-03-30 12:16:18 +08:00
82742f3ea5 up 2023-03-30 11:51:55 +08:00
6dd83fb1b4 Standardize code formatting 2023-03-30 11:50:11 +08:00
2bf30d8a1e Change how configuration is read 2023-03-30 11:05:38 +08:00
0b1f351cba update: revise readme 2023-03-30 02:00:15 +08:00
0975b60c72 Merge remote-tracking branch 'upstream/master' 2023-03-30 01:58:39 +08:00
e3d763acff feat: support using the Windows proxy from WSL2 2023-03-30 01:47:39 +08:00
51c612920d Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-03-30 00:15:37 +08:00
bfdc2dee9a change UI layout 2023-03-30 00:15:31 +08:00
3c5122b529 Update README.md 2023-03-30 00:09:58 +08:00
97cd98d5a2 better prompt 2023-03-30 00:06:02 +08:00
bd4bf71c4b Update README.md 2023-03-29 23:53:33 +08:00
9598029620 Update README.md 2023-03-29 23:50:20 +08:00
d6e4fc27ad Update README.md 2023-03-29 23:48:58 +08:00
e4b3523947 Update README.md 2023-03-29 23:44:37 +08:00
ad75886941 Update README.md 2023-03-29 23:44:01 +08:00
83fef07f58 Merge pull request #108 from sjiang95/condainstall
readme: update
2023-03-29 23:22:44 +08:00
3134e13d87 Merge branch 'dev' 2023-03-29 23:15:29 +08:00
48cc477e48 Merge pull request #102 from ValeriaWong/master
feat(读文章写摘要): support batch reading and summarizing of PDF files #101
2023-03-29 23:14:12 +08:00
77e34565e6 change UI layout 2023-03-29 23:04:37 +08:00
dc4fe3f8c2 change ui layout 2023-03-29 23:00:16 +08:00
4698ec6b98 Merge https://github.com/ValeriaWong/chatgpt_academic into ValeriaWong-master 2023-03-29 21:49:56 +08:00
a6c4b8d764 add pip package check 2023-03-29 21:47:56 +08:00
92d4400d19 Merge branch 'master' of https://github.com/ValeriaWong/chatgpt_academic 2023-03-29 21:44:59 +08:00
11c641748f readme: update
Re-format a part of the markdown content
and add conda instructions for installation.

Signed-off-by: Shengjiang Quan <qsj287068067@126.com>
2023-03-29 22:36:15 +09:00
6867c5eed4 Merge branch 'master' of https://github.com/ValeriaWong/chatgpt_academic 2023-03-29 21:05:25 +08:00
5c6d272950 Merge branch 'binary-husky:master' into master 2023-03-29 20:57:07 +08:00
0f28564fea feat(读文章写摘要): support batch reading and summarizing of PDF files #101 2023-03-29 20:55:13 +08:00
403dd2fa59 Update main.py 2023-03-29 20:47:34 +08:00
3ac330dff1 bug quick fix 2023-03-29 20:41:07 +08:00
cbcdd39239 Merge pull request #82 from Okabe-Rintarou-0/master
Support a stop button #53
2023-03-29 20:38:06 +08:00
e79b0c0835 Don't clear the input box after submitting; add a stop key 2023-03-29 20:36:58 +08:00
730cd1e0e3 Merge branch 'master' of https://github.com/Okabe-Rintarou-0/chatgpt_academic into Okabe-Rintarou-0-master 2023-03-29 20:26:13 +08:00
c78254cd86 Merge branch 'master' of https://github.com/Okabe-Rintarou-0/chatgpt_academic into Okabe-Rintarou-0-master 2023-03-29 20:07:38 +08:00
23776b90b9 handle ip location lookup error 2023-03-29 19:37:39 +08:00
8849095776 Merge pull request #87 from Okabe-Rintarou-0/fix-markdown-display
Correctly render multi-line markdown input #84
2023-03-29 19:32:19 +08:00
8f60e962de Merge pull request #96 from eltociear/patch-1
fix typo in predict.py
2023-03-29 18:47:51 +08:00
b100680f72 Add proxy configuration instructions 2023-03-29 18:07:33 +08:00
5d22785e5a fix typo in predict.py
refleshing -> refreshing
2023-03-29 18:57:37 +09:00
3f635bc4aa feat(读文章写摘要): support batch reading and summarizing of PDF files 2023-03-29 17:57:17 +08:00
17abd29d50 error message change 2023-03-29 16:50:37 +08:00
4699395425 dev 2023-03-29 16:47:15 +08:00
33adfc35df fix: markdown display bug #84 2023-03-29 15:29:40 +08:00
4b21ebdba6 feat: support stop generate button (#53) 2023-03-29 14:53:53 +08:00
17d9a060d8 fix directory return bug 2023-03-29 14:28:57 +08:00
7d5aaa5aee update comments 2023-03-29 14:16:59 +08:00
67215bcec5 Fix variable name 2023-03-29 13:58:30 +08:00
e381dce78c Merge remote-tracking branch 'origin/test-3-29' 2023-03-29 13:44:57 +08:00
4d70bc0288 config comments 2023-03-29 13:43:07 +08:00
c90391a902 Update README.md 2023-03-29 13:38:43 +08:00
4e06c350bf Merge pull request #57 from GaiZhenbiao/master
Adding a bunch of nice-to-have features
2023-03-29 13:37:58 +08:00
3981555466 Merge branch 'test-3-29' of github.com:binary-husky/chatgpt_academic into test-3-29 2023-03-29 12:29:52 +08:00
403d66c3c2 Improve use of the Unsplash API 2023-03-29 12:28:45 +08:00
6ed2b259db Improve use of the Unsplash API 2023-03-29 12:27:47 +08:00
2bedba2e17 "On this day in history", with images 2023-03-29 12:21:47 +08:00
ebf365841e Update to a more interesting template function 2023-03-29 11:36:55 +08:00
e76d8cfbc2 [Experimental] "On this day in history" (advanced function demo) 2023-03-29 11:34:03 +08:00
71d2f01685 bug fix 2023-03-29 01:42:11 +08:00
6f1e9b63c2 change description 2023-03-29 01:39:15 +08:00
d0e3ca7671 Better multithreaded interactivity 2023-03-29 01:32:28 +08:00
61b4ea6d1b introduce project self-translation 2023-03-29 01:11:53 +08:00
1805f081d3 Add a "Reset" button; clear the input box automatically after submitting 2023-03-28 23:33:19 +08:00
17c6524b8d The temperature value range is [0, 2] 2023-03-28 23:20:54 +08:00
c7e1b86b52 Add parallel processing and access control 2023-03-28 23:17:12 +08:00
51bde973a1 simplify codes 2023-03-28 23:09:25 +08:00
954bc36d76 Improve the way to open webbrowser 2023-03-28 22:47:30 +08:00
c06c60b977 UI color customization 2023-03-28 22:35:55 +08:00
b9f1a89812 explain color and theme 2023-03-28 22:31:43 +08:00
67d1d88ebd Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-03-28 22:25:27 +08:00
9ac1068f56 remove .vscode from git 2023-03-28 22:24:59 +08:00
9a1d4a0d72 update todo 2023-03-28 22:10:22 +08:00
043a9ea068 fix unicode bug 2023-03-28 20:31:44 +08:00
256e61b64d Merge pull request #46 from mambaHu/master
Fix garbled text in Markdown analysis reports
2023-03-28 20:04:01 +08:00
b9f2792983 improve garbled-text issue by using utf8 2023-03-28 19:34:18 +08:00
28cd1dbf98 Delete jpeg-compressor.tps 2023-03-28 17:21:14 +08:00
e1ee65eb66 Delete JpegLibrary.tps 2023-03-28 17:21:06 +08:00
d19127b5a9 Delete UElibJPG.Build.cs 2023-03-28 17:20:54 +08:00
2cb1effd45 theme 2023-03-28 12:59:31 +08:00
7b48bdf880 o 2023-03-28 12:53:05 +08:00
8f7d9ad2d7 Update README.md 2023-03-28 01:36:15 +08:00
6ef2280281 Update README.md 2023-03-28 01:13:55 +08:00
d26fa46b27 http post error show 2023-03-27 18:25:07 +08:00
da19fa1992 bug fix 2023-03-27 15:16:50 +08:00
ab05cf6a01 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-03-27 15:14:12 +08:00
d08edf7801 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-03-27 15:14:05 +08:00
fab0c5dd63 Update README.md 2023-03-27 15:09:02 +08:00
cefd025700 Update README.md 2023-03-27 15:01:49 +08:00
e0aa6389cf Update README.md 2023-03-27 15:01:07 +08:00
e2618a0d3e Update README.md 2023-03-27 14:57:12 +08:00
39dde3b803 Update README.md 2023-03-27 14:56:32 +08:00
ec41c1e9f3 Update README.md 2023-03-27 14:56:20 +08:00
098eff8c68 Update README.md 2023-03-27 14:56:03 +08:00
127588c624 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-03-27 14:53:46 +08:00
e341596d4b up 2023-03-27 14:51:05 +08:00
84dd6084cf Update README.md 2023-03-27 14:47:52 +08:00
5ab4ba2db2 Update README.md 2023-03-27 14:45:32 +08:00
d9686ef25e Update README.md 2023-03-27 13:47:08 +08:00
01e42acfe4 Update README.md 2023-03-27 13:45:08 +08:00
9de97da5e3 UI change 2023-03-27 13:24:29 +08:00
81741bc3f6 file IO 2023-03-27 13:01:22 +08:00
9c5cf2b1f7 localFileToRemote 2023-03-27 11:29:11 +08:00
6bc7f95633 Merge branch 'test-3-26' 2023-03-26 20:21:39 +08:00
29c1c898ba Merge pull request #10 from ifyz/patch-1
Update main.py
2023-03-26 20:21:14 +08:00
79914bb6aa fix dockerfile 2023-03-26 20:18:55 +08:00
56bf460c97 Update main.py 2023-03-26 20:11:44 +08:00
c22b4c39a2 UI 2023-03-26 20:10:14 +08:00
6d55c4fbe1 Adjust styles 2023-03-26 20:04:59 +08:00
f76ec644bf up 2023-03-26 19:32:04 +08:00
66dfd11efe Merge branch 'ifyz-patch-1' into test-3-26 2023-03-26 19:14:49 +08:00
d4a3566b21 Merge branch 'patch-1' of https://github.com/ifyz/chatgpt_academic into ifyz-patch-1 2023-03-26 19:14:27 +08:00
f04d9755bf add comments 2023-03-26 19:13:58 +08:00
bb8b0567ac Update main.py
Use a global variable to disable Gradio's analytics, fixing the slow loading caused for users in mainland China by calls to Google Analytics.
Use local fonts, changing Gradio's default of fetching fonts from googleapis, so the landing page no longer loads slowly under mainland China network conditions.
2023-03-26 17:11:58 +08:00
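The telemetry half of the fix above can also be achieved through Gradio's analytics environment variable, set before the library is imported. This is a hedged sketch of one common approach; the commit may instead patch a Gradio global directly:

```python
import os

# Must be set before `import gradio`, otherwise the analytics request
# may already have been scheduled during import.
os.environ["GRADIO_ANALYTICS_ENABLED"] = "False"
```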
dc58745f4c Update main.py
Use local fonts, changing Gradio's default of fetching fonts from googleapis, so the landing page no longer loads slowly under mainland China network conditions.
2023-03-26 15:48:16 +08:00
6505fea0b7 Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-03-24 21:03:33 +08:00
652a153b3c Update README.md 2023-03-24 21:03:09 +08:00
877283ec05 Update README.md 2023-03-24 21:02:02 +08:00
5772fae7c5 trim button text 2023-03-24 20:56:34 +08:00
d6ced8bfac Update predict.py 2023-03-24 19:54:52 +08:00
f138b13024 Update .gitattributes 2023-03-24 19:51:52 +08:00
c31f63cf6c Update .gitattributes 2023-03-24 19:50:54 +08:00
922fdc3c50 Create .gitattributes 2023-03-24 19:49:58 +08:00
6e593fd678 Usage instructions for testing experimental features 2023-03-24 19:47:37 +08:00
57996bf005 update readme 2023-03-24 19:42:21 +08:00
7cb01f2379 Update README.md 2023-03-24 19:38:33 +08:00
1def3cecfa source 2023-03-24 19:37:47 +08:00
8f739cfcdd remote additional file 2023-03-24 19:35:13 +08:00
54cd677d27 move images 2023-03-24 19:34:21 +08:00
7f3b7221fd Update README.md 2023-03-24 19:20:43 +08:00
667cefe391 push 2023-03-24 19:10:34 +08:00
e32ae33965 Update README.md 2023-03-24 19:04:55 +08:00
1f9c90f0e0 Update README.md 2023-03-24 19:03:03 +08:00
b017a3d167 fix count down error 2023-03-24 18:53:43 +08:00
8b4b30a846 beta 2023-03-24 18:47:45 +08:00
d29f72ce10 bug fix 2023-03-24 18:08:48 +08:00
7186d9b17e Modular encapsulation 2023-03-24 18:04:59 +08:00
86924fffa5 up 2023-03-24 16:34:48 +08:00
fedc748e17 Use gpt to generate comments for itself 2023-03-24 16:25:40 +08:00
273e8f38d9 Template 2023-03-24 16:22:26 +08:00
7187f079c8 Readability+ 2023-03-24 16:17:01 +08:00
77408f795e Batch-generate function comments 2023-03-24 16:14:25 +08:00
32f36a609e Generate text report 2023-03-24 15:42:09 +08:00
93c13aa97a better traceback 2023-03-24 15:25:14 +08:00
f238a34bb0 Add ability to read LaTeX papers; add test samples 2023-03-24 14:56:57 +08:00
644c287a24 Merge remote-tracking branch 'origin/master' into test-3-24 2023-03-24 13:29:37 +08:00
abd9077cd2 Set up timely responses 2023-03-24 13:28:12 +08:00
fa22df9cb6 update 2023-03-24 13:12:25 +08:00
3f559ec4cb Update functional_crazy.py 2023-03-24 13:11:41 +08:00
c9abcef3e5 Update functional_crazy.py 2023-03-24 13:06:42 +08:00
36b3c70b25 Update functional_crazy.py 2023-03-24 13:06:34 +08:00
1ac203195f Update functional_crazy.py 2023-03-24 13:02:47 +08:00
1c937edcb1 Merge branch 'master' of github.com:binary-husky/chatgpt_academic into master 2023-03-24 11:43:24 +08:00
248e18e2ba auto retry 2023-03-24 11:42:39 +08:00
86cd069ca7 Update README.md 2023-03-24 00:43:33 +08:00
2e67b516d9 Update README.md 2023-03-23 22:15:31 +08:00
7c20e79c01 Update predict.py 2023-03-23 22:13:09 +08:00
4a2c3eec10 Update README.md 2023-03-23 17:03:08 +08:00
09ae862403 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-03-23 00:35:01 +08:00
ac2c8cab1f Display requests errors correctly 2023-03-23 00:34:55 +08:00
1f0c4b1454 Update README.md 2023-03-23 00:20:39 +08:00
b1a6cfb799 Update README.md 2023-03-23 00:15:44 +08:00
b3a67b84b9 Update README.md 2023-03-22 22:52:15 +08:00
513d62570f Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-03-22 22:42:56 +08:00
5a9aa65f49 bug fix 2023-03-22 22:42:50 +08:00
e39c511444 Program self-analysis feature 2023-03-22 22:37:14 +08:00
6781279019 Update README.md 2023-03-22 22:34:15 +08:00
3f6ddf85e9 Update README.md 2023-03-22 20:47:28 +08:00
1a301e0133 Update README.md 2023-03-22 20:06:09 +08:00
32824f3736 Update README.md 2023-03-22 20:02:03 +08:00
604ba40bdb Update README.md 2023-03-22 19:58:10 +08:00
889b719b09 Private configuration
# config_private.py holds your own secrets, such as the API key and proxy URL
# On load, first check whether a private config_private file exists (not tracked by git); if it does, it overrides the original config file
2023-03-22 19:49:45 +08:00
a5c122b309 Borrow from the github.com/GaiZhenbiao/ChuanhuChatGPT project 2023-03-22 19:47:49 +08:00
e5b7613fc5 from github.com/polarwinkel/mdtex2html 2023-03-22 19:46:08 +08:00
1c4e853484 Update README.md 2023-03-22 19:39:02 +08:00
adbcc22a64 Update README.md 2023-03-22 19:36:39 +08:00
d4434219cd Update README.md 2023-03-22 19:35:34 +08:00
3ea231ee5d Update README.md 2023-03-22 19:33:26 +08:00
2881e080f7 Fix gradio not picking up the proxy 2023-03-22 19:22:42 +08:00
a287230baa add private conf 2023-03-22 17:54:15 +08:00
37f4544e0f upload 2023-03-22 17:48:25 +08:00
2a6b17ed5e Proxy location 2023-03-22 17:45:10 +08:00
98f37e9ea7 upload 2023-03-22 17:35:23 +08:00
85ff193e53 logging 2023-03-22 17:32:48 +08:00
54914358c7 fix logging encoding 2023-03-22 17:30:30 +08:00
1fa9a79c3d add proxy debug function 2023-03-22 17:25:37 +08:00
dfa76157c8 Update README.md 2023-03-22 16:09:37 +08:00
0382ae2c72 Update predict.py 2023-03-21 21:24:38 +08:00
8ce9266733 Update README.md 2023-03-21 17:53:40 +08:00
7103ffcdf3 Update README.md 2023-03-21 17:53:04 +08:00
3a2511ec1a Update README.md 2023-03-21 15:49:52 +08:00
787b5be7af ok 2023-03-21 13:53:24 +08:00
2f94951996 Update README.md 2023-03-21 13:45:08 +08:00
5066fc8757 add deploy method for windows 2023-03-21 13:35:53 +08:00
4afc7b3dda Update README.md 2023-03-20 18:56:09 +08:00
1faffeca49 Update README.md 2023-03-20 18:55:06 +08:00
5092e710c5 readme 2023-03-20 18:39:48 +08:00
20 changed files with 619 additions and 189 deletions


@@ -7,11 +7,17 @@ assignees: ''
---
-**Describe the bug 简述**
+- **(1) Describe the bug 简述**
-**Screen Shot 截图**
+- **(2) Screen Shot 截图**
-**Terminal Traceback 终端traceback如果有**
+- **(3) Terminal Traceback 终端traceback如有**
+- **(4) Material to Help Reproduce Bugs 帮助我们复现的测试材料样本(如有)**
Before submitting an issue 提交issue之前


@@ -1,50 +0,0 @@
-# How to build | 如何构建: docker build -t gpt-academic --network=host -f Dockerfile+ChatGLM .
-# How to run | 如何运行 (1) 直接运行: docker run --rm -it --net=host --gpus=all gpt-academic
-# How to run | 如何运行 (2) 我想运行之前进容器做一些调整: docker run --rm -it --net=host --gpus=all gpt-academic bash
-# 从NVIDIA源从而支持显卡运损检查宿主的nvidia-smi中的cuda版本必须>=11.3
-FROM nvidia/cuda:11.3.1-runtime-ubuntu20.04
-ARG useProxyNetwork=''
-RUN apt-get update
-RUN apt-get install -y curl proxychains curl
-RUN apt-get install -y git python python3 python-dev python3-dev --fix-missing
-# 配置代理网络构建Docker镜像时使用
-# # comment out below if you do not need proxy network | 如果不需要翻墙 - 从此行向下删除
-RUN $useProxyNetwork curl cip.cc
-RUN sed -i '$ d' /etc/proxychains.conf
-RUN sed -i '$ d' /etc/proxychains.conf
-RUN echo "socks5 127.0.0.1 10880" >> /etc/proxychains.conf
-ARG useProxyNetwork=proxychains
-# # comment out above if you do not need proxy network | 如果不需要翻墙 - 从此行向上删除
-# use python3 as the system default python
-RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
-# 下载分支
-WORKDIR /gpt
-RUN $useProxyNetwork git clone https://github.com/binary-husky/chatgpt_academic.git -b v3.0
-WORKDIR /gpt/chatgpt_academic
-RUN $useProxyNetwork python3 -m pip install -r requirements.txt
-RUN $useProxyNetwork python3 -m pip install -r request_llm/requirements_chatglm.txt
-RUN $useProxyNetwork python3 -m pip install torch --extra-index-url https://download.pytorch.org/whl/cu113
-# 预热CHATGLM参数非必要 可选步骤)
-RUN echo ' \n\
-from transformers import AutoModel, AutoTokenizer \n\
-chatglm_tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True) \n\
-chatglm_model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float() ' >> warm_up_chatglm.py
-RUN python3 -u warm_up_chatglm.py
-RUN $useProxyNetwork git pull
-# 为chatgpt-academic配置代理和API-KEY (非必要 可选步骤)
-RUN echo ' \n\
-API_KEY = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \n\
-USE_PROXY = True \n\
-LLM_MODEL = "chatglm" \n\
-LOCAL_MODEL_DEVICE = "cuda" \n\
-proxies = { "http": "socks5h://localhost:10880", "https": "socks5h://localhost:10880", } ' >> config_private.py
-# 启动
-CMD ["python3", "-u", "main.py"]


@@ -1,8 +1,8 @@
# ChatGPT 学术优化
-**如果喜欢这个项目请给它一个Star如果你发明了更好用的快捷键或函数插件欢迎发issue或者pull requestsdev分支**
+**如果喜欢这个项目请给它一个Star如果你发明了更好用的快捷键或函数插件欢迎发issue或者pull requests**
-If you like this project, please give it a Star. If you've come up with more useful academic shortcuts or functional plugins, feel free to open an issue or pull request to `dev` branch.
+If you like this project, please give it a Star. If you've come up with more useful academic shortcuts or functional plugins, feel free to open an issue or pull request. We also have a [README in English](img/README_EN.md) translated by this project itself.
> **Note**
>
@@ -10,8 +10,6 @@ If you like this project, please give it a Star. If you've come up with more use
>
> 2.本项目中每个文件的功能都在自译解[`self_analysis.md`](https://github.com/binary-husky/chatgpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)详细说明。随着版本的迭代您也可以随时自行点击相关函数插件调用GPT重新生成项目的自我解析报告。常见问题汇总在[`wiki`](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98)当中。
>
-> 3.如果您不太习惯部分中文命名的函数、注释或者界面您可以随时点击相关函数插件调用ChatGPT一键生成纯英文的项目源代码。
->
<div align="center">
@@ -25,48 +23,39 @@ If you like this project, please give it a Star. If you've come up with more use
[配置代理服务器](https://www.bilibili.com/video/BV1rc411W7Dr) | 支持配置代理服务器
模块化设计 | 支持自定义高阶的实验性功能与[函数插件],插件支持[热更新](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)
[自我程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] [一键读懂](https://github.com/binary-husky/chatgpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)本项目的源代码
-[程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] 一键可以剖析其他Python/C/C++/Java/Golang/Lua/Rect项目树
+[程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] 一键可以剖析其他Python/C/C++/Java/Lua/...项目树
读论文 | [函数插件] 一键解读latex论文全文并生成摘要
Latex全文翻译、润色 | [函数插件] 一键翻译或润色latex论文
批量注释生成 | [函数插件] 一键批量生成函数注释
chat分析报告生成 | [函数插件] 运行后自动生成总结汇报
[arxiv小助手](https://www.bilibili.com/video/BV1LM4y1279X) | [函数插件] 输入arxiv文章url即可一键翻译摘要+下载PDF
[PDF论文全文翻译功能](https://www.bilibili.com/video/BV1KT411x7Wn) | [函数插件] PDF论文提取题目&摘要+翻译全文(多线程)
-[谷歌学术统合小助手](https://www.bilibili.com/video/BV19L411U7ia) (Version>=2.45) | [函数插件] 给定任意谷歌学术搜索页面URL让gpt帮你选择有趣的文章
+[谷歌学术统合小助手](https://www.bilibili.com/video/BV19L411U7ia) | [函数插件] 给定任意谷歌学术搜索页面URL让gpt帮你选择有趣的文章
-公式显示 | 可以同时显示公式的tex形式和渲染形式
+公式/图片/表格显示 | 可以同时显示公式的tex形式和渲染形式,支持公式、代码高亮
-图片显示 | 可以在markdown中显示图片
多线程函数插件支持 | 支持多线调用chatgpt一键处理海量文本或程序
-支持GPT输出的markdown表格 | 可以输出支持GPT的markdown表格
启动暗色gradio[主题](https://github.com/binary-husky/chatgpt_academic/issues/173) | 在浏览器url后面添加```/?__dark-theme=true```可以切换dark主题
-huggingface免科学上网[在线体验](https://huggingface.co/spaces/qingxu98/gpt-academic) | 登陆huggingface后复制[此空间](https://huggingface.co/spaces/qingxu98/gpt-academic)
-[多LLM模型](https://www.bilibili.com/video/BV1EM411K7VH/)混合支持([v3.0分支](https://github.com/binary-husky/chatgpt_academic/tree/v3.0)测试中) | 同时被ChatGPT和[清华ChatGLM](https://github.com/THUDM/ChatGLM-6B)伺候的感觉一定会很不错吧?
+[多LLM模型](https://www.bilibili.com/video/BV1EM411K7VH/)支持([v3.0分支](https://github.com/binary-husky/chatgpt_academic/tree/v3.0) | 同时被ChatGPT和[清华ChatGLM](https://github.com/THUDM/ChatGLM-6B)伺候的感觉一定会很不错吧?
兼容[TGUI](https://github.com/oobabooga/text-generation-webui)接入更多样的语言模型 | 接入opt-1.3b, galactica-1.3b等模型([v3.0分支](https://github.com/binary-husky/chatgpt_academic/tree/v3.0)测试中)
+huggingface免科学上网[在线体验](https://huggingface.co/spaces/qingxu98/gpt-academic) | 登陆huggingface后复制[此空间](https://huggingface.co/spaces/qingxu98/gpt-academic)
…… | ……
</div>
-<!-- - 新界面master主分支, 右dev开发前沿 -->
- 新界面修改config.py中的LAYOUT选项即可实现“左右布局”和“上下布局”的切换
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/230361456-61078362-a966-4eb5-b49e-3c62ef18b860.gif" width="700" >
</div>
- 所有按钮都通过读取functional.py动态生成可随意加自定义功能解放粘贴板
<div align="center">
-<img src="img/公式.gif" width="700" >
+<img src="https://user-images.githubusercontent.com/96192199/231975334-b4788e91-4887-412f-8b43-2b9c5f41d248.gif" width="700" >
</div>
- 润色/纠错
<div align="center">
-<img src="img/润色.gif" width="700" >
+<img src="https://user-images.githubusercontent.com/96192199/231980294-f374bdcb-3309-4560-b424-38ef39f04ebd.gif" width="700" >
</div>
-- 支持GPT输出的markdown表格
-<div align="center">
-<img src="img/demo2.jpg" width="500" >
-</div>
- 如果输出包含公式会同时以tex形式和渲染形式显示方便复制和阅读
@@ -74,15 +63,12 @@ huggingface免科学上网[在线体验](https://huggingface.co/spaces/qingxu98/
<img src="https://user-images.githubusercontent.com/96192199/230598842-1d7fcddd-815d-40ee-af60-baf488a199df.png" width="700" >
</div>
- 懒得看项目代码整个工程直接给chatgpt炫嘴里
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/226935232-6b6a73ce-8900-4aee-93f9-733c7e6fef53.png" width="700" >
</div>
- 多种大语言模型混合调用([v3.0分支](https://github.com/binary-husky/chatgpt_academic/tree/v3.0)测试中)
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/231222778-34776885-a7f0-4f2c-b5f4-7cc2ef3ecb58.png" width="700" >
</div>
@@ -170,13 +156,13 @@ input区域 输入 ./crazy_functions/test_project/python/dqn 然后点击 "[
```
## 其他部署方式
+- 远程云服务器部署
+请访问[部署wiki-2](https://github.com/binary-husky/chatgpt_academic/wiki/%E4%BA%91%E6%9C%8D%E5%8A%A1%E5%99%A8%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E6%8C%87%E5%8D%97)
- 使用WSL2Windows Subsystem for Linux 子系统)
请访问[部署wiki-1](https://github.com/binary-husky/chatgpt_academic/wiki/%E4%BD%BF%E7%94%A8WSL2%EF%BC%88Windows-Subsystem-for-Linux-%E5%AD%90%E7%B3%BB%E7%BB%9F%EF%BC%89%E9%83%A8%E7%BD%B2)
-- nginx远程部署
-请访问[部署wiki-2](https://github.com/binary-husky/chatgpt_academic/wiki/%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E7%9A%84%E6%8C%87%E5%AF%BC)
## 自定义新的便捷按钮(学术快捷键自定义)
打开functional.py添加条目如下然后重启程序即可。如果按钮已经添加成功并可见那么前缀、后缀都支持热修改无需重启程序即可生效。
例如
@@ -269,13 +255,9 @@ python check_proxy.py
## Todo 与 版本规划:
-- version 3 (Todo):
-- - 支持gpt4和其他更多llm
-- version 2.4+ (Todo):
-- - 总结大工程源代码时文本过长、token溢出的问题
-- - 实现项目打包部署
-- - 函数插件参数接口优化
-- - 自更新
+- version 3.0 (Todo): 优化对chatglm和其他小型llm的支持
+- version 2.6: 重构了插件结构,提高了交互性,加入更多插件
+- version 2.5: 自更新解决总结大工程源代码时文本过长、token溢出的问题
- version 2.4: (1)新增PDF全文翻译功能; (2)新增输入区切换位置的功能; (3)新增垂直布局选项; (4)多线程函数插件优化。
- version 2.3: 增强多线程交互性
- version 2.2: 函数插件支持热重载


@@ -19,6 +19,11 @@ if USE_PROXY:
else:
proxies = None
+# 多线程函数插件中默认允许多少路线程同时访问OpenAI。
+# Free trial users的限制是每分钟3次Pay-as-you-go users的限制是每分钟3500次。提高限制请查询
+# https://platform.openai.com/docs/guides/rate-limits/overview
+DEFAULT_WORKER_NUM = 3
# [step 3]>> 以下配置可以优化体验,但大部分场合下并不需要修改
# 对话窗的高度
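The clamping logic that consumes `DEFAULT_WORKER_NUM` appears further down in this diff (in crazy_utils.py): the configured value is used only when it is sane, otherwise a safe default of 8 applies. That rule in isolation, with `resolve_worker_num` as a hypothetical helper name:

```python
def resolve_worker_num(configured=None, fallback=8):
    """Sketch of the fallback rule: accept DEFAULT_WORKER_NUM from config.py
    only when it lies in 1..19; otherwise fall back to a safe default."""
    if configured is None:  # config entry missing or unreadable
        return fallback
    if configured <= 0 or configured >= 20:  # out of the sane range
        return fallback
    return configured
```

Capping the pool size this way keeps a misconfigured value from hammering the OpenAI API past its per-minute rate limit.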


@@ -16,7 +16,7 @@ def get_crazy_functions():
from crazy_functions.高级功能函数模板 import 高阶功能模板函数
from crazy_functions.代码重写为全英文_多线程 import 全项目切换英文
from crazy_functions.Latex全文润色 import Latex英文润色
+from crazy_functions.解析项目源代码 import 解析一个Lua项目
function_plugins = {
"解析整个Python项目": {
@@ -47,6 +47,11 @@ def get_crazy_functions():
"AsButton": False, # 加入下拉菜单中
"Function": HotReload(解析一个Rect项目)
},
+"解析整个Lua项目": {
+"Color": "stop", # 按钮颜色
+"AsButton": False, # 加入下拉菜单中
+"Function": HotReload(解析一个Lua项目)
+},
"读Tex论文写摘要": {
"Color": "stop", # 按钮颜色
"Function": HotReload(读文章写摘要)
@@ -80,6 +85,8 @@ def get_crazy_functions():
from crazy_functions.Latex全文润色 import Latex中文润色
from crazy_functions.Latex全文翻译 import Latex中译英
from crazy_functions.Latex全文翻译 import Latex英译中
+from crazy_functions.批量Markdown翻译 import Markdown中译英
+from crazy_functions.批量Markdown翻译 import Markdown英译中
function_plugins.update({
"批量翻译PDF文档多线程": {
@@ -137,7 +144,18 @@ def get_crazy_functions():
"AsButton": False, # 加入下拉菜单中
"Function": HotReload(Latex英译中)
},
+"[测试功能] 批量Markdown中译英输入路径或上传压缩包": {
+# HotReload 的意思是热更新,修改函数插件代码后,不需要重启程序,代码直接生效
+"Color": "stop",
+"AsButton": False, # 加入下拉菜单中
+"Function": HotReload(Markdown中译英)
+},
+"[测试功能] 批量Markdown英译中输入路径或上传压缩包": {
+# HotReload 的意思是热更新,修改函数插件代码后,不需要重启程序,代码直接生效
+"Color": "stop",
+"AsButton": False, # 加入下拉菜单中
+"Function": HotReload(Markdown英译中)
+},
})
@@ -156,14 +174,7 @@ def get_crazy_functions():
except Exception as err:
print(f'[下载arxiv论文并翻译摘要] 插件导入失败 {str(err)}')
-from crazy_functions.解析项目源代码 import 解析一个Lua项目
-function_plugins.update({
-"解析整个Lua项目": {
-"Color": "stop", # 按钮颜色
-"AsButton": False, # 加入下拉菜单中
-"Function": HotReload(解析一个Lua项目)
-},
-})
###################### 第n组插件 ###########################
return function_plugins


@@ -14,7 +14,7 @@ class PaperFileGroup():
import tiktoken
from toolbox import get_conf
enc = tiktoken.encoding_for_model(*get_conf('LLM_MODEL'))
-def get_token_num(txt): return len(enc.encode(txt))
+def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
self.get_token_num = get_token_num
def run_file_split(self, max_token_limit=1900):
@@ -92,7 +92,7 @@ def 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
chatbot=chatbot,
history_array=[[""] for _ in range(n_split)],
sys_prompt_array=sys_prompt_array,
-max_workers=10, # OpenAI所允许的最大并行过载
+# max_workers=5, # 并行任务数量限制最多同时执行5个其他的排队等待
scroller_max_len = 80
)
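The `disallowed_special=()` change above matters because tiktoken's `encode` raises by default when the input contains a special-token string such as `<|endoftext|>`, which can legitimately appear in arbitrary user files being polished or translated. A toy byte-level stand-in (not real tiktoken) illustrating that contract:

```python
def toy_encode(text, disallowed_special="all"):
    """Toy stand-in for tiktoken's encode(): by default refuse text that
    contains a special-token marker; disallowed_special=() permits it and
    encodes the marker as ordinary text."""
    special = ("<|endoftext|>",)
    if disallowed_special == "all":
        for tok in special:
            if tok in text:
                raise ValueError(f"special token {tok!r} found in text")
    return list(text.encode("utf-8"))  # toy "tokenization": raw bytes
```

Passing `disallowed_special=()` is the safe choice here: the plugin counts tokens in untrusted documents, so a stray marker string should never crash the whole batch.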


@@ -14,7 +14,7 @@ class PaperFileGroup():
import tiktoken
from toolbox import get_conf
enc = tiktoken.encoding_for_model(*get_conf('LLM_MODEL'))
-def get_token_num(txt): return len(enc.encode(txt))
+def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
self.get_token_num = get_token_num
def run_file_split(self, max_token_limit=1900):
@@ -80,7 +80,7 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
elif language == 'zh->en':
inputs_array = [f"Below is a section from a Chinese academic paper, translate it into English, do not modify any latex command such as \section, \cite and equations:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
-inputs_show_user_array = [f"润色 {f}" for f in pfg.sp_file_tag]
+inputs_show_user_array = [f"翻译 {f}" for f in pfg.sp_file_tag]
sys_prompt_array = ["You are a professional academic paper translator." for _ in range(n_split)]
gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
@@ -90,7 +90,7 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
chatbot=chatbot,
history_array=[[""] for _ in range(n_split)],
sys_prompt_array=sys_prompt_array,
-max_workers=10, # OpenAI所允许的最大并行过载
+# max_workers=5, # OpenAI所允许的最大并行过载
scroller_max_len = 80
)


@@ -1,12 +1,11 @@
import traceback
-from toolbox import update_ui
+from toolbox import update_ui, get_conf
def input_clipping(inputs, history, max_token_limit):
import tiktoken
import numpy as np
-from toolbox import get_conf
enc = tiktoken.encoding_for_model(*get_conf('LLM_MODEL'))
-def get_token_num(txt): return len(enc.encode(txt))
+def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
mode = 'input-and-history'
# 当 输入部分的token占比 小于 全文的一半时,只裁剪历史
@@ -23,7 +22,7 @@ def input_clipping(inputs, history, max_token_limit):
while n_token > max_token_limit:
where = np.argmax(everything_token)
-encoded = enc.encode(everything[where])
+encoded = enc.encode(everything[where], disallowed_special=())
clipped_encoded = encoded[:len(encoded)-delta]
everything[where] = enc.decode(clipped_encoded)[:-1] # -1 to remove the may-be illegal char
everything_token[where] = get_token_num(everything[where])
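The clipping loop above always trims the currently largest segment until the total fits the token budget. A simplified sketch of that strategy, using character count as a stand-in token measure (the real code works on tiktoken ids and decodes back to text); `clip_to_limit` is a hypothetical name:

```python
def clip_to_limit(pieces, max_tokens, get_token_num, step=4):
    """Repeatedly shorten the largest piece until the total fits the budget.
    Assumes max_tokens >= 0 and that trimming eventually converges."""
    tokens = [get_token_num(p) for p in pieces]
    while sum(tokens) > max_tokens:
        where = max(range(len(pieces)), key=tokens.__getitem__)  # largest piece
        pieces[where] = pieces[where][:-step]  # drop `step` trailing chars
        tokens[where] = get_token_num(pieces[where])
    return pieces

clipped = clip_to_limit(["a" * 10, "bbb"], max_tokens=7, get_token_num=len)
```

Trimming the largest piece first preserves the short, usually more information-dense segments (such as the user's direct question) at the expense of long history.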
@@ -65,7 +64,6 @@ def request_gpt_model_in_new_thread_with_ui_alive(
from request_llm.bridge_chatgpt import predict_no_ui_long_connection
# 用户反馈
chatbot.append([inputs_show_user, ""])
-msg = '正常'
yield from update_ui(chatbot=chatbot, history=[]) # 刷新界面
executor = ThreadPoolExecutor(max_workers=16)
mutable = ["", time.time()]
@@ -73,6 +71,9 @@ def request_gpt_model_in_new_thread_with_ui_alive(
retry_op = retry_times_at_unknown_error
exceeded_cnt = 0
while True:
+# watchdog error
+if len(mutable) >= 2 and (time.time()-mutable[1]) > 5:
+raise RuntimeError("检测到程序终止。")
try:
# 【第一种情况】:顺利完成
result = predict_no_ui_long_connection(
@@ -99,16 +100,20 @@ def request_gpt_model_in_new_thread_with_ui_alive(
except:
# 【第三种情况】:其他错误:重试几次
tb_str = '```\n' + traceback.format_exc() + '```'
+print(tb_str)
mutable[0] += f"[Local Message] 警告,在执行过程中遭遇问题, Traceback\n\n{tb_str}\n\n"
if retry_op > 0:
retry_op -= 1
-mutable[0] += f"[Local Message] 重试中 {retry_times_at_unknown_error-retry_op}/{retry_times_at_unknown_error}\n\n"
+mutable[0] += f"[Local Message] 重试中,请稍等 {retry_times_at_unknown_error-retry_op}/{retry_times_at_unknown_error}\n\n"
+if "Rate limit reached" in tb_str:
+time.sleep(30)
time.sleep(5)
continue # 返回重试
else:
time.sleep(5)
return mutable[0] # 放弃
+# 提交任务
future = executor.submit(_req_gpt, inputs, history, sys_prompt)
while True:
# yield一次以刷新前端页面
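The `watchdog error` lines added above implement a keep-alive check: the UI loop refreshes the timestamp in `mutable[1]` on every repaint, and the worker aborts if that timestamp goes stale (the real timeout is 5 seconds). A minimal self-contained sketch of the pattern, with shortened timings and hypothetical names:

```python
import threading
import time

def watched_worker(mutable, job, timeout=0.1, tick=0.02, result=None):
    """Run `job` in a loop; abort when the caller stops feeding mutable[1]
    (the watchdog timestamp). `result` collects the outcome."""
    result = result if result is not None else {}
    while True:
        if time.time() - mutable[1] > timeout:  # watchdog fired
            result["error"] = "watchdog: program termination detected"
            return result
        if job():  # job signals completion by returning True
            result["ok"] = True
            return result
        time.sleep(tick)

mutable = ["", time.time()]  # [partial output, last feed timestamp]
out = {}
t = threading.Thread(target=watched_worker, args=(mutable, lambda: False, 0.1, 0.02, out))
t.start()
t.join(2)  # we never refresh mutable[1], so the watchdog fires
```

This lets a long-running request thread die cleanly when the front end (which normally keeps feeding the timestamp) has gone away, instead of leaking threads.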
@@ -129,7 +134,7 @@ def request_gpt_model_in_new_thread_with_ui_alive(
def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
inputs_array, inputs_show_user_array, llm_kwargs,
chatbot, history_array, sys_prompt_array,
-refresh_interval=0.2, max_workers=10, scroller_max_len=30,
+refresh_interval=0.2, max_workers=-1, scroller_max_len=30,
handle_token_exceed=True, show_user_at_complete=False,
retry_times_at_unknown_error=2,
):
@@ -150,7 +155,7 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
history_array (list): List of chat history (历史对话输入,双层列表,第一层列表是子任务分解,第二层列表是对话历史)
sys_prompt_array (list): List of system prompts 系统输入列表用于输入给GPT的前提提示比如你是翻译官怎样怎样
refresh_interval (float, optional): Refresh interval for UI (default: 0.2) 刷新时间间隔频率建议低于1不可高于3仅仅服务于视觉效果
-max_workers (int, optional): Maximum number of threads (default: 10) 最大线程数如果子任务非常多需要用此选项防止高频地请求openai导致错误
+max_workers (int, optional): Maximum number of threads (default: see config.py) 最大线程数如果子任务非常多需要用此选项防止高频地请求openai导致错误
scroller_max_len (int, optional): Maximum length for scroller (default: 30)(数据流的显示最后收到的多少个字符,仅仅服务于视觉效果)
handle_token_exceed (bool, optional): (是否在输入过长时,自动缩减文本)
handle_token_exceed是否自动处理token溢出的情况如果选择自动处理则会在溢出时暴力截断默认开启
@@ -165,21 +170,28 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
from request_llm.bridge_chatgpt import predict_no_ui_long_connection
assert len(inputs_array) == len(history_array)
assert len(inputs_array) == len(sys_prompt_array)
+if max_workers == -1: # 读取配置文件
+try: max_workers, = get_conf('DEFAULT_WORKER_NUM')
+except: max_workers = 8
+if max_workers <= 0 or max_workers >= 20: max_workers = 8
executor = ThreadPoolExecutor(max_workers=max_workers)
n_frag = len(inputs_array)
# 用户反馈
chatbot.append(["请开始多线程操作。", ""])
-msg = '正常'
yield from update_ui(chatbot=chatbot, history=[]) # 刷新界面
-# 异步原子
+# 跨线程传递
mutable = [["", time.time(), "等待中"] for _ in range(n_frag)]
+# 子线程任务
def _req_gpt(index, inputs, history, sys_prompt):
gpt_say = ""
retry_op = retry_times_at_unknown_error
exceeded_cnt = 0
mutable[index][2] = "执行中"
while True:
+# watchdog error
+if len(mutable[index]) >= 2 and (time.time()-mutable[index][1]) > 5:
+raise RuntimeError("检测到程序终止。")
try:
# 【第一种情况】:顺利完成
# time.sleep(10); raise RuntimeError("测试")
@@ -212,13 +224,21 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
except:
# 【第三种情况】:其他错误
tb_str = '```\n' + traceback.format_exc() + '```'
+print(tb_str)
gpt_say += f"[Local Message] 警告,线程{index}在执行过程中遭遇问题, Traceback\n\n{tb_str}\n\n"
if len(mutable[index][0]) > 0: gpt_say += "此线程失败前收到的回答:\n\n" + mutable[index][0]
if retry_op > 0:
retry_op -= 1
wait = random.randint(5, 20)
-for i in range(wait):# 也许等待十几秒后,情况会好转
-mutable[index][2] = f"等待重试 {wait-i}"; time.sleep(1)
+if "Rate limit reached" in tb_str:
+wait = wait * 3
+fail_info = "OpenAI请求速率限制 "
+else:
+fail_info = ""
+# 也许等待十几秒后,情况会好转
+for i in range(wait):
+mutable[index][2] = f"{fail_info}等待重试 {wait-i}"; time.sleep(1)
+# 开始重试
mutable[index][2] = f"重试中 {retry_times_at_unknown_error-retry_op}/{retry_times_at_unknown_error}"
continue # 返回重试
else:
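The retry branch above randomizes the wait between 5 and 20 seconds and triples it when the traceback mentions OpenAI's rate limit. That backoff rule in isolation (`next_wait` is a hypothetical helper name, not the project's API):

```python
import random

def next_wait(tb_str, lo=5, hi=20):
    """Return (seconds_to_wait, status_prefix) following the rule in the
    diff: a random 5-20 s wait normally, tripled on a rate-limit error."""
    wait = random.randint(lo, hi)
    if "Rate limit reached" in tb_str:
        return wait * 3, "OpenAI rate limit reached, "
    return wait, ""
```

Randomizing the delay spreads out retries from many worker threads, and the tripled wait matches OpenAI's much stricter free-tier request quota.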
@@ -241,7 +261,6 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
break
# 更好的UI视觉效果
observe_win = []
-# print([mutable[thread_index][2] for thread_index, _ in enumerate(worker_done)])
# 每个线程都要“喂狗”(看门狗)
for thread_index, _ in enumerate(worker_done):
mutable[thread_index][1] = time.time()
@@ -251,49 +270,30 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
replace('\n', '').replace('```', '...').replace(
' ', '.').replace('<br/>', '.....').replace('$', '.')+"`... ]"
observe_win.append(print_something_really_funny)
-# 在前端打印些好玩的东西
stat_str = ''.join([f'`{mutable[thread_index][2]}`: {obs}\n\n'
if not done else f'`{mutable[thread_index][2]}`\n\n'
for thread_index, done, obs in zip(range(len(worker_done)), worker_done, observe_win)])
+# 在前端打印些好玩的东西
chatbot[-1] = [chatbot[-1][0], f'多线程操作已经开始,完成情况: \n\n{stat_str}' + ''.join(['.']*(cnt % 10+1))]
-msg = "正常"
yield from update_ui(chatbot=chatbot, history=[]) # 刷新界面
# 异步任务结束
gpt_response_collection = []
for inputs_show_user, f in zip(inputs_show_user_array, futures):
gpt_res = f.result()
gpt_response_collection.extend([inputs_show_user, gpt_res])
+# 是否在结束时,在界面上显示结果
if show_user_at_complete:
for inputs_show_user, f in zip(inputs_show_user_array, futures):
gpt_res = f.result()
chatbot.append([inputs_show_user, gpt_res])
yield from update_ui(chatbot=chatbot, history=[]) # 刷新界面
-time.sleep(1)
+time.sleep(0.3)
return gpt_response_collection
-def WithRetry(f):
-"""
-装饰器函数,用于自动重试。
-"""
-def decorated(retry, res_when_fail, *args, **kwargs):
-assert retry >= 0
-while True:
-try:
-res = yield from f(*args, **kwargs)
-return res
-except:
-retry -= 1
-if retry<0:
-print("达到最大重试次数")
-break
-else:
-print("重试中……")
-continue
-return res_when_fail
-return decorated
def breakdown_txt_to_satisfy_token_limit(txt, get_token_fn, limit):
def cut(txt_tocut, must_break_at_empty_line): # 递归
if get_token_fn(txt_tocut) <= limit:
@@ -312,7 +312,6 @@ def breakdown_txt_to_satisfy_token_limit(txt, get_token_fn, limit):
if get_token_fn(prev) < limit:
break
if cnt == 0:
-print('what the fuck ?')
raise RuntimeError("存在一行极长的文本!")
# print(len(post))
# 列表递归接龙
@@ -325,8 +324,18 @@ def breakdown_txt_to_satisfy_token_limit(txt, get_token_fn, limit):
return cut(txt, must_break_at_empty_line=False)
def force_breakdown(txt, limit, get_token_fn):
    """
    When the text cannot be split at punctuation or empty lines, fall back to brute-force cutting.
    """
    for i in reversed(range(len(txt))):
        if get_token_fn(txt[:i]) < limit:
            return txt[:i], txt[i:]
    return "Tiktoken未知错误", "Tiktoken未知错误"
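`force_breakdown` scans backwards for the longest prefix that fits under the limit. A quick check, with plain character count standing in for the real tiktoken counter:

```python
def force_breakdown(txt, limit, get_token_fn):
    # same brute-force cut as above: longest prefix whose token count is under the limit
    for i in reversed(range(len(txt))):
        if get_token_fn(txt[:i]) < limit:
            return txt[:i], txt[i:]
    return "Tiktoken未知错误", "Tiktoken未知错误"

# character count stands in for a real tokenizer here
prev, post = force_breakdown("abcdefghij", limit=4, get_token_fn=len)
# prev == "abc", post == "defghij"
```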
def breakdown_txt_to_satisfy_token_limit_for_pdf(txt, get_token_fn, limit):
    # recursive
    def cut(txt_tocut, must_break_at_empty_line, break_anyway=False):
        if get_token_fn(txt_tocut) <= limit:
            return [txt_tocut]
        else:
@@ -338,28 +347,40 @@ def breakdown_txt_to_satisfy_token_limit_for_pdf(txt, get_token_fn, limit):
                if must_break_at_empty_line:
                    if lines[cnt] != "":
                        continue
                prev = "\n".join(lines[:cnt])
                post = "\n".join(lines[cnt:])
                if get_token_fn(prev) < limit:
                    break
            if cnt == 0:
                if break_anyway:
                    prev, post = force_breakdown(txt_tocut, limit, get_token_fn)
                else:
                    raise RuntimeError(f"存在一行极长的文本!{txt_tocut}")
            # print(len(post))
            # recursively chain the list
            result = [prev]
            result.extend(cut(post, must_break_at_empty_line, break_anyway=break_anyway))
            return result
    try:
        # attempt 1: use double blank lines ("\n\n") as the split point
        return cut(txt, must_break_at_empty_line=True)
    except RuntimeError:
        try:
            # attempt 2: use single newlines ("\n") as the split point
            return cut(txt, must_break_at_empty_line=False)
        except RuntimeError:
            try:
                # attempt 3: use English periods (".") as the split point
                res = cut(txt.replace('.', '。\n'), must_break_at_empty_line=False)  # the Chinese period here is deliberate; it serves purely as a marker
                return [r.replace('。\n', '.') for r in res]
            except RuntimeError as e:
                try:
                    # attempt 4: use Chinese periods ("。") as the split point
                    res = cut(txt.replace('。', '。。\n'), must_break_at_empty_line=False)
                    return [r.replace('。。\n', '。') for r in res]
                except RuntimeError as e:
                    # attempt 5: nothing else works; cut anywhere as a last resort
                    return cut(txt, must_break_at_empty_line=False, break_anyway=True)
@@ -387,12 +408,15 @@ def read_and_clean_pdf_text(fp):
    import re
    import numpy as np
    from colorful import print亮黄, print亮绿
    fc = 0  # index 0: text
    fs = 1  # index 1: font
    fb = 2  # index 2: bounding box
    REMOVE_FOOT_NOTE = True  # whether to discard non-body content (set in a smaller font than the body, e.g. references, footnotes, figure captions)
    REMOVE_FOOT_FFSIZE_PERCENT = 0.95  # anything below this fraction of the body font size is judged non-body (some papers' body font is not 100% uniform, with tiny variations invisible to the naked eye)
    def primary_ffsize(l):
        """
        Extract the dominant font size of a text block.
        """
        fsize_statiscs = {}
        for wtf in l['spans']:
            if wtf['size'] not in fsize_statiscs: fsize_statiscs[wtf['size']] = 0
@@ -400,14 +424,18 @@ def read_and_clean_pdf_text(fp):
        return max(fsize_statiscs, key=fsize_statiscs.get)

    def ffsize_same(a,b):
        """
        Check whether two font sizes are approximately equal.
        """
        return abs((a-b)/max(a,b)) < 0.02
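`primary_ffsize` tallies the font size of every span in a line and returns the most common one; the hunk hides the counting line, so weighting each size by the length of its text is an assumption here. A toy line dict shaped like one entry of PyMuPDF's `get_text("dict")` output:

```python
# hypothetical line dict, shaped like PyMuPDF's get_text("dict") line entries
line = {'spans': [{'size': 9.0,  'text': 'fig. 1'},
                  {'size': 10.5, 'text': 'The main body of the paragraph'},
                  {'size': 10.5, 'text': 'continues here'}]}

fsize_statiscs = {}
for span in line['spans']:
    # assumption: weight each size by the amount of text it covers
    fsize_statiscs[span['size']] = fsize_statiscs.get(span['size'], 0) + len(span['text'])
dominant = max(fsize_statiscs, key=fsize_statiscs.get)
# dominant == 10.5

def ffsize_same(a, b):
    # ~2% relative tolerance, as in the function above
    return abs((a - b) / max(a, b)) < 0.02
```

With this tolerance, 10.5 and 10.6 count as the same font, while 9.0 and 10.5 do not.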
    with fitz.open(fp) as doc:
        meta_txt = []
        meta_font = []
        meta_line = []
        meta_span = []
        ############################## <Step 1: collect the initial information> ##################################
        for index, page in enumerate(doc):
            # file_content += page.get_text()
            text_areas = page.get_text("dict")  # get the text information on this page
@@ -429,7 +457,8 @@ def read_and_clean_pdf_text(fp):
            if index == 0:
                page_one_meta = [" ".join(["".join([wtf['text'] for wtf in l['spans']]) for l in t['lines']]).replace(
                    '- ', '') for t in text_areas['blocks'] if 'lines' in t]

        ############################## <Step 2: determine the main body font> ##################################
        fsize_statiscs = {}
        for span in meta_span:
            if span[1] not in fsize_statiscs: fsize_statiscs[span[1]] = 0
@@ -438,7 +467,7 @@ def read_and_clean_pdf_text(fp):
        if REMOVE_FOOT_NOTE:
            give_up_fize_threshold = main_fsize * REMOVE_FOOT_FFSIZE_PERCENT

        ############################## <Step 3: split and regroup> ##################################
        mega_sec = []
        sec = []
        for index, line in enumerate(meta_line):
@@ -480,6 +509,7 @@ def read_and_clean_pdf_text(fp):
            finals.append(final)
        meta_txt = finals

        ############################## <Step 4: miscellaneous post-processing> ##################################
        def 把字符太少的块清除为回车(meta_txt):
            for index, block_txt in enumerate(meta_txt):
                if len(block_txt) < 100:
@@ -523,6 +553,7 @@ def read_and_clean_pdf_text(fp):
        # newline -> double newline
        meta_txt = meta_txt.replace('\n', '\n\n')

        ############################## <Step 5: show the split result> ##################################
        for f in finals:
            print亮黄(f)
            print亮绿('***************************')
@@ -62,7 +62,7 @@ def 全项目切换英文(txt, llm_kwargs, plugin_kwargs, chatbot, history, sys_
    import tiktoken
    from toolbox import get_conf
    enc = tiktoken.encoding_for_model(*get_conf('LLM_MODEL'))
    def get_token_fn(txt): return len(enc.encode(txt, disallowed_special=()))
    # step 6: the task function
@@ -0,0 +1,162 @@
from toolbox import update_ui
from toolbox import CatchException, report_execption, write_results_to_file
fast_debug = False

class PaperFileGroup():
    def __init__(self):
        self.file_paths = []
        self.file_contents = []
        self.sp_file_contents = []
        self.sp_file_index = []
        self.sp_file_tag = []

        # count_token
        import tiktoken
        from toolbox import get_conf
        enc = tiktoken.encoding_for_model(*get_conf('LLM_MODEL'))
        def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
        self.get_token_num = get_token_num

    def run_file_split(self, max_token_limit=1900):
        """
        Split overly long files into smaller pieces.
        """
        for index, file_content in enumerate(self.file_contents):
            if self.get_token_num(file_content) < max_token_limit:
                self.sp_file_contents.append(file_content)
                self.sp_file_index.append(index)
                self.sp_file_tag.append(self.file_paths[index])
            else:
                from .crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf
                segments = breakdown_txt_to_satisfy_token_limit_for_pdf(file_content, self.get_token_num, max_token_limit)
                for j, segment in enumerate(segments):
                    self.sp_file_contents.append(segment)
                    self.sp_file_index.append(index)
                    self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.md")
        print('Segmentation: done')
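The bookkeeping in `run_file_split` can be sketched with a fixed-width splitter standing in for `breakdown_txt_to_satisfy_token_limit_for_pdf` and character count standing in for tiktoken (the file names are made up):

```python
file_paths = ["notes.md", "long_doc.md"]   # hypothetical inputs
file_contents = ["short text", "x" * 50]
max_token_limit = 30

sp_file_contents, sp_file_index, sp_file_tag = [], [], []
for index, content in enumerate(file_contents):
    if len(content) < max_token_limit:
        # short file: pass through unchanged, tagged with its own path
        sp_file_contents.append(content)
        sp_file_index.append(index)
        sp_file_tag.append(file_paths[index])
    else:
        # fixed-width cut stands in for the token-aware splitter
        segments = [content[i:i + max_token_limit]
                    for i in range(0, len(content), max_token_limit)]
        for j, segment in enumerate(segments):
            sp_file_contents.append(segment)
            sp_file_index.append(index)   # every piece remembers its source file
            sp_file_tag.append(file_paths[index] + f".part-{j}.md")

# sp_file_tag == ["notes.md", "long_doc.md.part-0.md", "long_doc.md.part-1.md"]
```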
def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en'):
    import time, os, re
    from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency

    # <-------- read the Markdown files ---------->
    pfg = PaperFileGroup()
    for index, fp in enumerate(file_manifest):
        with open(fp, 'r', encoding='utf-8', errors='replace') as f:
            file_content = f.read()
            # record the file content
            pfg.file_paths.append(fp)
            pfg.file_contents.append(file_content)

    # <-------- split Markdown files that are too long ---------->
    pfg.run_file_split(max_token_limit=2048)
    n_split = len(pfg.sp_file_contents)

    # <-------- multi-threaded translation starts ---------->
    if language == 'en->zh':
        inputs_array = ["This is a Markdown file, translate it into Chinese, do not modify any existing Markdown commands:" +
                        f"\n\n{frag}" for frag in pfg.sp_file_contents]
        inputs_show_user_array = [f"翻译 {f}" for f in pfg.sp_file_tag]
        sys_prompt_array = ["You are a professional academic paper translator." for _ in range(n_split)]
    elif language == 'zh->en':
        inputs_array = [f"This is a Markdown file, translate it into English, do not modify any existing Markdown commands:" +
                        f"\n\n{frag}" for frag in pfg.sp_file_contents]
        inputs_show_user_array = [f"翻译 {f}" for f in pfg.sp_file_tag]
        sys_prompt_array = ["You are a professional academic paper translator." for _ in range(n_split)]

    gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
        inputs_array=inputs_array,
        inputs_show_user_array=inputs_show_user_array,
        llm_kwargs=llm_kwargs,
        chatbot=chatbot,
        history_array=[[""] for _ in range(n_split)],
        sys_prompt_array=sys_prompt_array,
        # max_workers=5,  # maximum parallelism allowed by OpenAI
        scroller_max_len = 80
    )

    # <-------- collect the results and exit ---------->
    create_report_file_name = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + f"-chatgpt.polish.md"
    res = write_results_to_file(gpt_response_collection, file_name=create_report_file_name)
    history = gpt_response_collection
    chatbot.append((f"{fp}完成了吗?", res))
    yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI
@CatchException
def Markdown英译中(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
    # basic information: feature description and contributor
    chatbot.append([
        "函数插件功能?",
        "对整个Markdown项目进行翻译。函数插件贡献者: Binary-Husky"])
    yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI

    # try to import the dependencies; if one is missing, suggest how to install it
    try:
        import tiktoken
    except ImportError:
        report_execption(chatbot, history,
                         a=f"解析项目: {txt}",
                         b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade tiktoken```。")
        yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI
        return

    history = []  # clear the history to avoid input overflow
    import glob, os
    if os.path.exists(txt):
        project_folder = txt
    else:
        if txt == "": txt = '空空如也的输入栏'
        report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
        yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI
        return
    file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.md', recursive=True)]
    if len(file_manifest) == 0:
        report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.md文件: {txt}")
        yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI
        return
    yield from 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en->zh')
@CatchException
def Markdown中译英(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
    # basic information: feature description and contributor
    chatbot.append([
        "函数插件功能?",
        "对整个Markdown项目进行翻译。函数插件贡献者: Binary-Husky"])
    yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI

    # try to import the dependencies; if one is missing, suggest how to install it
    try:
        import tiktoken
    except ImportError:
        report_execption(chatbot, history,
                         a=f"解析项目: {txt}",
                         b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade tiktoken```。")
        yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI
        return

    history = []  # clear the history to avoid input overflow
    import glob, os
    if os.path.exists(txt):
        project_folder = txt
    else:
        if txt == "": txt = '空空如也的输入栏'
        report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
        yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI
        return
    if txt.endswith('.md'):
        file_manifest = [txt]
    else:
        file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.md', recursive=True)]
    if len(file_manifest) == 0:
        report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.md文件: {txt}")
        yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI
        return
    yield from 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='zh->en')
@@ -70,7 +70,7 @@ def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot,
        from .crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf
        from toolbox import get_conf
        enc = tiktoken.encoding_for_model(*get_conf('LLM_MODEL'))
        def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
        paper_fragments = breakdown_txt_to_satisfy_token_limit_for_pdf(
            txt=file_content, get_token_fn=get_token_num, limit=TOKEN_LIMIT_PER_FRAGMENT)
        page_one_fragments = breakdown_txt_to_satisfy_token_limit_for_pdf(
@@ -98,7 +98,7 @@ def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot,
        history_array=[[paper_meta] for _ in paper_fragments],
        sys_prompt_array=[
            "请你作为一个学术翻译,负责把学术论文的片段准确翻译成中文。" for _ in paper_fragments],
        # max_workers=5  # maximum parallelism allowed by OpenAI
    )
    # format the report
@@ -8,17 +8,18 @@ fast_debug = False
def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
    import tiktoken
    print('begin analysis on:', file_name)

    ############################## <Step 0: split the PDF> ##################################
    # Recursively split the PDF file so that each chunk is, as far as possible, a complete
    # section (e.g. introduction, experiment), splitting further only when necessary;
    # the length of each chunk must stay below 2500 tokens
    file_content, page_one = read_and_clean_pdf_text(file_name)  # try to split the PDF by section
    TOKEN_LIMIT_PER_FRAGMENT = 2500

    from .crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf
    from toolbox import get_conf
    enc = tiktoken.encoding_for_model(*get_conf('LLM_MODEL'))
    def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
    paper_fragments = breakdown_txt_to_satisfy_token_limit_for_pdf(
        txt=file_content, get_token_fn=get_token_num, limit=TOKEN_LIMIT_PER_FRAGMENT)
    page_one_fragments = breakdown_txt_to_satisfy_token_limit_for_pdf(
@@ -26,11 +27,11 @@ def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_pro
    # For better results, strip everything after the Introduction (if present)
    paper_meta = page_one_fragments[0].split('introduction')[0].split('Introduction')[0].split('INTRODUCTION')[0]

    ############################## <Step 1: extract high-value information from the abstract into the history> ##################################
    final_results = []
    final_results.append(paper_meta)

    ############################## <Step 2: iterate through the whole paper, extracting distilled information> ##################################
    i_say_show_user = f'首先你在英文语境下通读整篇论文。'; gpt_say = "[Local Message] 收到。"  # user prompt
    chatbot.append([i_say_show_user, gpt_say]); yield from update_ui(chatbot=chatbot, history=[])  # update the UI
@@ -51,14 +52,14 @@ def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_pro
        iteration_results.append(gpt_say)
        last_iteration_result = gpt_say

    ############################## <Step 3: organize the history> ##################################
    final_results.extend(iteration_results)
    final_results.append(f'接下来,你是一名专业的学术教授,利用以上信息,使用中文回答我的问题。')

    # The next two messages are only shown in the UI and serve no functional purpose
    i_say_show_user = f'接下来,你是一名专业的学术教授,利用以上信息,使用中文回答我的问题。'; gpt_say = "[Local Message] 收到。"
    chatbot.append([i_say_show_user, gpt_say])

    ############################## <Step 4: cap the token count to prevent overflow when answering> ##################################
    from .crazy_utils import input_clipping
    _, final_results = input_clipping("", final_results, max_token_limit=3200)
    yield from update_ui(chatbot=chatbot, history=final_results)  # note: the history is replaced here
@@ -12,8 +12,10 @@ def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
    sys_prompt_array = []
    report_part_1 = []

    assert len(file_manifest) <= 1024, "源文件太多超过1024个, 请缩减输入文件的数量。或者您也可以选择删除此行警告并修改代码拆分file_manifest列表从而实现分批次处理。"
    ############################## <Step 1: analyze the files one by one, multi-threaded> ##################################
    for index, fp in enumerate(file_manifest):
        # read the file
        with open(fp, 'r', encoding='utf-8', errors='replace') as f:
            file_content = f.read()
        prefix = "接下来请你逐文件分析下面的工程" if index==0 else ""
@@ -25,6 +27,7 @@ def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
        history_array.append([])
        sys_prompt_array.append("你是一个程序架构分析师,正在分析一个源代码项目。你的回答必须简单明了。")

    # All files have been read; create one request thread per source file and send them to chatgpt for analysis
    gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
        inputs_array = inputs_array,
        inputs_show_user_array = inputs_show_user_array,
@@ -35,28 +38,13 @@ def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
        show_user_at_complete = True
    )

    # All files analyzed; write the results to a report and prepare the project-wide summary
    report_part_1 = copy.deepcopy(gpt_response_collection)
    history_to_return = report_part_1
    res = write_results_to_file(report_part_1)
    chatbot.append(("完成?", "逐个文件分析已完成。" + res + "\n\n正在开始汇总。"))
    yield from update_ui(chatbot=chatbot, history=history_to_return)  # refresh the UI

    ############################## <Step 2: summarize, single-threaded, grouped + iterative processing> ##################################
    batchsize = 16  # process the files in groups of 16
    report_part_2 = []
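The grouped second pass walks the per-file responses in fixed-size batches; the slicing pattern is simply:

```python
batchsize = 16
file_results = [f"analysis of file {i}" for i in range(40)]  # hypothetical per-file results
# 40 items -> batches of 16, 16 and 8
batches = [file_results[i:i + batchsize] for i in range(0, len(file_results), batchsize)]
```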
img/README_EN.md Normal file
@@ -0,0 +1,294 @@
# ChatGPT Academic Optimization
> **Note**
>
> This English readme is automatically generated by the markdown translation plugin in this project, and may not be 100% correct.
>
**If you like this project, please give it a star. If you have come up with more useful academic shortcuts or functional plugins, feel free to open an issue or pull request (to the `dev` branch).**
> **Note**
>
> 1. Please note that only function plugins (buttons) marked in **red** support reading files, and some plugins are located in the **dropdown menu** in the plugin area. Additionally, we welcome and process PRs for any new plugins with the **highest priority**!
>
> 2. The functions of each file in this project are detailed in the self-translation report [self_analysis.md](https://github.com/binary-husky/chatgpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A). With the version iteration, you can click on a relevant function plugin at any time to call GPT to regenerate the self-analysis report for the project. Commonly asked questions are summarized in the [`wiki`](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98).
>
> 3. If you are not comfortable with functions, comments, or interface elements named in Chinese, you can click the relevant function plugin at any time to call ChatGPT to generate an English version of the project's source code.
<div align="center">
Function | Description
--- | ---
One-click refinement | Supports one-click refinement, one-click searching for grammatical errors in papers.
One-click translation between Chinese and English | One-click translation between Chinese and English.
One-click code interpretation | Can correctly display and interpret the code.
[Custom shortcuts](https://www.bilibili.com/video/BV14s4y1E7jN) | Supports custom shortcuts.
[Configure proxy server](https://www.bilibili.com/video/BV1rc411W7Dr) | Supports configuring proxy server.
Modular design | Supports custom high-order experimental features and [function plug-ins], and plug-ins support [hot update](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97).
[Self-program analysis](https://www.bilibili.com/video/BV1cj411A7VW) | [Function Plug-in] [One-Key Understanding](https://github.com/binary-husky/chatgpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A) the source code of this project.
[Program analysis](https://www.bilibili.com/video/BV1cj411A7VW) | [Function Plug-in] One-click analysis of other Python/C/C++/Java/Golang/Lua/Rust project trees.
Read papers | [Function Plug-in] One-click reads the full text of a latex paper and generates an abstract.
Latex full-text translation/refinement | [Function Plug-in] One-click translates or refines a latex paper.
Batch annotation generation | [Function Plug-in] One-click generates function annotations in batches.
Chat analysis report generation | [Function Plug-in] Automatically generate summary reports after running.
[Arxiv assistant](https://www.bilibili.com/video/BV1LM4y1279X) | [Function Plug-in] Enter the arxiv paper url and you can translate the abstract and download the PDF with one click.
[PDF paper full-text translation function](https://www.bilibili.com/video/BV1KT411x7Wn) | [Function Plug-in] Extract title and abstract of PDF papers + translate full text (multi-threaded).
[Google Scholar integration assistant](https://www.bilibili.com/video/BV19L411U7ia) (Version>=2.45) | [Function Plug-in] Given any Google Scholar search page URL, let GPT help you choose interesting articles.
Formula display | Displays both the TeX source and the rendered form of formulas.
Image display | Can display images in Markdown.
Multithreaded function plug-in support | Supports multi-threaded calling of chatgpt, one-click processing of massive texts or programs.
Support for markdown tables output by GPT | Renders the markdown tables produced by GPT.
Start dark gradio theme [theme](https://github.com/binary-husky/chatgpt_academic/issues/173) | Add ```/?__dark-theme=true``` to the browser URL to switch to the dark theme.
[Huggingface online experience (no proxy needed)](https://huggingface.co/spaces/qingxu98/gpt-academic) | After logging in to Huggingface, copy [this space](https://huggingface.co/spaces/qingxu98/gpt-academic).
[Mixed support for multiple LLM models](https://www.bilibili.com/video/BV1EM411K7VH/) ([v3.0 branch](https://github.com/binary-husky/chatgpt_academic/tree/v3.0) in testing) | It must feel great to be served by both ChatGPT and [Tsinghua ChatGLM](https://github.com/THUDM/ChatGLM-6B)!
Compatible with [TGUI](https://github.com/oobabooga/text-generation-webui) to access more language models | Access to opt-1.3b, galactica-1.3b and other models ([v3.0 branch](https://github.com/binary-husky/chatgpt_academic/tree/v3.0) under testing).
… | ...
</div>
<!-- - New interface (left: master branch, right: dev development frontier) -->
- New interface (modify the `LAYOUT` option in `config.py` to switch between "left and right layout" and "up and down layout").
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/230361456-61078362-a966-4eb5-b49e-3c62ef18b860.gif" width="700" >
</div>
- All buttons are dynamically generated by reading `functional.py`, and custom functions can be added freely, freeing up the clipboard.
<div align="center">
<img src="公式.gif" width="700" >
</div>
- Refinement/Correction
<div align="center">
<img src="润色.gif" width="700" >
</div>
- Supports markdown tables output by GPT.
<div align="center">
<img src="demo2.jpg" width="500" >
</div>
- If the output contains formulas, both the tex form and the rendering form are displayed simultaneously for easy copying and reading.
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/230598842-1d7fcddd-815d-40ee-af60-baf488a199df.png" width="700" >
</div>
- Don't want to read the project code? Let ChatGPT explain the whole project for you.
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/226935232-6b6a73ce-8900-4aee-93f9-733c7e6fef53.png" width="700" >
</div>
- Multiple large language models mixed calling. ([v3.0 branch](https://github.com/binary-husky/chatgpt_academic/tree/v3.0) in testing)
## Running Directly (Windows, Linux or MacOS)
### 1. Download the Project
```sh
git clone https://github.com/binary-husky/chatgpt_academic.git
cd chatgpt_academic
```
### 2. Configure API_KEY and Proxy Settings
In `config.py`, configure the overseas Proxy and OpenAI API KEY, as follows:
```
1. If you are in China, you need to set an overseas proxy to use the OpenAI API smoothly. Please read the instructions in config.py carefully (1. Modify the USE_PROXY to True; 2. Modify the proxies according to the instructions).
2. Configure OpenAI API KEY. You need to register on the OpenAI official website and obtain an API KEY. Once you get the API KEY, configure it in the config.py file.
3. Issues related to proxy network (network timeout, proxy not working) are summarized to https://github.com/binary-husky/chatgpt_academic/issues/1
```
(Note: When the program is running, it will first check whether there is a private configuration file named `config_private.py`, and use the configuration in it to overwrite the same name configuration in `config.py`. Therefore, if you can understand our configuration reading logic, we strongly recommend that you create a new configuration file next to `config.py` named `config_private.py` and transfer (copy) the configuration in `config.py` to `config_private.py`. `config_private.py` is not managed by Git, which can make your privacy information more secure.)
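The override rule described above amounts to a dict merge in which `config_private.py` wins; a sketch (the keys and values are illustrative, not the project's full configuration):

```python
# values from config.py (illustrative)
config = {"USE_PROXY": False, "API_KEY": "", "LLM_MODEL": "gpt-3.5-turbo"}
# values from config_private.py, if it exists (illustrative)
config_private = {"USE_PROXY": True, "API_KEY": "sk-your-key-here"}

# private settings override defaults with the same name; everything else is kept
effective = {**config, **config_private}
```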
### 3. Install Dependencies
```sh
# (Option 1) Recommended
python -m pip install -r requirements.txt
# (Option 2) If you use anaconda, the steps are also similar:
# (Option 2.1) conda create -n gptac_venv python=3.11
# (Option 2.2) conda activate gptac_venv
# (Option 2.3) python -m pip install -r requirements.txt
# Note: Use the official pip source or the Ali pip source. Other pip sources (such as some university pips) may have problems. Temporary substitution method:
# python -m pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
```
### 4. Run
```sh
python main.py
```
### 5. Test Experimental Features
```
- Test C++ project header analysis
    In the input area, enter `./crazy_functions/test_project/cpp/libJPG`, then click "[Experiment] Parse the entire C++ project (the input is the project's root path)"
- Test writing an abstract for a LaTeX project
    In the input area, enter `./crazy_functions/test_project/latex/attention`, then click "[Experiment] Read the tex paper and write an abstract (the input is the project's root path)"
- Test Python project analysis
    In the input area, enter `./crazy_functions/test_project/python/dqn`, then click "[Experiment] Parse the entire py project (the input is the project's root path)"
- Test self-code interpretation
    Click "[Experiment] Please analyze and deconstruct this project itself"
- Test the experimental function template (asks GPT what happened in history on this day); you can implement more complex functions based on this template
    Click "[Experiment] Experimental function template"
```
## Use Docker (Linux)
``` sh
# Download Project
git clone https://github.com/binary-husky/chatgpt_academic.git
cd chatgpt_academic
# Configure Overseas Proxy and OpenAI API KEY
# (edit config.py with any text editor)
# Installation
docker build -t gpt-academic .
# Run
docker run --rm -it --net=host gpt-academic
# Test Experimental Features
## Test Self-code Interpretation
Click "[Experiment] Please analyze and deconstruct this project itself"
## Test Experimental Function Template (asking GPT what happened in history today), you can implement more complex functions based on this template function
Click "[Experiment] Experimental function template"
## (Please note that when running in docker, you need to pay extra attention to file access rights issues of the program.)
## Test C++ project header analysis
In the input area, enter ./crazy_functions/test_project/cpp/libJPG , then click "[Experiment] Parse the entire C++ project (the input is the project's root path)"
## Test writing an abstract for a LaTeX project
In the input area, enter ./crazy_functions/test_project/latex/attention , then click "[Experiment] Read the tex paper and write an abstract (the input is the project's root path)"
## Test Python project analysis
In the input area, enter ./crazy_functions/test_project/python/dqn , then click "[Experiment] Parse the entire py project (the input is the project's root path)"
```
## Other Deployment Methods
- Use WSL2 (Windows Subsystem for Linux)
Please visit [Deploy Wiki-1](https://github.com/binary-husky/chatgpt_academic/wiki/%E4%BD%BF%E7%94%A8WSL2%EF%BC%88Windows-Subsystem-for-Linux-%E5%AD%90%E7%B3%BB%E7%BB%9F%EF%BC%89%E9%83%A8%E7%BD%B2)
- Remote deployment behind nginx
Please visit [Deploy Wiki-2](https://github.com/binary-husky/chatgpt_academic/wiki/%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E7%9A%84%E6%8C%87%E5%AF%BC)
## Customizing New Convenient Buttons (Academic Shortcut Key Customization)
Open functional.py, add an entry as shown below, and then restart the program. (If the button has already been added successfully and is visible, then both its prefix and suffix support hot modification and take effect without restarting the program.)
For example:
```
"Super English to Chinese Translation": {
    # Prefix: added before your input, e.g. to describe your request (translation, code explanation, polishing, etc.)
    "Prefix": "Please translate the following content into Chinese, and then use a markdown table to explain each proprietary term in the text:\n\n",
    # Suffix: added after your input, e.g. to wrap your input in quotation marks together with the prefix
    "Suffix": "",
},
```
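Conceptually, each entry simply wraps whatever is typed in the input area between its `Prefix` and `Suffix` before the text is sent to the model. A minimal sketch of that wrapping (the function name `apply_button` is illustrative, not the project's actual API):

```python
# Hypothetical sketch: how a custom button's Prefix/Suffix wrap the user's
# input before it is sent to the model. `apply_button` is an illustrative
# name, not the project's real function.
def apply_button(entry: dict, user_input: str) -> str:
    """Combine a functional.py entry with the text typed in the input area."""
    return entry.get("Prefix", "") + user_input + entry.get("Suffix", "")

entry = {
    "Prefix": "Please translate the following content into Chinese:\n\n",
    "Suffix": "",
}
prompt = apply_button(entry, "Attention is all you need.")
print(prompt)
```

Because the prefix and suffix are read from the entry each time the button is clicked, editing them in `functional.py` takes effect without a restart, as noted above.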
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/226899272-477c2134-ed71-4326-810c-29891fe4a508.png" width="500" >
</div>
If you come up with a more useful academic shortcut key, feel free to open an issue or pull request!
## Configure Proxy
### Method 1: General Method
Modify the proxy settings in ```config.py``` to match the port used by your proxy software.
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/226571294-37a47cd9-4d40-4c16-97a2-d360845406f7.png" width="500" >
<img src="https://user-images.githubusercontent.com/96192199/226838985-e5c95956-69c2-4c23-a4dd-cd7944eeb451.png" width="500" >
</div>
After configuring, you can use the following command to test whether the proxy works. If everything is normal, the code below will output the location of your proxy server:
```
python check_proxy.py
```
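The proxy setting is a standard requests-style mapping from protocol to proxy URL. As a hedged sketch, a check like the one below can catch common formatting mistakes before launching the app; `validate_proxies` and the example address are assumptions for illustration, not the project's actual code:

```python
# Hedged sketch: sanity-check a requests-style `proxies` mapping like the one
# configured in config.py (the exact structure the project expects may differ).
def validate_proxies(proxies) -> bool:
    """Return True if the mapping looks like a usable requests proxies dict."""
    if not isinstance(proxies, dict) or not proxies:
        return False
    allowed_schemes = ("http://", "https://", "socks5://", "socks5h://")
    return all(
        key in ("http", "https")
        and isinstance(url, str)
        and url.startswith(allowed_schemes)
        for key, url in proxies.items()
    )

# Example address is hypothetical; use the port your proxy software listens on.
proxies = {"http": "socks5h://localhost:11284", "https": "socks5h://localhost:11284"}
print(validate_proxies(proxies))  # → True
```

Note that a `socks5h://` scheme resolves DNS through the proxy, which matters when the OpenAI domain is blocked locally; `check_proxy.py` then confirms end-to-end that traffic actually exits through the proxy.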
### Method 2: Pure Beginner Tutorial
[Pure Beginner Tutorial](https://github.com/binary-husky/chatgpt_academic/wiki/%E4%BB%A3%E7%90%86%E8%BD%AF%E4%BB%B6%E9%97%AE%E9%A2%98%E7%9A%84%E6%96%B0%E6%89%8B%E8%A7%A3%E5%86%B3%E6%96%B9%E6%B3%95%EF%BC%88%E6%96%B9%E6%B3%95%E5%8F%AA%E9%80%82%E7%94%A8%E4%BA%8E%E6%96%B0%E6%89%8B%EF%BC%89)
## Compatibility Testing
### Image Display:
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/228737599-bf0a9d9c-1808-4f43-ae15-dfcc7af0f295.png" width="800" >
</div>
### The program reading and analyzing itself:
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/226936850-c77d7183-0749-4c1c-9875-fd4891842d0c.png" width="800" >
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/226936618-9b487e4b-ab5b-4b6e-84c6-16942102e917.png" width="800" >
</div>
### Analysis of any other Python/C++ project:
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/226935232-6b6a73ce-8900-4aee-93f9-733c7e6fef53.png" width="800" >
</div>
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/226969067-968a27c1-1b9c-486b-8b81-ab2de8d3f88a.png" width="800" >
</div>
### One-click LaTeX paper reading comprehension and abstract generation
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/227504406-86ab97cd-f208-41c3-8e4a-7000e51cf980.png" width="800" >
</div>
### Automatic Report Generation
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/227503770-fe29ce2c-53fd-47b0-b0ff-93805f0c2ff4.png" height="300" >
<img src="https://user-images.githubusercontent.com/96192199/227504617-7a497bb3-0a2a-4b50-9a8a-95ae60ea7afd.png" height="300" >
<img src="https://user-images.githubusercontent.com/96192199/227504005-efeaefe0-b687-49d0-bf95-2d7b7e66c348.png" height="300" >
</div>
### Modular Function Design
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/229288270-093643c1-0018-487a-81e6-1d7809b6e90f.png" height="400" >
<img src="https://user-images.githubusercontent.com/96192199/227504931-19955f78-45cd-4d1c-adac-e71e50957915.png" height="400" >
</div>
### Translating source code to English
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/229720562-fe6c3508-6142-4635-a83d-21eb3669baee.png" height="400" >
</div>
## Todo and Version Planning:
- version 3 (Todo):
  - Support for GPT-4 and other LLMs
- version 2.4+ (Todo):
  - Summarization of long texts and handling of token overflow in large project source code
  - Project packaging and deployment
  - Function plugin parameter interface optimization
  - Self-updating
- version 2.4: (1) Added a PDF full-text translation function; (2) Added an input-area switching function; (3) Added a vertical layout option; (4) Optimized multi-threaded function plugins.
- version 2.3: Enhanced multi-threaded interactivity
- version 2.2: Function plugins support hot reloading
- version 2.1: Collapsible layout
- version 2.0: Introduced modular function plugins
- version 1.0: Basic functions
## References and Learning
```
The code references the designs of many other excellent projects, mainly:
# Reference project 1: borrowed ChuanhuChatGPT's methods for reading the OpenAI JSON stream, recording conversation history, and using the gradio queue
https://github.com/GaiZhenbiao/ChuanhuChatGPT
# Reference Project 2:
https://github.com/THUDM/ChatGLM-6B
```

````diff
@@ -1,4 +1,4 @@
-# 如何使用其他大语言模型(dev分支测试中)
+# 如何使用其他大语言模型(v3.0分支测试中)
 ## 1. 先运行text-generation
 ``` sh
````

```diff
@@ -96,7 +96,7 @@ def predict_no_ui_long_connection(inputs, llm_kwargs, history=[], sys_prompt="",
                 # 看门狗,如果超过期限没有喂狗,则终止
                 if len(observe_window) >= 2:
                     if (time.time()-observe_window[1]) > watch_dog_patience:
-                        raise RuntimeError("程序终止")
+                        raise RuntimeError("用户取消了程序。")
             else: raise RuntimeError("意外Json结构"+delta)
         if json_data['finish_reason'] == 'length':
             raise ConnectionAbortedError("正常结束但显示Token不足导致输出不完整请削减单次输入的文本量。")
```

```diff
@@ -1,5 +1,5 @@
 {
     "version": 2.68,
     "show_feature": true,
-    "new_feature": "改善理解pdfchatpdf功能 <-> 如果一键更新失败可前往github手动更新"
+    "new_feature": "改善理解pdfchatpdf功能 <-> 修复读取罕见字符的BUG <-> 如果一键更新失败可前往github手动更新"
 }
```