diff --git a/.travis.yml b/.travis.yml
new file mode 100644
index 0000000..5927dbe
--- /dev/null
+++ b/.travis.yml
@@ -0,0 +1,6 @@
+language: python
+python:
+  - "2.7"
+install:
+  - pip install -r requirements.txt
+script: pytest
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..cd6a57c
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2016 JackeyGao
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
diff --git a/README.md b/README.md
index a18d32a..7492183 100644
--- a/README.md
+++ b/README.md
@@ -1,41 +1,41 @@
-# chinese-poetry | [Crawling the Complete Song Ci: process and data analysis](http://jackeygao.io/words/crawl-ci.html)
+chinese-poetry
+==============
 
-
+[](https://travis-ci.org/jackeyGao/chinese-poetry)
+[](https://github.com/jackeyGao/chinese-poetry/blob/master/LICENSE)
+[]()
+[]()
+[]()
 
-## Tang poetry high-frequency words
+The most comprehensive database of classical Chinese literature, containing 55,000 Tang poems, 260,000 Song poems, and 21,000 Song ci, by nearly 14,000 poets of the Tang and Song dynasties and about 1,500 ci writers of the Northern and Southern Song. The data comes from the internet.
 
-
+**Why build this repository?** Classical poetry is a treasure of the Chinese nation and of the whole world, and we should pass it on. Printed anthologies exist, but most people do not own them, so in a sense these vast collections remain out of reach. Electronic copies are easy to duplicate, which is why this open-source database was created. You can use this data for anything beneficial, and I may even be able to help you.
 
-## Tang poets ranked by number of works
+The crawling of the classical poems was not documented: the data set is huge and the target site imposes limits, so the crawl was frequently interrupted and took more than a week. The Complete Song Ci was added in 2017; see [Crawling the Complete Song Ci: process and data analysis](http://jackeygao.io/words/crawl-ci.html).
 
-
-## Song poetry high-frequency words
+## Data analysis
 
-
+Some simple frequency analyses
 
-## Song poets ranked by number of works
+|||
+| :---: | :---: |
+| Tang poetry high-frequency words | Tang poets ranked by number of works |
+| Song poetry high-frequency words | Song poets ranked by number of works |
+| Song ci high-frequency words | Song ci authors ranked by number of works |
 
-
-
-## Song ci authors ranked by number of works
-
-
-
-## Song ci high-frequency words
-
-
-
-## Favorite cipai (tune names) of the Song dynasties
 
+
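-Currently only the Complete Tang Poems is recorded; poems from after the Tang have not been collected. Classical poetry is a treasure of Chinese culture, so if you have a reliable data source, you are welcome to submit a PR to add it.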
## License
-[MIT](https://zh.wikipedia.org/wiki/MIT%E8%A8%B1%E5%8F%AF%E8%AD%89) license.
+
+[MIT](https://github.com/jackeyGao/chinese-poetry/blob/master/LICENSE) license.
diff --git a/images/WechatIMG1.jpeg b/images/WechatIMG1.jpeg
new file mode 100644
index 0000000..880c3ff
Binary files /dev/null and b/images/WechatIMG1.jpeg differ
diff --git a/json/poet.tang.49000.json b/json/poet.tang.49000.json
index 146092a..2ca8e41 100644
--- a/json/poet.tang.49000.json
+++ b/json/poet.tang.49000.json
@@ -15199,4 +15199,4 @@
],
"title": "句"
}
-]
\ No newline at end of file
+]
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..7565aae
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1 @@
+pytest==3.1.0
diff --git a/test_poetry.py b/test_poetry.py
new file mode 100644
index 0000000..27454ec
--- /dev/null
+++ b/test_poetry.py
@@ -0,0 +1,34 @@
+# -*- coding: utf-8 -*-
+import os, json, sqlite3
+
+def check_json(f):
+    filepath = os.path.join('./json', f)
+    with open(filepath) as file:
+        try:
+            _ = json.loads(file.read())
+            return True
+        except ValueError:
+            assert False, u"Validation of (%s) failed" % f
+
+
+def test_json():
+    """
+    Test whether the poetry JSON files are valid
+    """
+    assert all(map(check_json, os.listdir('./json')))
+
+
+
+def test_sqlite():
+    """
+    Test whether the ci database file is valid
+    """
+    conn = sqlite3.connect('./ci/ci.db')
+
+    c = conn.cursor()
+
+    c.execute("SELECT name FROM sqlite_master WHERE type='table'")
+
+    tables = c.fetchall()
+    conn.close()
+    assert len(tables) == 2, u"SQLite file is abnormal"