WIP - add extractor, generate snippet_data

2019-08-20 15:52:05 +02:00
parent 88084d3d30
commit cc8f1d8a7a
37396 changed files with 4588842 additions and 133 deletions
--- a/node_modules/parse-english/readme.md
+++ b/node_modules/parse-english/readme.md
@ -0,0 +1,134 @@
+# parse-english
+
+[![Build][build-badge]][build]
+[![Coverage][coverage-badge]][coverage]
+[![Downloads][downloads-badge]][downloads]
+[![Size][size-badge]][size]
+[![Chat][chat-badge]][chat]
+
+English language parser for [**retext**][retext] producing
+[**NLCST**][nlcst] nodes.
+
+## Installation
+
+[npm][]:
+
+```bash
+npm install parse-english
+```
+
+## Usage
+
+```javascript
+var inspect = require('unist-util-inspect')
+var English = require('parse-english')
+
+var tree = new English().parse(
+  'Mr. Henry Brown: A hapless but friendly City of London worker.'
+)
+
+console.log(inspect(tree))
+```
+
+Yields:
+
+```txt
+RootNode[1] (1:1-1:63, 0-62)
+└─ ParagraphNode[1] (1:1-1:63, 0-62)
+   └─ SentenceNode[23] (1:1-1:63, 0-62)
+      ├─ WordNode[2] (1:1-1:4, 0-3)
+      │  ├─ TextNode: "Mr" (1:1-1:3, 0-2)
+      │  └─ PunctuationNode: "." (1:3-1:4, 2-3)
+      ├─ WhiteSpaceNode: " " (1:4-1:5, 3-4)
+      ├─ WordNode[1] (1:5-1:10, 4-9)
+      │  └─ TextNode: "Henry" (1:5-1:10, 4-9)
+      ├─ WhiteSpaceNode: " " (1:10-1:11, 9-10)
+      ├─ WordNode[1] (1:11-1:16, 10-15)
+      │  └─ TextNode: "Brown" (1:11-1:16, 10-15)
+      ├─ PunctuationNode: ":" (1:16-1:17, 15-16)
+      ├─ WhiteSpaceNode: " " (1:17-1:18, 16-17)
+      ├─ WordNode[1] (1:18-1:19, 17-18)
+      │  └─ TextNode: "A" (1:18-1:19, 17-18)
+      ├─ WhiteSpaceNode: " " (1:19-1:20, 18-19)
+      ├─ WordNode[1] (1:20-1:27, 19-26)
+      │  └─ TextNode: "hapless" (1:20-1:27, 19-26)
+      ├─ WhiteSpaceNode: " " (1:27-1:28, 26-27)
+      ├─ WordNode[1] (1:28-1:31, 27-30)
+      │  └─ TextNode: "but" (1:28-1:31, 27-30)
+      ├─ WhiteSpaceNode: " " (1:31-1:32, 30-31)
+      ├─ WordNode[1] (1:32-1:40, 31-39)
+      │  └─ TextNode: "friendly" (1:32-1:40, 31-39)
+      ├─ WhiteSpaceNode: " " (1:40-1:41, 39-40)
+      ├─ WordNode[1] (1:41-1:45, 40-44)
+      │  └─ TextNode: "City" (1:41-1:45, 40-44)
+      ├─ WhiteSpaceNode: " " (1:45-1:46, 44-45)
+      ├─ WordNode[1] (1:46-1:48, 45-47)
+      │  └─ TextNode: "of" (1:46-1:48, 45-47)
+      ├─ WhiteSpaceNode: " " (1:48-1:49, 47-48)
+      ├─ WordNode[1] (1:49-1:55, 48-54)
+      │  └─ TextNode: "London" (1:49-1:55, 48-54)
+      ├─ WhiteSpaceNode: " " (1:55-1:56, 54-55)
+      ├─ WordNode[1] (1:56-1:62, 55-61)
+      │  └─ TextNode: "worker" (1:56-1:62, 55-61)
+      └─ PunctuationNode: "." (1:62-1:63, 61-62)
+```
+
+## API
+
+`parse-english` exposes [the same API as `parse-latin`][latin].
+
+## Algorithm
+
+All of [`parse-latin`][latin] is included, and the following support
+for the English natural language:
+
+*   Unit abbreviations (`tsp.`, `tbsp.`, `oz.`, `ft.`, and more)
+*   Time references (`sec.`, `min.`, `tues.`, `thu.`, `feb.`, and more)
+*   Business Abbreviations (`Inc.` and `Ltd.`)
+*   Social titles (`Mr.`, `Mmes.`, `Sr.`, and more)
+*   Rank and academic titles (`Dr.`, `Rep.`, `Gen.`, `Prof.`, `Pres.`,
+    and more)
+*   Geographical abbreviations (`Ave.`, `Blvd.`, `Ft.`, `Hwy.`, and more)
+*   American state abbreviations (`Ala.`, `Minn.`, `La.`, `Tex.`, and more)
+*   Canadian province abbreviations (`Alta.`, `Qué.`, `Yuk.`, and more)
+*   English county abbreviations (`Beds.`, `Leics.`, `Shrops.`, and more)
+*   Common elision (omission of letters) (`’n’`, `’o`, `’em`, `’twas`,
+    `’80s`, and more)
+
+## License
+
+[MIT][license] © [Titus Wormer][author]
+
+<!-- Definitions -->
+
+[build-badge]: https://img.shields.io/travis/wooorm/parse-english.svg
+
+[build]: https://travis-ci.org/wooorm/parse-english
+
+[coverage-badge]: https://img.shields.io/codecov/c/github/wooorm/parse-english.svg
+
+[coverage]: https://codecov.io/github/wooorm/parse-english
+
+[downloads-badge]: https://img.shields.io/npm/dm/parse-english.svg
+
+[downloads]: https://www.npmjs.com/package/parse-english
+
+[size-badge]: https://img.shields.io/bundlephobia/minzip/parse-english.svg
+
+[size]: https://bundlephobia.com/result?p=parse-english
+
+[chat-badge]: https://img.shields.io/badge/join%20the%20community-on%20spectrum-7b16ff.svg
+
+[chat]: https://spectrum.chat/unified/retext
+
+[npm]: https://docs.npmjs.com/cli/install
+
+[license]: license
+
+[author]: https://wooorm.com
+
+[retext]: https://github.com/retextjs/retext
+
+[nlcst]: https://github.com/syntax-tree/nlcst
+
+[latin]: https://github.com/wooorm/parse-latin