Week 1 notes completed

CM3060 Natural Language Processing/Week 1/Week 1 Notes.md
## We've learned so far

1. NLP involves both symbolic and statistical approaches
2. NLP draws on a number of disciplines and perspectives
3. NLP is currently undergoing significant growth

# History of NLP

NLP is not such a recent phenomenon: its history begins in the 1940s and 1950s.

Automation arose from Turing's 1936 model of algorithmic computation.

Chomsky (1956) considered finite state machines as a way to characterize a grammar.
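
A minimal sketch of the idea, using a hypothetical toy grammar (the categories and transitions below are invented for illustration, not taken from Chomsky or the course): a finite state machine that accepts simple noun phrases such as "the big dog".

```python
# Toy finite state machine over word categories (hypothetical grammar).
# States: 0 = start, 1 = after a determiner, 2 = after a noun (accepting).
TRANSITIONS = {
    (0, "DET"): 1,
    (1, "ADJ"): 1,   # loop: any number of adjectives
    (1, "NOUN"): 2,
}

def accepts(tags: list[str]) -> bool:
    """Return True if the tag sequence is accepted by the machine."""
    state = 0
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:
            return False
    return state == 2

print(accepts(["DET", "ADJ", "NOUN"]))   # True, e.g. "the big dog"
print(accepts(["DET", "NOUN", "NOUN"]))  # False
```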
Shannon (1948) measured the 'entropy' of the English language using probabilistic techniques.
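
As a rough illustration of the idea (a single-character frequency estimate, not Shannon's actual method, which also conditioned on context), a minimal Python sketch:

```python
import math
from collections import Counter

def unigram_entropy(text: str) -> float:
    """Estimate per-character entropy in bits from single-character frequencies."""
    chars = [c for c in text.lower() if c.isalpha() or c == " "]
    counts = Counter(chars)
    total = len(chars)
    # H = -sum over characters c of p(c) * log2(p(c))
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Prints roughly 4.4 for this pangram; conditioning on surrounding context
# drives the estimate for English much lower.
print(unigram_entropy("the quick brown fox jumps over the lazy dog"))
```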
In the 1960s and 1970s, speech and language processing split into two paradigms:

* Symbolic
* Statistical

ELIZA was an early NLP system developed in 1966 by Weizenbaum.

SHRDLU was created in 1972, based on a world of blocks. [SHRDLU Wikipedia](https://en.wikipedia.org/wiki/SHRDLU)

The first corpus (a body of text) was the Brown Corpus, a one-million-word collection of samples from 500 written texts from different genres.

In the 1980s and 1990s, the two classes of models came back to prominence.

The rise of the WWW made large amounts of spoken and written language data available.

Traditional NLP problems, such as parsing and semantic analysis, proved challenging for supervised learning, which led to more statistically tailored approaches.

From the 2010s onwards, recurrent neural networks (RNNs) process items as a sequence with a memory of previous inputs. This approach is applicable to many tasks, such as:

* word-level: named entity recognition, language modeling
* sentence-level: sentiment analysis, selecting responses to messages
* language generation: machine translation, image captioning, etc.

RNNs are supplemented with long short-term memory (LSTM) units or gated recurrent units (GRUs) to improve training performance by mitigating the vanishing gradient problem.
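
A minimal sketch, assuming PyTorch (the library and the dimensions below are illustrative, not from the course), of an LSTM layer consuming a batch of sequences:

```python
import torch
import torch.nn as nn

# Illustrative dimensions: 10-dimensional inputs, 20-dimensional hidden state.
lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)

# A batch of 3 sequences, each 5 time steps long.
x = torch.randn(3, 5, 10)

# output holds the hidden state at every time step; (h_n, c_n) are the final
# hidden and cell states. The gated cell state carries information across
# long sequences, which is what mitigates the vanishing gradient problem.
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([3, 5, 20])
print(h_n.shape)     # torch.Size([1, 3, 20])
```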