## What we've learned so far

1. NLP involves both symbolic and statistical approaches
2. NLP draws on a number of disciplines and perspectives
3. NLP is currently undergoing significant growth

# History of NLP

NLP is not such a recent phenomenon: its history begins in the 1940s and 1950s.

Work on automatic computation arose from Turing's 1936 model of algorithmic computation, the Turing machine.

Chomsky (1956) considered finite-state machines as a way to characterize a grammar.
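
As a toy illustration of what a finite-state characterization looks like (the language and transitions here are invented for illustration, not taken from Chomsky's paper), a machine with a handful of states can recognize the regular language (ab)+:

```python
# Transition table for a toy finite-state machine accepting (ab)+,
# i.e. strings like "ab", "abab", "ababab".
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "accept",
    ("accept", "a"): "seen_a",
}

def accepts(s: str) -> bool:
    state = "start"
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject immediately
            return False
    return state == "accept"

print(accepts("abab"))  # True
print(accepts("abba"))  # False
```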

Shannon (1948) measured the 'entropy' of the English language using probabilistic techniques.
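
This is not Shannon's original procedure, but as a minimal sketch of the probabilistic idea, a per-character entropy can be estimated from a text's observed character frequencies:

```python
import math
from collections import Counter

def char_entropy(text: str) -> float:
    """Estimate per-character entropy (in bits) from character frequencies."""
    counts = Counter(text)
    total = len(text)
    # H = -sum over characters c of p(c) * log2 p(c),
    # using the empirical distribution of the given text.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(round(char_entropy("the quick brown fox jumps over the lazy dog"), 2))
```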

In the 1960s and 1970s, speech and language processing split into two paradigms:

* symbolic
* statistical

ELIZA was an early NLP system developed in 1966 by Weizenbaum. It worked by simple pattern matching, reflecting parts of the user's input back as questions.
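
A minimal ELIZA-style sketch in Python (the rules below are invented for illustration, not Weizenbaum's original script):

```python
import re

# Each rule pairs a pattern with a response template; the captured
# fragment of the user's input is reflected back in the reply.
RULES = [
    (re.compile(r"I am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"I feel (.*)", re.IGNORECASE), "How long have you felt {0}?"),
]

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(match.group(1))
    return "Please tell me more."

print(respond("I am worried about my exams"))
# -> Why do you say you are worried about my exams?
```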

SHRDLU was created in 1972 and operated in a simulated world of blocks. [SHRDLU Wikipedia](https://en.wikipedia.org/wiki/SHRDLU)

One of the first corpora (bodies of text) was the Brown Corpus, a one-million-word collection of samples from 500 written texts across different genres.

In the 1980s and 1990s, the two classes of models returned to popularity.

The rise of the World Wide Web made large amounts of spoken and written language data available.

Traditional NLP problems, such as parsing and semantic analysis, proved challenging for supervised learning, which led to more statistically tailored approaches.

From the 2010s onwards, recurrent neural networks (RNNs) process items as a sequence, maintaining a memory of previous inputs. This makes them applicable to many tasks, such as:

* word-level: named entity recognition, language modeling
* sentence-level: sentiment analysis, selecting responses to messages
* language generation: machine translation, image captioning, etc.

RNNs are supplemented with long short-term memory (LSTM) units or gated recurrent units (GRUs) to improve training by mitigating the vanishing gradient problem, as the sketch below illustrates.
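
A minimal sketch of the vanilla RNN recurrence (in NumPy, with illustrative sizes and no training loop) shows how the hidden state carries a memory of earlier inputs, and where the vanishing gradient problem comes from:

```python
import numpy as np

# Illustrative sizes; the weights are random, not trained.
rng = np.random.default_rng(0)
input_size, hidden_size = 8, 16
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_step(x, h):
    # The new hidden state depends on the current input AND the previous
    # state, so information from earlier items persists across the sequence.
    # Backpropagating through many such steps multiplies gradients by W_hh
    # repeatedly, which can shrink them toward zero -- the vanishing
    # gradient problem that LSTM and GRU gating is designed to mitigate.
    return np.tanh(W_xh @ x + W_hh @ h + b_h)

h = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):  # a sequence of 5 input vectors
    h = rnn_step(x, h)
print(h.shape)  # (16,)
```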