Week 4 notes completed
26
CM3060 Natural Language Processing/Week 4/Week 4 notes.md
Normal file
@@ -0,0 +1,26 @@
# Text processing
### Course Learning Objectives
1. Write computer programs that can read, clean and analyze textual and numerical data.
2. Evaluate the significance of data analysis results using appropriate statistical tests.
3. Write computer programs that can generate plots and visualization of data.
4. Explain how decisions can be made based on data analysis and evaluate potential issues that may arise.
5. Analyze and critique examples of data analysis being applied in different contexts.
6. Use a data science development environment to write data analysis software and describe the features of the environment.
### Topic Learning Objectives
1. Understand text processing fundamentals.
2. Apply text-processing techniques.
3. Manipulate unstructured data.
We saw:
* Ambiguity
* Lexical analysis (tokenization) - splitting on whitespace, plus handling punctuation and other special cases
* Stemming
* Lemmatization - uses syntactic context for a linguistically principled analysis
* Morphology - prefixes, suffixes, etc.
* Syntax - part-of-speech tagging
* Parsing - grammar
* Sentence boundary detection
* Regular expressions
* Stop word removal
* Text corpora
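
Several of the steps listed above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the method used in the course: the function names, the tiny `STOP_WORDS` set and the suffix list in `crude_stem` are all assumptions made up for this example, and the "stemmer" is a toy suffix stripper rather than a real algorithm such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "are", "on", "and", "of", "to", "in"}


def split_sentences(text):
    """Naive sentence boundary detection: split after '.', '!' or '?'
    when followed by whitespace. Abbreviations such as 'Dr.' fool it."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Regex tokenization: lowercase, then keep runs of letters,
    digits and apostrophes (punctuation is dropped)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Toy stemmer (an assumption for illustration): strip the first
    matching suffix if at least three characters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Sentence split -> tokenize -> stop-word removal -> stemming."""
    return [
        [crude_stem(t) for t in remove_stop_words(tokenize(s))]
        for s in split_sentences(text)
    ]


if __name__ == "__main__":
    sample = "The cats are sitting on the mat. Dr. Smith walked in!"
    print(split_sentences(sample))
    print(preprocess(sample))
```

Running it on the sample shows two of the limitations from the list: the period in "Dr." is ambiguous and gets treated as a sentence boundary, and stemming yields non-words such as "sitt", which is the kind of result lemmatization is meant to avoid.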