# Text processing

### Course Learning Objectives

1. Write computer programs that can read, clean, and analyze textual and numerical data.
2. Evaluate the significance of data analysis results using appropriate statistical tests.
3. Write computer programs that can generate plots and visualizations of data.
4. Explain how decisions can be made based on data analysis and evaluate potential issues that may arise.
5. Analyze and critique examples of data analysis being applied in different contexts.
6. Use a data science development environment to write data analysis software and describe the features of the environment.

### Topic Learning Objectives

1. Understand text processing fundamentals.
2. Apply text-processing techniques.
3. Manipulate unstructured data.

We saw:
* Ambiguity
* Lexical analysis (tokenization) - splitting on whitespace and more
* Stemming
* Lemmatization - uses syntactic context and linguistically principled analysis
* Morphology - prefixes, suffixes, etc.
* Syntax - part-of-speech tagging
* Parsing - grammar
* Sentence boundary detection
* Regular expressions
* Stop word removal
* Text corpora

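
Several of the steps recapped above (sentence boundary detection, tokenization with regular expressions, stop word removal, and stemming) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not a reference implementation: the stop word list, the suffix rules, and all function names (`split_sentences`, `tokenize`, `remove_stop_words`, `crude_stem`, `preprocess`) are made up for this example. In practice, a library such as NLTK or spaCy would typically handle these steps.

```python
import re

# Hypothetical, deliberately tiny stop word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "on"}


def split_sentences(text):
    """Naive sentence boundary detection: split on whitespace after . ! or ?"""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase the sentence and extract runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the (assumed) stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Toy stemmer: strip one common suffix if at least 3 characters remain.

    This is not the Porter algorithm, only an illustration of the idea.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the whole pipeline and return one list of stems per sentence."""
    return [
        [crude_stem(t) for t in remove_stop_words(tokenize(sentence))]
        for sentence in split_sentences(text)
    ]


print(preprocess("The cats are sitting on the mats. The dog barked!"))
# [['cat', 'sitt', 'mat'], ['dog', 'bark']]
```

Two limitations are visible even in this small sketch. The stem `sitt` is not a real word, which is the kind of result lemmatization is meant to avoid. The sentence splitter is also fooled by abbreviations: `split_sentences("Dr. Smith left.")` returns `['Dr.', 'Smith left.']`, a small instance of the ambiguity that makes text processing hard.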