Week 4 notes completed

levdoescode
2023-01-13 18:17:31 -05:00
parent 027fb6edd3
commit b780ba2324

@@ -0,0 +1,26 @@
# Text processing
### Course Learning Objectives
1. Write computer programs that can read, clean and analyze textual and numerical data.
2. Evaluate the significance of data analysis results using appropriate statistical tests.
3. Write computer programs that can generate plots and visualizations of data.
4. Explain how decisions can be made based on data analysis and evaluate potential issues that may arise.
5. Analyze and critique examples of data analysis being applied in different contexts.
6. Use a data science development environment to write data analysis software and describe the features of the environment.
### Topic Learning Objectives
1. Understand text processing fundamentals.
2. Apply text-processing techniques.
3. Manipulate unstructured data.

We saw:
* Ambiguity
* Lexical analysis (tokenization) - whitespace and more
* Stemming
* Lemmatization - syntactic context, linguistically principled analysis
* Morphology - prefixes, suffixes, etc.
* Syntax - part-of-speech tagging
* Parsing - grammar
* Sentence boundary detection
* Regular expressions
* Stop word removal
* Text corpora
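
A few of the topics above (sentence boundary detection, tokenization with regular expressions, stop word removal and stemming) can be sketched with only Python's standard `re` module. This is a minimal illustration, not the method used in the course: the stop word list `STOP_WORDS`, the helper names and the suffix-stripping rule in `crude_stem` are all assumptions made up for this example, and `crude_stem` is not the Porter algorithm.

```python
import re

# Hypothetical, deliberately tiny stop word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "are", "in", "on", "were"}


def split_sentences(text):
    """Naive sentence boundary detection: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lexical analysis with a regular expression: lowercase runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the (assumed) stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Return one list of stemmed, stop-word-free tokens per detected sentence."""
    return [
        [crude_stem(t) for t in remove_stop_words(tokenize(s))]
        for s in split_sentences(text)
    ]


if __name__ == "__main__":
    sample = "The cats were chasing the mice. Dr. Smith parked on the lawn!"
    for tokens in preprocess(sample):
        print(tokens)
```

Running the sample shows two of the listed issues at once. The splitter treats the period in "Dr." as a sentence end, which is the kind of ambiguity a proper sentence boundary detector has to resolve. The stemmer turns "chasing" into "chas" and leaves "mice" untouched, whereas lemmatization would aim for the dictionary forms "chase" and "mouse".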