Text processing
Course Learning Objectives
- Write computer programs that can read, clean and analyze textual and numerical data.
- Evaluate the significance of data analysis results using appropriate statistical tests.
- Write computer programs that can generate plots and visualizations of data.
- Explain how decisions can be made based on data analysis and evaluate potential issues that may arise.
- Analyze and critique examples of data analysis being applied in different contexts.
- Use a data science development environment to write data analysis software and describe the features of the environment.
Topic Learning Objectives
- Understand text processing fundamentals.
- Apply text-processing techniques.
- Manipulate unstructured data.
We saw:
- Ambiguity
- Lexical analysis (tokenization) - splitting on whitespace, and the cases where whitespace alone is not enough
- Stemming
- Lemmatization - syntactic context, linguistically principled analysis
- Morphology - prefixes, suffixes, etc.
- Syntax - part-of-speech tagging
- Parsing - grammar
- Sentence boundary detection
- Regular expressions
- Stop word removal
- Text corpora
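Several of the steps listed above (sentence boundary detection, tokenization with regular expressions, stop word removal, and stemming) can be chained into a small preprocessing pipeline. The sketch below uses only Python's standard `re` module. The stop-word list and the suffix rules in the hypothetical `naive_stem` helper are illustrative assumptions. They are not the Porter stemmer or any library's implementation, and the regex sentence splitter is deliberately naive (it would wrongly split "Dr. Smith").

```python
import re

# Tiny illustrative stop-word list (an assumption; real projects usually
# rely on a curated list shipped with an NLP library).
STOP_WORDS = {"the", "a", "an", "and", "of", "is", "are", "was", "were", "to", "in", "it"}


def split_sentences(text):
    """Naive sentence boundary detection: split after '.', '!' or '?'
    when followed by whitespace. Abbreviations break this rule."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase, then keep runs of letters and apostrophes. Unlike a plain
    whitespace split, this drops punctuation such as 'loudly!' -> 'loudly'."""
    return re.findall(r"[a-z']+", sentence.lower())


def remove_stop_words(tokens):
    """Drop very common function words that carry little content."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Hypothetical crude stemmer: strip the first matching suffix,
    but only if at least 3 characters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Sentences -> tokens -> stop words removed -> stems."""
    return [
        [naive_stem(t) for t in remove_stop_words(tokenize(s))]
        for s in split_sentences(text)
    ]


if __name__ == "__main__":
    text = "The cats were chasing the mice. Dogs barked loudly! Is it raining?"
    for sentence, stems in zip(split_sentences(text), preprocess(text)):
        print(sentence, "->", stems)
```

On the sample text, "chasing" becomes the non-word "chas" and "mice" is left untouched. This is the kind of result that motivates lemmatization, which uses linguistic knowledge to map "mice" to "mouse" instead of blindly stripping suffixes.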