From b780ba232461f6ea6f4cc79c09b1d75672011d52 Mon Sep 17 00:00:00 2001
From: levdoescode
Date: Fri, 13 Jan 2023 18:17:31 -0500
Subject: [PATCH] Week 4 notes completed

---
 .../Week 4/Week 4 notes.md | 26 +++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 CM3060 Natural Language Processing/Week 4/Week 4 notes.md

diff --git a/CM3060 Natural Language Processing/Week 4/Week 4 notes.md b/CM3060 Natural Language Processing/Week 4/Week 4 notes.md
new file mode 100644
index 0000000..68c640b
--- /dev/null
+++ b/CM3060 Natural Language Processing/Week 4/Week 4 notes.md
@@ -0,0 +1,26 @@
+# Text processing
+### Course Learning Objectives
+1. Write computer programs that can read, clean and analyze textual and numerical data.
+2. Evaluate the significance of data analysis results using appropriate statistical tests.
+3. Write computer programs that can generate plots and visualization of data.
+4. Explain how decisions can be made based on data analysis and evaluate potential issues that may arise.
+5. Analyze and critique examples of data analysis being applied in different contexts.
+6. Use a data science development environment to write data analysis software and describe the features of the environment.
+
+### Topic Learning objectives
+1. Understand text processing fundamentals.
+2. Apply text-processing techniques.
+3. Manipulate unstructured data.
+
+We saw:
+* Ambiguity
+* Lexical analysis (tokenization) - whitespace and more
+* Stemming
+* Lemmatization - syntactic context, linguistically principled analysis
+* Morphology - prefixes, suffixes, etc
+* Syntax - part of speech tagging
+* Parsing - grammar
+* Sentence boundary detection
+* Regular expressions
+* Stop word removal
+* Text corpora
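
A few of the topics listed in the notes (sentence boundary detection, tokenization, regular expressions, stop word removal, stemming) can be sketched with the Python standard library alone. This is a minimal illustration, not the course's own code: the function names, the tiny `STOP_WORDS` set and the suffix list in `crude_stem` are all hypothetical, and `crude_stem` is a toy suffix stripper rather than a real stemming algorithm such as Porter's.

```python
import re

# Hypothetical toy stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "on", "and", "of", "to", "in"}


def split_sentences(text):
    """Naive sentence boundary detection: split on whitespace that follows ., ! or ?."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    """Regex tokenization: lowercase runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the (assumed) stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Toy stemmer: strip one common suffix if at least 3 characters remain."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


text = "The cats are sitting on the mat. Dr. Smith walked in!"
sentences = split_sentences(text)
tokens = tokenize(sentences[0])
content = remove_stop_words(tokens)
stems = [crude_stem(t) for t in content]

print(sentences)            # the abbreviation "Dr." is wrongly treated as a sentence end
print(sentences[0].split()) # whitespace tokenization leaves "mat." with its full stop
print(tokens)
print(content)
print(stems)
```

Two limitations show up directly in the output. The naive splitter breaks after "Dr.", which is the kind of ambiguity that makes sentence boundary detection harder than it looks. The toy stemmer turns "sitting" into "sitt", which is not a word; producing a dictionary form such as "sit" is the job of lemmatization.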