From 532d46d988776dba3461ed4fb612fcf1ae4f273c Mon Sep 17 00:00:00 2001
From: levdoescode
Date: Wed, 8 Mar 2023 20:26:20 -0500
Subject: [PATCH] Week 12 notes completed

---
 .../Week 12/Week 12 notes.md | 101 ++++++++++++++++++
 1 file changed, 101 insertions(+)
 create mode 100644 CM3020 Artificial Intelligence/Week 12/Week 12 notes.md

diff --git a/CM3020 Artificial Intelligence/Week 12/Week 12 notes.md b/CM3020 Artificial Intelligence/Week 12/Week 12 notes.md
new file mode 100644
index 0000000..1596b24
--- /dev/null
+++ b/CM3020 Artificial Intelligence/Week 12/Week 12 notes.md
@@ -0,0 +1,101 @@

# GPT-2
## Objectives
* Describe how self-attention allows transformer networks to combine contextual and sequential information.
* Instantiate a pre-trained language-generating pipeline using GPT-2 and Hugging Face.
* Explain how fine-tuning works and why it is necessary to fine-tune pre-trained neural network models.

## Context and aims
Our aim is to gain an intuitive grasp of the transformer.

## What is a transformer?
Introduced in 2017, the transformer captures both sequence and context via self-attention.

## What came before the transformer?
* Bag of words

    `the problem with the pop music industry is the music`

    is turned into an unordered collection of its words:

    `{the, problem, with, pop, music, industry, is, music}`

    This ignores the sequence, but captures some context: which words occur together. (See the sketch at the end of this section.)

* Recurrent neural networks

    Here the input is either **one-hot encoded** vectors or **embeddings**.

    Either way, it is the **sequence**, not the **context**, that is modelled. This works well for tasks such as translation.

* Encoding sequences

    For example, the words of the vocabulary in

    `the problem with the pop music`

    are encoded as one-hot vectors:

    the `100000`
    problem `010000`

    In this case we have a six-word vocabulary, with one digit per word. With this encoding there is no information about how the words relate to one another.

    That is where word embeddings come in: statistical information about word usage is captured so that words with related meanings end up close together in the embedding space, but this representation still ignores context.

    Either way, we encode each word in the sequence as a vector, feed the sequence of vectors into a recurrent neural network such as an LSTM, and train it to predict the next symbol in the sequence.

* Self-attention: encoding context and sequence

So far we have seen three ways of representing a space of words:
* Bag of words
* One-hot encoding
* Embeddings (dimensionally reduced one-hot vectors) that place related words in similar regions of the space

Transformers add a layer known as **self-attention**. It adds contextual information to the sequential information.

In a transformer, a word is placed into a space in a new layer not just by itself, but together with information from the other words in the sequence vector of the previous layer.

This is what the attention layer does: similarly to convolution, where we filter surrounding values by multiplying them by weights, the attention layer does this with the surrounding words.

We can have multiple filters, also known as multi-headed attention: several heads process the word embeddings into new self-attention embeddings, so that the balance of surrounding words is taken into account in different ways, extracting different features to feed into training.
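To make the pre-transformer representations concrete, here is a small Python sketch (not from the course materials; the variable names are illustrative) that builds a bag of words and one-hot vectors for the example sentence above.

```Python
from collections import Counter

sentence = "the problem with the pop music industry is the music"
tokens = sentence.split()

# Bag of words: word counts, with all ordering information discarded
bag = Counter(tokens)
print(bag)  # Counter({'the': 3, 'music': 2, 'problem': 1, ...})

# One-hot encoding: one vector position per word in the vocabulary
vocab = sorted(set(tokens))
one_hot = {word: [1 if i == j else 0 for j in range(len(vocab))]
           for i, word in enumerate(vocab)}
print(one_hot["problem"])  # a single 1, everything else 0

# The sequence is then just the list of one-hot vectors, in order,
# ready to be fed to a recurrent network one step at a time
encoded_sequence = [one_hot[word] for word in tokens]
```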
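And here is a minimal sketch of the scaled dot-product self-attention computation itself, in the standard textbook formulation rather than the course's own code. The random matrices `W_q`, `W_k`, `W_v` stand in for learned parameters; a real transformer adds positional information and multiple heads.

```Python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of embeddings.

    X has shape (sequence_length, embedding_dim), one row per word.
    The output has the same shape, but each row now mixes in
    information from every other word in the sequence.
    """
    Q = X @ W_q  # queries: what each word is looking for
    K = X @ W_k  # keys: what each word offers to the others
    V = X @ W_v  # values: the information that actually gets passed on
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # similarity of every pair of words
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # each output row is a weighted blend of all the words

rng = np.random.default_rng(0)
seq_len, dim = 8, 16                           # e.g. 8 words, 16-dim embeddings
X = rng.normal(size=(seq_len, dim))            # stand-in for word embeddings
W_q, W_k, W_v = (rng.normal(size=(dim, dim)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (8, 16): same shape, now context-aware
```

Multi-headed attention simply runs several such computations in parallel, each with its own `W_q`, `W_k`, `W_v`, and concatenates the results.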
## GPT-2

### Too dangerous to release
> Due to our concerns about malicious applications of the technology, we are not releasing the trained model.

- OpenAI blog post, 2019

This happened at the height of concerns about election fake news.

### GPT-2 config
What is GPT-2?
* Trained on 40GB of text
* 1.5 billion parameters
* 48 layers

How does it work?

It uses self-attention with multiple 'heads', so it generates multiple attention projections.

It can be used in auto-regressive mode: fed its own output, it can generate an endless stream of words.

### Zero-shot concept
GPT-2 outperformed several state-of-the-art systems in zero-shot mode, that is, on tasks it was never explicitly trained for.

### Hugging Face
> "We are on a mission to democratize good machine learning, one commit at a time."

It's a community and a collection of pre-trained models that can be downloaded through a Python API.

```Python
from transformers import pipeline, set_seed

set_seed(42)  # fix the random seed so the sampled text is reproducible
generator = pipeline('text-generation', model='gpt2-large')
generator.model.summary()  # layer summary with the TensorFlow model; with PyTorch, use print(generator.model)
generator("The thing about love is ")  # will generate some text continuing the prompt
```

### GPT-3
Why not GPT-3?

As of 2021, some open-source versions are appearing, but they have large computational requirements.

> Refer to week 12 code '1-lyrics-gpt2'
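The '1-lyrics-gpt2' notebook contains the course's actual fine-tuning code. For reference, a minimal sketch of how fine-tuning GPT-2 on new text typically looks with the Hugging Face `Trainer` API is shown below; the file name `lyrics.txt`, the hyperparameters, and the use of the small `gpt2` checkpoint are illustrative placeholders, not the course's values.

```Python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Load the pre-trained model and tokenizer; fine-tuning starts from these weights
tokenizer = AutoTokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained('gpt2')

# Turn a plain-text file of lyrics into fixed-length training examples
with open('lyrics.txt') as f:  # placeholder path
    text = f.read()
ids = tokenizer(text)['input_ids']
block = 128
examples = [{'input_ids': ids[i:i + block]}
            for i in range(0, len(ids) - block, block)]

# For causal language modelling the labels are the inputs themselves;
# the collator sets this up when mlm=False
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(output_dir='gpt2-lyrics', num_train_epochs=3,
                         per_device_train_batch_size=4)
trainer = Trainer(model=model, args=args,
                  train_dataset=examples, data_collator=collator)
trainer.train()  # continues training the pre-trained weights on the new text
```

This is why fine-tuning works: the pre-trained model has already learned general statistics of language from its 40GB corpus, so a comparatively small amount of further training is enough to shift its weights toward the style and vocabulary of the new text.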