Conditional formatting notebook completed

levdoescode
2023-01-14 08:07:06 -05:00
parent 0054585a05
commit 46a8ff7bdc

@@ -0,0 +1,266 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Conditional frequency distributions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Using simple bigrams"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.1 Download the Brown corpus"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Downloading package brown to C:\\nltk_data...\n",
"[nltk_data] Package brown is already up-to-date!\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import nltk\n",
"from nltk.corpus import brown\n",
"nltk.download('brown')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.2 Create a bigram model"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"text = brown.words(categories='news')\n",
"bigrams = nltk.bigrams(text)\n",
"cfd = nltk.ConditionalFreqDist(bigrams)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def generate_text(cfdist, word, num=50):\n",
"    for i in range(num):\n",
"        print(word, end=' ')\n",
"        word = cfdist[word].max()  # always take the most frequent next word"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.3 Test it\n",
"Use a variety of different words!"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pass enabling legislation to the first time . The President Kennedy , and the first time . The President Kennedy , and the first time . The President Kennedy , and the first time . The President Kennedy , and the first time . The President Kennedy , and the "
]
}
],
"source": [
"# Here is just one example; try some others yourself.\n",
"generate_text(cfd, 'pass')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Make it more generative\n",
"Instead of always taking the most frequent next word, pick the next word at random from the top N bigrams for the current word."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Add a parameter for the number of bigrams to consider\n",
"2. Assign the bigrams for the current word to a frequency distribution\n",
"3. Create a list of the top N bigrams\n",
"4. Pick one at random and assign it to the variable `word`"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# completed exercise: the numbered comments match the steps listed above\n",
"import random\n",
"def generate_text(cfdist, word, num=100, n=2):  #1\n",
"    for i in range(num):\n",
"        print(word, end=' ')\n",
"        fdist = cfdist[word]  #2\n",
"        words = [w for w, _ in fdist.most_common(n)]  #3 top n by frequency (keys() is insertion-ordered in NLTK 3)\n",
"        word = random.choice(words)  #4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.1 Reveal the solution (only if you get stuck!)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# solution\n",
"import random\n",
"def generate_text(cfdist, word, num=100, n=2):\n",
"    for i in range(num):\n",
"        print(word, end=' ')\n",
"        fdist = cfdist[word]  # successors of the current word\n",
"        words = [w for w, _ in fdist.most_common(n)]  # top n by frequency (keys() is insertion-ordered in NLTK 3)\n",
"        word = random.choice(words)  # uniform choice among the top n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Test your solution using different values of N"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"He told The jury further said in term-end presentments that any irregularities took a relative merits of Atlanta's recent primary election , which had over-all charge Jan. 1 the City of Atlanta's recent primary election , `` deserves a number of Atlanta's new multi-million-dollar airport be combined to have these laws `` no evidence `` deserves a relative merits awe . The Fulton County Grand Jury indictments with city personnel as `` no evidence `` no -- and thanks of the election , `` no -- and thanks of Atlanta's new multi-million-dollar airport , which had been charged mental cruelty "
]
}
],
"source": [
"generate_text(cfd, 'He', n=2)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"He told The Fulton legislators `` deserves closer and election was received 1,119 votes on a result , `` irregularities '' for its appointed temporary assistant more than three years of the City of Atlanta and election , the City of Atlanta's recent years . The Fulton legislators allotted to have these funds through its appointed and the praise to the City Purchasing Department , which was received and election was conducted . `` deserves the election produced the City of the City Purchasing Department . The jury had been agreed to have a number of Atlanta's recent years of "
]
}
],
"source": [
"generate_text(cfd, 'He', n=3)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"He has been charged the manner in the praise to have a relative merits of possible revisions in the City Purchasing Department . It recommended federal legislation . `` deserves the election was won by Fulton County purchasing and often ambiguous '' in which it said Friday in which it believes `` no word to achieve this problem '' in term-end presentments that the praise to investigate dog . `` deserves a swipe at which was received and often hear a relative merits of Atlanta's Morehouse ( Red Sox today proposed Thursday against racial discrimination in which was conducted by "
]
}
],
"source": [
"generate_text(cfd, 'He', n=4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Discussion\n",
"1. Why does the original version get stuck in a loop so easily?\n",
"2. How does introducing some randomness solve this problem?\n",
"3. What effect does increasing N (the number of bigrams to consider) have? Does it make the text more or less intelligible? Why?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
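
A minimal sketch for the discussion questions in section 4 of the notebook. It is not part of the commit. It assumes a made-up toy sentence instead of the Brown corpus and uses `collections.Counter` in place of `nltk.ConditionalFreqDist`, so it runs without NLTK. The names `build_cfd`, `generate_greedy`, and `generate_sampled` are hypothetical helpers introduced only for this illustration.

```python
import random
from collections import Counter, defaultdict


def build_cfd(tokens):
    """Map each word to a Counter of the words observed right after it."""
    cfd = defaultdict(Counter)
    for first, second in zip(tokens, tokens[1:]):
        cfd[first][second] += 1
    return cfd


def generate_greedy(cfd, word, num=12):
    """Always follow the single most frequent successor (deterministic)."""
    out = []
    for _ in range(num):
        out.append(word)
        if not cfd[word]:  # dead end: no successor was observed
            break
        word = cfd[word].most_common(1)[0][0]
    return out


def generate_sampled(cfd, word, num=12, n=2, rng=random):
    """Pick uniformly among the n most frequent successors."""
    out = []
    for _ in range(num):
        out.append(word)
        if not cfd[word]:  # dead end: no successor was observed
            break
        candidates = [w for w, _ in cfd[word].most_common(n)]
        word = rng.choice(candidates)
    return out


# Hypothetical toy corpus (an assumption for illustration, not Brown data).
tokens = "the cat sat on the mat and the cat ate the fish on the mat".split()
cfd = build_cfd(tokens)

greedy = generate_greedy(cfd, "the")
sampled = generate_sampled(cfd, "the", n=2, rng=random.Random(0))
print(" ".join(greedy))
print(" ".join(sampled))
```

Because the greedy version is a deterministic function of the current word, the first repeated word forces the output into a cycle. Sampling among the top `n` successors lets the walk leave that cycle, and a larger `n` admits rarer successors, which tends to trade fluency for variety.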