# DQN Implementation
We'll use Python to implement the network.

An episode is an instance of a game run for a set number of actions (e.g. 10000) or until the game ends.
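As a minimal sketch of such an episode loop, assuming the classic Gym step API and the `BreakoutNoFrameskip-v4` environment name:

```python
import gym

env = gym.make("BreakoutNoFrameskip-v4")  # assumed environment name

max_steps_per_episode = 10000
state = env.reset()          # start a new episode
episode_reward = 0

for step in range(max_steps_per_episode):
    action = env.action_space.sample()      # placeholder: random action instead of the DQN policy
    state, reward, done, info = env.step(action)
    episode_reward += reward
    if done:                                 # the game ended (e.g. all lives lost)
        break
```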
https://github.com/keras-team/keras-io/blob/master/examples/rl/deep_q_network_breakout.py
# What does the network look like?
The screen is resized to 84 by 84 pixels before the screenshots enter the network.

The input can be thought of as a 3D volume; in the linked example it is 84x84x4 (the last four grayscale frames stacked together).
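A minimal sketch of declaring that input in Keras (the shape is taken from the linked example):

```python
from tensorflow.keras import layers

# 84x84 screen, 4 stacked grayscale frames as channels
inputs = layers.Input(shape=(84, 84, 4))
```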
# Convolution layers
### First convolution layer
`layer1 = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)`
32 filters are applied, each 8 by 8 pixels in size.

The stride is 4, meaning the filter moves 4 pixels at a time as it scans the image. This results in an output of shape (20, 20, 32): 32 feature maps of size 20x20, one for each filter (the arithmetic is checked in the sketch after the third layer).
### Second convolution layer
`layer2 = layers.Conv2D(64, 4, strides=2, activation="relu")(layer1)`
This layer acts on the result of the previous layer: 64 filters of size 4x4 pixels, with a stride of 2. This results in an output of shape (9, 9, 64).
### Third convolution layer
`layer3 = layers.Conv2D(64, 3, strides=1, activation="relu")(layer2)`
This results in (7, 7, 64): 64 feature maps of 7x7 pixels.
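The output sizes follow the standard formula for a convolution with no padding, output = floor((input − kernel) / stride) + 1; a quick check of all three layers:

```python
def conv_out(size, kernel, stride):
    # Side length of a "valid" (no padding) convolution output
    return (size - kernel) // stride + 1

print(conv_out(84, 8, 4))  # 20 -> (20, 20, 32)
print(conv_out(20, 4, 2))  # 9  -> (9, 9, 64)
print(conv_out(9, 3, 1))   # 7  -> (7, 7, 64)
```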
# Flatten layer
`layer4 = layers.Flatten()(layer3)`
Turns the 3D output of the convolutions into a single flat vector. In this case 7x7x64 = 3136 values.
# Dense layer
`layer5 = layers.Dense(512, activation="relu")(layer4)`
In the example, the 3136 values feed into a fully connected layer of 512 nodes: every input value contributes to the activation of every node in this layer.
# Output dense layer
`action = layers.Dense(num_actions, activation="linear")(layer5)`
Takes the result of the previous layer and produces the outputs, which for Breakout is 4 (the possible actions).
At the end we get 4 outputs from the Q function, one for each of the moves, representing the estimated values of the possible actions.
`return keras.Model(inputs=inputs, outputs=action)`
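Putting the lines above together, the model construction in the linked example looks roughly like this (`num_actions` is 4 for Breakout):

```python
from tensorflow import keras
from tensorflow.keras import layers

num_actions = 4  # Breakout: NOOP, FIRE, RIGHT, LEFT


def create_q_model():
    # 84x84 screen, 4 stacked grayscale frames
    inputs = layers.Input(shape=(84, 84, 4))

    # Convolutions on the frames on the screen
    layer1 = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)
    layer2 = layers.Conv2D(64, 4, strides=2, activation="relu")(layer1)
    layer3 = layers.Conv2D(64, 3, strides=1, activation="relu")(layer2)

    layer4 = layers.Flatten()(layer3)

    layer5 = layers.Dense(512, activation="relu")(layer4)
    action = layers.Dense(num_actions, activation="linear")(layer5)

    return keras.Model(inputs=inputs, outputs=action)
```

To pick a move, we run a state through the model and take the action with the highest predicted Q-value, e.g. `action = tf.argmax(model(state_tensor, training=False)[0]).numpy()` (where `state_tensor` is the current stacked-frame input).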
# DQN loss function and training
Normally we have a ground truth, such as labelled pictures; this is not the case with Q-learning, where the training target has to be bootstrapped from the network's own estimate of future reward (see the target formula below).
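Concretely, for a sampled transition (s, a, r, s') the target is built from the Bellman equation, where gamma is the discount factor and the future value comes from the target network:

`target = r + gamma * max_a' Q_target(s', a')`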
### To sample from the replay buffer
We choose 32 items (the batch size) at random from the replay buffer:
`indices = np.random.choice(range(len(done_history)), size=batch_size)`
The current state (s)
`state_sample = np.array([state_history[i] for i in indices])`
The next state (s')
`state_next_sample = np.array([state_next_history[i] for i in indices])`
The reward r
`rewards_sample = [rewards_history[i] for i in indices]`
The action a
`action_sample = [action_history[i] for i in indices]`
We then calculate the future reward using the target Q-network (`model_target`):
`future_rewards = model_target.predict(state_next_sample)`
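`model_target` is a second copy of the Q-network whose weights are only refreshed every so many frames (roughly every 10000 frames in the linked example), which keeps the training target stable:

```python
# Periodically copy the online weights into the target network
model_target.set_weights(model.get_weights())
```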
We then apply the discount factor gamma to form the updated (target) Q-values:
`updated_q_values = rewards_sample + gamma * tf.reduce_max(future_rewards, axis=1)`
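One detail worth noting: for transitions where the episode ended there is no future reward to bootstrap from, so the linked example masks those entries out using the sampled `done` flags, roughly:

```python
# done_sample is 1.0 where the episode ended, 0.0 otherwise
done_sample = tf.convert_to_tensor([float(done_history[i]) for i in indices])

# Drop the bootstrapped term for terminal states (the example also pins their target to -1)
updated_q_values = updated_q_values * (1 - done_sample) - done_sample
```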
We calculate the Q-values that the online model predicts for the sampled states:
`q_values = model(state_sample)`
We calculate the loss as the difference between the updated (target) Q-values and the Q-values the model predicted for the actions that were actually taken (`q_action`):
`loss = loss_function(updated_q_values, q_action)`
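`q_action` is obtained by masking the full Q output with a one-hot encoding of the sampled actions, so only the taken action's value enters the loss. A sketch of the full gradient step, continuing from the variables above (`loss_function` is assumed to be the Huber loss and `optimizer` an Adam optimizer, as in the linked example):

```python
import tensorflow as tf
from tensorflow import keras

loss_function = keras.losses.Huber()
optimizer = keras.optimizers.Adam(learning_rate=0.00025, clipnorm=1.0)

# One-hot masks so only the taken action's Q-value enters the loss
masks = tf.one_hot(action_sample, num_actions)

with tf.GradientTape() as tape:
    q_values = model(state_sample)                                 # predicted Q-values
    q_action = tf.reduce_sum(tf.multiply(q_values, masks), axis=1)
    loss = loss_function(updated_q_values, q_action)

# Backpropagate and update only the online model; the target network is left untouched
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```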
# Watch DQN playing Breakout
The replay buffer needs to be big; this is an initial limitation. The Keras script reduces the usual size to about 12% of the original requirement.
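One simple way to cap the buffer is to use bounded deques, which drop the oldest transitions automatically (a sketch; the linked script instead trims plain Python lists, and `max_memory_length` is its name for the cap):

```python
from collections import deque

max_memory_length = 100000  # value used in the Keras script

# One entry per transition; the oldest entries fall off once the cap is reached
state_history = deque(maxlen=max_memory_length)
state_next_history = deque(maxlen=max_memory_length)
action_history = deque(maxlen=max_memory_length)
rewards_history = deque(maxlen=max_memory_length)
done_history = deque(maxlen=max_memory_length)
```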
The graph shows the average running score achieved across multiple episodes.
Training is resource intensive, but inference (running the model) is fast and requires little RAM.
# Lab
I was able to follow all the instructions except training the model with atari_py; there was an error saying:
`'C:\Users\gofor\myvenv\lib\site-packages\atari_py\ale_interface\ale_c.dll' (or one of its dependencies). Try using the full path with constructor syntax.`
I looked for the file online and downloaded it, but even then I still got the same error.