diff --git a/CM3020 Artificial Intelligence/Week 4/Week 4 Notes.md b/CM3020 Artificial Intelligence/Week 4/Week 4 Notes.md
index 41e452e..1481d12 100644
--- a/CM3020 Artificial Intelligence/Week 4/Week 4 Notes.md
+++ b/CM3020 Artificial Intelligence/Week 4/Week 4 Notes.md
@@ -1,2 +1,100 @@
# DQN Implementation
We'll use Python to implement the network.

An episode is one run of the game, lasting either a set number of actions (e.g. 10,000) or until the game ends.

https://github.com/keras-team/keras-io/blob/master/examples/rl/deep_q_network_breakout.py

# What does the network look like?

Screenshots are resized to 84x84 pixels before they enter the network.

The input can be thought of as a 3D prism; in the linked Keras example its shape is (84, 84, 4), because four consecutive grayscale frames are stacked so the network can see motion.

# Convolution layers
### First convolution layer
`layer1 = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)`

32 filters are applied, each 8x8 pixels. The stride is 4, meaning the filter moves 4 pixels at a time as it slides across the input. This results in an output of shape (20, 20, 32): 32 feature maps of size 20x20, one per filter, since (84 - 8) / 4 + 1 = 20.

### Second convolution layer
`layer2 = layers.Conv2D(64, 4, strides=2, activation="relu")(layer1)`

This layer acts on the result of the previous layer: 64 filters of size 4x4 pixels with a stride of 2. This results in a shape of (9, 9, 64), since (20 - 4) / 2 + 1 = 9.

### Third convolution layer
`layer3 = layers.Conv2D(64, 3, strides=1, activation="relu")(layer2)`

This results in (7, 7, 64): 64 feature maps of 7x7 pixels, since (9 - 3) / 1 + 1 = 7.

# Flatten layer
`layer4 = layers.Flatten()(layer3)`
Flattens the 3D tensor into a single flat vector, in this case of length 7x7x64 = 3136.

# Dense layer
`layer5 = layers.Dense(512, activation="relu")(layer4)`
A fully connected layer maps the 3136 values onto 512 nodes, so every input value contributes to the activation of every node in this layer.

# Output layer
`action = layers.Dense(num_actions, activation="linear")(layer5)`
Takes the result of the previous layer and produces one output per possible action, which in Breakout is 4. Note the linear activation here: Q-values can be negative, so ReLU would clip them.

At the end we get 4 simple outputs from the Q function, one for each of the moves, representing the estimated values of the possible actions.

`return keras.Model(inputs=inputs, outputs=action)`
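Assembled into one function, here is a minimal sketch of the whole network, mirroring the structure of the linked Keras example (in that script `num_actions` is 4 for Breakout):

```python
from tensorflow import keras
from tensorflow.keras import layers

num_actions = 4  # Breakout's actions: NOOP, FIRE, RIGHT, LEFT


def create_q_model():
    # Input: four stacked 84x84 grayscale frames
    inputs = layers.Input(shape=(84, 84, 4))

    # Three convolutional layers extract spatial features
    layer1 = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)  # -> (20, 20, 32)
    layer2 = layers.Conv2D(64, 4, strides=2, activation="relu")(layer1)  # -> (9, 9, 64)
    layer3 = layers.Conv2D(64, 3, strides=1, activation="relu")(layer2)  # -> (7, 7, 64)

    # Flatten, then map down to one Q-value per action
    layer4 = layers.Flatten()(layer3)  # 7x7x64 = 3136 values
    layer5 = layers.Dense(512, activation="relu")(layer4)
    action = layers.Dense(num_actions, activation="linear")(layer5)

    return keras.Model(inputs=inputs, outputs=action)
```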
# DQN loss function and training
Normally we have a ground truth, such as labelled pictures; this is not the case with Q-learning. Instead, the training target is built from the reward plus the target network's estimate of the next state's value, as the steps below show.

### Sampling from the replay buffer
We choose 32 items (`batch_size`) from the replay buffer at random:

`indices = np.random.choice(range(len(done_history)), size=batch_size)`

The current state (s):

`state_sample = np.array([state_history[i] for i in indices])`

The next state (s′):

`state_next_sample = np.array([state_next_history[i] for i in indices])`

The reward r:

`rewards_sample = [rewards_history[i] for i in indices]`

The action a:

`action_sample = [action_history[i] for i in indices]`

We then estimate the future reward using the target network Q⁻ (`model_target`):

`future_rewards = model_target.predict(state_next_sample)`

We then apply the discount factor gamma to form the updated Q-values, which serve as the training target (the full script also adjusts these targets for terminal states):

`updated_q_values = rewards_sample + gamma * tf.reduce_max(future_rewards, axis=1)`

We calculate the predicted Q-values on the model being trained:

`q_values = model(state_sample)`

Since only one action was actually taken in each sampled step, the predicted Q-values are masked down to the Q-value of that action:

`masks = tf.one_hot(action_sample, num_actions)`

`q_action = tf.reduce_sum(tf.multiply(q_values, masks), axis=1)`

Finally we calculate the loss: the difference between the target Q-values and the predicted ones:

`loss = loss_function(updated_q_values, q_action)`

# Watch DQN playing Breakout
The replay buffer needs to be big, which is an initial limitation: the Keras script reduces it to about 12% of the size the original work required.

The graph shows the running average score achieved across multiple episodes.

Training is resource-intensive, but inference (running the trained model) is fast and requires little RAM.

# Lab
I was able to follow all the instructions except training the model with atari_py; there was an error saying:

`'C:\Users\gofor\myvenv\lib\site-packages\atari_py\ale_interface\ale_c.dll' (or one of its dependencies). Try using the full path with constructor syntax.`

I looked for the file online and downloaded it, but even then I still got the same error.
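This `ale_c.dll` failure is a widely reported problem with atari_py on Windows, which the upstream package does not officially support. A commonly suggested workaround (untested here) is to install a prebuilt Windows wheel from the Kojoley fork instead of copying the DLL in by hand: `pip install --no-index -f https://github.com/Kojoley/atari-py/releases atari_py`.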