# DQN Implementation
We'll use Python to implement the network.

An episode is an instance of a game run for a set number of actions (e.g. 10000) or until the game ends.
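As a minimal sketch of such an episode loop, assuming the classic Gym step API and the `BreakoutNoFrameskip-v4` environment name:

```python
import gym

env = gym.make("BreakoutNoFrameskip-v4")  # assumed environment name

max_steps_per_episode = 10000
state = env.reset()          # start a new episode
episode_reward = 0

for step in range(max_steps_per_episode):
    action = env.action_space.sample()      # placeholder: random action instead of the DQN policy
    state, reward, done, info = env.step(action)
    episode_reward += reward
    if done:                                 # the game ended (e.g. all lives lost)
        break
```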
https://github.com/keras-team/keras-io/blob/master/examples/rl/deep_q_network_breakout.py
# What does the network look like?
The screen is resized to 84 by 84 pixels before the screenshots enter the network.

The input can be thought of as a 3D volume; in the linked example it is 84x84x4 (the last four grayscale frames stacked together).
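A minimal sketch of declaring that input in Keras (the shape is taken from the linked example):

```python
from tensorflow.keras import layers

# 84x84 screen, 4 stacked grayscale frames as channels
inputs = layers.Input(shape=(84, 84, 4))
```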
# Convolution layers
### First convolution layer
`layer1 = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)`
32 filters are applied, each 8 by 8 pixels in size.

The stride is 4, meaning the filter moves 4 pixels at a time as it scans the image. This results in an output of shape (20, 20, 32): 32 feature maps of size 20x20, one for each filter (the arithmetic is checked in the sketch after the third layer).
### Second convolution layer
`layer2 = layers.Conv2D(64, 4, strides=2, activation="relu")(layer1)`
This layer acts on the result of the previous layer: 64 filters of size 4x4 pixels, with a stride of 2. This results in an output of shape (9, 9, 64).
### Third convolution layer
`layer3 = layers.Conv2D(64, 3, strides=1, activation="relu")(layer2)`
This results in (7, 7, 64): 64 feature maps of 7x7 pixels.
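The output sizes follow the standard formula for a convolution with no padding, output = floor((input − kernel) / stride) + 1; a quick check of all three layers:

```python
def conv_out(size, kernel, stride):
    # Side length of a "valid" (no padding) convolution output
    return (size - kernel) // stride + 1

print(conv_out(84, 8, 4))  # 20 -> (20, 20, 32)
print(conv_out(20, 4, 2))  # 9  -> (9, 9, 64)
print(conv_out(9, 3, 1))   # 7  -> (7, 7, 64)
```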
# Flatten layer
`layer4 = layers.Flatten()(layer3)`
Turns the 3D output of the convolutions into a single flat vector. In this case 7x7x64 = 3136 values.
# Dense layer
`layer5 = layers.Dense(512, activation="relu")(layer4)`
In the example, the 3136 values feed into a fully connected layer of 512 nodes: every input value contributes to the activation of every node in this layer.
# Output dense layer
`action = layers.Dense(num_actions, activation="linear")(layer5)`
Takes the result of the previous layer and produces the outputs, which for Breakout is 4 (the possible actions).
At the end we get 4 outputs from the Q function, one for each of the moves, representing the estimated values of the possible actions.
`return keras.Model(inputs=inputs, outputs=action)`
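Putting the lines above together, the model construction in the linked example looks roughly like this (`num_actions` is 4 for Breakout):

```python
from tensorflow import keras
from tensorflow.keras import layers

num_actions = 4  # Breakout: NOOP, FIRE, RIGHT, LEFT


def create_q_model():
    # 84x84 screen, 4 stacked grayscale frames
    inputs = layers.Input(shape=(84, 84, 4))

    # Convolutions on the frames on the screen
    layer1 = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)
    layer2 = layers.Conv2D(64, 4, strides=2, activation="relu")(layer1)
    layer3 = layers.Conv2D(64, 3, strides=1, activation="relu")(layer2)

    layer4 = layers.Flatten()(layer3)

    layer5 = layers.Dense(512, activation="relu")(layer4)
    action = layers.Dense(num_actions, activation="linear")(layer5)

    return keras.Model(inputs=inputs, outputs=action)
```

To pick a move, we run a state through the model and take the action with the highest predicted Q-value, e.g. `action = tf.argmax(model(state_tensor, training=False)[0]).numpy()` (where `state_tensor` is the current stacked-frame input).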
# DQN loss function and training
Normally we have a ground truth, such as labelled pictures; this is not the case with Q-learning, where the training target has to be bootstrapped from the network's own estimate of future reward (see the target formula below).
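Concretely, for a sampled transition (s, a, r, s') the target is built from the Bellman equation, where gamma is the discount factor and the future value comes from the target network:

`target = r + gamma * max_a' Q_target(s', a')`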
### To sample from the replay buffer
We choose 32 items (the batch size) at random from the replay buffer:
`indices = np.random.choice(range(len(done_history)), size=batch_size)`
The current state (s)
`state_sample = np.array([state_history[i] for i in indices])`
The next state (s')
`state_next_sample = np.array([state_next_history[i] for i in indices])`
The reward r
`rewards_sample = [rewards_history[i] for i in indices]`
The action a
`action_sample = [action_history[i] for i in indices]`
We then calculate the future reward using the target Q-network (`model_target`):
`future_rewards = model_target.predict(state_next_sample)`
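`model_target` is a second copy of the Q-network whose weights are only refreshed every so many frames (roughly every 10000 frames in the linked example), which keeps the training target stable:

```python
# Periodically copy the online weights into the target network
model_target.set_weights(model.get_weights())
```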
We then apply the discount factor gamma to form the updated (target) Q-values:
`updated_q_values = rewards_sample + gamma * tf.reduce_max(future_rewards, axis=1)`
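One detail worth noting: for transitions where the episode ended there is no future reward to bootstrap from, so the linked example masks those entries out using the sampled `done` flags, roughly:

```python
# done_sample is 1.0 where the episode ended, 0.0 otherwise
done_sample = tf.convert_to_tensor([float(done_history[i]) for i in indices])

# Drop the bootstrapped term for terminal states (the example also pins their target to -1)
updated_q_values = updated_q_values * (1 - done_sample) - done_sample
```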
We calculate the Q-values that the online model predicts for the sampled states:
`q_values = model(state_sample)`
We calculate the loss as the difference between the updated (target) Q-values and the Q-values the model predicted for the actions that were actually taken (`q_action`):
`loss = loss_function(updated_q_values, q_action)`
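`q_action` is obtained by masking the full Q output with a one-hot encoding of the sampled actions, so only the taken action's value enters the loss. A sketch of the full gradient step, continuing from the variables above (`loss_function` is assumed to be the Huber loss and `optimizer` an Adam optimizer, as in the linked example):

```python
import tensorflow as tf
from tensorflow import keras

loss_function = keras.losses.Huber()
optimizer = keras.optimizers.Adam(learning_rate=0.00025, clipnorm=1.0)

# One-hot masks so only the taken action's Q-value enters the loss
masks = tf.one_hot(action_sample, num_actions)

with tf.GradientTape() as tape:
    q_values = model(state_sample)                                 # predicted Q-values
    q_action = tf.reduce_sum(tf.multiply(q_values, masks), axis=1)
    loss = loss_function(updated_q_values, q_action)

# Backpropagate and update only the online model; the target network is left untouched
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```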
# Watch DQN playing Breakout
The replay buffer needs to be big; this is an initial limitation. The Keras script reduces the usual size to about 12% of the original requirement.
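One simple way to cap the buffer is to use bounded deques, which drop the oldest transitions automatically (a sketch; the linked script instead trims plain Python lists, and `max_memory_length` is its name for the cap):

```python
from collections import deque

max_memory_length = 100000  # value used in the Keras script

# One entry per transition; the oldest entries fall off once the cap is reached
state_history = deque(maxlen=max_memory_length)
state_next_history = deque(maxlen=max_memory_length)
action_history = deque(maxlen=max_memory_length)
rewards_history = deque(maxlen=max_memory_length)
done_history = deque(maxlen=max_memory_length)
```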
The graph shows the average running score achieved across multiple episodes.
Training is resource intensive, but inference (running the model) is fast and requires little RAM.
# Lab
I was able to follow all the instructions except training the model with atari_py; there was an error saying:
`'C:\Users\gofor\myvenv\lib\site-packages\atari_py\ale_interface\ale_c.dll' (or one of its dependencies). Try using the full path with constructor syntax.`
I looked for the file online and downloaded it, but even then I still got the same error.