DQN Implementation
We'll use Python and Keras to implement the network.
An episode is one run of the game, lasting either a set maximum number of actions (e.g. 10,000) or until the game ends.
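As an illustration, an episode loop looks roughly like the minimal sketch below (it uses the older Gym step API; the environment name, the random action choice and the step limit are assumptions, not the Keras script itself):

import gym

env = gym.make("BreakoutNoFrameskip-v4")  # assumed environment name
max_steps_per_episode = 10000             # cap on actions per episode

state = env.reset()
for step in range(max_steps_per_episode):
    action = env.action_space.sample()    # placeholder: random action instead of the DQN policy
    state, reward, done, info = env.step(action)
    if done:                              # the game ended (e.g. all lives lost)
        break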
https://github.com/keras-team/keras-io/blob/master/examples/rl/deep_q_network_breakout.py
What does the network look like?
The screen is resized to 84 by 84 pixels before the screenshots enter the network.
The input can be thought of as a 3D volume: 84x84 pixels plus a channel dimension (in the Keras example this is four stacked grayscale frames, i.e. 84x84x4).
Convolution layers
First convolution layer
layer1 = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)
32 filters are applied, each 8 by 8 pixels. The stride is 4, meaning the filter moves 4 pixels at a time. This results in an output of shape (20, 20, 32): 32 feature maps of size 20x20, one for each filter (the shape arithmetic for all three convolution layers is checked in a short sketch after the third one).
Second convolution layer
layer2 = layers.Conv2D(64, 4, strides=2, activation="relu")(layer1)
This layer acts on the result of the previous layer: 64 filters of size 4x4 pixels, with a stride of 2. This results in an output of shape (9, 9, 64).
Third convolution layer
layer3 = layers.Conv2D(64, 3, strides=1, activation="relu")(layer2)
This results in (7, 7, 64): 64 feature maps of 7x7 pixels.
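The three output shapes above follow from the standard formula for a convolution without padding; a quick check with a small hypothetical helper (plain arithmetic only):

def conv_output_size(input_size, kernel_size, stride):
    # "valid" (no padding) convolution: floor((input - kernel) / stride) + 1
    return (input_size - kernel_size) // stride + 1

print(conv_output_size(84, 8, 4))  # 20 -> (20, 20, 32)
print(conv_output_size(20, 4, 2))  # 9  -> (9, 9, 64)
print(conv_output_size(9, 3, 1))   # 7  -> (7, 7, 64)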
Flatten layer
layer4 = layers.Flatten()(layer3)
Turns the 3D output of the convolutions into a single flat vector; in this case 7 x 7 x 64 = 3136 values.
Dense layer
layer5 = layers.Dense(512, activation="relu")(layer4)
In the example, the 3136 values are fully connected to 512 nodes, meaning every value from the previous layer contributes to the activation of every node in this layer.
Output dense layer
action = layers.Dense(num_actions, activation="linear")(layer5)
Takes the result of the previous layer and produces the outputs, which in Breakout is 4 (the possible actions). The activation is linear so the Q-values are unconstrained (they can be negative).
At the end we get 4 outputs from the Q-function, one for each of the possible moves, representing the estimated values of those actions.
return keras.Model(inputs=inputs, outputs=action)
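Putting the pieces together, the model-building function from the Keras example looks roughly like this (84x84x4 input and num_actions = 4 for Breakout):

from tensorflow import keras
from tensorflow.keras import layers

num_actions = 4  # Breakout: NOOP, FIRE, RIGHT, LEFT

def create_q_model():
    # Four stacked 84x84 grayscale frames
    inputs = layers.Input(shape=(84, 84, 4))

    # Convolutions on the screen frames
    layer1 = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)
    layer2 = layers.Conv2D(64, 4, strides=2, activation="relu")(layer1)
    layer3 = layers.Conv2D(64, 3, strides=1, activation="relu")(layer2)

    layer4 = layers.Flatten()(layer3)

    layer5 = layers.Dense(512, activation="relu")(layer4)
    # Linear output: one Q-value per action (Q-values may be negative)
    action = layers.Dense(num_actions, activation="linear")(layer5)

    return keras.Model(inputs=inputs, outputs=action)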
DQN loss function and training
Normally we have a ground truth, such as labelled pictures; this is not the case with Q-learning.
To sample from the replay buffer
We randomly choose 32 (batch_size) transitions from the replay buffer
indices = np.random.choice(range(len(done_history)), size=batch_size)
The current state (s)
state_sample = np.array([state_history[i] for i in indices])
The next state (s')
state_next_sample = np.array([state_next_history[i] for i in indices])
The reward (r)
rewards_sample = [rewards_history[i] for i in indices]
The action (a)
action_sample = [action_history[i] for i in indices]
We then estimate the future rewards using the target Q-network (model_target)
future_rewards = model_target.predict(state_next_sample)
We then build the target Q-values by adding the maximum future reward, discounted by gamma, to the sampled rewards
updated_q_values = rewards_sample + gamma * tf.reduce_max(future_rewards, axis=1)
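In equation form, this line is the standard Q-learning (Bellman) target, built from the reward, the discount factor gamma and the target network's best next-state value:

Q_{\text{updated}}(s, a) = r + \gamma \max_{a'} Q_{\text{target}}(s', a')

(The full example additionally handles terminal states, where there is no future term; see the training sketch further down.)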
We calculate the Q-values predicted by the trainable model for the sampled current states
q_values = model(state_sample)
We then calculate the difference between these target Q-values and q_action, the predicted Q-value of the action that was actually taken (selected with a one-hot mask; see the sketch below)
loss = loss_function(updated_q_values, q_action)
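For completeness, here is roughly how the remaining pieces around this loss fit together in the Keras example: q_action is obtained by masking q_values with a one-hot encoding of the sampled actions, and the gradients are then applied to the trainable model (num_actions, optimizer and loss_function, e.g. keras.losses.Huber(), are assumed to be defined as in the script):

# Terminal transitions have no future reward; the example sets their target to -1
done_sample = tf.convert_to_tensor([float(done_history[i]) for i in indices])
updated_q_values = updated_q_values * (1 - done_sample) - done_sample

# One-hot mask so only the Q-value of the action actually taken is trained on
masks = tf.one_hot(action_sample, num_actions)

with tf.GradientTape() as tape:
    q_values = model(state_sample)                    # Q-values from the trainable model
    q_action = tf.reduce_sum(tf.multiply(q_values, masks), axis=1)
    loss = loss_function(updated_q_values, q_action)  # e.g. keras.losses.Huber()

# Backpropagate and update only the trainable model; model_target is updated separately
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))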
Watch DQN playing Breakout
The size of the replay buffer needs to be big, which is an initial limitation. The Keras script reduces the usual size to about 12% of the original requirement.
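The buffer itself is just a set of history lists capped to a maximum length. The Keras script trims plain Python lists; a deque with maxlen, shown in this alternative sketch, achieves the same cap automatically (max_memory_length is an assumed value):

from collections import deque

max_memory_length = 100000  # assumed cap; the original DeepMind setup used a much larger buffer

# Deques with maxlen automatically drop the oldest transitions once full
state_history = deque(maxlen=max_memory_length)
state_next_history = deque(maxlen=max_memory_length)
action_history = deque(maxlen=max_memory_length)
rewards_history = deque(maxlen=max_memory_length)
done_history = deque(maxlen=max_memory_length)

def remember(state, action, reward, state_next, done):
    # Store one transition (s, a, r, s', done) in the replay buffer
    state_history.append(state)
    action_history.append(action)
    rewards_history.append(reward)
    state_next_history.append(state_next)
    done_history.append(done)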
The graph shows the average running score achieved across multiple episodes.
Training is resource-intensive, but inference (running the trained model) is fast and requires little RAM.
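Inference is just one forward pass per frame; a minimal sketch of greedy action selection (assumes a trained model from create_q_model above and a preprocessed state of shape (84, 84, 4)):

import tensorflow as tf

state_tensor = tf.convert_to_tensor(state)      # preprocessed stack of 4 frames (assumed)
state_tensor = tf.expand_dims(state_tensor, 0)  # add a batch dimension
q_values = model(state_tensor, training=False)  # one Q-value per action
action = tf.argmax(q_values[0]).numpy()         # greedy action: highest Q-value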
Lab
I was able to follow all of the instructions except training the model with atari_py; there was an error saying:
'C:\Users\gofor\myvenv\lib\site-packages\atari_py\ale_interface\ale_c.dll' (or one of its dependencies). Try using the full path with constructor syntax.
I looked for the file online and downloaded it, but even then I still got the same error.