Introduction

In a previous project, we explore Q-Learning. In this one, we saw the limits in term of state space as the Q matrix has a size which should contain all possible state. Knowing that, DeepMind got the idea to use a Neural Network to compute the Q-Value based on the state. This can be seens as an online-regression training. The benefit is allows an infinite number of states. Of course, the more complex, will be the environment, the more complex neural network we will need to approach the estimated Q-Value.

Today, we gonna create the first DeepQNetwork on the environment cartpole. I selected Cartpole for 3 reasons :

  • We need an environment with discrete output (so this removes the Pendulum env.)
  • To ease the training, I wanted to avoid very sparse reward (this removes MountainCar and Acrobot)
  • The environment is small so we don't need a big model and it's gonna be fast to test and tune.

Creates Model

In [1]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

from collections import deque
import numpy as np
import pickle 
import gym

import tensorflow as tf

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras import backend as K
from keras.models import load_model

import matplotlib.pyplot as plt
C:\python36\envs\machine_learning\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.

Memory Replay

One huge improvement provided by DeepMind team is to train the DQN with a Memory Replay. The idea is to store several pairs of (State, Action, Reward, Next_State) and pick random experiences to train the model instead of the few last tries. The reason is simple. Without Memory Replay, you will get correlations between experiences and creates a poor training. Let's imagine you succeed to maintain the pole perfectly vertical for 20 frames and you use the last 20 frames for your training, you will highly impact the "stability" of the Neural Network. Instead, if you use 20 random frame in a batch of you last 1000 frames, you will have more chances to get frames with pole to the left, to the right, every actions and so on.

Now to implement it, it's quite simple. We just have to store pairs in a Deque (Double Ended Queue) and pick some random elements for the learning.

In [2]:
class Memory():
    """
    Replay Memory slightly extended to compare the impact on the learning of having or not the Memory Replay. 
    Without it, it's just like storing the batch_size of last experiences.
    """
    def __init__(self, max_size=1000):
        self.buffer = deque(maxlen=max_size)

    def add(self, experience):
        self.buffer.append(experience)

    def sample(self, batch_size):
        idx = np.random.choice(np.arange(len(self.buffer)), size=batch_size, replace=False)
        return [self.buffer[x] for x in idx]
    
    def getAll(self):
        return self.buffer
    
    def __len__(self):
        return len(self.buffer)
    
class Stats():
    def __init__(self, max_size=None):
        self.buffer = deque(maxlen=max_size)
    
    def add(self, score):
        self.buffer.append(score)

DQN

Now let's create our DQN. This is quite simple with Keras, it's like creating a MultiClass Regression Model. The Input is the state and the output is the predicted Q-value of every actions.

As a result, the label will be the Q-value predicted the the value of the action taken corrected with the result of the Bellmann Equation. For example:

  • The model predict [0.1, 0.7] as Q-value for action 0 and 1
  • We take, using Epsilon Greedy rule, the action 0
  • The reward of this action is 1 and game is not done
  • The prediction on the new State provides a Q-value of [0.5, 0.4]
  • With a gamma of 0.99, the Q_value of action 0 is : 1 (reward) + 0.99(gamma) * 0.5 (max Next_Q) = 1.495
  • The target will be in that case : [1.495, 0.7]

If the game is done, the we only keep the reward so the taget will be :

  • The target will be in that case : [1.0, 0.7]

For the training, the principle is exactly the same as Q-Learning except that we perform the "fit" on a batch taken from the replay memory instad of only the last action.

In [3]:
class QNetwork:
    def __init__(self, learning_rate=0.001, use_replay_memory=True, reshape_reward=False):
        self.train_episodes = 1000          # max number of episodes to learn from
        self.test_episodes = 100
        self.max_steps = 200                # max steps in an episode
        self.gamma = 0.99                   # future reward discount

        # Exploration parameters
        self.explore_start = 1.0            # exploration probability at start
        self.explore_stop = 0.01            # minimum exploration probability
        self.decay_rate = 0.0002            # exponential decay rate for exploration prob

        # Network parameters
        self.hidden_size = 16               # number of units in each Q-network hidden layer
        self.learning_rate = learning_rate
        
        # Benchmark info
        self.stats = Stats(max_size=None)
        self.use_replay_memory = use_replay_memory
        self.reshape_reward = reshape_reward

        # Memory parameters
        self.batch_size = 32 
        if not self.use_replay_memory:
            self.memory_size = self.batch_size       # Store only a batch
            self.pretrain_length = self.batch_size   # number experiences to pretrain the memory
        else:
            self.memory_size = 10000            # memory capacity 
            self.pretrain_length = 10000        # number experiences to pretrain the memory
        self.replay_memory = Memory(max_size=self.memory_size)
        
        # env parameters
        self.input_shape = (4)
        self.output_shape = 2      
        
        self.create_model()
        self.env = gym.make('CartPole-v0')
    
    def create_model(self):
        K.clear_session()
        
        self.model = Sequential()

        self.model.add(Dense(self.hidden_size, activation='relu', input_dim=self.input_shape))
        self.model.add(Dense(self.hidden_size, activation='relu'))
        self.model.add(Dense(self.output_shape, activation='linear'))

        self.optimizer = Adam(lr=self.learning_rate)
        self.model.compile(loss='mse', optimizer=self.optimizer)
        self.model.summary()
        
    def fill_replay(self):
        done = True
        while len(self.replay_memory) < self.pretrain_length:
            if done:
                obs = self.env.reset()
                state = self.obs_to_state(obs)
            else:
                state = next_state
            
            action = self.env.action_space.sample()
            next_obs, reward, done, _ = self.env.step(action)
            next_state = self.obs_to_state(next_obs)
            self.replay_memory.add((state, action, reward, next_state, 1.0-done))
            
    def obs_to_state(self, obs):
        return np.reshape(obs, [1, 4])
    
    def densify_reward(self, obs):
        score_position = (2.4-abs(obs[0]))**2/2.4**2
        score_angle =(0.20944-abs(obs[2]))**2/0.20944**2
        return (score_position + score_angle) /2
    
    def epsilon_greedy(self, step):
        explore_p = self.explore_stop + (self.explore_start - self.explore_stop)*np.exp(-self.decay_rate*step)
        return explore_p
    
    def plot_epsilon_greedy(self):
        import matplotlib.pyplot as plt
        plt.figure(figsize=(10,6))
        X = np.arange(1, 10000, 5)
        y = self.epsilon_greedy(X)
        plt.plot(X, y)
        plt.show()
        
    def learn(self, minibatch):
        state_mat = np.zeros((self.batch_size, 4))
        action_arr = np.zeros(self.batch_size, dtype=np.int32)
        rewards_arr = np.zeros(self.batch_size)
        next_state_mat = np.zeros((self.batch_size, 4))
        cont_arr = np.zeros(self.batch_size, dtype=np.int32)
        for i, (state_b, action_b, reward_b, next_state_b, cont) in enumerate(minibatch):
            state_mat[i] = state_b[0]
            action_arr[i] = action_b
            rewards_arr[i] = reward_b
            next_state_mat[i] = next_state_b[0]
            cont_arr[i] = cont
        Q_values = self.model.predict(state_mat)
        next_Q_values = self.model.predict(next_state_mat)
        for i in range(self.batch_size):
            Q_values[i, action_arr[i]] = rewards_arr[i] + cont_arr[i] * self.gamma * np.max(next_Q_values[i])
        self.model.fit(state_mat, Q_values, epochs=1, verbose=0)  #, callbacks=[self.learning_rate]
    
    def train(self, train_episodes_ovr = None):
        self.fill_replay()
        
        if train_episodes_ovr is not None:
            self.train_episodes = train_episodes_ovr
        
        training_step = 0
        for episodes in range(self.train_episodes):
            obs = self.env.reset()
            state = self.obs_to_state(obs)
            
            total_reward = 0
            total_dense_reward = 0
            t = 0
            while t < self.max_steps:
                training_step += 1
                
                exploration_prob = self.epsilon_greedy(training_step)
                if np.random.rand() < exploration_prob:
                    action = self.env.action_space.sample()
                else:
                    Qs = self.model.predict(state)
                    action = np.argmax(Qs[0])

                # Take action, get new state and reward
                next_obs, reward, done, _ = self.env.step(action)
                next_state = self.obs_to_state(next_obs)
                
                dense_reward = self.densify_reward(next_obs)
                if self.reshape_reward:
                    reward_used = reward
                else:
                    reward_used = dense_reward
                
                total_reward += reward
                total_dense_reward += dense_reward
                
                self.replay_memory.add((state, action, reward_used, next_state, 1-done))
                
                # Replay
                if self.use_replay_memory:
                    minibatch = self.replay_memory.sample(self.batch_size)
                else:
                    minibatch = self.replay_memory.getAll()
                self.learn(minibatch)                        
                
                if done:
                    self.stats.add((t, total_reward, total_dense_reward))
                    break
                else:
                    state = next_state
                    t += 1
            
            print('Episode: {}/{}'.format(episodes, self.train_episodes),
                  'Total reward: {}'.format(total_reward),
                  'iter : {}'.format(training_step),
                  'P : {:.3f}'.format(exploration_prob)
                 )
            
    def save_stats(self, filename):
        with open(filename, "wb") as f:
            pickle.dump(self.stats, f)
        
    def play(self, render=False):
        for i in range(self.test_episodes):
            obs = self.env.reset()
            done = False
            total_reward = 0
            while not done:
                if render:
                    self.env.render()
                state = self.obs_to_state(obs)
                Qs = self.model.predict(state)
                action = np.argmax(Qs[0])
                obs, reward, done, _ = self.env.step(action)
                total_reward += reward
            print("Game {} - Score {}".format(i, total_reward))

Now let's try some models. First of all, we can train the same model with and without Replay Memory. We can try to reshape the reward to help the learning (the reward is 1 if we are exactly on the middle with a pole angle of 0) else it decrease to 0 in the worst case (12 deg angle and X position close to +/- 2.4). To finish, I also try smaller and bigger Learning Rate (and decay factor for the epsilon greedy as we need less iterations to learn)

In [4]:
# DQN = QNetwork(learning_rate = 0.0001, use_replay_memory=True, reshape_reward=False)
# DQN.decay_rate = 0.0003
# DQN.plot_epsilon_greedy()
In [9]:
DQN = QNetwork(learning_rate = 0.001, use_replay_memory=True, reshape_reward=False)
DQN.train(train_episodes_ovr=1000)
DQN.save_stats("DQN_with_memory_simple_reward.p")
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 16)                80        
_________________________________________________________________
dense_2 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 34        
=================================================================
Total params: 386
Trainable params: 386
Non-trainable params: 0
_________________________________________________________________
Episode: 0/1000 Total reward: 10.0 iter : 10 P : 0.998
Episode: 1/1000 Total reward: 32.0 iter : 42 P : 0.992
Episode: 2/1000 Total reward: 9.0 iter : 51 P : 0.990
Episode: 3/1000 Total reward: 12.0 iter : 63 P : 0.988
Episode: 4/1000 Total reward: 19.0 iter : 82 P : 0.984
Episode: 5/1000 Total reward: 16.0 iter : 98 P : 0.981
Episode: 6/1000 Total reward: 11.0 iter : 109 P : 0.979
Episode: 7/1000 Total reward: 10.0 iter : 119 P : 0.977
Episode: 8/1000 Total reward: 15.0 iter : 134 P : 0.974
Episode: 9/1000 Total reward: 15.0 iter : 149 P : 0.971
Episode: 10/1000 Total reward: 17.0 iter : 166 P : 0.968
Episode: 11/1000 Total reward: 57.0 iter : 223 P : 0.957
Episode: 12/1000 Total reward: 12.0 iter : 235 P : 0.955
Episode: 13/1000 Total reward: 19.0 iter : 254 P : 0.951
Episode: 14/1000 Total reward: 11.0 iter : 265 P : 0.949
Episode: 15/1000 Total reward: 14.0 iter : 279 P : 0.946
Episode: 16/1000 Total reward: 20.0 iter : 299 P : 0.943
Episode: 17/1000 Total reward: 33.0 iter : 332 P : 0.936
Episode: 18/1000 Total reward: 22.0 iter : 354 P : 0.932
Episode: 19/1000 Total reward: 25.0 iter : 379 P : 0.928
Episode: 20/1000 Total reward: 32.0 iter : 411 P : 0.922
Episode: 21/1000 Total reward: 23.0 iter : 434 P : 0.918
Episode: 22/1000 Total reward: 37.0 iter : 471 P : 0.911
Episode: 23/1000 Total reward: 42.0 iter : 513 P : 0.903
Episode: 24/1000 Total reward: 11.0 iter : 524 P : 0.901
Episode: 25/1000 Total reward: 18.0 iter : 542 P : 0.898
Episode: 26/1000 Total reward: 9.0 iter : 551 P : 0.897
Episode: 27/1000 Total reward: 12.0 iter : 563 P : 0.895
Episode: 28/1000 Total reward: 18.0 iter : 581 P : 0.891
Episode: 29/1000 Total reward: 14.0 iter : 595 P : 0.889
Episode: 30/1000 Total reward: 15.0 iter : 610 P : 0.886
Episode: 31/1000 Total reward: 37.0 iter : 647 P : 0.880
Episode: 32/1000 Total reward: 23.0 iter : 670 P : 0.876
Episode: 33/1000 Total reward: 19.0 iter : 689 P : 0.873
Episode: 34/1000 Total reward: 11.0 iter : 700 P : 0.871
Episode: 35/1000 Total reward: 23.0 iter : 723 P : 0.867
Episode: 36/1000 Total reward: 18.0 iter : 741 P : 0.864
Episode: 37/1000 Total reward: 41.0 iter : 782 P : 0.857
Episode: 38/1000 Total reward: 15.0 iter : 797 P : 0.854
Episode: 39/1000 Total reward: 11.0 iter : 808 P : 0.852
Episode: 40/1000 Total reward: 23.0 iter : 831 P : 0.848
Episode: 41/1000 Total reward: 20.0 iter : 851 P : 0.845
Episode: 42/1000 Total reward: 19.0 iter : 870 P : 0.842
Episode: 43/1000 Total reward: 8.0 iter : 878 P : 0.841
Episode: 44/1000 Total reward: 24.0 iter : 902 P : 0.837
Episode: 45/1000 Total reward: 10.0 iter : 912 P : 0.835
Episode: 46/1000 Total reward: 16.0 iter : 928 P : 0.832
Episode: 47/1000 Total reward: 16.0 iter : 944 P : 0.830
Episode: 48/1000 Total reward: 12.0 iter : 956 P : 0.828
Episode: 49/1000 Total reward: 16.0 iter : 972 P : 0.825
Episode: 50/1000 Total reward: 17.0 iter : 989 P : 0.822
Episode: 51/1000 Total reward: 23.0 iter : 1012 P : 0.819
Episode: 52/1000 Total reward: 15.0 iter : 1027 P : 0.816
Episode: 53/1000 Total reward: 20.0 iter : 1047 P : 0.813
Episode: 54/1000 Total reward: 28.0 iter : 1075 P : 0.808
Episode: 55/1000 Total reward: 16.0 iter : 1091 P : 0.806
Episode: 56/1000 Total reward: 12.0 iter : 1103 P : 0.804
Episode: 57/1000 Total reward: 40.0 iter : 1143 P : 0.798
Episode: 58/1000 Total reward: 40.0 iter : 1183 P : 0.791
Episode: 59/1000 Total reward: 19.0 iter : 1202 P : 0.788
Episode: 60/1000 Total reward: 42.0 iter : 1244 P : 0.782
Episode: 61/1000 Total reward: 22.0 iter : 1266 P : 0.779
Episode: 62/1000 Total reward: 31.0 iter : 1297 P : 0.774
Episode: 63/1000 Total reward: 40.0 iter : 1337 P : 0.768
Episode: 64/1000 Total reward: 25.0 iter : 1362 P : 0.764
Episode: 65/1000 Total reward: 63.0 iter : 1425 P : 0.754
Episode: 66/1000 Total reward: 29.0 iter : 1454 P : 0.750
Episode: 67/1000 Total reward: 20.0 iter : 1474 P : 0.747
Episode: 68/1000 Total reward: 99.0 iter : 1573 P : 0.733
Episode: 69/1000 Total reward: 36.0 iter : 1609 P : 0.728
Episode: 70/1000 Total reward: 47.0 iter : 1656 P : 0.721
Episode: 71/1000 Total reward: 26.0 iter : 1682 P : 0.717
Episode: 72/1000 Total reward: 19.0 iter : 1701 P : 0.715
Episode: 73/1000 Total reward: 41.0 iter : 1742 P : 0.709
Episode: 74/1000 Total reward: 42.0 iter : 1784 P : 0.703
Episode: 75/1000 Total reward: 57.0 iter : 1841 P : 0.695
Episode: 76/1000 Total reward: 40.0 iter : 1881 P : 0.690
Episode: 77/1000 Total reward: 21.0 iter : 1902 P : 0.687
Episode: 78/1000 Total reward: 37.0 iter : 1939 P : 0.682
Episode: 79/1000 Total reward: 12.0 iter : 1951 P : 0.680
Episode: 80/1000 Total reward: 35.0 iter : 1986 P : 0.675
Episode: 81/1000 Total reward: 49.0 iter : 2035 P : 0.669
Episode: 82/1000 Total reward: 42.0 iter : 2077 P : 0.663
Episode: 83/1000 Total reward: 99.0 iter : 2176 P : 0.651
Episode: 84/1000 Total reward: 50.0 iter : 2226 P : 0.644
Episode: 85/1000 Total reward: 14.0 iter : 2240 P : 0.643
Episode: 86/1000 Total reward: 108.0 iter : 2348 P : 0.629
Episode: 87/1000 Total reward: 34.0 iter : 2382 P : 0.625
Episode: 88/1000 Total reward: 93.0 iter : 2475 P : 0.613
Episode: 89/1000 Total reward: 41.0 iter : 2516 P : 0.609
Episode: 90/1000 Total reward: 54.0 iter : 2570 P : 0.602
Episode: 91/1000 Total reward: 65.0 iter : 2635 P : 0.594
Episode: 92/1000 Total reward: 18.0 iter : 2653 P : 0.592
Episode: 93/1000 Total reward: 48.0 iter : 2701 P : 0.587
Episode: 94/1000 Total reward: 21.0 iter : 2722 P : 0.584
Episode: 95/1000 Total reward: 93.0 iter : 2815 P : 0.574
Episode: 96/1000 Total reward: 102.0 iter : 2917 P : 0.562
Episode: 97/1000 Total reward: 105.0 iter : 3022 P : 0.551
Episode: 98/1000 Total reward: 95.0 iter : 3117 P : 0.541
Episode: 99/1000 Total reward: 84.0 iter : 3201 P : 0.532
Episode: 100/1000 Total reward: 87.0 iter : 3288 P : 0.523
Episode: 101/1000 Total reward: 37.0 iter : 3325 P : 0.519
Episode: 102/1000 Total reward: 115.0 iter : 3440 P : 0.508
Episode: 103/1000 Total reward: 55.0 iter : 3495 P : 0.502
Episode: 104/1000 Total reward: 103.0 iter : 3598 P : 0.492
Episode: 105/1000 Total reward: 148.0 iter : 3746 P : 0.478
Episode: 106/1000 Total reward: 84.0 iter : 3830 P : 0.470
Episode: 107/1000 Total reward: 73.0 iter : 3903 P : 0.464
Episode: 108/1000 Total reward: 68.0 iter : 3971 P : 0.457
Episode: 109/1000 Total reward: 59.0 iter : 4030 P : 0.452
Episode: 110/1000 Total reward: 39.0 iter : 4069 P : 0.449
Episode: 111/1000 Total reward: 12.0 iter : 4081 P : 0.448
Episode: 112/1000 Total reward: 95.0 iter : 4176 P : 0.439
Episode: 113/1000 Total reward: 118.0 iter : 4294 P : 0.429
Episode: 114/1000 Total reward: 112.0 iter : 4406 P : 0.420
Episode: 115/1000 Total reward: 54.0 iter : 4460 P : 0.416
Episode: 116/1000 Total reward: 17.0 iter : 4477 P : 0.414
Episode: 117/1000 Total reward: 151.0 iter : 4628 P : 0.402
Episode: 118/1000 Total reward: 80.0 iter : 4708 P : 0.396
Episode: 119/1000 Total reward: 154.0 iter : 4862 P : 0.384
Episode: 120/1000 Total reward: 106.0 iter : 4968 P : 0.377
Episode: 121/1000 Total reward: 36.0 iter : 5004 P : 0.374
Episode: 122/1000 Total reward: 103.0 iter : 5107 P : 0.366
Episode: 123/1000 Total reward: 179.0 iter : 5286 P : 0.354
Episode: 124/1000 Total reward: 147.0 iter : 5433 P : 0.344
Episode: 125/1000 Total reward: 162.0 iter : 5595 P : 0.333
Episode: 126/1000 Total reward: 200.0 iter : 5795 P : 0.321
Episode: 127/1000 Total reward: 110.0 iter : 5905 P : 0.314
Episode: 128/1000 Total reward: 174.0 iter : 6079 P : 0.304
Episode: 129/1000 Total reward: 190.0 iter : 6269 P : 0.293
Episode: 130/1000 Total reward: 200.0 iter : 6469 P : 0.281
Episode: 131/1000 Total reward: 200.0 iter : 6669 P : 0.271
Episode: 132/1000 Total reward: 200.0 iter : 6869 P : 0.261
Episode: 133/1000 Total reward: 200.0 iter : 7069 P : 0.251
Episode: 134/1000 Total reward: 177.0 iter : 7246 P : 0.242
Episode: 135/1000 Total reward: 200.0 iter : 7446 P : 0.233
Episode: 136/1000 Total reward: 200.0 iter : 7646 P : 0.225
Episode: 137/1000 Total reward: 187.0 iter : 7833 P : 0.217
Episode: 138/1000 Total reward: 200.0 iter : 8033 P : 0.209
Episode: 139/1000 Total reward: 200.0 iter : 8233 P : 0.201
Episode: 140/1000 Total reward: 200.0 iter : 8433 P : 0.193
Episode: 141/1000 Total reward: 198.0 iter : 8631 P : 0.186
Episode: 142/1000 Total reward: 200.0 iter : 8831 P : 0.179
Episode: 143/1000 Total reward: 200.0 iter : 9031 P : 0.173
Episode: 144/1000 Total reward: 200.0 iter : 9231 P : 0.166
Episode: 145/1000 Total reward: 200.0 iter : 9431 P : 0.160
Episode: 146/1000 Total reward: 200.0 iter : 9631 P : 0.154
Episode: 147/1000 Total reward: 200.0 iter : 9831 P : 0.149
Episode: 148/1000 Total reward: 200.0 iter : 10031 P : 0.143
Episode: 149/1000 Total reward: 200.0 iter : 10231 P : 0.138
Episode: 150/1000 Total reward: 200.0 iter : 10431 P : 0.133
Episode: 151/1000 Total reward: 200.0 iter : 10631 P : 0.128
Episode: 152/1000 Total reward: 200.0 iter : 10831 P : 0.123
Episode: 153/1000 Total reward: 200.0 iter : 11031 P : 0.119
Episode: 154/1000 Total reward: 200.0 iter : 11231 P : 0.115
Episode: 155/1000 Total reward: 200.0 iter : 11431 P : 0.111
Episode: 156/1000 Total reward: 200.0 iter : 11631 P : 0.107
Episode: 157/1000 Total reward: 200.0 iter : 11831 P : 0.103
Episode: 158/1000 Total reward: 200.0 iter : 12031 P : 0.099
Episode: 159/1000 Total reward: 200.0 iter : 12231 P : 0.096
Episode: 160/1000 Total reward: 200.0 iter : 12431 P : 0.092
Episode: 161/1000 Total reward: 200.0 iter : 12631 P : 0.089
Episode: 162/1000 Total reward: 200.0 iter : 12831 P : 0.086
Episode: 163/1000 Total reward: 200.0 iter : 13031 P : 0.083
Episode: 164/1000 Total reward: 200.0 iter : 13231 P : 0.080
Episode: 165/1000 Total reward: 200.0 iter : 13431 P : 0.077
Episode: 166/1000 Total reward: 200.0 iter : 13631 P : 0.075
Episode: 167/1000 Total reward: 200.0 iter : 13831 P : 0.072
Episode: 168/1000 Total reward: 200.0 iter : 14031 P : 0.070
Episode: 169/1000 Total reward: 188.0 iter : 14219 P : 0.068
Episode: 170/1000 Total reward: 200.0 iter : 14419 P : 0.065
Episode: 171/1000 Total reward: 197.0 iter : 14616 P : 0.063
Episode: 172/1000 Total reward: 200.0 iter : 14816 P : 0.061
Episode: 173/1000 Total reward: 200.0 iter : 15016 P : 0.059
Episode: 174/1000 Total reward: 200.0 iter : 15216 P : 0.057
Episode: 175/1000 Total reward: 200.0 iter : 15416 P : 0.055
Episode: 176/1000 Total reward: 200.0 iter : 15616 P : 0.054
Episode: 177/1000 Total reward: 200.0 iter : 15816 P : 0.052
Episode: 178/1000 Total reward: 200.0 iter : 16016 P : 0.050
Episode: 179/1000 Total reward: 200.0 iter : 16216 P : 0.049
Episode: 180/1000 Total reward: 200.0 iter : 16416 P : 0.047
Episode: 181/1000 Total reward: 200.0 iter : 16616 P : 0.046
Episode: 182/1000 Total reward: 200.0 iter : 16816 P : 0.044
Episode: 183/1000 Total reward: 200.0 iter : 17016 P : 0.043
Episode: 184/1000 Total reward: 200.0 iter : 17216 P : 0.042
Episode: 185/1000 Total reward: 200.0 iter : 17416 P : 0.040
Episode: 186/1000 Total reward: 200.0 iter : 17616 P : 0.039
Episode: 187/1000 Total reward: 200.0 iter : 17816 P : 0.038
Episode: 188/1000 Total reward: 200.0 iter : 18016 P : 0.037
Episode: 189/1000 Total reward: 200.0 iter : 18216 P : 0.036
Episode: 190/1000 Total reward: 200.0 iter : 18416 P : 0.035
Episode: 191/1000 Total reward: 200.0 iter : 18616 P : 0.034
Episode: 192/1000 Total reward: 200.0 iter : 18816 P : 0.033
Episode: 193/1000 Total reward: 200.0 iter : 19016 P : 0.032
Episode: 194/1000 Total reward: 200.0 iter : 19216 P : 0.031
Episode: 195/1000 Total reward: 190.0 iter : 19406 P : 0.030
Episode: 196/1000 Total reward: 135.0 iter : 19541 P : 0.030
Episode: 197/1000 Total reward: 191.0 iter : 19732 P : 0.029
Episode: 198/1000 Total reward: 172.0 iter : 19904 P : 0.028
Episode: 199/1000 Total reward: 200.0 iter : 20104 P : 0.028
Episode: 200/1000 Total reward: 173.0 iter : 20277 P : 0.027
Episode: 201/1000 Total reward: 10.0 iter : 20287 P : 0.027
Episode: 202/1000 Total reward: 130.0 iter : 20417 P : 0.027
Episode: 203/1000 Total reward: 151.0 iter : 20568 P : 0.026
Episode: 204/1000 Total reward: 153.0 iter : 20721 P : 0.026
Episode: 205/1000 Total reward: 186.0 iter : 20907 P : 0.025
Episode: 206/1000 Total reward: 200.0 iter : 21107 P : 0.025
Episode: 207/1000 Total reward: 200.0 iter : 21307 P : 0.024
Episode: 208/1000 Total reward: 200.0 iter : 21507 P : 0.023
Episode: 209/1000 Total reward: 200.0 iter : 21707 P : 0.023
Episode: 210/1000 Total reward: 200.0 iter : 21907 P : 0.022
Episode: 211/1000 Total reward: 200.0 iter : 22107 P : 0.022
Episode: 212/1000 Total reward: 200.0 iter : 22307 P : 0.021
Episode: 213/1000 Total reward: 200.0 iter : 22507 P : 0.021
Episode: 214/1000 Total reward: 200.0 iter : 22707 P : 0.021
Episode: 215/1000 Total reward: 200.0 iter : 22907 P : 0.020
Episode: 216/1000 Total reward: 200.0 iter : 23107 P : 0.020
Episode: 217/1000 Total reward: 185.0 iter : 23292 P : 0.019
Episode: 218/1000 Total reward: 115.0 iter : 23407 P : 0.019
Episode: 219/1000 Total reward: 9.0 iter : 23416 P : 0.019
Episode: 220/1000 Total reward: 9.0 iter : 23425 P : 0.019
Episode: 221/1000 Total reward: 10.0 iter : 23435 P : 0.019
Episode: 222/1000 Total reward: 10.0 iter : 23445 P : 0.019
Episode: 223/1000 Total reward: 10.0 iter : 23455 P : 0.019
Episode: 224/1000 Total reward: 10.0 iter : 23465 P : 0.019
Episode: 225/1000 Total reward: 9.0 iter : 23474 P : 0.019
Episode: 226/1000 Total reward: 8.0 iter : 23482 P : 0.019
Episode: 227/1000 Total reward: 9.0 iter : 23491 P : 0.019
Episode: 228/1000 Total reward: 11.0 iter : 23502 P : 0.019
Episode: 229/1000 Total reward: 9.0 iter : 23511 P : 0.019
Episode: 230/1000 Total reward: 8.0 iter : 23519 P : 0.019
Episode: 231/1000 Total reward: 11.0 iter : 23530 P : 0.019
Episode: 232/1000 Total reward: 11.0 iter : 23541 P : 0.019
Episode: 233/1000 Total reward: 10.0 iter : 23551 P : 0.019
Episode: 234/1000 Total reward: 200.0 iter : 23751 P : 0.019
Episode: 235/1000 Total reward: 9.0 iter : 23760 P : 0.019
Episode: 236/1000 Total reward: 9.0 iter : 23769 P : 0.019
Episode: 237/1000 Total reward: 10.0 iter : 23779 P : 0.019
Episode: 238/1000 Total reward: 8.0 iter : 23787 P : 0.019
Episode: 239/1000 Total reward: 10.0 iter : 23797 P : 0.018
Episode: 240/1000 Total reward: 10.0 iter : 23807 P : 0.018
Episode: 241/1000 Total reward: 10.0 iter : 23817 P : 0.018
Episode: 242/1000 Total reward: 10.0 iter : 23827 P : 0.018
Episode: 243/1000 Total reward: 8.0 iter : 23835 P : 0.018
Episode: 244/1000 Total reward: 10.0 iter : 23845 P : 0.018
Episode: 245/1000 Total reward: 11.0 iter : 23856 P : 0.018
Episode: 246/1000 Total reward: 9.0 iter : 23865 P : 0.018
Episode: 247/1000 Total reward: 10.0 iter : 23875 P : 0.018
Episode: 248/1000 Total reward: 10.0 iter : 23885 P : 0.018
Episode: 249/1000 Total reward: 10.0 iter : 23895 P : 0.018
Episode: 250/1000 Total reward: 8.0 iter : 23903 P : 0.018
Episode: 251/1000 Total reward: 9.0 iter : 23912 P : 0.018
Episode: 252/1000 Total reward: 9.0 iter : 23921 P : 0.018
Episode: 253/1000 Total reward: 10.0 iter : 23931 P : 0.018
Episode: 254/1000 Total reward: 9.0 iter : 23940 P : 0.018
Episode: 255/1000 Total reward: 10.0 iter : 23950 P : 0.018
Episode: 256/1000 Total reward: 11.0 iter : 23961 P : 0.018
Episode: 257/1000 Total reward: 10.0 iter : 23971 P : 0.018
Episode: 258/1000 Total reward: 158.0 iter : 24129 P : 0.018
Episode: 259/1000 Total reward: 200.0 iter : 24329 P : 0.018
Episode: 260/1000 Total reward: 200.0 iter : 24529 P : 0.017
Episode: 261/1000 Total reward: 200.0 iter : 24729 P : 0.017
Episode: 262/1000 Total reward: 200.0 iter : 24929 P : 0.017
Episode: 263/1000 Total reward: 200.0 iter : 25129 P : 0.017
Episode: 264/1000 Total reward: 200.0 iter : 25329 P : 0.016
Episode: 265/1000 Total reward: 200.0 iter : 25529 P : 0.016
Episode: 266/1000 Total reward: 200.0 iter : 25729 P : 0.016
Episode: 267/1000 Total reward: 200.0 iter : 25929 P : 0.016
Episode: 268/1000 Total reward: 200.0 iter : 26129 P : 0.015
Episode: 269/1000 Total reward: 200.0 iter : 26329 P : 0.015
Episode: 270/1000 Total reward: 200.0 iter : 26529 P : 0.015
Episode: 271/1000 Total reward: 200.0 iter : 26729 P : 0.015
Episode: 272/1000 Total reward: 200.0 iter : 26929 P : 0.015
Episode: 273/1000 Total reward: 200.0 iter : 27129 P : 0.014
Episode: 274/1000 Total reward: 200.0 iter : 27329 P : 0.014
Episode: 275/1000 Total reward: 200.0 iter : 27529 P : 0.014
Episode: 276/1000 Total reward: 200.0 iter : 27729 P : 0.014
Episode: 277/1000 Total reward: 200.0 iter : 27929 P : 0.014
Episode: 278/1000 Total reward: 200.0 iter : 28129 P : 0.014
Episode: 279/1000 Total reward: 200.0 iter : 28329 P : 0.013
Episode: 280/1000 Total reward: 200.0 iter : 28529 P : 0.013
Episode: 281/1000 Total reward: 200.0 iter : 28729 P : 0.013
Episode: 282/1000 Total reward: 200.0 iter : 28929 P : 0.013
Episode: 283/1000 Total reward: 200.0 iter : 29129 P : 0.013
Episode: 284/1000 Total reward: 200.0 iter : 29329 P : 0.013
Episode: 285/1000 Total reward: 200.0 iter : 29529 P : 0.013
Episode: 286/1000 Total reward: 200.0 iter : 29729 P : 0.013
Episode: 287/1000 Total reward: 200.0 iter : 29929 P : 0.012
Episode: 288/1000 Total reward: 200.0 iter : 30129 P : 0.012
Episode: 289/1000 Total reward: 200.0 iter : 30329 P : 0.012
Episode: 290/1000 Total reward: 200.0 iter : 30529 P : 0.012
Episode: 291/1000 Total reward: 200.0 iter : 30729 P : 0.012
Episode: 292/1000 Total reward: 198.0 iter : 30927 P : 0.012
Episode: 293/1000 Total reward: 200.0 iter : 31127 P : 0.012
Episode: 294/1000 Total reward: 200.0 iter : 31327 P : 0.012
Episode: 295/1000 Total reward: 200.0 iter : 31527 P : 0.012
Episode: 296/1000 Total reward: 200.0 iter : 31727 P : 0.012
Episode: 297/1000 Total reward: 200.0 iter : 31927 P : 0.012
Episode: 298/1000 Total reward: 200.0 iter : 32127 P : 0.012
Episode: 299/1000 Total reward: 196.0 iter : 32323 P : 0.012
Episode: 300/1000 Total reward: 200.0 iter : 32523 P : 0.011
Episode: 301/1000 Total reward: 200.0 iter : 32723 P : 0.011
Episode: 302/1000 Total reward: 200.0 iter : 32923 P : 0.011
Episode: 303/1000 Total reward: 200.0 iter : 33123 P : 0.011
Episode: 304/1000 Total reward: 200.0 iter : 33323 P : 0.011
Episode: 305/1000 Total reward: 200.0 iter : 33523 P : 0.011
Episode: 306/1000 Total reward: 200.0 iter : 33723 P : 0.011
Episode: 307/1000 Total reward: 200.0 iter : 33923 P : 0.011
Episode: 308/1000 Total reward: 149.0 iter : 34072 P : 0.011
Episode: 309/1000 Total reward: 135.0 iter : 34207 P : 0.011
Episode: 310/1000 Total reward: 148.0 iter : 34355 P : 0.011
Episode: 311/1000 Total reward: 166.0 iter : 34521 P : 0.011
Episode: 312/1000 Total reward: 135.0 iter : 34656 P : 0.011
Episode: 313/1000 Total reward: 185.0 iter : 34841 P : 0.011
Episode: 314/1000 Total reward: 123.0 iter : 34964 P : 0.011
Episode: 315/1000 Total reward: 152.0 iter : 35116 P : 0.011
Episode: 316/1000 Total reward: 155.0 iter : 35271 P : 0.011
Episode: 317/1000 Total reward: 166.0 iter : 35437 P : 0.011
Episode: 318/1000 Total reward: 200.0 iter : 35637 P : 0.011
Episode: 319/1000 Total reward: 200.0 iter : 35837 P : 0.011
Episode: 320/1000 Total reward: 200.0 iter : 36037 P : 0.011
Episode: 321/1000 Total reward: 200.0 iter : 36237 P : 0.011
Episode: 322/1000 Total reward: 200.0 iter : 36437 P : 0.011
Episode: 323/1000 Total reward: 200.0 iter : 36637 P : 0.011
Episode: 324/1000 Total reward: 200.0 iter : 36837 P : 0.011
Episode: 325/1000 Total reward: 200.0 iter : 37037 P : 0.011
Episode: 326/1000 Total reward: 200.0 iter : 37237 P : 0.011
Episode: 327/1000 Total reward: 200.0 iter : 37437 P : 0.011
Episode: 328/1000 Total reward: 200.0 iter : 37637 P : 0.011
Episode: 329/1000 Total reward: 200.0 iter : 37837 P : 0.011
Episode: 330/1000 Total reward: 200.0 iter : 38037 P : 0.010
Episode: 331/1000 Total reward: 200.0 iter : 38237 P : 0.010
Episode: 332/1000 Total reward: 200.0 iter : 38437 P : 0.010
Episode: 333/1000 Total reward: 182.0 iter : 38619 P : 0.010
Episode: 334/1000 Total reward: 184.0 iter : 38803 P : 0.010
Episode: 335/1000 Total reward: 200.0 iter : 39003 P : 0.010
Episode: 336/1000 Total reward: 200.0 iter : 39203 P : 0.010
Episode: 337/1000 Total reward: 164.0 iter : 39367 P : 0.010
Episode: 338/1000 Total reward: 200.0 iter : 39567 P : 0.010
Episode: 339/1000 Total reward: 200.0 iter : 39767 P : 0.010
Episode: 340/1000 Total reward: 200.0 iter : 39967 P : 0.010
Episode: 341/1000 Total reward: 200.0 iter : 40167 P : 0.010
Episode: 342/1000 Total reward: 200.0 iter : 40367 P : 0.010
Episode: 343/1000 Total reward: 200.0 iter : 40567 P : 0.010
Episode: 344/1000 Total reward: 168.0 iter : 40735 P : 0.010
Episode: 345/1000 Total reward: 175.0 iter : 40910 P : 0.010
Episode: 346/1000 Total reward: 200.0 iter : 41110 P : 0.010
Episode: 347/1000 Total reward: 200.0 iter : 41310 P : 0.010
Episode: 348/1000 Total reward: 164.0 iter : 41474 P : 0.010
Episode: 349/1000 Total reward: 178.0 iter : 41652 P : 0.010
Episode: 350/1000 Total reward: 150.0 iter : 41802 P : 0.010
Episode: 351/1000 Total reward: 166.0 iter : 41968 P : 0.010
Episode: 352/1000 Total reward: 197.0 iter : 42165 P : 0.010
Episode: 353/1000 Total reward: 163.0 iter : 42328 P : 0.010
Episode: 354/1000 Total reward: 133.0 iter : 42461 P : 0.010
Episode: 355/1000 Total reward: 163.0 iter : 42624 P : 0.010
Episode: 356/1000 Total reward: 114.0 iter : 42738 P : 0.010
Episode: 357/1000 Total reward: 144.0 iter : 42882 P : 0.010
Episode: 358/1000 Total reward: 112.0 iter : 42994 P : 0.010
Episode: 359/1000 Total reward: 55.0 iter : 43049 P : 0.010
Episode: 360/1000 Total reward: 10.0 iter : 43059 P : 0.010
Episode: 361/1000 Total reward: 145.0 iter : 43204 P : 0.010
Episode: 362/1000 Total reward: 11.0 iter : 43215 P : 0.010
Episode: 363/1000 Total reward: 10.0 iter : 43225 P : 0.010
Episode: 364/1000 Total reward: 10.0 iter : 43235 P : 0.010
Episode: 365/1000 Total reward: 10.0 iter : 43245 P : 0.010
Episode: 366/1000 Total reward: 11.0 iter : 43256 P : 0.010
Episode: 367/1000 Total reward: 8.0 iter : 43264 P : 0.010
Episode: 368/1000 Total reward: 9.0 iter : 43273 P : 0.010
Episode: 369/1000 Total reward: 13.0 iter : 43286 P : 0.010
Episode: 370/1000 Total reward: 9.0 iter : 43295 P : 0.010
Episode: 371/1000 Total reward: 23.0 iter : 43318 P : 0.010
Episode: 372/1000 Total reward: 25.0 iter : 43343 P : 0.010
Episode: 373/1000 Total reward: 12.0 iter : 43355 P : 0.010
Episode: 374/1000 Total reward: 15.0 iter : 43370 P : 0.010
Episode: 375/1000 Total reward: 16.0 iter : 43386 P : 0.010
Episode: 376/1000 Total reward: 10.0 iter : 43396 P : 0.010
Episode: 377/1000 Total reward: 11.0 iter : 43407 P : 0.010
Episode: 378/1000 Total reward: 10.0 iter : 43417 P : 0.010
Episode: 379/1000 Total reward: 9.0 iter : 43426 P : 0.010
Episode: 380/1000 Total reward: 10.0 iter : 43436 P : 0.010
Episode: 381/1000 Total reward: 8.0 iter : 43444 P : 0.010
Episode: 382/1000 Total reward: 10.0 iter : 43454 P : 0.010
Episode: 383/1000 Total reward: 200.0 iter : 43654 P : 0.010
Episode: 384/1000 Total reward: 139.0 iter : 43793 P : 0.010
Episode: 385/1000 Total reward: 134.0 iter : 43927 P : 0.010
Episode: 386/1000 Total reward: 133.0 iter : 44060 P : 0.010
Episode: 387/1000 Total reward: 141.0 iter : 44201 P : 0.010
Episode: 388/1000 Total reward: 143.0 iter : 44344 P : 0.010
Episode: 389/1000 Total reward: 200.0 iter : 44544 P : 0.010
Episode: 390/1000 Total reward: 200.0 iter : 44744 P : 0.010
Episode: 391/1000 Total reward: 200.0 iter : 44944 P : 0.010
Episode: 392/1000 Total reward: 168.0 iter : 45112 P : 0.010
Episode: 393/1000 Total reward: 167.0 iter : 45279 P : 0.010
Episode: 394/1000 Total reward: 160.0 iter : 45439 P : 0.010
Episode: 395/1000 Total reward: 200.0 iter : 45639 P : 0.010
Episode: 396/1000 Total reward: 200.0 iter : 45839 P : 0.010
Episode: 397/1000 Total reward: 200.0 iter : 46039 P : 0.010
Episode: 398/1000 Total reward: 200.0 iter : 46239 P : 0.010
Episode: 399/1000 Total reward: 200.0 iter : 46439 P : 0.010
Episode: 400/1000 Total reward: 199.0 iter : 46638 P : 0.010
Episode: 401/1000 Total reward: 185.0 iter : 46823 P : 0.010
Episode: 402/1000 Total reward: 191.0 iter : 47014 P : 0.010
Episode: 403/1000 Total reward: 195.0 iter : 47209 P : 0.010
Episode: 404/1000 Total reward: 200.0 iter : 47409 P : 0.010
Episode: 405/1000 Total reward: 200.0 iter : 47609 P : 0.010
Episode: 406/1000 Total reward: 200.0 iter : 47809 P : 0.010
Episode: 407/1000 Total reward: 200.0 iter : 48009 P : 0.010
Episode: 408/1000 Total reward: 200.0 iter : 48209 P : 0.010
Episode: 409/1000 Total reward: 200.0 iter : 48409 P : 0.010
Episode: 410/1000 Total reward: 200.0 iter : 48609 P : 0.010
Episode: 411/1000 Total reward: 166.0 iter : 48775 P : 0.010
Episode: 412/1000 Total reward: 200.0 iter : 48975 P : 0.010
Episode: 413/1000 Total reward: 200.0 iter : 49175 P : 0.010
Episode: 414/1000 Total reward: 200.0 iter : 49375 P : 0.010
Episode: 415/1000 Total reward: 200.0 iter : 49575 P : 0.010
Episode: 416/1000 Total reward: 192.0 iter : 49767 P : 0.010
Episode: 417/1000 Total reward: 193.0 iter : 49960 P : 0.010
Episode: 418/1000 Total reward: 200.0 iter : 50160 P : 0.010
Episode: 419/1000 Total reward: 200.0 iter : 50360 P : 0.010
Episode: 420/1000 Total reward: 200.0 iter : 50560 P : 0.010
Episode: 421/1000 Total reward: 200.0 iter : 50760 P : 0.010
Episode: 422/1000 Total reward: 200.0 iter : 50960 P : 0.010
Episode: 423/1000 Total reward: 200.0 iter : 51160 P : 0.010
Episode: 424/1000 Total reward: 200.0 iter : 51360 P : 0.010
Episode: 425/1000 Total reward: 200.0 iter : 51560 P : 0.010
Episode: 426/1000 Total reward: 200.0 iter : 51760 P : 0.010
Episode: 427/1000 Total reward: 200.0 iter : 51960 P : 0.010
Episode: 428/1000 Total reward: 200.0 iter : 52160 P : 0.010
Episode: 429/1000 Total reward: 200.0 iter : 52360 P : 0.010
Episode: 430/1000 Total reward: 200.0 iter : 52560 P : 0.010
Episode: 431/1000 Total reward: 200.0 iter : 52760 P : 0.010
Episode: 432/1000 Total reward: 200.0 iter : 52960 P : 0.010
Episode: 433/1000 Total reward: 200.0 iter : 53160 P : 0.010
Episode: 434/1000 Total reward: 200.0 iter : 53360 P : 0.010
Episode: 435/1000 Total reward: 200.0 iter : 53560 P : 0.010
Episode: 436/1000 Total reward: 200.0 iter : 53760 P : 0.010
Episode: 437/1000 Total reward: 200.0 iter : 53960 P : 0.010
Episode: 438/1000 Total reward: 200.0 iter : 54160 P : 0.010
Episode: 439/1000 Total reward: 200.0 iter : 54360 P : 0.010
Episode: 440/1000 Total reward: 200.0 iter : 54560 P : 0.010
Episode: 441/1000 Total reward: 16.0 iter : 54576 P : 0.010
Episode: 442/1000 Total reward: 10.0 iter : 54586 P : 0.010
Episode: 443/1000 Total reward: 12.0 iter : 54598 P : 0.010
Episode: 444/1000 Total reward: 14.0 iter : 54612 P : 0.010
Episode: 445/1000 Total reward: 10.0 iter : 54622 P : 0.010
Episode: 446/1000 Total reward: 9.0 iter : 54631 P : 0.010
Episode: 447/1000 Total reward: 8.0 iter : 54639 P : 0.010
Episode: 448/1000 Total reward: 8.0 iter : 54647 P : 0.010
Episode: 449/1000 Total reward: 200.0 iter : 54847 P : 0.010
Episode: 450/1000 Total reward: 187.0 iter : 55034 P : 0.010
Episode: 451/1000 Total reward: 178.0 iter : 55212 P : 0.010
Episode: 452/1000 Total reward: 170.0 iter : 55382 P : 0.010
Episode: 453/1000 Total reward: 137.0 iter : 55519 P : 0.010
Episode: 454/1000 Total reward: 142.0 iter : 55661 P : 0.010
Episode: 455/1000 Total reward: 143.0 iter : 55804 P : 0.010
Episode: 456/1000 Total reward: 122.0 iter : 55926 P : 0.010
Episode: 457/1000 Total reward: 110.0 iter : 56036 P : 0.010
Episode: 458/1000 Total reward: 95.0 iter : 56131 P : 0.010
Episode: 459/1000 Total reward: 96.0 iter : 56227 P : 0.010
Episode: 460/1000 Total reward: 142.0 iter : 56369 P : 0.010
Episode: 461/1000 Total reward: 119.0 iter : 56488 P : 0.010
Episode: 462/1000 Total reward: 200.0 iter : 56688 P : 0.010
Episode: 463/1000 Total reward: 200.0 iter : 56888 P : 0.010
Episode: 464/1000 Total reward: 200.0 iter : 57088 P : 0.010
Episode: 465/1000 Total reward: 200.0 iter : 57288 P : 0.010
Episode: 466/1000 Total reward: 200.0 iter : 57488 P : 0.010
Episode: 467/1000 Total reward: 200.0 iter : 57688 P : 0.010
Episode: 468/1000 Total reward: 200.0 iter : 57888 P : 0.010
Episode: 469/1000 Total reward: 200.0 iter : 58088 P : 0.010
Episode: 470/1000 Total reward: 200.0 iter : 58288 P : 0.010
Episode: 471/1000 Total reward: 200.0 iter : 58488 P : 0.010
Episode: 472/1000 Total reward: 200.0 iter : 58688 P : 0.010
Episode: 473/1000 Total reward: 200.0 iter : 58888 P : 0.010
Episode: 474/1000 Total reward: 200.0 iter : 59088 P : 0.010
Episode: 475/1000 Total reward: 200.0 iter : 59288 P : 0.010
Episode: 476/1000 Total reward: 200.0 iter : 59488 P : 0.010
Episode: 477/1000 Total reward: 200.0 iter : 59688 P : 0.010
Episode: 478/1000 Total reward: 200.0 iter : 59888 P : 0.010
Episode: 479/1000 Total reward: 200.0 iter : 60088 P : 0.010
Episode: 480/1000 Total reward: 200.0 iter : 60288 P : 0.010
Episode: 481/1000 Total reward: 200.0 iter : 60488 P : 0.010
Episode: 482/1000 Total reward: 200.0 iter : 60688 P : 0.010
Episode: 483/1000 Total reward: 200.0 iter : 60888 P : 0.010
Episode: 484/1000 Total reward: 200.0 iter : 61088 P : 0.010
Episode: 485/1000 Total reward: 200.0 iter : 61288 P : 0.010
Episode: 486/1000 Total reward: 137.0 iter : 61425 P : 0.010
Episode: 487/1000 Total reward: 200.0 iter : 61625 P : 0.010
Episode: 488/1000 Total reward: 200.0 iter : 61825 P : 0.010
Episode: 489/1000 Total reward: 200.0 iter : 62025 P : 0.010
Episode: 490/1000 Total reward: 200.0 iter : 62225 P : 0.010
Episode: 491/1000 Total reward: 200.0 iter : 62425 P : 0.010
Episode: 492/1000 Total reward: 200.0 iter : 62625 P : 0.010
Episode: 493/1000 Total reward: 200.0 iter : 62825 P : 0.010
Episode: 494/1000 Total reward: 200.0 iter : 63025 P : 0.010
Episode: 495/1000 Total reward: 200.0 iter : 63225 P : 0.010
Episode: 496/1000 Total reward: 200.0 iter : 63425 P : 0.010
Episode: 497/1000 Total reward: 200.0 iter : 63625 P : 0.010
Episode: 498/1000 Total reward: 200.0 iter : 63825 P : 0.010
Episode: 499/1000 Total reward: 200.0 iter : 64025 P : 0.010
Episode: 500/1000 Total reward: 200.0 iter : 64225 P : 0.010
Episode: 501/1000 Total reward: 200.0 iter : 64425 P : 0.010
Episode: 502/1000 Total reward: 200.0 iter : 64625 P : 0.010
Episode: 503/1000 Total reward: 200.0 iter : 64825 P : 0.010
Episode: 504/1000 Total reward: 200.0 iter : 65025 P : 0.010
Episode: 505/1000 Total reward: 200.0 iter : 65225 P : 0.010
Episode: 506/1000 Total reward: 109.0 iter : 65334 P : 0.010
Episode: 507/1000 Total reward: 200.0 iter : 65534 P : 0.010
Episode: 508/1000 Total reward: 200.0 iter : 65734 P : 0.010
Episode: 509/1000 Total reward: 200.0 iter : 65934 P : 0.010
Episode: 510/1000 Total reward: 200.0 iter : 66134 P : 0.010
Episode: 511/1000 Total reward: 200.0 iter : 66334 P : 0.010
Episode: 512/1000 Total reward: 200.0 iter : 66534 P : 0.010
Episode: 513/1000 Total reward: 200.0 iter : 66734 P : 0.010
Episode: 514/1000 Total reward: 86.0 iter : 66820 P : 0.010
Episode: 515/1000 Total reward: 10.0 iter : 66830 P : 0.010
Episode: 516/1000 Total reward: 9.0 iter : 66839 P : 0.010
Episode: 517/1000 Total reward: 8.0 iter : 66847 P : 0.010
Episode: 518/1000 Total reward: 9.0 iter : 66856 P : 0.010
Episode: 519/1000 Total reward: 9.0 iter : 66865 P : 0.010
Episode: 520/1000 Total reward: 10.0 iter : 66875 P : 0.010
Episode: 521/1000 Total reward: 10.0 iter : 66885 P : 0.010
Episode: 522/1000 Total reward: 9.0 iter : 66894 P : 0.010
Episode: 523/1000 Total reward: 10.0 iter : 66904 P : 0.010
Episode: 524/1000 Total reward: 9.0 iter : 66913 P : 0.010
Episode: 525/1000 Total reward: 9.0 iter : 66922 P : 0.010
Episode: 526/1000 Total reward: 10.0 iter : 66932 P : 0.010
Episode: 527/1000 Total reward: 10.0 iter : 66942 P : 0.010
Episode: 528/1000 Total reward: 9.0 iter : 66951 P : 0.010
Episode: 529/1000 Total reward: 8.0 iter : 66959 P : 0.010
Episode: 530/1000 Total reward: 8.0 iter : 66967 P : 0.010
Episode: 531/1000 Total reward: 10.0 iter : 66977 P : 0.010
Episode: 532/1000 Total reward: 10.0 iter : 66987 P : 0.010
Episode: 533/1000 Total reward: 10.0 iter : 66997 P : 0.010
Episode: 534/1000 Total reward: 11.0 iter : 67008 P : 0.010
Episode: 535/1000 Total reward: 10.0 iter : 67018 P : 0.010
Episode: 536/1000 Total reward: 9.0 iter : 67027 P : 0.010
Episode: 537/1000 Total reward: 11.0 iter : 67038 P : 0.010
Episode: 538/1000 Total reward: 9.0 iter : 67047 P : 0.010
Episode: 539/1000 Total reward: 10.0 iter : 67057 P : 0.010
Episode: 540/1000 Total reward: 10.0 iter : 67067 P : 0.010
Episode: 541/1000 Total reward: 11.0 iter : 67078 P : 0.010
Episode: 542/1000 Total reward: 9.0 iter : 67087 P : 0.010
Episode: 543/1000 Total reward: 9.0 iter : 67096 P : 0.010
Episode: 544/1000 Total reward: 10.0 iter : 67106 P : 0.010
Episode: 545/1000 Total reward: 10.0 iter : 67116 P : 0.010
Episode: 546/1000 Total reward: 9.0 iter : 67125 P : 0.010
Episode: 547/1000 Total reward: 10.0 iter : 67135 P : 0.010
Episode: 548/1000 Total reward: 9.0 iter : 67144 P : 0.010
Episode: 549/1000 Total reward: 10.0 iter : 67154 P : 0.010
Episode: 550/1000 Total reward: 9.0 iter : 67163 P : 0.010
Episode: 551/1000 Total reward: 9.0 iter : 67172 P : 0.010
Episode: 552/1000 Total reward: 8.0 iter : 67180 P : 0.010
Episode: 553/1000 Total reward: 10.0 iter : 67190 P : 0.010
Episode: 554/1000 Total reward: 8.0 iter : 67198 P : 0.010
Episode: 555/1000 Total reward: 10.0 iter : 67208 P : 0.010
Episode: 556/1000 Total reward: 13.0 iter : 67221 P : 0.010
Episode: 557/1000 Total reward: 35.0 iter : 67256 P : 0.010
Episode: 558/1000 Total reward: 9.0 iter : 67265 P : 0.010
Episode: 559/1000 Total reward: 9.0 iter : 67274 P : 0.010
Episode: 560/1000 Total reward: 9.0 iter : 67283 P : 0.010
Episode: 561/1000 Total reward: 8.0 iter : 67291 P : 0.010
Episode: 562/1000 Total reward: 8.0 iter : 67299 P : 0.010
Episode: 563/1000 Total reward: 10.0 iter : 67309 P : 0.010
Episode: 564/1000 Total reward: 9.0 iter : 67318 P : 0.010
Episode: 565/1000 Total reward: 10.0 iter : 67328 P : 0.010
Episode: 566/1000 Total reward: 10.0 iter : 67338 P : 0.010
Episode: 567/1000 Total reward: 9.0 iter : 67347 P : 0.010
Episode: 568/1000 Total reward: 9.0 iter : 67356 P : 0.010
Episode: 569/1000 Total reward: 10.0 iter : 67366 P : 0.010
Episode: 570/1000 Total reward: 8.0 iter : 67374 P : 0.010
Episode: 571/1000 Total reward: 11.0 iter : 67385 P : 0.010
Episode: 572/1000 Total reward: 10.0 iter : 67395 P : 0.010
Episode: 573/1000 Total reward: 10.0 iter : 67405 P : 0.010
Episode: 574/1000 Total reward: 10.0 iter : 67415 P : 0.010
Episode: 575/1000 Total reward: 10.0 iter : 67425 P : 0.010
Episode: 576/1000 Total reward: 8.0 iter : 67433 P : 0.010
Episode: 577/1000 Total reward: 9.0 iter : 67442 P : 0.010
Episode: 578/1000 Total reward: 10.0 iter : 67452 P : 0.010
Episode: 579/1000 Total reward: 11.0 iter : 67463 P : 0.010
Episode: 580/1000 Total reward: 11.0 iter : 67474 P : 0.010
Episode: 581/1000 Total reward: 9.0 iter : 67483 P : 0.010
Episode: 582/1000 Total reward: 11.0 iter : 67494 P : 0.010
Episode: 583/1000 Total reward: 9.0 iter : 67503 P : 0.010
Episode: 584/1000 Total reward: 9.0 iter : 67512 P : 0.010
Episode: 585/1000 Total reward: 10.0 iter : 67522 P : 0.010
Episode: 586/1000 Total reward: 10.0 iter : 67532 P : 0.010
Episode: 587/1000 Total reward: 10.0 iter : 67542 P : 0.010
Episode: 588/1000 Total reward: 10.0 iter : 67552 P : 0.010
Episode: 589/1000 Total reward: 10.0 iter : 67562 P : 0.010
Episode: 590/1000 Total reward: 9.0 iter : 67571 P : 0.010
Episode: 591/1000 Total reward: 10.0 iter : 67581 P : 0.010
Episode: 592/1000 Total reward: 9.0 iter : 67590 P : 0.010
Episode: 593/1000 Total reward: 10.0 iter : 67600 P : 0.010
Episode: 594/1000 Total reward: 8.0 iter : 67608 P : 0.010
Episode: 595/1000 Total reward: 10.0 iter : 67618 P : 0.010
Episode: 596/1000 Total reward: 8.0 iter : 67626 P : 0.010
Episode: 597/1000 Total reward: 10.0 iter : 67636 P : 0.010
Episode: 598/1000 Total reward: 9.0 iter : 67645 P : 0.010
Episode: 599/1000 Total reward: 9.0 iter : 67654 P : 0.010
Episode: 600/1000 Total reward: 11.0 iter : 67665 P : 0.010
Episode: 601/1000 Total reward: 10.0 iter : 67675 P : 0.010
Episode: 602/1000 Total reward: 9.0 iter : 67684 P : 0.010
Episode: 603/1000 Total reward: 9.0 iter : 67693 P : 0.010
Episode: 604/1000 Total reward: 9.0 iter : 67702 P : 0.010
Episode: 605/1000 Total reward: 9.0 iter : 67711 P : 0.010
Episode: 606/1000 Total reward: 9.0 iter : 67720 P : 0.010
Episode: 607/1000 Total reward: 9.0 iter : 67729 P : 0.010
Episode: 608/1000 Total reward: 9.0 iter : 67738 P : 0.010
Episode: 609/1000 Total reward: 20.0 iter : 67758 P : 0.010
Episode: 610/1000 Total reward: 17.0 iter : 67775 P : 0.010
Episode: 611/1000 Total reward: 23.0 iter : 67798 P : 0.010
Episode: 612/1000 Total reward: 22.0 iter : 67820 P : 0.010
Episode: 613/1000 Total reward: 43.0 iter : 67863 P : 0.010
Episode: 614/1000 Total reward: 111.0 iter : 67974 P : 0.010
Episode: 615/1000 Total reward: 141.0 iter : 68115 P : 0.010
Episode: 616/1000 Total reward: 116.0 iter : 68231 P : 0.010
Episode: 617/1000 Total reward: 118.0 iter : 68349 P : 0.010
Episode: 618/1000 Total reward: 200.0 iter : 68549 P : 0.010
Episode: 619/1000 Total reward: 200.0 iter : 68749 P : 0.010
Episode: 620/1000 Total reward: 200.0 iter : 68949 P : 0.010
Episode: 621/1000 Total reward: 200.0 iter : 69149 P : 0.010
Episode: 622/1000 Total reward: 200.0 iter : 69349 P : 0.010
Episode: 623/1000 Total reward: 135.0 iter : 69484 P : 0.010
Episode: 624/1000 Total reward: 142.0 iter : 69626 P : 0.010
Episode: 625/1000 Total reward: 132.0 iter : 69758 P : 0.010
Episode: 626/1000 Total reward: 125.0 iter : 69883 P : 0.010
Episode: 627/1000 Total reward: 150.0 iter : 70033 P : 0.010
Episode: 628/1000 Total reward: 200.0 iter : 70233 P : 0.010
Episode: 629/1000 Total reward: 163.0 iter : 70396 P : 0.010
Episode: 630/1000 Total reward: 141.0 iter : 70537 P : 0.010
Episode: 631/1000 Total reward: 157.0 iter : 70694 P : 0.010
Episode: 632/1000 Total reward: 147.0 iter : 70841 P : 0.010
Episode: 633/1000 Total reward: 158.0 iter : 70999 P : 0.010
Episode: 634/1000 Total reward: 200.0 iter : 71199 P : 0.010
Episode: 635/1000 Total reward: 200.0 iter : 71399 P : 0.010
Episode: 636/1000 Total reward: 134.0 iter : 71533 P : 0.010
Episode: 637/1000 Total reward: 200.0 iter : 71733 P : 0.010
Episode: 638/1000 Total reward: 145.0 iter : 71878 P : 0.010
Episode: 639/1000 Total reward: 200.0 iter : 72078 P : 0.010
Episode: 640/1000 Total reward: 124.0 iter : 72202 P : 0.010
Episode: 641/1000 Total reward: 135.0 iter : 72337 P : 0.010
Episode: 642/1000 Total reward: 200.0 iter : 72537 P : 0.010
Episode: 643/1000 Total reward: 136.0 iter : 72673 P : 0.010
Episode: 644/1000 Total reward: 129.0 iter : 72802 P : 0.010
Episode: 645/1000 Total reward: 131.0 iter : 72933 P : 0.010
Episode: 646/1000 Total reward: 138.0 iter : 73071 P : 0.010
Episode: 647/1000 Total reward: 110.0 iter : 73181 P : 0.010
Episode: 648/1000 Total reward: 134.0 iter : 73315 P : 0.010
Episode: 649/1000 Total reward: 154.0 iter : 73469 P : 0.010
Episode: 650/1000 Total reward: 144.0 iter : 73613 P : 0.010
Episode: 651/1000 Total reward: 139.0 iter : 73752 P : 0.010
Episode: 652/1000 Total reward: 138.0 iter : 73890 P : 0.010
Episode: 653/1000 Total reward: 157.0 iter : 74047 P : 0.010
Episode: 654/1000 Total reward: 160.0 iter : 74207 P : 0.010
Episode: 655/1000 Total reward: 179.0 iter : 74386 P : 0.010
Episode: 656/1000 Total reward: 155.0 iter : 74541 P : 0.010
Episode: 657/1000 Total reward: 200.0 iter : 74741 P : 0.010
Episode: 658/1000 Total reward: 200.0 iter : 74941 P : 0.010
Episode: 659/1000 Total reward: 200.0 iter : 75141 P : 0.010
Episode: 660/1000 Total reward: 200.0 iter : 75341 P : 0.010
Episode: 661/1000 Total reward: 200.0 iter : 75541 P : 0.010
Episode: 662/1000 Total reward: 200.0 iter : 75741 P : 0.010
Episode: 663/1000 Total reward: 174.0 iter : 75915 P : 0.010
Episode: 664/1000 Total reward: 176.0 iter : 76091 P : 0.010
Episode: 665/1000 Total reward: 149.0 iter : 76240 P : 0.010
Episode: 666/1000 Total reward: 154.0 iter : 76394 P : 0.010
Episode: 667/1000 Total reward: 139.0 iter : 76533 P : 0.010
Episode: 668/1000 Total reward: 132.0 iter : 76665 P : 0.010
Episode: 669/1000 Total reward: 140.0 iter : 76805 P : 0.010
Episode: 670/1000 Total reward: 124.0 iter : 76929 P : 0.010
Episode: 671/1000 Total reward: 127.0 iter : 77056 P : 0.010
Episode: 672/1000 Total reward: 132.0 iter : 77188 P : 0.010
Episode: 673/1000 Total reward: 156.0 iter : 77344 P : 0.010
Episode: 674/1000 Total reward: 200.0 iter : 77544 P : 0.010
Episode: 675/1000 Total reward: 68.0 iter : 77612 P : 0.010
Episode: 676/1000 Total reward: 13.0 iter : 77625 P : 0.010
Episode: 677/1000 Total reward: 9.0 iter : 77634 P : 0.010
Episode: 678/1000 Total reward: 8.0 iter : 77642 P : 0.010
Episode: 679/1000 Total reward: 10.0 iter : 77652 P : 0.010
Episode: 680/1000 Total reward: 10.0 iter : 77662 P : 0.010
Episode: 681/1000 Total reward: 10.0 iter : 77672 P : 0.010
Episode: 682/1000 Total reward: 10.0 iter : 77682 P : 0.010
Episode: 683/1000 Total reward: 10.0 iter : 77692 P : 0.010
Episode: 684/1000 Total reward: 9.0 iter : 77701 P : 0.010
Episode: 685/1000 Total reward: 10.0 iter : 77711 P : 0.010
Episode: 686/1000 Total reward: 9.0 iter : 77720 P : 0.010
Episode: 687/1000 Total reward: 9.0 iter : 77729 P : 0.010
Episode: 688/1000 Total reward: 9.0 iter : 77738 P : 0.010
Episode: 689/1000 Total reward: 10.0 iter : 77748 P : 0.010
Episode: 690/1000 Total reward: 9.0 iter : 77757 P : 0.010
Episode: 691/1000 Total reward: 10.0 iter : 77767 P : 0.010
Episode: 692/1000 Total reward: 8.0 iter : 77775 P : 0.010
Episode: 693/1000 Total reward: 9.0 iter : 77784 P : 0.010
Episode: 694/1000 Total reward: 9.0 iter : 77793 P : 0.010
Episode: 695/1000 Total reward: 10.0 iter : 77803 P : 0.010
Episode: 696/1000 Total reward: 8.0 iter : 77811 P : 0.010
Episode: 697/1000 Total reward: 10.0 iter : 77821 P : 0.010
Episode: 698/1000 Total reward: 9.0 iter : 77830 P : 0.010
Episode: 699/1000 Total reward: 9.0 iter : 77839 P : 0.010
Episode: 700/1000 Total reward: 9.0 iter : 77848 P : 0.010
Episode: 701/1000 Total reward: 12.0 iter : 77860 P : 0.010
Episode: 702/1000 Total reward: 10.0 iter : 77870 P : 0.010
Episode: 703/1000 Total reward: 10.0 iter : 77880 P : 0.010
Episode: 704/1000 Total reward: 10.0 iter : 77890 P : 0.010
Episode: 705/1000 Total reward: 8.0 iter : 77898 P : 0.010
Episode: 706/1000 Total reward: 9.0 iter : 77907 P : 0.010
Episode: 707/1000 Total reward: 10.0 iter : 77917 P : 0.010
Episode: 708/1000 Total reward: 9.0 iter : 77926 P : 0.010
Episode: 709/1000 Total reward: 10.0 iter : 77936 P : 0.010
Episode: 710/1000 Total reward: 9.0 iter : 77945 P : 0.010
Episode: 711/1000 Total reward: 9.0 iter : 77954 P : 0.010
Episode: 712/1000 Total reward: 10.0 iter : 77964 P : 0.010
Episode: 713/1000 Total reward: 9.0 iter : 77973 P : 0.010
Episode: 714/1000 Total reward: 11.0 iter : 77984 P : 0.010
Episode: 715/1000 Total reward: 12.0 iter : 77996 P : 0.010
Episode: 716/1000 Total reward: 12.0 iter : 78008 P : 0.010
Episode: 717/1000 Total reward: 103.0 iter : 78111 P : 0.010
Episode: 718/1000 Total reward: 20.0 iter : 78131 P : 0.010
Episode: 719/1000 Total reward: 14.0 iter : 78145 P : 0.010
Episode: 720/1000 Total reward: 15.0 iter : 78160 P : 0.010
Episode: 721/1000 Total reward: 10.0 iter : 78170 P : 0.010
Episode: 722/1000 Total reward: 9.0 iter : 78179 P : 0.010
Episode: 723/1000 Total reward: 8.0 iter : 78187 P : 0.010
Episode: 724/1000 Total reward: 9.0 iter : 78196 P : 0.010
Episode: 725/1000 Total reward: 9.0 iter : 78205 P : 0.010
Episode: 726/1000 Total reward: 10.0 iter : 78215 P : 0.010
Episode: 727/1000 Total reward: 9.0 iter : 78224 P : 0.010
Episode: 728/1000 Total reward: 10.0 iter : 78234 P : 0.010
Episode: 729/1000 Total reward: 10.0 iter : 78244 P : 0.010
Episode: 730/1000 Total reward: 10.0 iter : 78254 P : 0.010
Episode: 731/1000 Total reward: 9.0 iter : 78263 P : 0.010
Episode: 732/1000 Total reward: 9.0 iter : 78272 P : 0.010
Episode: 733/1000 Total reward: 9.0 iter : 78281 P : 0.010
Episode: 734/1000 Total reward: 10.0 iter : 78291 P : 0.010
Episode: 735/1000 Total reward: 9.0 iter : 78300 P : 0.010
Episode: 736/1000 Total reward: 9.0 iter : 78309 P : 0.010
Episode: 737/1000 Total reward: 8.0 iter : 78317 P : 0.010
Episode: 738/1000 Total reward: 10.0 iter : 78327 P : 0.010
Episode: 739/1000 Total reward: 10.0 iter : 78337 P : 0.010
Episode: 740/1000 Total reward: 9.0 iter : 78346 P : 0.010
Episode: 741/1000 Total reward: 9.0 iter : 78355 P : 0.010
Episode: 742/1000 Total reward: 10.0 iter : 78365 P : 0.010
Episode: 743/1000 Total reward: 9.0 iter : 78374 P : 0.010
Episode: 744/1000 Total reward: 10.0 iter : 78384 P : 0.010
Episode: 745/1000 Total reward: 12.0 iter : 78396 P : 0.010
Episode: 746/1000 Total reward: 14.0 iter : 78410 P : 0.010
Episode: 747/1000 Total reward: 13.0 iter : 78423 P : 0.010
Episode: 748/1000 Total reward: 16.0 iter : 78439 P : 0.010
Episode: 749/1000 Total reward: 10.0 iter : 78449 P : 0.010
Episode: 750/1000 Total reward: 10.0 iter : 78459 P : 0.010
Episode: 751/1000 Total reward: 11.0 iter : 78470 P : 0.010
Episode: 752/1000 Total reward: 9.0 iter : 78479 P : 0.010
Episode: 753/1000 Total reward: 13.0 iter : 78492 P : 0.010
Episode: 754/1000 Total reward: 11.0 iter : 78503 P : 0.010
Episode: 755/1000 Total reward: 10.0 iter : 78513 P : 0.010
Episode: 756/1000 Total reward: 8.0 iter : 78521 P : 0.010
Episode: 757/1000 Total reward: 10.0 iter : 78531 P : 0.010
Episode: 758/1000 Total reward: 11.0 iter : 78542 P : 0.010
Episode: 759/1000 Total reward: 16.0 iter : 78558 P : 0.010
Episode: 760/1000 Total reward: 12.0 iter : 78570 P : 0.010
Episode: 761/1000 Total reward: 15.0 iter : 78585 P : 0.010
Episode: 762/1000 Total reward: 13.0 iter : 78598 P : 0.010
Episode: 763/1000 Total reward: 16.0 iter : 78614 P : 0.010
Episode: 764/1000 Total reward: 16.0 iter : 78630 P : 0.010
Episode: 765/1000 Total reward: 26.0 iter : 78656 P : 0.010
Episode: 766/1000 Total reward: 46.0 iter : 78702 P : 0.010
Episode: 767/1000 Total reward: 27.0 iter : 78729 P : 0.010
Episode: 768/1000 Total reward: 23.0 iter : 78752 P : 0.010
Episode: 769/1000 Total reward: 68.0 iter : 78820 P : 0.010
Episode: 770/1000 Total reward: 104.0 iter : 78924 P : 0.010
Episode: 771/1000 Total reward: 97.0 iter : 79021 P : 0.010
Episode: 772/1000 Total reward: 107.0 iter : 79128 P : 0.010
Episode: 773/1000 Total reward: 120.0 iter : 79248 P : 0.010
Episode: 774/1000 Total reward: 137.0 iter : 79385 P : 0.010
Episode: 775/1000 Total reward: 121.0 iter : 79506 P : 0.010
Episode: 776/1000 Total reward: 112.0 iter : 79618 P : 0.010
Episode: 777/1000 Total reward: 111.0 iter : 79729 P : 0.010
Episode: 778/1000 Total reward: 117.0 iter : 79846 P : 0.010
Episode: 779/1000 Total reward: 113.0 iter : 79959 P : 0.010
Episode: 780/1000 Total reward: 107.0 iter : 80066 P : 0.010
Episode: 781/1000 Total reward: 114.0 iter : 80180 P : 0.010
Episode: 782/1000 Total reward: 104.0 iter : 80284 P : 0.010
Episode: 783/1000 Total reward: 109.0 iter : 80393 P : 0.010
Episode: 784/1000 Total reward: 111.0 iter : 80504 P : 0.010
Episode: 785/1000 Total reward: 105.0 iter : 80609 P : 0.010
Episode: 786/1000 Total reward: 107.0 iter : 80716 P : 0.010
Episode: 787/1000 Total reward: 118.0 iter : 80834 P : 0.010
Episode: 788/1000 Total reward: 107.0 iter : 80941 P : 0.010
Episode: 789/1000 Total reward: 103.0 iter : 81044 P : 0.010
Episode: 790/1000 Total reward: 106.0 iter : 81150 P : 0.010
Episode: 791/1000 Total reward: 111.0 iter : 81261 P : 0.010
Episode: 792/1000 Total reward: 115.0 iter : 81376 P : 0.010
Episode: 793/1000 Total reward: 106.0 iter : 81482 P : 0.010
Episode: 794/1000 Total reward: 105.0 iter : 81587 P : 0.010
Episode: 795/1000 Total reward: 109.0 iter : 81696 P : 0.010
Episode: 796/1000 Total reward: 103.0 iter : 81799 P : 0.010
Episode: 797/1000 Total reward: 110.0 iter : 81909 P : 0.010
Episode: 798/1000 Total reward: 123.0 iter : 82032 P : 0.010
Episode: 799/1000 Total reward: 126.0 iter : 82158 P : 0.010
Episode: 800/1000 Total reward: 141.0 iter : 82299 P : 0.010
Episode: 801/1000 Total reward: 135.0 iter : 82434 P : 0.010
Episode: 802/1000 Total reward: 136.0 iter : 82570 P : 0.010
Episode: 803/1000 Total reward: 131.0 iter : 82701 P : 0.010
Episode: 804/1000 Total reward: 132.0 iter : 82833 P : 0.010
Episode: 805/1000 Total reward: 181.0 iter : 83014 P : 0.010
Episode: 806/1000 Total reward: 200.0 iter : 83214 P : 0.010
Episode: 807/1000 Total reward: 200.0 iter : 83414 P : 0.010
Episode: 808/1000 Total reward: 200.0 iter : 83614 P : 0.010
Episode: 809/1000 Total reward: 151.0 iter : 83765 P : 0.010
Episode: 810/1000 Total reward: 13.0 iter : 83778 P : 0.010
Episode: 811/1000 Total reward: 12.0 iter : 83790 P : 0.010
Episode: 812/1000 Total reward: 10.0 iter : 83800 P : 0.010
Episode: 813/1000 Total reward: 9.0 iter : 83809 P : 0.010
Episode: 814/1000 Total reward: 9.0 iter : 83818 P : 0.010
Episode: 815/1000 Total reward: 11.0 iter : 83829 P : 0.010
Episode: 816/1000 Total reward: 12.0 iter : 83841 P : 0.010
Episode: 817/1000 Total reward: 9.0 iter : 83850 P : 0.010
Episode: 818/1000 Total reward: 9.0 iter : 83859 P : 0.010
Episode: 819/1000 Total reward: 11.0 iter : 83870 P : 0.010
Episode: 820/1000 Total reward: 13.0 iter : 83883 P : 0.010
Episode: 821/1000 Total reward: 12.0 iter : 83895 P : 0.010
Episode: 822/1000 Total reward: 10.0 iter : 83905 P : 0.010
Episode: 823/1000 Total reward: 14.0 iter : 83919 P : 0.010
Episode: 824/1000 Total reward: 13.0 iter : 83932 P : 0.010
Episode: 825/1000 Total reward: 200.0 iter : 84132 P : 0.010
Episode: 826/1000 Total reward: 164.0 iter : 84296 P : 0.010
Episode: 827/1000 Total reward: 111.0 iter : 84407 P : 0.010
Episode: 828/1000 Total reward: 105.0 iter : 84512 P : 0.010
Episode: 829/1000 Total reward: 95.0 iter : 84607 P : 0.010
Episode: 830/1000 Total reward: 106.0 iter : 84713 P : 0.010
Episode: 831/1000 Total reward: 21.0 iter : 84734 P : 0.010
Episode: 832/1000 Total reward: 96.0 iter : 84830 P : 0.010
Episode: 833/1000 Total reward: 104.0 iter : 84934 P : 0.010
Episode: 834/1000 Total reward: 105.0 iter : 85039 P : 0.010
Episode: 835/1000 Total reward: 123.0 iter : 85162 P : 0.010
Episode: 836/1000 Total reward: 130.0 iter : 85292 P : 0.010
Episode: 837/1000 Total reward: 113.0 iter : 85405 P : 0.010
Episode: 838/1000 Total reward: 115.0 iter : 85520 P : 0.010
Episode: 839/1000 Total reward: 118.0 iter : 85638 P : 0.010
Episode: 840/1000 Total reward: 126.0 iter : 85764 P : 0.010
Episode: 841/1000 Total reward: 200.0 iter : 85964 P : 0.010
Episode: 842/1000 Total reward: 200.0 iter : 86164 P : 0.010
Episode: 843/1000 Total reward: 200.0 iter : 86364 P : 0.010
Episode: 844/1000 Total reward: 200.0 iter : 86564 P : 0.010
Episode: 845/1000 Total reward: 200.0 iter : 86764 P : 0.010
Episode: 846/1000 Total reward: 200.0 iter : 86964 P : 0.010
Episode: 847/1000 Total reward: 200.0 iter : 87164 P : 0.010
Episode: 848/1000 Total reward: 179.0 iter : 87343 P : 0.010
Episode: 849/1000 Total reward: 12.0 iter : 87355 P : 0.010
Episode: 850/1000 Total reward: 200.0 iter : 87555 P : 0.010
Episode: 851/1000 Total reward: 125.0 iter : 87680 P : 0.010
Episode: 852/1000 Total reward: 107.0 iter : 87787 P : 0.010
Episode: 853/1000 Total reward: 100.0 iter : 87887 P : 0.010
Episode: 854/1000 Total reward: 97.0 iter : 87984 P : 0.010
Episode: 855/1000 Total reward: 107.0 iter : 88091 P : 0.010
Episode: 856/1000 Total reward: 98.0 iter : 88189 P : 0.010
Episode: 857/1000 Total reward: 96.0 iter : 88285 P : 0.010
Episode: 858/1000 Total reward: 103.0 iter : 88388 P : 0.010
Episode: 859/1000 Total reward: 100.0 iter : 88488 P : 0.010
Episode: 860/1000 Total reward: 105.0 iter : 88593 P : 0.010
Episode: 861/1000 Total reward: 115.0 iter : 88708 P : 0.010
Episode: 862/1000 Total reward: 128.0 iter : 88836 P : 0.010
Episode: 863/1000 Total reward: 200.0 iter : 89036 P : 0.010
Episode: 864/1000 Total reward: 200.0 iter : 89236 P : 0.010
Episode: 865/1000 Total reward: 200.0 iter : 89436 P : 0.010
Episode: 866/1000 Total reward: 200.0 iter : 89636 P : 0.010
Episode: 867/1000 Total reward: 200.0 iter : 89836 P : 0.010
Episode: 868/1000 Total reward: 200.0 iter : 90036 P : 0.010
Episode: 869/1000 Total reward: 200.0 iter : 90236 P : 0.010
Episode: 870/1000 Total reward: 200.0 iter : 90436 P : 0.010
Episode: 871/1000 Total reward: 200.0 iter : 90636 P : 0.010
Episode: 872/1000 Total reward: 200.0 iter : 90836 P : 0.010
Episode: 873/1000 Total reward: 200.0 iter : 91036 P : 0.010
Episode: 874/1000 Total reward: 200.0 iter : 91236 P : 0.010
Episode: 875/1000 Total reward: 200.0 iter : 91436 P : 0.010
Episode: 876/1000 Total reward: 200.0 iter : 91636 P : 0.010
Episode: 877/1000 Total reward: 200.0 iter : 91836 P : 0.010
Episode: 878/1000 Total reward: 200.0 iter : 92036 P : 0.010
Episode: 879/1000 Total reward: 200.0 iter : 92236 P : 0.010
Episode: 880/1000 Total reward: 200.0 iter : 92436 P : 0.010
Episode: 881/1000 Total reward: 200.0 iter : 92636 P : 0.010
Episode: 882/1000 Total reward: 200.0 iter : 92836 P : 0.010
Episode: 883/1000 Total reward: 200.0 iter : 93036 P : 0.010
Episode: 884/1000 Total reward: 200.0 iter : 93236 P : 0.010
Episode: 885/1000 Total reward: 200.0 iter : 93436 P : 0.010
Episode: 886/1000 Total reward: 200.0 iter : 93636 P : 0.010
Episode: 887/1000 Total reward: 200.0 iter : 93836 P : 0.010
Episode: 888/1000 Total reward: 200.0 iter : 94036 P : 0.010
Episode: 889/1000 Total reward: 200.0 iter : 94236 P : 0.010
Episode: 890/1000 Total reward: 200.0 iter : 94436 P : 0.010
Episode: 891/1000 Total reward: 120.0 iter : 94556 P : 0.010
Episode: 892/1000 Total reward: 200.0 iter : 94756 P : 0.010
Episode: 893/1000 Total reward: 124.0 iter : 94880 P : 0.010
Episode: 894/1000 Total reward: 200.0 iter : 95080 P : 0.010
Episode: 895/1000 Total reward: 200.0 iter : 95280 P : 0.010
Episode: 896/1000 Total reward: 123.0 iter : 95403 P : 0.010
Episode: 897/1000 Total reward: 33.0 iter : 95436 P : 0.010
Episode: 898/1000 Total reward: 21.0 iter : 95457 P : 0.010
Episode: 899/1000 Total reward: 17.0 iter : 95474 P : 0.010
Episode: 900/1000 Total reward: 200.0 iter : 95674 P : 0.010
Episode: 901/1000 Total reward: 118.0 iter : 95792 P : 0.010
Episode: 902/1000 Total reward: 200.0 iter : 95992 P : 0.010
Episode: 903/1000 Total reward: 30.0 iter : 96022 P : 0.010
Episode: 904/1000 Total reward: 18.0 iter : 96040 P : 0.010
Episode: 905/1000 Total reward: 16.0 iter : 96056 P : 0.010
Episode: 906/1000 Total reward: 18.0 iter : 96074 P : 0.010
Episode: 907/1000 Total reward: 12.0 iter : 96086 P : 0.010
Episode: 908/1000 Total reward: 11.0 iter : 96097 P : 0.010
Episode: 909/1000 Total reward: 15.0 iter : 96112 P : 0.010
Episode: 910/1000 Total reward: 14.0 iter : 96126 P : 0.010
Episode: 911/1000 Total reward: 13.0 iter : 96139 P : 0.010
Episode: 912/1000 Total reward: 16.0 iter : 96155 P : 0.010
Episode: 913/1000 Total reward: 16.0 iter : 96171 P : 0.010
Episode: 914/1000 Total reward: 14.0 iter : 96185 P : 0.010
Episode: 915/1000 Total reward: 84.0 iter : 96269 P : 0.010
Episode: 916/1000 Total reward: 144.0 iter : 96413 P : 0.010
Episode: 917/1000 Total reward: 24.0 iter : 96437 P : 0.010
Episode: 918/1000 Total reward: 10.0 iter : 96447 P : 0.010
Episode: 919/1000 Total reward: 10.0 iter : 96457 P : 0.010
Episode: 920/1000 Total reward: 10.0 iter : 96467 P : 0.010
Episode: 921/1000 Total reward: 9.0 iter : 96476 P : 0.010
Episode: 922/1000 Total reward: 11.0 iter : 96487 P : 0.010
Episode: 923/1000 Total reward: 9.0 iter : 96496 P : 0.010
Episode: 924/1000 Total reward: 9.0 iter : 96505 P : 0.010
Episode: 925/1000 Total reward: 9.0 iter : 96514 P : 0.010
Episode: 926/1000 Total reward: 10.0 iter : 96524 P : 0.010
Episode: 927/1000 Total reward: 10.0 iter : 96534 P : 0.010
Episode: 928/1000 Total reward: 10.0 iter : 96544 P : 0.010
Episode: 929/1000 Total reward: 9.0 iter : 96553 P : 0.010
Episode: 930/1000 Total reward: 10.0 iter : 96563 P : 0.010
Episode: 931/1000 Total reward: 9.0 iter : 96572 P : 0.010
Episode: 932/1000 Total reward: 11.0 iter : 96583 P : 0.010
Episode: 933/1000 Total reward: 8.0 iter : 96591 P : 0.010
Episode: 934/1000 Total reward: 9.0 iter : 96600 P : 0.010
Episode: 935/1000 Total reward: 9.0 iter : 96609 P : 0.010
Episode: 936/1000 Total reward: 10.0 iter : 96619 P : 0.010
Episode: 937/1000 Total reward: 10.0 iter : 96629 P : 0.010
Episode: 938/1000 Total reward: 9.0 iter : 96638 P : 0.010
Episode: 939/1000 Total reward: 10.0 iter : 96648 P : 0.010
Episode: 940/1000 Total reward: 10.0 iter : 96658 P : 0.010
Episode: 941/1000 Total reward: 8.0 iter : 96666 P : 0.010
Episode: 942/1000 Total reward: 10.0 iter : 96676 P : 0.010
Episode: 943/1000 Total reward: 9.0 iter : 96685 P : 0.010
Episode: 944/1000 Total reward: 12.0 iter : 96697 P : 0.010
Episode: 945/1000 Total reward: 11.0 iter : 96708 P : 0.010
Episode: 946/1000 Total reward: 9.0 iter : 96717 P : 0.010
Episode: 947/1000 Total reward: 10.0 iter : 96727 P : 0.010
Episode: 948/1000 Total reward: 9.0 iter : 96736 P : 0.010
Episode: 949/1000 Total reward: 8.0 iter : 96744 P : 0.010
Episode: 950/1000 Total reward: 9.0 iter : 96753 P : 0.010
Episode: 951/1000 Total reward: 10.0 iter : 96763 P : 0.010
Episode: 952/1000 Total reward: 11.0 iter : 96774 P : 0.010
Episode: 953/1000 Total reward: 13.0 iter : 96787 P : 0.010
Episode: 954/1000 Total reward: 10.0 iter : 96797 P : 0.010
Episode: 955/1000 Total reward: 157.0 iter : 96954 P : 0.010
Episode: 956/1000 Total reward: 13.0 iter : 96967 P : 0.010
Episode: 957/1000 Total reward: 14.0 iter : 96981 P : 0.010
Episode: 958/1000 Total reward: 17.0 iter : 96998 P : 0.010
Episode: 959/1000 Total reward: 14.0 iter : 97012 P : 0.010
Episode: 960/1000 Total reward: 10.0 iter : 97022 P : 0.010
Episode: 961/1000 Total reward: 11.0 iter : 97033 P : 0.010
Episode: 962/1000 Total reward: 11.0 iter : 97044 P : 0.010
Episode: 963/1000 Total reward: 12.0 iter : 97056 P : 0.010
Episode: 964/1000 Total reward: 12.0 iter : 97068 P : 0.010
Episode: 965/1000 Total reward: 11.0 iter : 97079 P : 0.010
Episode: 966/1000 Total reward: 12.0 iter : 97091 P : 0.010
Episode: 967/1000 Total reward: 10.0 iter : 97101 P : 0.010
Episode: 968/1000 Total reward: 9.0 iter : 97110 P : 0.010
Episode: 969/1000 Total reward: 9.0 iter : 97119 P : 0.010
Episode: 970/1000 Total reward: 8.0 iter : 97127 P : 0.010
Episode: 971/1000 Total reward: 11.0 iter : 97138 P : 0.010
Episode: 972/1000 Total reward: 9.0 iter : 97147 P : 0.010
Episode: 973/1000 Total reward: 10.0 iter : 97157 P : 0.010
Episode: 974/1000 Total reward: 11.0 iter : 97168 P : 0.010
Episode: 975/1000 Total reward: 10.0 iter : 97178 P : 0.010
Episode: 976/1000 Total reward: 11.0 iter : 97189 P : 0.010
Episode: 977/1000 Total reward: 12.0 iter : 97201 P : 0.010
Episode: 978/1000 Total reward: 13.0 iter : 97214 P : 0.010
Episode: 979/1000 Total reward: 14.0 iter : 97228 P : 0.010
Episode: 980/1000 Total reward: 16.0 iter : 97244 P : 0.010
Episode: 981/1000 Total reward: 15.0 iter : 97259 P : 0.010
Episode: 982/1000 Total reward: 13.0 iter : 97272 P : 0.010
Episode: 983/1000 Total reward: 18.0 iter : 97290 P : 0.010
Episode: 984/1000 Total reward: 16.0 iter : 97306 P : 0.010
Episode: 985/1000 Total reward: 17.0 iter : 97323 P : 0.010
Episode: 986/1000 Total reward: 20.0 iter : 97343 P : 0.010
Episode: 987/1000 Total reward: 67.0 iter : 97410 P : 0.010
Episode: 988/1000 Total reward: 44.0 iter : 97454 P : 0.010
Episode: 989/1000 Total reward: 23.0 iter : 97477 P : 0.010
Episode: 990/1000 Total reward: 27.0 iter : 97504 P : 0.010
Episode: 991/1000 Total reward: 27.0 iter : 97531 P : 0.010
Episode: 992/1000 Total reward: 49.0 iter : 97580 P : 0.010
Episode: 993/1000 Total reward: 23.0 iter : 97603 P : 0.010
Episode: 994/1000 Total reward: 20.0 iter : 97623 P : 0.010
Episode: 995/1000 Total reward: 25.0 iter : 97648 P : 0.010
Episode: 996/1000 Total reward: 23.0 iter : 97671 P : 0.010
Episode: 997/1000 Total reward: 80.0 iter : 97751 P : 0.010
Episode: 998/1000 Total reward: 34.0 iter : 97785 P : 0.010
Episode: 999/1000 Total reward: 67.0 iter : 97852 P : 0.010

We can test the lastest model on a play (without Replay Memory and with dense reward). It may be a bad score and we will see why in the next part.

In [8]:
DQN.play(render = False)
Game 0 - Score 200.0
Game 1 - Score 200.0
Game 2 - Score 200.0
Game 3 - Score 200.0
Game 4 - Score 200.0
Game 5 - Score 200.0
Game 6 - Score 200.0
Game 7 - Score 200.0
Game 8 - Score 200.0
Game 9 - Score 200.0
Game 10 - Score 200.0
Game 11 - Score 200.0
Game 12 - Score 200.0
Game 13 - Score 200.0
Game 14 - Score 200.0
Game 15 - Score 200.0
Game 16 - Score 200.0
Game 17 - Score 200.0
Game 18 - Score 200.0
Game 19 - Score 200.0
Game 20 - Score 200.0
Game 21 - Score 194.0
Game 22 - Score 200.0
Game 23 - Score 200.0
Game 24 - Score 200.0
Game 25 - Score 200.0
Game 26 - Score 200.0
Game 27 - Score 200.0
Game 28 - Score 200.0
Game 29 - Score 200.0
Game 30 - Score 200.0
Game 31 - Score 200.0
Game 32 - Score 200.0
Game 33 - Score 200.0
Game 34 - Score 200.0
Game 35 - Score 200.0
Game 36 - Score 200.0
Game 37 - Score 200.0
Game 38 - Score 197.0
Game 39 - Score 200.0
Game 40 - Score 200.0
Game 41 - Score 200.0
Game 42 - Score 200.0
Game 43 - Score 200.0
Game 44 - Score 200.0
Game 45 - Score 200.0
Game 46 - Score 200.0
Game 47 - Score 200.0
Game 48 - Score 200.0
Game 49 - Score 200.0
Game 50 - Score 200.0
Game 51 - Score 200.0
Game 52 - Score 200.0
Game 53 - Score 200.0
Game 54 - Score 200.0
Game 55 - Score 200.0
Game 56 - Score 200.0
Game 57 - Score 200.0
Game 58 - Score 200.0
Game 59 - Score 200.0
Game 60 - Score 200.0
Game 61 - Score 200.0
Game 62 - Score 200.0
Game 63 - Score 200.0
Game 64 - Score 200.0
Game 65 - Score 200.0
Game 66 - Score 200.0
Game 67 - Score 200.0
Game 68 - Score 200.0
Game 69 - Score 200.0
Game 70 - Score 200.0
Game 71 - Score 200.0
Game 72 - Score 200.0
Game 73 - Score 195.0
Game 74 - Score 200.0
Game 75 - Score 200.0
Game 76 - Score 200.0
Game 77 - Score 200.0
Game 78 - Score 200.0
Game 79 - Score 200.0
Game 80 - Score 200.0
Game 81 - Score 200.0
Game 82 - Score 200.0
Game 83 - Score 200.0
Game 84 - Score 200.0
Game 85 - Score 200.0
Game 86 - Score 200.0
Game 87 - Score 200.0
Game 88 - Score 200.0
Game 89 - Score 200.0
Game 90 - Score 200.0
Game 91 - Score 200.0
Game 92 - Score 200.0
Game 93 - Score 200.0
Game 94 - Score 200.0
Game 95 - Score 200.0
Game 96 - Score 200.0
Game 97 - Score 200.0
Game 98 - Score 200.0
Game 99 - Score 200.0
In [6]:
DQN = QNetwork(learning_rate = 0.001, use_replay_memory=True, reshape_reward=True)
DQN.train(train_episodes_ovr=1000)
DQN.save_stats("DQN_with_memory_dense_reward.p")
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 16)                80        
_________________________________________________________________
dense_2 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 34        
=================================================================
Total params: 386
Trainable params: 386
Non-trainable params: 0
_________________________________________________________________
Episode: 0/1000 Total reward: 10.0 iter : 10 P : 0.998
Episode: 1/1000 Total reward: 12.0 iter : 22 P : 0.996
Episode: 2/1000 Total reward: 10.0 iter : 32 P : 0.994
Episode: 3/1000 Total reward: 26.0 iter : 58 P : 0.989
Episode: 4/1000 Total reward: 33.0 iter : 91 P : 0.982
Episode: 5/1000 Total reward: 26.0 iter : 117 P : 0.977
Episode: 6/1000 Total reward: 15.0 iter : 132 P : 0.974
Episode: 7/1000 Total reward: 11.0 iter : 143 P : 0.972
Episode: 8/1000 Total reward: 19.0 iter : 162 P : 0.968
Episode: 9/1000 Total reward: 25.0 iter : 187 P : 0.964
Episode: 10/1000 Total reward: 24.0 iter : 211 P : 0.959
Episode: 11/1000 Total reward: 21.0 iter : 232 P : 0.955
Episode: 12/1000 Total reward: 11.0 iter : 243 P : 0.953
Episode: 13/1000 Total reward: 18.0 iter : 261 P : 0.950
Episode: 14/1000 Total reward: 13.0 iter : 274 P : 0.947
Episode: 15/1000 Total reward: 28.0 iter : 302 P : 0.942
Episode: 16/1000 Total reward: 15.0 iter : 317 P : 0.939
Episode: 17/1000 Total reward: 17.0 iter : 334 P : 0.936
Episode: 18/1000 Total reward: 14.0 iter : 348 P : 0.933
Episode: 19/1000 Total reward: 16.0 iter : 364 P : 0.930
Episode: 20/1000 Total reward: 12.0 iter : 376 P : 0.928
Episode: 21/1000 Total reward: 15.0 iter : 391 P : 0.926
Episode: 22/1000 Total reward: 35.0 iter : 426 P : 0.919
Episode: 23/1000 Total reward: 10.0 iter : 436 P : 0.917
Episode: 24/1000 Total reward: 13.0 iter : 449 P : 0.915
Episode: 25/1000 Total reward: 16.0 iter : 465 P : 0.912
Episode: 26/1000 Total reward: 23.0 iter : 488 P : 0.908
Episode: 27/1000 Total reward: 10.0 iter : 498 P : 0.906
Episode: 28/1000 Total reward: 17.0 iter : 515 P : 0.903
Episode: 29/1000 Total reward: 12.0 iter : 527 P : 0.901
Episode: 30/1000 Total reward: 8.0 iter : 535 P : 0.900
Episode: 31/1000 Total reward: 14.0 iter : 549 P : 0.897
Episode: 32/1000 Total reward: 15.0 iter : 564 P : 0.894
Episode: 33/1000 Total reward: 32.0 iter : 596 P : 0.889
Episode: 34/1000 Total reward: 31.0 iter : 627 P : 0.883
Episode: 35/1000 Total reward: 37.0 iter : 664 P : 0.877
Episode: 36/1000 Total reward: 13.0 iter : 677 P : 0.875
Episode: 37/1000 Total reward: 11.0 iter : 688 P : 0.873
Episode: 38/1000 Total reward: 40.0 iter : 728 P : 0.866
Episode: 39/1000 Total reward: 12.0 iter : 740 P : 0.864
Episode: 40/1000 Total reward: 26.0 iter : 766 P : 0.859
Episode: 41/1000 Total reward: 26.0 iter : 792 P : 0.855
Episode: 42/1000 Total reward: 18.0 iter : 810 P : 0.852
Episode: 43/1000 Total reward: 20.0 iter : 830 P : 0.849
Episode: 44/1000 Total reward: 13.0 iter : 843 P : 0.846
Episode: 45/1000 Total reward: 24.0 iter : 867 P : 0.842
Episode: 46/1000 Total reward: 29.0 iter : 896 P : 0.838
Episode: 47/1000 Total reward: 16.0 iter : 912 P : 0.835
Episode: 48/1000 Total reward: 19.0 iter : 931 P : 0.832
Episode: 49/1000 Total reward: 22.0 iter : 953 P : 0.828
Episode: 50/1000 Total reward: 12.0 iter : 965 P : 0.826
Episode: 51/1000 Total reward: 23.0 iter : 988 P : 0.822
Episode: 52/1000 Total reward: 50.0 iter : 1038 P : 0.814
Episode: 53/1000 Total reward: 25.0 iter : 1063 P : 0.810
Episode: 54/1000 Total reward: 33.0 iter : 1096 P : 0.805
Episode: 55/1000 Total reward: 28.0 iter : 1124 P : 0.801
Episode: 56/1000 Total reward: 10.0 iter : 1134 P : 0.799
Episode: 57/1000 Total reward: 19.0 iter : 1153 P : 0.796
Episode: 58/1000 Total reward: 23.0 iter : 1176 P : 0.793
Episode: 59/1000 Total reward: 33.0 iter : 1209 P : 0.787
Episode: 60/1000 Total reward: 27.0 iter : 1236 P : 0.783
Episode: 61/1000 Total reward: 22.0 iter : 1258 P : 0.780
Episode: 62/1000 Total reward: 15.0 iter : 1273 P : 0.777
Episode: 63/1000 Total reward: 14.0 iter : 1287 P : 0.775
Episode: 64/1000 Total reward: 16.0 iter : 1303 P : 0.773
Episode: 65/1000 Total reward: 17.0 iter : 1320 P : 0.770
Episode: 66/1000 Total reward: 18.0 iter : 1338 P : 0.768
Episode: 67/1000 Total reward: 21.0 iter : 1359 P : 0.764
Episode: 68/1000 Total reward: 12.0 iter : 1371 P : 0.763
Episode: 69/1000 Total reward: 18.0 iter : 1389 P : 0.760
Episode: 70/1000 Total reward: 11.0 iter : 1400 P : 0.758
Episode: 71/1000 Total reward: 17.0 iter : 1417 P : 0.756
Episode: 72/1000 Total reward: 16.0 iter : 1433 P : 0.753
Episode: 73/1000 Total reward: 15.0 iter : 1448 P : 0.751
Episode: 74/1000 Total reward: 37.0 iter : 1485 P : 0.746
Episode: 75/1000 Total reward: 12.0 iter : 1497 P : 0.744
Episode: 76/1000 Total reward: 17.0 iter : 1514 P : 0.741
Episode: 77/1000 Total reward: 26.0 iter : 1540 P : 0.738
Episode: 78/1000 Total reward: 17.0 iter : 1557 P : 0.735
Episode: 79/1000 Total reward: 36.0 iter : 1593 P : 0.730
Episode: 80/1000 Total reward: 36.0 iter : 1629 P : 0.725
Episode: 81/1000 Total reward: 11.0 iter : 1640 P : 0.723
Episode: 82/1000 Total reward: 22.0 iter : 1662 P : 0.720
Episode: 83/1000 Total reward: 9.0 iter : 1671 P : 0.719
Episode: 84/1000 Total reward: 18.0 iter : 1689 P : 0.716
Episode: 85/1000 Total reward: 17.0 iter : 1706 P : 0.714
Episode: 86/1000 Total reward: 20.0 iter : 1726 P : 0.711
Episode: 87/1000 Total reward: 16.0 iter : 1742 P : 0.709
Episode: 88/1000 Total reward: 12.0 iter : 1754 P : 0.707
Episode: 89/1000 Total reward: 14.0 iter : 1768 P : 0.705
Episode: 90/1000 Total reward: 11.0 iter : 1779 P : 0.704
Episode: 91/1000 Total reward: 13.0 iter : 1792 P : 0.702
Episode: 92/1000 Total reward: 19.0 iter : 1811 P : 0.699
Episode: 93/1000 Total reward: 41.0 iter : 1852 P : 0.694
Episode: 94/1000 Total reward: 46.0 iter : 1898 P : 0.687
Episode: 95/1000 Total reward: 25.0 iter : 1923 P : 0.684
Episode: 96/1000 Total reward: 17.0 iter : 1940 P : 0.682
Episode: 97/1000 Total reward: 27.0 iter : 1967 P : 0.678
Episode: 98/1000 Total reward: 12.0 iter : 1979 P : 0.676
Episode: 99/1000 Total reward: 19.0 iter : 1998 P : 0.674
Episode: 100/1000 Total reward: 14.0 iter : 2012 P : 0.672
Episode: 101/1000 Total reward: 10.0 iter : 2022 P : 0.671
Episode: 102/1000 Total reward: 18.0 iter : 2040 P : 0.668
Episode: 103/1000 Total reward: 17.0 iter : 2057 P : 0.666
Episode: 104/1000 Total reward: 9.0 iter : 2066 P : 0.665
Episode: 105/1000 Total reward: 20.0 iter : 2086 P : 0.662
Episode: 106/1000 Total reward: 15.0 iter : 2101 P : 0.660
Episode: 107/1000 Total reward: 16.0 iter : 2117 P : 0.658
Episode: 108/1000 Total reward: 26.0 iter : 2143 P : 0.655
Episode: 109/1000 Total reward: 14.0 iter : 2157 P : 0.653
Episode: 110/1000 Total reward: 22.0 iter : 2179 P : 0.650
Episode: 111/1000 Total reward: 16.0 iter : 2195 P : 0.648
Episode: 112/1000 Total reward: 21.0 iter : 2216 P : 0.646
Episode: 113/1000 Total reward: 11.0 iter : 2227 P : 0.644
Episode: 114/1000 Total reward: 13.0 iter : 2240 P : 0.643
Episode: 115/1000 Total reward: 23.0 iter : 2263 P : 0.640
Episode: 116/1000 Total reward: 31.0 iter : 2294 P : 0.636
Episode: 117/1000 Total reward: 25.0 iter : 2319 P : 0.633
Episode: 118/1000 Total reward: 51.0 iter : 2370 P : 0.626
Episode: 119/1000 Total reward: 16.0 iter : 2386 P : 0.624
Episode: 120/1000 Total reward: 16.0 iter : 2402 P : 0.622
Episode: 121/1000 Total reward: 28.0 iter : 2430 P : 0.619
Episode: 122/1000 Total reward: 49.0 iter : 2479 P : 0.613
Episode: 123/1000 Total reward: 32.0 iter : 2511 P : 0.609
Episode: 124/1000 Total reward: 21.0 iter : 2532 P : 0.607
Episode: 125/1000 Total reward: 14.0 iter : 2546 P : 0.605
Episode: 126/1000 Total reward: 17.0 iter : 2563 P : 0.603
Episode: 127/1000 Total reward: 36.0 iter : 2599 P : 0.599
Episode: 128/1000 Total reward: 12.0 iter : 2611 P : 0.597
Episode: 129/1000 Total reward: 14.0 iter : 2625 P : 0.596
Episode: 130/1000 Total reward: 33.0 iter : 2658 P : 0.592
Episode: 131/1000 Total reward: 28.0 iter : 2686 P : 0.589
Episode: 132/1000 Total reward: 42.0 iter : 2728 P : 0.584
Episode: 133/1000 Total reward: 21.0 iter : 2749 P : 0.581
Episode: 134/1000 Total reward: 20.0 iter : 2769 P : 0.579
Episode: 135/1000 Total reward: 21.0 iter : 2790 P : 0.577
Episode: 136/1000 Total reward: 40.0 iter : 2830 P : 0.572
Episode: 137/1000 Total reward: 25.0 iter : 2855 P : 0.569
Episode: 138/1000 Total reward: 16.0 iter : 2871 P : 0.568
Episode: 139/1000 Total reward: 25.0 iter : 2896 P : 0.565
Episode: 140/1000 Total reward: 22.0 iter : 2918 P : 0.562
Episode: 141/1000 Total reward: 19.0 iter : 2937 P : 0.560
Episode: 142/1000 Total reward: 84.0 iter : 3021 P : 0.551
Episode: 143/1000 Total reward: 34.0 iter : 3055 P : 0.547
Episode: 144/1000 Total reward: 26.0 iter : 3081 P : 0.545
Episode: 145/1000 Total reward: 18.0 iter : 3099 P : 0.543
Episode: 146/1000 Total reward: 16.0 iter : 3115 P : 0.541
Episode: 147/1000 Total reward: 13.0 iter : 3128 P : 0.540
Episode: 148/1000 Total reward: 12.0 iter : 3140 P : 0.538
Episode: 149/1000 Total reward: 38.0 iter : 3178 P : 0.534
Episode: 150/1000 Total reward: 16.0 iter : 3194 P : 0.533
Episode: 151/1000 Total reward: 25.0 iter : 3219 P : 0.530
Episode: 152/1000 Total reward: 24.0 iter : 3243 P : 0.528
Episode: 153/1000 Total reward: 20.0 iter : 3263 P : 0.525
Episode: 154/1000 Total reward: 102.0 iter : 3365 P : 0.515
Episode: 155/1000 Total reward: 26.0 iter : 3391 P : 0.512
Episode: 156/1000 Total reward: 12.0 iter : 3403 P : 0.511
Episode: 157/1000 Total reward: 44.0 iter : 3447 P : 0.507
Episode: 158/1000 Total reward: 17.0 iter : 3464 P : 0.505
Episode: 159/1000 Total reward: 63.0 iter : 3527 P : 0.499
Episode: 160/1000 Total reward: 28.0 iter : 3555 P : 0.496
Episode: 161/1000 Total reward: 17.0 iter : 3572 P : 0.495
Episode: 162/1000 Total reward: 13.0 iter : 3585 P : 0.493
Episode: 163/1000 Total reward: 12.0 iter : 3597 P : 0.492
Episode: 164/1000 Total reward: 13.0 iter : 3610 P : 0.491
Episode: 165/1000 Total reward: 31.0 iter : 3641 P : 0.488
Episode: 166/1000 Total reward: 14.0 iter : 3655 P : 0.487
Episode: 167/1000 Total reward: 20.0 iter : 3675 P : 0.485
Episode: 168/1000 Total reward: 17.0 iter : 3692 P : 0.483
Episode: 169/1000 Total reward: 26.0 iter : 3718 P : 0.481
Episode: 170/1000 Total reward: 20.0 iter : 3738 P : 0.479
Episode: 171/1000 Total reward: 29.0 iter : 3767 P : 0.476
Episode: 172/1000 Total reward: 31.0 iter : 3798 P : 0.473
Episode: 173/1000 Total reward: 24.0 iter : 3822 P : 0.471
Episode: 174/1000 Total reward: 13.0 iter : 3835 P : 0.470
Episode: 175/1000 Total reward: 25.0 iter : 3860 P : 0.467
Episode: 176/1000 Total reward: 30.0 iter : 3890 P : 0.465
Episode: 177/1000 Total reward: 17.0 iter : 3907 P : 0.463
Episode: 178/1000 Total reward: 22.0 iter : 3929 P : 0.461
Episode: 179/1000 Total reward: 34.0 iter : 3963 P : 0.458
Episode: 180/1000 Total reward: 18.0 iter : 3981 P : 0.457
Episode: 181/1000 Total reward: 34.0 iter : 4015 P : 0.454
Episode: 182/1000 Total reward: 25.0 iter : 4040 P : 0.451
Episode: 183/1000 Total reward: 28.0 iter : 4068 P : 0.449
Episode: 184/1000 Total reward: 26.0 iter : 4094 P : 0.447
Episode: 185/1000 Total reward: 26.0 iter : 4120 P : 0.444
Episode: 186/1000 Total reward: 23.0 iter : 4143 P : 0.442
Episode: 187/1000 Total reward: 24.0 iter : 4167 P : 0.440
Episode: 188/1000 Total reward: 27.0 iter : 4194 P : 0.438
Episode: 189/1000 Total reward: 51.0 iter : 4245 P : 0.434
Episode: 190/1000 Total reward: 21.0 iter : 4266 P : 0.432
Episode: 191/1000 Total reward: 27.0 iter : 4293 P : 0.430
Episode: 192/1000 Total reward: 25.0 iter : 4318 P : 0.427
Episode: 193/1000 Total reward: 23.0 iter : 4341 P : 0.426
Episode: 194/1000 Total reward: 24.0 iter : 4365 P : 0.424
Episode: 195/1000 Total reward: 43.0 iter : 4408 P : 0.420
Episode: 196/1000 Total reward: 20.0 iter : 4428 P : 0.418
Episode: 197/1000 Total reward: 27.0 iter : 4455 P : 0.416
Episode: 198/1000 Total reward: 42.0 iter : 4497 P : 0.413
Episode: 199/1000 Total reward: 34.0 iter : 4531 P : 0.410
Episode: 200/1000 Total reward: 34.0 iter : 4565 P : 0.407
Episode: 201/1000 Total reward: 40.0 iter : 4605 P : 0.404
Episode: 202/1000 Total reward: 22.0 iter : 4627 P : 0.402
Episode: 203/1000 Total reward: 21.0 iter : 4648 P : 0.401
Episode: 204/1000 Total reward: 25.0 iter : 4673 P : 0.399
Episode: 205/1000 Total reward: 45.0 iter : 4718 P : 0.395
Episode: 206/1000 Total reward: 41.0 iter : 4759 P : 0.392
Episode: 207/1000 Total reward: 35.0 iter : 4794 P : 0.390
Episode: 208/1000 Total reward: 32.0 iter : 4826 P : 0.387
Episode: 209/1000 Total reward: 77.0 iter : 4903 P : 0.381
Episode: 210/1000 Total reward: 45.0 iter : 4948 P : 0.378
Episode: 211/1000 Total reward: 26.0 iter : 4974 P : 0.376
Episode: 212/1000 Total reward: 29.0 iter : 5003 P : 0.374
Episode: 213/1000 Total reward: 28.0 iter : 5031 P : 0.372
Episode: 214/1000 Total reward: 81.0 iter : 5112 P : 0.366
Episode: 215/1000 Total reward: 24.0 iter : 5136 P : 0.364
Episode: 216/1000 Total reward: 14.0 iter : 5150 P : 0.363
Episode: 217/1000 Total reward: 47.0 iter : 5197 P : 0.360
Episode: 218/1000 Total reward: 48.0 iter : 5245 P : 0.357
Episode: 219/1000 Total reward: 59.0 iter : 5304 P : 0.353
Episode: 220/1000 Total reward: 39.0 iter : 5343 P : 0.350
Episode: 221/1000 Total reward: 47.0 iter : 5390 P : 0.347
Episode: 222/1000 Total reward: 27.0 iter : 5417 P : 0.345
Episode: 223/1000 Total reward: 28.0 iter : 5445 P : 0.343
Episode: 224/1000 Total reward: 25.0 iter : 5470 P : 0.342
Episode: 225/1000 Total reward: 44.0 iter : 5514 P : 0.339
Episode: 226/1000 Total reward: 28.0 iter : 5542 P : 0.337
Episode: 227/1000 Total reward: 36.0 iter : 5578 P : 0.334
Episode: 228/1000 Total reward: 27.0 iter : 5605 P : 0.333
Episode: 229/1000 Total reward: 41.0 iter : 5646 P : 0.330
Episode: 230/1000 Total reward: 48.0 iter : 5694 P : 0.327
Episode: 231/1000 Total reward: 37.0 iter : 5731 P : 0.325
Episode: 232/1000 Total reward: 50.0 iter : 5781 P : 0.322
Episode: 233/1000 Total reward: 47.0 iter : 5828 P : 0.319
Episode: 234/1000 Total reward: 40.0 iter : 5868 P : 0.316
Episode: 235/1000 Total reward: 26.0 iter : 5894 P : 0.315
Episode: 236/1000 Total reward: 45.0 iter : 5939 P : 0.312
Episode: 237/1000 Total reward: 20.0 iter : 5959 P : 0.311
Episode: 238/1000 Total reward: 24.0 iter : 5983 P : 0.309
Episode: 239/1000 Total reward: 88.0 iter : 6071 P : 0.304
Episode: 240/1000 Total reward: 24.0 iter : 6095 P : 0.303
Episode: 241/1000 Total reward: 63.0 iter : 6158 P : 0.299
Episode: 242/1000 Total reward: 86.0 iter : 6244 P : 0.294
Episode: 243/1000 Total reward: 124.0 iter : 6368 P : 0.287
Episode: 244/1000 Total reward: 142.0 iter : 6510 P : 0.279
Episode: 245/1000 Total reward: 86.0 iter : 6596 P : 0.275
Episode: 246/1000 Total reward: 101.0 iter : 6697 P : 0.269
Episode: 247/1000 Total reward: 26.0 iter : 6723 P : 0.268
Episode: 248/1000 Total reward: 136.0 iter : 6859 P : 0.261
Episode: 249/1000 Total reward: 191.0 iter : 7050 P : 0.252
Episode: 250/1000 Total reward: 136.0 iter : 7186 P : 0.245
Episode: 251/1000 Total reward: 200.0 iter : 7386 P : 0.236
Episode: 252/1000 Total reward: 172.0 iter : 7558 P : 0.228
Episode: 253/1000 Total reward: 193.0 iter : 7751 P : 0.220
Episode: 254/1000 Total reward: 200.0 iter : 7951 P : 0.212
Episode: 255/1000 Total reward: 200.0 iter : 8151 P : 0.204
Episode: 256/1000 Total reward: 200.0 iter : 8351 P : 0.196
Episode: 257/1000 Total reward: 200.0 iter : 8551 P : 0.189
Episode: 258/1000 Total reward: 200.0 iter : 8751 P : 0.182
Episode: 259/1000 Total reward: 200.0 iter : 8951 P : 0.175
Episode: 260/1000 Total reward: 198.0 iter : 9149 P : 0.169
Episode: 261/1000 Total reward: 200.0 iter : 9349 P : 0.163
Episode: 262/1000 Total reward: 200.0 iter : 9549 P : 0.157
Episode: 263/1000 Total reward: 200.0 iter : 9749 P : 0.151
Episode: 264/1000 Total reward: 200.0 iter : 9949 P : 0.145
Episode: 265/1000 Total reward: 200.0 iter : 10149 P : 0.140
Episode: 266/1000 Total reward: 200.0 iter : 10349 P : 0.135
Episode: 267/1000 Total reward: 200.0 iter : 10549 P : 0.130
Episode: 268/1000 Total reward: 200.0 iter : 10749 P : 0.125
Episode: 269/1000 Total reward: 200.0 iter : 10949 P : 0.121
Episode: 270/1000 Total reward: 200.0 iter : 11149 P : 0.116
Episode: 271/1000 Total reward: 200.0 iter : 11349 P : 0.112
Episode: 272/1000 Total reward: 200.0 iter : 11549 P : 0.108
Episode: 273/1000 Total reward: 200.0 iter : 11749 P : 0.104
Episode: 274/1000 Total reward: 200.0 iter : 11949 P : 0.101
Episode: 275/1000 Total reward: 200.0 iter : 12149 P : 0.097
Episode: 276/1000 Total reward: 197.0 iter : 12346 P : 0.094
Episode: 277/1000 Total reward: 200.0 iter : 12546 P : 0.091
Episode: 278/1000 Total reward: 192.0 iter : 12738 P : 0.087
Episode: 279/1000 Total reward: 198.0 iter : 12936 P : 0.084
Episode: 280/1000 Total reward: 165.0 iter : 13101 P : 0.082
Episode: 281/1000 Total reward: 196.0 iter : 13297 P : 0.079
Episode: 282/1000 Total reward: 179.0 iter : 13476 P : 0.077
Episode: 283/1000 Total reward: 178.0 iter : 13654 P : 0.075
Episode: 284/1000 Total reward: 196.0 iter : 13850 P : 0.072
Episode: 285/1000 Total reward: 147.0 iter : 13997 P : 0.070
Episode: 286/1000 Total reward: 162.0 iter : 14159 P : 0.068
Episode: 287/1000 Total reward: 159.0 iter : 14318 P : 0.066
Episode: 288/1000 Total reward: 152.0 iter : 14470 P : 0.065
Episode: 289/1000 Total reward: 167.0 iter : 14637 P : 0.063
Episode: 290/1000 Total reward: 149.0 iter : 14786 P : 0.061
Episode: 291/1000 Total reward: 162.0 iter : 14948 P : 0.060
Episode: 292/1000 Total reward: 136.0 iter : 15084 P : 0.058
Episode: 293/1000 Total reward: 169.0 iter : 15253 P : 0.057
Episode: 294/1000 Total reward: 169.0 iter : 15422 P : 0.055
Episode: 295/1000 Total reward: 167.0 iter : 15589 P : 0.054
Episode: 296/1000 Total reward: 157.0 iter : 15746 P : 0.052
Episode: 297/1000 Total reward: 157.0 iter : 15903 P : 0.051
Episode: 298/1000 Total reward: 149.0 iter : 16052 P : 0.050
Episode: 299/1000 Total reward: 139.0 iter : 16191 P : 0.049
Episode: 300/1000 Total reward: 142.0 iter : 16333 P : 0.048
Episode: 301/1000 Total reward: 138.0 iter : 16471 P : 0.047
Episode: 302/1000 Total reward: 156.0 iter : 16627 P : 0.046
Episode: 303/1000 Total reward: 149.0 iter : 16776 P : 0.045
Episode: 304/1000 Total reward: 143.0 iter : 16919 P : 0.044
Episode: 305/1000 Total reward: 137.0 iter : 17056 P : 0.043
Episode: 306/1000 Total reward: 141.0 iter : 17197 P : 0.042
Episode: 307/1000 Total reward: 150.0 iter : 17347 P : 0.041
Episode: 308/1000 Total reward: 147.0 iter : 17494 P : 0.040
Episode: 309/1000 Total reward: 152.0 iter : 17646 P : 0.039
Episode: 310/1000 Total reward: 155.0 iter : 17801 P : 0.038
Episode: 311/1000 Total reward: 162.0 iter : 17963 P : 0.037
Episode: 312/1000 Total reward: 142.0 iter : 18105 P : 0.036
Episode: 313/1000 Total reward: 159.0 iter : 18264 P : 0.036
Episode: 314/1000 Total reward: 164.0 iter : 18428 P : 0.035
Episode: 315/1000 Total reward: 153.0 iter : 18581 P : 0.034
Episode: 316/1000 Total reward: 149.0 iter : 18730 P : 0.033
Episode: 317/1000 Total reward: 149.0 iter : 18879 P : 0.033
Episode: 318/1000 Total reward: 181.0 iter : 19060 P : 0.032
Episode: 319/1000 Total reward: 160.0 iter : 19220 P : 0.031
Episode: 320/1000 Total reward: 9.0 iter : 19229 P : 0.031
Episode: 321/1000 Total reward: 9.0 iter : 19238 P : 0.031
Episode: 322/1000 Total reward: 10.0 iter : 19248 P : 0.031
Episode: 323/1000 Total reward: 10.0 iter : 19258 P : 0.031
Episode: 324/1000 Total reward: 9.0 iter : 19267 P : 0.031
Episode: 325/1000 Total reward: 11.0 iter : 19278 P : 0.031
Episode: 326/1000 Total reward: 10.0 iter : 19288 P : 0.031
Episode: 327/1000 Total reward: 9.0 iter : 19297 P : 0.031
Episode: 328/1000 Total reward: 9.0 iter : 19306 P : 0.031
Episode: 329/1000 Total reward: 9.0 iter : 19315 P : 0.031
Episode: 330/1000 Total reward: 9.0 iter : 19324 P : 0.031
Episode: 331/1000 Total reward: 10.0 iter : 19334 P : 0.031
Episode: 332/1000 Total reward: 9.0 iter : 19343 P : 0.031
Episode: 333/1000 Total reward: 12.0 iter : 19355 P : 0.031
Episode: 334/1000 Total reward: 10.0 iter : 19365 P : 0.031
Episode: 335/1000 Total reward: 10.0 iter : 19375 P : 0.031
Episode: 336/1000 Total reward: 200.0 iter : 19575 P : 0.030
Episode: 337/1000 Total reward: 108.0 iter : 19683 P : 0.029
Episode: 338/1000 Total reward: 101.0 iter : 19784 P : 0.029
Episode: 339/1000 Total reward: 12.0 iter : 19796 P : 0.029
Episode: 340/1000 Total reward: 13.0 iter : 19809 P : 0.029
Episode: 341/1000 Total reward: 10.0 iter : 19819 P : 0.029
Episode: 342/1000 Total reward: 9.0 iter : 19828 P : 0.029
Episode: 343/1000 Total reward: 11.0 iter : 19839 P : 0.029
Episode: 344/1000 Total reward: 10.0 iter : 19849 P : 0.029
Episode: 345/1000 Total reward: 9.0 iter : 19858 P : 0.029
Episode: 346/1000 Total reward: 9.0 iter : 19867 P : 0.029
Episode: 347/1000 Total reward: 9.0 iter : 19876 P : 0.029
Episode: 348/1000 Total reward: 11.0 iter : 19887 P : 0.029
Episode: 349/1000 Total reward: 10.0 iter : 19897 P : 0.029
Episode: 350/1000 Total reward: 9.0 iter : 19906 P : 0.028
Episode: 351/1000 Total reward: 10.0 iter : 19916 P : 0.028
Episode: 352/1000 Total reward: 10.0 iter : 19926 P : 0.028
Episode: 353/1000 Total reward: 9.0 iter : 19935 P : 0.028
Episode: 354/1000 Total reward: 10.0 iter : 19945 P : 0.028
Episode: 355/1000 Total reward: 12.0 iter : 19957 P : 0.028
Episode: 356/1000 Total reward: 9.0 iter : 19966 P : 0.028
Episode: 357/1000 Total reward: 9.0 iter : 19975 P : 0.028
Episode: 358/1000 Total reward: 10.0 iter : 19985 P : 0.028
Episode: 359/1000 Total reward: 9.0 iter : 19994 P : 0.028
Episode: 360/1000 Total reward: 10.0 iter : 20004 P : 0.028
Episode: 361/1000 Total reward: 10.0 iter : 20014 P : 0.028
Episode: 362/1000 Total reward: 9.0 iter : 20023 P : 0.028
Episode: 363/1000 Total reward: 12.0 iter : 20035 P : 0.028
Episode: 364/1000 Total reward: 11.0 iter : 20046 P : 0.028
Episode: 365/1000 Total reward: 9.0 iter : 20055 P : 0.028
Episode: 366/1000 Total reward: 9.0 iter : 20064 P : 0.028
Episode: 367/1000 Total reward: 11.0 iter : 20075 P : 0.028
Episode: 368/1000 Total reward: 106.0 iter : 20181 P : 0.027
Episode: 369/1000 Total reward: 200.0 iter : 20381 P : 0.027
Episode: 370/1000 Total reward: 200.0 iter : 20581 P : 0.026
Episode: 371/1000 Total reward: 199.0 iter : 20780 P : 0.026
Episode: 372/1000 Total reward: 200.0 iter : 20980 P : 0.025
Episode: 373/1000 Total reward: 166.0 iter : 21146 P : 0.024
Episode: 374/1000 Total reward: 190.0 iter : 21336 P : 0.024
Episode: 375/1000 Total reward: 175.0 iter : 21511 P : 0.023
Episode: 376/1000 Total reward: 168.0 iter : 21679 P : 0.023
Episode: 377/1000 Total reward: 161.0 iter : 21840 P : 0.023
Episode: 378/1000 Total reward: 200.0 iter : 22040 P : 0.022
Episode: 379/1000 Total reward: 176.0 iter : 22216 P : 0.022
Episode: 380/1000 Total reward: 144.0 iter : 22360 P : 0.021
Episode: 381/1000 Total reward: 200.0 iter : 22560 P : 0.021
Episode: 382/1000 Total reward: 172.0 iter : 22732 P : 0.020
Episode: 383/1000 Total reward: 184.0 iter : 22916 P : 0.020
Episode: 384/1000 Total reward: 114.0 iter : 23030 P : 0.020
Episode: 385/1000 Total reward: 134.0 iter : 23164 P : 0.020
Episode: 386/1000 Total reward: 132.0 iter : 23296 P : 0.019
Episode: 387/1000 Total reward: 122.0 iter : 23418 P : 0.019
Episode: 388/1000 Total reward: 138.0 iter : 23556 P : 0.019
Episode: 389/1000 Total reward: 103.0 iter : 23659 P : 0.019
Episode: 390/1000 Total reward: 139.0 iter : 23798 P : 0.018
Episode: 391/1000 Total reward: 105.0 iter : 23903 P : 0.018
Episode: 392/1000 Total reward: 124.0 iter : 24027 P : 0.018
Episode: 393/1000 Total reward: 124.0 iter : 24151 P : 0.018
Episode: 394/1000 Total reward: 94.0 iter : 24245 P : 0.018
Episode: 395/1000 Total reward: 107.0 iter : 24352 P : 0.018
Episode: 396/1000 Total reward: 102.0 iter : 24454 P : 0.017
Episode: 397/1000 Total reward: 129.0 iter : 24583 P : 0.017
Episode: 398/1000 Total reward: 90.0 iter : 24673 P : 0.017
Episode: 399/1000 Total reward: 85.0 iter : 24758 P : 0.017
Episode: 400/1000 Total reward: 95.0 iter : 24853 P : 0.017
Episode: 401/1000 Total reward: 83.0 iter : 24936 P : 0.017
Episode: 402/1000 Total reward: 93.0 iter : 25029 P : 0.017
Episode: 403/1000 Total reward: 99.0 iter : 25128 P : 0.017
Episode: 404/1000 Total reward: 84.0 iter : 25212 P : 0.016
Episode: 405/1000 Total reward: 78.0 iter : 25290 P : 0.016
Episode: 406/1000 Total reward: 74.0 iter : 25364 P : 0.016
Episode: 407/1000 Total reward: 88.0 iter : 25452 P : 0.016
Episode: 408/1000 Total reward: 109.0 iter : 25561 P : 0.016
Episode: 409/1000 Total reward: 88.0 iter : 25649 P : 0.016
Episode: 410/1000 Total reward: 90.0 iter : 25739 P : 0.016
Episode: 411/1000 Total reward: 74.0 iter : 25813 P : 0.016
Episode: 412/1000 Total reward: 90.0 iter : 25903 P : 0.016
Episode: 413/1000 Total reward: 76.0 iter : 25979 P : 0.015
Episode: 414/1000 Total reward: 78.0 iter : 26057 P : 0.015
Episode: 415/1000 Total reward: 84.0 iter : 26141 P : 0.015
Episode: 416/1000 Total reward: 100.0 iter : 26241 P : 0.015
Episode: 417/1000 Total reward: 98.0 iter : 26339 P : 0.015
Episode: 418/1000 Total reward: 119.0 iter : 26458 P : 0.015
Episode: 419/1000 Total reward: 96.0 iter : 26554 P : 0.015
Episode: 420/1000 Total reward: 74.0 iter : 26628 P : 0.015
Episode: 421/1000 Total reward: 106.0 iter : 26734 P : 0.015
Episode: 422/1000 Total reward: 86.0 iter : 26820 P : 0.015
Episode: 423/1000 Total reward: 127.0 iter : 26947 P : 0.015
Episode: 424/1000 Total reward: 96.0 iter : 27043 P : 0.014
Episode: 425/1000 Total reward: 72.0 iter : 27115 P : 0.014
Episode: 426/1000 Total reward: 78.0 iter : 27193 P : 0.014
Episode: 427/1000 Total reward: 65.0 iter : 27258 P : 0.014
Episode: 428/1000 Total reward: 66.0 iter : 27324 P : 0.014
Episode: 429/1000 Total reward: 84.0 iter : 27408 P : 0.014
Episode: 430/1000 Total reward: 92.0 iter : 27500 P : 0.014
Episode: 431/1000 Total reward: 106.0 iter : 27606 P : 0.014
Episode: 432/1000 Total reward: 151.0 iter : 27757 P : 0.014
Episode: 433/1000 Total reward: 106.0 iter : 27863 P : 0.014
Episode: 434/1000 Total reward: 76.0 iter : 27939 P : 0.014
Episode: 435/1000 Total reward: 187.0 iter : 28126 P : 0.014
Episode: 436/1000 Total reward: 158.0 iter : 28284 P : 0.013
Episode: 437/1000 Total reward: 99.0 iter : 28383 P : 0.013
Episode: 438/1000 Total reward: 92.0 iter : 28475 P : 0.013
Episode: 439/1000 Total reward: 112.0 iter : 28587 P : 0.013
Episode: 440/1000 Total reward: 118.0 iter : 28705 P : 0.013
Episode: 441/1000 Total reward: 137.0 iter : 28842 P : 0.013
Episode: 442/1000 Total reward: 136.0 iter : 28978 P : 0.013
Episode: 443/1000 Total reward: 200.0 iter : 29178 P : 0.013
Episode: 444/1000 Total reward: 173.0 iter : 29351 P : 0.013
Episode: 445/1000 Total reward: 200.0 iter : 29551 P : 0.013
Episode: 446/1000 Total reward: 162.0 iter : 29713 P : 0.013
Episode: 447/1000 Total reward: 140.0 iter : 29853 P : 0.013
Episode: 448/1000 Total reward: 200.0 iter : 30053 P : 0.012
Episode: 449/1000 Total reward: 162.0 iter : 30215 P : 0.012
Episode: 450/1000 Total reward: 153.0 iter : 30368 P : 0.012
Episode: 451/1000 Total reward: 115.0 iter : 30483 P : 0.012
Episode: 452/1000 Total reward: 131.0 iter : 30614 P : 0.012
Episode: 453/1000 Total reward: 66.0 iter : 30680 P : 0.012
Episode: 454/1000 Total reward: 82.0 iter : 30762 P : 0.012
Episode: 455/1000 Total reward: 65.0 iter : 30827 P : 0.012
Episode: 456/1000 Total reward: 77.0 iter : 30904 P : 0.012
Episode: 457/1000 Total reward: 33.0 iter : 30937 P : 0.012
Episode: 458/1000 Total reward: 17.0 iter : 30954 P : 0.012
Episode: 459/1000 Total reward: 56.0 iter : 31010 P : 0.012
Episode: 460/1000 Total reward: 160.0 iter : 31170 P : 0.012
Episode: 461/1000 Total reward: 157.0 iter : 31327 P : 0.012
Episode: 462/1000 Total reward: 147.0 iter : 31474 P : 0.012
Episode: 463/1000 Total reward: 156.0 iter : 31630 P : 0.012
Episode: 464/1000 Total reward: 160.0 iter : 31790 P : 0.012
Episode: 465/1000 Total reward: 175.0 iter : 31965 P : 0.012
Episode: 466/1000 Total reward: 173.0 iter : 32138 P : 0.012
Episode: 467/1000 Total reward: 76.0 iter : 32214 P : 0.012
Episode: 468/1000 Total reward: 65.0 iter : 32279 P : 0.012
Episode: 469/1000 Total reward: 69.0 iter : 32348 P : 0.012
Episode: 470/1000 Total reward: 83.0 iter : 32431 P : 0.012
Episode: 471/1000 Total reward: 67.0 iter : 32498 P : 0.011
Episode: 472/1000 Total reward: 105.0 iter : 32603 P : 0.011
Episode: 473/1000 Total reward: 106.0 iter : 32709 P : 0.011
Episode: 474/1000 Total reward: 122.0 iter : 32831 P : 0.011
Episode: 475/1000 Total reward: 200.0 iter : 33031 P : 0.011
Episode: 476/1000 Total reward: 200.0 iter : 33231 P : 0.011
Episode: 477/1000 Total reward: 200.0 iter : 33431 P : 0.011
Episode: 478/1000 Total reward: 200.0 iter : 33631 P : 0.011
Episode: 479/1000 Total reward: 200.0 iter : 33831 P : 0.011
Episode: 480/1000 Total reward: 200.0 iter : 34031 P : 0.011
Episode: 481/1000 Total reward: 200.0 iter : 34231 P : 0.011
Episode: 482/1000 Total reward: 200.0 iter : 34431 P : 0.011
Episode: 483/1000 Total reward: 200.0 iter : 34631 P : 0.011
Episode: 484/1000 Total reward: 200.0 iter : 34831 P : 0.011
Episode: 485/1000 Total reward: 200.0 iter : 35031 P : 0.011
Episode: 486/1000 Total reward: 200.0 iter : 35231 P : 0.011
Episode: 487/1000 Total reward: 200.0 iter : 35431 P : 0.011
Episode: 488/1000 Total reward: 200.0 iter : 35631 P : 0.011
Episode: 489/1000 Total reward: 200.0 iter : 35831 P : 0.011
Episode: 490/1000 Total reward: 200.0 iter : 36031 P : 0.011
Episode: 491/1000 Total reward: 200.0 iter : 36231 P : 0.011
Episode: 492/1000 Total reward: 200.0 iter : 36431 P : 0.011
Episode: 493/1000 Total reward: 200.0 iter : 36631 P : 0.011
Episode: 494/1000 Total reward: 200.0 iter : 36831 P : 0.011
Episode: 495/1000 Total reward: 200.0 iter : 37031 P : 0.011
Episode: 496/1000 Total reward: 200.0 iter : 37231 P : 0.011
Episode: 497/1000 Total reward: 200.0 iter : 37431 P : 0.011
Episode: 498/1000 Total reward: 200.0 iter : 37631 P : 0.011
Episode: 499/1000 Total reward: 200.0 iter : 37831 P : 0.011
Episode: 500/1000 Total reward: 200.0 iter : 38031 P : 0.010
Episode: 501/1000 Total reward: 200.0 iter : 38231 P : 0.010
Episode: 502/1000 Total reward: 200.0 iter : 38431 P : 0.010
Episode: 503/1000 Total reward: 200.0 iter : 38631 P : 0.010
Episode: 504/1000 Total reward: 200.0 iter : 38831 P : 0.010
Episode: 505/1000 Total reward: 200.0 iter : 39031 P : 0.010
Episode: 506/1000 Total reward: 200.0 iter : 39231 P : 0.010
Episode: 507/1000 Total reward: 200.0 iter : 39431 P : 0.010
Episode: 508/1000 Total reward: 200.0 iter : 39631 P : 0.010
Episode: 509/1000 Total reward: 200.0 iter : 39831 P : 0.010
Episode: 510/1000 Total reward: 200.0 iter : 40031 P : 0.010
Episode: 511/1000 Total reward: 200.0 iter : 40231 P : 0.010
Episode: 512/1000 Total reward: 200.0 iter : 40431 P : 0.010
Episode: 513/1000 Total reward: 200.0 iter : 40631 P : 0.010
Episode: 514/1000 Total reward: 200.0 iter : 40831 P : 0.010
Episode: 515/1000 Total reward: 200.0 iter : 41031 P : 0.010
Episode: 516/1000 Total reward: 200.0 iter : 41231 P : 0.010
Episode: 517/1000 Total reward: 173.0 iter : 41404 P : 0.010
Episode: 518/1000 Total reward: 181.0 iter : 41585 P : 0.010
Episode: 519/1000 Total reward: 188.0 iter : 41773 P : 0.010
Episode: 520/1000 Total reward: 158.0 iter : 41931 P : 0.010
Episode: 521/1000 Total reward: 165.0 iter : 42096 P : 0.010
Episode: 522/1000 Total reward: 158.0 iter : 42254 P : 0.010
Episode: 523/1000 Total reward: 158.0 iter : 42412 P : 0.010
Episode: 524/1000 Total reward: 159.0 iter : 42571 P : 0.010
Episode: 525/1000 Total reward: 163.0 iter : 42734 P : 0.010
Episode: 526/1000 Total reward: 163.0 iter : 42897 P : 0.010
Episode: 527/1000 Total reward: 179.0 iter : 43076 P : 0.010
Episode: 528/1000 Total reward: 182.0 iter : 43258 P : 0.010
Episode: 529/1000 Total reward: 183.0 iter : 43441 P : 0.010
Episode: 530/1000 Total reward: 200.0 iter : 43641 P : 0.010
Episode: 531/1000 Total reward: 200.0 iter : 43841 P : 0.010
Episode: 532/1000 Total reward: 200.0 iter : 44041 P : 0.010
Episode: 533/1000 Total reward: 200.0 iter : 44241 P : 0.010
Episode: 534/1000 Total reward: 200.0 iter : 44441 P : 0.010
Episode: 535/1000 Total reward: 194.0 iter : 44635 P : 0.010
Episode: 536/1000 Total reward: 8.0 iter : 44643 P : 0.010
Episode: 537/1000 Total reward: 10.0 iter : 44653 P : 0.010
Episode: 538/1000 Total reward: 9.0 iter : 44662 P : 0.010
Episode: 539/1000 Total reward: 9.0 iter : 44671 P : 0.010
Episode: 540/1000 Total reward: 10.0 iter : 44681 P : 0.010
Episode: 541/1000 Total reward: 10.0 iter : 44691 P : 0.010
Episode: 542/1000 Total reward: 10.0 iter : 44701 P : 0.010
Episode: 543/1000 Total reward: 10.0 iter : 44711 P : 0.010
Episode: 544/1000 Total reward: 9.0 iter : 44720 P : 0.010
Episode: 545/1000 Total reward: 9.0 iter : 44729 P : 0.010
Episode: 546/1000 Total reward: 8.0 iter : 44737 P : 0.010
Episode: 547/1000 Total reward: 10.0 iter : 44747 P : 0.010
Episode: 548/1000 Total reward: 10.0 iter : 44757 P : 0.010
Episode: 549/1000 Total reward: 9.0 iter : 44766 P : 0.010
Episode: 550/1000 Total reward: 10.0 iter : 44776 P : 0.010
Episode: 551/1000 Total reward: 10.0 iter : 44786 P : 0.010
Episode: 552/1000 Total reward: 10.0 iter : 44796 P : 0.010
Episode: 553/1000 Total reward: 10.0 iter : 44806 P : 0.010
Episode: 554/1000 Total reward: 9.0 iter : 44815 P : 0.010
Episode: 555/1000 Total reward: 10.0 iter : 44825 P : 0.010
Episode: 556/1000 Total reward: 9.0 iter : 44834 P : 0.010
Episode: 557/1000 Total reward: 10.0 iter : 44844 P : 0.010
Episode: 558/1000 Total reward: 171.0 iter : 45015 P : 0.010
Episode: 559/1000 Total reward: 109.0 iter : 45124 P : 0.010
Episode: 560/1000 Total reward: 100.0 iter : 45224 P : 0.010
Episode: 561/1000 Total reward: 105.0 iter : 45329 P : 0.010
Episode: 562/1000 Total reward: 110.0 iter : 45439 P : 0.010
Episode: 563/1000 Total reward: 115.0 iter : 45554 P : 0.010
Episode: 564/1000 Total reward: 112.0 iter : 45666 P : 0.010
Episode: 565/1000 Total reward: 123.0 iter : 45789 P : 0.010
Episode: 566/1000 Total reward: 125.0 iter : 45914 P : 0.010
Episode: 567/1000 Total reward: 123.0 iter : 46037 P : 0.010
Episode: 568/1000 Total reward: 132.0 iter : 46169 P : 0.010
Episode: 569/1000 Total reward: 144.0 iter : 46313 P : 0.010
Episode: 570/1000 Total reward: 168.0 iter : 46481 P : 0.010
Episode: 571/1000 Total reward: 188.0 iter : 46669 P : 0.010
Episode: 572/1000 Total reward: 200.0 iter : 46869 P : 0.010
Episode: 573/1000 Total reward: 200.0 iter : 47069 P : 0.010
Episode: 574/1000 Total reward: 200.0 iter : 47269 P : 0.010
Episode: 575/1000 Total reward: 200.0 iter : 47469 P : 0.010
Episode: 576/1000 Total reward: 200.0 iter : 47669 P : 0.010
Episode: 577/1000 Total reward: 200.0 iter : 47869 P : 0.010
Episode: 578/1000 Total reward: 200.0 iter : 48069 P : 0.010
Episode: 579/1000 Total reward: 200.0 iter : 48269 P : 0.010
Episode: 580/1000 Total reward: 177.0 iter : 48446 P : 0.010
Episode: 581/1000 Total reward: 198.0 iter : 48644 P : 0.010
Episode: 582/1000 Total reward: 152.0 iter : 48796 P : 0.010
Episode: 583/1000 Total reward: 157.0 iter : 48953 P : 0.010
Episode: 584/1000 Total reward: 138.0 iter : 49091 P : 0.010
Episode: 585/1000 Total reward: 139.0 iter : 49230 P : 0.010
Episode: 586/1000 Total reward: 145.0 iter : 49375 P : 0.010
Episode: 587/1000 Total reward: 126.0 iter : 49501 P : 0.010
Episode: 588/1000 Total reward: 140.0 iter : 49641 P : 0.010
Episode: 589/1000 Total reward: 134.0 iter : 49775 P : 0.010
Episode: 590/1000 Total reward: 167.0 iter : 49942 P : 0.010
Episode: 591/1000 Total reward: 183.0 iter : 50125 P : 0.010
Episode: 592/1000 Total reward: 124.0 iter : 50249 P : 0.010
Episode: 593/1000 Total reward: 137.0 iter : 50386 P : 0.010
Episode: 594/1000 Total reward: 126.0 iter : 50512 P : 0.010
Episode: 595/1000 Total reward: 121.0 iter : 50633 P : 0.010
Episode: 596/1000 Total reward: 163.0 iter : 50796 P : 0.010
Episode: 597/1000 Total reward: 200.0 iter : 50996 P : 0.010
Episode: 598/1000 Total reward: 183.0 iter : 51179 P : 0.010
Episode: 599/1000 Total reward: 122.0 iter : 51301 P : 0.010
Episode: 600/1000 Total reward: 112.0 iter : 51413 P : 0.010
Episode: 601/1000 Total reward: 116.0 iter : 51529 P : 0.010
Episode: 602/1000 Total reward: 119.0 iter : 51648 P : 0.010
Episode: 603/1000 Total reward: 122.0 iter : 51770 P : 0.010
Episode: 604/1000 Total reward: 128.0 iter : 51898 P : 0.010
Episode: 605/1000 Total reward: 124.0 iter : 52022 P : 0.010
Episode: 606/1000 Total reward: 121.0 iter : 52143 P : 0.010
Episode: 607/1000 Total reward: 131.0 iter : 52274 P : 0.010
Episode: 608/1000 Total reward: 155.0 iter : 52429 P : 0.010
Episode: 609/1000 Total reward: 141.0 iter : 52570 P : 0.010
Episode: 610/1000 Total reward: 129.0 iter : 52699 P : 0.010
Episode: 611/1000 Total reward: 116.0 iter : 52815 P : 0.010
Episode: 612/1000 Total reward: 119.0 iter : 52934 P : 0.010
Episode: 613/1000 Total reward: 115.0 iter : 53049 P : 0.010
Episode: 614/1000 Total reward: 130.0 iter : 53179 P : 0.010
Episode: 615/1000 Total reward: 123.0 iter : 53302 P : 0.010
Episode: 616/1000 Total reward: 126.0 iter : 53428 P : 0.010
Episode: 617/1000 Total reward: 119.0 iter : 53547 P : 0.010
Episode: 618/1000 Total reward: 118.0 iter : 53665 P : 0.010
Episode: 619/1000 Total reward: 137.0 iter : 53802 P : 0.010
Episode: 620/1000 Total reward: 121.0 iter : 53923 P : 0.010
Episode: 621/1000 Total reward: 121.0 iter : 54044 P : 0.010
Episode: 622/1000 Total reward: 127.0 iter : 54171 P : 0.010
Episode: 623/1000 Total reward: 138.0 iter : 54309 P : 0.010
Episode: 624/1000 Total reward: 124.0 iter : 54433 P : 0.010
Episode: 625/1000 Total reward: 138.0 iter : 54571 P : 0.010
Episode: 626/1000 Total reward: 117.0 iter : 54688 P : 0.010
Episode: 627/1000 Total reward: 152.0 iter : 54840 P : 0.010
Episode: 628/1000 Total reward: 200.0 iter : 55040 P : 0.010
Episode: 629/1000 Total reward: 200.0 iter : 55240 P : 0.010
Episode: 630/1000 Total reward: 200.0 iter : 55440 P : 0.010
Episode: 631/1000 Total reward: 200.0 iter : 55640 P : 0.010
Episode: 632/1000 Total reward: 200.0 iter : 55840 P : 0.010
Episode: 633/1000 Total reward: 200.0 iter : 56040 P : 0.010
Episode: 634/1000 Total reward: 39.0 iter : 56079 P : 0.010
Episode: 635/1000 Total reward: 21.0 iter : 56100 P : 0.010
Episode: 636/1000 Total reward: 9.0 iter : 56109 P : 0.010
Episode: 637/1000 Total reward: 19.0 iter : 56128 P : 0.010
Episode: 638/1000 Total reward: 22.0 iter : 56150 P : 0.010
Episode: 639/1000 Total reward: 10.0 iter : 56160 P : 0.010
Episode: 640/1000 Total reward: 9.0 iter : 56169 P : 0.010
Episode: 641/1000 Total reward: 191.0 iter : 56360 P : 0.010
Episode: 642/1000 Total reward: 103.0 iter : 56463 P : 0.010
Episode: 643/1000 Total reward: 98.0 iter : 56561 P : 0.010
Episode: 644/1000 Total reward: 27.0 iter : 56588 P : 0.010
Episode: 645/1000 Total reward: 91.0 iter : 56679 P : 0.010
Episode: 646/1000 Total reward: 21.0 iter : 56700 P : 0.010
Episode: 647/1000 Total reward: 36.0 iter : 56736 P : 0.010
Episode: 648/1000 Total reward: 95.0 iter : 56831 P : 0.010
Episode: 649/1000 Total reward: 102.0 iter : 56933 P : 0.010
Episode: 650/1000 Total reward: 200.0 iter : 57133 P : 0.010
Episode: 651/1000 Total reward: 200.0 iter : 57333 P : 0.010
Episode: 652/1000 Total reward: 200.0 iter : 57533 P : 0.010
Episode: 653/1000 Total reward: 200.0 iter : 57733 P : 0.010
Episode: 654/1000 Total reward: 200.0 iter : 57933 P : 0.010
Episode: 655/1000 Total reward: 200.0 iter : 58133 P : 0.010
Episode: 656/1000 Total reward: 200.0 iter : 58333 P : 0.010
Episode: 657/1000 Total reward: 200.0 iter : 58533 P : 0.010
Episode: 658/1000 Total reward: 159.0 iter : 58692 P : 0.010
Episode: 659/1000 Total reward: 139.0 iter : 58831 P : 0.010
Episode: 660/1000 Total reward: 200.0 iter : 59031 P : 0.010
Episode: 661/1000 Total reward: 200.0 iter : 59231 P : 0.010
Episode: 662/1000 Total reward: 149.0 iter : 59380 P : 0.010
Episode: 663/1000 Total reward: 119.0 iter : 59499 P : 0.010
Episode: 664/1000 Total reward: 106.0 iter : 59605 P : 0.010
Episode: 665/1000 Total reward: 200.0 iter : 59805 P : 0.010
Episode: 666/1000 Total reward: 200.0 iter : 60005 P : 0.010
Episode: 667/1000 Total reward: 188.0 iter : 60193 P : 0.010
Episode: 668/1000 Total reward: 173.0 iter : 60366 P : 0.010
Episode: 669/1000 Total reward: 200.0 iter : 60566 P : 0.010
Episode: 670/1000 Total reward: 121.0 iter : 60687 P : 0.010
Episode: 671/1000 Total reward: 22.0 iter : 60709 P : 0.010
Episode: 672/1000 Total reward: 169.0 iter : 60878 P : 0.010
Episode: 673/1000 Total reward: 9.0 iter : 60887 P : 0.010
Episode: 674/1000 Total reward: 8.0 iter : 60895 P : 0.010
Episode: 675/1000 Total reward: 30.0 iter : 60925 P : 0.010
Episode: 676/1000 Total reward: 200.0 iter : 61125 P : 0.010
Episode: 677/1000 Total reward: 131.0 iter : 61256 P : 0.010
Episode: 678/1000 Total reward: 108.0 iter : 61364 P : 0.010
Episode: 679/1000 Total reward: 108.0 iter : 61472 P : 0.010
Episode: 680/1000 Total reward: 106.0 iter : 61578 P : 0.010
Episode: 681/1000 Total reward: 99.0 iter : 61677 P : 0.010
Episode: 682/1000 Total reward: 28.0 iter : 61705 P : 0.010
Episode: 683/1000 Total reward: 104.0 iter : 61809 P : 0.010
Episode: 684/1000 Total reward: 106.0 iter : 61915 P : 0.010
Episode: 685/1000 Total reward: 99.0 iter : 62014 P : 0.010
Episode: 686/1000 Total reward: 109.0 iter : 62123 P : 0.010
Episode: 687/1000 Total reward: 169.0 iter : 62292 P : 0.010
Episode: 688/1000 Total reward: 200.0 iter : 62492 P : 0.010
Episode: 689/1000 Total reward: 104.0 iter : 62596 P : 0.010
Episode: 690/1000 Total reward: 200.0 iter : 62796 P : 0.010
Episode: 691/1000 Total reward: 200.0 iter : 62996 P : 0.010
Episode: 692/1000 Total reward: 200.0 iter : 63196 P : 0.010
Episode: 693/1000 Total reward: 126.0 iter : 63322 P : 0.010
Episode: 694/1000 Total reward: 200.0 iter : 63522 P : 0.010
Episode: 695/1000 Total reward: 200.0 iter : 63722 P : 0.010
Episode: 696/1000 Total reward: 200.0 iter : 63922 P : 0.010
Episode: 697/1000 Total reward: 122.0 iter : 64044 P : 0.010
Episode: 698/1000 Total reward: 112.0 iter : 64156 P : 0.010
Episode: 699/1000 Total reward: 177.0 iter : 64333 P : 0.010
Episode: 700/1000 Total reward: 200.0 iter : 64533 P : 0.010
Episode: 701/1000 Total reward: 117.0 iter : 64650 P : 0.010
Episode: 702/1000 Total reward: 200.0 iter : 64850 P : 0.010
Episode: 703/1000 Total reward: 138.0 iter : 64988 P : 0.010
Episode: 704/1000 Total reward: 102.0 iter : 65090 P : 0.010
Episode: 705/1000 Total reward: 200.0 iter : 65290 P : 0.010
Episode: 706/1000 Total reward: 200.0 iter : 65490 P : 0.010
Episode: 707/1000 Total reward: 177.0 iter : 65667 P : 0.010
Episode: 708/1000 Total reward: 200.0 iter : 65867 P : 0.010
Episode: 709/1000 Total reward: 128.0 iter : 65995 P : 0.010
Episode: 710/1000 Total reward: 200.0 iter : 66195 P : 0.010
Episode: 711/1000 Total reward: 200.0 iter : 66395 P : 0.010
Episode: 712/1000 Total reward: 158.0 iter : 66553 P : 0.010
Episode: 713/1000 Total reward: 101.0 iter : 66654 P : 0.010
Episode: 714/1000 Total reward: 200.0 iter : 66854 P : 0.010
Episode: 715/1000 Total reward: 200.0 iter : 67054 P : 0.010
Episode: 716/1000 Total reward: 102.0 iter : 67156 P : 0.010
Episode: 717/1000 Total reward: 200.0 iter : 67356 P : 0.010
Episode: 718/1000 Total reward: 121.0 iter : 67477 P : 0.010
Episode: 719/1000 Total reward: 200.0 iter : 67677 P : 0.010
Episode: 720/1000 Total reward: 200.0 iter : 67877 P : 0.010
Episode: 721/1000 Total reward: 104.0 iter : 67981 P : 0.010
Episode: 722/1000 Total reward: 200.0 iter : 68181 P : 0.010
Episode: 723/1000 Total reward: 200.0 iter : 68381 P : 0.010
Episode: 724/1000 Total reward: 200.0 iter : 68581 P : 0.010
Episode: 725/1000 Total reward: 90.0 iter : 68671 P : 0.010
Episode: 726/1000 Total reward: 140.0 iter : 68811 P : 0.010
Episode: 727/1000 Total reward: 200.0 iter : 69011 P : 0.010
Episode: 728/1000 Total reward: 130.0 iter : 69141 P : 0.010
Episode: 729/1000 Total reward: 111.0 iter : 69252 P : 0.010
Episode: 730/1000 Total reward: 200.0 iter : 69452 P : 0.010
Episode: 731/1000 Total reward: 200.0 iter : 69652 P : 0.010
Episode: 732/1000 Total reward: 200.0 iter : 69852 P : 0.010
Episode: 733/1000 Total reward: 200.0 iter : 70052 P : 0.010
Episode: 734/1000 Total reward: 165.0 iter : 70217 P : 0.010
Episode: 735/1000 Total reward: 105.0 iter : 70322 P : 0.010
Episode: 736/1000 Total reward: 200.0 iter : 70522 P : 0.010
Episode: 737/1000 Total reward: 200.0 iter : 70722 P : 0.010
Episode: 738/1000 Total reward: 200.0 iter : 70922 P : 0.010
Episode: 739/1000 Total reward: 183.0 iter : 71105 P : 0.010
Episode: 740/1000 Total reward: 176.0 iter : 71281 P : 0.010
Episode: 741/1000 Total reward: 13.0 iter : 71294 P : 0.010
Episode: 742/1000 Total reward: 8.0 iter : 71302 P : 0.010
Episode: 743/1000 Total reward: 9.0 iter : 71311 P : 0.010
Episode: 744/1000 Total reward: 8.0 iter : 71319 P : 0.010
Episode: 745/1000 Total reward: 10.0 iter : 71329 P : 0.010
Episode: 746/1000 Total reward: 9.0 iter : 71338 P : 0.010
Episode: 747/1000 Total reward: 10.0 iter : 71348 P : 0.010
Episode: 748/1000 Total reward: 8.0 iter : 71356 P : 0.010
Episode: 749/1000 Total reward: 10.0 iter : 71366 P : 0.010
Episode: 750/1000 Total reward: 10.0 iter : 71376 P : 0.010
Episode: 751/1000 Total reward: 9.0 iter : 71385 P : 0.010
Episode: 752/1000 Total reward: 9.0 iter : 71394 P : 0.010
Episode: 753/1000 Total reward: 10.0 iter : 71404 P : 0.010
Episode: 754/1000 Total reward: 200.0 iter : 71604 P : 0.010
Episode: 755/1000 Total reward: 200.0 iter : 71804 P : 0.010
Episode: 756/1000 Total reward: 200.0 iter : 72004 P : 0.010
Episode: 757/1000 Total reward: 200.0 iter : 72204 P : 0.010
Episode: 758/1000 Total reward: 181.0 iter : 72385 P : 0.010
Episode: 759/1000 Total reward: 129.0 iter : 72514 P : 0.010
Episode: 760/1000 Total reward: 125.0 iter : 72639 P : 0.010
Episode: 761/1000 Total reward: 113.0 iter : 72752 P : 0.010
Episode: 762/1000 Total reward: 200.0 iter : 72952 P : 0.010
Episode: 763/1000 Total reward: 200.0 iter : 73152 P : 0.010
Episode: 764/1000 Total reward: 200.0 iter : 73352 P : 0.010
Episode: 765/1000 Total reward: 130.0 iter : 73482 P : 0.010
Episode: 766/1000 Total reward: 184.0 iter : 73666 P : 0.010
Episode: 767/1000 Total reward: 30.0 iter : 73696 P : 0.010
Episode: 768/1000 Total reward: 200.0 iter : 73896 P : 0.010
Episode: 769/1000 Total reward: 81.0 iter : 73977 P : 0.010
Episode: 770/1000 Total reward: 115.0 iter : 74092 P : 0.010
Episode: 771/1000 Total reward: 200.0 iter : 74292 P : 0.010
Episode: 772/1000 Total reward: 200.0 iter : 74492 P : 0.010
Episode: 773/1000 Total reward: 200.0 iter : 74692 P : 0.010
Episode: 774/1000 Total reward: 200.0 iter : 74892 P : 0.010
Episode: 775/1000 Total reward: 200.0 iter : 75092 P : 0.010
Episode: 776/1000 Total reward: 113.0 iter : 75205 P : 0.010
Episode: 777/1000 Total reward: 148.0 iter : 75353 P : 0.010
Episode: 778/1000 Total reward: 200.0 iter : 75553 P : 0.010
Episode: 779/1000 Total reward: 200.0 iter : 75753 P : 0.010
Episode: 780/1000 Total reward: 9.0 iter : 75762 P : 0.010
Episode: 781/1000 Total reward: 15.0 iter : 75777 P : 0.010
Episode: 782/1000 Total reward: 10.0 iter : 75787 P : 0.010
Episode: 783/1000 Total reward: 10.0 iter : 75797 P : 0.010
Episode: 784/1000 Total reward: 10.0 iter : 75807 P : 0.010
Episode: 785/1000 Total reward: 11.0 iter : 75818 P : 0.010
Episode: 786/1000 Total reward: 9.0 iter : 75827 P : 0.010
Episode: 787/1000 Total reward: 9.0 iter : 75836 P : 0.010
Episode: 788/1000 Total reward: 9.0 iter : 75845 P : 0.010
Episode: 789/1000 Total reward: 10.0 iter : 75855 P : 0.010
Episode: 790/1000 Total reward: 11.0 iter : 75866 P : 0.010
Episode: 791/1000 Total reward: 9.0 iter : 75875 P : 0.010
Episode: 792/1000 Total reward: 15.0 iter : 75890 P : 0.010
Episode: 793/1000 Total reward: 200.0 iter : 76090 P : 0.010
Episode: 794/1000 Total reward: 171.0 iter : 76261 P : 0.010
Episode: 795/1000 Total reward: 184.0 iter : 76445 P : 0.010
Episode: 796/1000 Total reward: 169.0 iter : 76614 P : 0.010
Episode: 797/1000 Total reward: 200.0 iter : 76814 P : 0.010
Episode: 798/1000 Total reward: 192.0 iter : 77006 P : 0.010
Episode: 799/1000 Total reward: 200.0 iter : 77206 P : 0.010
Episode: 800/1000 Total reward: 200.0 iter : 77406 P : 0.010
Episode: 801/1000 Total reward: 185.0 iter : 77591 P : 0.010
Episode: 802/1000 Total reward: 136.0 iter : 77727 P : 0.010
Episode: 803/1000 Total reward: 142.0 iter : 77869 P : 0.010
Episode: 804/1000 Total reward: 114.0 iter : 77983 P : 0.010
Episode: 805/1000 Total reward: 125.0 iter : 78108 P : 0.010
Episode: 806/1000 Total reward: 137.0 iter : 78245 P : 0.010
Episode: 807/1000 Total reward: 94.0 iter : 78339 P : 0.010
Episode: 808/1000 Total reward: 169.0 iter : 78508 P : 0.010
Episode: 809/1000 Total reward: 138.0 iter : 78646 P : 0.010
Episode: 810/1000 Total reward: 117.0 iter : 78763 P : 0.010
Episode: 811/1000 Total reward: 173.0 iter : 78936 P : 0.010
Episode: 812/1000 Total reward: 165.0 iter : 79101 P : 0.010
Episode: 813/1000 Total reward: 151.0 iter : 79252 P : 0.010
Episode: 814/1000 Total reward: 109.0 iter : 79361 P : 0.010
Episode: 815/1000 Total reward: 123.0 iter : 79484 P : 0.010
Episode: 816/1000 Total reward: 198.0 iter : 79682 P : 0.010
Episode: 817/1000 Total reward: 88.0 iter : 79770 P : 0.010
Episode: 818/1000 Total reward: 28.0 iter : 79798 P : 0.010
Episode: 819/1000 Total reward: 200.0 iter : 79998 P : 0.010
Episode: 820/1000 Total reward: 200.0 iter : 80198 P : 0.010
Episode: 821/1000 Total reward: 200.0 iter : 80398 P : 0.010
Episode: 822/1000 Total reward: 200.0 iter : 80598 P : 0.010
Episode: 823/1000 Total reward: 200.0 iter : 80798 P : 0.010
Episode: 824/1000 Total reward: 200.0 iter : 80998 P : 0.010
Episode: 825/1000 Total reward: 200.0 iter : 81198 P : 0.010
Episode: 826/1000 Total reward: 144.0 iter : 81342 P : 0.010
Episode: 827/1000 Total reward: 112.0 iter : 81454 P : 0.010
Episode: 828/1000 Total reward: 21.0 iter : 81475 P : 0.010
Episode: 829/1000 Total reward: 73.0 iter : 81548 P : 0.010
Episode: 830/1000 Total reward: 10.0 iter : 81558 P : 0.010
Episode: 831/1000 Total reward: 104.0 iter : 81662 P : 0.010
Episode: 832/1000 Total reward: 200.0 iter : 81862 P : 0.010
Episode: 833/1000 Total reward: 200.0 iter : 82062 P : 0.010
Episode: 834/1000 Total reward: 200.0 iter : 82262 P : 0.010
Episode: 835/1000 Total reward: 200.0 iter : 82462 P : 0.010
Episode: 836/1000 Total reward: 200.0 iter : 82662 P : 0.010
Episode: 837/1000 Total reward: 200.0 iter : 82862 P : 0.010
Episode: 838/1000 Total reward: 200.0 iter : 83062 P : 0.010
Episode: 839/1000 Total reward: 200.0 iter : 83262 P : 0.010
Episode: 840/1000 Total reward: 200.0 iter : 83462 P : 0.010
Episode: 841/1000 Total reward: 24.0 iter : 83486 P : 0.010
Episode: 842/1000 Total reward: 11.0 iter : 83497 P : 0.010
Episode: 843/1000 Total reward: 15.0 iter : 83512 P : 0.010
Episode: 844/1000 Total reward: 10.0 iter : 83522 P : 0.010
Episode: 845/1000 Total reward: 9.0 iter : 83531 P : 0.010
Episode: 846/1000 Total reward: 200.0 iter : 83731 P : 0.010
Episode: 847/1000 Total reward: 200.0 iter : 83931 P : 0.010
Episode: 848/1000 Total reward: 200.0 iter : 84131 P : 0.010
Episode: 849/1000 Total reward: 200.0 iter : 84331 P : 0.010
Episode: 850/1000 Total reward: 200.0 iter : 84531 P : 0.010
Episode: 851/1000 Total reward: 200.0 iter : 84731 P : 0.010
Episode: 852/1000 Total reward: 200.0 iter : 84931 P : 0.010
Episode: 853/1000 Total reward: 200.0 iter : 85131 P : 0.010
Episode: 854/1000 Total reward: 200.0 iter : 85331 P : 0.010
Episode: 855/1000 Total reward: 200.0 iter : 85531 P : 0.010
Episode: 856/1000 Total reward: 88.0 iter : 85619 P : 0.010
Episode: 857/1000 Total reward: 12.0 iter : 85631 P : 0.010
Episode: 858/1000 Total reward: 32.0 iter : 85663 P : 0.010
Episode: 859/1000 Total reward: 19.0 iter : 85682 P : 0.010
Episode: 860/1000 Total reward: 14.0 iter : 85696 P : 0.010
Episode: 861/1000 Total reward: 109.0 iter : 85805 P : 0.010
Episode: 862/1000 Total reward: 97.0 iter : 85902 P : 0.010
Episode: 863/1000 Total reward: 200.0 iter : 86102 P : 0.010
Episode: 864/1000 Total reward: 200.0 iter : 86302 P : 0.010
Episode: 865/1000 Total reward: 200.0 iter : 86502 P : 0.010
Episode: 866/1000 Total reward: 164.0 iter : 86666 P : 0.010
Episode: 867/1000 Total reward: 19.0 iter : 86685 P : 0.010
Episode: 868/1000 Total reward: 9.0 iter : 86694 P : 0.010
Episode: 869/1000 Total reward: 10.0 iter : 86704 P : 0.010
Episode: 870/1000 Total reward: 10.0 iter : 86714 P : 0.010
Episode: 871/1000 Total reward: 10.0 iter : 86724 P : 0.010
Episode: 872/1000 Total reward: 10.0 iter : 86734 P : 0.010
Episode: 873/1000 Total reward: 10.0 iter : 86744 P : 0.010
Episode: 874/1000 Total reward: 10.0 iter : 86754 P : 0.010
Episode: 875/1000 Total reward: 10.0 iter : 86764 P : 0.010
Episode: 876/1000 Total reward: 8.0 iter : 86772 P : 0.010
Episode: 877/1000 Total reward: 9.0 iter : 86781 P : 0.010
Episode: 878/1000 Total reward: 9.0 iter : 86790 P : 0.010
Episode: 879/1000 Total reward: 9.0 iter : 86799 P : 0.010
Episode: 880/1000 Total reward: 10.0 iter : 86809 P : 0.010
Episode: 881/1000 Total reward: 9.0 iter : 86818 P : 0.010
Episode: 882/1000 Total reward: 9.0 iter : 86827 P : 0.010
Episode: 883/1000 Total reward: 9.0 iter : 86836 P : 0.010
Episode: 884/1000 Total reward: 9.0 iter : 86845 P : 0.010
Episode: 885/1000 Total reward: 9.0 iter : 86854 P : 0.010
Episode: 886/1000 Total reward: 9.0 iter : 86863 P : 0.010
Episode: 887/1000 Total reward: 9.0 iter : 86872 P : 0.010
Episode: 888/1000 Total reward: 200.0 iter : 87072 P : 0.010
Episode: 889/1000 Total reward: 200.0 iter : 87272 P : 0.010
Episode: 890/1000 Total reward: 200.0 iter : 87472 P : 0.010
Episode: 891/1000 Total reward: 200.0 iter : 87672 P : 0.010
Episode: 892/1000 Total reward: 200.0 iter : 87872 P : 0.010
Episode: 893/1000 Total reward: 200.0 iter : 88072 P : 0.010
Episode: 894/1000 Total reward: 200.0 iter : 88272 P : 0.010
Episode: 895/1000 Total reward: 200.0 iter : 88472 P : 0.010
Episode: 896/1000 Total reward: 200.0 iter : 88672 P : 0.010
Episode: 897/1000 Total reward: 200.0 iter : 88872 P : 0.010
Episode: 898/1000 Total reward: 200.0 iter : 89072 P : 0.010
Episode: 899/1000 Total reward: 200.0 iter : 89272 P : 0.010
Episode: 900/1000 Total reward: 200.0 iter : 89472 P : 0.010
Episode: 901/1000 Total reward: 200.0 iter : 89672 P : 0.010
Episode: 902/1000 Total reward: 200.0 iter : 89872 P : 0.010
Episode: 903/1000 Total reward: 200.0 iter : 90072 P : 0.010
Episode: 904/1000 Total reward: 200.0 iter : 90272 P : 0.010
Episode: 905/1000 Total reward: 200.0 iter : 90472 P : 0.010
Episode: 906/1000 Total reward: 200.0 iter : 90672 P : 0.010
Episode: 907/1000 Total reward: 200.0 iter : 90872 P : 0.010
Episode: 908/1000 Total reward: 200.0 iter : 91072 P : 0.010
Episode: 909/1000 Total reward: 200.0 iter : 91272 P : 0.010
Episode: 910/1000 Total reward: 200.0 iter : 91472 P : 0.010
Episode: 911/1000 Total reward: 200.0 iter : 91672 P : 0.010
Episode: 912/1000 Total reward: 200.0 iter : 91872 P : 0.010
Episode: 913/1000 Total reward: 200.0 iter : 92072 P : 0.010
Episode: 914/1000 Total reward: 162.0 iter : 92234 P : 0.010
Episode: 915/1000 Total reward: 117.0 iter : 92351 P : 0.010
Episode: 916/1000 Total reward: 123.0 iter : 92474 P : 0.010
Episode: 917/1000 Total reward: 109.0 iter : 92583 P : 0.010
Episode: 918/1000 Total reward: 99.0 iter : 92682 P : 0.010
Episode: 919/1000 Total reward: 98.0 iter : 92780 P : 0.010
Episode: 920/1000 Total reward: 109.0 iter : 92889 P : 0.010
Episode: 921/1000 Total reward: 111.0 iter : 93000 P : 0.010
Episode: 922/1000 Total reward: 105.0 iter : 93105 P : 0.010
Episode: 923/1000 Total reward: 117.0 iter : 93222 P : 0.010
Episode: 924/1000 Total reward: 138.0 iter : 93360 P : 0.010
Episode: 925/1000 Total reward: 200.0 iter : 93560 P : 0.010
Episode: 926/1000 Total reward: 200.0 iter : 93760 P : 0.010
Episode: 927/1000 Total reward: 200.0 iter : 93960 P : 0.010
Episode: 928/1000 Total reward: 200.0 iter : 94160 P : 0.010
Episode: 929/1000 Total reward: 200.0 iter : 94360 P : 0.010
Episode: 930/1000 Total reward: 200.0 iter : 94560 P : 0.010
Episode: 931/1000 Total reward: 200.0 iter : 94760 P : 0.010
Episode: 932/1000 Total reward: 200.0 iter : 94960 P : 0.010
Episode: 933/1000 Total reward: 200.0 iter : 95160 P : 0.010
Episode: 934/1000 Total reward: 200.0 iter : 95360 P : 0.010
Episode: 935/1000 Total reward: 200.0 iter : 95560 P : 0.010
Episode: 936/1000 Total reward: 200.0 iter : 95760 P : 0.010
Episode: 937/1000 Total reward: 200.0 iter : 95960 P : 0.010
Episode: 938/1000 Total reward: 200.0 iter : 96160 P : 0.010
Episode: 939/1000 Total reward: 200.0 iter : 96360 P : 0.010
Episode: 940/1000 Total reward: 200.0 iter : 96560 P : 0.010
Episode: 941/1000 Total reward: 193.0 iter : 96753 P : 0.010
Episode: 942/1000 Total reward: 191.0 iter : 96944 P : 0.010
Episode: 943/1000 Total reward: 92.0 iter : 97036 P : 0.010
Episode: 944/1000 Total reward: 152.0 iter : 97188 P : 0.010
Episode: 945/1000 Total reward: 181.0 iter : 97369 P : 0.010
Episode: 946/1000 Total reward: 200.0 iter : 97569 P : 0.010
Episode: 947/1000 Total reward: 119.0 iter : 97688 P : 0.010
Episode: 948/1000 Total reward: 200.0 iter : 97888 P : 0.010
Episode: 949/1000 Total reward: 200.0 iter : 98088 P : 0.010
Episode: 950/1000 Total reward: 200.0 iter : 98288 P : 0.010
Episode: 951/1000 Total reward: 132.0 iter : 98420 P : 0.010
Episode: 952/1000 Total reward: 113.0 iter : 98533 P : 0.010
Episode: 953/1000 Total reward: 104.0 iter : 98637 P : 0.010
Episode: 954/1000 Total reward: 101.0 iter : 98738 P : 0.010
Episode: 955/1000 Total reward: 24.0 iter : 98762 P : 0.010
Episode: 956/1000 Total reward: 22.0 iter : 98784 P : 0.010
Episode: 957/1000 Total reward: 23.0 iter : 98807 P : 0.010
Episode: 958/1000 Total reward: 23.0 iter : 98830 P : 0.010
Episode: 959/1000 Total reward: 25.0 iter : 98855 P : 0.010
Episode: 960/1000 Total reward: 23.0 iter : 98878 P : 0.010
Episode: 961/1000 Total reward: 22.0 iter : 98900 P : 0.010
Episode: 962/1000 Total reward: 20.0 iter : 98920 P : 0.010
Episode: 963/1000 Total reward: 24.0 iter : 98944 P : 0.010
Episode: 964/1000 Total reward: 36.0 iter : 98980 P : 0.010
Episode: 965/1000 Total reward: 25.0 iter : 99005 P : 0.010
Episode: 966/1000 Total reward: 35.0 iter : 99040 P : 0.010
Episode: 967/1000 Total reward: 31.0 iter : 99071 P : 0.010
Episode: 968/1000 Total reward: 107.0 iter : 99178 P : 0.010
Episode: 969/1000 Total reward: 185.0 iter : 99363 P : 0.010
Episode: 970/1000 Total reward: 124.0 iter : 99487 P : 0.010
Episode: 971/1000 Total reward: 200.0 iter : 99687 P : 0.010
Episode: 972/1000 Total reward: 142.0 iter : 99829 P : 0.010
Episode: 973/1000 Total reward: 200.0 iter : 100029 P : 0.010
Episode: 974/1000 Total reward: 160.0 iter : 100189 P : 0.010
Episode: 975/1000 Total reward: 200.0 iter : 100389 P : 0.010
Episode: 976/1000 Total reward: 200.0 iter : 100589 P : 0.010
Episode: 977/1000 Total reward: 200.0 iter : 100789 P : 0.010
Episode: 978/1000 Total reward: 200.0 iter : 100989 P : 0.010
Episode: 979/1000 Total reward: 200.0 iter : 101189 P : 0.010
Episode: 980/1000 Total reward: 200.0 iter : 101389 P : 0.010
Episode: 981/1000 Total reward: 200.0 iter : 101589 P : 0.010
Episode: 982/1000 Total reward: 200.0 iter : 101789 P : 0.010
Episode: 983/1000 Total reward: 200.0 iter : 101989 P : 0.010
Episode: 984/1000 Total reward: 200.0 iter : 102189 P : 0.010
Episode: 985/1000 Total reward: 200.0 iter : 102389 P : 0.010
Episode: 986/1000 Total reward: 200.0 iter : 102589 P : 0.010
Episode: 987/1000 Total reward: 200.0 iter : 102789 P : 0.010
Episode: 988/1000 Total reward: 200.0 iter : 102989 P : 0.010
Episode: 989/1000 Total reward: 200.0 iter : 103189 P : 0.010
Episode: 990/1000 Total reward: 200.0 iter : 103389 P : 0.010
Episode: 991/1000 Total reward: 200.0 iter : 103589 P : 0.010
Episode: 992/1000 Total reward: 200.0 iter : 103789 P : 0.010
Episode: 993/1000 Total reward: 200.0 iter : 103989 P : 0.010
Episode: 994/1000 Total reward: 200.0 iter : 104189 P : 0.010
Episode: 995/1000 Total reward: 200.0 iter : 104389 P : 0.010
Episode: 996/1000 Total reward: 200.0 iter : 104589 P : 0.010
Episode: 997/1000 Total reward: 200.0 iter : 104789 P : 0.010
Episode: 998/1000 Total reward: 200.0 iter : 104989 P : 0.010
Episode: 999/1000 Total reward: 200.0 iter : 105189 P : 0.010
In [7]:
DQN = QNetwork(learning_rate = 0.001, use_replay_memory=False, reshape_reward=False)
DQN.train(train_episodes_ovr=1000)
DQN.save_stats("DQN_without_memory_simple_reward.p")
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 16)                80        
_________________________________________________________________
dense_2 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 34        
=================================================================
Total params: 386
Trainable params: 386
Non-trainable params: 0
_________________________________________________________________
Episode: 0/1000 Total reward: 32.0 iter : 32 P : 0.994
Episode: 1/1000 Total reward: 13.0 iter : 45 P : 0.991
Episode: 2/1000 Total reward: 12.0 iter : 57 P : 0.989
Episode: 3/1000 Total reward: 19.0 iter : 76 P : 0.985
Episode: 4/1000 Total reward: 27.0 iter : 103 P : 0.980
Episode: 5/1000 Total reward: 10.0 iter : 113 P : 0.978
Episode: 6/1000 Total reward: 31.0 iter : 144 P : 0.972
Episode: 7/1000 Total reward: 24.0 iter : 168 P : 0.967
Episode: 8/1000 Total reward: 16.0 iter : 184 P : 0.964
Episode: 9/1000 Total reward: 15.0 iter : 199 P : 0.961
Episode: 10/1000 Total reward: 38.0 iter : 237 P : 0.954
Episode: 11/1000 Total reward: 24.0 iter : 261 P : 0.950
Episode: 12/1000 Total reward: 14.0 iter : 275 P : 0.947
Episode: 13/1000 Total reward: 13.0 iter : 288 P : 0.945
Episode: 14/1000 Total reward: 21.0 iter : 309 P : 0.941
Episode: 15/1000 Total reward: 15.0 iter : 324 P : 0.938
Episode: 16/1000 Total reward: 16.0 iter : 340 P : 0.935
Episode: 17/1000 Total reward: 14.0 iter : 354 P : 0.932
Episode: 18/1000 Total reward: 62.0 iter : 416 P : 0.921
Episode: 19/1000 Total reward: 17.0 iter : 433 P : 0.918
Episode: 20/1000 Total reward: 34.0 iter : 467 P : 0.912
Episode: 21/1000 Total reward: 16.0 iter : 483 P : 0.909
Episode: 22/1000 Total reward: 14.0 iter : 497 P : 0.906
Episode: 23/1000 Total reward: 38.0 iter : 535 P : 0.900
Episode: 24/1000 Total reward: 34.0 iter : 569 P : 0.894
Episode: 25/1000 Total reward: 29.0 iter : 598 P : 0.888
Episode: 26/1000 Total reward: 15.0 iter : 613 P : 0.886
Episode: 27/1000 Total reward: 15.0 iter : 628 P : 0.883
Episode: 28/1000 Total reward: 18.0 iter : 646 P : 0.880
Episode: 29/1000 Total reward: 9.0 iter : 655 P : 0.878
Episode: 30/1000 Total reward: 15.0 iter : 670 P : 0.876
Episode: 31/1000 Total reward: 42.0 iter : 712 P : 0.869
Episode: 32/1000 Total reward: 18.0 iter : 730 P : 0.866
Episode: 33/1000 Total reward: 14.0 iter : 744 P : 0.863
Episode: 34/1000 Total reward: 37.0 iter : 781 P : 0.857
Episode: 35/1000 Total reward: 21.0 iter : 802 P : 0.853
Episode: 36/1000 Total reward: 14.0 iter : 816 P : 0.851
Episode: 37/1000 Total reward: 31.0 iter : 847 P : 0.846
Episode: 38/1000 Total reward: 29.0 iter : 876 P : 0.841
Episode: 39/1000 Total reward: 21.0 iter : 897 P : 0.837
Episode: 40/1000 Total reward: 22.0 iter : 919 P : 0.834
Episode: 41/1000 Total reward: 13.0 iter : 932 P : 0.832
Episode: 42/1000 Total reward: 23.0 iter : 955 P : 0.828
Episode: 43/1000 Total reward: 18.0 iter : 973 P : 0.825
Episode: 44/1000 Total reward: 58.0 iter : 1031 P : 0.816
Episode: 45/1000 Total reward: 22.0 iter : 1053 P : 0.812
Episode: 46/1000 Total reward: 38.0 iter : 1091 P : 0.806
Episode: 47/1000 Total reward: 39.0 iter : 1130 P : 0.800
Episode: 48/1000 Total reward: 13.0 iter : 1143 P : 0.798
Episode: 49/1000 Total reward: 32.0 iter : 1175 P : 0.793
Episode: 50/1000 Total reward: 54.0 iter : 1229 P : 0.784
Episode: 51/1000 Total reward: 27.0 iter : 1256 P : 0.780
Episode: 52/1000 Total reward: 10.0 iter : 1266 P : 0.779
Episode: 53/1000 Total reward: 46.0 iter : 1312 P : 0.772
Episode: 54/1000 Total reward: 66.0 iter : 1378 P : 0.762
Episode: 55/1000 Total reward: 14.0 iter : 1392 P : 0.759
Episode: 56/1000 Total reward: 18.0 iter : 1410 P : 0.757
Episode: 57/1000 Total reward: 22.0 iter : 1432 P : 0.753
Episode: 58/1000 Total reward: 29.0 iter : 1461 P : 0.749
Episode: 59/1000 Total reward: 21.0 iter : 1482 P : 0.746
Episode: 60/1000 Total reward: 21.0 iter : 1503 P : 0.743
Episode: 61/1000 Total reward: 35.0 iter : 1538 P : 0.738
Episode: 62/1000 Total reward: 25.0 iter : 1563 P : 0.734
Episode: 63/1000 Total reward: 34.0 iter : 1597 P : 0.729
Episode: 64/1000 Total reward: 14.0 iter : 1611 P : 0.727
Episode: 65/1000 Total reward: 10.0 iter : 1621 P : 0.726
Episode: 66/1000 Total reward: 75.0 iter : 1696 P : 0.715
Episode: 67/1000 Total reward: 23.0 iter : 1719 P : 0.712
Episode: 68/1000 Total reward: 38.0 iter : 1757 P : 0.707
Episode: 69/1000 Total reward: 43.0 iter : 1800 P : 0.701
Episode: 70/1000 Total reward: 79.0 iter : 1879 P : 0.690
Episode: 71/1000 Total reward: 31.0 iter : 1910 P : 0.686
Episode: 72/1000 Total reward: 25.0 iter : 1935 P : 0.682
Episode: 73/1000 Total reward: 14.0 iter : 1949 P : 0.680
Episode: 74/1000 Total reward: 11.0 iter : 1960 P : 0.679
Episode: 75/1000 Total reward: 37.0 iter : 1997 P : 0.674
Episode: 76/1000 Total reward: 18.0 iter : 2015 P : 0.672
Episode: 77/1000 Total reward: 22.0 iter : 2037 P : 0.669
Episode: 78/1000 Total reward: 29.0 iter : 2066 P : 0.665
Episode: 79/1000 Total reward: 16.0 iter : 2082 P : 0.663
Episode: 80/1000 Total reward: 24.0 iter : 2106 P : 0.660
Episode: 81/1000 Total reward: 79.0 iter : 2185 P : 0.650
Episode: 82/1000 Total reward: 38.0 iter : 2223 P : 0.645
Episode: 83/1000 Total reward: 20.0 iter : 2243 P : 0.642
Episode: 84/1000 Total reward: 17.0 iter : 2260 P : 0.640
Episode: 85/1000 Total reward: 24.0 iter : 2284 P : 0.637
Episode: 86/1000 Total reward: 24.0 iter : 2308 P : 0.634
Episode: 87/1000 Total reward: 15.0 iter : 2323 P : 0.632
Episode: 88/1000 Total reward: 151.0 iter : 2474 P : 0.614
Episode: 89/1000 Total reward: 65.0 iter : 2539 P : 0.606
Episode: 90/1000 Total reward: 38.0 iter : 2577 P : 0.601
Episode: 91/1000 Total reward: 22.0 iter : 2599 P : 0.599
Episode: 92/1000 Total reward: 113.0 iter : 2712 P : 0.586
Episode: 93/1000 Total reward: 61.0 iter : 2773 P : 0.579
Episode: 94/1000 Total reward: 21.0 iter : 2794 P : 0.576
Episode: 95/1000 Total reward: 19.0 iter : 2813 P : 0.574
Episode: 96/1000 Total reward: 23.0 iter : 2836 P : 0.571
Episode: 97/1000 Total reward: 74.0 iter : 2910 P : 0.563
Episode: 98/1000 Total reward: 22.0 iter : 2932 P : 0.561
Episode: 99/1000 Total reward: 38.0 iter : 2970 P : 0.557
Episode: 100/1000 Total reward: 62.0 iter : 3032 P : 0.550
Episode: 101/1000 Total reward: 30.0 iter : 3062 P : 0.547
Episode: 102/1000 Total reward: 32.0 iter : 3094 P : 0.543
Episode: 103/1000 Total reward: 23.0 iter : 3117 P : 0.541
Episode: 104/1000 Total reward: 119.0 iter : 3236 P : 0.528
Episode: 105/1000 Total reward: 82.0 iter : 3318 P : 0.520
Episode: 106/1000 Total reward: 26.0 iter : 3344 P : 0.517
Episode: 107/1000 Total reward: 16.0 iter : 3360 P : 0.516
Episode: 108/1000 Total reward: 87.0 iter : 3447 P : 0.507
Episode: 109/1000 Total reward: 22.0 iter : 3469 P : 0.505
Episode: 110/1000 Total reward: 40.0 iter : 3509 P : 0.501
Episode: 111/1000 Total reward: 26.0 iter : 3535 P : 0.498
Episode: 112/1000 Total reward: 83.0 iter : 3618 P : 0.490
Episode: 113/1000 Total reward: 37.0 iter : 3655 P : 0.487
Episode: 114/1000 Total reward: 62.0 iter : 3717 P : 0.481
Episode: 115/1000 Total reward: 21.0 iter : 3738 P : 0.479
Episode: 116/1000 Total reward: 45.0 iter : 3783 P : 0.475
Episode: 117/1000 Total reward: 14.0 iter : 3797 P : 0.473
Episode: 118/1000 Total reward: 125.0 iter : 3922 P : 0.462
Episode: 119/1000 Total reward: 59.0 iter : 3981 P : 0.457
Episode: 120/1000 Total reward: 9.0 iter : 3990 P : 0.456
Episode: 121/1000 Total reward: 12.0 iter : 4002 P : 0.455
Episode: 122/1000 Total reward: 50.0 iter : 4052 P : 0.450
Episode: 123/1000 Total reward: 21.0 iter : 4073 P : 0.448
Episode: 124/1000 Total reward: 88.0 iter : 4161 P : 0.441
Episode: 125/1000 Total reward: 80.0 iter : 4241 P : 0.434
Episode: 126/1000 Total reward: 43.0 iter : 4284 P : 0.430
Episode: 127/1000 Total reward: 175.0 iter : 4459 P : 0.416
Episode: 128/1000 Total reward: 147.0 iter : 4606 P : 0.404
Episode: 129/1000 Total reward: 130.0 iter : 4736 P : 0.394
Episode: 130/1000 Total reward: 142.0 iter : 4878 P : 0.383
Episode: 131/1000 Total reward: 21.0 iter : 4899 P : 0.382
Episode: 132/1000 Total reward: 200.0 iter : 5099 P : 0.367
Episode: 133/1000 Total reward: 22.0 iter : 5121 P : 0.365
Episode: 134/1000 Total reward: 43.0 iter : 5164 P : 0.362
Episode: 135/1000 Total reward: 15.0 iter : 5179 P : 0.361
Episode: 136/1000 Total reward: 13.0 iter : 5192 P : 0.360
Episode: 137/1000 Total reward: 200.0 iter : 5392 P : 0.347
Episode: 138/1000 Total reward: 31.0 iter : 5423 P : 0.345
Episode: 139/1000 Total reward: 14.0 iter : 5437 P : 0.344
Episode: 140/1000 Total reward: 100.0 iter : 5537 P : 0.337
Episode: 141/1000 Total reward: 200.0 iter : 5737 P : 0.324
Episode: 142/1000 Total reward: 200.0 iter : 5937 P : 0.312
Episode: 143/1000 Total reward: 178.0 iter : 6115 P : 0.301
Episode: 144/1000 Total reward: 157.0 iter : 6272 P : 0.292
Episode: 145/1000 Total reward: 188.0 iter : 6460 P : 0.282
Episode: 146/1000 Total reward: 200.0 iter : 6660 P : 0.271
Episode: 147/1000 Total reward: 200.0 iter : 6860 P : 0.261
Episode: 148/1000 Total reward: 200.0 iter : 7060 P : 0.251
Episode: 149/1000 Total reward: 200.0 iter : 7260 P : 0.242
Episode: 150/1000 Total reward: 192.0 iter : 7452 P : 0.233
Episode: 151/1000 Total reward: 20.0 iter : 7472 P : 0.232
Episode: 152/1000 Total reward: 13.0 iter : 7485 P : 0.232
Episode: 153/1000 Total reward: 11.0 iter : 7496 P : 0.231
Episode: 154/1000 Total reward: 15.0 iter : 7511 P : 0.230
Episode: 155/1000 Total reward: 29.0 iter : 7540 P : 0.229
Episode: 156/1000 Total reward: 125.0 iter : 7665 P : 0.224
Episode: 157/1000 Total reward: 77.0 iter : 7742 P : 0.220
Episode: 158/1000 Total reward: 165.0 iter : 7907 P : 0.214
Episode: 159/1000 Total reward: 21.0 iter : 7928 P : 0.213
Episode: 160/1000 Total reward: 142.0 iter : 8070 P : 0.207
Episode: 161/1000 Total reward: 167.0 iter : 8237 P : 0.201
Episode: 162/1000 Total reward: 159.0 iter : 8396 P : 0.195
Episode: 163/1000 Total reward: 131.0 iter : 8527 P : 0.190
Episode: 164/1000 Total reward: 20.0 iter : 8547 P : 0.189
Episode: 165/1000 Total reward: 20.0 iter : 8567 P : 0.188
Episode: 166/1000 Total reward: 61.0 iter : 8628 P : 0.186
Episode: 167/1000 Total reward: 134.0 iter : 8762 P : 0.182
Episode: 168/1000 Total reward: 59.0 iter : 8821 P : 0.180
Episode: 169/1000 Total reward: 143.0 iter : 8964 P : 0.175
Episode: 170/1000 Total reward: 108.0 iter : 9072 P : 0.171
Episode: 171/1000 Total reward: 176.0 iter : 9248 P : 0.166
Episode: 172/1000 Total reward: 200.0 iter : 9448 P : 0.160
Episode: 173/1000 Total reward: 200.0 iter : 9648 P : 0.154
Episode: 174/1000 Total reward: 57.0 iter : 9705 P : 0.152
Episode: 175/1000 Total reward: 170.0 iter : 9875 P : 0.147
Episode: 176/1000 Total reward: 13.0 iter : 9888 P : 0.147
Episode: 177/1000 Total reward: 39.0 iter : 9927 P : 0.146
Episode: 178/1000 Total reward: 9.0 iter : 9936 P : 0.146
Episode: 179/1000 Total reward: 10.0 iter : 9946 P : 0.145
Episode: 180/1000 Total reward: 200.0 iter : 10146 P : 0.140
Episode: 181/1000 Total reward: 200.0 iter : 10346 P : 0.135
Episode: 182/1000 Total reward: 200.0 iter : 10546 P : 0.130
Episode: 183/1000 Total reward: 200.0 iter : 10746 P : 0.125
Episode: 184/1000 Total reward: 200.0 iter : 10946 P : 0.121
Episode: 185/1000 Total reward: 82.0 iter : 11028 P : 0.119
Episode: 186/1000 Total reward: 200.0 iter : 11228 P : 0.115
Episode: 187/1000 Total reward: 62.0 iter : 11290 P : 0.114
Episode: 188/1000 Total reward: 200.0 iter : 11490 P : 0.109
Episode: 189/1000 Total reward: 200.0 iter : 11690 P : 0.106
Episode: 190/1000 Total reward: 200.0 iter : 11890 P : 0.102
Episode: 191/1000 Total reward: 200.0 iter : 12090 P : 0.098
Episode: 192/1000 Total reward: 37.0 iter : 12127 P : 0.098
Episode: 193/1000 Total reward: 66.0 iter : 12193 P : 0.096
Episode: 194/1000 Total reward: 77.0 iter : 12270 P : 0.095
Episode: 195/1000 Total reward: 78.0 iter : 12348 P : 0.094
Episode: 196/1000 Total reward: 32.0 iter : 12380 P : 0.093
Episode: 197/1000 Total reward: 68.0 iter : 12448 P : 0.092
Episode: 198/1000 Total reward: 61.0 iter : 12509 P : 0.091
Episode: 199/1000 Total reward: 57.0 iter : 12566 P : 0.090
Episode: 200/1000 Total reward: 61.0 iter : 12627 P : 0.089
Episode: 201/1000 Total reward: 40.0 iter : 12667 P : 0.089
Episode: 202/1000 Total reward: 55.0 iter : 12722 P : 0.088
Episode: 203/1000 Total reward: 67.0 iter : 12789 P : 0.087
Episode: 204/1000 Total reward: 54.0 iter : 12843 P : 0.086
Episode: 205/1000 Total reward: 83.0 iter : 12926 P : 0.085
Episode: 206/1000 Total reward: 56.0 iter : 12982 P : 0.084
Episode: 207/1000 Total reward: 67.0 iter : 13049 P : 0.083
Episode: 208/1000 Total reward: 60.0 iter : 13109 P : 0.082
Episode: 209/1000 Total reward: 73.0 iter : 13182 P : 0.081
Episode: 210/1000 Total reward: 69.0 iter : 13251 P : 0.080
Episode: 211/1000 Total reward: 100.0 iter : 13351 P : 0.079
Episode: 212/1000 Total reward: 89.0 iter : 13440 P : 0.077
Episode: 213/1000 Total reward: 115.0 iter : 13555 P : 0.076
Episode: 214/1000 Total reward: 83.0 iter : 13638 P : 0.075
Episode: 215/1000 Total reward: 96.0 iter : 13734 P : 0.073
Episode: 216/1000 Total reward: 103.0 iter : 13837 P : 0.072
Episode: 217/1000 Total reward: 73.0 iter : 13910 P : 0.071
Episode: 218/1000 Total reward: 66.0 iter : 13976 P : 0.070
Episode: 219/1000 Total reward: 72.0 iter : 14048 P : 0.070
Episode: 220/1000 Total reward: 50.0 iter : 14098 P : 0.069
Episode: 221/1000 Total reward: 59.0 iter : 14157 P : 0.068
Episode: 222/1000 Total reward: 63.0 iter : 14220 P : 0.068
Episode: 223/1000 Total reward: 55.0 iter : 14275 P : 0.067
Episode: 224/1000 Total reward: 62.0 iter : 14337 P : 0.066
Episode: 225/1000 Total reward: 69.0 iter : 14406 P : 0.066
Episode: 226/1000 Total reward: 80.0 iter : 14486 P : 0.065
Episode: 227/1000 Total reward: 98.0 iter : 14584 P : 0.064
Episode: 228/1000 Total reward: 65.0 iter : 14649 P : 0.063
Episode: 229/1000 Total reward: 87.0 iter : 14736 P : 0.062
Episode: 230/1000 Total reward: 108.0 iter : 14844 P : 0.061
Episode: 231/1000 Total reward: 197.0 iter : 15041 P : 0.059
Episode: 232/1000 Total reward: 200.0 iter : 15241 P : 0.057
Episode: 233/1000 Total reward: 126.0 iter : 15367 P : 0.056
Episode: 234/1000 Total reward: 200.0 iter : 15567 P : 0.054
Episode: 235/1000 Total reward: 200.0 iter : 15767 P : 0.052
Episode: 236/1000 Total reward: 124.0 iter : 15891 P : 0.051
Episode: 237/1000 Total reward: 9.0 iter : 15900 P : 0.051
Episode: 238/1000 Total reward: 13.0 iter : 15913 P : 0.051
Episode: 239/1000 Total reward: 200.0 iter : 16113 P : 0.049
Episode: 240/1000 Total reward: 200.0 iter : 16313 P : 0.048
Episode: 241/1000 Total reward: 200.0 iter : 16513 P : 0.046
Episode: 242/1000 Total reward: 200.0 iter : 16713 P : 0.045
Episode: 243/1000 Total reward: 200.0 iter : 16913 P : 0.044
Episode: 244/1000 Total reward: 200.0 iter : 17113 P : 0.042
Episode: 245/1000 Total reward: 200.0 iter : 17313 P : 0.041
Episode: 246/1000 Total reward: 64.0 iter : 17377 P : 0.041
Episode: 247/1000 Total reward: 55.0 iter : 17432 P : 0.040
Episode: 248/1000 Total reward: 52.0 iter : 17484 P : 0.040
Episode: 249/1000 Total reward: 200.0 iter : 17684 P : 0.039
Episode: 250/1000 Total reward: 200.0 iter : 17884 P : 0.038
Episode: 251/1000 Total reward: 200.0 iter : 18084 P : 0.037
Episode: 252/1000 Total reward: 200.0 iter : 18284 P : 0.036
Episode: 253/1000 Total reward: 200.0 iter : 18484 P : 0.035
Episode: 254/1000 Total reward: 200.0 iter : 18684 P : 0.034
Episode: 255/1000 Total reward: 200.0 iter : 18884 P : 0.033
Episode: 256/1000 Total reward: 200.0 iter : 19084 P : 0.032
Episode: 257/1000 Total reward: 200.0 iter : 19284 P : 0.031
Episode: 258/1000 Total reward: 200.0 iter : 19484 P : 0.030
Episode: 259/1000 Total reward: 200.0 iter : 19684 P : 0.029
Episode: 260/1000 Total reward: 200.0 iter : 19884 P : 0.029
Episode: 261/1000 Total reward: 200.0 iter : 20084 P : 0.028
Episode: 262/1000 Total reward: 200.0 iter : 20284 P : 0.027
Episode: 263/1000 Total reward: 135.0 iter : 20419 P : 0.027
Episode: 264/1000 Total reward: 10.0 iter : 20429 P : 0.027
Episode: 265/1000 Total reward: 11.0 iter : 20440 P : 0.027
Episode: 266/1000 Total reward: 44.0 iter : 20484 P : 0.026
Episode: 267/1000 Total reward: 9.0 iter : 20493 P : 0.026
Episode: 268/1000 Total reward: 9.0 iter : 20502 P : 0.026
Episode: 269/1000 Total reward: 103.0 iter : 20605 P : 0.026
Episode: 270/1000 Total reward: 200.0 iter : 20805 P : 0.025
Episode: 271/1000 Total reward: 200.0 iter : 21005 P : 0.025
Episode: 272/1000 Total reward: 200.0 iter : 21205 P : 0.024
Episode: 273/1000 Total reward: 200.0 iter : 21405 P : 0.024
Episode: 274/1000 Total reward: 200.0 iter : 21605 P : 0.023
Episode: 275/1000 Total reward: 200.0 iter : 21805 P : 0.023
Episode: 276/1000 Total reward: 200.0 iter : 22005 P : 0.022
Episode: 277/1000 Total reward: 200.0 iter : 22205 P : 0.022
Episode: 278/1000 Total reward: 200.0 iter : 22405 P : 0.021
Episode: 279/1000 Total reward: 200.0 iter : 22605 P : 0.021
Episode: 280/1000 Total reward: 200.0 iter : 22805 P : 0.020
Episode: 281/1000 Total reward: 200.0 iter : 23005 P : 0.020
Episode: 282/1000 Total reward: 200.0 iter : 23205 P : 0.020
Episode: 283/1000 Total reward: 200.0 iter : 23405 P : 0.019
Episode: 284/1000 Total reward: 200.0 iter : 23605 P : 0.019
Episode: 285/1000 Total reward: 200.0 iter : 23805 P : 0.018
Episode: 286/1000 Total reward: 200.0 iter : 24005 P : 0.018
Episode: 287/1000 Total reward: 150.0 iter : 24155 P : 0.018
Episode: 288/1000 Total reward: 9.0 iter : 24164 P : 0.018
Episode: 289/1000 Total reward: 9.0 iter : 24173 P : 0.018
Episode: 290/1000 Total reward: 17.0 iter : 24190 P : 0.018
Episode: 291/1000 Total reward: 44.0 iter : 24234 P : 0.018
Episode: 292/1000 Total reward: 12.0 iter : 24246 P : 0.018
Episode: 293/1000 Total reward: 13.0 iter : 24259 P : 0.018
Episode: 294/1000 Total reward: 81.0 iter : 24340 P : 0.018
Episode: 295/1000 Total reward: 155.0 iter : 24495 P : 0.017
Episode: 296/1000 Total reward: 181.0 iter : 24676 P : 0.017
Episode: 297/1000 Total reward: 137.0 iter : 24813 P : 0.017
Episode: 298/1000 Total reward: 200.0 iter : 25013 P : 0.017
Episode: 299/1000 Total reward: 200.0 iter : 25213 P : 0.016
Episode: 300/1000 Total reward: 200.0 iter : 25413 P : 0.016
Episode: 301/1000 Total reward: 200.0 iter : 25613 P : 0.016
Episode: 302/1000 Total reward: 193.0 iter : 25806 P : 0.016
Episode: 303/1000 Total reward: 130.0 iter : 25936 P : 0.016
Episode: 304/1000 Total reward: 137.0 iter : 26073 P : 0.015
Episode: 305/1000 Total reward: 128.0 iter : 26201 P : 0.015
Episode: 306/1000 Total reward: 111.0 iter : 26312 P : 0.015
Episode: 307/1000 Total reward: 103.0 iter : 26415 P : 0.015
Episode: 308/1000 Total reward: 94.0 iter : 26509 P : 0.015
Episode: 309/1000 Total reward: 123.0 iter : 26632 P : 0.015
Episode: 310/1000 Total reward: 66.0 iter : 26698 P : 0.015
Episode: 311/1000 Total reward: 53.0 iter : 26751 P : 0.015
Episode: 312/1000 Total reward: 63.0 iter : 26814 P : 0.015
Episode: 313/1000 Total reward: 51.0 iter : 26865 P : 0.015
Episode: 314/1000 Total reward: 57.0 iter : 26922 P : 0.015
Episode: 315/1000 Total reward: 54.0 iter : 26976 P : 0.014
Episode: 316/1000 Total reward: 51.0 iter : 27027 P : 0.014
Episode: 317/1000 Total reward: 51.0 iter : 27078 P : 0.014
Episode: 318/1000 Total reward: 38.0 iter : 27116 P : 0.014
Episode: 319/1000 Total reward: 34.0 iter : 27150 P : 0.014
Episode: 320/1000 Total reward: 32.0 iter : 27182 P : 0.014
Episode: 321/1000 Total reward: 38.0 iter : 27220 P : 0.014
Episode: 322/1000 Total reward: 27.0 iter : 27247 P : 0.014
Episode: 323/1000 Total reward: 37.0 iter : 27284 P : 0.014
Episode: 324/1000 Total reward: 36.0 iter : 27320 P : 0.014
Episode: 325/1000 Total reward: 26.0 iter : 27346 P : 0.014
Episode: 326/1000 Total reward: 21.0 iter : 27367 P : 0.014
Episode: 327/1000 Total reward: 36.0 iter : 27403 P : 0.014
Episode: 328/1000 Total reward: 30.0 iter : 27433 P : 0.014
Episode: 329/1000 Total reward: 24.0 iter : 27457 P : 0.014
Episode: 330/1000 Total reward: 25.0 iter : 27482 P : 0.014
Episode: 331/1000 Total reward: 32.0 iter : 27514 P : 0.014
Episode: 332/1000 Total reward: 33.0 iter : 27547 P : 0.014
Episode: 333/1000 Total reward: 22.0 iter : 27569 P : 0.014
Episode: 334/1000 Total reward: 22.0 iter : 27591 P : 0.014
Episode: 335/1000 Total reward: 23.0 iter : 27614 P : 0.014
Episode: 336/1000 Total reward: 23.0 iter : 27637 P : 0.014
Episode: 337/1000 Total reward: 29.0 iter : 27666 P : 0.014
Episode: 338/1000 Total reward: 17.0 iter : 27683 P : 0.014
Episode: 339/1000 Total reward: 33.0 iter : 27716 P : 0.014
Episode: 340/1000 Total reward: 31.0 iter : 27747 P : 0.014
Episode: 341/1000 Total reward: 23.0 iter : 27770 P : 0.014
Episode: 342/1000 Total reward: 19.0 iter : 27789 P : 0.014
Episode: 343/1000 Total reward: 23.0 iter : 27812 P : 0.014
Episode: 344/1000 Total reward: 24.0 iter : 27836 P : 0.014
Episode: 345/1000 Total reward: 36.0 iter : 27872 P : 0.014
Episode: 346/1000 Total reward: 22.0 iter : 27894 P : 0.014
Episode: 347/1000 Total reward: 29.0 iter : 27923 P : 0.014
Episode: 348/1000 Total reward: 18.0 iter : 27941 P : 0.014
Episode: 349/1000 Total reward: 20.0 iter : 27961 P : 0.014
Episode: 350/1000 Total reward: 27.0 iter : 27988 P : 0.014
Episode: 351/1000 Total reward: 21.0 iter : 28009 P : 0.014
Episode: 352/1000 Total reward: 26.0 iter : 28035 P : 0.014
Episode: 353/1000 Total reward: 18.0 iter : 28053 P : 0.014
Episode: 354/1000 Total reward: 21.0 iter : 28074 P : 0.014
Episode: 355/1000 Total reward: 16.0 iter : 28090 P : 0.014
Episode: 356/1000 Total reward: 31.0 iter : 28121 P : 0.014
Episode: 357/1000 Total reward: 14.0 iter : 28135 P : 0.014
Episode: 358/1000 Total reward: 17.0 iter : 28152 P : 0.014
Episode: 359/1000 Total reward: 21.0 iter : 28173 P : 0.014
Episode: 360/1000 Total reward: 19.0 iter : 28192 P : 0.014
Episode: 361/1000 Total reward: 16.0 iter : 28208 P : 0.014
Episode: 362/1000 Total reward: 25.0 iter : 28233 P : 0.013
Episode: 363/1000 Total reward: 24.0 iter : 28257 P : 0.013
Episode: 364/1000 Total reward: 16.0 iter : 28273 P : 0.013
Episode: 365/1000 Total reward: 27.0 iter : 28300 P : 0.013
Episode: 366/1000 Total reward: 21.0 iter : 28321 P : 0.013
Episode: 367/1000 Total reward: 19.0 iter : 28340 P : 0.013
Episode: 368/1000 Total reward: 13.0 iter : 28353 P : 0.013
Episode: 369/1000 Total reward: 31.0 iter : 28384 P : 0.013
Episode: 370/1000 Total reward: 20.0 iter : 28404 P : 0.013
Episode: 371/1000 Total reward: 17.0 iter : 28421 P : 0.013
Episode: 372/1000 Total reward: 24.0 iter : 28445 P : 0.013
Episode: 373/1000 Total reward: 19.0 iter : 28464 P : 0.013
Episode: 374/1000 Total reward: 23.0 iter : 28487 P : 0.013
Episode: 375/1000 Total reward: 16.0 iter : 28503 P : 0.013
Episode: 376/1000 Total reward: 15.0 iter : 28518 P : 0.013
Episode: 377/1000 Total reward: 25.0 iter : 28543 P : 0.013
Episode: 378/1000 Total reward: 16.0 iter : 28559 P : 0.013
Episode: 379/1000 Total reward: 15.0 iter : 28574 P : 0.013
Episode: 380/1000 Total reward: 26.0 iter : 28600 P : 0.013
Episode: 381/1000 Total reward: 20.0 iter : 28620 P : 0.013
Episode: 382/1000 Total reward: 21.0 iter : 28641 P : 0.013
Episode: 383/1000 Total reward: 18.0 iter : 28659 P : 0.013
Episode: 384/1000 Total reward: 16.0 iter : 28675 P : 0.013
Episode: 385/1000 Total reward: 11.0 iter : 28686 P : 0.013
Episode: 386/1000 Total reward: 22.0 iter : 28708 P : 0.013
Episode: 387/1000 Total reward: 26.0 iter : 28734 P : 0.013
Episode: 388/1000 Total reward: 13.0 iter : 28747 P : 0.013
Episode: 389/1000 Total reward: 20.0 iter : 28767 P : 0.013
Episode: 390/1000 Total reward: 22.0 iter : 28789 P : 0.013
Episode: 391/1000 Total reward: 15.0 iter : 28804 P : 0.013
Episode: 392/1000 Total reward: 17.0 iter : 28821 P : 0.013
Episode: 393/1000 Total reward: 23.0 iter : 28844 P : 0.013
Episode: 394/1000 Total reward: 20.0 iter : 28864 P : 0.013
Episode: 395/1000 Total reward: 20.0 iter : 28884 P : 0.013
Episode: 396/1000 Total reward: 10.0 iter : 28894 P : 0.013
Episode: 397/1000 Total reward: 14.0 iter : 28908 P : 0.013
Episode: 398/1000 Total reward: 13.0 iter : 28921 P : 0.013
Episode: 399/1000 Total reward: 20.0 iter : 28941 P : 0.013
Episode: 400/1000 Total reward: 16.0 iter : 28957 P : 0.013
Episode: 401/1000 Total reward: 20.0 iter : 28977 P : 0.013
Episode: 402/1000 Total reward: 12.0 iter : 28989 P : 0.013
Episode: 403/1000 Total reward: 18.0 iter : 29007 P : 0.013
Episode: 404/1000 Total reward: 10.0 iter : 29017 P : 0.013
Episode: 405/1000 Total reward: 14.0 iter : 29031 P : 0.013
Episode: 406/1000 Total reward: 16.0 iter : 29047 P : 0.013
Episode: 407/1000 Total reward: 13.0 iter : 29060 P : 0.013
Episode: 408/1000 Total reward: 14.0 iter : 29074 P : 0.013
Episode: 409/1000 Total reward: 18.0 iter : 29092 P : 0.013
Episode: 410/1000 Total reward: 13.0 iter : 29105 P : 0.013
Episode: 411/1000 Total reward: 16.0 iter : 29121 P : 0.013
Episode: 412/1000 Total reward: 15.0 iter : 29136 P : 0.013
Episode: 413/1000 Total reward: 13.0 iter : 29149 P : 0.013
Episode: 414/1000 Total reward: 15.0 iter : 29164 P : 0.013
Episode: 415/1000 Total reward: 11.0 iter : 29175 P : 0.013
Episode: 416/1000 Total reward: 13.0 iter : 29188 P : 0.013
Episode: 417/1000 Total reward: 10.0 iter : 29198 P : 0.013
Episode: 418/1000 Total reward: 12.0 iter : 29210 P : 0.013
Episode: 419/1000 Total reward: 12.0 iter : 29222 P : 0.013
Episode: 420/1000 Total reward: 22.0 iter : 29244 P : 0.013
Episode: 421/1000 Total reward: 13.0 iter : 29257 P : 0.013
Episode: 422/1000 Total reward: 11.0 iter : 29268 P : 0.013
Episode: 423/1000 Total reward: 18.0 iter : 29286 P : 0.013
Episode: 424/1000 Total reward: 16.0 iter : 29302 P : 0.013
Episode: 425/1000 Total reward: 14.0 iter : 29316 P : 0.013
Episode: 426/1000 Total reward: 14.0 iter : 29330 P : 0.013
Episode: 427/1000 Total reward: 16.0 iter : 29346 P : 0.013
Episode: 428/1000 Total reward: 8.0 iter : 29354 P : 0.013
Episode: 429/1000 Total reward: 13.0 iter : 29367 P : 0.013
Episode: 430/1000 Total reward: 11.0 iter : 29378 P : 0.013
Episode: 431/1000 Total reward: 17.0 iter : 29395 P : 0.013
Episode: 432/1000 Total reward: 12.0 iter : 29407 P : 0.013
Episode: 433/1000 Total reward: 21.0 iter : 29428 P : 0.013
Episode: 434/1000 Total reward: 14.0 iter : 29442 P : 0.013
Episode: 435/1000 Total reward: 16.0 iter : 29458 P : 0.013
Episode: 436/1000 Total reward: 21.0 iter : 29479 P : 0.013
Episode: 437/1000 Total reward: 11.0 iter : 29490 P : 0.013
Episode: 438/1000 Total reward: 10.0 iter : 29500 P : 0.013
Episode: 439/1000 Total reward: 16.0 iter : 29516 P : 0.013
Episode: 440/1000 Total reward: 13.0 iter : 29529 P : 0.013
Episode: 441/1000 Total reward: 17.0 iter : 29546 P : 0.013
Episode: 442/1000 Total reward: 17.0 iter : 29563 P : 0.013
Episode: 443/1000 Total reward: 36.0 iter : 29599 P : 0.013
Episode: 444/1000 Total reward: 26.0 iter : 29625 P : 0.013
Episode: 445/1000 Total reward: 20.0 iter : 29645 P : 0.013
Episode: 446/1000 Total reward: 43.0 iter : 29688 P : 0.013
Episode: 447/1000 Total reward: 22.0 iter : 29710 P : 0.013
Episode: 448/1000 Total reward: 30.0 iter : 29740 P : 0.013
Episode: 449/1000 Total reward: 36.0 iter : 29776 P : 0.013
Episode: 450/1000 Total reward: 26.0 iter : 29802 P : 0.013
Episode: 451/1000 Total reward: 22.0 iter : 29824 P : 0.013
Episode: 452/1000 Total reward: 21.0 iter : 29845 P : 0.013
Episode: 453/1000 Total reward: 21.0 iter : 29866 P : 0.013
Episode: 454/1000 Total reward: 25.0 iter : 29891 P : 0.013
Episode: 455/1000 Total reward: 26.0 iter : 29917 P : 0.012
Episode: 456/1000 Total reward: 17.0 iter : 29934 P : 0.012
Episode: 457/1000 Total reward: 24.0 iter : 29958 P : 0.012
Episode: 458/1000 Total reward: 25.0 iter : 29983 P : 0.012
Episode: 459/1000 Total reward: 18.0 iter : 30001 P : 0.012
Episode: 460/1000 Total reward: 24.0 iter : 30025 P : 0.012
Episode: 461/1000 Total reward: 16.0 iter : 30041 P : 0.012
Episode: 462/1000 Total reward: 12.0 iter : 30053 P : 0.012
Episode: 463/1000 Total reward: 16.0 iter : 30069 P : 0.012
Episode: 464/1000 Total reward: 29.0 iter : 30098 P : 0.012
Episode: 465/1000 Total reward: 13.0 iter : 30111 P : 0.012
Episode: 466/1000 Total reward: 27.0 iter : 30138 P : 0.012
Episode: 467/1000 Total reward: 12.0 iter : 30150 P : 0.012
Episode: 468/1000 Total reward: 20.0 iter : 30170 P : 0.012
Episode: 469/1000 Total reward: 21.0 iter : 30191 P : 0.012
Episode: 470/1000 Total reward: 29.0 iter : 30220 P : 0.012
Episode: 471/1000 Total reward: 26.0 iter : 30246 P : 0.012
Episode: 472/1000 Total reward: 29.0 iter : 30275 P : 0.012
Episode: 473/1000 Total reward: 19.0 iter : 30294 P : 0.012
Episode: 474/1000 Total reward: 12.0 iter : 30306 P : 0.012
Episode: 475/1000 Total reward: 26.0 iter : 30332 P : 0.012
Episode: 476/1000 Total reward: 28.0 iter : 30360 P : 0.012
Episode: 477/1000 Total reward: 20.0 iter : 30380 P : 0.012
Episode: 478/1000 Total reward: 26.0 iter : 30406 P : 0.012
Episode: 479/1000 Total reward: 32.0 iter : 30438 P : 0.012
Episode: 480/1000 Total reward: 15.0 iter : 30453 P : 0.012
Episode: 481/1000 Total reward: 34.0 iter : 30487 P : 0.012
Episode: 482/1000 Total reward: 17.0 iter : 30504 P : 0.012
Episode: 483/1000 Total reward: 37.0 iter : 30541 P : 0.012
Episode: 484/1000 Total reward: 26.0 iter : 30567 P : 0.012
Episode: 485/1000 Total reward: 27.0 iter : 30594 P : 0.012
Episode: 486/1000 Total reward: 17.0 iter : 30611 P : 0.012
Episode: 487/1000 Total reward: 26.0 iter : 30637 P : 0.012
Episode: 488/1000 Total reward: 84.0 iter : 30721 P : 0.012
Episode: 489/1000 Total reward: 200.0 iter : 30921 P : 0.012
Episode: 490/1000 Total reward: 65.0 iter : 30986 P : 0.012
Episode: 491/1000 Total reward: 10.0 iter : 30996 P : 0.012
Episode: 492/1000 Total reward: 17.0 iter : 31013 P : 0.012
Episode: 493/1000 Total reward: 134.0 iter : 31147 P : 0.012
Episode: 494/1000 Total reward: 19.0 iter : 31166 P : 0.012
Episode: 495/1000 Total reward: 13.0 iter : 31179 P : 0.012
Episode: 496/1000 Total reward: 11.0 iter : 31190 P : 0.012
Episode: 497/1000 Total reward: 55.0 iter : 31245 P : 0.012
Episode: 498/1000 Total reward: 79.0 iter : 31324 P : 0.012
Episode: 499/1000 Total reward: 50.0 iter : 31374 P : 0.012
Episode: 500/1000 Total reward: 200.0 iter : 31574 P : 0.012
Episode: 501/1000 Total reward: 200.0 iter : 31774 P : 0.012
Episode: 502/1000 Total reward: 200.0 iter : 31974 P : 0.012
Episode: 503/1000 Total reward: 200.0 iter : 32174 P : 0.012
Episode: 504/1000 Total reward: 58.0 iter : 32232 P : 0.012
Episode: 505/1000 Total reward: 44.0 iter : 32276 P : 0.012
Episode: 506/1000 Total reward: 200.0 iter : 32476 P : 0.011
Episode: 507/1000 Total reward: 200.0 iter : 32676 P : 0.011
Episode: 508/1000 Total reward: 200.0 iter : 32876 P : 0.011
Episode: 509/1000 Total reward: 200.0 iter : 33076 P : 0.011
Episode: 510/1000 Total reward: 200.0 iter : 33276 P : 0.011
Episode: 511/1000 Total reward: 200.0 iter : 33476 P : 0.011
Episode: 512/1000 Total reward: 200.0 iter : 33676 P : 0.011
Episode: 513/1000 Total reward: 200.0 iter : 33876 P : 0.011
Episode: 514/1000 Total reward: 200.0 iter : 34076 P : 0.011
Episode: 515/1000 Total reward: 200.0 iter : 34276 P : 0.011
Episode: 516/1000 Total reward: 12.0 iter : 34288 P : 0.011
Episode: 517/1000 Total reward: 53.0 iter : 34341 P : 0.011
Episode: 518/1000 Total reward: 63.0 iter : 34404 P : 0.011
Episode: 519/1000 Total reward: 200.0 iter : 34604 P : 0.011
Episode: 520/1000 Total reward: 200.0 iter : 34804 P : 0.011
Episode: 521/1000 Total reward: 200.0 iter : 35004 P : 0.011
Episode: 522/1000 Total reward: 200.0 iter : 35204 P : 0.011
Episode: 523/1000 Total reward: 200.0 iter : 35404 P : 0.011
Episode: 524/1000 Total reward: 200.0 iter : 35604 P : 0.011
Episode: 525/1000 Total reward: 140.0 iter : 35744 P : 0.011
Episode: 526/1000 Total reward: 10.0 iter : 35754 P : 0.011
Episode: 527/1000 Total reward: 11.0 iter : 35765 P : 0.011
Episode: 528/1000 Total reward: 13.0 iter : 35778 P : 0.011
Episode: 529/1000 Total reward: 171.0 iter : 35949 P : 0.011
Episode: 530/1000 Total reward: 43.0 iter : 35992 P : 0.011
Episode: 531/1000 Total reward: 41.0 iter : 36033 P : 0.011
Episode: 532/1000 Total reward: 41.0 iter : 36074 P : 0.011
Episode: 533/1000 Total reward: 63.0 iter : 36137 P : 0.011
Episode: 534/1000 Total reward: 46.0 iter : 36183 P : 0.011
Episode: 535/1000 Total reward: 40.0 iter : 36223 P : 0.011
Episode: 536/1000 Total reward: 55.0 iter : 36278 P : 0.011
Episode: 537/1000 Total reward: 44.0 iter : 36322 P : 0.011
Episode: 538/1000 Total reward: 44.0 iter : 36366 P : 0.011
Episode: 539/1000 Total reward: 42.0 iter : 36408 P : 0.011
Episode: 540/1000 Total reward: 28.0 iter : 36436 P : 0.011
Episode: 541/1000 Total reward: 44.0 iter : 36480 P : 0.011
Episode: 542/1000 Total reward: 36.0 iter : 36516 P : 0.011
Episode: 543/1000 Total reward: 28.0 iter : 36544 P : 0.011
Episode: 544/1000 Total reward: 32.0 iter : 36576 P : 0.011
Episode: 545/1000 Total reward: 46.0 iter : 36622 P : 0.011
Episode: 546/1000 Total reward: 30.0 iter : 36652 P : 0.011
Episode: 547/1000 Total reward: 22.0 iter : 36674 P : 0.011
Episode: 548/1000 Total reward: 24.0 iter : 36698 P : 0.011
Episode: 549/1000 Total reward: 40.0 iter : 36738 P : 0.011
Episode: 550/1000 Total reward: 25.0 iter : 36763 P : 0.011
Episode: 551/1000 Total reward: 27.0 iter : 36790 P : 0.011
Episode: 552/1000 Total reward: 15.0 iter : 36805 P : 0.011
Episode: 553/1000 Total reward: 13.0 iter : 36818 P : 0.011
Episode: 554/1000 Total reward: 20.0 iter : 36838 P : 0.011
Episode: 555/1000 Total reward: 22.0 iter : 36860 P : 0.011
Episode: 556/1000 Total reward: 18.0 iter : 36878 P : 0.011
Episode: 557/1000 Total reward: 20.0 iter : 36898 P : 0.011
Episode: 558/1000 Total reward: 22.0 iter : 36920 P : 0.011
Episode: 559/1000 Total reward: 15.0 iter : 36935 P : 0.011
Episode: 560/1000 Total reward: 20.0 iter : 36955 P : 0.011
Episode: 561/1000 Total reward: 21.0 iter : 36976 P : 0.011
Episode: 562/1000 Total reward: 15.0 iter : 36991 P : 0.011
Episode: 563/1000 Total reward: 15.0 iter : 37006 P : 0.011
Episode: 564/1000 Total reward: 14.0 iter : 37020 P : 0.011
Episode: 565/1000 Total reward: 17.0 iter : 37037 P : 0.011
Episode: 566/1000 Total reward: 22.0 iter : 37059 P : 0.011
Episode: 567/1000 Total reward: 13.0 iter : 37072 P : 0.011
Episode: 568/1000 Total reward: 18.0 iter : 37090 P : 0.011
Episode: 569/1000 Total reward: 16.0 iter : 37106 P : 0.011
Episode: 570/1000 Total reward: 21.0 iter : 37127 P : 0.011
Episode: 571/1000 Total reward: 22.0 iter : 37149 P : 0.011
Episode: 572/1000 Total reward: 20.0 iter : 37169 P : 0.011
Episode: 573/1000 Total reward: 23.0 iter : 37192 P : 0.011
Episode: 574/1000 Total reward: 25.0 iter : 37217 P : 0.011
Episode: 575/1000 Total reward: 12.0 iter : 37229 P : 0.011
Episode: 576/1000 Total reward: 34.0 iter : 37263 P : 0.011
Episode: 577/1000 Total reward: 21.0 iter : 37284 P : 0.011
Episode: 578/1000 Total reward: 15.0 iter : 37299 P : 0.011
Episode: 579/1000 Total reward: 22.0 iter : 37321 P : 0.011
Episode: 580/1000 Total reward: 25.0 iter : 37346 P : 0.011
Episode: 581/1000 Total reward: 18.0 iter : 37364 P : 0.011
Episode: 582/1000 Total reward: 16.0 iter : 37380 P : 0.011
Episode: 583/1000 Total reward: 32.0 iter : 37412 P : 0.011
Episode: 584/1000 Total reward: 15.0 iter : 37427 P : 0.011
Episode: 585/1000 Total reward: 10.0 iter : 37437 P : 0.011
Episode: 586/1000 Total reward: 31.0 iter : 37468 P : 0.011
Episode: 587/1000 Total reward: 22.0 iter : 37490 P : 0.011
Episode: 588/1000 Total reward: 18.0 iter : 37508 P : 0.011
Episode: 589/1000 Total reward: 16.0 iter : 37524 P : 0.011
Episode: 590/1000 Total reward: 32.0 iter : 37556 P : 0.011
Episode: 591/1000 Total reward: 11.0 iter : 37567 P : 0.011
Episode: 592/1000 Total reward: 27.0 iter : 37594 P : 0.011
Episode: 593/1000 Total reward: 23.0 iter : 37617 P : 0.011
Episode: 594/1000 Total reward: 23.0 iter : 37640 P : 0.011
Episode: 595/1000 Total reward: 21.0 iter : 37661 P : 0.011
Episode: 596/1000 Total reward: 16.0 iter : 37677 P : 0.011
Episode: 597/1000 Total reward: 35.0 iter : 37712 P : 0.011
Episode: 598/1000 Total reward: 18.0 iter : 37730 P : 0.011
Episode: 599/1000 Total reward: 10.0 iter : 37740 P : 0.011
Episode: 600/1000 Total reward: 13.0 iter : 37753 P : 0.011
Episode: 601/1000 Total reward: 26.0 iter : 37779 P : 0.011
Episode: 602/1000 Total reward: 24.0 iter : 37803 P : 0.011
Episode: 603/1000 Total reward: 19.0 iter : 37822 P : 0.011
Episode: 604/1000 Total reward: 25.0 iter : 37847 P : 0.011
Episode: 605/1000 Total reward: 18.0 iter : 37865 P : 0.011
Episode: 606/1000 Total reward: 21.0 iter : 37886 P : 0.011
Episode: 607/1000 Total reward: 36.0 iter : 37922 P : 0.011
Episode: 608/1000 Total reward: 21.0 iter : 37943 P : 0.011
Episode: 609/1000 Total reward: 17.0 iter : 37960 P : 0.010
Episode: 610/1000 Total reward: 15.0 iter : 37975 P : 0.010
Episode: 611/1000 Total reward: 13.0 iter : 37988 P : 0.010
Episode: 612/1000 Total reward: 17.0 iter : 38005 P : 0.010
Episode: 613/1000 Total reward: 31.0 iter : 38036 P : 0.010
Episode: 614/1000 Total reward: 26.0 iter : 38062 P : 0.010
Episode: 615/1000 Total reward: 34.0 iter : 38096 P : 0.010
Episode: 616/1000 Total reward: 20.0 iter : 38116 P : 0.010
Episode: 617/1000 Total reward: 26.0 iter : 38142 P : 0.010
Episode: 618/1000 Total reward: 18.0 iter : 38160 P : 0.010
Episode: 619/1000 Total reward: 14.0 iter : 38174 P : 0.010
Episode: 620/1000 Total reward: 20.0 iter : 38194 P : 0.010
Episode: 621/1000 Total reward: 34.0 iter : 38228 P : 0.010
Episode: 622/1000 Total reward: 31.0 iter : 38259 P : 0.010
Episode: 623/1000 Total reward: 35.0 iter : 38294 P : 0.010
Episode: 624/1000 Total reward: 19.0 iter : 38313 P : 0.010
Episode: 625/1000 Total reward: 15.0 iter : 38328 P : 0.010
Episode: 626/1000 Total reward: 30.0 iter : 38358 P : 0.010
Episode: 627/1000 Total reward: 25.0 iter : 38383 P : 0.010
Episode: 628/1000 Total reward: 24.0 iter : 38407 P : 0.010
Episode: 629/1000 Total reward: 23.0 iter : 38430 P : 0.010
Episode: 630/1000 Total reward: 43.0 iter : 38473 P : 0.010
Episode: 631/1000 Total reward: 19.0 iter : 38492 P : 0.010
Episode: 632/1000 Total reward: 46.0 iter : 38538 P : 0.010
Episode: 633/1000 Total reward: 23.0 iter : 38561 P : 0.010
Episode: 634/1000 Total reward: 29.0 iter : 38590 P : 0.010
Episode: 635/1000 Total reward: 35.0 iter : 38625 P : 0.010
Episode: 636/1000 Total reward: 18.0 iter : 38643 P : 0.010
Episode: 637/1000 Total reward: 64.0 iter : 38707 P : 0.010
Episode: 638/1000 Total reward: 42.0 iter : 38749 P : 0.010
Episode: 639/1000 Total reward: 99.0 iter : 38848 P : 0.010
Episode: 640/1000 Total reward: 65.0 iter : 38913 P : 0.010
Episode: 641/1000 Total reward: 200.0 iter : 39113 P : 0.010
Episode: 642/1000 Total reward: 200.0 iter : 39313 P : 0.010
Episode: 643/1000 Total reward: 200.0 iter : 39513 P : 0.010
Episode: 644/1000 Total reward: 200.0 iter : 39713 P : 0.010
Episode: 645/1000 Total reward: 200.0 iter : 39913 P : 0.010
Episode: 646/1000 Total reward: 200.0 iter : 40113 P : 0.010
Episode: 647/1000 Total reward: 200.0 iter : 40313 P : 0.010
Episode: 648/1000 Total reward: 200.0 iter : 40513 P : 0.010
Episode: 649/1000 Total reward: 200.0 iter : 40713 P : 0.010
Episode: 650/1000 Total reward: 200.0 iter : 40913 P : 0.010
Episode: 651/1000 Total reward: 200.0 iter : 41113 P : 0.010
Episode: 652/1000 Total reward: 200.0 iter : 41313 P : 0.010
Episode: 653/1000 Total reward: 200.0 iter : 41513 P : 0.010
Episode: 654/1000 Total reward: 200.0 iter : 41713 P : 0.010
Episode: 655/1000 Total reward: 17.0 iter : 41730 P : 0.010
Episode: 656/1000 Total reward: 200.0 iter : 41930 P : 0.010
Episode: 657/1000 Total reward: 200.0 iter : 42130 P : 0.010
Episode: 658/1000 Total reward: 200.0 iter : 42330 P : 0.010
Episode: 659/1000 Total reward: 200.0 iter : 42530 P : 0.010
Episode: 660/1000 Total reward: 200.0 iter : 42730 P : 0.010
Episode: 661/1000 Total reward: 200.0 iter : 42930 P : 0.010
Episode: 662/1000 Total reward: 173.0 iter : 43103 P : 0.010
Episode: 663/1000 Total reward: 200.0 iter : 43303 P : 0.010
Episode: 664/1000 Total reward: 200.0 iter : 43503 P : 0.010
Episode: 665/1000 Total reward: 200.0 iter : 43703 P : 0.010
Episode: 666/1000 Total reward: 200.0 iter : 43903 P : 0.010
Episode: 667/1000 Total reward: 200.0 iter : 44103 P : 0.010
Episode: 668/1000 Total reward: 200.0 iter : 44303 P : 0.010
Episode: 669/1000 Total reward: 200.0 iter : 44503 P : 0.010
Episode: 670/1000 Total reward: 200.0 iter : 44703 P : 0.010
Episode: 671/1000 Total reward: 200.0 iter : 44903 P : 0.010
Episode: 672/1000 Total reward: 200.0 iter : 45103 P : 0.010
Episode: 673/1000 Total reward: 200.0 iter : 45303 P : 0.010
Episode: 674/1000 Total reward: 200.0 iter : 45503 P : 0.010
Episode: 675/1000 Total reward: 200.0 iter : 45703 P : 0.010
Episode: 676/1000 Total reward: 200.0 iter : 45903 P : 0.010
Episode: 677/1000 Total reward: 200.0 iter : 46103 P : 0.010
Episode: 678/1000 Total reward: 15.0 iter : 46118 P : 0.010
Episode: 679/1000 Total reward: 200.0 iter : 46318 P : 0.010
Episode: 680/1000 Total reward: 200.0 iter : 46518 P : 0.010
Episode: 681/1000 Total reward: 200.0 iter : 46718 P : 0.010
Episode: 682/1000 Total reward: 48.0 iter : 46766 P : 0.010
Episode: 683/1000 Total reward: 200.0 iter : 46966 P : 0.010
Episode: 684/1000 Total reward: 17.0 iter : 46983 P : 0.010
Episode: 685/1000 Total reward: 200.0 iter : 47183 P : 0.010
Episode: 686/1000 Total reward: 47.0 iter : 47230 P : 0.010
Episode: 687/1000 Total reward: 200.0 iter : 47430 P : 0.010
Episode: 688/1000 Total reward: 18.0 iter : 47448 P : 0.010
Episode: 689/1000 Total reward: 200.0 iter : 47648 P : 0.010
Episode: 690/1000 Total reward: 200.0 iter : 47848 P : 0.010
Episode: 691/1000 Total reward: 200.0 iter : 48048 P : 0.010
Episode: 692/1000 Total reward: 200.0 iter : 48248 P : 0.010
Episode: 693/1000 Total reward: 200.0 iter : 48448 P : 0.010
Episode: 694/1000 Total reward: 200.0 iter : 48648 P : 0.010
Episode: 695/1000 Total reward: 200.0 iter : 48848 P : 0.010
Episode: 696/1000 Total reward: 200.0 iter : 49048 P : 0.010
Episode: 697/1000 Total reward: 200.0 iter : 49248 P : 0.010
Episode: 698/1000 Total reward: 200.0 iter : 49448 P : 0.010
Episode: 699/1000 Total reward: 200.0 iter : 49648 P : 0.010
Episode: 700/1000 Total reward: 200.0 iter : 49848 P : 0.010
Episode: 701/1000 Total reward: 200.0 iter : 50048 P : 0.010
Episode: 702/1000 Total reward: 200.0 iter : 50248 P : 0.010
Episode: 703/1000 Total reward: 200.0 iter : 50448 P : 0.010
Episode: 704/1000 Total reward: 200.0 iter : 50648 P : 0.010
Episode: 705/1000 Total reward: 200.0 iter : 50848 P : 0.010
Episode: 706/1000 Total reward: 200.0 iter : 51048 P : 0.010
Episode: 707/1000 Total reward: 200.0 iter : 51248 P : 0.010
Episode: 708/1000 Total reward: 200.0 iter : 51448 P : 0.010
Episode: 709/1000 Total reward: 200.0 iter : 51648 P : 0.010
Episode: 710/1000 Total reward: 200.0 iter : 51848 P : 0.010
Episode: 711/1000 Total reward: 200.0 iter : 52048 P : 0.010
Episode: 712/1000 Total reward: 200.0 iter : 52248 P : 0.010
Episode: 713/1000 Total reward: 181.0 iter : 52429 P : 0.010
Episode: 714/1000 Total reward: 102.0 iter : 52531 P : 0.010
Episode: 715/1000 Total reward: 82.0 iter : 52613 P : 0.010
Episode: 716/1000 Total reward: 58.0 iter : 52671 P : 0.010
Episode: 717/1000 Total reward: 46.0 iter : 52717 P : 0.010
Episode: 718/1000 Total reward: 71.0 iter : 52788 P : 0.010
Episode: 719/1000 Total reward: 62.0 iter : 52850 P : 0.010
Episode: 720/1000 Total reward: 47.0 iter : 52897 P : 0.010
Episode: 721/1000 Total reward: 46.0 iter : 52943 P : 0.010
Episode: 722/1000 Total reward: 59.0 iter : 53002 P : 0.010
Episode: 723/1000 Total reward: 57.0 iter : 53059 P : 0.010
Episode: 724/1000 Total reward: 48.0 iter : 53107 P : 0.010
Episode: 725/1000 Total reward: 43.0 iter : 53150 P : 0.010
Episode: 726/1000 Total reward: 63.0 iter : 53213 P : 0.010
Episode: 727/1000 Total reward: 71.0 iter : 53284 P : 0.010
Episode: 728/1000 Total reward: 60.0 iter : 53344 P : 0.010
Episode: 729/1000 Total reward: 98.0 iter : 53442 P : 0.010
Episode: 730/1000 Total reward: 200.0 iter : 53642 P : 0.010
Episode: 731/1000 Total reward: 67.0 iter : 53709 P : 0.010
Episode: 732/1000 Total reward: 11.0 iter : 53720 P : 0.010
Episode: 733/1000 Total reward: 44.0 iter : 53764 P : 0.010
Episode: 734/1000 Total reward: 10.0 iter : 53774 P : 0.010
Episode: 735/1000 Total reward: 8.0 iter : 53782 P : 0.010
Episode: 736/1000 Total reward: 200.0 iter : 53982 P : 0.010
Episode: 737/1000 Total reward: 200.0 iter : 54182 P : 0.010
Episode: 738/1000 Total reward: 200.0 iter : 54382 P : 0.010
Episode: 739/1000 Total reward: 200.0 iter : 54582 P : 0.010
Episode: 740/1000 Total reward: 200.0 iter : 54782 P : 0.010
Episode: 741/1000 Total reward: 200.0 iter : 54982 P : 0.010
Episode: 742/1000 Total reward: 200.0 iter : 55182 P : 0.010
Episode: 743/1000 Total reward: 200.0 iter : 55382 P : 0.010
Episode: 744/1000 Total reward: 200.0 iter : 55582 P : 0.010
Episode: 745/1000 Total reward: 200.0 iter : 55782 P : 0.010
Episode: 746/1000 Total reward: 200.0 iter : 55982 P : 0.010
Episode: 747/1000 Total reward: 200.0 iter : 56182 P : 0.010
Episode: 748/1000 Total reward: 200.0 iter : 56382 P : 0.010
Episode: 749/1000 Total reward: 200.0 iter : 56582 P : 0.010
Episode: 750/1000 Total reward: 200.0 iter : 56782 P : 0.010
Episode: 751/1000 Total reward: 200.0 iter : 56982 P : 0.010
Episode: 752/1000 Total reward: 200.0 iter : 57182 P : 0.010
Episode: 753/1000 Total reward: 200.0 iter : 57382 P : 0.010
Episode: 754/1000 Total reward: 200.0 iter : 57582 P : 0.010
Episode: 755/1000 Total reward: 200.0 iter : 57782 P : 0.010
Episode: 756/1000 Total reward: 200.0 iter : 57982 P : 0.010
Episode: 757/1000 Total reward: 200.0 iter : 58182 P : 0.010
Episode: 758/1000 Total reward: 200.0 iter : 58382 P : 0.010
Episode: 759/1000 Total reward: 200.0 iter : 58582 P : 0.010
Episode: 760/1000 Total reward: 200.0 iter : 58782 P : 0.010
Episode: 761/1000 Total reward: 200.0 iter : 58982 P : 0.010
Episode: 762/1000 Total reward: 200.0 iter : 59182 P : 0.010
Episode: 763/1000 Total reward: 200.0 iter : 59382 P : 0.010
Episode: 764/1000 Total reward: 200.0 iter : 59582 P : 0.010
Episode: 765/1000 Total reward: 200.0 iter : 59782 P : 0.010
Episode: 766/1000 Total reward: 200.0 iter : 59982 P : 0.010
Episode: 767/1000 Total reward: 200.0 iter : 60182 P : 0.010
Episode: 768/1000 Total reward: 200.0 iter : 60382 P : 0.010
Episode: 769/1000 Total reward: 200.0 iter : 60582 P : 0.010
Episode: 770/1000 Total reward: 200.0 iter : 60782 P : 0.010
Episode: 771/1000 Total reward: 200.0 iter : 60982 P : 0.010
Episode: 772/1000 Total reward: 11.0 iter : 60993 P : 0.010
Episode: 773/1000 Total reward: 200.0 iter : 61193 P : 0.010
Episode: 774/1000 Total reward: 200.0 iter : 61393 P : 0.010
Episode: 775/1000 Total reward: 200.0 iter : 61593 P : 0.010
Episode: 776/1000 Total reward: 200.0 iter : 61793 P : 0.010
Episode: 777/1000 Total reward: 200.0 iter : 61993 P : 0.010
Episode: 778/1000 Total reward: 200.0 iter : 62193 P : 0.010
Episode: 779/1000 Total reward: 200.0 iter : 62393 P : 0.010
Episode: 780/1000 Total reward: 200.0 iter : 62593 P : 0.010
Episode: 781/1000 Total reward: 200.0 iter : 62793 P : 0.010
Episode: 782/1000 Total reward: 200.0 iter : 62993 P : 0.010
Episode: 783/1000 Total reward: 200.0 iter : 63193 P : 0.010
Episode: 784/1000 Total reward: 200.0 iter : 63393 P : 0.010
Episode: 785/1000 Total reward: 200.0 iter : 63593 P : 0.010
Episode: 786/1000 Total reward: 200.0 iter : 63793 P : 0.010
Episode: 787/1000 Total reward: 200.0 iter : 63993 P : 0.010
Episode: 788/1000 Total reward: 200.0 iter : 64193 P : 0.010
Episode: 789/1000 Total reward: 200.0 iter : 64393 P : 0.010
Episode: 790/1000 Total reward: 10.0 iter : 64403 P : 0.010
Episode: 791/1000 Total reward: 200.0 iter : 64603 P : 0.010
Episode: 792/1000 Total reward: 200.0 iter : 64803 P : 0.010
Episode: 793/1000 Total reward: 200.0 iter : 65003 P : 0.010
Episode: 794/1000 Total reward: 200.0 iter : 65203 P : 0.010
Episode: 795/1000 Total reward: 200.0 iter : 65403 P : 0.010
Episode: 796/1000 Total reward: 200.0 iter : 65603 P : 0.010
Episode: 797/1000 Total reward: 200.0 iter : 65803 P : 0.010
Episode: 798/1000 Total reward: 200.0 iter : 66003 P : 0.010
Episode: 799/1000 Total reward: 200.0 iter : 66203 P : 0.010
Episode: 800/1000 Total reward: 13.0 iter : 66216 P : 0.010
Episode: 801/1000 Total reward: 200.0 iter : 66416 P : 0.010
Episode: 802/1000 Total reward: 12.0 iter : 66428 P : 0.010
Episode: 803/1000 Total reward: 200.0 iter : 66628 P : 0.010
Episode: 804/1000 Total reward: 200.0 iter : 66828 P : 0.010
Episode: 805/1000 Total reward: 200.0 iter : 67028 P : 0.010
Episode: 806/1000 Total reward: 200.0 iter : 67228 P : 0.010
Episode: 807/1000 Total reward: 200.0 iter : 67428 P : 0.010
Episode: 808/1000 Total reward: 200.0 iter : 67628 P : 0.010
Episode: 809/1000 Total reward: 12.0 iter : 67640 P : 0.010
Episode: 810/1000 Total reward: 200.0 iter : 67840 P : 0.010
Episode: 811/1000 Total reward: 179.0 iter : 68019 P : 0.010
Episode: 812/1000 Total reward: 50.0 iter : 68069 P : 0.010
Episode: 813/1000 Total reward: 20.0 iter : 68089 P : 0.010
Episode: 814/1000 Total reward: 46.0 iter : 68135 P : 0.010
Episode: 815/1000 Total reward: 47.0 iter : 68182 P : 0.010
Episode: 816/1000 Total reward: 46.0 iter : 68228 P : 0.010
Episode: 817/1000 Total reward: 38.0 iter : 68266 P : 0.010
Episode: 818/1000 Total reward: 42.0 iter : 68308 P : 0.010
Episode: 819/1000 Total reward: 46.0 iter : 68354 P : 0.010
Episode: 820/1000 Total reward: 48.0 iter : 68402 P : 0.010
Episode: 821/1000 Total reward: 53.0 iter : 68455 P : 0.010
Episode: 822/1000 Total reward: 55.0 iter : 68510 P : 0.010
Episode: 823/1000 Total reward: 76.0 iter : 68586 P : 0.010
Episode: 824/1000 Total reward: 71.0 iter : 68657 P : 0.010
Episode: 825/1000 Total reward: 75.0 iter : 68732 P : 0.010
Episode: 826/1000 Total reward: 70.0 iter : 68802 P : 0.010
Episode: 827/1000 Total reward: 59.0 iter : 68861 P : 0.010
Episode: 828/1000 Total reward: 88.0 iter : 68949 P : 0.010
Episode: 829/1000 Total reward: 110.0 iter : 69059 P : 0.010
Episode: 830/1000 Total reward: 200.0 iter : 69259 P : 0.010
Episode: 831/1000 Total reward: 56.0 iter : 69315 P : 0.010
Episode: 832/1000 Total reward: 200.0 iter : 69515 P : 0.010
Episode: 833/1000 Total reward: 200.0 iter : 69715 P : 0.010
Episode: 834/1000 Total reward: 85.0 iter : 69800 P : 0.010
Episode: 835/1000 Total reward: 68.0 iter : 69868 P : 0.010
Episode: 836/1000 Total reward: 200.0 iter : 70068 P : 0.010
Episode: 837/1000 Total reward: 200.0 iter : 70268 P : 0.010
Episode: 838/1000 Total reward: 200.0 iter : 70468 P : 0.010
Episode: 839/1000 Total reward: 200.0 iter : 70668 P : 0.010
Episode: 840/1000 Total reward: 200.0 iter : 70868 P : 0.010
Episode: 841/1000 Total reward: 200.0 iter : 71068 P : 0.010
Episode: 842/1000 Total reward: 200.0 iter : 71268 P : 0.010
Episode: 843/1000 Total reward: 200.0 iter : 71468 P : 0.010
Episode: 844/1000 Total reward: 200.0 iter : 71668 P : 0.010
Episode: 845/1000 Total reward: 200.0 iter : 71868 P : 0.010
Episode: 846/1000 Total reward: 200.0 iter : 72068 P : 0.010
Episode: 847/1000 Total reward: 200.0 iter : 72268 P : 0.010
Episode: 848/1000 Total reward: 200.0 iter : 72468 P : 0.010
Episode: 849/1000 Total reward: 200.0 iter : 72668 P : 0.010
Episode: 850/1000 Total reward: 200.0 iter : 72868 P : 0.010
Episode: 851/1000 Total reward: 200.0 iter : 73068 P : 0.010
Episode: 852/1000 Total reward: 200.0 iter : 73268 P : 0.010
Episode: 853/1000 Total reward: 200.0 iter : 73468 P : 0.010
Episode: 854/1000 Total reward: 200.0 iter : 73668 P : 0.010
Episode: 855/1000 Total reward: 200.0 iter : 73868 P : 0.010
Episode: 856/1000 Total reward: 200.0 iter : 74068 P : 0.010
Episode: 857/1000 Total reward: 200.0 iter : 74268 P : 0.010
Episode: 858/1000 Total reward: 200.0 iter : 74468 P : 0.010
Episode: 859/1000 Total reward: 200.0 iter : 74668 P : 0.010
Episode: 860/1000 Total reward: 200.0 iter : 74868 P : 0.010
Episode: 861/1000 Total reward: 200.0 iter : 75068 P : 0.010
Episode: 862/1000 Total reward: 200.0 iter : 75268 P : 0.010
Episode: 863/1000 Total reward: 200.0 iter : 75468 P : 0.010
Episode: 864/1000 Total reward: 200.0 iter : 75668 P : 0.010
Episode: 865/1000 Total reward: 200.0 iter : 75868 P : 0.010
Episode: 866/1000 Total reward: 200.0 iter : 76068 P : 0.010
Episode: 867/1000 Total reward: 200.0 iter : 76268 P : 0.010
Episode: 868/1000 Total reward: 10.0 iter : 76278 P : 0.010
Episode: 869/1000 Total reward: 10.0 iter : 76288 P : 0.010
Episode: 870/1000 Total reward: 10.0 iter : 76298 P : 0.010
Episode: 871/1000 Total reward: 200.0 iter : 76498 P : 0.010
Episode: 872/1000 Total reward: 200.0 iter : 76698 P : 0.010
Episode: 873/1000 Total reward: 200.0 iter : 76898 P : 0.010
Episode: 874/1000 Total reward: 200.0 iter : 77098 P : 0.010
Episode: 875/1000 Total reward: 200.0 iter : 77298 P : 0.010
Episode: 876/1000 Total reward: 200.0 iter : 77498 P : 0.010
Episode: 877/1000 Total reward: 200.0 iter : 77698 P : 0.010
Episode: 878/1000 Total reward: 200.0 iter : 77898 P : 0.010
Episode: 879/1000 Total reward: 200.0 iter : 78098 P : 0.010
Episode: 880/1000 Total reward: 200.0 iter : 78298 P : 0.010
Episode: 881/1000 Total reward: 200.0 iter : 78498 P : 0.010
Episode: 882/1000 Total reward: 200.0 iter : 78698 P : 0.010
Episode: 883/1000 Total reward: 200.0 iter : 78898 P : 0.010
Episode: 884/1000 Total reward: 200.0 iter : 79098 P : 0.010
Episode: 885/1000 Total reward: 200.0 iter : 79298 P : 0.010
Episode: 886/1000 Total reward: 192.0 iter : 79490 P : 0.010
Episode: 887/1000 Total reward: 147.0 iter : 79637 P : 0.010
Episode: 888/1000 Total reward: 200.0 iter : 79837 P : 0.010
Episode: 889/1000 Total reward: 200.0 iter : 80037 P : 0.010
Episode: 890/1000 Total reward: 200.0 iter : 80237 P : 0.010
Episode: 891/1000 Total reward: 200.0 iter : 80437 P : 0.010
Episode: 892/1000 Total reward: 200.0 iter : 80637 P : 0.010
Episode: 893/1000 Total reward: 200.0 iter : 80837 P : 0.010
Episode: 894/1000 Total reward: 200.0 iter : 81037 P : 0.010
Episode: 895/1000 Total reward: 200.0 iter : 81237 P : 0.010
Episode: 896/1000 Total reward: 200.0 iter : 81437 P : 0.010
Episode: 897/1000 Total reward: 200.0 iter : 81637 P : 0.010
Episode: 898/1000 Total reward: 200.0 iter : 81837 P : 0.010
Episode: 899/1000 Total reward: 200.0 iter : 82037 P : 0.010
Episode: 900/1000 Total reward: 110.0 iter : 82147 P : 0.010
Episode: 901/1000 Total reward: 96.0 iter : 82243 P : 0.010
Episode: 902/1000 Total reward: 200.0 iter : 82443 P : 0.010
Episode: 903/1000 Total reward: 9.0 iter : 82452 P : 0.010
Episode: 904/1000 Total reward: 8.0 iter : 82460 P : 0.010
Episode: 905/1000 Total reward: 11.0 iter : 82471 P : 0.010
Episode: 906/1000 Total reward: 81.0 iter : 82552 P : 0.010
Episode: 907/1000 Total reward: 197.0 iter : 82749 P : 0.010
Episode: 908/1000 Total reward: 200.0 iter : 82949 P : 0.010
Episode: 909/1000 Total reward: 200.0 iter : 83149 P : 0.010
Episode: 910/1000 Total reward: 200.0 iter : 83349 P : 0.010
Episode: 911/1000 Total reward: 200.0 iter : 83549 P : 0.010
Episode: 912/1000 Total reward: 200.0 iter : 83749 P : 0.010
Episode: 913/1000 Total reward: 200.0 iter : 83949 P : 0.010
Episode: 914/1000 Total reward: 200.0 iter : 84149 P : 0.010
Episode: 915/1000 Total reward: 200.0 iter : 84349 P : 0.010
Episode: 916/1000 Total reward: 200.0 iter : 84549 P : 0.010
Episode: 917/1000 Total reward: 200.0 iter : 84749 P : 0.010
Episode: 918/1000 Total reward: 200.0 iter : 84949 P : 0.010
Episode: 919/1000 Total reward: 200.0 iter : 85149 P : 0.010
Episode: 920/1000 Total reward: 200.0 iter : 85349 P : 0.010
Episode: 921/1000 Total reward: 200.0 iter : 85549 P : 0.010
Episode: 922/1000 Total reward: 200.0 iter : 85749 P : 0.010
Episode: 923/1000 Total reward: 200.0 iter : 85949 P : 0.010
Episode: 924/1000 Total reward: 200.0 iter : 86149 P : 0.010
Episode: 925/1000 Total reward: 200.0 iter : 86349 P : 0.010
Episode: 926/1000 Total reward: 200.0 iter : 86549 P : 0.010
Episode: 927/1000 Total reward: 200.0 iter : 86749 P : 0.010
Episode: 928/1000 Total reward: 200.0 iter : 86949 P : 0.010
Episode: 929/1000 Total reward: 200.0 iter : 87149 P : 0.010
Episode: 930/1000 Total reward: 200.0 iter : 87349 P : 0.010
Episode: 931/1000 Total reward: 200.0 iter : 87549 P : 0.010
Episode: 932/1000 Total reward: 200.0 iter : 87749 P : 0.010
Episode: 933/1000 Total reward: 200.0 iter : 87949 P : 0.010
Episode: 934/1000 Total reward: 200.0 iter : 88149 P : 0.010
Episode: 935/1000 Total reward: 200.0 iter : 88349 P : 0.010
Episode: 936/1000 Total reward: 200.0 iter : 88549 P : 0.010
Episode: 937/1000 Total reward: 200.0 iter : 88749 P : 0.010
Episode: 938/1000 Total reward: 200.0 iter : 88949 P : 0.010
Episode: 939/1000 Total reward: 200.0 iter : 89149 P : 0.010
Episode: 940/1000 Total reward: 200.0 iter : 89349 P : 0.010
Episode: 941/1000 Total reward: 200.0 iter : 89549 P : 0.010
Episode: 942/1000 Total reward: 200.0 iter : 89749 P : 0.010
Episode: 943/1000 Total reward: 200.0 iter : 89949 P : 0.010
Episode: 944/1000 Total reward: 200.0 iter : 90149 P : 0.010
Episode: 945/1000 Total reward: 200.0 iter : 90349 P : 0.010
Episode: 946/1000 Total reward: 200.0 iter : 90549 P : 0.010
Episode: 947/1000 Total reward: 200.0 iter : 90749 P : 0.010
Episode: 948/1000 Total reward: 200.0 iter : 90949 P : 0.010
Episode: 949/1000 Total reward: 200.0 iter : 91149 P : 0.010
Episode: 950/1000 Total reward: 200.0 iter : 91349 P : 0.010
Episode: 951/1000 Total reward: 200.0 iter : 91549 P : 0.010
Episode: 952/1000 Total reward: 200.0 iter : 91749 P : 0.010
Episode: 953/1000 Total reward: 131.0 iter : 91880 P : 0.010
Episode: 954/1000 Total reward: 8.0 iter : 91888 P : 0.010
Episode: 955/1000 Total reward: 10.0 iter : 91898 P : 0.010
Episode: 956/1000 Total reward: 69.0 iter : 91967 P : 0.010
Episode: 957/1000 Total reward: 109.0 iter : 92076 P : 0.010
Episode: 958/1000 Total reward: 140.0 iter : 92216 P : 0.010
Episode: 959/1000 Total reward: 164.0 iter : 92380 P : 0.010
Episode: 960/1000 Total reward: 160.0 iter : 92540 P : 0.010
Episode: 961/1000 Total reward: 200.0 iter : 92740 P : 0.010
Episode: 962/1000 Total reward: 64.0 iter : 92804 P : 0.010
Episode: 963/1000 Total reward: 76.0 iter : 92880 P : 0.010
Episode: 964/1000 Total reward: 150.0 iter : 93030 P : 0.010
Episode: 965/1000 Total reward: 200.0 iter : 93230 P : 0.010
Episode: 966/1000 Total reward: 116.0 iter : 93346 P : 0.010
Episode: 967/1000 Total reward: 200.0 iter : 93546 P : 0.010
Episode: 968/1000 Total reward: 200.0 iter : 93746 P : 0.010
Episode: 969/1000 Total reward: 200.0 iter : 93946 P : 0.010
Episode: 970/1000 Total reward: 200.0 iter : 94146 P : 0.010
Episode: 971/1000 Total reward: 200.0 iter : 94346 P : 0.010
Episode: 972/1000 Total reward: 200.0 iter : 94546 P : 0.010
Episode: 973/1000 Total reward: 184.0 iter : 94730 P : 0.010
Episode: 974/1000 Total reward: 146.0 iter : 94876 P : 0.010
Episode: 975/1000 Total reward: 200.0 iter : 95076 P : 0.010
Episode: 976/1000 Total reward: 200.0 iter : 95276 P : 0.010
Episode: 977/1000 Total reward: 200.0 iter : 95476 P : 0.010
Episode: 978/1000 Total reward: 200.0 iter : 95676 P : 0.010
Episode: 979/1000 Total reward: 145.0 iter : 95821 P : 0.010
Episode: 980/1000 Total reward: 200.0 iter : 96021 P : 0.010
Episode: 981/1000 Total reward: 144.0 iter : 96165 P : 0.010
Episode: 982/1000 Total reward: 200.0 iter : 96365 P : 0.010
Episode: 983/1000 Total reward: 163.0 iter : 96528 P : 0.010
Episode: 984/1000 Total reward: 200.0 iter : 96728 P : 0.010
Episode: 985/1000 Total reward: 200.0 iter : 96928 P : 0.010
Episode: 986/1000 Total reward: 200.0 iter : 97128 P : 0.010
Episode: 987/1000 Total reward: 200.0 iter : 97328 P : 0.010
Episode: 988/1000 Total reward: 200.0 iter : 97528 P : 0.010
Episode: 989/1000 Total reward: 171.0 iter : 97699 P : 0.010
Episode: 990/1000 Total reward: 176.0 iter : 97875 P : 0.010
Episode: 991/1000 Total reward: 200.0 iter : 98075 P : 0.010
Episode: 992/1000 Total reward: 200.0 iter : 98275 P : 0.010
Episode: 993/1000 Total reward: 200.0 iter : 98475 P : 0.010
Episode: 994/1000 Total reward: 14.0 iter : 98489 P : 0.010
Episode: 995/1000 Total reward: 11.0 iter : 98500 P : 0.010
Episode: 996/1000 Total reward: 71.0 iter : 98571 P : 0.010
Episode: 997/1000 Total reward: 11.0 iter : 98582 P : 0.010
Episode: 998/1000 Total reward: 10.0 iter : 98592 P : 0.010
Episode: 999/1000 Total reward: 12.0 iter : 98604 P : 0.010
In [8]:
DQN = QNetwork(learning_rate = 0.001, use_replay_memory=False, reshape_reward=True)
DQN.train(train_episodes_ovr=1000)
DQN.save_stats("DQN_without_memory_dense_reward.p")
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 16)                80        
_________________________________________________________________
dense_2 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 34        
=================================================================
Total params: 386
Trainable params: 386
Non-trainable params: 0
_________________________________________________________________
Episode: 0/1000 Total reward: 12.0 iter : 12 P : 0.998
Episode: 1/1000 Total reward: 29.0 iter : 41 P : 0.992
Episode: 2/1000 Total reward: 46.0 iter : 87 P : 0.983
Episode: 3/1000 Total reward: 23.0 iter : 110 P : 0.978
Episode: 4/1000 Total reward: 30.0 iter : 140 P : 0.973
Episode: 5/1000 Total reward: 41.0 iter : 181 P : 0.965
Episode: 6/1000 Total reward: 27.0 iter : 208 P : 0.960
Episode: 7/1000 Total reward: 42.0 iter : 250 P : 0.952
Episode: 8/1000 Total reward: 27.0 iter : 277 P : 0.947
Episode: 9/1000 Total reward: 50.0 iter : 327 P : 0.937
Episode: 10/1000 Total reward: 19.0 iter : 346 P : 0.934
Episode: 11/1000 Total reward: 17.0 iter : 363 P : 0.931
Episode: 12/1000 Total reward: 23.0 iter : 386 P : 0.926
Episode: 13/1000 Total reward: 21.0 iter : 407 P : 0.923
Episode: 14/1000 Total reward: 15.0 iter : 422 P : 0.920
Episode: 15/1000 Total reward: 18.0 iter : 440 P : 0.917
Episode: 16/1000 Total reward: 17.0 iter : 457 P : 0.914
Episode: 17/1000 Total reward: 11.0 iter : 468 P : 0.912
Episode: 18/1000 Total reward: 40.0 iter : 508 P : 0.904
Episode: 19/1000 Total reward: 28.0 iter : 536 P : 0.899
Episode: 20/1000 Total reward: 14.0 iter : 550 P : 0.897
Episode: 21/1000 Total reward: 17.0 iter : 567 P : 0.894
Episode: 22/1000 Total reward: 11.0 iter : 578 P : 0.892
Episode: 23/1000 Total reward: 17.0 iter : 595 P : 0.889
Episode: 24/1000 Total reward: 16.0 iter : 611 P : 0.886
Episode: 25/1000 Total reward: 11.0 iter : 622 P : 0.884
Episode: 26/1000 Total reward: 16.0 iter : 638 P : 0.881
Episode: 27/1000 Total reward: 26.0 iter : 664 P : 0.877
Episode: 28/1000 Total reward: 14.0 iter : 678 P : 0.874
Episode: 29/1000 Total reward: 25.0 iter : 703 P : 0.870
Episode: 30/1000 Total reward: 40.0 iter : 743 P : 0.863
Episode: 31/1000 Total reward: 20.0 iter : 763 P : 0.860
Episode: 32/1000 Total reward: 19.0 iter : 782 P : 0.857
Episode: 33/1000 Total reward: 12.0 iter : 794 P : 0.855
Episode: 34/1000 Total reward: 14.0 iter : 808 P : 0.852
Episode: 35/1000 Total reward: 10.0 iter : 818 P : 0.851
Episode: 36/1000 Total reward: 14.0 iter : 832 P : 0.848
Episode: 37/1000 Total reward: 39.0 iter : 871 P : 0.842
Episode: 38/1000 Total reward: 22.0 iter : 893 P : 0.838
Episode: 39/1000 Total reward: 13.0 iter : 906 P : 0.836
Episode: 40/1000 Total reward: 22.0 iter : 928 P : 0.832
Episode: 41/1000 Total reward: 11.0 iter : 939 P : 0.830
Episode: 42/1000 Total reward: 16.0 iter : 955 P : 0.828
Episode: 43/1000 Total reward: 11.0 iter : 966 P : 0.826
Episode: 44/1000 Total reward: 12.0 iter : 978 P : 0.824
Episode: 45/1000 Total reward: 18.0 iter : 996 P : 0.821
Episode: 46/1000 Total reward: 19.0 iter : 1015 P : 0.818
Episode: 47/1000 Total reward: 19.0 iter : 1034 P : 0.815
Episode: 48/1000 Total reward: 11.0 iter : 1045 P : 0.813
Episode: 49/1000 Total reward: 33.0 iter : 1078 P : 0.808
Episode: 50/1000 Total reward: 18.0 iter : 1096 P : 0.805
Episode: 51/1000 Total reward: 20.0 iter : 1116 P : 0.802
Episode: 52/1000 Total reward: 15.0 iter : 1131 P : 0.800
Episode: 53/1000 Total reward: 15.0 iter : 1146 P : 0.797
Episode: 54/1000 Total reward: 13.0 iter : 1159 P : 0.795
Episode: 55/1000 Total reward: 17.0 iter : 1176 P : 0.793
Episode: 56/1000 Total reward: 20.0 iter : 1196 P : 0.789
Episode: 57/1000 Total reward: 13.0 iter : 1209 P : 0.787
Episode: 58/1000 Total reward: 13.0 iter : 1222 P : 0.785
Episode: 59/1000 Total reward: 22.0 iter : 1244 P : 0.782
Episode: 60/1000 Total reward: 25.0 iter : 1269 P : 0.778
Episode: 61/1000 Total reward: 38.0 iter : 1307 P : 0.772
Episode: 62/1000 Total reward: 15.0 iter : 1322 P : 0.770
Episode: 63/1000 Total reward: 20.0 iter : 1342 P : 0.767
Episode: 64/1000 Total reward: 13.0 iter : 1355 P : 0.765
Episode: 65/1000 Total reward: 35.0 iter : 1390 P : 0.760
Episode: 66/1000 Total reward: 17.0 iter : 1407 P : 0.757
Episode: 67/1000 Total reward: 47.0 iter : 1454 P : 0.750
Episode: 68/1000 Total reward: 18.0 iter : 1472 P : 0.748
Episode: 69/1000 Total reward: 14.0 iter : 1486 P : 0.745
Episode: 70/1000 Total reward: 22.0 iter : 1508 P : 0.742
Episode: 71/1000 Total reward: 32.0 iter : 1540 P : 0.738
Episode: 72/1000 Total reward: 20.0 iter : 1560 P : 0.735
Episode: 73/1000 Total reward: 47.0 iter : 1607 P : 0.728
Episode: 74/1000 Total reward: 50.0 iter : 1657 P : 0.721
Episode: 75/1000 Total reward: 43.0 iter : 1700 P : 0.715
Episode: 76/1000 Total reward: 26.0 iter : 1726 P : 0.711
Episode: 77/1000 Total reward: 50.0 iter : 1776 P : 0.704
Episode: 78/1000 Total reward: 34.0 iter : 1810 P : 0.699
Episode: 79/1000 Total reward: 75.0 iter : 1885 P : 0.689
Episode: 80/1000 Total reward: 23.0 iter : 1908 P : 0.686
Episode: 81/1000 Total reward: 42.0 iter : 1950 P : 0.680
Episode: 82/1000 Total reward: 43.0 iter : 1993 P : 0.675
Episode: 83/1000 Total reward: 76.0 iter : 2069 P : 0.665
Episode: 84/1000 Total reward: 18.0 iter : 2087 P : 0.662
Episode: 85/1000 Total reward: 23.0 iter : 2110 P : 0.659
Episode: 86/1000 Total reward: 31.0 iter : 2141 P : 0.655
Episode: 87/1000 Total reward: 14.0 iter : 2155 P : 0.653
Episode: 88/1000 Total reward: 14.0 iter : 2169 P : 0.652
Episode: 89/1000 Total reward: 54.0 iter : 2223 P : 0.645
Episode: 90/1000 Total reward: 35.0 iter : 2258 P : 0.640
Episode: 91/1000 Total reward: 106.0 iter : 2364 P : 0.627
Episode: 92/1000 Total reward: 51.0 iter : 2415 P : 0.621
Episode: 93/1000 Total reward: 13.0 iter : 2428 P : 0.619
Episode: 94/1000 Total reward: 13.0 iter : 2441 P : 0.618
Episode: 95/1000 Total reward: 70.0 iter : 2511 P : 0.609
Episode: 96/1000 Total reward: 31.0 iter : 2542 P : 0.605
Episode: 97/1000 Total reward: 41.0 iter : 2583 P : 0.601
Episode: 98/1000 Total reward: 40.0 iter : 2623 P : 0.596
Episode: 99/1000 Total reward: 36.0 iter : 2659 P : 0.592
Episode: 100/1000 Total reward: 40.0 iter : 2699 P : 0.587
Episode: 101/1000 Total reward: 39.0 iter : 2738 P : 0.583
Episode: 102/1000 Total reward: 105.0 iter : 2843 P : 0.571
Episode: 103/1000 Total reward: 43.0 iter : 2886 P : 0.566
Episode: 104/1000 Total reward: 31.0 iter : 2917 P : 0.562
Episode: 105/1000 Total reward: 24.0 iter : 2941 P : 0.560
Episode: 106/1000 Total reward: 42.0 iter : 2983 P : 0.555
Episode: 107/1000 Total reward: 12.0 iter : 2995 P : 0.554
Episode: 108/1000 Total reward: 55.0 iter : 3050 P : 0.548
Episode: 109/1000 Total reward: 73.0 iter : 3123 P : 0.540
Episode: 110/1000 Total reward: 30.0 iter : 3153 P : 0.537
Episode: 111/1000 Total reward: 66.0 iter : 3219 P : 0.530
Episode: 112/1000 Total reward: 43.0 iter : 3262 P : 0.526
Episode: 113/1000 Total reward: 53.0 iter : 3315 P : 0.520
Episode: 114/1000 Total reward: 126.0 iter : 3441 P : 0.507
Episode: 115/1000 Total reward: 62.0 iter : 3503 P : 0.501
Episode: 116/1000 Total reward: 32.0 iter : 3535 P : 0.498
Episode: 117/1000 Total reward: 16.0 iter : 3551 P : 0.497
Episode: 118/1000 Total reward: 24.0 iter : 3575 P : 0.494
Episode: 119/1000 Total reward: 31.0 iter : 3606 P : 0.491
Episode: 120/1000 Total reward: 200.0 iter : 3806 P : 0.472
Episode: 121/1000 Total reward: 16.0 iter : 3822 P : 0.471
Episode: 122/1000 Total reward: 19.0 iter : 3841 P : 0.469
Episode: 123/1000 Total reward: 38.0 iter : 3879 P : 0.466
Episode: 124/1000 Total reward: 14.0 iter : 3893 P : 0.464
Episode: 125/1000 Total reward: 10.0 iter : 3903 P : 0.464
Episode: 126/1000 Total reward: 180.0 iter : 4083 P : 0.448
Episode: 127/1000 Total reward: 62.0 iter : 4145 P : 0.442
Episode: 128/1000 Total reward: 99.0 iter : 4244 P : 0.434
Episode: 129/1000 Total reward: 100.0 iter : 4344 P : 0.425
Episode: 130/1000 Total reward: 19.0 iter : 4363 P : 0.424
Episode: 131/1000 Total reward: 13.0 iter : 4376 P : 0.423
Episode: 132/1000 Total reward: 32.0 iter : 4408 P : 0.420
Episode: 133/1000 Total reward: 46.0 iter : 4454 P : 0.416
Episode: 134/1000 Total reward: 70.0 iter : 4524 P : 0.411
Episode: 135/1000 Total reward: 103.0 iter : 4627 P : 0.402
Episode: 136/1000 Total reward: 62.0 iter : 4689 P : 0.398
Episode: 137/1000 Total reward: 43.0 iter : 4732 P : 0.394
Episode: 138/1000 Total reward: 41.0 iter : 4773 P : 0.391
Episode: 139/1000 Total reward: 76.0 iter : 4849 P : 0.385
Episode: 140/1000 Total reward: 189.0 iter : 5038 P : 0.371
Episode: 141/1000 Total reward: 62.0 iter : 5100 P : 0.367
Episode: 142/1000 Total reward: 138.0 iter : 5238 P : 0.357
Episode: 143/1000 Total reward: 68.0 iter : 5306 P : 0.353
Episode: 144/1000 Total reward: 23.0 iter : 5329 P : 0.351
Episode: 145/1000 Total reward: 47.0 iter : 5376 P : 0.348
Episode: 146/1000 Total reward: 92.0 iter : 5468 P : 0.342
Episode: 147/1000 Total reward: 39.0 iter : 5507 P : 0.339
Episode: 148/1000 Total reward: 22.0 iter : 5529 P : 0.338
Episode: 149/1000 Total reward: 37.0 iter : 5566 P : 0.335
Episode: 150/1000 Total reward: 169.0 iter : 5735 P : 0.324
Episode: 151/1000 Total reward: 74.0 iter : 5809 P : 0.320
Episode: 152/1000 Total reward: 85.0 iter : 5894 P : 0.315
Episode: 153/1000 Total reward: 154.0 iter : 6048 P : 0.305
Episode: 154/1000 Total reward: 73.0 iter : 6121 P : 0.301
Episode: 155/1000 Total reward: 33.0 iter : 6154 P : 0.299
Episode: 156/1000 Total reward: 59.0 iter : 6213 P : 0.296
Episode: 157/1000 Total reward: 11.0 iter : 6224 P : 0.295
Episode: 158/1000 Total reward: 52.0 iter : 6276 P : 0.292
Episode: 159/1000 Total reward: 11.0 iter : 6287 P : 0.292
Episode: 160/1000 Total reward: 18.0 iter : 6305 P : 0.291
Episode: 161/1000 Total reward: 58.0 iter : 6363 P : 0.287
Episode: 162/1000 Total reward: 20.0 iter : 6383 P : 0.286
Episode: 163/1000 Total reward: 19.0 iter : 6402 P : 0.285
Episode: 164/1000 Total reward: 90.0 iter : 6492 P : 0.280
Episode: 165/1000 Total reward: 92.0 iter : 6584 P : 0.275
Episode: 166/1000 Total reward: 44.0 iter : 6628 P : 0.273
Episode: 167/1000 Total reward: 52.0 iter : 6680 P : 0.270
Episode: 168/1000 Total reward: 87.0 iter : 6767 P : 0.266
Episode: 169/1000 Total reward: 61.0 iter : 6828 P : 0.263
Episode: 170/1000 Total reward: 77.0 iter : 6905 P : 0.259
Episode: 171/1000 Total reward: 96.0 iter : 7001 P : 0.254
Episode: 172/1000 Total reward: 96.0 iter : 7097 P : 0.249
Episode: 173/1000 Total reward: 143.0 iter : 7240 P : 0.243
Episode: 174/1000 Total reward: 36.0 iter : 7276 P : 0.241
Episode: 175/1000 Total reward: 131.0 iter : 7407 P : 0.235
Episode: 176/1000 Total reward: 176.0 iter : 7583 P : 0.227
Episode: 177/1000 Total reward: 29.0 iter : 7612 P : 0.226
Episode: 178/1000 Total reward: 92.0 iter : 7704 P : 0.222
Episode: 179/1000 Total reward: 84.0 iter : 7788 P : 0.219
Episode: 180/1000 Total reward: 135.0 iter : 7923 P : 0.213
Episode: 181/1000 Total reward: 26.0 iter : 7949 P : 0.212
Episode: 182/1000 Total reward: 16.0 iter : 7965 P : 0.211
Episode: 183/1000 Total reward: 22.0 iter : 7987 P : 0.210
Episode: 184/1000 Total reward: 58.0 iter : 8045 P : 0.208
Episode: 185/1000 Total reward: 124.0 iter : 8169 P : 0.203
Episode: 186/1000 Total reward: 112.0 iter : 8281 P : 0.199
Episode: 187/1000 Total reward: 120.0 iter : 8401 P : 0.194
Episode: 188/1000 Total reward: 196.0 iter : 8597 P : 0.187
Episode: 189/1000 Total reward: 77.0 iter : 8674 P : 0.185
Episode: 190/1000 Total reward: 109.0 iter : 8783 P : 0.181
Episode: 191/1000 Total reward: 200.0 iter : 8983 P : 0.174
Episode: 192/1000 Total reward: 70.0 iter : 9053 P : 0.172
Episode: 193/1000 Total reward: 90.0 iter : 9143 P : 0.169
Episode: 194/1000 Total reward: 97.0 iter : 9240 P : 0.166
Episode: 195/1000 Total reward: 95.0 iter : 9335 P : 0.163
Episode: 196/1000 Total reward: 163.0 iter : 9498 P : 0.158
Episode: 197/1000 Total reward: 113.0 iter : 9611 P : 0.155
Episode: 198/1000 Total reward: 76.0 iter : 9687 P : 0.153
Episode: 199/1000 Total reward: 130.0 iter : 9817 P : 0.149
Episode: 200/1000 Total reward: 89.0 iter : 9906 P : 0.147
Episode: 201/1000 Total reward: 99.0 iter : 10005 P : 0.144
Episode: 202/1000 Total reward: 105.0 iter : 10110 P : 0.141
Episode: 203/1000 Total reward: 108.0 iter : 10218 P : 0.138
Episode: 204/1000 Total reward: 120.0 iter : 10338 P : 0.135
Episode: 205/1000 Total reward: 110.0 iter : 10448 P : 0.132
Episode: 206/1000 Total reward: 200.0 iter : 10648 P : 0.128
Episode: 207/1000 Total reward: 22.0 iter : 10670 P : 0.127
Episode: 208/1000 Total reward: 46.0 iter : 10716 P : 0.126
Episode: 209/1000 Total reward: 11.0 iter : 10727 P : 0.126
Episode: 210/1000 Total reward: 12.0 iter : 10739 P : 0.126
Episode: 211/1000 Total reward: 13.0 iter : 10752 P : 0.125
Episode: 212/1000 Total reward: 18.0 iter : 10770 P : 0.125
Episode: 213/1000 Total reward: 25.0 iter : 10795 P : 0.124
Episode: 214/1000 Total reward: 127.0 iter : 10922 P : 0.121
Episode: 215/1000 Total reward: 85.0 iter : 11007 P : 0.120
Episode: 216/1000 Total reward: 105.0 iter : 11112 P : 0.117
Episode: 217/1000 Total reward: 86.0 iter : 11198 P : 0.115
Episode: 218/1000 Total reward: 102.0 iter : 11300 P : 0.113
Episode: 219/1000 Total reward: 114.0 iter : 11414 P : 0.111
Episode: 220/1000 Total reward: 87.0 iter : 11501 P : 0.109
Episode: 221/1000 Total reward: 99.0 iter : 11600 P : 0.107
Episode: 222/1000 Total reward: 85.0 iter : 11685 P : 0.106
Episode: 223/1000 Total reward: 137.0 iter : 11822 P : 0.103
Episode: 224/1000 Total reward: 114.0 iter : 11936 P : 0.101
Episode: 225/1000 Total reward: 129.0 iter : 12065 P : 0.099
Episode: 226/1000 Total reward: 110.0 iter : 12175 P : 0.097
Episode: 227/1000 Total reward: 200.0 iter : 12375 P : 0.093
Episode: 228/1000 Total reward: 160.0 iter : 12535 P : 0.091
Episode: 229/1000 Total reward: 200.0 iter : 12735 P : 0.088
Episode: 230/1000 Total reward: 200.0 iter : 12935 P : 0.084
Episode: 231/1000 Total reward: 200.0 iter : 13135 P : 0.082
Episode: 232/1000 Total reward: 89.0 iter : 13224 P : 0.080
Episode: 233/1000 Total reward: 52.0 iter : 13276 P : 0.080
Episode: 234/1000 Total reward: 9.0 iter : 13285 P : 0.079
Episode: 235/1000 Total reward: 8.0 iter : 13293 P : 0.079
Episode: 236/1000 Total reward: 11.0 iter : 13304 P : 0.079
Episode: 237/1000 Total reward: 56.0 iter : 13360 P : 0.078
Episode: 238/1000 Total reward: 200.0 iter : 13560 P : 0.076
Episode: 239/1000 Total reward: 200.0 iter : 13760 P : 0.073
Episode: 240/1000 Total reward: 200.0 iter : 13960 P : 0.071
Episode: 241/1000 Total reward: 198.0 iter : 14158 P : 0.068
Episode: 242/1000 Total reward: 163.0 iter : 14321 P : 0.066
Episode: 243/1000 Total reward: 175.0 iter : 14496 P : 0.065
Episode: 244/1000 Total reward: 200.0 iter : 14696 P : 0.062
Episode: 245/1000 Total reward: 200.0 iter : 14896 P : 0.060
Episode: 246/1000 Total reward: 53.0 iter : 14949 P : 0.060
Episode: 247/1000 Total reward: 64.0 iter : 15013 P : 0.059
Episode: 248/1000 Total reward: 46.0 iter : 15059 P : 0.059
Episode: 249/1000 Total reward: 55.0 iter : 15114 P : 0.058
Episode: 250/1000 Total reward: 71.0 iter : 15185 P : 0.057
Episode: 251/1000 Total reward: 101.0 iter : 15286 P : 0.057
Episode: 252/1000 Total reward: 73.0 iter : 15359 P : 0.056
Episode: 253/1000 Total reward: 81.0 iter : 15440 P : 0.055
Episode: 254/1000 Total reward: 95.0 iter : 15535 P : 0.054
Episode: 255/1000 Total reward: 131.0 iter : 15666 P : 0.053
Episode: 256/1000 Total reward: 148.0 iter : 15814 P : 0.052
Episode: 257/1000 Total reward: 160.0 iter : 15974 P : 0.051
Episode: 258/1000 Total reward: 113.0 iter : 16087 P : 0.050
Episode: 259/1000 Total reward: 107.0 iter : 16194 P : 0.049
Episode: 260/1000 Total reward: 86.0 iter : 16280 P : 0.048
Episode: 261/1000 Total reward: 91.0 iter : 16371 P : 0.047
Episode: 262/1000 Total reward: 91.0 iter : 16462 P : 0.047
Episode: 263/1000 Total reward: 82.0 iter : 16544 P : 0.046
Episode: 264/1000 Total reward: 95.0 iter : 16639 P : 0.046
Episode: 265/1000 Total reward: 78.0 iter : 16717 P : 0.045
Episode: 266/1000 Total reward: 79.0 iter : 16796 P : 0.044
Episode: 267/1000 Total reward: 78.0 iter : 16874 P : 0.044
Episode: 268/1000 Total reward: 91.0 iter : 16965 P : 0.043
Episode: 269/1000 Total reward: 87.0 iter : 17052 P : 0.043
Episode: 270/1000 Total reward: 94.0 iter : 17146 P : 0.042
Episode: 271/1000 Total reward: 76.0 iter : 17222 P : 0.042
Episode: 272/1000 Total reward: 111.0 iter : 17333 P : 0.041
Episode: 273/1000 Total reward: 89.0 iter : 17422 P : 0.040
Episode: 274/1000 Total reward: 81.0 iter : 17503 P : 0.040
Episode: 275/1000 Total reward: 76.0 iter : 17579 P : 0.039
Episode: 276/1000 Total reward: 72.0 iter : 17651 P : 0.039
Episode: 277/1000 Total reward: 84.0 iter : 17735 P : 0.039
Episode: 278/1000 Total reward: 84.0 iter : 17819 P : 0.038
Episode: 279/1000 Total reward: 70.0 iter : 17889 P : 0.038
Episode: 280/1000 Total reward: 75.0 iter : 17964 P : 0.037
Episode: 281/1000 Total reward: 83.0 iter : 18047 P : 0.037
Episode: 282/1000 Total reward: 71.0 iter : 18118 P : 0.036
Episode: 283/1000 Total reward: 82.0 iter : 18200 P : 0.036
Episode: 284/1000 Total reward: 81.0 iter : 18281 P : 0.036
Episode: 285/1000 Total reward: 69.0 iter : 18350 P : 0.035
Episode: 286/1000 Total reward: 99.0 iter : 18449 P : 0.035
Episode: 287/1000 Total reward: 70.0 iter : 18519 P : 0.034
Episode: 288/1000 Total reward: 82.0 iter : 18601 P : 0.034
Episode: 289/1000 Total reward: 76.0 iter : 18677 P : 0.034
Episode: 290/1000 Total reward: 88.0 iter : 18765 P : 0.033
Episode: 291/1000 Total reward: 78.0 iter : 18843 P : 0.033
Episode: 292/1000 Total reward: 76.0 iter : 18919 P : 0.033
Episode: 293/1000 Total reward: 104.0 iter : 19023 P : 0.032
Episode: 294/1000 Total reward: 118.0 iter : 19141 P : 0.032
Episode: 295/1000 Total reward: 97.0 iter : 19238 P : 0.031
Episode: 296/1000 Total reward: 83.0 iter : 19321 P : 0.031
Episode: 297/1000 Total reward: 118.0 iter : 19439 P : 0.030
Episode: 298/1000 Total reward: 123.0 iter : 19562 P : 0.030
Episode: 299/1000 Total reward: 84.0 iter : 19646 P : 0.029
Episode: 300/1000 Total reward: 103.0 iter : 19749 P : 0.029
Episode: 301/1000 Total reward: 200.0 iter : 19949 P : 0.028
Episode: 302/1000 Total reward: 153.0 iter : 20102 P : 0.028
Episode: 303/1000 Total reward: 167.0 iter : 20269 P : 0.027
Episode: 304/1000 Total reward: 10.0 iter : 20279 P : 0.027
Episode: 305/1000 Total reward: 9.0 iter : 20288 P : 0.027
Episode: 306/1000 Total reward: 9.0 iter : 20297 P : 0.027
Episode: 307/1000 Total reward: 13.0 iter : 20310 P : 0.027
Episode: 308/1000 Total reward: 89.0 iter : 20399 P : 0.027
Episode: 309/1000 Total reward: 9.0 iter : 20408 P : 0.027
Episode: 310/1000 Total reward: 9.0 iter : 20417 P : 0.027
Episode: 311/1000 Total reward: 10.0 iter : 20427 P : 0.027
Episode: 312/1000 Total reward: 10.0 iter : 20437 P : 0.027
Episode: 313/1000 Total reward: 166.0 iter : 20603 P : 0.026
Episode: 314/1000 Total reward: 200.0 iter : 20803 P : 0.025
Episode: 315/1000 Total reward: 200.0 iter : 21003 P : 0.025
Episode: 316/1000 Total reward: 200.0 iter : 21203 P : 0.024
Episode: 317/1000 Total reward: 200.0 iter : 21403 P : 0.024
Episode: 318/1000 Total reward: 200.0 iter : 21603 P : 0.023
Episode: 319/1000 Total reward: 200.0 iter : 21803 P : 0.023
Episode: 320/1000 Total reward: 200.0 iter : 22003 P : 0.022
Episode: 321/1000 Total reward: 200.0 iter : 22203 P : 0.022
Episode: 322/1000 Total reward: 194.0 iter : 22397 P : 0.021
Episode: 323/1000 Total reward: 176.0 iter : 22573 P : 0.021
Episode: 324/1000 Total reward: 126.0 iter : 22699 P : 0.021
Episode: 325/1000 Total reward: 128.0 iter : 22827 P : 0.020
Episode: 326/1000 Total reward: 21.0 iter : 22848 P : 0.020
Episode: 327/1000 Total reward: 14.0 iter : 22862 P : 0.020
Episode: 328/1000 Total reward: 14.0 iter : 22876 P : 0.020
Episode: 329/1000 Total reward: 13.0 iter : 22889 P : 0.020
Episode: 330/1000 Total reward: 15.0 iter : 22904 P : 0.020
Episode: 331/1000 Total reward: 176.0 iter : 23080 P : 0.020
Episode: 332/1000 Total reward: 200.0 iter : 23280 P : 0.019
Episode: 333/1000 Total reward: 200.0 iter : 23480 P : 0.019
Episode: 334/1000 Total reward: 200.0 iter : 23680 P : 0.019
Episode: 335/1000 Total reward: 200.0 iter : 23880 P : 0.018
Episode: 336/1000 Total reward: 200.0 iter : 24080 P : 0.018
Episode: 337/1000 Total reward: 163.0 iter : 24243 P : 0.018
Episode: 338/1000 Total reward: 183.0 iter : 24426 P : 0.017
Episode: 339/1000 Total reward: 130.0 iter : 24556 P : 0.017
Episode: 340/1000 Total reward: 20.0 iter : 24576 P : 0.017
Episode: 341/1000 Total reward: 21.0 iter : 24597 P : 0.017
Episode: 342/1000 Total reward: 154.0 iter : 24751 P : 0.017
Episode: 343/1000 Total reward: 200.0 iter : 24951 P : 0.017
Episode: 344/1000 Total reward: 200.0 iter : 25151 P : 0.016
Episode: 345/1000 Total reward: 200.0 iter : 25351 P : 0.016
Episode: 346/1000 Total reward: 200.0 iter : 25551 P : 0.016
Episode: 347/1000 Total reward: 178.0 iter : 25729 P : 0.016
Episode: 348/1000 Total reward: 117.0 iter : 25846 P : 0.016
Episode: 349/1000 Total reward: 13.0 iter : 25859 P : 0.016
Episode: 350/1000 Total reward: 200.0 iter : 26059 P : 0.015
Episode: 351/1000 Total reward: 200.0 iter : 26259 P : 0.015
Episode: 352/1000 Total reward: 200.0 iter : 26459 P : 0.015
Episode: 353/1000 Total reward: 192.0 iter : 26651 P : 0.015
Episode: 354/1000 Total reward: 186.0 iter : 26837 P : 0.015
Episode: 355/1000 Total reward: 200.0 iter : 27037 P : 0.014
Episode: 356/1000 Total reward: 200.0 iter : 27237 P : 0.014
Episode: 357/1000 Total reward: 190.0 iter : 27427 P : 0.014
Episode: 358/1000 Total reward: 200.0 iter : 27627 P : 0.014
Episode: 359/1000 Total reward: 200.0 iter : 27827 P : 0.014
Episode: 360/1000 Total reward: 200.0 iter : 28027 P : 0.014
Episode: 361/1000 Total reward: 200.0 iter : 28227 P : 0.013
Episode: 362/1000 Total reward: 182.0 iter : 28409 P : 0.013
Episode: 363/1000 Total reward: 11.0 iter : 28420 P : 0.013
Episode: 364/1000 Total reward: 14.0 iter : 28434 P : 0.013
Episode: 365/1000 Total reward: 33.0 iter : 28467 P : 0.013
Episode: 366/1000 Total reward: 9.0 iter : 28476 P : 0.013
Episode: 367/1000 Total reward: 13.0 iter : 28489 P : 0.013
Episode: 368/1000 Total reward: 200.0 iter : 28689 P : 0.013
Episode: 369/1000 Total reward: 200.0 iter : 28889 P : 0.013
Episode: 370/1000 Total reward: 200.0 iter : 29089 P : 0.013
Episode: 371/1000 Total reward: 200.0 iter : 29289 P : 0.013
Episode: 372/1000 Total reward: 200.0 iter : 29489 P : 0.013
Episode: 373/1000 Total reward: 200.0 iter : 29689 P : 0.013
Episode: 374/1000 Total reward: 200.0 iter : 29889 P : 0.013
Episode: 375/1000 Total reward: 200.0 iter : 30089 P : 0.012
Episode: 376/1000 Total reward: 200.0 iter : 30289 P : 0.012
Episode: 377/1000 Total reward: 200.0 iter : 30489 P : 0.012
Episode: 378/1000 Total reward: 200.0 iter : 30689 P : 0.012
Episode: 379/1000 Total reward: 200.0 iter : 30889 P : 0.012
Episode: 380/1000 Total reward: 200.0 iter : 31089 P : 0.012
Episode: 381/1000 Total reward: 200.0 iter : 31289 P : 0.012
Episode: 382/1000 Total reward: 122.0 iter : 31411 P : 0.012
Episode: 383/1000 Total reward: 157.0 iter : 31568 P : 0.012
Episode: 384/1000 Total reward: 200.0 iter : 31768 P : 0.012
Episode: 385/1000 Total reward: 200.0 iter : 31968 P : 0.012
Episode: 386/1000 Total reward: 121.0 iter : 32089 P : 0.012
Episode: 387/1000 Total reward: 200.0 iter : 32289 P : 0.012
Episode: 388/1000 Total reward: 200.0 iter : 32489 P : 0.011
Episode: 389/1000 Total reward: 200.0 iter : 32689 P : 0.011
Episode: 390/1000 Total reward: 153.0 iter : 32842 P : 0.011
Episode: 391/1000 Total reward: 152.0 iter : 32994 P : 0.011
Episode: 392/1000 Total reward: 200.0 iter : 33194 P : 0.011
Episode: 393/1000 Total reward: 190.0 iter : 33384 P : 0.011
Episode: 394/1000 Total reward: 200.0 iter : 33584 P : 0.011
Episode: 395/1000 Total reward: 147.0 iter : 33731 P : 0.011
Episode: 396/1000 Total reward: 85.0 iter : 33816 P : 0.011
Episode: 397/1000 Total reward: 65.0 iter : 33881 P : 0.011
Episode: 398/1000 Total reward: 95.0 iter : 33976 P : 0.011
Episode: 399/1000 Total reward: 73.0 iter : 34049 P : 0.011
Episode: 400/1000 Total reward: 108.0 iter : 34157 P : 0.011
Episode: 401/1000 Total reward: 171.0 iter : 34328 P : 0.011
Episode: 402/1000 Total reward: 200.0 iter : 34528 P : 0.011
Episode: 403/1000 Total reward: 200.0 iter : 34728 P : 0.011
Episode: 404/1000 Total reward: 90.0 iter : 34818 P : 0.011
Episode: 405/1000 Total reward: 8.0 iter : 34826 P : 0.011
Episode: 406/1000 Total reward: 9.0 iter : 34835 P : 0.011
Episode: 407/1000 Total reward: 74.0 iter : 34909 P : 0.011
Episode: 408/1000 Total reward: 86.0 iter : 34995 P : 0.011
Episode: 409/1000 Total reward: 10.0 iter : 35005 P : 0.011
Episode: 410/1000 Total reward: 10.0 iter : 35015 P : 0.011
Episode: 411/1000 Total reward: 17.0 iter : 35032 P : 0.011
Episode: 412/1000 Total reward: 151.0 iter : 35183 P : 0.011
Episode: 413/1000 Total reward: 98.0 iter : 35281 P : 0.011
Episode: 414/1000 Total reward: 78.0 iter : 35359 P : 0.011
Episode: 415/1000 Total reward: 103.0 iter : 35462 P : 0.011
Episode: 416/1000 Total reward: 93.0 iter : 35555 P : 0.011
Episode: 417/1000 Total reward: 64.0 iter : 35619 P : 0.011
Episode: 418/1000 Total reward: 63.0 iter : 35682 P : 0.011
Episode: 419/1000 Total reward: 78.0 iter : 35760 P : 0.011
Episode: 420/1000 Total reward: 101.0 iter : 35861 P : 0.011
Episode: 421/1000 Total reward: 93.0 iter : 35954 P : 0.011
Episode: 422/1000 Total reward: 64.0 iter : 36018 P : 0.011
Episode: 423/1000 Total reward: 54.0 iter : 36072 P : 0.011
Episode: 424/1000 Total reward: 64.0 iter : 36136 P : 0.011
Episode: 425/1000 Total reward: 50.0 iter : 36186 P : 0.011
Episode: 426/1000 Total reward: 48.0 iter : 36234 P : 0.011
Episode: 427/1000 Total reward: 42.0 iter : 36276 P : 0.011
Episode: 428/1000 Total reward: 39.0 iter : 36315 P : 0.011
Episode: 429/1000 Total reward: 50.0 iter : 36365 P : 0.011
Episode: 430/1000 Total reward: 54.0 iter : 36419 P : 0.011
Episode: 431/1000 Total reward: 46.0 iter : 36465 P : 0.011
Episode: 432/1000 Total reward: 55.0 iter : 36520 P : 0.011
Episode: 433/1000 Total reward: 62.0 iter : 36582 P : 0.011
Episode: 434/1000 Total reward: 37.0 iter : 36619 P : 0.011
Episode: 435/1000 Total reward: 38.0 iter : 36657 P : 0.011
Episode: 436/1000 Total reward: 31.0 iter : 36688 P : 0.011
Episode: 437/1000 Total reward: 20.0 iter : 36708 P : 0.011
Episode: 438/1000 Total reward: 16.0 iter : 36724 P : 0.011
Episode: 439/1000 Total reward: 15.0 iter : 36739 P : 0.011
Episode: 440/1000 Total reward: 22.0 iter : 36761 P : 0.011
Episode: 441/1000 Total reward: 19.0 iter : 36780 P : 0.011
Episode: 442/1000 Total reward: 20.0 iter : 36800 P : 0.011
Episode: 443/1000 Total reward: 27.0 iter : 36827 P : 0.011
Episode: 444/1000 Total reward: 20.0 iter : 36847 P : 0.011
Episode: 445/1000 Total reward: 25.0 iter : 36872 P : 0.011
Episode: 446/1000 Total reward: 29.0 iter : 36901 P : 0.011
Episode: 447/1000 Total reward: 21.0 iter : 36922 P : 0.011
Episode: 448/1000 Total reward: 27.0 iter : 36949 P : 0.011
Episode: 449/1000 Total reward: 20.0 iter : 36969 P : 0.011
Episode: 450/1000 Total reward: 24.0 iter : 36993 P : 0.011
Episode: 451/1000 Total reward: 18.0 iter : 37011 P : 0.011
Episode: 452/1000 Total reward: 17.0 iter : 37028 P : 0.011
Episode: 453/1000 Total reward: 21.0 iter : 37049 P : 0.011
Episode: 454/1000 Total reward: 20.0 iter : 37069 P : 0.011
Episode: 455/1000 Total reward: 18.0 iter : 37087 P : 0.011
Episode: 456/1000 Total reward: 15.0 iter : 37102 P : 0.011
Episode: 457/1000 Total reward: 14.0 iter : 37116 P : 0.011
Episode: 458/1000 Total reward: 16.0 iter : 37132 P : 0.011
Episode: 459/1000 Total reward: 14.0 iter : 37146 P : 0.011
Episode: 460/1000 Total reward: 16.0 iter : 37162 P : 0.011
Episode: 461/1000 Total reward: 17.0 iter : 37179 P : 0.011
Episode: 462/1000 Total reward: 22.0 iter : 37201 P : 0.011
Episode: 463/1000 Total reward: 20.0 iter : 37221 P : 0.011
Episode: 464/1000 Total reward: 21.0 iter : 37242 P : 0.011
Episode: 465/1000 Total reward: 19.0 iter : 37261 P : 0.011
Episode: 466/1000 Total reward: 14.0 iter : 37275 P : 0.011
Episode: 467/1000 Total reward: 16.0 iter : 37291 P : 0.011
Episode: 468/1000 Total reward: 15.0 iter : 37306 P : 0.011
Episode: 469/1000 Total reward: 13.0 iter : 37319 P : 0.011
Episode: 470/1000 Total reward: 15.0 iter : 37334 P : 0.011
Episode: 471/1000 Total reward: 19.0 iter : 37353 P : 0.011
Episode: 472/1000 Total reward: 15.0 iter : 37368 P : 0.011
Episode: 473/1000 Total reward: 17.0 iter : 37385 P : 0.011
Episode: 474/1000 Total reward: 18.0 iter : 37403 P : 0.011
Episode: 475/1000 Total reward: 13.0 iter : 37416 P : 0.011
Episode: 476/1000 Total reward: 20.0 iter : 37436 P : 0.011
Episode: 477/1000 Total reward: 17.0 iter : 37453 P : 0.011
Episode: 478/1000 Total reward: 21.0 iter : 37474 P : 0.011
Episode: 479/1000 Total reward: 19.0 iter : 37493 P : 0.011
Episode: 480/1000 Total reward: 17.0 iter : 37510 P : 0.011
Episode: 481/1000 Total reward: 18.0 iter : 37528 P : 0.011
Episode: 482/1000 Total reward: 16.0 iter : 37544 P : 0.011
Episode: 483/1000 Total reward: 16.0 iter : 37560 P : 0.011
Episode: 484/1000 Total reward: 14.0 iter : 37574 P : 0.011
Episode: 485/1000 Total reward: 15.0 iter : 37589 P : 0.011
Episode: 486/1000 Total reward: 13.0 iter : 37602 P : 0.011
Episode: 487/1000 Total reward: 14.0 iter : 37616 P : 0.011
Episode: 488/1000 Total reward: 13.0 iter : 37629 P : 0.011
Episode: 489/1000 Total reward: 15.0 iter : 37644 P : 0.011
Episode: 490/1000 Total reward: 16.0 iter : 37660 P : 0.011
Episode: 491/1000 Total reward: 22.0 iter : 37682 P : 0.011
Episode: 492/1000 Total reward: 23.0 iter : 37705 P : 0.011
Episode: 493/1000 Total reward: 19.0 iter : 37724 P : 0.011
Episode: 494/1000 Total reward: 21.0 iter : 37745 P : 0.011
Episode: 495/1000 Total reward: 20.0 iter : 37765 P : 0.011
Episode: 496/1000 Total reward: 16.0 iter : 37781 P : 0.011
Episode: 497/1000 Total reward: 17.0 iter : 37798 P : 0.011
Episode: 498/1000 Total reward: 18.0 iter : 37816 P : 0.011
Episode: 499/1000 Total reward: 16.0 iter : 37832 P : 0.011
Episode: 500/1000 Total reward: 20.0 iter : 37852 P : 0.011
Episode: 501/1000 Total reward: 18.0 iter : 37870 P : 0.011
Episode: 502/1000 Total reward: 18.0 iter : 37888 P : 0.011
Episode: 503/1000 Total reward: 14.0 iter : 37902 P : 0.011
Episode: 504/1000 Total reward: 16.0 iter : 37918 P : 0.011
Episode: 505/1000 Total reward: 15.0 iter : 37933 P : 0.011
Episode: 506/1000 Total reward: 15.0 iter : 37948 P : 0.011
Episode: 507/1000 Total reward: 14.0 iter : 37962 P : 0.010
Episode: 508/1000 Total reward: 16.0 iter : 37978 P : 0.010
Episode: 509/1000 Total reward: 15.0 iter : 37993 P : 0.010
Episode: 510/1000 Total reward: 19.0 iter : 38012 P : 0.010
Episode: 511/1000 Total reward: 17.0 iter : 38029 P : 0.010
Episode: 512/1000 Total reward: 16.0 iter : 38045 P : 0.010
Episode: 513/1000 Total reward: 18.0 iter : 38063 P : 0.010
Episode: 514/1000 Total reward: 16.0 iter : 38079 P : 0.010
Episode: 515/1000 Total reward: 18.0 iter : 38097 P : 0.010
Episode: 516/1000 Total reward: 15.0 iter : 38112 P : 0.010
Episode: 517/1000 Total reward: 21.0 iter : 38133 P : 0.010
Episode: 518/1000 Total reward: 21.0 iter : 38154 P : 0.010
Episode: 519/1000 Total reward: 13.0 iter : 38167 P : 0.010
Episode: 520/1000 Total reward: 21.0 iter : 38188 P : 0.010
Episode: 521/1000 Total reward: 17.0 iter : 38205 P : 0.010
Episode: 522/1000 Total reward: 21.0 iter : 38226 P : 0.010
Episode: 523/1000 Total reward: 15.0 iter : 38241 P : 0.010
Episode: 524/1000 Total reward: 15.0 iter : 38256 P : 0.010
Episode: 525/1000 Total reward: 19.0 iter : 38275 P : 0.010
Episode: 526/1000 Total reward: 20.0 iter : 38295 P : 0.010
Episode: 527/1000 Total reward: 19.0 iter : 38314 P : 0.010
Episode: 528/1000 Total reward: 24.0 iter : 38338 P : 0.010
Episode: 529/1000 Total reward: 22.0 iter : 38360 P : 0.010
Episode: 530/1000 Total reward: 15.0 iter : 38375 P : 0.010
Episode: 531/1000 Total reward: 19.0 iter : 38394 P : 0.010
Episode: 532/1000 Total reward: 19.0 iter : 38413 P : 0.010
Episode: 533/1000 Total reward: 15.0 iter : 38428 P : 0.010
Episode: 534/1000 Total reward: 22.0 iter : 38450 P : 0.010
Episode: 535/1000 Total reward: 23.0 iter : 38473 P : 0.010
Episode: 536/1000 Total reward: 15.0 iter : 38488 P : 0.010
Episode: 537/1000 Total reward: 31.0 iter : 38519 P : 0.010
Episode: 538/1000 Total reward: 24.0 iter : 38543 P : 0.010
Episode: 539/1000 Total reward: 21.0 iter : 38564 P : 0.010
Episode: 540/1000 Total reward: 22.0 iter : 38586 P : 0.010
Episode: 541/1000 Total reward: 19.0 iter : 38605 P : 0.010
Episode: 542/1000 Total reward: 23.0 iter : 38628 P : 0.010
Episode: 543/1000 Total reward: 18.0 iter : 38646 P : 0.010
Episode: 544/1000 Total reward: 17.0 iter : 38663 P : 0.010
Episode: 545/1000 Total reward: 19.0 iter : 38682 P : 0.010
Episode: 546/1000 Total reward: 20.0 iter : 38702 P : 0.010
Episode: 547/1000 Total reward: 17.0 iter : 38719 P : 0.010
Episode: 548/1000 Total reward: 21.0 iter : 38740 P : 0.010
Episode: 549/1000 Total reward: 23.0 iter : 38763 P : 0.010
Episode: 550/1000 Total reward: 23.0 iter : 38786 P : 0.010
Episode: 551/1000 Total reward: 25.0 iter : 38811 P : 0.010
Episode: 552/1000 Total reward: 21.0 iter : 38832 P : 0.010
Episode: 553/1000 Total reward: 18.0 iter : 38850 P : 0.010
Episode: 554/1000 Total reward: 24.0 iter : 38874 P : 0.010
Episode: 555/1000 Total reward: 24.0 iter : 38898 P : 0.010
Episode: 556/1000 Total reward: 26.0 iter : 38924 P : 0.010
Episode: 557/1000 Total reward: 23.0 iter : 38947 P : 0.010
Episode: 558/1000 Total reward: 31.0 iter : 38978 P : 0.010
Episode: 559/1000 Total reward: 20.0 iter : 38998 P : 0.010
Episode: 560/1000 Total reward: 23.0 iter : 39021 P : 0.010
Episode: 561/1000 Total reward: 20.0 iter : 39041 P : 0.010
Episode: 562/1000 Total reward: 26.0 iter : 39067 P : 0.010
Episode: 563/1000 Total reward: 26.0 iter : 39093 P : 0.010
Episode: 564/1000 Total reward: 19.0 iter : 39112 P : 0.010
Episode: 565/1000 Total reward: 33.0 iter : 39145 P : 0.010
Episode: 566/1000 Total reward: 19.0 iter : 39164 P : 0.010
Episode: 567/1000 Total reward: 38.0 iter : 39202 P : 0.010
Episode: 568/1000 Total reward: 26.0 iter : 39228 P : 0.010
Episode: 569/1000 Total reward: 19.0 iter : 39247 P : 0.010
Episode: 570/1000 Total reward: 20.0 iter : 39267 P : 0.010
Episode: 571/1000 Total reward: 36.0 iter : 39303 P : 0.010
Episode: 572/1000 Total reward: 27.0 iter : 39330 P : 0.010
Episode: 573/1000 Total reward: 19.0 iter : 39349 P : 0.010
Episode: 574/1000 Total reward: 14.0 iter : 39363 P : 0.010
Episode: 575/1000 Total reward: 19.0 iter : 39382 P : 0.010
Episode: 576/1000 Total reward: 27.0 iter : 39409 P : 0.010
Episode: 577/1000 Total reward: 29.0 iter : 39438 P : 0.010
Episode: 578/1000 Total reward: 28.0 iter : 39466 P : 0.010
Episode: 579/1000 Total reward: 33.0 iter : 39499 P : 0.010
Episode: 580/1000 Total reward: 26.0 iter : 39525 P : 0.010
Episode: 581/1000 Total reward: 26.0 iter : 39551 P : 0.010
Episode: 582/1000 Total reward: 30.0 iter : 39581 P : 0.010
Episode: 583/1000 Total reward: 26.0 iter : 39607 P : 0.010
Episode: 584/1000 Total reward: 31.0 iter : 39638 P : 0.010
Episode: 585/1000 Total reward: 28.0 iter : 39666 P : 0.010
Episode: 586/1000 Total reward: 38.0 iter : 39704 P : 0.010
Episode: 587/1000 Total reward: 30.0 iter : 39734 P : 0.010
Episode: 588/1000 Total reward: 30.0 iter : 39764 P : 0.010
Episode: 589/1000 Total reward: 43.0 iter : 39807 P : 0.010
Episode: 590/1000 Total reward: 27.0 iter : 39834 P : 0.010
Episode: 591/1000 Total reward: 60.0 iter : 39894 P : 0.010
Episode: 592/1000 Total reward: 28.0 iter : 39922 P : 0.010
Episode: 593/1000 Total reward: 39.0 iter : 39961 P : 0.010
Episode: 594/1000 Total reward: 29.0 iter : 39990 P : 0.010
Episode: 595/1000 Total reward: 47.0 iter : 40037 P : 0.010
Episode: 596/1000 Total reward: 59.0 iter : 40096 P : 0.010
Episode: 597/1000 Total reward: 200.0 iter : 40296 P : 0.010
Episode: 598/1000 Total reward: 200.0 iter : 40496 P : 0.010
Episode: 599/1000 Total reward: 200.0 iter : 40696 P : 0.010
Episode: 600/1000 Total reward: 43.0 iter : 40739 P : 0.010
Episode: 601/1000 Total reward: 10.0 iter : 40749 P : 0.010
Episode: 602/1000 Total reward: 11.0 iter : 40760 P : 0.010
Episode: 603/1000 Total reward: 23.0 iter : 40783 P : 0.010
Episode: 604/1000 Total reward: 200.0 iter : 40983 P : 0.010
Episode: 605/1000 Total reward: 200.0 iter : 41183 P : 0.010
Episode: 606/1000 Total reward: 200.0 iter : 41383 P : 0.010
Episode: 607/1000 Total reward: 118.0 iter : 41501 P : 0.010
Episode: 608/1000 Total reward: 25.0 iter : 41526 P : 0.010
Episode: 609/1000 Total reward: 14.0 iter : 41540 P : 0.010
Episode: 610/1000 Total reward: 18.0 iter : 41558 P : 0.010
Episode: 611/1000 Total reward: 200.0 iter : 41758 P : 0.010
Episode: 612/1000 Total reward: 200.0 iter : 41958 P : 0.010
Episode: 613/1000 Total reward: 200.0 iter : 42158 P : 0.010
Episode: 614/1000 Total reward: 144.0 iter : 42302 P : 0.010
Episode: 615/1000 Total reward: 97.0 iter : 42399 P : 0.010
Episode: 616/1000 Total reward: 95.0 iter : 42494 P : 0.010
Episode: 617/1000 Total reward: 24.0 iter : 42518 P : 0.010
Episode: 618/1000 Total reward: 184.0 iter : 42702 P : 0.010
Episode: 619/1000 Total reward: 172.0 iter : 42874 P : 0.010
Episode: 620/1000 Total reward: 106.0 iter : 42980 P : 0.010
Episode: 621/1000 Total reward: 158.0 iter : 43138 P : 0.010
Episode: 622/1000 Total reward: 192.0 iter : 43330 P : 0.010
Episode: 623/1000 Total reward: 62.0 iter : 43392 P : 0.010
Episode: 624/1000 Total reward: 22.0 iter : 43414 P : 0.010
Episode: 625/1000 Total reward: 200.0 iter : 43614 P : 0.010
Episode: 626/1000 Total reward: 200.0 iter : 43814 P : 0.010
Episode: 627/1000 Total reward: 200.0 iter : 44014 P : 0.010
Episode: 628/1000 Total reward: 200.0 iter : 44214 P : 0.010
Episode: 629/1000 Total reward: 200.0 iter : 44414 P : 0.010
Episode: 630/1000 Total reward: 200.0 iter : 44614 P : 0.010
Episode: 631/1000 Total reward: 200.0 iter : 44814 P : 0.010
Episode: 632/1000 Total reward: 200.0 iter : 45014 P : 0.010
Episode: 633/1000 Total reward: 200.0 iter : 45214 P : 0.010
Episode: 634/1000 Total reward: 200.0 iter : 45414 P : 0.010
Episode: 635/1000 Total reward: 200.0 iter : 45614 P : 0.010
Episode: 636/1000 Total reward: 200.0 iter : 45814 P : 0.010
Episode: 637/1000 Total reward: 200.0 iter : 46014 P : 0.010
Episode: 638/1000 Total reward: 200.0 iter : 46214 P : 0.010
Episode: 639/1000 Total reward: 200.0 iter : 46414 P : 0.010
Episode: 640/1000 Total reward: 200.0 iter : 46614 P : 0.010
Episode: 641/1000 Total reward: 200.0 iter : 46814 P : 0.010
Episode: 642/1000 Total reward: 200.0 iter : 47014 P : 0.010
Episode: 643/1000 Total reward: 200.0 iter : 47214 P : 0.010
Episode: 644/1000 Total reward: 200.0 iter : 47414 P : 0.010
Episode: 645/1000 Total reward: 200.0 iter : 47614 P : 0.010
Episode: 646/1000 Total reward: 200.0 iter : 47814 P : 0.010
Episode: 647/1000 Total reward: 200.0 iter : 48014 P : 0.010
Episode: 648/1000 Total reward: 200.0 iter : 48214 P : 0.010
Episode: 649/1000 Total reward: 34.0 iter : 48248 P : 0.010
Episode: 650/1000 Total reward: 10.0 iter : 48258 P : 0.010
Episode: 651/1000 Total reward: 45.0 iter : 48303 P : 0.010
Episode: 652/1000 Total reward: 10.0 iter : 48313 P : 0.010
Episode: 653/1000 Total reward: 10.0 iter : 48323 P : 0.010
Episode: 654/1000 Total reward: 200.0 iter : 48523 P : 0.010
Episode: 655/1000 Total reward: 200.0 iter : 48723 P : 0.010
Episode: 656/1000 Total reward: 84.0 iter : 48807 P : 0.010
Episode: 657/1000 Total reward: 140.0 iter : 48947 P : 0.010
Episode: 658/1000 Total reward: 95.0 iter : 49042 P : 0.010
Episode: 659/1000 Total reward: 200.0 iter : 49242 P : 0.010
Episode: 660/1000 Total reward: 200.0 iter : 49442 P : 0.010
Episode: 661/1000 Total reward: 200.0 iter : 49642 P : 0.010
Episode: 662/1000 Total reward: 200.0 iter : 49842 P : 0.010
Episode: 663/1000 Total reward: 200.0 iter : 50042 P : 0.010
Episode: 664/1000 Total reward: 200.0 iter : 50242 P : 0.010
Episode: 665/1000 Total reward: 200.0 iter : 50442 P : 0.010
Episode: 666/1000 Total reward: 200.0 iter : 50642 P : 0.010
Episode: 667/1000 Total reward: 200.0 iter : 50842 P : 0.010
Episode: 668/1000 Total reward: 200.0 iter : 51042 P : 0.010
Episode: 669/1000 Total reward: 200.0 iter : 51242 P : 0.010
Episode: 670/1000 Total reward: 200.0 iter : 51442 P : 0.010
Episode: 671/1000 Total reward: 200.0 iter : 51642 P : 0.010
Episode: 672/1000 Total reward: 200.0 iter : 51842 P : 0.010
Episode: 673/1000 Total reward: 200.0 iter : 52042 P : 0.010
Episode: 674/1000 Total reward: 200.0 iter : 52242 P : 0.010
Episode: 675/1000 Total reward: 200.0 iter : 52442 P : 0.010
Episode: 676/1000 Total reward: 162.0 iter : 52604 P : 0.010
Episode: 677/1000 Total reward: 11.0 iter : 52615 P : 0.010
Episode: 678/1000 Total reward: 12.0 iter : 52627 P : 0.010
Episode: 679/1000 Total reward: 200.0 iter : 52827 P : 0.010
Episode: 680/1000 Total reward: 200.0 iter : 53027 P : 0.010
Episode: 681/1000 Total reward: 103.0 iter : 53130 P : 0.010
Episode: 682/1000 Total reward: 107.0 iter : 53237 P : 0.010
Episode: 683/1000 Total reward: 98.0 iter : 53335 P : 0.010
Episode: 684/1000 Total reward: 61.0 iter : 53396 P : 0.010
Episode: 685/1000 Total reward: 69.0 iter : 53465 P : 0.010
Episode: 686/1000 Total reward: 75.0 iter : 53540 P : 0.010
Episode: 687/1000 Total reward: 65.0 iter : 53605 P : 0.010
Episode: 688/1000 Total reward: 61.0 iter : 53666 P : 0.010
Episode: 689/1000 Total reward: 82.0 iter : 53748 P : 0.010
Episode: 690/1000 Total reward: 73.0 iter : 53821 P : 0.010
Episode: 691/1000 Total reward: 44.0 iter : 53865 P : 0.010
Episode: 692/1000 Total reward: 46.0 iter : 53911 P : 0.010
Episode: 693/1000 Total reward: 38.0 iter : 53949 P : 0.010
Episode: 694/1000 Total reward: 43.0 iter : 53992 P : 0.010
Episode: 695/1000 Total reward: 29.0 iter : 54021 P : 0.010
Episode: 696/1000 Total reward: 25.0 iter : 54046 P : 0.010
Episode: 697/1000 Total reward: 37.0 iter : 54083 P : 0.010
Episode: 698/1000 Total reward: 24.0 iter : 54107 P : 0.010
Episode: 699/1000 Total reward: 25.0 iter : 54132 P : 0.010
Episode: 700/1000 Total reward: 23.0 iter : 54155 P : 0.010
Episode: 701/1000 Total reward: 15.0 iter : 54170 P : 0.010
Episode: 702/1000 Total reward: 18.0 iter : 54188 P : 0.010
Episode: 703/1000 Total reward: 16.0 iter : 54204 P : 0.010
Episode: 704/1000 Total reward: 30.0 iter : 54234 P : 0.010
Episode: 705/1000 Total reward: 22.0 iter : 54256 P : 0.010
Episode: 706/1000 Total reward: 16.0 iter : 54272 P : 0.010
Episode: 707/1000 Total reward: 24.0 iter : 54296 P : 0.010
Episode: 708/1000 Total reward: 26.0 iter : 54322 P : 0.010
Episode: 709/1000 Total reward: 20.0 iter : 54342 P : 0.010
Episode: 710/1000 Total reward: 23.0 iter : 54365 P : 0.010
Episode: 711/1000 Total reward: 14.0 iter : 54379 P : 0.010
Episode: 712/1000 Total reward: 22.0 iter : 54401 P : 0.010
Episode: 713/1000 Total reward: 16.0 iter : 54417 P : 0.010
Episode: 714/1000 Total reward: 21.0 iter : 54438 P : 0.010
Episode: 715/1000 Total reward: 21.0 iter : 54459 P : 0.010
Episode: 716/1000 Total reward: 18.0 iter : 54477 P : 0.010
Episode: 717/1000 Total reward: 21.0 iter : 54498 P : 0.010
Episode: 718/1000 Total reward: 16.0 iter : 54514 P : 0.010
Episode: 719/1000 Total reward: 19.0 iter : 54533 P : 0.010
Episode: 720/1000 Total reward: 22.0 iter : 54555 P : 0.010
Episode: 721/1000 Total reward: 20.0 iter : 54575 P : 0.010
Episode: 722/1000 Total reward: 24.0 iter : 54599 P : 0.010
Episode: 723/1000 Total reward: 17.0 iter : 54616 P : 0.010
Episode: 724/1000 Total reward: 16.0 iter : 54632 P : 0.010
Episode: 725/1000 Total reward: 14.0 iter : 54646 P : 0.010
Episode: 726/1000 Total reward: 15.0 iter : 54661 P : 0.010
Episode: 727/1000 Total reward: 13.0 iter : 54674 P : 0.010
Episode: 728/1000 Total reward: 19.0 iter : 54693 P : 0.010
Episode: 729/1000 Total reward: 18.0 iter : 54711 P : 0.010
Episode: 730/1000 Total reward: 14.0 iter : 54725 P : 0.010
Episode: 731/1000 Total reward: 20.0 iter : 54745 P : 0.010
Episode: 732/1000 Total reward: 23.0 iter : 54768 P : 0.010
Episode: 733/1000 Total reward: 21.0 iter : 54789 P : 0.010
Episode: 734/1000 Total reward: 21.0 iter : 54810 P : 0.010
Episode: 735/1000 Total reward: 25.0 iter : 54835 P : 0.010
Episode: 736/1000 Total reward: 17.0 iter : 54852 P : 0.010
Episode: 737/1000 Total reward: 24.0 iter : 54876 P : 0.010
Episode: 738/1000 Total reward: 13.0 iter : 54889 P : 0.010
Episode: 739/1000 Total reward: 17.0 iter : 54906 P : 0.010
Episode: 740/1000 Total reward: 15.0 iter : 54921 P : 0.010
Episode: 741/1000 Total reward: 22.0 iter : 54943 P : 0.010
Episode: 742/1000 Total reward: 14.0 iter : 54957 P : 0.010
Episode: 743/1000 Total reward: 15.0 iter : 54972 P : 0.010
Episode: 744/1000 Total reward: 15.0 iter : 54987 P : 0.010
Episode: 745/1000 Total reward: 18.0 iter : 55005 P : 0.010
Episode: 746/1000 Total reward: 17.0 iter : 55022 P : 0.010
Episode: 747/1000 Total reward: 13.0 iter : 55035 P : 0.010
Episode: 748/1000 Total reward: 19.0 iter : 55054 P : 0.010
Episode: 749/1000 Total reward: 15.0 iter : 55069 P : 0.010
Episode: 750/1000 Total reward: 19.0 iter : 55088 P : 0.010
Episode: 751/1000 Total reward: 13.0 iter : 55101 P : 0.010
Episode: 752/1000 Total reward: 16.0 iter : 55117 P : 0.010
Episode: 753/1000 Total reward: 17.0 iter : 55134 P : 0.010
Episode: 754/1000 Total reward: 22.0 iter : 55156 P : 0.010
Episode: 755/1000 Total reward: 18.0 iter : 55174 P : 0.010
Episode: 756/1000 Total reward: 26.0 iter : 55200 P : 0.010
Episode: 757/1000 Total reward: 17.0 iter : 55217 P : 0.010
Episode: 758/1000 Total reward: 16.0 iter : 55233 P : 0.010
Episode: 759/1000 Total reward: 16.0 iter : 55249 P : 0.010
Episode: 760/1000 Total reward: 19.0 iter : 55268 P : 0.010
Episode: 761/1000 Total reward: 29.0 iter : 55297 P : 0.010
Episode: 762/1000 Total reward: 31.0 iter : 55328 P : 0.010
Episode: 763/1000 Total reward: 32.0 iter : 55360 P : 0.010
Episode: 764/1000 Total reward: 27.0 iter : 55387 P : 0.010
Episode: 765/1000 Total reward: 21.0 iter : 55408 P : 0.010
Episode: 766/1000 Total reward: 14.0 iter : 55422 P : 0.010
Episode: 767/1000 Total reward: 15.0 iter : 55437 P : 0.010
Episode: 768/1000 Total reward: 47.0 iter : 55484 P : 0.010
Episode: 769/1000 Total reward: 37.0 iter : 55521 P : 0.010
Episode: 770/1000 Total reward: 38.0 iter : 55559 P : 0.010
Episode: 771/1000 Total reward: 26.0 iter : 55585 P : 0.010
Episode: 772/1000 Total reward: 19.0 iter : 55604 P : 0.010
Episode: 773/1000 Total reward: 40.0 iter : 55644 P : 0.010
Episode: 774/1000 Total reward: 26.0 iter : 55670 P : 0.010
Episode: 775/1000 Total reward: 26.0 iter : 55696 P : 0.010
Episode: 776/1000 Total reward: 23.0 iter : 55719 P : 0.010
Episode: 777/1000 Total reward: 22.0 iter : 55741 P : 0.010
Episode: 778/1000 Total reward: 32.0 iter : 55773 P : 0.010
Episode: 779/1000 Total reward: 22.0 iter : 55795 P : 0.010
Episode: 780/1000 Total reward: 33.0 iter : 55828 P : 0.010
Episode: 781/1000 Total reward: 27.0 iter : 55855 P : 0.010
Episode: 782/1000 Total reward: 18.0 iter : 55873 P : 0.010
Episode: 783/1000 Total reward: 18.0 iter : 55891 P : 0.010
Episode: 784/1000 Total reward: 38.0 iter : 55929 P : 0.010
Episode: 785/1000 Total reward: 49.0 iter : 55978 P : 0.010
Episode: 786/1000 Total reward: 20.0 iter : 55998 P : 0.010
Episode: 787/1000 Total reward: 28.0 iter : 56026 P : 0.010
Episode: 788/1000 Total reward: 40.0 iter : 56066 P : 0.010
Episode: 789/1000 Total reward: 23.0 iter : 56089 P : 0.010
Episode: 790/1000 Total reward: 31.0 iter : 56120 P : 0.010
Episode: 791/1000 Total reward: 33.0 iter : 56153 P : 0.010
Episode: 792/1000 Total reward: 37.0 iter : 56190 P : 0.010
Episode: 793/1000 Total reward: 24.0 iter : 56214 P : 0.010
Episode: 794/1000 Total reward: 48.0 iter : 56262 P : 0.010
Episode: 795/1000 Total reward: 34.0 iter : 56296 P : 0.010
Episode: 796/1000 Total reward: 25.0 iter : 56321 P : 0.010
Episode: 797/1000 Total reward: 33.0 iter : 56354 P : 0.010
Episode: 798/1000 Total reward: 30.0 iter : 56384 P : 0.010
Episode: 799/1000 Total reward: 35.0 iter : 56419 P : 0.010
Episode: 800/1000 Total reward: 55.0 iter : 56474 P : 0.010
Episode: 801/1000 Total reward: 60.0 iter : 56534 P : 0.010
Episode: 802/1000 Total reward: 44.0 iter : 56578 P : 0.010
Episode: 803/1000 Total reward: 55.0 iter : 56633 P : 0.010
Episode: 804/1000 Total reward: 200.0 iter : 56833 P : 0.010
Episode: 805/1000 Total reward: 200.0 iter : 57033 P : 0.010
Episode: 806/1000 Total reward: 200.0 iter : 57233 P : 0.010
Episode: 807/1000 Total reward: 200.0 iter : 57433 P : 0.010
Episode: 808/1000 Total reward: 200.0 iter : 57633 P : 0.010
Episode: 809/1000 Total reward: 200.0 iter : 57833 P : 0.010
Episode: 810/1000 Total reward: 200.0 iter : 58033 P : 0.010
Episode: 811/1000 Total reward: 200.0 iter : 58233 P : 0.010
Episode: 812/1000 Total reward: 200.0 iter : 58433 P : 0.010
Episode: 813/1000 Total reward: 200.0 iter : 58633 P : 0.010
Episode: 814/1000 Total reward: 200.0 iter : 58833 P : 0.010
Episode: 815/1000 Total reward: 200.0 iter : 59033 P : 0.010
Episode: 816/1000 Total reward: 200.0 iter : 59233 P : 0.010
Episode: 817/1000 Total reward: 200.0 iter : 59433 P : 0.010
Episode: 818/1000 Total reward: 200.0 iter : 59633 P : 0.010
Episode: 819/1000 Total reward: 200.0 iter : 59833 P : 0.010
Episode: 820/1000 Total reward: 200.0 iter : 60033 P : 0.010
Episode: 821/1000 Total reward: 200.0 iter : 60233 P : 0.010
Episode: 822/1000 Total reward: 200.0 iter : 60433 P : 0.010
Episode: 823/1000 Total reward: 200.0 iter : 60633 P : 0.010
Episode: 824/1000 Total reward: 200.0 iter : 60833 P : 0.010
Episode: 825/1000 Total reward: 200.0 iter : 61033 P : 0.010
Episode: 826/1000 Total reward: 200.0 iter : 61233 P : 0.010
Episode: 827/1000 Total reward: 200.0 iter : 61433 P : 0.010
Episode: 828/1000 Total reward: 200.0 iter : 61633 P : 0.010
Episode: 829/1000 Total reward: 200.0 iter : 61833 P : 0.010
Episode: 830/1000 Total reward: 200.0 iter : 62033 P : 0.010
Episode: 831/1000 Total reward: 200.0 iter : 62233 P : 0.010
Episode: 832/1000 Total reward: 200.0 iter : 62433 P : 0.010
Episode: 833/1000 Total reward: 200.0 iter : 62633 P : 0.010
Episode: 834/1000 Total reward: 200.0 iter : 62833 P : 0.010
Episode: 835/1000 Total reward: 200.0 iter : 63033 P : 0.010
Episode: 836/1000 Total reward: 200.0 iter : 63233 P : 0.010
Episode: 837/1000 Total reward: 200.0 iter : 63433 P : 0.010
Episode: 838/1000 Total reward: 200.0 iter : 63633 P : 0.010
Episode: 839/1000 Total reward: 200.0 iter : 63833 P : 0.010
Episode: 840/1000 Total reward: 200.0 iter : 64033 P : 0.010
Episode: 841/1000 Total reward: 200.0 iter : 64233 P : 0.010
Episode: 842/1000 Total reward: 200.0 iter : 64433 P : 0.010
Episode: 843/1000 Total reward: 200.0 iter : 64633 P : 0.010
Episode: 844/1000 Total reward: 200.0 iter : 64833 P : 0.010
Episode: 845/1000 Total reward: 200.0 iter : 65033 P : 0.010
Episode: 846/1000 Total reward: 200.0 iter : 65233 P : 0.010
Episode: 847/1000 Total reward: 200.0 iter : 65433 P : 0.010
Episode: 848/1000 Total reward: 200.0 iter : 65633 P : 0.010
Episode: 849/1000 Total reward: 200.0 iter : 65833 P : 0.010
Episode: 850/1000 Total reward: 200.0 iter : 66033 P : 0.010
Episode: 851/1000 Total reward: 200.0 iter : 66233 P : 0.010
Episode: 852/1000 Total reward: 91.0 iter : 66324 P : 0.010
Episode: 853/1000 Total reward: 200.0 iter : 66524 P : 0.010
Episode: 854/1000 Total reward: 200.0 iter : 66724 P : 0.010
Episode: 855/1000 Total reward: 200.0 iter : 66924 P : 0.010
Episode: 856/1000 Total reward: 200.0 iter : 67124 P : 0.010
Episode: 857/1000 Total reward: 200.0 iter : 67324 P : 0.010
Episode: 858/1000 Total reward: 200.0 iter : 67524 P : 0.010
Episode: 859/1000 Total reward: 200.0 iter : 67724 P : 0.010
Episode: 860/1000 Total reward: 200.0 iter : 67924 P : 0.010
Episode: 861/1000 Total reward: 200.0 iter : 68124 P : 0.010
Episode: 862/1000 Total reward: 200.0 iter : 68324 P : 0.010
Episode: 863/1000 Total reward: 200.0 iter : 68524 P : 0.010
Episode: 864/1000 Total reward: 200.0 iter : 68724 P : 0.010
Episode: 865/1000 Total reward: 200.0 iter : 68924 P : 0.010
Episode: 866/1000 Total reward: 200.0 iter : 69124 P : 0.010
Episode: 867/1000 Total reward: 200.0 iter : 69324 P : 0.010
Episode: 868/1000 Total reward: 200.0 iter : 69524 P : 0.010
Episode: 869/1000 Total reward: 200.0 iter : 69724 P : 0.010
Episode: 870/1000 Total reward: 200.0 iter : 69924 P : 0.010
Episode: 871/1000 Total reward: 200.0 iter : 70124 P : 0.010
Episode: 872/1000 Total reward: 200.0 iter : 70324 P : 0.010
Episode: 873/1000 Total reward: 200.0 iter : 70524 P : 0.010
Episode: 874/1000 Total reward: 200.0 iter : 70724 P : 0.010
Episode: 875/1000 Total reward: 200.0 iter : 70924 P : 0.010
Episode: 876/1000 Total reward: 200.0 iter : 71124 P : 0.010
Episode: 877/1000 Total reward: 200.0 iter : 71324 P : 0.010
Episode: 878/1000 Total reward: 200.0 iter : 71524 P : 0.010
Episode: 879/1000 Total reward: 200.0 iter : 71724 P : 0.010
Episode: 880/1000 Total reward: 10.0 iter : 71734 P : 0.010
Episode: 881/1000 Total reward: 200.0 iter : 71934 P : 0.010
Episode: 882/1000 Total reward: 200.0 iter : 72134 P : 0.010
Episode: 883/1000 Total reward: 200.0 iter : 72334 P : 0.010
Episode: 884/1000 Total reward: 200.0 iter : 72534 P : 0.010
Episode: 885/1000 Total reward: 200.0 iter : 72734 P : 0.010
Episode: 886/1000 Total reward: 200.0 iter : 72934 P : 0.010
Episode: 887/1000 Total reward: 200.0 iter : 73134 P : 0.010
Episode: 888/1000 Total reward: 200.0 iter : 73334 P : 0.010
Episode: 889/1000 Total reward: 200.0 iter : 73534 P : 0.010
Episode: 890/1000 Total reward: 200.0 iter : 73734 P : 0.010
Episode: 891/1000 Total reward: 200.0 iter : 73934 P : 0.010
Episode: 892/1000 Total reward: 200.0 iter : 74134 P : 0.010
Episode: 893/1000 Total reward: 200.0 iter : 74334 P : 0.010
Episode: 894/1000 Total reward: 200.0 iter : 74534 P : 0.010
Episode: 895/1000 Total reward: 200.0 iter : 74734 P : 0.010
Episode: 896/1000 Total reward: 200.0 iter : 74934 P : 0.010
Episode: 897/1000 Total reward: 200.0 iter : 75134 P : 0.010
Episode: 898/1000 Total reward: 200.0 iter : 75334 P : 0.010
Episode: 899/1000 Total reward: 200.0 iter : 75534 P : 0.010
Episode: 900/1000 Total reward: 200.0 iter : 75734 P : 0.010
Episode: 901/1000 Total reward: 200.0 iter : 75934 P : 0.010
Episode: 902/1000 Total reward: 200.0 iter : 76134 P : 0.010
Episode: 903/1000 Total reward: 200.0 iter : 76334 P : 0.010
Episode: 904/1000 Total reward: 200.0 iter : 76534 P : 0.010
Episode: 905/1000 Total reward: 200.0 iter : 76734 P : 0.010
Episode: 906/1000 Total reward: 200.0 iter : 76934 P : 0.010
Episode: 907/1000 Total reward: 200.0 iter : 77134 P : 0.010
Episode: 908/1000 Total reward: 200.0 iter : 77334 P : 0.010
Episode: 909/1000 Total reward: 200.0 iter : 77534 P : 0.010
Episode: 910/1000 Total reward: 200.0 iter : 77734 P : 0.010
Episode: 911/1000 Total reward: 8.0 iter : 77742 P : 0.010
Episode: 912/1000 Total reward: 9.0 iter : 77751 P : 0.010
Episode: 913/1000 Total reward: 13.0 iter : 77764 P : 0.010
Episode: 914/1000 Total reward: 200.0 iter : 77964 P : 0.010
Episode: 915/1000 Total reward: 200.0 iter : 78164 P : 0.010
Episode: 916/1000 Total reward: 200.0 iter : 78364 P : 0.010
Episode: 917/1000 Total reward: 200.0 iter : 78564 P : 0.010
Episode: 918/1000 Total reward: 200.0 iter : 78764 P : 0.010
Episode: 919/1000 Total reward: 200.0 iter : 78964 P : 0.010
Episode: 920/1000 Total reward: 200.0 iter : 79164 P : 0.010
Episode: 921/1000 Total reward: 200.0 iter : 79364 P : 0.010
Episode: 922/1000 Total reward: 97.0 iter : 79461 P : 0.010
Episode: 923/1000 Total reward: 26.0 iter : 79487 P : 0.010
Episode: 924/1000 Total reward: 29.0 iter : 79516 P : 0.010
Episode: 925/1000 Total reward: 24.0 iter : 79540 P : 0.010
Episode: 926/1000 Total reward: 200.0 iter : 79740 P : 0.010
Episode: 927/1000 Total reward: 200.0 iter : 79940 P : 0.010
Episode: 928/1000 Total reward: 200.0 iter : 80140 P : 0.010
Episode: 929/1000 Total reward: 200.0 iter : 80340 P : 0.010
Episode: 930/1000 Total reward: 200.0 iter : 80540 P : 0.010
Episode: 931/1000 Total reward: 200.0 iter : 80740 P : 0.010
Episode: 932/1000 Total reward: 200.0 iter : 80940 P : 0.010
Episode: 933/1000 Total reward: 200.0 iter : 81140 P : 0.010
Episode: 934/1000 Total reward: 200.0 iter : 81340 P : 0.010
Episode: 935/1000 Total reward: 200.0 iter : 81540 P : 0.010
Episode: 936/1000 Total reward: 200.0 iter : 81740 P : 0.010
Episode: 937/1000 Total reward: 200.0 iter : 81940 P : 0.010
Episode: 938/1000 Total reward: 50.0 iter : 81990 P : 0.010
Episode: 939/1000 Total reward: 10.0 iter : 82000 P : 0.010
Episode: 940/1000 Total reward: 9.0 iter : 82009 P : 0.010
Episode: 941/1000 Total reward: 200.0 iter : 82209 P : 0.010
Episode: 942/1000 Total reward: 200.0 iter : 82409 P : 0.010
Episode: 943/1000 Total reward: 200.0 iter : 82609 P : 0.010
Episode: 944/1000 Total reward: 200.0 iter : 82809 P : 0.010
Episode: 945/1000 Total reward: 200.0 iter : 83009 P : 0.010
Episode: 946/1000 Total reward: 200.0 iter : 83209 P : 0.010
Episode: 947/1000 Total reward: 200.0 iter : 83409 P : 0.010
Episode: 948/1000 Total reward: 200.0 iter : 83609 P : 0.010
Episode: 949/1000 Total reward: 200.0 iter : 83809 P : 0.010
Episode: 950/1000 Total reward: 200.0 iter : 84009 P : 0.010
Episode: 951/1000 Total reward: 200.0 iter : 84209 P : 0.010
Episode: 952/1000 Total reward: 200.0 iter : 84409 P : 0.010
Episode: 953/1000 Total reward: 200.0 iter : 84609 P : 0.010
Episode: 954/1000 Total reward: 200.0 iter : 84809 P : 0.010
Episode: 955/1000 Total reward: 200.0 iter : 85009 P : 0.010
Episode: 956/1000 Total reward: 200.0 iter : 85209 P : 0.010
Episode: 957/1000 Total reward: 200.0 iter : 85409 P : 0.010
Episode: 958/1000 Total reward: 200.0 iter : 85609 P : 0.010
Episode: 959/1000 Total reward: 200.0 iter : 85809 P : 0.010
Episode: 960/1000 Total reward: 200.0 iter : 86009 P : 0.010
Episode: 961/1000 Total reward: 200.0 iter : 86209 P : 0.010
Episode: 962/1000 Total reward: 200.0 iter : 86409 P : 0.010
Episode: 963/1000 Total reward: 200.0 iter : 86609 P : 0.010
Episode: 964/1000 Total reward: 200.0 iter : 86809 P : 0.010
Episode: 965/1000 Total reward: 200.0 iter : 87009 P : 0.010
Episode: 966/1000 Total reward: 200.0 iter : 87209 P : 0.010
Episode: 967/1000 Total reward: 200.0 iter : 87409 P : 0.010
Episode: 968/1000 Total reward: 200.0 iter : 87609 P : 0.010
Episode: 969/1000 Total reward: 200.0 iter : 87809 P : 0.010
Episode: 970/1000 Total reward: 200.0 iter : 88009 P : 0.010
Episode: 971/1000 Total reward: 200.0 iter : 88209 P : 0.010
Episode: 972/1000 Total reward: 200.0 iter : 88409 P : 0.010
Episode: 973/1000 Total reward: 200.0 iter : 88609 P : 0.010
Episode: 974/1000 Total reward: 200.0 iter : 88809 P : 0.010
Episode: 975/1000 Total reward: 200.0 iter : 89009 P : 0.010
Episode: 976/1000 Total reward: 200.0 iter : 89209 P : 0.010
Episode: 977/1000 Total reward: 200.0 iter : 89409 P : 0.010
Episode: 978/1000 Total reward: 200.0 iter : 89609 P : 0.010
Episode: 979/1000 Total reward: 200.0 iter : 89809 P : 0.010
Episode: 980/1000 Total reward: 200.0 iter : 90009 P : 0.010
Episode: 981/1000 Total reward: 200.0 iter : 90209 P : 0.010
Episode: 982/1000 Total reward: 51.0 iter : 90260 P : 0.010
Episode: 983/1000 Total reward: 8.0 iter : 90268 P : 0.010
Episode: 984/1000 Total reward: 9.0 iter : 90277 P : 0.010
Episode: 985/1000 Total reward: 9.0 iter : 90286 P : 0.010
Episode: 986/1000 Total reward: 200.0 iter : 90486 P : 0.010
Episode: 987/1000 Total reward: 200.0 iter : 90686 P : 0.010
Episode: 988/1000 Total reward: 200.0 iter : 90886 P : 0.010
Episode: 989/1000 Total reward: 200.0 iter : 91086 P : 0.010
Episode: 990/1000 Total reward: 200.0 iter : 91286 P : 0.010
Episode: 991/1000 Total reward: 200.0 iter : 91486 P : 0.010
Episode: 992/1000 Total reward: 200.0 iter : 91686 P : 0.010
Episode: 993/1000 Total reward: 200.0 iter : 91886 P : 0.010
Episode: 994/1000 Total reward: 200.0 iter : 92086 P : 0.010
Episode: 995/1000 Total reward: 200.0 iter : 92286 P : 0.010
Episode: 996/1000 Total reward: 200.0 iter : 92486 P : 0.010
Episode: 997/1000 Total reward: 200.0 iter : 92686 P : 0.010
Episode: 998/1000 Total reward: 200.0 iter : 92886 P : 0.010
Episode: 999/1000 Total reward: 11.0 iter : 92897 P : 0.010

Training Result

With and "high" Learning Rate (0.01), let's plot the reward provided by the environment and the reshaped reward done. To ease the reading, results are passed thought a moving average function.

In [11]:
def moving_average(data_set, periods=3):
    weights = np.ones(periods) / periods
    return np.convolve(data_set, weights, mode='valid')
In [17]:
with open( "DQN_with_memory_simple_reward.p", "rb" ) as f:
    stat_1 = pickle.load(f)
with open( "DQN_with_memory_dense_reward.p", "rb" ) as f:
    stat_2 = pickle.load(f)
with open( "DQN_without_memory_simple_reward.p", "rb" ) as f:
    stat_3 = pickle.load(f)
with open( "DQN_without_memory_dense_reward.p", "rb" ) as f:
    stat_4 = pickle.load(f)
In [20]:
fig, axes = plt.subplots(4, 1, figsize=(20,20))
data = [stat_1, stat_2, stat_3, stat_4]
titles = ["With Memory and Normal Reward",
          "With Memory and Reshaped Reward",
          "Without Memory and Normal Reward",
          "Without Memory and Reshaped Reward"]
for stat, axe, title in zip(data, axes, titles):
    simple_reward = list(zip(*stat.buffer))[1]
    dense_reward = list(zip(*stat.buffer))[2]
    axe.plot(moving_average(simple_reward, periods=5))
    axe.plot(moving_average(dense_reward, periods=5))
    axe.set_title(title)
plt.show()

We can see several drops. They can be explained by 3 points :

  • The learning rate is too high and some bad actions taken in the past get rewarded for any reason so the policy got too impacted
  • The replay memory/batch contains only states similar (because we succeed on several trials) and the model is highly overfitting those examples
  • The exploration provides wrong decision multiple times leading to a lose

Below, you can see the result of the same training with a LR a lot smaller (0.0002)

In [24]:
with open( "DQN_with_memory_simple_reward_reduced_lr.p", "rb" ) as f:
    stat_1 = pickle.load(f)
with open( "DQN_with_memory_dense_reward_reduced_lr.p", "rb" ) as f:
    stat_2 = pickle.load(f)
with open( "DQN_without_memory_simple_reward_reduced_lr.p", "rb" ) as f:
    stat_3 = pickle.load(f)
with open( "DQN_without_memory_dense_reward_reduced_lr.p", "rb" ) as f:
    stat_4 = pickle.load(f)
In [25]:
fig, axes = plt.subplots(4, 1, figsize=(20,20))
data = [stat_1, stat_2, stat_3, stat_4]
titles = ["With Memory and Normal Reward",
          "With Memory and Reshaped Reward",
          "Without Memory and Normal Reward",
          "Without Memory and Reshaped Reward"]
for stat, axe, title in zip(data, axes, titles):
    simple_reward = list(zip(*stat.buffer))[1]
    dense_reward = list(zip(*stat.buffer))[2]
    axe.plot(moving_average(simple_reward, periods=5))
    axe.plot(moving_average(dense_reward, periods=5))
    axe.set_title(title)
plt.show()