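I want to create an AI for the Chrome no-internet Dino game. To do that, I adapted this GitHub repository to fit my needs. I use the following formula to compute the new Q:
$$Q^{new}(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t) + \alpha \left( r_t + \gamma \max_{a} Q(s_{t+1}, a) \right)$$
Source: https://en.wikipedia.org/wiki/Q-learning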
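As a worked instance of the Q-learning update (new value = (1 - alpha) * old value + alpha * (reward + gamma * best next value)), here is a minimal sketch using the constants from QLearning.py. The helper name `q_update` and the "always survives" loop are assumptions made for illustration only. Under that assumption the value settles near reward / (1 - gamma), which is small next to the death penalty of -10000.

```python
alpha = 0.2        # learning rate used in QLearning.py
gamma = 0.9        # discount used in QLearning.py
reward_alive = 1   # rewardAlive in QLearning.py


def q_update(q_old, reward, best_next):
    """One tabular Q-learning update for a single (state, action) entry."""
    return (1 - alpha) * q_old + alpha * (reward + gamma * best_next)


# First update starting from an all-zero table.
first = q_update(0.0, reward_alive, 0.0)

# Illustrative assumption: the agent keeps surviving, and the best value of
# the next state equals the current estimate (a self-consistent fixed point).
q = 0.0
for _ in range(2000):
    q = q_update(q, reward_alive, q)

print(first, q)
```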
My problem is that even after about 2,000,000 iterations, my game score does not increase.
You can find the game files here: https://pastebin.com/XrwQ0suJ
QLearning.py:
import pickle
import Game_headless
import Game
import numpy as np
from collections import defaultdict

rewardAlive = 1
rewardKill = -10000
alpha = 0.2  # Learning rate
gamma = 0.9  # Discount
Q = defaultdict(lambda: [0, 0, 0])  # 0 = Jump / 1 = Duck / 2 = Do Nothing
oldState = None
oldAction = None
gameCounter = 0
gameScores = []


def paramsToState(params):
    cactus1X = round(params["cactus1X"] / 10) * 10
    cactus2X = round(params["cactus2X"] / 10) * 10
    cactus1Height = params["cactus1Height"]
    cactus2Height = params["cactus2Height"]
    pteraX = round(params["pteraX"] / 10) * 10
    pteraY = params["pteraY"]
    playerY = round(params["playerY"] / 10) * 10
    gamespeed = params["gamespeed"]
    return str(cactus1X) + "_" + str(cactus2X) + "_" + str(cactus1Height) + "_" + \
        str(cactus2Height) + "_" + str(pteraX) + "_" + str(pteraY) + "_" + \
        str(playerY) + "_" + str(gamespeed)


def shouldEmulateKeyPress(params):  # 0 = Jump / 1 = Duck / 2 = Do Nothing
    global oldState
    global oldAction

    state = paramsToState(params)
    oldState = state

    estReward = Q[state]
    action = estReward.index(max(estReward))

    if oldAction is None:
        oldAction = action
        return action

    # Previous action was successful
    # -> Update Q
    prevReward = Q[oldState]
    prevReward[oldAction] = (1 - alpha) * prevReward[oldAction] + \
        alpha * (rewardAlive + gamma * max(estReward))
    Q[oldState] = prevReward

    oldAction = action
    return action
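On every frame, the gameplay() function from Game_headless.py calls shouldEmulateKeyPress(). That function then returns 0 for jump, 1 for duck, or 2 for doing nothing. I tried adjusting the constants, but that had no effect. If you have any questions, feel free to ask. Thanks in advance!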
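For comparison, the per-frame flow described above can be sketched as a small self-contained agent. This is only a sketch under assumptions: the class `QAgent` and its methods `act` and `game_over` are hypothetical names that do not appear in the original files, and the state is assumed to be the string key produced by paramsToState(). The sketch updates the table entry of the previous state before remembering the current one, and applies the death penalty through a separate hook that the game would have to call when the dino dies.

```python
from collections import defaultdict

ALPHA = 0.2            # learning rate, as in the post
GAMMA = 0.9            # discount, as in the post
REWARD_ALIVE = 1       # per-frame reward, as in the post
REWARD_KILL = -10000   # death penalty, as in the post


class QAgent:  # hypothetical helper, not part of the original files
    def __init__(self):
        # One row per state: index 0 = Jump / 1 = Duck / 2 = Do Nothing
        self.q = defaultdict(lambda: [0.0, 0.0, 0.0])
        self.prev_state = None
        self.prev_action = None

    def act(self, state):
        """Call once per frame with the discretised state key."""
        if self.prev_state is not None:
            # The previous action survived: update the PREVIOUS state's
            # entry using the best value of the current state.
            row = self.q[self.prev_state]
            target = REWARD_ALIVE + GAMMA * max(self.q[state])
            row[self.prev_action] = (1 - ALPHA) * row[self.prev_action] + ALPHA * target
        values = self.q[state]
        action = values.index(max(values))  # greedy choice, as in the post
        # Remember the current state only after the update has been applied.
        self.prev_state, self.prev_action = state, action
        return action

    def game_over(self):
        """Hypothetical hook: call when the dino dies."""
        if self.prev_state is not None:
            row = self.q[self.prev_state]
            # Terminal transition: there is no next state, so no future value.
            row[self.prev_action] = (1 - ALPHA) * row[self.prev_action] + ALPHA * REWARD_KILL
        self.prev_state = None
        self.prev_action = None


agent = QAgent()
agent.act("s0")      # first frame: nothing to update yet
agent.act("s1")      # updates the entry for "s0"
agent.game_over()    # penalises the action taken in "s1"
retry = agent.act("s1")  # the penalised action is no longer the greedy one
```

A purely greedy policy like this one never explores; whether that matters here depends on the game, so treat the sketch as a starting point rather than a fix.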