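In the code below, written by Karpathy, why do we have this line? (Why do we need to compare against a draw from a uniform distribution to select an action, when the policy function already gives us a probability?)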
# forward the policy network and sample an action from the returned probability
aprob, h = policy_forward(x)
action = 2 if np.random.uniform() < aprob else 3 # roll the dice!
instead of just
# forward the policy network and sample an action from the returned probability
aprob, h = policy_forward(x)
action = 2 if 0.5 < aprob else 3 # roll the dice!
Karpathy's full code is at: https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5
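To make the difference between the two lines concrete, here is a minimal sketch, not taken from the gist. It assumes a hypothetical policy output of `aprob = 0.7`, and the helper names `sample_action` and `threshold_action` are made up for illustration. Since a uniform draw on [0, 1) falls below `aprob` with probability `aprob`, the first line picks action 2 about 70% of the time, while the `0.5 < aprob` version picks action 2 every single time for the same input.

```python
import numpy as np

def sample_action(aprob, rng):
    """Stochastic choice: returns 2 with probability aprob, otherwise 3."""
    return 2 if rng.uniform() < aprob else 3

def threshold_action(aprob):
    """Deterministic choice: always 2 when aprob > 0.5, otherwise always 3."""
    return 2 if 0.5 < aprob else 3

rng = np.random.default_rng(0)  # seeded generator so the run is repeatable
aprob = 0.7                     # hypothetical probability of action 2
n = 10_000

sampled = [sample_action(aprob, rng) for _ in range(n)]
thresholded = [threshold_action(aprob) for _ in range(n)]

# Fraction of trials in which each rule chose action 2
frac_up_sampled = sampled.count(2) / n
frac_up_threshold = thresholded.count(2) / n

print("sampled:    ", frac_up_sampled)    # close to aprob
print("thresholded:", frac_up_threshold)  # exactly 1.0 because aprob > 0.5
```

With the threshold version the agent never tries action 3 as long as `aprob` stays above 0.5, so it would only ever see the outcome of one action in that situation. Sampling keeps both actions in play in proportion to the network's current output.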
米脂