
Strange results from the Unity ML-Agents Python API

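I'm using the 3DBall example environment, but I'm getting some very strange results and I don't understand why they happen. So far my code is just a for-range loop that prints the rewards and fills the required inputs with random values. However, no negative reward is ever shown, and at random points there are no decision steps at all. That part makes some sense, but shouldn't the environment keep simulating until there is a decision step? Any help would be appreciated, because apart from the documentation there are hardly any resources.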


import random

import numpy
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment()
env.reset()

for i in range(50):
    arr = []
    behavior_names = env.behavior_specs
    for name in behavior_names:
        print(name)

    DecisionSteps = env.get_steps("3DBall?team=0")
    print(DecisionSteps[0].reward, len(DecisionSteps[0].reward))
    print(DecisionSteps[0].action_mask)  # for some reason this is False when DecisionSteps[0].reward is empty and None when it is not

    # one random 2-D action per agent that requested a decision
    for agent_index in range(len(DecisionSteps[0])):
        arr.append([random.uniform(-10, 10) for _ in range(2)])

    if len(DecisionSteps[0]) != 0:
        env.set_actions("3DBall?team=0", numpy.array(arr))
    env.step()

env.close()


森栏
1 answer

白板的微信

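I think your problem is that when an agent's episode ends and it needs to be reset, it does not come back as a decision_step but as a terminal_step. This happens because the agent has dropped the ball, and the reward returned in that terminal_step is -1.0. Your loop only reads the decision steps, which is why you never see a negative reward. I took your code and made some changes, and it now runs fine (although you may want to change it so that the whole environment isn't reset every time one of the agents drops the ball):

import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.exception import UnityException

# -----------------
# Close an env left open by a previous run (e.g. when re-running a notebook cell)
try:
    env.close()
except (NameError, UnityException):
    pass
# -----------------

env = UnityEnvironment(file_name=None)  # file_name=None connects to the Unity Editor (press Play)
env.reset()

for i in range(1000):
    reset_done = False
    # Go through all existing behaviors
    for behavior_name in env.behavior_specs:
        decision_steps, terminal_steps = env.get_steps(behavior_name)

        # Agents that dropped the ball show up here (reward -1.0), not in decision_steps
        if len(terminal_steps) > 0:
            print("An agent of " + behavior_name + " has terminated (rewards: "
                  + str(terminal_steps.reward) + "), resetting environment.")
            # This is probably not the desired behaviour, as the other agents are still active.
            env.reset()
            reset_done = True
            break  # reset once, not once per terminated agent

        # One random 2-D continuous action in [-1, 1] per agent that requested a decision
        actions = [np.random.uniform(-1, 1, 2) for _ in decision_steps]
        if len(actions) > 0:
            env.set_actions(behavior_name, np.array(actions))

    if reset_done:
        continue  # reset() already produced fresh steps; don't step with stale actions

    try:
        env.step()
    except UnityException:
        print("Something happened when taking a step in the environment.")
        print("The communicator has probably terminated, stopping simulation early.")
        break

env.close()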
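To make the reward bookkeeping concrete without needing a Unity build, here is a toy pure-Python model of it. It is a sketch under stated assumptions, not the ML-Agents API: ToyBallEnv, run, and the +0.1 per-step reward are hypothetical (only the -1.0 for dropping the ball comes from the answer above). Each step returns two dictionaries mirroring decision steps and terminal steps, so the -1.0 can only ever appear in the second one. The run function also sketches the alternative hinted at above: instead of resetting the whole scene, keep a running return per agent and close it out when that agent shows up among the terminal steps.

```python
import random

# Hypothetical toy stand-in for the 3DBall scene, NOT the ML-Agents API.
# Assumption: +0.1 per step while the ball stays up, -1.0 and episode end
# when it drops; a dropped agent simply starts a new episode next step.
STEP_REWARD = 0.1
DROP_REWARD = -1.0


class ToyBallEnv:
    def __init__(self, n_agents, drop_prob, seed=0):
        self.rng = random.Random(seed)
        self.n_agents = n_agents
        self.drop_prob = drop_prob

    def step(self):
        """Return (decision_rewards, terminal_rewards), each {agent_id: reward}."""
        decision, terminal = {}, {}
        for agent_id in range(self.n_agents):
            if self.rng.random() < self.drop_prob:
                terminal[agent_id] = DROP_REWARD  # the only place -1.0 appears
            else:
                decision[agent_id] = STEP_REWARD
        return decision, terminal


def run(env, n_steps):
    """Accumulate per-agent episode returns without resetting the whole scene."""
    running = {}   # agent_id -> return of the episode in progress
    finished = []  # (agent_id, episode_return) for every episode that ended
    for _ in range(n_steps):
        decision, terminal = env.step()
        for agent_id, reward in decision.items():
            running[agent_id] = running.get(agent_id, 0.0) + reward
        for agent_id, reward in terminal.items():
            # close out this agent's episode; the others keep going
            finished.append((agent_id, running.pop(agent_id, 0.0) + reward))
    return running, finished


if __name__ == "__main__":
    env = ToyBallEnv(n_agents=12, drop_prob=0.05, seed=42)
    running, finished = run(env, n_steps=200)
    lowest = min((r for _, r in finished), default=None)
    print(len(finished), "episodes ended; lowest return:", lowest)
```

With the real mlagents_envs objects, the two dictionaries would be built by iterating over decision_steps and terminal_steps (which yield agent ids) and reading decision_steps[agent_id].reward and terminal_steps[agent_id].reward.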