[ml-agent] Build-in reinforcement learning
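Unity3D/ml-agent 2019. 6. 5. 09:30
I built the project for the Android platform, installed it on a device, and tried to run reinforcement learning training there.
From several sources I learned that this is currently not possible.
So I built for the PC, Mac & Linux Standalone platform instead.
The build produced an executable.
Its path is: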
C:\Users\smilejsu\Desktop\test-mlagent.exe
Using Anaconda Prompt, I tried training with the following commands.
(base) C:\Users\smilejsu>activate ml-agents
(ml-agents) C:\Users\smilejsu>d:
(ml-agents) D:\>cd D:\workspace\unity\Test\UnitySDK
(ml-agents) D:\workspace\unity\Test\UnitySDK>mlagents-learn trainer_config.yaml --env=C:\Users\smilejsu\Desktop\test-mlagent.exe --run-id=train --train
[Unity logo ASCII-art banner]
INFO:mlagents.trainers:{'--base-port': '5005',
'--curriculum': 'None',
'--debug': False,
'--docker-target-name': 'None',
'--env': 'C:\\Users\\smilejsu\\Desktop\\test-mlagent.exe',
'--help': False,
'--keep-checkpoints': '5',
'--lesson': '0',
'--load': False,
'--no-graphics': False,
'--num-envs': '1',
'--num-runs': '1',
'--run-id': 'train',
'--save-freq': '50000',
'--seed': '-1',
'--slow': False,
'--train': True,
'<trainer-config-path>': 'trainer_config.yaml'}
c:\users\smilejsu\appdata\local\conda\conda\envs\ml-agents\lib\site-packages\mlagents\trainers\learn.py:141: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
trainer_config = yaml.load(data_file)
INFO:mlagents.envs:
'TestAcademy' started successfully!
Unity Academy name: TestAcademy
Number of Brains: 1
Number of Training Brains : 1
Reset Parameters :
Unity brain name: RollerBallBrain
Number of Visual Observations (per agent): 0
Vector Observation space size (per agent): 6
Number of stacked Vector Observation: 1
Vector Action space type: continuous
Vector Action space size (per agent): [2]
Vector Action descriptions: ,
2019-06-05 09:23:09.796920: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
INFO:mlagents.envs:Hyperparameters for the PPO Trainer of brain RollerBallBrain:
batch_size: 10
beta: 0.005
buffer_size: 100
epsilon: 0.2
gamma: 0.99
hidden_units: 128
lambd: 0.95
learning_rate: 0.0003
max_steps: 5.0e4
normalize: False
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 1000
use_recurrent: False
summary_path: ./summaries/train-0_RollerBallBrain
memory_size: 256
use_curiosity: False
curiosity_strength: 0.01
curiosity_enc_size: 128
model_path: ./models/train-0/RollerBallBrain
c:\users\smilejsu\appdata\local\conda\conda\envs\ml-agents\lib\site-packages\numpy\core\fromnumeric.py:2957: RuntimeWarning: Mean of empty slice.
out=out, **kwargs)
c:\users\smilejsu\appdata\local\conda\conda\envs\ml-agents\lib\site-packages\numpy\core\_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
INFO:mlagents.trainers: train-0: RollerBallBrain: Step: 1000. Time Elapsed: 8.189 s Mean Reward: -0.976. Std of Reward: 0.623. Training.
INFO:mlagents.trainers: train-0: RollerBallBrain: Step: 2000. Time Elapsed: 15.860 s Mean Reward: -1.394. Std of Reward: 1.200. Training.
INFO:mlagents.trainers: train-0: RollerBallBrain: Step: 3000. Time Elapsed: 23.409 s Mean Reward: -0.484. Std of Reward: 1.027. Training.
INFO:mlagents.trainers: train-0: RollerBallBrain: Step: 4000. Time Elapsed: 30.629 s Mean Reward: -0.551. Std of Reward: 0.405. Training.
INFO:mlagents.trainers: train-0: RollerBallBrain: Step: 5000. Time Elapsed: 37.751 s Mean Reward: -0.378. Std of Reward: 0.480. Training.
The executable (test-mlagent.exe) was launched automatically and training proceeded without problems.
References:
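To follow the reward trend without reading every log line, the summary lines can be parsed. The following is a minimal sketch in Python, assuming the line format shown in the log above; the regular expression and the names `SUMMARY_RE`, `Summary`, and `parse_summary` are hypothetical helpers written for this post, not part of ml-agents, and the format may differ in other versions.

```python
import re
from typing import NamedTuple, Optional

# Pattern modelled on the "Step: ... Mean Reward: ..." lines in the log above.
# This is an assumption about the format, not an official ml-agents contract.
SUMMARY_RE = re.compile(
    r"Step:\s*(?P<step>\d+)\.\s+"
    r"Time Elapsed:\s*(?P<elapsed>\d+(?:\.\d+)?)\s*s\s+"
    r"Mean Reward:\s*(?P<mean>-?\d+(?:\.\d+)?)\.\s+"
    r"Std of Reward:\s*(?P<std>\d+(?:\.\d+)?)\."
)


class Summary(NamedTuple):
    step: int
    elapsed_s: float
    mean_reward: float
    std_reward: float


def parse_summary(line: str) -> Optional[Summary]:
    """Return the parsed summary, or None if the line is not a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return Summary(
        step=int(match.group("step")),
        elapsed_s=float(match.group("elapsed")),
        mean_reward=float(match.group("mean")),
        std_reward=float(match.group("std")),
    )


# Two lines copied from the log above, plus one unrelated line.
LOG_LINES = [
    "INFO:mlagents.trainers: train-0: RollerBallBrain: Step: 1000. Time Elapsed: 8.189 s Mean Reward: -0.976. Std of Reward: 0.623. Training.",
    "INFO:mlagents.trainers: train-0: RollerBallBrain: Step: 5000. Time Elapsed: 37.751 s Mean Reward: -0.378. Std of Reward: 0.480. Training.",
    "INFO:mlagents.envs:Saved Model",
]

# Keep only the lines that parsed as summaries.
summaries = [s for s in map(parse_summary, LOG_LINES) if s is not None]

if __name__ == "__main__":
    for s in summaries:
        print(s.step, s.mean_reward)
```

In the log above the mean reward moves from -0.976 at step 1000 to -0.378 at step 5000, which is the kind of trend this parser makes easy to collect.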
https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Executable.md
https://github.com/Unity-Technologies/ml-agents/issues/2099
When training finished, a .nn file was also created, and I was able to assign the trained model to the Brain dynamically and run it.
INFO:mlagents.envs:Saved Model
INFO:mlagents.trainers:List of nodes to export for brain :RollerBallBrain
INFO:mlagents.trainers: is_continuous_control
INFO:mlagents.trainers: version_number
INFO:mlagents.trainers: memory_size
INFO:mlagents.trainers: action_output_shape
INFO:mlagents.trainers: action
INFO:mlagents.trainers: action_probs
INFO:mlagents.trainers: value_estimate
INFO:tensorflow:Restoring parameters from ./models/train-0/RollerBallBrain\model-15901.cptk
INFO:tensorflow:Froze 17 variables.
Converted 17 variables to const ops.
Converting ./models/train-0/RollerBallBrain/frozen_graph_def.pb to ./models/train-0/RollerBallBrain.nn
IGNORED: StopGradient unknown layer
GLOBALS: 'is_continuous_control', 'version_number', 'memory_size', 'action_output_shape'
IN: 'vector_observation': [-1, 1, 1, 6] => 'main_graph_0/hidden_0/BiasAdd'
IN: 'vector_observation': [-1, 1, 1, 6] => 'main_graph_1/hidden_0/BiasAdd'
IN: 'epsilon': [-1, 1, 1, 2] => 'mul'
OUT: 'action', 'action_probs', 'value_estimate'
DONE: wrote ./models/train-0/RollerBallBrain.nn file.
INFO:mlagents.trainers:Exported ./models/train-0/RollerBallBrain.nn file
(ml-agents) D:\workspace\unity\Test\UnitySDK>
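The session above can also be scripted. The following is a rough sketch in Python that assembles the same `mlagents-learn` command used in this post and computes where the exported model is expected to land. The helper names `build_learn_command` and `expected_model_path` are hypothetical, only the flags that appear in the command above are used, and the `./models/<run-id>-0/<brain>.nn` layout is inferred from this log alone, so it may differ in other ml-agents versions.

```python
from typing import List


def build_learn_command(config: str, env: str, run_id: str,
                        train: bool = True) -> List[str]:
    """Assemble the mlagents-learn argument list used in this post."""
    cmd = ["mlagents-learn", config, f"--env={env}", f"--run-id={run_id}"]
    if train:
        cmd.append("--train")
    return cmd


def expected_model_path(run_id: str, brain_name: str,
                        run_index: int = 0) -> str:
    """Guess the exported .nn path.

    Assumption: the layout matches the log line
    'Exported ./models/train-0/RollerBallBrain.nn file'.
    """
    return f"./models/{run_id}-{run_index}/{brain_name}.nn"


command = build_learn_command(
    config="trainer_config.yaml",
    env=r"C:\Users\smilejsu\Desktop\test-mlagent.exe",
    run_id="train",
)
model_path = expected_model_path("train", "RollerBallBrain")

if __name__ == "__main__":
    print(" ".join(command))
    print(model_path)
    # To actually start training (requires the ml-agents environment):
    # import subprocess
    # subprocess.run(command, check=True)
```

The command is kept as a list so that it can be passed to `subprocess.run` without shell quoting problems for the Windows path.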
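I am still curious whether it is possible to build for the Android platform, install the APK on an Android device, and run reinforcement learning training by launching the app.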