Training an AI with PPO on Google Research Football (Win10)

1. Introduction

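While setting up the environment I ran into a series of problems, the most fundamental being GPU compatibility. Google Research Football recommends TensorFlow 1.15, whose official build is compiled against CUDA 10.0. On my RTX 3060 the training code fails with "failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED", even though GPU memory usage turns out to be low when checked. The root cause is that 30-series cards are poorly supported by CUDA 10.0. This article therefore tests on both a 1650 Ti and an RTX 3060. Owners of 10-series and 20-series cards can follow the 1650 Ti instructions; owners of 30-series and 40-series cards can follow the 3060 instructions.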
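
The version choices described above can be summarised as a small lookup. This is only a sketch: the function name `recommended_stack` and the rule of reading the series from the first two digits of a four-digit model number are assumptions made for illustration. The version numbers themselves are the ones listed in sections 2.1 and 3.1.

```python
import re

# Versions taken from this article; the mapping rule is an illustrative assumption.
LEGACY_STACK = {"cuda": "10.0", "cudnn": "7.6.4", "python": "3.7", "tensorflow": "1.15"}
MODERN_STACK = {"cuda": "11.7", "cudnn": "8.4", "python": "3.7", "tensorflow": "2.6"}


def recommended_stack(gpu_name):
    """Return the stack this article uses for the given GeForce card name."""
    match = re.search(r"(\d{3,4})", gpu_name)
    if match is None:
        raise ValueError("no model number found in %r" % gpu_name)
    model = match.group(1)
    # "1650" -> 16, "3060" -> 30; three-digit models are not covered here.
    series = int(model[:2]) if len(model) == 4 else 0
    if series in (10, 16, 20):
        return LEGACY_STACK
    if series in (30, 40):
        return MODERN_STACK
    raise ValueError("series not covered by this article: %r" % gpu_name)


print(recommended_stack("GTX 1650 Ti")["tensorflow"])  # 1.15
print(recommended_stack("RTX 3060")["tensorflow"])     # 2.6
```

Cards outside the series mentioned in this article raise a ValueError rather than guessing a stack.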

2. Installing the reinforcement learning environment on a 1650 Ti

2.1 Dependencies

CUDA 10.0
cuDNN 7.6.4
Python 3.7
TensorFlow 1.15

2.2 Installation steps

Install dm-sonnet and TensorFlow with the commands below. A 2.* release of dm-sonnet is the safest choice; other versions conflict with many of football's dependencies.

python -m pip install dm-sonnet==2.* psutil -i https://pypi.tuna.tsinghua.edu.cn/simple
python -m pip install tensorflow-gpu==1.15 -i https://pypi.tuna.tsinghua.edu.cn/simple

Install OpenAI Baselines with the following commands:

git clone https://github.com/openai/baselines.git
cd baselines
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple

Run the training code with the following command. If the setup succeeded, training output appears in the terminal:

python -m gfootball.examples.run_ppo2 --level=academy_empty_goal_close
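
The training script accepts more flags than --level (they are defined in the run_ppo2.py listing in section 5). A minimal sketch for assembling such a command from Python follows. The helper name `build_train_command` is hypothetical, and the flag values shown are examples only, not tuned recommendations.

```python
import sys


def build_train_command(**flag_values):
    """Build an argv list for gfootball.examples.run_ppo2 from keyword flags."""
    command = [sys.executable, "-m", "gfootball.examples.run_ppo2"]
    for name, value in flag_values.items():
        # absl flags accept the --name=value form.
        command.append("--%s=%s" % (name, value))
    return command


command = build_train_command(
    level="academy_empty_goal_close",
    num_envs=4,
    num_timesteps=int(2e6),
    reward_experiment="scoring,checkpoints",
)
print(" ".join(command[1:]))
# To launch it: import subprocess; subprocess.run(command, check=True)
```

Passing the list to subprocess rather than a single shell string avoids quoting problems on Windows.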

3. Installing the reinforcement learning environment on a 3060

On a 3060, installing CUDA 11.7 and cuDNN 8.4 gives better compatibility.

3.1 Dependencies

CUDA 11.7
cuDNN 8.4
Python 3.7
TensorFlow 2.6

3.2 Installation steps

Install dm-sonnet and TensorFlow with the commands below. A 2.* release of dm-sonnet is the safest choice; other versions conflict with many of football's dependencies.

python -m pip install dm-sonnet==2.* psutil -i https://pypi.tuna.tsinghua.edu.cn/simple
python -m pip install tensorflow-gpu==2.6 -i https://pypi.tuna.tsinghua.edu.cn/simple

Install OpenAI Baselines with the following commands:

git clone https://github.com/openai/baselines.git
cd baselines
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple

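The first two steps are much the same as on the 1650 Ti, but on a 3060 the real work starts here. If the process below seems too involved, you can download my already-patched copy of the baselines code. The first error you are likely to hit is "ImportError: cannot import name 'dtensor' from 'tensorflow.compat.v2.experimental'".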

The cause is a version mismatch between TensorFlow and Keras: the installed Keras is too new. Its version should match the TensorFlow version.

pip install keras==2.6.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
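
To confirm that the two packages line up after the downgrade, the versions can be compared on their major and minor parts. The sketch below is illustrative: the helper names are hypothetical, and it reads package metadata instead of importing keras, because importing a mismatched keras is what raises the ImportError in the first place.

```python
def major_minor(version):
    """Turn a version string such as '2.6.0' into the tuple (2, 6)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def keras_matches_tensorflow(tf_version, keras_version):
    """True when both versions share the same major.minor series."""
    return major_minor(tf_version) == major_minor(keras_version)


def installed_version(package):
    """Look up an installed distribution's version without importing it."""
    try:
        from importlib.metadata import version  # Python 3.8+
    except ImportError:
        from pkg_resources import get_distribution  # Python 3.7 with setuptools
        return get_distribution(package).version
    return version(package)


print(keras_matches_tensorflow("2.6.0", "2.6.0"))   # True
print(keras_matches_tensorflow("2.6.0", "2.11.0"))  # False
```

In the environment from section 3.2 the check would be `keras_matches_tensorflow(installed_version("tensorflow-gpu"), installed_version("keras"))`.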

With that fixed, the next error is "AttributeError: module 'tensorflow' has no attribute 'set_random_seed'".

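This error, and every later error about a missing attribute, occurs because those attributes belong to the TensorFlow 1.x API and were removed from the top-level namespace in TensorFlow 2.x, so they are only reachable through compatibility mode. Open the file named in the traceback, find the statement "import tensorflow as tf", and replace it with "import tensorflow.compat.v1 as tf". Many files have this problem, and the same fix applies to all of them.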
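
Because many files need the same replacement, it can be applied in one pass. The following is a sketch rather than the author's procedure: the function name `patch_tf_imports` is hypothetical, and it only rewrites lines that consist of exactly "import tensorflow as tf" (with optional indentation), leaving every other import form alone.

```python
import os
import re

# Matches a whole line that is exactly "import tensorflow as tf".
OLD_IMPORT = re.compile(r"^([ \t]*)import tensorflow as tf[ \t]*$", re.MULTILINE)


def patch_tf_imports(root):
    """Rewrite the TF import to the compat.v1 form in every .py file under root.

    Returns the list of files that were changed.
    """
    changed = []
    for folder, _, files in os.walk(root):
        for name in files:
            if not name.endswith(".py"):
                continue
            path = os.path.join(folder, name)
            with open(path, encoding="utf-8") as handle:
                source = handle.read()
            patched = OLD_IMPORT.sub(r"\1import tensorflow.compat.v1 as tf", source)
            if patched != source:
                with open(path, "w", encoding="utf-8") as handle:
                    handle.write(patched)
                changed.append(path)
    return changed
```

Calling `patch_tf_imports("baselines")` from the directory that contains the cloned repository would patch the whole package. Running it twice is harmless, since already-patched lines no longer match. Reviewing the result with `git diff` before training is a sensible precaution.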

Another class of error is "AttributeError: 'int' object has no attribute 'value'".

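This error is also caused by the TensorFlow version, and the fix is simple: delete the ".value" characters from the offending statement. The same fix applies to every error of this kind.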
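
The background is that indexing a tensor shape in TensorFlow 1.x yields a Dimension object that carries a `.value` attribute, while under TensorFlow 2.x behaviour it yields a plain int (or None). Where one code base should keep working on both, a small helper is an alternative to deleting ".value". This is a sketch only; `dim_value` and `FakeDimension` are hypothetical names, and the fake class merely imitates the 1.x object so the example runs without TensorFlow.

```python
def dim_value(dim):
    """Return a plain int (or None) for either a 1.x Dimension or a 2.x int."""
    return getattr(dim, "value", dim)


class FakeDimension:
    """Stand-in for TensorFlow 1.x's Dimension, used only for this demo."""

    def __init__(self, value):
        self.value = value


print(dim_value(FakeDimension(64)))  # 64
print(dim_value(64))                 # 64
```

A statement such as `x.get_shape()[1].value` would then become `dim_value(x.get_shape()[1])`.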

Once all of these problems are fixed, training runs successfully.

4. Playing matches with the trained weights

Google Research Football lets two models play against each other. Use the following command:

python -m gfootball.play_game --players "ppo2_cnn:left_players=1,checkpoint=weights/01600;ppo2_cnn:right_players=1,checkpoint=weights/01900"
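
The --players value is a semicolon-separated list of player definitions, each with comma-separated options. When comparing many checkpoints it can help to generate that string. The sketch below only reproduces the format seen in the command above; the helper names `player_spec` and `players_flag` are hypothetical.

```python
def player_spec(side, checkpoint, player="ppo2_cnn", count=1):
    """Format one player definition, e.g. for the left team."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right', got %r" % side)
    return "%s:%s_players=%d,checkpoint=%s" % (player, side, count, checkpoint)


def players_flag(*specs):
    """Join player definitions into the value passed to --players."""
    return ";".join(specs)


flag = players_flag(
    player_spec("left", "weights/01600"),
    player_spec("right", "weights/01900"),
)
print(flag)
# ppo2_cnn:left_players=1,checkpoint=weights/01600;ppo2_cnn:right_players=1,checkpoint=weights/01900
```

The printed value can be pasted between the quotes after --players.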

5. Common errors

Running the training code may report: your generated code is out of date and must be regenerated with protoc >= 3.19.0

Solution: this error is caused by a protobuf version that is too new. Downgrade to 3.20.x or lower.

pip install protobuf==3.20.* -i https://pypi.tuna.tsinghua.edu.cn/simple
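
Whether a downgrade is needed can be decided from the version string alone. A minimal sketch, with the hypothetical helper name `protobuf_needs_downgrade` and the 3.20 threshold taken from the pip command above:

```python
def protobuf_needs_downgrade(version):
    """True when the protobuf version is newer than the 3.20 series."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) > (3, 20)


print(protobuf_needs_downgrade("4.21.9"))  # True
print(protobuf_needs_downgrade("3.20.3"))  # False
```

The installed version is available as `google.protobuf.__version__` after `import google.protobuf`.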

Running the training code may report the error: TypeError: can't pickle FlagValues.

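Solution: the cause of this error is still unclear; it may be a version incompatibility among the dependencies. Fortunately it can be fixed by editing the code. In the Anaconda football environment, locate run_ppo2.py at the path football\lib\site-packages\gfootball\examples\run_ppo2.py and replace its contents with the following code: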

"""Runs football_env on OpenAI's ppo2."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import multiprocessing
import os
from absl import app
from absl import flags
from baselines import logger
from baselines.bench import monitor
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
from baselines.ppo2 import ppo2
import gfootball.env as football_env
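# Imported for its side effect of registering gfootball's custom policy networks.
from gfootball.examples import models  # pylint: disable=unused-import

FLAGS = flags.FLAGS
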
flags.DEFINE_string('level', 'academy_empty_goal_close',
                    'Defines type of problem being solved')
flags.DEFINE_enum('state', 'extracted_stacked', ['extracted',
  'extracted_stacked'],
                  'Observation to be used for training.')
flags.DEFINE_enum('reward_experiment', 'scoring',
                  ['scoring', 'scoring,checkpoints'],
                  'Reward to be used for training.')
flags.DEFINE_enum('policy', 'cnn', ['cnn', 'lstm', 'mlp', 'impala_cnn',
                                    'gfootball_impala_cnn'],
                  'Policy architecture')
flags.DEFINE_integer('num_timesteps', int(2e6),
                     'Number of timesteps to run for.')
flags.DEFINE_integer('num_envs', 1,
                     'Number of environments to run in parallel.')
flags.DEFINE_integer('nsteps', 128, 'Number of environment steps per epoch; '
                     'batch size is nsteps * nenv')
flags.DEFINE_integer('noptepochs', 4, 'Number of updates per epoch.')
flags.DEFINE_integer('nminibatches', 8,
                     'Number of minibatches to split one epoch to.')
flags.DEFINE_integer('save_interval', 100,
                     'How frequently checkpoints are saved.')
flags.DEFINE_integer('seed', 0, 'Random seed.')
flags.DEFINE_float('lr', 0.00008, 'Learning rate')
flags.DEFINE_float('ent_coef', 0.01, 'Entropy coefficient')
flags.DEFINE_float('gamma', 0.993, 'Discount factor')
flags.DEFINE_float('cliprange', 0.27, 'Clip range')
flags.DEFINE_float('max_grad_norm', 0.5, 'Max gradient norm (clipping)')
flags.DEFINE_bool('render', False, 'If True, environment rendering is enabled.')
flags.DEFINE_bool('dump_full_episodes', False,
                  'If True, trace is dumped after every episode.')
flags.DEFINE_bool('dump_scores', False,
                  'If True, sampled traces after scoring are dumped.')
flags.DEFINE_string('load_path', None, 'Path to load initial checkpoint from.')


def create_single_football_env(iprocess, level, state, reward_experiment,
                               render, dump_full_episodes, dump_scores):
  """Creates one monitored gfootball environment for worker `iprocess`."""
  env = football_env.create_environment(
      env_name=level,
      stacked=('stacked' in state),
      rewards=reward_experiment,
      logdir=logger.get_dir(),
      # Only the first worker writes dumps and renders.
      write_goal_dumps=dump_scores and (iprocess == 0),
      write_full_episode_dumps=dump_full_episodes and (iprocess == 0),
      render=render and (iprocess == 0),
      dump_frequency=50 if render and iprocess == 0 else 0)
  env = monitor.Monitor(
      env, logger.get_dir() and os.path.join(logger.get_dir(), str(iprocess)))
  return env


def train(level, state, reward_experiment, policy, num_timesteps, num_envs,
          nsteps, noptepochs, nminibatches, save_interval, seed, lr, ent_coef,
          gamma, cliprange, max_grad_norm, render, dump_full_episodes,
          dump_scores, load_path):
  """Trains a PPO2 agent; every setting arrives as a plain argument."""
  vec_env = SubprocVecEnv([
      (lambda _i=i: create_single_football_env(
          _i, level, state, reward_experiment, render, dump_full_episodes,
          dump_scores))
      for i in range(num_envs)
  ], context=None)
  # Import tensorflow after we create environments. TF is not fork safe, and
  # we could be using TF as part of environment if one of the players is
  # controlled by an already trained model.
  import tensorflow.compat.v1 as tf
  ncpu = multiprocessing.cpu_count()
  config = tf.ConfigProto(allow_soft_placement=True,
                          intra_op_parallelism_threads=ncpu,
                          inter_op_parallelism_threads=ncpu)
  config.gpu_options.allow_growth = True
  tf.Session(config=config).__enter__()
  ppo2.learn(network=policy,
            total_timesteps=num_timesteps,
            env=vec_env,
            seed=seed,
            nsteps=nsteps,
            nminibatches=nminibatches,
            noptepochs=noptepochs,
            max_grad_norm=max_grad_norm,
            gamma=gamma,
            ent_coef=ent_coef,
            lr=lr,
            log_interval=1,
            save_interval=save_interval,
            cliprange=cliprange,
            load_path=load_path)
  

if __name__ == '__main__':
  # Read every flag here and hand train() plain values instead of FLAGS.
  app.run(lambda _: train(
    FLAGS.level,
    FLAGS.state,
    FLAGS.reward_experiment,
    FLAGS.policy,
    FLAGS.num_timesteps,
    FLAGS.num_envs,
    FLAGS.nsteps,
    FLAGS.noptepochs,
    FLAGS.nminibatches,
    FLAGS.save_interval,
    FLAGS.seed,
    FLAGS.lr,
    FLAGS.ent_coef,
    FLAGS.gamma,
    FLAGS.cliprange,
    FLAGS.max_grad_norm,
    FLAGS.render,
    FLAGS.dump_full_episodes,
    FLAGS.dump_scores,
    FLAGS.load_path
    ))
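
The source leaves the cause of this error open. One plausible reading, offered here as an assumption rather than a confirmed diagnosis, is that Windows starts worker processes by spawning them, so whatever the environment factory references has to be pickled, and absl's FLAGS object refuses to be pickled. The replacement script above sidesteps this by passing plain values. The sketch below reproduces the pattern with a hypothetical `Settings` class standing in for FLAGS; it does not use absl or gfootball.

```python
import pickle


class Settings:
    """Hypothetical stand-in for FLAGS: holds values but refuses pickling."""

    def __init__(self, level, render):
        self.level = level
        self.render = render

    def __getstate__(self):
        raise TypeError("can't pickle Settings")


def can_pickle(obj):
    """Report whether obj survives pickle.dumps."""
    try:
        pickle.dumps(obj)
    except TypeError:
        return False
    return True


settings = Settings("academy_empty_goal_close", False)
# Shipping the settings object itself to a worker fails ...
print(can_pickle((settings,)))                        # False
# ... while shipping the plain values it holds works.
print(can_pickle((settings.level, settings.render)))  # True
```

This mirrors the structure of the rewritten script, where train() and create_single_football_env() receive strings, numbers, and booleans instead of reading FLAGS themselves.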

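On Windows, enabling rendering during training may trigger "BrokenPipeError: [WinError 109] 管道已结束。" (the pipe has been ended) or "OSError: [WinError 6] 句柄无效。" (the handle is invalid), so enabling rendering during training is not recommended.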
