实验6:目标检测算法
一:实验目的与要求
1:了解两阶段目标检测模型 RCNN或Faster RCNN模型的原理和结构。
2:学习通过RCNN或Faster RCNN模型解决目标检测问题。
二:实验内容
常用的深度学习框架包括PyTorch和PaddlePaddle等,请选择一种深度学习框架,完成后续实验。
2.1 RCNN模型简介
区域卷积神经网络(RCNN)系列模型为两阶段目标检测器,包含对图像生成候选区域,提取特征,判别特征类别并修正候选框位置等几个步骤。 RCNN系列目前包含两个代表模型:Faster RCNN和Mask RCNN。
Faster RCNN 整体网络可以分为4个主要内容:
1、基础卷积层。作为一种卷积神经网络目标检测方法,Faster RCNN首先使用一组基础的卷积网络提取图像的特征图。特征图被后续 区域生成网络RPN层和全连接层共享。本示例采用ResNet-50作为基础卷积层。
2、区域生成网络(RPN)。RPN网络用于生成候选区域(proposals)。该层通过一组固定的尺寸和比例得到一组锚点(anchors), 通过softmax判断锚点属于前景或者背景,再利用区域回归修正锚点从而获得精确的候选区域。
3、RoI Align。该层收集输入的特征图和候选区域,将候选区域映射到特征图中并池化为统一大小的区域特征图,送入全连接层判定目标类别, 该层可选用RoIPool和RoIAlign两种方式,在config.py中设置roi_func。
4、检测层。利用区域特征图计算候选区域的类别,同时再次通过区域回归获得检测框最终的精确位置。
2.2 数据集
实验提供了昆虫数据集和螺丝螺母检测数据集供同学们在实验中使用,同学们也可以结合自身研究,使用其它真实数据集进行实验。
三:实验资源
1.Faster RCNN模型实验代码。
2. 参考paddle版本代码:
Faster RCNN目标检测 - 飞桨AI Studio星河社区
3:螺丝螺母数据集和昆虫数据集。
四:实验要求
1、阅读目标检测算法的相关论文,理解两阶段目标检测的主要思想。
2、结合目标检测算法的相关论文https://arxiv.org/abs/1506.01497,补充完成参考代码(或自选框架实现),实现目标检测算法(Faster RCNN)。
3、通过昆虫数据集、螺丝螺母检测数据集或自选数据集验证模型效果,并进行对比分析。
4、根据上述要求,撰写实验报告。
五:实验环境
本实验所使用的环境条件如下表所示。
Ubuntu(Linux) | |
程序语言 | Python(3.11.4) |
第三方依赖 | numpy, matplotlib,pytorch,openmmlab等 |
六:算法流程
两阶段(two-stage)目标检测模型通常由两个阶段组成:区域提议(Region Proposal)和目标分类与定位(Object Classification and Localization)。
【1】区域提议
(1)步骤:
- 输入图像:将待检测的图像输入模型。
- 特征提取:使用预训练的卷积神经网络(CNN)对输入图像进行特征提取,例如VGG、ResNet等。
- 生成候选框:在提取的特征图上使用滑动窗口或者其他方法生成候选目标框,这些框是可能包含目标的区域。
(2)输出:生成的候选目标框
【2】目标分类与定位
(1)步骤:
- 候选框裁剪:对于每个候选目标框,从原始图像中提取相应的区域。
- 特征提取与调整:将提取的候选框区域输入到CNN中,进行特征提取,并进行必要的调整以适应分类和定位的任务。
- 目标分类:使用softmax分类器对提取的特征进行目标分类,确定候选框内是否包含目标物体。
- 边界框回归:对于被分类为目标的候选框,进一步使用回归器来调整其位置,以更准确地定位目标的边界框。
(2)输出:每个目标框的类别标签和边界框位置调整参数。
七:实验展示
本实验所用的数据集来源于URP《基于机器视觉的智慧选种研究》,数据集包括广泛种植在中国的6类大米的显微镜拍摄图像,共计约有30000张大米图像。
【1】Faster RCNN
训练过程中,各类参数的设置如下:batch_size为16,backbone为resnet,neck为fpn,optimizer为SGD,学习速率为0.02,动量为0.9,权重衰减为0.0001,迭代次数为12,计算目标定位的损失函数为平滑L1损失,计算目标分类的损失函数为交叉熵损失。
12次迭代的训练时间约为6小时,训练过程中的验证集mAP结果如下所示。
(1)平均精度(mAP),无检测框的置信度限制
(2)平均精度(mAP),限制检测框的置信度大于等于0.5
(3)平均精度(mAP),限制检测框的置信度大于等于0.75
(4)平均精度(mAP),目标的尺寸范围为大尺寸(large)
图(1)到图(4)的mAP数据如下图所示。
采用OOD数据集的测试结果如下图所示。
放大结果如下图所示。
【2】TridentNet
训练过程中,各类参数的设置类似Faster RCNN。
训练过程中的验证集mAP结果如下所示。
(1)平均精度(mAP),无检测框的置信度限制
(2)平均精度(mAP),限制检测框的置信度大于等于0.5
(3)平均精度(mAP),限制检测框的置信度大于等于0.75
(4)平均精度(mAP),目标的尺寸范围为大尺寸(large)
图(1)到图(4)的mAP数据如下图所示。
采用OOD数据集的测试结果如下图所示。
放大结果如下图所示。
八:实验结论与心得
1:两阶段目标检测模型主要具备以下几个特点:
(1)区域提议阶段:通过在图像上生成候选目标框,提高了效率,减少了需要处理的区域数量。
(2)目标分类与定位阶段:对提取的候选框进行分类和位置调整,以确定目标物体的类别并准确地定位其边界框。
(3)端到端训练:整个模型通常采用端到端的方式进行训练,通过联合优化区域提议和目标分类与定位的任务来学习模型参数。
2:在两阶段目标检测模型中,代表性的模型包括Faster R-CNN、R-CNN、Mask R-CNN等。
3:在目标检测中,评估目标检测模型的性能包括准确率、召回率、平均精确度(mAP)等指标。
4:TridentNet最初由微软亚洲研究院提出,是一种两阶段的目标检测模型,其设计旨在提高检测器对小目标和长尾类别的检测性能。
5:TridentNet模型主要具备以下几个特点:
(1)多尺度检测:引入了三个并行的检测分支,每个分支专注于检测不同尺度的目标,使得在检测小目标和大目标时都能保持高效性。
(2)语义上下文信息:引入了一种新的上下文引导模块(Context Guided Module),用于利用图像语义上下文信息来增强目标检测。
(3)级联检测:级联检测器由一系列级联的子检测器组成,每个子检测器在前一个阶段的基础上进一步筛选和优化目标框,从而提高最终的检测精度。
6:mAP 表示平均精度,它是在不同类别上的平均准确率(Precision)值。通常情况下,这个值是在所有目标尺寸范围内计算的平均精度。
7:在coco/bbox_mAP_s、coco/bbox_mAP_m、coco/bbox_mAP_l中,s、m、l 分别代表小尺寸(small)、中尺寸(medium)、大尺寸(large)目标。这些指标用于衡量模型在不同尺寸范围内目标的检测性能。
九:主要代码
绘图 |
# 打开文本文件 root = '/home/ubuntu/mmdetection-main/work_dirs/faster_rcnn_r50_fpn_1x_coco/20240421_074656/20240421_074656.log' with open(root, 'r') as file: lines = file.readlines() # 初始化存储数值的列表 map1 = [] # coco/bbox_mAP map2 = [] # coco/bbox_mAP_50 map3 = [] # coco/bbox_mAP_75 map4 = [] # coco/bbox_mAP_l # 遍历每一行文本,提取数值 for line in lines: if 'coco/bbox_mAP' in line: values = line.split('coco/bbox_mAP: ') # 使用split方法分割字符串,并获取数值部分 if len(values) > 1: value_str = values[1].strip().split()[0] map1.append(float(value_str)) # 将数值转换为浮点数并添加到列表中
values = line.split('coco/bbox_mAP_50: ') # 使用split方法分割字符串,并获取数值部分 if len(values) > 1: value_str = values[1].strip().split()[0] map2.append(float(value_str)) # 将数值转换为浮点数并添加到列表中
values = line.split('coco/bbox_mAP_75: ') # 使用split方法分割字符串,并获取数值部分 if len(values) > 1: value_str = values[1].strip().split()[0] map3.append(float(value_str)) # 将数值转换为浮点数并添加到列表中
values = line.split('coco/bbox_mAP_l: ') # 使用split方法分割字符串,并获取数值部分 if len(values) > 1: value_str = values[1].strip().split()[0] map4.append(float(value_str)) # 将数值转换为浮点数并添加到列表中 print(map1) print(map2) print(map3) print(map4) import matplotlib.pyplot as plt # 获取模型迭代次数(假设为每迭代一次记录一次) iterations = range(1, len(map1) + 1) # 绘制图像 plt.plot(iterations, map1, marker='o') plt.xlabel('Iterations') plt.ylabel('mAP') plt.title('coco/bbox_mAP on VALID over Iterations') plt.grid(True) plt.savefig('map1.png') plt.close() plt.plot(iterations, map2, marker='o') plt.xlabel('Iterations') plt.ylabel('mAP') plt.title('coco/bbox_mAP_50 on VALID over Iterations') plt.grid(True) plt.savefig('map2.png') plt.close() plt.plot(iterations, map3, marker='o') plt.xlabel('Iterations') plt.ylabel('mAP') plt.title('coco/bbox_mAP_75 on VALID over Iterations') plt.grid(True) plt.savefig('map3.png') plt.close() plt.plot(iterations, map4, marker='o') plt.xlabel('Iterations') plt.ylabel('mAP') plt.title('coco/bbox_mAP_l on VALID over Iterations') plt.grid(True) plt.savefig('map4.png') plt.close() """ map1 = [] # coco/bbox_mAP map2 = [] # coco/bbox_mAP_50 map3 = [] # coco/bbox_mAP_75 map4 = [] # coco/bbox_mAP_l """ |
Backbone:ResNet |
# Copyright (c) OpenMMLab. All rights reserved. import warnings import torch.nn as nn import torch.utils.checkpoint as cp from mmcv.cnn import build_conv_layer, build_norm_layer, build_plugin_layer from mmengine.model import BaseModule from torch.nn.modules.batchnorm import _BatchNorm from mmdet.registry import MODELS from ..layers import ResLayer class BasicBlock(BaseModule): expansion = 1 def __init__(self, inplanes, planes, stride=1, dilation=1, downsample=None, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg=dict(type='BN'), dcn=None, plugins=None, init_cfg=None): super(BasicBlock, self).__init__(init_cfg) assert dcn is None, 'Not implemented yet.' assert plugins is None, 'Not implemented yet.' self.norm1_name, norm1 = build_norm_layer(norm_cfg, planes, postfix=1) self.norm2_name, norm2 = build_norm_layer(norm_cfg, planes, postfix=2) self.conv1 = build_conv_layer( conv_cfg, inplanes, planes, 3, stride=stride, padding=dilation, dilation=dilation, bias=False) self.add_module(self.norm1_name, norm1) self.conv2 = build_conv_layer( conv_cfg, planes, planes, 3, padding=1, bias=False) self.add_module(self.norm2_name, norm2) self.relu = nn.ReLU(inplace=True) self.downsample = downsample self.stride = stride self.dilation = dilation self.with_cp = with_cp @property def norm1(self): """nn.Module: normalization layer after the first convolution layer""" return getattr(self, self.norm1_name) @property def norm2(self): """nn.Module: normalization layer after the second convolution layer""" return getattr(self, self.norm2_name) def forward(self, x): """Forward function.""" def _inner_forward(x): identity = x out = self.conv1(x) out = self.norm1(out) out = self.relu(out) out = self.conv2(out) out = self.norm2(out) if self.downsample is not None: identity = self.downsample(x) out += identity return out if self.with_cp and x.requires_grad: out = cp.checkpoint(_inner_forward, x) else: out = _inner_forward(x) out = self.relu(out) return out class Bottleneck(BaseModule): expansion = 4 def __init__(self, inplanes, planes, stride=1, dilation=1, downsample=None, style='pytorch', with_cp=False, conv_cfg=None, norm_cfg=dict(type='BN'), dcn=None, plugins=None, init_cfg=None): """Bottleneck block for ResNet. If style is "pytorch", the stride-two layer is the 3x3 conv layer, if it is "caffe", the stride-two layer is the first 1x1 conv layer. """ super(Bottleneck, self).__init__(init_cfg) assert style in ['pytorch', 'caffe'] assert dcn is None or isinstance(dcn, dict) assert plugins is None or isinstance(plugins, list) if plugins is not None: allowed_position = ['after_conv1', 'after_conv2', 'after_conv3'] assert all(p['position'] in allowed_position for p in plugins) self.inplanes = inplanes self.planes = planes self.stride = stride self.dilation = dilation self.style = style self.with_cp = with_cp self.conv_cfg = conv_cfg self.norm_cfg = norm_cfg self.dcn = dcn self.with_dcn = dcn is not None self.plugins = plugins self.with_plugins = plugins is not None if self.with_plugins: # collect plugins for conv1/conv2/conv3 self.after_conv1_plugins = [ plugin['cfg'] for plugin in plugins if plugin['position'] == 'after_conv1' ] self.after_conv2_plugins = [ plugin['cfg'] for plugin in plugins if plugin['position'] == 'after_conv2' ] self.after_conv3_plugins = [ plugin['cfg'] for plugin in plugins if plugin['position'] == 'after_conv3' ] if self.style == 'pytorch': self.conv1_stride = 1 self.conv2_stride = stride else: self.conv1_stride = stride self.conv2_stride = 1 self.norm1_name, norm1 = build_norm_layer(norm_cfg, planes, postfix=1) self.norm2_name, norm2 = build_norm_layer(norm_cfg, planes, postfix=2) self.norm3_name, norm3 = build_norm_layer( norm_cfg, planes * self.expansion, postfix=3) self.conv1 = build_conv_layer( conv_cfg, inplanes, planes, kernel_size=1, stride=self.conv1_stride, bias=False) self.add_module(self.norm1_name, norm1) fallback_on_stride = False if self.with_dcn: fallback_on_stride = dcn.pop('fallback_on_stride', False) if not self.with_dcn or fallback_on_stride: self.conv2 = build_conv_layer( conv_cfg, planes, planes, kernel_size=3, stride=self.conv2_stride, padding=dilation, dilation=dilation, bias=False) else: assert self.conv_cfg is None, 'conv_cfg must be None for DCN' self.conv2 = build_conv_layer( dcn, planes, planes, kernel_size=3, stride=self.conv2_stride, padding=dilation, dilation=dilation, bias=False) self.add_module(self.norm2_name, norm2) self.conv3 = build_conv_layer( conv_cfg, planes, planes * self.expansion, kernel_size=1, bias=False) self.add_module(self.norm3_name, norm3) self.relu = nn.ReLU(inplace=True) self.downsample = downsample if self.with_plugins: self.after_conv1_plugin_names = self.make_block_plugins( planes, self.after_conv1_plugins) self.after_conv2_plugin_names = self.make_block_plugins( planes, self.after_conv2_plugins) self.after_conv3_plugin_names = self.make_block_plugins( planes * self.expansion, self.after_conv3_plugins) def make_block_plugins(self, in_channels, plugins): """make plugins for block. Args: in_channels (int): Input channels of plugin. plugins (list[dict]): List of plugins cfg to build. Returns: list[str]: List of the names of plugin. """ assert isinstance(plugins, list) plugin_names = [] for plugin in plugins: plugin = plugin.copy() name, layer = build_plugin_layer( plugin, in_channels=in_channels, postfix=plugin.pop('postfix', '')) assert not hasattr(self, name), f'duplicate plugin {name}' self.add_module(name, layer) plugin_names.append(name) return plugin_names def forward_plugin(self, x, plugin_names): out = x for name in plugin_names: out = getattr(self, name)(out) return out @property def norm1(self): """nn.Module: normalization layer after the first convolution layer""" return getattr(self, self.norm1_name) @property def norm2(self): """nn.Module: normalization layer after the second convolution layer""" return getattr(self, self.norm2_name) @property def norm3(self): """nn.Module: normalization layer after the third convolution layer""" return getattr(self, self.norm3_name) def forward(self, x): """Forward function.""" def _inner_forward(x): identity = x out = self.conv1(x) out = self.norm1(out) out = self.relu(out) if self.with_plugins: out = self.forward_plugin(out, self.after_conv1_plugin_names) out = self.conv2(out) out = self.norm2(out) out = self.relu(out) if self.with_plugins: out = self.forward_plugin(out, self.after_conv2_plugin_names) out = self.conv3(out) out = self.norm3(out) if self.with_plugins: out = self.forward_plugin(out, self.after_conv3_plugin_names) if self.downsample is not None: identity = self.downsample(x) out += identity return out if self.with_cp and x.requires_grad: out = cp.checkpoint(_inner_forward, x) else: out = _inner_forward(x) out = self.relu(out) return out @MODELS.register_module() class ResNet(BaseModule): """ResNet backbone. Args: depth (int): Depth of resnet, from {18, 34, 50, 101, 152}. stem_channels (int | None): Number of stem channels. If not specified, it will be the same as `base_channels`. Default: None. base_channels (int): Number of base channels of res layer. Default: 64. in_channels (int): Number of input image channels. Default: 3. num_stages (int): Resnet stages. Default: 4. strides (Sequence[int]): Strides of the first block of each stage. dilations (Sequence[int]): Dilation of each stage. out_indices (Sequence[int]): Output from which stages. style (str): `pytorch` or `caffe`. If set to "pytorch", the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. deep_stem (bool): Replace 7x7 conv in input stem with 3 3x3 conv avg_down (bool): Use AvgPool instead of stride conv when downsampling in the bottleneck. frozen_stages (int): Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. norm_cfg (dict): Dictionary to construct and config norm layer. norm_eval (bool): Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. plugins (list[dict]): List of plugins for stages, each dict contains: - cfg (dict, required): Cfg dict to build plugin. - position (str, required): Position inside block to insert plugin, options are 'after_conv1', 'after_conv2', 'after_conv3'. - stages (tuple[bool], optional): Stages to apply plugin, length should be same as 'num_stages'. with_cp (bool): Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. zero_init_residual (bool): Whether to use zero init for last norm layer in resblocks to let them behave as identity. pretrained (str, optional): model pretrained path. Default: None init_cfg (dict or list[dict], optional): Initialization config dict. Default: None Example: >>> from mmdet.models import ResNet >>> import torch >>> self = ResNet(depth=18) >>> self.eval() >>> inputs = torch.rand(1, 3, 32, 32) >>> level_outputs = self.forward(inputs) >>> for level_out in level_outputs: ... print(tuple(level_out.shape)) (1, 64, 8, 8) (1, 128, 4, 4) (1, 256, 2, 2) (1, 512, 1, 1) """ arch_settings = { 18: (BasicBlock, (2, 2, 2, 2)), 34: (BasicBlock, (3, 4, 6, 3)), 50: (Bottleneck, (3, 4, 6, 3)), 101: (Bottleneck, (3, 4, 23, 3)), 152: (Bottleneck, (3, 8, 36, 3)) } def __init__(self, depth, in_channels=3, stem_channels=None, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg=dict(type='BN', requires_grad=True), norm_eval=True, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None): super(ResNet, self).__init__(init_cfg) self.zero_init_residual = zero_init_residual if depth not in self.arch_settings: raise KeyError(f'invalid depth {depth} for resnet') block_init_cfg = None assert not (init_cfg and pretrained), \ 'init_cfg and pretrained cannot be specified at the same time' if isinstance(pretrained, str): warnings.warn('DeprecationWarning: pretrained is deprecated, ' 'please use "init_cfg" instead') self.init_cfg = dict(type='Pretrained', checkpoint=pretrained) elif pretrained is None: if init_cfg is None: self.init_cfg = [ dict(type='Kaiming', layer='Conv2d'), dict( type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']) ] block = self.arch_settings[depth][0] if self.zero_init_residual: if block is BasicBlock: block_init_cfg = dict( type='Constant', val=0, override=dict(name='norm2')) elif block is Bottleneck: block_init_cfg = dict( type='Constant', val=0, override=dict(name='norm3')) else: raise TypeError('pretrained must be a str or None') self.depth = depth if stem_channels is None: stem_channels = base_channels self.stem_channels = stem_channels self.base_channels = base_channels self.num_stages = num_stages assert num_stages >= 1 and num_stages |