maskrcnn-benchmark Source Code Analysis (2): RPN Computation | 天一阁


Overall RPN logic

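The RPN (Region Proposal Network) was introduced in Faster R-CNN, mainly to replace the Selective Search algorithm. In Mask R-CNN, the RPN takes the image feature maps computed by the backbone CNN and is responsible for finding 2000 bboxes that may contain objects. (For convenience, these notes mainly follow the training process of e2e_mask_rcnn_R_50_FPN_1x.yaml.)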

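In the maskrcnn-benchmark code, the RPNModule class wraps the whole RPN computation. Its key members are:

  • head (RPNHead): the RPN's CNN, which computes the bbox regression and cls_logits for every location of the feature map.
  • anchor_generator (AnchorGenerator): computes the coordinates of all anchors on the original image from the image size and the feature map sizes.
  • box_selector_train (RPNPostProcessor): uses the scores computed by head to pick the top 2000 boxes and pass them to the subsequent roi_head network. This step does not contribute to training the RPN itself; if you only train the RPN, it can be skipped.
  • loss_evaluator (RPNLossComputation): uses the target boxes to select positive and negative samples among the anchors in a fixed ratio, then computes the RPN loss from the head outputs.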
class RPNModule(torch.nn.Module):
    def forward(self, images, features, targets=None):
        # features are produced by ResNet + FPN.
        # head uses a 3x3 conv and 1x1 convs to turn the 256-d feature vector at each
        # location into predictions for num_anchors anchors: cls score and bbox coordinates
        objectness, rpn_box_regression = self.head(features)
        # Compute the coordinates of every anchor box from the image:
        # 5 levels per image, 3 boxes at every location of every level
        anchors = self.anchor_generator(images, features)
        if self.training:
            # Compute the RPN loss
            return self._forward_train(anchors, objectness, rpn_box_regression, targets)
        else:
            return self._forward_test(anchors, objectness, rpn_box_regression)

    def _forward_train(self, anchors, objectness, rpn_box_regression, targets):
        if self.cfg.MODEL.RPN_ONLY:
            boxes = anchors
        else:
            with torch.no_grad():
                # Using the objectness scores, select the FPN_POST_NMS_TOP_N_TRAIN (2000)
                # highest-scoring bboxes
                boxes = self.box_selector_train(
                    anchors, objectness, rpn_box_regression, targets
                )
        # Compute the classification and bbox losses
        loss_objectness, loss_rpn_box_reg = self.loss_evaluator(
            anchors, objectness, rpn_box_regression, targets
        )
        losses = {
            "loss_objectness": loss_objectness,
            "loss_rpn_box_reg": loss_rpn_box_reg,
        }
        return boxes, losses

RPN computation in detail

1. RPNHead.forward

rpn_head computes objectness and rpn_box_regression with a small CNN. It is an ordinary CNN and fairly simple.

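2. AnchorGenerator.forward

Computes the anchor box coordinates. These are fixed: once the image width and height and the feature map widths and heights are known, they can be computed, which yields more than 200,000 anchor coordinates.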
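
As a rough sanity check on the 200,000+ figure, the arithmetic can be sketched in plain Python. The 800 x 1344 padded input size, the FPN strides 4/8/16/32/64 and the rounding-up of feature map sizes are assumptions made for illustration; the 5 levels and 3 anchors per location come from the comments in RPNModule.forward. The helper name level_shapes is hypothetical.

```python
import math

# Assumptions for illustration: a padded 800 x 1344 input, FPN strides
# 4/8/16/32/64, and 3 anchors per feature-map location.
IMAGE_H, IMAGE_W = 800, 1344
STRIDES = (4, 8, 16, 32, 64)
ANCHORS_PER_LOCATION = 3


def level_shapes(image_h, image_w, strides, num_anchors, batch_size=1):
    """Return (objectness shape, box regression shape, anchor count) per level."""
    shapes = []
    for stride in strides:
        # Assumed: each level's feature map is the image size divided by the
        # stride, rounded up.
        h = math.ceil(image_h / stride)
        w = math.ceil(image_w / stride)
        objectness = (batch_size, num_anchors, h, w)
        box_regression = (batch_size, 4 * num_anchors, h, w)
        shapes.append((objectness, box_regression, num_anchors * h * w))
    return shapes


shapes = level_shapes(IMAGE_H, IMAGE_W, STRIDES, ANCHORS_PER_LOCATION)
total_anchors = sum(count for _, _, count in shapes)
for stride, (obj, reg, count) in zip(STRIDES, shapes):
    print(f"stride {stride:>2}: objectness {obj}, box_regression {reg}, anchors {count}")
print("total anchors per image:", total_anchors)  # 268569
```

Under these assumptions the finest level alone contributes 201,600 anchors, which is why the total lands above 200,000.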

class AnchorGenerator(nn.Module):
    def forward(self, image_list, feature_maps):
        # grid_sizes holds the height and width of each feature map
        grid_sizes = [feature_map.shape[-2:] for feature_map in feature_maps]
        # Compute the coordinates of all anchor boxes on the original image.
        # What grid_anchors does:
        #   for each level:
        #     build the grid positions from the feature map's grid_height, grid_width
        #     use torch.meshgrid and torch.stack to build the offset matrix for every location
        #     add those offsets to the base anchor coordinates computed at initialization
        #   return the anchors of all levels (i.e. the default bbox coordinates)
        anchors_over_all_feature_maps = self.grid_anchors(grid_sizes)
        anchors = []
        for i, (image_height, image_width) in enumerate(image_list.image_sizes):
            anchors_in_image = []
            for anchors_per_feature_map in anchors_over_all_feature_maps:
                # Wrap each level's bbox coordinates in a BoxList
                boxlist = BoxList(
                    anchors_per_feature_map, (image_width, image_height), mode="xyxy"
                )
                self.add_visibility_to(boxlist)
                anchors_in_image.append(boxlist)
            anchors.append(anchors_in_image)
        return anchors
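
The grid_anchors flow described in the comments can be illustrated with a minimal pure-Python sketch for a single level. It is a simplification, not the library's code: the function names base_anchors and grid_anchors, the toy grid size, and the exact base-anchor formula (area size squared, aspect ratio taken as height / width, no rounding) are assumptions, and the real implementation works on tensors with torch.meshgrid.

```python
import math


def base_anchors(size, aspect_ratios):
    """Anchors centred on the origin, one per aspect ratio (height / width)."""
    anchors = []
    for ratio in aspect_ratios:
        # Keep the area close to size * size while changing the shape.
        w = size / math.sqrt(ratio)
        h = size * math.sqrt(ratio)
        anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def grid_anchors(grid_h, grid_w, stride, cell_anchors):
    """Shift the base anchors to every feature-map location (image coordinates)."""
    anchors = []
    for y in range(grid_h):        # rows first, then columns
        for x in range(grid_w):
            shift_x, shift_y = x * stride, y * stride
            for x1, y1, x2, y2 in cell_anchors:
                anchors.append(
                    (x1 + shift_x, y1 + shift_y, x2 + shift_x, y2 + shift_y)
                )
    return anchors


# Toy level: a 2 x 3 feature map with stride 4 and anchor size 32 (assumed values).
cell = base_anchors(size=32, aspect_ratios=(0.5, 1.0, 2.0))
anchors = grid_anchors(grid_h=2, grid_w=3, stride=4, cell_anchors=cell)
print(len(anchors))  # 2 * 3 locations * 3 ratios = 18
print(anchors[1])    # square anchor at the first location
print(anchors[4])    # the same anchor shifted one cell (4 pixels) to the right
```

Because the result depends only on the grid size, stride and base anchors, the same anchors apply to every image in a batch, which matches the loop over image_list.image_sizes above.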

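3. RPNPostProcessor.forward

RPNPostProcessor.forward picks, from the anchors computed earlier, the boxes that rpn_head scored highest. During training it exists to prepare input for the later roi_head; it is of little use to the RPN itself.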

class RPNPostProcessor(torch.nn.Module):
    def forward(self, anchors, objectness, box_regression, targets=None):
        sampled_boxes = []
        num_levels = len(objectness)
        # Regroup the anchors from per-image lists to per-level lists
        anchors = list(zip(*anchors))
        # Select the highest-scoring bboxes level by level; the features are the
        # 5 feature maps at different scales computed by the FPN
        for a, o, b in zip(anchors, objectness, box_regression):
            # What forward_for_single_feature_map does:
            # 1. pick the 2000 highest-scoring entries from objectness
            # 2. apply box_regression to those 2000 anchors to get the regressed coordinates
            # 3. run NMS on the 2000 bboxes to remove duplicates, giving this level's sampled boxes
            sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
        # Regroup the per-level boxes by original image (batch)
        boxlists = list(zip(*sampled_boxes))
        # Merge each image's boxes into a single BoxList
        boxlists = [cat_boxlist(boxlist) for boxlist in boxlists]
        if num_levels > 1:
            # Select the final FPN_POST_NMS_TOP_N_TRAIN (2000) boxes
            boxlists = self.select_over_all_levels(boxlists)
        # During training, add the ground truth boxes to the proposals so that
        # roi_head trains better
        if self.training and targets is not None:
            boxlists = self.add_gt_proposals(boxlists, targets)
        return boxlists
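
The three steps listed for forward_for_single_feature_map (top-k by score, decode, NMS) can be sketched in plain Python on a toy input. This is an illustrative simplification under stated assumptions: the function names, the toy boxes and the limits are hypothetical, the decode step uses the standard (dx, dy, dw, dh) parameterization without the library's pixel-offset conventions, and clipping to the image and removal of tiny boxes are left out.

```python
import math


def decode(anchor, deltas):
    """Apply (dx, dy, dw, dh) regression deltas to an (x1, y1, x2, y2) anchor."""
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    pcx, pcy = cx + dx * w, cy + dy * h
    pw, ph = w * math.exp(dw), h * math.exp(dh)
    return (pcx - 0.5 * pw, pcy - 0.5 * ph, pcx + 0.5 * pw, pcy + 0.5 * ph)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_proposals(anchors, deltas, scores, pre_nms_top_n, nms_thresh, post_nms_top_n):
    """Top-k by score, decode, then greedy NMS. Returns (anchor index, box) pairs."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    order = order[:pre_nms_top_n]                                  # step 1
    candidates = [(i, decode(anchors[i], deltas[i])) for i in order]  # step 2
    kept = []
    for i, box in candidates:                                      # step 3
        if all(iou(box, kept_box) <= nms_thresh for _, kept_box in kept):
            kept.append((i, box))
        if len(kept) == post_nms_top_n:
            break
    return kept


# Toy data (assumed): anchors 0 and 1 decode to heavily overlapping boxes.
anchors = [(0, 0, 10, 10), (0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
deltas = [(0, 0, 0, 0), (0, 0.1, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]
scores = [0.9, 0.8, 0.7, 0.1]
kept = select_proposals(anchors, deltas, scores,
                        pre_nms_top_n=3, nms_thresh=0.7, post_nms_top_n=2)
kept_indices = [i for i, _ in kept]
print(kept_indices)
```

In the toy run, anchor 3 is dropped by the score cut, and anchor 1 is suppressed because its decoded box overlaps the higher-scoring box from anchor 0 with IoU above the threshold.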

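4. RPNLossComputation

The RPN loss is computed on the rpn_head outputs, so that training produces better proposals. The difficulty is that, unlike in classification, the target labels are not given directly. Each image has only a few boxes as targets, while rpn_head produces predictions for more than 200,000 anchors, each with a cls score and bbox regression coefficients. Coming up with a good way to optimize a network like this is quite impressive.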
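
The way labels are derived for so many anchors from so few targets can be sketched in plain Python: assign each anchor a label from its best IoU with any target box, then sample a small balanced subset. The thresholds (0.7 and 0.3) and the batch of 256 with positives capped at half follow the comments in the code that follows; the function names and toy boxes are hypothetical, and details such as anchors outside the image are ignored here.

```python
import random


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, fg_thresh=0.7, bg_thresh=0.3):
    """1 = object, 0 = background, -1 = ignored (IoU between the thresholds)."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= fg_thresh:
            labels.append(1)
        elif best < bg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


def sample_fg_bg(labels, batch_size=256, positive_fraction=0.5, rng=random):
    """Pick at most batch_size anchors, with positives capped at the given fraction."""
    pos = [i for i, label in enumerate(labels) if label == 1]
    neg = [i for i, label in enumerate(labels) if label == 0]
    num_pos = min(len(pos), int(batch_size * positive_fraction))
    num_neg = min(len(neg), batch_size - num_pos)
    return rng.sample(pos, num_pos), rng.sample(neg, num_neg)


# Toy data (assumed): one target box and six anchors.
gt_boxes = [(0, 0, 10, 10)]
anchors = [(0, 0, 10, 10), (0, 1, 10, 11), (0, 5, 10, 15),
           (20, 20, 30, 30), (50, 50, 60, 60), (5, 5, 15, 15)]
labels = label_anchors(anchors, gt_boxes)
pos_idx, neg_idx = sample_fg_bg(labels, batch_size=4, rng=random.Random(0))
print(labels)  # [1, 1, -1, 0, 0, 0]
print(sorted(pos_idx), sorted(neg_idx))
```

Only the sampled anchors contribute to the loss, so the overwhelming number of easy background anchors does not dominate training.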

class RPNLossComputation(object):
    def __call__(self, anchors, objectness, box_regression, targets):
        # Merge each image's anchors from all scales into a single BoxList,
        # so that anchors and targets have the same structure
        anchors = [cat_boxlist(anchors_per_image) for anchors_per_image in anchors]
        # Label each anchor from its IoU with the targets: 1 is object, 0 is background,
        # -1 is ignored (IoU in between, or the anchor falls outside the image).
        # IoU > 0.7 is a positive sample, IoU < 0.3 is a negative sample
        labels, regression_targets = self.prepare_targets(anchors, targets)
        # Randomly sample positives and negatives from the labels: 256 per image,
        # positives capped at half and negatives filling the remainder
        sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
        sampled_pos_inds = torch.nonzero(torch.cat(sampled_pos_inds, dim=0)).squeeze(1)
        sampled_neg_inds = torch.nonzero(torch.cat(sampled_neg_inds, dim=0)).squeeze(1)
        # Indices of the samples that finally enter the loss
        sampled_inds = torch.cat([sampled_pos_inds, sampled_neg_inds], dim=0)
        objectness_flattened = []
        box_regression_flattened = []
        # Flatten the classification scores and regression coefficients
        # to simplify the loss computation
        for objectness_per_level, box_regression_per_level in zip(
            objectness, box_regression
        ):
            N, A, H, W = objectness_per_level.shape
            objectness_per_level = objectness_per_level.permute(0, 2, 3, 1).reshape(
                N, -1
            )
            box_regression_per_level = box_regression_per_level.view(N, -1, 4, H, W)
            box_regression_per_level = box_regression_per_level.permute(0, 3, 4, 1, 2)
            box_regression_per_level = box_regression_per_level.reshape(N, -1, 4)
            objectness_flattened.append(objectness_per_level)
            box_regression_flattened.append(box_regression_per_level)
        # Concatenate the classification scores and regression coefficients across levels
        objectness = cat(objectness_flattened, dim=1).reshape(-1)
        box_regression = cat(box_regression_flattened, dim=1).reshape(-1, 4)
        # Flatten the targets as well
        labels = torch.cat(labels, dim=0)
        regression_targets = torch.cat(regression_targets, dim=0)
        # Box loss: summed over positive samples, normalized by the total number of samples
        box_loss = smooth_l1_loss(
            box_regression[sampled_pos_inds],
            regression_targets[sampled_pos_inds],
            beta=1.0 / 9,
            size_average=False,
        ) / (sampled_inds.numel())
        # Objectness loss: binary cross entropy over all sampled anchors
        objectness_loss = F.binary_cross_entropy_with_logits(
            objectness[sampled_inds], labels[sampled_inds]
        )
        return objectness_loss, box_loss
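
To make the normalization of the two losses concrete, here is a minimal scalar sketch in plain Python. It assumes the usual smooth L1 definition (quadratic below beta, linear above) and the numerically stable form of binary cross entropy on logits; the function names and toy numbers are hypothetical, and the real code operates on tensors.

```python
import math


def smooth_l1(pred, target, beta=1.0 / 9):
    """Smooth L1 for one scalar: quadratic below beta, linear above it."""
    n = abs(pred - target)
    if n < beta:
        return 0.5 * n * n / beta
    return n - 0.5 * beta


def rpn_box_loss(pred_deltas, target_deltas, num_sampled, beta=1.0 / 9):
    """Sum smooth L1 over the positives' 4 deltas, divide by all sampled anchors."""
    total = sum(
        smooth_l1(p, t, beta)
        for pred, target in zip(pred_deltas, target_deltas)
        for p, t in zip(pred, target)
    )
    return total / num_sampled


def bce_with_logits(logit, label):
    """Binary cross entropy on a raw logit, in a numerically stable form."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def rpn_objectness_loss(logits, labels):
    """Mean binary cross entropy over the sampled anchors."""
    return sum(bce_with_logits(x, z) for x, z in zip(logits, labels)) / len(logits)


# Toy data (assumed): 1 positive and 3 negatives were sampled.
pred = [(0.0, 0.0, 0.0, 0.0)]       # predicted deltas of the positive anchor
target = [(0.0, 0.0, 1.0, 0.0)]     # its regression target
box_loss = rpn_box_loss(pred, target, num_sampled=4)
objectness_loss = rpn_objectness_loss([0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
print(box_loss, objectness_loss)
```

Note that the box loss is summed over positives only but divided by the number of all sampled anchors, so an image with few positives produces a proportionally smaller regression loss.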