
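Training Remote Sensing Object Detection Models with MMRotate: A Complete Configuration Checklist from Data Cropping to Model Testing (with Code)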

news 2026/6/10 11:12:38

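Remote Sensing Object Detection with MMRotate in Practice: A Full Workflow from Data Preprocessing to Model Tuning

Object detection in remote sensing imagery is a long-standing research topic in computer vision. Unlike conventional horizontal-box detection, rotated object detection can localize and recognize objects lying in arbitrary orientations more precisely, which makes it valuable for satellite image analysis, urban planning, agricultural monitoring, and similar applications. This article uses the MMRotate framework to walk through a complete rotated detection pipeline, starting from raw remote sensing data.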

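1. Environment Setup and Data Preparation

1.1 Setting Up the System Environment

MMRotate is part of the OpenMMLab ecosystem and has specific requirements for its runtime environment. A recommended setup is: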

    # Create and activate a conda environment
    conda create -n mmrotate python=3.8 -y
    conda activate mmrotate

    # Install PyTorch with CUDA (pick the build that matches your GPU driver)
    conda install pytorch==1.10.0 torchvision==0.11.0 cudatoolkit=11.3 -c pytorch

    # Install MMCV and MMDetection
    pip install mmcv-full==1.4.7 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10.0/index.html
    pip install mmdet==2.25.0

    # Install MMRotate from source
    git clone https://github.com/open-mmlab/mmrotate.git
    cd mmrotate
    pip install -r requirements/build.txt
    pip install -v -e .

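Note: common installation problems include a CUDA version mismatch and conflicts between the MMCV and PyTorch versions. Follow the version compatibility table in the official documentation strictly. Because the clone installs whatever the repository's current branch contains, its minimum mmcv-full and mmdet requirements may differ from the versions pinned above.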
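Before training, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch that uses only the standard library; the distribution names in the list are assumptions about how the packages are registered with pip and may need adjusting (for example, `mmcv` instead of `mmcv-full`).

```python
from importlib import metadata


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names below are assumptions; adjust them to your environment.
packages = ["torch", "torchvision", "mmcv-full", "mmdet", "mmrotate"]
for pkg, ver in installed_versions(packages).items():
    print(f"{pkg:12s} {ver or 'not installed'}")
```

The printed versions can then be compared by hand against the compatibility table in the MMRotate documentation.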

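1.2 Converting the Dataset Format

Remote sensing data is often stored in non-standard formats and has to be converted to the DOTA format before MMRotate can read it. A typical conversion involves:

Annotation conversion: convert VOC/COCO-style annotations to the DOTA format
Image format unification: make sure all images are PNG files
Angle convention unification: decide which rotation angle definition is used (le90 or oc)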
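To make the angle convention concrete, here is a small sketch of normalizing a box to the long-edge le90 definition, under the assumption that le90 means width >= height with the angle in [-pi/2, pi/2) radians. The function name `to_le90` is hypothetical and not part of MMRotate.

```python
import math


def to_le90(w, h, angle):
    """Normalize (w, h, angle in radians) so that w >= h and angle is in [-pi/2, pi/2)."""
    if w < h:
        # Swapping the sides describes the same rectangle rotated by 90 degrees.
        w, h = h, w
        angle += math.pi / 2
    # Wrap the angle into [-pi/2, pi/2); a rectangle is unchanged by a rotation of pi.
    angle = (angle + math.pi / 2) % math.pi - math.pi / 2
    return w, h, angle
```

A box of width 10 and height 20 at angle 0, for instance, becomes a 20 x 10 box at -pi/2, which describes the same rectangle.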

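The following is the core of a Python script that converts XML annotations produced by roLabelImg to the DOTA format:

    import os
    import xml.etree.ElementTree as ET


    def xml_to_dota(xml_path, img_path, output_dir):
        """Convert one roLabelImg XML file to a DOTA-format text file."""
        # img_path is not used in this excerpt.
        tree = ET.parse(xml_path)
        root = tree.getroot()
        # Name the label file after the XML file so each image gets its own annotations.
        stem = os.path.splitext(os.path.basename(xml_path))[0]
        with open(os.path.join(output_dir, stem + '.txt'), 'w') as f:
            for obj in root.findall('object'):
                robndbox = obj.find('robndbox')
                if robndbox is None:
                    continue  # skip objects without a rotated box
                cx = float(robndbox.find('cx').text)
                cy = float(robndbox.find('cy').text)
                w = float(robndbox.find('w').text)
                h = float(robndbox.find('h').text)
                angle = float(robndbox.find('angle').text)
                # Compute the four corner coordinates of the rotated box
                # (calculate_rotated_points is a helper defined elsewhere in the script).
                points = calculate_rotated_points(cx, cy, w, h, angle)
                # DOTA line: x1 y1 x2 y2 x3 y3 x4 y4 category difficulty
                line = ' '.join(str(p) for p in points) + ' ' + obj.find('name').text + ' 0\n'
                f.write(line)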
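The script calls `calculate_rotated_points` without showing it. One possible implementation is sketched below. It assumes the angle is given in radians and rotates the corners about the box centre using the standard 2D rotation matrix in image coordinates (y axis pointing down); whether this matches the rotation direction of your annotation tool should be verified on a few samples.

```python
import math


def calculate_rotated_points(cx, cy, w, h, angle):
    """Return [x1, y1, ..., x4, y4] for a box centred at (cx, cy), rotated by angle radians."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    half_w, half_h = w / 2.0, h / 2.0
    # Corner offsets from the centre before rotation: top-left, top-right,
    # bottom-right, bottom-left (image coordinates, y pointing down).
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    points = []
    for dx, dy in offsets:
        points.append(cx + dx * cos_a - dy * sin_a)
        points.append(cy + dx * sin_a + dy * cos_a)
    return points
```

With an angle of 0 the function returns the four corners of the axis-aligned box, which is a convenient sanity check before converting a whole dataset.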

2. Data Preprocessing and Augmentation

2.1 Image Cropping and Tiling

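Large remote sensing images usually have to be split into tiles to fit within GPU memory. MMRotate ships a cropping tool that is configured through the JSON files in split_configs. The example below uses the key names found in the JSON files shipped with the tool, where sizes is the tile size and gaps is the overlap between neighbouring tiles, so a gap of 512 on a 1024 tile corresponds to a stride of 512. Check the keys against the files in your own checkout, since they may differ between versions:

    {
      "nproc": 10,
      "img_dirs": ["/path/to/your/data/trainval/images/"],
      "ann_dirs": ["/path/to/your/data/trainval/annfiles/"],
      "sizes": [1024],
      "gaps": [512],
      "rates": [1.0],
      "img_rate_thr": 0.6,
      "iof_thr": 0.7,
      "no_padding": true,
      "save_dir": "split_results",
      "save_ext": ".png"
    }

Run the split:

    python tools/data/dota/split/img_split.py --base-json split_configs/custom.json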
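To see how a sliding window covers an image, the tile positions along one axis can be computed by hand. The sketch below illustrates the idea and is not MMRotate's own code: `window_starts` is a hypothetical helper, and it assumes that the last window is shifted back so that it ends at the image border.

```python
import math


def window_starts(length, size, stride):
    """Start offsets of windows of `size` pixels along an axis of `length` pixels."""
    if length <= size:
        return [0]
    count = math.ceil((length - size) / stride) + 1
    starts = [stride * i for i in range(count)]
    # Shift the final window back so that it ends exactly at the image border.
    if starts[-1] + size > length:
        starts[-1] = length - size
    return starts


# Tiles for a hypothetical 4000 x 4000 image with 1024-pixel windows and stride 512.
xs = window_starts(4000, 1024, 512)
ys = window_starts(4000, 1024, 512)
print(len(xs) * len(ys), "tiles")
```

A smaller stride gives more overlap, which reduces the chance of cutting an object in half at a tile border, at the cost of more tiles to train on.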

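2.2 Configuring Data Augmentation

Given the characteristics of remote sensing data, the following augmentations are recommended in the config file:

    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True),
        dict(type='RResize', img_scale=(1024, 1024)),
        # version should match the angle convention of the config (le90 here).
        dict(type='RRandomFlip', flip_ratio=0.5, version='le90'),
        # PolyRandomRotate is MMRotate's box-aware rotation transform; verify the
        # argument names against your installed version.
        dict(type='PolyRandomRotate', rotate_ratio=0.5, mode='value',
             angles_range=[30, 60, 90, 120, 150], version='le90'),
        # Photometric transforms only change pixel values, not the boxes.
        dict(type='BrightnessTransform', level=5),
        dict(type='ContrastTransform', level=5),
        dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375]),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
    ]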

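3. Model Selection and Configuration Tuning

3.1 Comparison of Common Rotated Detection Models

Model	Strengths	Suitable scenarios	Training efficiency
Rotated Faster R-CNN	High accuracy, stable training	Medium-sized datasets	Medium
R3Det	Good at detecting small objects	Dense scenes with small objects	Relatively low
S2ANet	Computationally efficient	Real-time detection	Relatively high
KFIoU	More accurate rotated box regression	High-precision localization	Relatively low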

3.2 Key Parameters

The configuration files live under configs/rotated_faster_rcnn. Pay particular attention to the following parameters, in particular num_classes, which must match the number of classes in your dataset:

    model = dict(
        type='RotatedFasterRCNN',
        backbone=dict(
            type='ResNet',
            depth=50,
            num_stages=4,
            out_indices=(0, 1, 2, 3),
            frozen_stages=1,
            norm_cfg=dict(type='BN', requires_grad=True),
            norm_eval=True,
            style='pytorch'),
        neck=dict(
            type='FPN',
            in_channels=[256, 512, 1024, 2048],
            out_channels=256,
            num_outs=5),
        rpn_head=dict(
            type='RotatedRPNHead',
            in_channels=256,
            feat_channels=256,
            anchor_generator=dict(
                type='AnchorGenerator',
                scales=[8],
                ratios=[0.5, 1.0, 2.0],
                strides=[4, 8, 16, 32, 64]),
            # The anchors here are horizontal boxes, so the shipped config may use a
            # 4-parameter coder at this point; compare with the file in your checkout.
            bbox_coder=dict(
                type='DeltaXYWHABBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0, 0.0],
                target_stds=[1.0, 1.0, 1.0, 1.0, 1.0]),
            loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
        roi_head=dict(
            type='RotatedStandardRoIHead',
            bbox_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            bbox_head=dict(
                type='RotatedShared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=1,  # change to the actual number of classes
                bbox_coder=dict(
                    type='DeltaXYWHABBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.1, 0.1, 0.2, 0.2, 0.1]),
                reg_class_agnostic=True,
                loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))),
        train_cfg=dict(
            rpn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.3,
                    min_pos_iou=0.3,
                    match_low_quality=True,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=256,
                    pos_fraction=0.5,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=False),
                allowed_border=0,
                pos_weight=-1,
                debug=False),
            rpn_proposal=dict(
                nms_pre=2000,
                max_per_img=2000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False)),
        test_cfg=dict(
            rpn=dict(
                nms_pre=2000,
                max_per_img=2000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            # Second-stage NMS acts on rotated boxes; compare the key names of the
            # nms entry with the shipped config.
            rcnn=dict(
                nms_pre=2000,
                score_thr=0.05,
                nms=dict(type='nms', iou_threshold=0.1),
                max_per_img=2000)))

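4. Training Optimization and Troubleshooting

4.1 Handling Out-of-Memory Errors

If training fails with "CUDA out of memory", try the following adjustments:

Reduce the batch size: change samples_per_gpu in dotav1.py
Tune data loading: set workers_per_gpu to 2-4
Use mixed-precision training: add fp16 = dict(loss_scale=512.) to the config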
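Taken together, the three adjustments might look like the override fragment below. It is a sketch and not a tuned recipe: the `_base_` path assumes the file sits next to the base config, and the batch size of 1 is only an example value.

```python
# Hypothetical override file placed next to the base config.
_base_ = ['./rotated_faster_rcnn_r50_fpn_1x_dota_le90.py']

# Smaller per-GPU batch and a modest number of data loading workers.
data = dict(samples_per_gpu=1, workers_per_gpu=2)

# Mixed-precision training with a fixed loss scale, as suggested above.
fp16 = dict(loss_scale=512.)
```

When the batch size shrinks, the learning rate usually needs to be lowered as well, so it is worth revisiting the training parameters after applying these changes.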

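4.2 Tuning Training Parameters

Recommended settings for the key training parameters:

Parameter	Recommended range	Purpose
optimizer.lr	0.001-0.01	Base learning rate
lr_config.warmup_iters	500-1000	Number of learning rate warm-up iterations
optimizer.momentum	0.9-0.99	Optimizer momentum
lr_config.step	[8, 11]	Epochs at which the learning rate decays
runner.max_epochs	12-24	Total number of training epochs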
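To see how warm-up and step decay interact, the learning rate can be computed directly. The function below is an illustrative sketch, not the framework's scheduler: the decay factor of 0.1, the warm-up ratio of 1/3, and the base rate of 0.0025 in the example are assumptions chosen to resemble a typical 12-epoch schedule.

```python
def step_lr(base_lr, epoch, cur_iter, steps=(8, 11), gamma=0.1,
            warmup_iters=500, warmup_ratio=1.0 / 3):
    """Learning rate for a step policy with linear warm-up.

    epoch is the zero-based epoch index; cur_iter is the global iteration count.
    """
    # Each decay epoch already reached multiplies the rate by gamma.
    passed = sum(1 for s in steps if epoch >= s)
    lr = base_lr * gamma ** passed
    # During warm-up the rate ramps linearly from warmup_ratio * lr up to lr.
    if cur_iter < warmup_iters:
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        lr *= (1 - k)
    return lr


for epoch, it in [(0, 0), (0, 250), (0, 500), (8, 10000), (11, 20000)]:
    print(epoch, it, step_lr(0.0025, epoch, it))
```

The rate starts at a third of the base value, reaches the base value once warm-up ends, and drops by a factor of ten at each of the two decay epochs.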

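4.3 Model Evaluation and Testing

After training finishes, run the test script:

    python tools/test.py \
        configs/rotated_faster_rcnn/rotated_faster_rcnn_r50_fpn_1x_dota_le90.py \
        work_dirs/rotated_faster_rcnn/latest.pth \
        --eval mAP

How to read the metrics:

mAP: mean average precision, the mean of the per-class AP values and the main evaluation metric
AP50: AP at an IoU threshold of 0.5
AP75: AP at an IoU threshold of 0.75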
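For intuition about what a single AP value measures, the sketch below computes the area under a precision-recall curve with all-point interpolation. This is one common definition; the framework's evaluator may use a different variant (for example the 11-point VOC07 rule), so numbers from this function are not guaranteed to match its output.

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve with all-point interpolation.

    recalls must be non-decreasing; both lists are obtained by ranking the
    detections of one class by score and accumulating matches at a fixed IoU.
    """
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Replace each precision with the maximum precision at any higher recall.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum the rectangle areas wherever recall increases.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))
```

mAP is then the mean of this value over all classes, and AP50 and AP75 differ only in the IoU threshold used to decide whether a detection counts as a match.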

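5. Deployment and Performance Optimization

5.1 Exporting the Model to ONNX

OpenMMLab models are usually exported with MMDeploy's tools/deploy.py. The command below is a sketch run from an MMDeploy checkout. The deployment config name and the sample image path are assumptions to verify against your MMDeploy version, which also determines whether this detector is supported:

    python tools/deploy.py \
        configs/mmrotate/rotated-detection_onnxruntime_dynamic.py \
        /path/to/mmrotate/configs/rotated_faster_rcnn/rotated_faster_rcnn_r50_fpn_1x_dota_le90.py \
        /path/to/mmrotate/work_dirs/rotated_faster_rcnn/latest.pth \
        /path/to/sample.png \
        --work-dir work_dirs/onnx_export \
        --device cpu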

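5.2 Tips for Faster Inference

Model pruning: remove redundant convolution layers
Quantization: convert FP32 weights to INT8
TensorRT acceleration: convert the model to a TensorRT engine
Multi-scale test tuning: choose the test scales sensibly

    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1024, 1024),
            flip=False,
            transforms=[
                dict(type='RResize'),
                # Normalize needs the same mean and std as the training pipeline.
                dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375]),
                dict(type='Pad', size_divisor=32),
                dict(type='DefaultFormatBundle'),
                dict(type='Collect', keys=['img'])
            ])
    ]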

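In our own projects, reducing the input size from 1024x1024 to 800x800 improved inference speed by roughly 40%, while mAP dropped by only 2-3 percentage points. This trade-off is very useful when real-time performance matters.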
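A quick back-of-the-envelope check shows why the smaller input helps. The snippet below only compares pixel counts; treating inference cost as roughly proportional to the number of input pixels is an assumption, and real speed-ups depend on the model and hardware.

```python
def pixel_ratio(new_side, old_side):
    """Ratio of pixel counts between two square input sizes."""
    return (new_side * new_side) / (old_side * old_side)


ratio = pixel_ratio(800, 1024)
print(f"{ratio:.2%} of the original pixels")  # prints "61.04% of the original pixels"
```

An 800x800 input therefore contains about 39% fewer pixels than a 1024x1024 one, which is in line with the speed-up reported above.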
