
Converting the CCPD License Plate Dataset to YOLOv5 Format: Complete Script and Pitfall Guide (with Python Code)

news 2026/5/30 14:12:29

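An Engineering Approach to Converting the CCPD License Plate Dataset to YOLOv5 Format

In intelligent transportation systems, license plate detection is a key stage, and how well a detector trains depends heavily on data quality. CCPD is one of the largest public Chinese license plate datasets, with more than 300,000 images captured in real-world scenes. Its annotations are encoded in the image file names rather than in the label files YOLOv5 expects, which puts many developers off. This article walks through an automated conversion workflow, from parsing the annotations to validating the labels before training.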

1. Environment Setup and Project Layout

1.1 Standardized Development Environment

The following combination is recommended for compatibility:

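# Create an isolated environment
conda create -n ccpd python=3.8 -y
conda activate ccpd

# Install core dependencies
pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
pip install opencv-python-headless==4.5.5.64 albumentations==1.1.0 pandas==1.4.2

# Also imported by the scripts in this article (the original pins no versions for these)
pip install tqdm scikit-learn matplotlib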

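Key components:

OpenCV-headless: computer vision library built without GUI dependencies
Albumentations: data augmentation library that supports YOLO-format bounding boxes
Pandas: structured handling of annotation data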

1.2 Project Directory Layout

A modular layout keeps the project maintainable:

ccpd2yolo/
├── configs/
│   ├── paths.yaml          # path configuration
│   └── split_ratio.yaml    # dataset split ratios
├── src/
│   ├── parser.py           # file name parser
│   ├── converter.py        # core format conversion logic
│   └── validator.py        # annotation validation tool
└── datasets/
    ├── raw/                # original CCPD data
    └── processed/          # YOLO-format output

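2. Parsing CCPD File Names and Extracting Annotations

2.1 Decoding the File Name Format

A CCPD file name carries the complete annotation, for example: 025-95_113-154&383_386&473-386&473_177&457_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg

The hyphen-separated fields are:

025: ratio of the plate area to the whole image area
95_113: horizontal and vertical tilt degrees
154&383_386&473: top-left (154,383) and bottom-right (386,473) corners of the plate bounding box
386&473_177&457...: the four plate vertices, listed starting from the bottom-right corner
0_0_22_27_27_33_16: plate number, encoded as character indices
37: brightness
15: blurriness

The file name contains no plate-type or class field, so a single-class detector uses class ID 0 for every plate.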
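The plate-number field stores indices into three character tables rather than the characters themselves. The sketch below decodes the sample field; the tables are reproduced from the CCPD README as an assumption and should be checked against the copy shipped with your dataset version, and `decode_plate` is an illustrative name, not part of the original script:

```python
# Lookup tables as listed in the CCPD README (assumption: verify against your copy).
# The trailing "O" in each table is a placeholder entry for "no character".
PROVINCES = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙",
             "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵",
             "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O"]
ALPHABETS = list("ABCDEFGHJKLMNPQRSTUVWXYZ") + ["O"]          # letters without I and O
ADS = list("ABCDEFGHJKLMNPQRSTUVWXYZ0123456789") + ["O"]      # letters plus digits


def decode_plate(field):
    """Decode a field such as '0_0_22_27_27_33_16' into the plate string."""
    idx = [int(i) for i in field.split('_')]
    # First index: province, second: city letter, rest: letters or digits.
    return PROVINCES[idx[0]] + ALPHABETS[idx[1]] + ''.join(ADS[i] for i in idx[2:])


print(decode_plate("0_0_22_27_27_33_16"))  # 皖AY339S
```

Detection training does not need the decoded string, but it is handy for spot-checking that a file name was parsed correctly.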

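2.2 Automated Parsing

import re
from pathlib import Path

# Seven hyphen-separated fields, named after their meaning in the CCPD README.
CCPD_PATTERN = re.compile(
    r'^(?P<area>\d+)-(?P<tilt>[\d_]+)-(?P<bbox>[\d&_]+)-(?P<vertices>[\d&_]+)'
    r'-(?P<plate>[\d_]+)-(?P<brightness>\d+)-(?P<blurriness>\d+)'
)

def parse_ccpd_filename(filename):
    """Extract the bounding box and related fields from a CCPD file name."""
    match = CCPD_PATTERN.match(Path(filename).stem)
    if not match:
        raise ValueError(f"Invalid CCPD filename format: {filename}")

    # Bounding box: "lx&ly_rx&ry" (top-left, bottom-right) in pixels
    lt, rb = match.group('bbox').split('_')
    lx, ly = map(int, lt.split('&'))
    rx, ry = map(int, rb.split('&'))

    return {
        'bbox': (lx, ly, rx, ry),
        'vertices': match.group('vertices'),
        'blurriness': int(match.group('blurriness')),
        # Kept so code that reads 'plate_type' keeps working. The file name has
        # no plate-type field, so this is the single detection class ID.
        'plate_type': 0,
    }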
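The parser returns the `vertices` field as a raw string. A minimal sketch of turning it into points, assuming the `x&y_x&y_x&y_x&y` layout of the sample file name; `parse_vertices` and `bbox_from_vertices` are illustrative helper names, not part of the original script:

```python
def parse_vertices(vertices):
    """Split 'x&y_x&y_x&y_x&y' into a list of four (x, y) integer tuples."""
    points = [tuple(int(v) for v in pt.split('&')) for pt in vertices.split('_')]
    if len(points) != 4 or any(len(p) != 2 for p in points):
        raise ValueError(f"Expected four x&y pairs, got: {vertices!r}")
    return points


def bbox_from_vertices(points):
    """Axis-aligned box (lx, ly, rx, ry) that encloses the four vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


pts = parse_vertices("386&473_177&457_154&383_363&402")
print(pts)                      # [(386, 473), (177, 457), (154, 383), (363, 402)]
print(bbox_from_vertices(pts))  # (154, 383, 386, 473)
```

For the sample file name the enclosing box of the vertices equals the bounding-box field, which makes this a cheap consistency check on parsed annotations.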

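3. Core YOLO Format Conversion

3.1 Coordinate Normalization

The conversion formulas, with W = image.shape[1] (image width) and H = image.shape[0] (image height):

x_center = (lx + (rx - lx) / 2) / W
y_center = (ly + (ry - ly) / 2) / H
width    = (rx - lx) / W
height   = (ry - ly) / H

Each value is a fraction of the image size, so all four fall in [0, 1] for a box that lies inside the image.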
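As a worked example, the formulas can be applied to the bounding box from the sample file name. This is a sketch: `bbox_to_yolo` is an illustrative name, and the 720x1160 image size is an assumption made for the arithmetic (in the conversion script the size is read from each image). Note that `(lx + rx) / 2` is the same value as `lx + (rx - lx) / 2`.

```python
def bbox_to_yolo(lx, ly, rx, ry, img_w, img_h):
    """Convert pixel corner coordinates to normalized YOLO (cx, cy, w, h)."""
    cx = (lx + rx) / 2 / img_w
    cy = (ly + ry) / 2 / img_h
    return cx, cy, (rx - lx) / img_w, (ry - ly) / img_h


# Box taken from the sample file name; the image size is assumed.
cx, cy, nw, nh = bbox_to_yolo(154, 383, 386, 473, img_w=720, img_h=1160)
print(f"0 {cx:.6f} {cy:.6f} {nw:.6f} {nh:.6f}")
# 0 0.375000 0.368966 0.322222 0.077586
```

The printed line is exactly what one YOLO label file would contain for this image: the class ID followed by the four normalized values.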

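3.2 Robustness Handling

import cv2
from pathlib import Path
from tqdm import tqdm

PLATE_CLASS_ID = 0  # single class; CCPD file names carry no class field

def convert_to_yolo(image_dir, output_dir):
    """Write one YOLO label file per CCPD image found in image_dir."""
    image_dir = Path(image_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    for img_path in tqdm(list(image_dir.glob('*.jpg'))):
        try:
            # Read the image to get its size
            img = cv2.imread(str(img_path))
            if img is None:
                print(f"Warning: Failed to read {img_path}, skipping")
                continue
            H, W = img.shape[:2]

            # Parse the annotation and clip the box to the image
            lx, ly, rx, ry = parse_ccpd_filename(img_path)['bbox']
            lx, rx = max(0, lx), min(W, rx)
            ly, ry = max(0, ly), min(H, ry)
            if rx <= lx or ry <= ly:
                print(f"Warning: Empty box in {img_path}, skipping")
                continue

            # Normalize coordinates
            cx = (lx + (rx - lx) / 2) / W
            cy = (ly + (ry - ly) / 2) / H
            nw = (rx - lx) / W
            nh = (ry - ly) / H

            # Write the YOLO label: class cx cy w h
            txt_path = output_dir / f"{img_path.stem}.txt"
            with open(txt_path, 'w') as f:
                f.write(f"{PLATE_CLASS_ID} {cx:.6f} {cy:.6f} {nw:.6f} {nh:.6f}\n")
        except Exception as e:
            print(f"Error processing {img_path}: {e}")
            continue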

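4. Production Data Processing Pipeline

4.1 Automated Quality Checks

The following function cross-checks labels against images:

from pathlib import Path

def validate_annotation(image_dir, label_dir):
    """Report images whose label is missing, malformed, or out of range."""
    for img_path in Path(image_dir).glob('*.jpg'):
        txt_path = Path(label_dir) / f"{img_path.stem}.txt"
        if not txt_path.exists():
            print(f"Missing label: {txt_path}")
            continue

        # Only the first line is checked: the converter writes one box per image
        with open(txt_path) as f:
            line = f.readline().strip()
        try:
            cls, cx, cy, nw, nh = map(float, line.split())
        except ValueError:
            print(f"Malformed label line in {txt_path}: {line!r}")
            continue

        if not (0 <= cx <= 1 and 0 <= cy <= 1):
            print(f"Invalid center coordinates in {txt_path}")
        if not (0 < nw <= 1 and 0 < nh <= 1):
            print(f"Invalid dimensions in {txt_path}")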

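4.2 Dataset Splitting Strategy

Stratified sampling keeps the distribution of a chosen attribute consistent across the splits. CCPD file names have no plate-type field, so the split below stratifies on quartile bins of the last field, the blurriness score:

import numpy as np
from pathlib import Path
from sklearn.model_selection import train_test_split

def split_dataset(image_dir, ratios=(0.7, 0.2, 0.1), seed=42):
    """Split images into train/val/test, stratified by blurriness quartile."""
    all_files = sorted(Path(image_dir).glob('*.jpg'))

    # The last hyphen-separated field of a CCPD file name is the blurriness score
    blur = np.array([int(f.stem.split('-')[-1]) for f in all_files])
    # Quartile bins avoid the tiny strata that raw scores would create
    strata = np.digitize(blur, np.quantile(blur, [0.25, 0.5, 0.75]))

    # First carve out the test set, then split the remainder into train and val
    train_val, test, strata_tv, _ = train_test_split(
        all_files, strata, test_size=ratios[2],
        stratify=strata, random_state=seed)
    train, val = train_test_split(
        train_val, test_size=ratios[1] / (ratios[0] + ratios[1]),
        stratify=strata_tv, random_state=seed)

    return {'train': train, 'val': val, 'test': test}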
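YOLOv5 locates the label for each image by replacing the last `/images/` component of the image path with `/labels/` and changing the extension to `.txt`, so the dictionary returned by `split_dataset` still has to be written out in a matching folder layout. The sketch below shows one way to do that; `materialize_split`, `write_data_yaml`, the `images/<split>` and `labels/<split>` folder names, and the class name `plate` are all illustrative choices, not part of the original script:

```python
import shutil
from pathlib import Path


def materialize_split(splits, label_dir, out_root):
    """Copy images and labels into out_root/images/<split> and out_root/labels/<split>.

    splits: dict such as {'train': [...], 'val': [...]} of image paths.
    Returns the number of images copied per split; images without a label are skipped.
    """
    label_dir, out_root = Path(label_dir), Path(out_root)
    counts = {}
    for name, images in splits.items():
        img_out = out_root / 'images' / name
        lbl_out = out_root / 'labels' / name
        img_out.mkdir(parents=True, exist_ok=True)
        lbl_out.mkdir(parents=True, exist_ok=True)
        copied = 0
        for img_path in map(Path, images):
            label = label_dir / f"{img_path.stem}.txt"
            if not label.exists():
                continue  # no label was produced for this image
            shutil.copy2(img_path, img_out / img_path.name)
            shutil.copy2(label, lbl_out / label.name)
            copied += 1
        counts[name] = copied
    return counts


def write_data_yaml(out_root, yaml_path):
    """Write a minimal single-class dataset config in the YOLOv5 data YAML layout."""
    text = (
        f"path: {Path(out_root).resolve()}\n"
        "train: images/train\n"
        "val: images/val\n"
        "test: images/test\n"
        "nc: 1\n"
        "names: ['plate']\n"
    )
    Path(yaml_path).write_text(text, encoding='utf-8')
```

Copying doubles the disk usage of the dataset; on a large CCPD download, symlinks or per-split image list files may be preferable, at the cost of a less portable output folder.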

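5. Performance Optimization and Error Handling

5.1 Multiprocessing for Speed

from multiprocessing import Pool
from pathlib import Path

import cv2
from tqdm import tqdm

def process_single(args):
    """Convert one image; return True on success so the parent can count failures."""
    img_path, output_dir = args
    try:
        img = cv2.imread(str(img_path))
        if img is None:
            return False
        H, W = img.shape[:2]
        lx, ly, rx, ry = parse_ccpd_filename(img_path)['bbox']
        cx, cy = (lx + rx) / 2 / W, (ly + ry) / 2 / H
        nw, nh = (rx - lx) / W, (ry - ly) / H
        label = Path(output_dir) / f"{img_path.stem}.txt"
        label.write_text(f"0 {cx:.6f} {cy:.6f} {nw:.6f} {nh:.6f}\n")
        return True
    except Exception:
        return False

def batch_convert(image_dir, output_dir, workers=8):
    # On Windows and macOS, call this from under `if __name__ == "__main__":`
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    args_list = [(p, output_dir) for p in Path(image_dir).glob('*.jpg')]
    if not args_list:
        print(f"No .jpg files found in {image_dir}")
        return
    with Pool(workers) as pool:
        results = list(tqdm(pool.imap(process_single, args_list, chunksize=64),
                            total=len(args_list)))
    print(f"Conversion completed with {sum(results) / len(results):.1%} success rate")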

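5.2 Common Errors and Remedies

Error type	Trigger	Remedy
Image read failure	Corrupted file or unsupported format	Skip the file and log it
Coordinates out of bounds	Annotation extends past the image border	Clip to the valid range
Malformed file name	Non-standard CCPD naming	Strict regular-expression validation
Out of memory	Processing large images	Process in chunks and trigger garbage collection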
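The remedy for out-of-bounds coordinates can be sketched as a small helper that clamps a pixel box to the image and reports when nothing is left. `clip_bbox` is an illustrative name, and the 720x1160 size in the usage lines is only an example:

```python
def clip_bbox(lx, ly, rx, ry, img_w, img_h):
    """Clamp a pixel box to the image; return None if no area remains."""
    lx, rx = max(0, min(lx, img_w)), max(0, min(rx, img_w))
    ly, ry = max(0, min(ly, img_h)), max(0, min(ry, img_h))
    if rx <= lx or ry <= ly:
        return None
    return lx, ly, rx, ry


print(clip_bbox(-10, 383, 386, 1200, 720, 1160))  # (0, 383, 386, 1160)
print(clip_bbox(800, 100, 900, 200, 720, 1160))   # None
```

Returning `None` for a box that falls entirely outside the image lets the caller skip and log the file instead of writing a zero-area label.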

After the full conversion, it is worth spot-checking random samples visually. A quick verification script:

import cv2
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def plot_yolo_sample(image_path, label_path):
    """Draw the first box of a YOLO label file on top of its image."""
    img = cv2.imread(str(image_path))
    if img is None:
        raise FileNotFoundError(f"Cannot read image: {image_path}")
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    with open(label_path) as f:
        line = f.readline().strip()
    cls, cx, cy, nw, nh = map(float, line.split())

    # Convert back to absolute pixel coordinates
    H, W = img.shape[:2]
    lx = int((cx - nw / 2) * W)
    ly = int((cy - nh / 2) * H)
    rx = int((cx + nw / 2) * W)
    ry = int((cy + nh / 2) * H)

    fig, ax = plt.subplots(1)
    ax.imshow(img)
    rect = patches.Rectangle((lx, ly), rx - lx, ry - ly,
                             linewidth=2, edgecolor='r', facecolor='none')
    ax.add_patch(rect)
    plt.show()
