Say Goodbye to Manual Conversion: Labelme JSON to YOLO TXT in One Step with a Python Script (with Dataset Splitting)

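Efficient Automation: Converting Labelme JSON to YOLO TXT and Splitting Datasets with Python

In computer vision projects, data annotation is a key step before model training. Labelme is a popular image annotation tool, but the JSON files it produces have to be converted to the TXT format that YOLO models expect. Converting them by hand is slow and error-prone. This article shows how to do the conversion in one step with a Python script and how to split the resulting dataset automatically.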

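1. Environment Setup and Tool Selection

Before starting, make sure the development environment is set up correctly. Python 3.7 or later is recommended, together with the following libraries: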

pip install numpy pandas opencv-python tqdm

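Of these, the scripts in this article import only tqdm, which shows a progress bar so that progress stays visible on large datasets. JSON parsing and file handling use the standard-library json, os, and shutil modules. numpy, pandas, and opencv-python are not needed by the scripts themselves, but they are useful for inspecting images and analyzing label statistics.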

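Why use an automated script?

Saves time: converting 1,000 files by hand can take hours, while a script needs only minutes
Fewer errors: manual work inevitably introduces mistakes, while a script converts every file the same way
Reusable: write it once and run it again on every iteration of the dataset
Standardized output: every file has the same format, which simplifies model training later

Tip: install the dependencies in a virtual environment to avoid conflicts with other projects.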

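2. Labelme JSON Structure and the YOLO Format

A Labelme JSON file carries a lot of annotation information, and we only need a few fields from it. A typical Labelme JSON file looks like this: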

{
  "version": "4.5.6",
  "flags": {},
  "shapes": [
    {
      "label": "cat",
      "points": [[100, 150], [300, 350]],
      "shape_type": "rectangle"
    }
  ],
  "imagePath": "example.jpg",
  "imageWidth": 640,
  "imageHeight": 480
}

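The TXT format that YOLO expects is much simpler. Each line describes one annotated object:

<class_id> <x_center> <y_center> <width> <height>

The four coordinate values are normalized by the image width and height, so each lies between 0 and 1.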
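To make the normalization concrete, here is a minimal sketch that converts the sample box from the JSON above (corners (100, 150) and (300, 350) in a 640x480 image). The helper name `to_yolo_box` is made up for this illustration and is not part of Labelme or YOLO.

```python
def to_yolo_box(x1, y1, x2, y2, img_width, img_height):
    """Convert two opposite corners (in pixels) to a normalized YOLO box."""
    # Sort the corners so the result does not depend on their order
    x_min, x_max = min(x1, x2), max(x1, x2)
    y_min, y_max = min(y1, y2), max(y1, y2)
    x_center = (x_min + x_max) / 2 / img_width
    y_center = (y_min + y_max) / 2 / img_height
    width = (x_max - x_min) / img_width
    height = (y_max - y_min) / img_height
    return x_center, y_center, width, height


box = to_yolo_box(100, 150, 300, 350, 640, 480)
line = "0 " + " ".join(f"{v:.6f}" for v in box)  # class_id 0 is assumed to mean "cat"
print(line)  # 0 0.312500 0.520833 0.312500 0.416667
```

The center is (200, 250) in pixels, so dividing by 640 and 480 gives 0.3125 and about 0.5208; the box is 200 pixels on each side, giving 0.3125 and about 0.4167.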

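Key conversion steps:

Read the image size from the JSON file
Extract the bounding-box coordinates of each annotated object
Convert absolute pixel coordinates to relative (normalized) coordinates
Map each class name to its class_id
Write the result to a TXT file in YOLO format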

3. Core Conversion Script

The following is a complete Python script that converts Labelme JSON files to YOLO TXT files:

import json
import os

from tqdm import tqdm


def convert_labelme_to_yolo(json_path, output_dir, class_mapping):
    """Convert one Labelme JSON file into a YOLO TXT file in output_dir."""
    # Read the JSON file
    with open(json_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Image size, used to normalize the coordinates
    img_width = data['imageWidth']
    img_height = data['imageHeight']

    # Output TXT path: same base name as the JSON file
    txt_filename = os.path.splitext(os.path.basename(json_path))[0] + '.txt'
    txt_path = os.path.join(output_dir, txt_filename)

    with open(txt_path, 'w') as f:
        # Process each annotated object
        for shape in data['shapes']:
            if shape['shape_type'] != 'rectangle':
                continue  # only rectangle annotations are handled here

            # Look up the class ID
            class_id = class_mapping.get(shape['label'], -1)
            if class_id == -1:
                continue  # skip labels that are not in the mapping

            # The two corners are not guaranteed to be top-left then
            # bottom-right, so sort them to avoid negative sizes
            (xa, ya), (xb, yb) = shape['points'][:2]
            x1, x2 = min(xa, xb), max(xa, xb)
            y1, y2 = min(ya, yb), max(ya, yb)

            # Normalized YOLO coordinates
            x_center = ((x1 + x2) / 2) / img_width
            y_center = ((y1 + y2) / 2) / img_height
            width = (x2 - x1) / img_width
            height = (y2 - y1) / img_height

            # Write one line per object
            f.write(f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n")


def batch_convert(json_dir, output_dir, class_mapping):
    """Convert every JSON file in json_dir."""
    # Make sure the output directory exists
    os.makedirs(output_dir, exist_ok=True)

    # Collect all JSON files
    json_files = [f for f in os.listdir(json_dir) if f.endswith('.json')]

    # Convert them one by one with a progress bar
    for json_file in tqdm(json_files, desc="Converting JSON to YOLO format"):
        json_path = os.path.join(json_dir, json_file)
        convert_labelme_to_yolo(json_path, output_dir, class_mapping)


if __name__ == "__main__":
    # Configuration
    JSON_DIR = "path/to/labelme/json/files"
    OUTPUT_DIR = "path/to/save/yolo/txt/files"

    # Class mapping (adjust to your project)
    CLASS_MAPPING = {
        "cat": 0,
        "dog": 1,
        "person": 2
    }

    # Run the batch conversion
    batch_convert(JSON_DIR, OUTPUT_DIR, CLASS_MAPPING)

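Note: change the paths in the script to match your setup, and make sure CLASS_MAPPING matches the classes used in your project. Shapes whose label is missing from CLASS_MAPPING are silently skipped.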
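Since labels missing from the mapping are dropped, it can help to derive the mapping from the annotations themselves before converting. The sketch below is one possible approach and not part of the original script: the helper `build_class_mapping` is hypothetical, and numbering the labels alphabetically is an assumption you may want to replace with a fixed order.

```python
import json
import os
import tempfile


def build_class_mapping(json_dir):
    """Collect every label used in a folder of Labelme files and number them alphabetically."""
    labels = set()
    for name in os.listdir(json_dir):
        if not name.endswith('.json'):
            continue
        with open(os.path.join(json_dir, name), 'r', encoding='utf-8') as f:
            data = json.load(f)
        for shape in data.get('shapes', []):
            labels.add(shape['label'])
    return {label: idx for idx, label in enumerate(sorted(labels))}


# Demo on two tiny, made-up annotation files
with tempfile.TemporaryDirectory() as tmp:
    samples = {"a.json": ["dog", "cat"], "b.json": ["cat"]}
    for name, names_in_file in samples.items():
        shapes = [{"label": n, "points": [[0, 0], [1, 1]], "shape_type": "rectangle"}
                  for n in names_in_file]
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            json.dump({"shapes": shapes}, f)
    mapping = build_class_mapping(tmp)

print(mapping)  # {'cat': 0, 'dog': 1}
```

Printing the mapping before a long conversion run is a cheap way to catch misspelled labels such as "Cat" versus "cat".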

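4. Automatic Dataset Splitting

Once the annotations are in YOLO format, the dataset usually needs to be split into training, validation, and test sets. The following script does this automatically: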

import os
import random
import shutil

from tqdm import tqdm


def split_dataset(image_dir, label_dir, output_dir, ratios=(0.7, 0.2, 0.1)):
    """Copy images and labels into train/val/test folders according to ratios."""
    # Create the output directory structure
    splits = ['train', 'val', 'test']
    for split in splits:
        os.makedirs(os.path.join(output_dir, split, 'images'), exist_ok=True)
        os.makedirs(os.path.join(output_dir, split, 'labels'), exist_ok=True)

    # Collect image files (an image and its label share a base name
    # and differ only in extension)
    image_files = [f for f in os.listdir(image_dir)
                   if f.lower().endswith(('.jpg', '.jpeg', '.png'))]
    random.shuffle(image_files)  # random order

    # Index boundaries of each split; whatever remains after
    # train and val goes to test
    total = len(image_files)
    train_end = int(ratios[0] * total)
    val_end = train_end + int(ratios[1] * total)

    # Assign each file to a split
    for i, img_file in enumerate(tqdm(image_files, desc="Splitting dataset")):
        if i < train_end:
            split = 'train'
        elif i < val_end:
            split = 'val'
        else:
            split = 'test'

        # Copy the image
        img_src = os.path.join(image_dir, img_file)
        img_dst = os.path.join(output_dir, split, 'images', img_file)
        shutil.copy2(img_src, img_dst)

        # Copy the matching label file
        label_name = os.path.splitext(img_file)[0] + '.txt'
        label_src = os.path.join(label_dir, label_name)
        label_dst = os.path.join(output_dir, split, 'labels', label_name)
        if os.path.exists(label_src):
            shutil.copy2(label_src, label_dst)
        else:
            # No label file: create an empty one (for images without objects)
            open(label_dst, 'w').close()


if __name__ == "__main__":
    # Configuration
    IMAGE_DIR = "path/to/original/images"
    LABEL_DIR = "path/to/yolo/label/files"
    OUTPUT_DIR = "path/to/save/split/dataset"

    # Split ratios (train:val:test)
    SPLIT_RATIOS = (0.7, 0.2, 0.1)  # 70% train, 20% val, 10% test

    # Run the split
    split_dataset(IMAGE_DIR, LABEL_DIR, OUTPUT_DIR, SPLIT_RATIOS)

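Rules of thumb for dataset splitting:

For small datasets (<1k samples), a 60/20/20 split is a reasonable choice
Medium datasets (1k-10k) can use 70/15/15
Large datasets (>10k) can use 80/10/10 or 90/5/5
Make sure every class has representative samples in each split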
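A purely random shuffle does not guarantee that last point. One way to approach it, sketched below under simplifying assumptions, is to group samples by class and split each group separately. The function `stratified_split` and the idea of grouping each image by a single class_id (for example the first one in its label file) are illustrative assumptions; images that contain several classes need a more careful scheme.

```python
import random
from collections import Counter, defaultdict


def stratified_split(sample_classes, ratios=(0.7, 0.2, 0.1), seed=0):
    """Assign each sample to train/val/test, splitting every class group separately.

    sample_classes maps a sample name to the class_id used to group it.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    groups = defaultdict(list)
    for name, class_id in sorted(sample_classes.items()):
        groups[class_id].append(name)

    assignment = {}
    for names in groups.values():
        rng.shuffle(names)
        train_end = int(ratios[0] * len(names))
        val_end = train_end + int(ratios[1] * len(names))
        for i, name in enumerate(names):
            if i < train_end:
                assignment[name] = 'train'
            elif i < val_end:
                assignment[name] = 'val'
            else:
                assignment[name] = 'test'
    return assignment


# Demo: 10 made-up "cat" samples (class 0) and 10 "dog" samples (class 1)
samples = {f"cat_{i}": 0 for i in range(10)}
samples.update({f"dog_{i}": 1 for i in range(10)})
assignment = stratified_split(samples)
print(Counter(assignment.values()))
```

With 10 samples per class and the default ratios, each class contributes 7 training, 2 validation, and 1 test sample. Very small classes can still end up with an empty validation or test share because the counts are truncated to integers.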

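5. Advanced Features and Error Handling

In real projects there are edge cases and extra features to consider.

Multithreaded acceleration

For very large datasets, a thread pool can speed up processing: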

import os
from concurrent.futures import ThreadPoolExecutor

from tqdm import tqdm


def batch_convert_parallel(json_dir, output_dir, class_mapping, workers=4):
    """Convert all JSON files in json_dir using a pool of worker threads."""
    os.makedirs(output_dir, exist_ok=True)
    json_files = [f for f in os.listdir(json_dir) if f.endswith('.json')]

    def process_file(json_file):
        # Uses convert_labelme_to_yolo from the core conversion script
        json_path = os.path.join(json_dir, json_file)
        convert_labelme_to_yolo(json_path, output_dir, class_mapping)

    with ThreadPoolExecutor(max_workers=workers) as executor:
        # list() consumes the iterator so tqdm can advance as files finish
        list(tqdm(executor.map(process_file, json_files), total=len(json_files)))

Error handling and logging

Thorough error handling keeps the script robust:

import json
import logging

logging.basicConfig(filename='conversion.log', level=logging.INFO)


def safe_convert_labelme_to_yolo(json_path, output_dir, class_mapping):
    """Like convert_labelme_to_yolo, but log problems instead of stopping the batch."""
    try:
        with open(json_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        # Validate the required fields
        if 'imageWidth' not in data or 'imageHeight' not in data:
            logging.warning(f"Missing dimensions in {json_path}")
            return

        # Remaining conversion logic...
    except Exception as e:
        logging.error(f"Error processing {json_path}: {str(e)}")
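A related check is to validate the generated TXT files themselves, which catches problems that do not raise exceptions, such as coordinates outside the image. The following is a minimal sketch; `validate_yolo_line` is a hypothetical helper, and the rules it applies (five fields, class_id in range, values within [0, 1], positive size) are the format constraints described in section 2.

```python
def validate_yolo_line(line, num_classes):
    """Return a list of problems found in one YOLO label line (empty list means OK)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        values = [float(p) for p in parts[1:]]
    except ValueError:
        return ["non-numeric field"]

    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class_id {class_id} out of range")
    if any(not 0.0 <= v <= 1.0 for v in values):
        problems.append("coordinate outside [0, 1]")
    if values[2] <= 0 or values[3] <= 0:
        problems.append("non-positive width or height")
    return problems


print(validate_yolo_line("0 0.312500 0.520833 0.312500 0.416667", 3))  # []
print(validate_yolo_line("5 0.5 0.5 1.2 0.1", 3))
print(validate_yolo_line("0 0.5 0.5", 3))
```

The returned messages can be passed to logging.warning together with the file name, so that a single bad annotation shows up in conversion.log without stopping the run.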

Supporting more annotation types

Besides rectangles, the script can be extended to other shapes such as polygons:

def convert_polygon_to_yolo(points, img_width, img_height):
    """Convert a polygon to the YOLO box of its axis-aligned bounding rectangle."""
    # Axis-aligned bounding rectangle of the polygon
    x_coords = [p[0] for p in points]
    y_coords = [p[1] for p in points]
    x_min, x_max = min(x_coords), max(x_coords)
    y_min, y_max = min(y_coords), max(y_coords)

    # Convert to normalized YOLO format
    x_center = (x_min + x_max) / 2 / img_width
    y_center = (y_min + y_max) / 2 / img_height
    width = (x_max - x_min) / img_width
    height = (y_max - y_min) / img_height
    return x_center, y_center, width, height
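Because a rectangle is also just a list of points, both shape types can go through the same bounding-box computation. The sketch below shows one way to dispatch on shape_type inside the conversion loop; `shape_to_yolo` is a hypothetical helper, and returning None for unsupported shapes is a design choice made for this example.

```python
def shape_to_yolo(shape, img_width, img_height):
    """Return (x_center, y_center, width, height) for a rectangle or polygon, else None."""
    if shape.get('shape_type') not in ('rectangle', 'polygon'):
        return None  # circles, lines, points, etc. are not handled in this sketch
    xs = [p[0] for p in shape['points']]
    ys = [p[1] for p in shape['points']]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    return (
        (x_min + x_max) / 2 / img_width,
        (y_min + y_max) / 2 / img_height,
        (x_max - x_min) / img_width,
        (y_max - y_min) / img_height,
    )


# A made-up triangle whose bounding box matches the rectangle example in section 2
triangle = {
    "label": "cat",
    "shape_type": "polygon",
    "points": [[100, 150], [300, 150], [200, 350]],
}
print(shape_to_yolo(triangle, 640, 480))
print(shape_to_yolo({"label": "cat", "shape_type": "circle", "points": [[10, 10], [20, 20]]}, 640, 480))  # None
```

Using min and max over all points also makes the rectangle case independent of the order in which the two corners were drawn.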

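6. Practical Use and Performance Optimization

A few points deserve attention when using these scripts in a real project.

Tips for large datasets

Batch processing: for very large datasets, process the files in batches to avoid running out of memory
Progress tracking: record which files have been processed so that an interrupted run can resume
Memory optimization: use a generator instead of a list to hold the file paths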
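The last two tips can be combined in a few lines. The sketch below is an illustration under one assumption: a JSON file counts as "already processed" when a TXT file with the same base name exists in the output directory. The generator `iter_pending_json` is a hypothetical helper; it uses os.scandir so that paths are produced one at a time instead of being collected in a list.

```python
import os
import tempfile


def iter_pending_json(json_dir, output_dir):
    """Yield paths of JSON files that do not yet have a matching TXT in output_dir."""
    with os.scandir(json_dir) as entries:
        for entry in entries:
            if not entry.is_file() or not entry.name.endswith('.json'):
                continue
            txt_name = os.path.splitext(entry.name)[0] + '.txt'
            if os.path.exists(os.path.join(output_dir, txt_name)):
                continue  # converted in an earlier run, skip it
            yield entry.path


# Demo: a.json was already converted, b.json is still pending
with tempfile.TemporaryDirectory() as json_dir, tempfile.TemporaryDirectory() as output_dir:
    for name in ("a.json", "b.json"):
        open(os.path.join(json_dir, name), "w").close()
    open(os.path.join(output_dir, "a.txt"), "w").close()
    pending = sorted(os.path.basename(p) for p in iter_pending_json(json_dir, output_dir))

print(pending)  # ['b.json']
```

One caveat: if a run is interrupted while a TXT file is being written, that partial file would be treated as finished. Writing to a temporary name and renaming it once complete avoids this.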

Integration with the training workflow

The scripts can be chained into the model training pipeline:

# Example training pipeline
python convert_labelme_to_yolo.py && \
python split_dataset.py && \
python train_yolo.py --data dataset.yaml --weights yolov5s.pt
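The pipeline refers to a dataset.yaml file that the scripts in this article do not create. As a rough sketch, it could be generated from the split directory and the class mapping as shown below. The key names (path, train, val, test, nc, names) follow the YOLOv5-style dataset file as an assumption; check the documentation of the training code you actually use. `make_dataset_yaml` is a hypothetical helper and assumes class IDs are contiguous and start at 0.

```python
def make_dataset_yaml(dataset_root, class_mapping):
    """Build the text of a YOLOv5-style dataset.yaml (assumed layout) as a string."""
    # Order class names by their class_id so that list position matches the ID
    names = [name for name, _ in sorted(class_mapping.items(), key=lambda kv: kv[1])]
    lines = [
        f"path: {dataset_root}",
        "train: train/images",
        "val: val/images",
        "test: test/images",
        f"nc: {len(names)}",
        "names: [" + ", ".join(f"'{n}'" for n in names) + "]",
    ]
    return "\n".join(lines) + "\n"


yaml_text = make_dataset_yaml("path/to/save/split/dataset", {"cat": 0, "dog": 1, "person": 2})
print(yaml_text)
```

The relative paths match the train/val/test folders with images and labels subfolders produced by the splitting script in section 4. Writing `yaml_text` to dataset.yaml is then a single `open(...).write(...)` call.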

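Performance comparison

Method	100 files	1,000 files	10,000 files
Manual conversion	~60 min	~10 h	~4 days
Single-threaded script	~30 s	~5 min	~50 min
Multithreaded script (4 threads)	~10 s	~2 min	~15 min

By these rough figures, the single-threaded script is on the order of 100 times faster than manual conversion, and the 4-thread version is several hundred times faster.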
