YOLO11介绍
YOLO11 是 Ultralytics 体系的新一代 YOLO 目标检测/分割/姿态等任务模型迭代版本,延续了 YOLO 系列“一阶段、端到端、实时”的设计思路,通过改进网络结构、特征融合与训练/推理策略,在同等算力下提升精度与速度的权衡,并提供从 n/s/m/l/x 等不同规模以适配端侧到服务器的部署需求。
💡 提示
在 YOLO 相关模型命名里,n / s / m / l / x 这类字母通常表示模型规模(size) ,也就是网络的宽度/深度配置不同,带来参数量、计算量和精度/速度的权衡。
常见含义:
- n = nano:最小、最快、精度相对低,适合端侧/低算力
- s = small:小型
- m = medium:中型
- l = large:大型
- x = xlarge / extra-large:最大、最慢、精度通常最
目标
我们将部署模型到泰山派3M-RK3576板子上,使用 rknn_model_zoo 的官方Demo进行演示。
环境准备
主机环境:Ubuntu22.04(x86)
开发板:泰山派3M-RK3576
数据线:连接PC和开发板用于ADB传输文件。
安装miniforge3
为了防止在一个主机中不同的环境造成的 python 环境问题,我们使用 miniforge3 管理。
安装 miniforge3 :
# 下载 miniforge3 安装脚本
wget -c https://mirrors.bfsu.edu.cn/github-release/conda-forge/miniforge/LatestRelease/Miniforge3-Linux-x86_64.sh
# 运行安装脚本
bash Miniforge3-Linux-x86_64.sh
# 1.按下Enter回车继续运行
# 2.然后使用向下箭头,向下滚动查看协议
# 3.最后输入yes
# 4.提示Proceed with initialization?输入yes2
3
4
5
6
7
8
9
10
可以去 https://mirrors.bfsu.edu.cn/github-release/conda-forge/miniforge/LatestRelease/ 这个目录下查看目前最新的 .sh 文件名。
初始化 conda 环境变量:
source ~/miniforge3/bin/activate成功之后,命令行前方会显示一个
(base)
创建rknn-toolkit2环境
创建并激活 Conda 环境:YOLO11-RKNN-Toolkit2(这里推荐使用 python 3.10 版本)
后面我们将ONNX模型转化为RKNN模型的时候需要用到。
# 创建环境
conda create -n YOLO11-RKNN-Toolkit2 python=3.10
# 遇到Proceed ([y]/n)?
# 输入y即可2
3
4
5
激活 Conda 环境:
conda activate YOLO11-RKNN-Toolkit2
# 激活之后在命令行前面会出现:(YOLO11-RKNN-Toolkit2)2
3
安装依赖环境:
# 安装rknn-toolkit2
pip install rknn-toolkit2 -i https://mirrors.aliyun.com/pypi/simple
# 安装指定版本onnx==1.18.0
pip install onnx==1.18.0 -i https://mirrors.aliyun.com/pypi/simple2
3
4
5
安装完成之后,退出 YOLO11-RKNN-Toolkit2 环境:
conda deactivate创建yolo11环境
创建并激活 Conda 环境:Tspi3-YOLO11(这里推荐使用 python 3.10 版本)
# 创建环境
conda create -n Tspi3-YOLO11 python=3.10
# 遇到Proceed ([y]/n)?
# 输入y即可2
3
4
5
激活 Conda 环境:
conda activate Tspi3-YOLO11
# 激活之后在命令行前面会出现:(TaishanPi3-YOLO11)2
3
安装依赖工具,为 YOLO11 做准备:
pip install ultralytics onnx onnxscript -i https://mirrors.aliyun.com/pypi/simple测试:
(Tspi3-YOLO11) lipeng@host:~/workspace$ yolo -v
8.3.2482
模型转换
接下来我么需要执行三个重要的步骤:
- 拉取pt文件。
- 使用rockchip优化过的yolo11项目导出onnx模型。
- 使用rknn-toolkit2将onnx模型转化为能硬件加速的RKNN模型。
拉取pt文件
所谓的 .pt 文件,就是训练好的 YOLO11 模型权重(参数),只有拿到这个文件,才能去识别目标。
否则即使有 YOLO11 的代码,也只是一个空架子,无法完成检测。
在 https://github.com/ultralytics/assets/releases/ 这个地址中,有着 ultralytics 官方给我们提供的 .pt 权重文件,我们只需要下载需要的:
wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11n-pose.pt导出ONNX模型
我们接下来就要拉取 Rockchip 官方修改的 ultralytics_yolo11项目,针对 RKNPU 进行了专门的适配:
- 修改输出结构, 移除后处理结构. (后处理结果对于量化不友好)
- dfl 结构在 NPU 处理上性能不佳,移至模型外部的后处理阶段,此操作大部分情况下可提升推理性能。
- 模型输出分支新增置信度的总和,用于后处理阶段加速阈值筛选。
详情:https://github.com/airockchip/ultralytics_yolo11/blob/main/RKOPT_README.zh-CN.md
继续使用 Tspi3-YOLO11 环境:
conda activate Tspi3-YOLO11拉取 airockchip/ultralytics_yolo11 项目:
git clone https://github.com/airockchip/ultralytics_yolo11.git拉取完成之后,进入目录:
cd ultralytics_yolo11修改 ultralytics_yolo11/ultralytics/cfg/default.yaml 文件中的 model 为刚刚拉取的 .pt 文件绝对路径:
要根据自己的
.pt文件路径,进行填写。
# Train settings -------------------------------------------------------------------------------------------------------
-model: yolo11n.pt # (str, optional) path to model file, i.e. yolo11n.pt, yolo11n.yaml
+model: /home/lipeng/workspace/yolo11/yolo11n-pose.pt # (str, optional) path to model file, i.e. yolo11n.pt, yolo11n.yaml
data: # (str, optional) path to data file, i.e. coco8.yaml
epochs: 100 # (int) number of epochs to train for
time: # (float, optional) number of hours to train for, overrides epochs if supplied2
3
4
5
6
修改 ultralytics_yolo11/ultralytics/engine/exporter.py 文件,添加 dynamo=False 参数,强制使用旧版 TorchScript-based 的 ONNX 导出器,防止最新版本的 PyTorch 导出 pose 的 ONNX 模型报错:
--- a/ultralytics/engine/exporter.py
+++ b/ultralytics/engine/exporter.py
@@ -413,6 +413,7 @@ class Exporter:
f,
verbose=False,
opset_version=12,
+ dynamo=False, # Use legacy TorchScript-based exporter for compatibility
do_constant_folding=True, # WARNING: DNN inference with torch>=1.12 may require do_constant_folding=False
input_names=['images'])2
3
4
5
6
7
8
9
10
设置导出路径为当前目录:
export PYTHONPATH=./使用脚本开始导出 ONNX模型:
python ./ultralytics/engine/exporter.pyONNX转RKNN
退出 Tspi3-YOLO11 环境:
conda deactivate进入YOLO11-RKNN-Toolkit2 环境
conda activate YOLO11-RKNN-Toolkit2接下来我们将使用 rknn_model_zoo 中的 转换脚本 将 ONNX 转换为 RKNN 模型,拉取项目:
git clone https://github.com/airockchip/rknn_model_zoo.git进入 rknn_model_zoo/examples/yolov8_pose/python 目录下:
特别注意
因为 YOLO11 没有专门的分割案例,我们直接使用 YOLOv8 的案例即可,
cd rknn_model_zoo/examples/yolov8_pose/python修改 rknn_model_zoo/examples/yolov8_pose/python/convert.py 脚本,不使用混合量化部分,因为我们导出的ONNX模型层级不同,故而使用标准量化:
--- a/examples/yolov8_pose/python/convert.py
+++ b/examples/yolov8_pose/python/convert.py
@@ -59,23 +59,19 @@ if __name__ == '__main__':
if platform in ["rv1109","rv1126","rk1808"] :
ret = rknn.build(do_quantization=do_quant, dataset=DATASET_PATH, auto_hybrid_quant=True)
else:
- if do_quant:
- rknn.hybrid_quantization_step1(
- dataset=DATASET_PATH,
- proposal= False,
- custom_hybrid=[['/model.22/cv4.0/cv4.0.0/act/Mul_output_0','/model.22/Concat_6_output_0'],
- ['/model.22/cv4.1/cv4.1.0/act/Mul_output_0','/model.22/Concat_6_output_0'],
- ['/model.22/cv4.2/cv4.2.0/act/Mul_output_0','/model.22/Concat_6_output_0']]
- )
-
- model_name=os.path.basename(model_path).replace('.onnx','')
- rknn.hybrid_quantization_step2(
- model_input = model_name+".model", # 表示第一步生成的模型文件
- data_input= model_name+".data", # 表示第一步生成的配置文件
- model_quantization_cfg=model_name+".quantization.cfg" # 表示第一步生成的量化配置文件
- )
- else:
- ret = rknn.build(do_quantization=do_quant, dataset=DATASET_PATH)
+ # Standard quantization (default, fastest)
+ ret = rknn.build(do_quantization=do_quant, dataset=DATASET_PATH)
+
+ # Uncomment below to enable AUTO hybrid quantization (automatically finds sensitive layers):
+ # if do_quant:
+ # rknn.hybrid_quantization_step1(dataset=DATASET_PATH, proposal=True)
+ # model_name = os.path.basename(model_path).replace('.onnx','')
+ # # Check generated .quantization.cfg, adjust if needed, then run:
+ # rknn.hybrid_quantization_step2(
+ # model_input=model_name+".model",
+ # data_input=model_name+".data",
+ # model_quantization_cfg=model_name+".quantization.cfg"
+ # )
if ret != 0:
print('Build model failed!')
exit(ret)2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
运行 rknn_model_zoo/examples/yolov8_pose/python/convert.py 脚本转化RKNN模型:
# 语法:python3 convert.py onnx_model_path [platform] [dtype] [output_rknn_path]
## platform:[rk3562, rk3566, rk3568, rk3576, rk3588, rv1126b, rv1109, rv1126, rk1808]
## dtype:[i8, fp] for [rk3562, rk3566, rk3568, rk3576, rk3588, rv1126b]
## dtype:[u8, fp] for [rv1109, rv1126, rk1808]
python convert.py /home/lipeng/workspace/yolo11/yolo11n-pose.onnx rk3576 i82
3
4
5
6
platform选择的平台有rk3562,rk3566,rk3568,rk3576,rk3588,rv1126b,rv1109,rv1126,rk1808可选择
dtype:
- 选择
i8或fp适用于rk3562,rk3566,rk3568,rk3576,rk3588,rv1126b这些平台- 选择
u8或fp适用于rv1109,rv1126,rk1808这些平台
执行成功之后,会在 rknn_model_zoo/examples/yolov8_pose/model 目录下生成一个 .rknn 模型文件。
Demo编译
说明
在 rockchip官方的开源项目 中使用的是C++编写的Demo,可以通过运行
rknn_model_zoo/build-linux.shrknn_model_zoo/build-android.sh
这两个脚本(将交叉编译路径替换为实际路径)直接编译示例代码。
在部署目录中生成一个install/demo_Linux_aarch64 或 install/demo_Android_aarch64 文件夹,包含 imgenc、llm、demo 和 lib 文件夹。
退出环境
conda deactivate看到命令行前面出现 (base) 字样就可以了。
安装交叉编译器
我们需要在PC主机上面编译Demo生成文件,在泰山派3M-RK3576的板子上面运行,所以我们直接使用 apt 安装 aarch64-linux-gnu:
sudo apt update && \
sudo apt install -y cmake make gcc-aarch64-linux-gnu g++-aarch64-linux-gnu2
修改Demo源码
在原有的 YOLOv8 的 pose 姿态检 Demo 中代码使用 float16 读取关键点数据。输出层是 INT8 类型,因此必须使用 INT8 的反量化。
修改 rknn_model_zoo/examples/yolov8_pose/cpp/postprocess.cc 文件:
diff --git a/examples/yolov8_pose/cpp/postprocess.cc b/examples/yolov8_pose/cpp/postprocess.cc
index 8d11271..d45cc8f 100644
--- a/examples/yolov8_pose/cpp/postprocess.cc
+++ b/examples/yolov8_pose/cpp/postprocess.cc
@@ -470,30 +470,35 @@ int post_process(rknn_app_context_t *app_ctx, void *outputs, letterbox_t *letter
float h = filterBoxes[n * 5 + 3];
int keypoints_index = (int)filterBoxes[n * 5 + 4];
+ // Calculate total anchor points dynamically from output tensor
+ int total_anchors = index; // Total anchor points from all scales
+
for (int j = 0; j < 17; ++j) {
if (app_ctx->is_quant) {
#ifdef RKNPU1
- od_results->results[last_count].keypoints[j][0] = (deqnt_affine_u8_to_f32(((uint8_t *)_outputs[3].buf)[j * 3 * 8400 + 0 * 8400 + keypoints_index],
+ od_results->results[last_count].keypoints[j][0] = (deqnt_affine_u8_to_f32(((uint8_t *)_outputs[3].buf)[(j * 3 + 0) * total_anchors + keypoints_index],
app_ctx->output_attrs[3].zp, app_ctx->output_attrs[3].scale)- letter_box->x_pad)/ letter_box->scale;
- od_results->results[last_count].keypoints[j][1] = (deqnt_affine_u8_to_f32(((uint8_t *)_outputs[3].buf)[j * 3 * 8400 + 1 * 8400 + keypoints_index],
+ od_results->results[last_count].keypoints[j][1] = (deqnt_affine_u8_to_f32(((uint8_t *)_outputs[3].buf)[(j * 3 + 1) * total_anchors + keypoints_index],
app_ctx->output_attrs[3].zp, app_ctx->output_attrs[3].scale)- letter_box->y_pad)/ letter_box->scale;
- od_results->results[last_count].keypoints[j][2] = deqnt_affine_u8_to_f32(((uint8_t *)_outputs[3].buf)[j * 3 * 8400 + 2 * 8400 + keypoints_index],
+ od_results->results[last_count].keypoints[j][2] = deqnt_affine_u8_to_f32(((uint8_t *)_outputs[3].buf)[(j * 3 + 2) * total_anchors + keypoints_index],
app_ctx->output_attrs[3].zp, app_ctx->output_attrs[3].scale);
#else
- od_results->results[last_count].keypoints[j][0] = ((float)((rknpu2::float16 *)_outputs[3].buf)[j*3*8400+0*8400+keypoints_index]
- - letter_box->x_pad)/ letter_box->scale;
- od_results->results[last_count].keypoints[j][1] = ((float)((rknpu2::float16 *)_outputs[3].buf)[j*3*8400+1*8400+keypoints_index]
- - letter_box->y_pad)/ letter_box->scale;
- od_results->results[last_count].keypoints[j][2] = (float)((rknpu2::float16 *)_outputs[3].buf)[j*3*8400+2*8400+keypoints_index];
+ // RKNPU2: INT8 quantized output, use deqnt_affine_to_f32
+ od_results->results[last_count].keypoints[j][0] = (deqnt_affine_to_f32(((int8_t *)_outputs[3].buf)[(j*3+0)*total_anchors+keypoints_index],
+ app_ctx->output_attrs[3].zp, app_ctx->output_attrs[3].scale) - letter_box->x_pad) / letter_box->scale;
+ od_results->results[last_count].keypoints[j][1] = (deqnt_affine_to_f32(((int8_t *)_outputs[3].buf)[(j*3+1)*total_anchors+keypoints_index],
+ app_ctx->output_attrs[3].zp, app_ctx->output_attrs[3].scale) - letter_box->y_pad) / letter_box->scale;
+ od_results->results[last_count].keypoints[j][2] = deqnt_affine_to_f32(((int8_t *)_outputs[3].buf)[(j*3+2)*total_anchors+keypoints_index],
+ app_ctx->output_attrs[3].zp, app_ctx->output_attrs[3].scale);
#endif
}
else
{
- od_results->results[last_count].keypoints[j][0] = (((float *)_outputs[3].buf)[j*3*8400+0*8400+keypoints_index]
+ od_results->results[last_count].keypoints[j][0] = (((float *)_outputs[3].buf)[(j*3+0)*total_anchors+keypoints_index]
- letter_box->x_pad)/ letter_box->scale;
- od_results->results[last_count].keypoints[j][1] = (((float *)_outputs[3].buf)[j*3*8400+1*8400+keypoints_index]
+ od_results->results[last_count].keypoints[j][1] = (((float *)_outputs[3].buf)[(j*3+1)*total_anchors+keypoints_index]
- letter_box->y_pad)/ letter_box->scale;
- od_results->results[last_count].keypoints[j][2] = ((float *)_outputs[3].buf)[j*3*8400+2*8400+keypoints_index];
+ od_results->results[last_count].keypoints[j][2] = ((float *)_outputs[3].buf)[(j*3+2)*total_anchors+keypoints_index];
}
}2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
// Copyright (c) 2024 by Rockchip Electronics Co., Ltd. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "yolov8-pose.h"
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#ifndef RKNPU1
#include <Float16.h>
#endif
#include <iostream>
#include <cmath>
#include <algorithm>
#include <set>
#include <vector>
#define LABEL_NALE_TXT_PATH "./model/yolov8_pose_labels_list.txt"
static char *labels[OBJ_CLASS_NUM];
inline static int clamp(float val, int min, int max) { return val > min ? (val < max ? val : max) : min; }
static char *readLine(FILE *fp, char *buffer, int *len) {
int ch;
int i = 0;
size_t buff_len = 0;
buffer = (char *)malloc(buff_len + 1);
if (!buffer)
return NULL; // Out of memory
while ((ch = fgetc(fp)) != '\n' && ch != EOF) {
buff_len++;
void *tmp = realloc(buffer, buff_len + 1);
if (tmp == NULL) {
free(buffer);
return NULL; // Out of memory
}
buffer = (char *)tmp;
buffer[i] = (char)ch;
i++;
}
buffer[i] = '\0';
*len = buff_len;
// Detect end
if (ch == EOF && (i == 0 || ferror(fp))) {
free(buffer);
return NULL;
}
return buffer;
}
static int readLines(const char *fileName, char *lines[], int max_line) {
FILE *file = fopen(fileName, "r");
char *s;
int i = 0;
int n = 0;
if (file == NULL) {
printf("Open %s fail!\n", fileName);
return -1;
}
while ((s = readLine(file, s, &n)) != NULL) {
lines[i++] = s;
if (i >= max_line)
break;
}
fclose(file);
return i;
}
static int loadLabelName(const char *locationFilename, char *label[]) {
printf("load lable %s\n", locationFilename);
readLines(locationFilename, label, OBJ_CLASS_NUM);
return 0;
}
static float CalculateOverlap(float xmin0, float ymin0, float xmax0, float ymax0, float xmin1, float ymin1, float xmax1,
float ymax1)
{
float w = fmax(0.f, fmin(xmax0, xmax1) - fmax(xmin0, xmin1) + 1.0);
float h = fmax(0.f, fmin(ymax0, ymax1) - fmax(ymin0, ymin1) + 1.0);
float i = w * h;
float u = (xmax0 - xmin0 + 1.0) * (ymax0 - ymin0 + 1.0) + (xmax1 - xmin1 + 1.0) * (ymax1 - ymin1 + 1.0) - i;
return u <= 0.f ? 0.f : (i / u);
}
static int nms(int validCount, std::vector<float> &outputLocations, std::vector<int> classIds, std::vector<int> &order,
int filterId, float threshold)
{
for (int i = 0; i < validCount; ++i)
{
int n = order[i];
if (n == -1 || classIds[n] != filterId)
{
continue;
}
for (int j = i + 1; j < validCount; ++j)
{
int m = order[j];
if (m == -1 || classIds[m] != filterId)
{
continue;
}
float xmin0 = outputLocations[n * 5 + 0];
float ymin0 = outputLocations[n * 5 + 1];
float xmax0 = outputLocations[n * 5 + 0] + outputLocations[n * 5 + 2];
float ymax0 = outputLocations[n * 5 + 1] + outputLocations[n * 5 + 3];
float xmin1 = outputLocations[m * 5 + 0];
float ymin1 = outputLocations[m * 5 + 1];
float xmax1 = outputLocations[m * 5 + 0] + outputLocations[m * 5 + 2];
float ymax1 = outputLocations[m * 5 + 1] + outputLocations[m * 5 + 3];
float iou = CalculateOverlap(xmin0, ymin0, xmax0, ymax0, xmin1, ymin1, xmax1, ymax1);
if (iou > threshold)
{
order[j] = -1;
}
}
}
return 0;
}
static int quick_sort_indice_inverse(std::vector<float> &input, int left, int right, std::vector<int> &indices) {
float key;
int key_index;
int low = left;
int high = right;
if (left < right) {
key_index = indices[left];
key = input[left];
while (low < high) {
while (low < high && input[high] <= key) {
high--;
}
input[low] = input[high];
indices[low] = indices[high];
while (low < high && input[low] >= key) {
low++;
}
input[high] = input[low];
indices[high] = indices[low];
}
input[low] = key;
indices[low] = key_index;
quick_sort_indice_inverse(input, left, low - 1, indices);
quick_sort_indice_inverse(input, low + 1, right, indices);
}
return low;
}
static float sigmoid(float x) {
return 1.0 / (1.0 + expf(-x));
}
static float unsigmoid(float y) {
return -1.0 * logf((1.0 / y) - 1.0);
}
inline static int32_t __clip(float val, float min, float max) {
float f = val <= min ? min : (val >= max ? max : val);
return f;
}
static int8_t qnt_f32_to_affine(float f32, int32_t zp, float scale) {
float dst_val = (f32 / scale) + zp;
int8_t res = (int8_t)__clip(dst_val, -128, 127);
return res;
}
static uint8_t qnt_f32_to_affine_u8(float f32, int32_t zp, float scale) {
float dst_val = (f32 / scale) + zp;
uint8_t res = (uint8_t)__clip(dst_val, 0, 255);
return res;
}
static float deqnt_affine_to_f32(int8_t qnt, int32_t zp, float scale) {
return ((float)qnt - (float)zp) * scale;
}
static float deqnt_affine_u8_to_f32(uint8_t qnt, int32_t zp, float scale) {
return ((float)qnt - (float)zp) * scale;
}
void softmax(float *input, int size) {
float max_val = input[0];
for (int i = 1; i < size; ++i) {
if (input[i] > max_val) {
max_val = input[i];
}
}
float sum_exp = 0.0;
for (int i = 0; i < size; ++i) {
sum_exp += expf(input[i] - max_val);
}
for (int i = 0; i < size; ++i) {
input[i] = expf(input[i] - max_val) / sum_exp;
}
}
static int process_i8(int8_t *input, int grid_h, int grid_w, int stride,
std::vector<float> &boxes, std::vector<float> &boxScores, std::vector<int> &classId, float threshold,
int32_t zp, float scale, int index) {
int input_loc_len = 64;
int tensor_len = input_loc_len + OBJ_CLASS_NUM;
int validCount = 0;
int8_t thres_i8 = qnt_f32_to_affine(unsigmoid(threshold), zp, scale);
for (int h = 0; h < grid_h; h++) {
for (int w = 0; w < grid_w; w++) {
for (int a = 0; a < OBJ_CLASS_NUM; a++) {
if(input[(input_loc_len + a)*grid_w * grid_h + h * grid_w + w ] >= thres_i8) { //[1,tensor_len,grid_h,grid_w]
float box_conf_f32 = sigmoid(deqnt_affine_to_f32(input[(input_loc_len + a) * grid_w * grid_h + h * grid_w + w ],
zp, scale));
float loc[input_loc_len];
for (int i = 0; i < input_loc_len; ++i) {
loc[i] = deqnt_affine_to_f32(input[i * grid_w * grid_h + h * grid_w + w], zp, scale);
}
for (int i = 0; i < input_loc_len / 16; ++i) {
softmax(&loc[i * 16], 16);
}
float xywh_[4] = {0, 0, 0, 0};
float xywh[4] = {0, 0, 0, 0};
for (int dfl = 0; dfl < 16; ++dfl) {
xywh_[0] += loc[dfl] * dfl;
xywh_[1] += loc[1 * 16 + dfl] * dfl;
xywh_[2] += loc[2 * 16 + dfl] * dfl;
xywh_[3] += loc[3 * 16 + dfl] * dfl;
}
xywh_[0]=(w+0.5)-xywh_[0];
xywh_[1]=(h+0.5)-xywh_[1];
xywh_[2]=(w+0.5)+xywh_[2];
xywh_[3]=(h+0.5)+xywh_[3];
xywh[0]=((xywh_[0]+xywh_[2])/2)*stride;
xywh[1]=((xywh_[1]+xywh_[3])/2)*stride;
xywh[2]=(xywh_[2]-xywh_[0])*stride;
xywh[3]=(xywh_[3]-xywh_[1])*stride;
xywh[0]=xywh[0]-xywh[2]/2;
xywh[1]=xywh[1]-xywh[3]/2;
boxes.push_back(xywh[0]);//x
boxes.push_back(xywh[1]);//y
boxes.push_back(xywh[2]);//w
boxes.push_back(xywh[3]);//h
boxes.push_back(float(index + (h * grid_w) + w));//keypoints index
boxScores.push_back(box_conf_f32);
classId.push_back(a);
validCount++;
}
}
}
}
return validCount;
}
static int process_u8(uint8_t *input, int grid_h, int grid_w, int stride,
std::vector<float> &boxes, std::vector<float> &boxScores, std::vector<int> &classId, float threshold,
int32_t zp, float scale, int index) {
int input_loc_len = 64;
int tensor_len = input_loc_len + OBJ_CLASS_NUM;
int validCount = 0;
uint8_t thres_i8 = qnt_f32_to_affine_u8(unsigmoid(threshold), zp, scale);
for (int h = 0; h < grid_h; h++) {
for (int w = 0; w < grid_w; w++) {
for (int a = 0; a < OBJ_CLASS_NUM; a++) {
if(input[(input_loc_len + a)*grid_w * grid_h + h * grid_w + w ] >= thres_i8) { //[1,tensor_len,grid_h,grid_w]
float box_conf_f32 = sigmoid(deqnt_affine_u8_to_f32(input[(input_loc_len + a) * grid_w * grid_h + h * grid_w + w ],
zp, scale));
float loc[input_loc_len];
for (int i = 0; i < input_loc_len; ++i) {
loc[i] = deqnt_affine_u8_to_f32(input[i * grid_w * grid_h + h * grid_w + w], zp, scale);
}
for (int i = 0; i < input_loc_len / 16; ++i) {
softmax(&loc[i * 16], 16);
}
float xywh_[4] = {0, 0, 0, 0};
float xywh[4] = {0, 0, 0, 0};
for (int dfl = 0; dfl < 16; ++dfl) {
xywh_[0] += loc[dfl] * dfl;
xywh_[1] += loc[1 * 16 + dfl] * dfl;
xywh_[2] += loc[2 * 16 + dfl] * dfl;
xywh_[3] += loc[3 * 16 + dfl] * dfl;
}
xywh_[0]=(w+0.5)-xywh_[0];
xywh_[1]=(h+0.5)-xywh_[1];
xywh_[2]=(w+0.5)+xywh_[2];
xywh_[3]=(h+0.5)+xywh_[3];
xywh[0]=((xywh_[0]+xywh_[2])/2)*stride;
xywh[1]=((xywh_[1]+xywh_[3])/2)*stride;
xywh[2]=(xywh_[2]-xywh_[0])*stride;
xywh[3]=(xywh_[3]-xywh_[1])*stride;
xywh[0]=xywh[0]-xywh[2]/2;
xywh[1]=xywh[1]-xywh[3]/2;
boxes.push_back(xywh[0]);//x
boxes.push_back(xywh[1]);//y
boxes.push_back(xywh[2]);//w
boxes.push_back(xywh[3]);//h
boxes.push_back(float(index + (h * grid_w) + w));//keypoints index
boxScores.push_back(box_conf_f32);
classId.push_back(a);
validCount++;
}
}
}
}
return validCount;
}
static int process_fp32(float *input, int grid_h, int grid_w, int stride,
std::vector<float> &boxes, std::vector<float> &boxScores, std::vector<int> &classId, float threshold,
int32_t zp, float scale, int index) {
int input_loc_len = 64;
int tensor_len = input_loc_len + OBJ_CLASS_NUM;
int validCount = 0;
float thres_fp = unsigmoid(threshold);
for (int h = 0; h < grid_h; h++) {
for (int w = 0; w < grid_w; w++) {
for (int a = 0; a < OBJ_CLASS_NUM; a++) {
if(input[(input_loc_len + a)*grid_w * grid_h + h * grid_w + w ] >= thres_fp) { //[1,tensor_len,grid_h,grid_w]
float box_conf_f32 = sigmoid(input[(input_loc_len + a) * grid_w * grid_h + h * grid_w + w ]);
float loc[input_loc_len];
for (int i = 0; i < input_loc_len; ++i) {
loc[i] = input[i * grid_w * grid_h + h * grid_w + w];
}
for (int i = 0; i < input_loc_len / 16; ++i) {
softmax(&loc[i * 16], 16);
}
float xywh_[4] = {0, 0, 0, 0};
float xywh[4] = {0, 0, 0, 0};
for (int dfl = 0; dfl < 16; ++dfl) {
xywh_[0] += loc[dfl] * dfl;
xywh_[1] += loc[1 * 16 + dfl] * dfl;
xywh_[2] += loc[2 * 16 + dfl] * dfl;
xywh_[3] += loc[3 * 16 + dfl] * dfl;
}
xywh_[0]=(w+0.5)-xywh_[0];
xywh_[1]=(h+0.5)-xywh_[1];
xywh_[2]=(w+0.5)+xywh_[2];
xywh_[3]=(h+0.5)+xywh_[3];
xywh[0]=((xywh_[0]+xywh_[2])/2)*stride;
xywh[1]=((xywh_[1]+xywh_[3])/2)*stride;
xywh[2]=(xywh_[2]-xywh_[0])*stride;
xywh[3]=(xywh_[3]-xywh_[1])*stride;
xywh[0]=xywh[0]-xywh[2]/2;
xywh[1]=xywh[1]-xywh[3]/2;
boxes.push_back(xywh[0]);//x
boxes.push_back(xywh[1]);//y
boxes.push_back(xywh[2]);//w
boxes.push_back(xywh[3]);//h
boxes.push_back(float(index + (h * grid_w) + w));//keypoints index
boxScores.push_back(box_conf_f32);
classId.push_back(a);
validCount++;
}
}
}
}
return validCount;
}
int post_process(rknn_app_context_t *app_ctx, void *outputs, letterbox_t *letter_box, float conf_threshold, float nms_threshold,
object_detect_result_list *od_results) {
#if defined(RV1106_1103)
rknn_tensor_mem **_outputs = (rknn_tensor_mem **)outputs;
#else
rknn_output *_outputs = (rknn_output *)outputs;
#endif
std::vector<float> filterBoxes;
std::vector<float> objProbs;
std::vector<int> classId;
int validCount = 0;
int stride = 0;
int grid_h = 0;
int grid_w = 0;
int model_in_w = app_ctx->model_width;
int model_in_h = app_ctx->model_height;
memset(od_results, 0, sizeof(object_detect_result_list));
int index = 0;
#ifdef RKNPU1
for (int i = 0; i < 3; i++) {
grid_h = app_ctx->output_attrs[i].dims[1];
grid_w = app_ctx->output_attrs[i].dims[0];
stride = model_in_h / grid_h;
if (app_ctx->is_quant) {
validCount += process_u8((uint8_t *)_outputs[i].buf, grid_h, grid_w, stride, filterBoxes, objProbs,
classId, conf_threshold, app_ctx->output_attrs[i].zp, app_ctx->output_attrs[i].scale, index);
}
else
{
validCount += process_fp32((float *)_outputs[i].buf, grid_h, grid_w, stride, filterBoxes, objProbs,
classId, conf_threshold, app_ctx->output_attrs[i].zp, app_ctx->output_attrs[i].scale, index);
}
index += grid_h * grid_w;
}
#else
for (int i = 0; i < 3; i++) {
grid_h = app_ctx->output_attrs[i].dims[2];
grid_w = app_ctx->output_attrs[i].dims[3];
stride = model_in_h / grid_h;
if (app_ctx->is_quant) {
validCount += process_i8((int8_t *)_outputs[i].buf, grid_h, grid_w, stride, filterBoxes, objProbs,
classId, conf_threshold, app_ctx->output_attrs[i].zp, app_ctx->output_attrs[i].scale,index);
}
else
{
validCount += process_fp32((float *)_outputs[i].buf, grid_h, grid_w, stride, filterBoxes, objProbs,
classId, conf_threshold, app_ctx->output_attrs[i].zp, app_ctx->output_attrs[i].scale, index);
}
index += grid_h * grid_w;
}
#endif
// no object detect
if (validCount <= 0) {
return 0;
}
std::vector<int> indexArray;
for (int i = 0; i < validCount; ++i) {
indexArray.push_back(i);
}
quick_sort_indice_inverse(objProbs, 0, validCount - 1, indexArray);
std::set<int> class_set(std::begin(classId), std::end(classId));
for (auto c : class_set) {
nms(validCount, filterBoxes, classId, indexArray, c, nms_threshold);
}
int last_count = 0;
od_results->count = 0;
/* box valid detect target */
for (int i = 0; i < validCount; ++i) {
if (indexArray[i] == -1 || last_count >= OBJ_NUMB_MAX_SIZE) {
continue;
}
int n = indexArray[i];
float x1 = filterBoxes[n * 5 + 0] - letter_box->x_pad;
float y1 = filterBoxes[n * 5 + 1] - letter_box->y_pad;
float w = filterBoxes[n * 5 + 2];
float h = filterBoxes[n * 5 + 3];
int keypoints_index = (int)filterBoxes[n * 5 + 4];
// Calculate total anchor points dynamically from output tensor
int total_anchors = index; // Total anchor points from all scales
for (int j = 0; j < 17; ++j) {
if (app_ctx->is_quant) {
#ifdef RKNPU1
od_results->results[last_count].keypoints[j][0] = (deqnt_affine_u8_to_f32(((uint8_t *)_outputs[3].buf)[(j * 3 + 0) * total_anchors + keypoints_index],
app_ctx->output_attrs[3].zp, app_ctx->output_attrs[3].scale)- letter_box->x_pad)/ letter_box->scale;
od_results->results[last_count].keypoints[j][1] = (deqnt_affine_u8_to_f32(((uint8_t *)_outputs[3].buf)[(j * 3 + 1) * total_anchors + keypoints_index],
app_ctx->output_attrs[3].zp, app_ctx->output_attrs[3].scale)- letter_box->y_pad)/ letter_box->scale;
od_results->results[last_count].keypoints[j][2] = deqnt_affine_u8_to_f32(((uint8_t *)_outputs[3].buf)[(j * 3 + 2) * total_anchors + keypoints_index],
app_ctx->output_attrs[3].zp, app_ctx->output_attrs[3].scale);
#else
// RKNPU2: INT8 quantized output, use deqnt_affine_to_f32
od_results->results[last_count].keypoints[j][0] = (deqnt_affine_to_f32(((int8_t *)_outputs[3].buf)[(j*3+0)*total_anchors+keypoints_index],
app_ctx->output_attrs[3].zp, app_ctx->output_attrs[3].scale) - letter_box->x_pad) / letter_box->scale;
od_results->results[last_count].keypoints[j][1] = (deqnt_affine_to_f32(((int8_t *)_outputs[3].buf)[(j*3+1)*total_anchors+keypoints_index],
app_ctx->output_attrs[3].zp, app_ctx->output_attrs[3].scale) - letter_box->y_pad) / letter_box->scale;
od_results->results[last_count].keypoints[j][2] = deqnt_affine_to_f32(((int8_t *)_outputs[3].buf)[(j*3+2)*total_anchors+keypoints_index],
app_ctx->output_attrs[3].zp, app_ctx->output_attrs[3].scale);
#endif
}
else
{
od_results->results[last_count].keypoints[j][0] = (((float *)_outputs[3].buf)[(j*3+0)*total_anchors+keypoints_index]
- letter_box->x_pad)/ letter_box->scale;
od_results->results[last_count].keypoints[j][1] = (((float *)_outputs[3].buf)[(j*3+1)*total_anchors+keypoints_index]
- letter_box->y_pad)/ letter_box->scale;
od_results->results[last_count].keypoints[j][2] = ((float *)_outputs[3].buf)[(j*3+2)*total_anchors+keypoints_index];
}
}
int id = classId[n];
float obj_conf = objProbs[i];
od_results->results[last_count].box.left = (int)(clamp(x1, 0, model_in_w) / letter_box->scale);
od_results->results[last_count].box.top = (int)(clamp(y1, 0, model_in_h) / letter_box->scale);
od_results->results[last_count].box.right = (int)(clamp(x1+w, 0, model_in_w) / letter_box->scale);
od_results->results[last_count].box.bottom = (int)(clamp(y1+h, 0, model_in_h) / letter_box->scale);
// od_results->results[last_count].box.angle = angle;
od_results->results[last_count].prop = obj_conf;
od_results->results[last_count].cls_id = id;
last_count++;
}
od_results->count = last_count;
return 0;
}
int init_post_process() {
int ret = 0;
ret = loadLabelName(LABEL_NALE_TXT_PATH, labels);
if (ret < 0) {
printf("Load %s failed!\n", LABEL_NALE_TXT_PATH);
return -1;
}
return 0;
}
char *coco_cls_to_name(int cls_id) {
if (cls_id >= OBJ_CLASS_NUM) {
return "null";
}
if (labels[cls_id]) {
return labels[cls_id];
}
return "null";
}
void deinit_post_process() {
for (int i = 0; i < OBJ_CLASS_NUM; i++) {
if (labels[i] != nullptr) {
free(labels[i]);
labels[i] = nullptr;
}
}
}2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
编译
进入项目目录:
cd rknn_model_zoo/给予 build-linux.sh 运行权限:
sudo chmod +x ./build-linux.sh运行编译脚本:
./build-linux.sh -t <target> -a <arch> -d <build_demo_name> [-b <build_type>] [-m] [-r] [-j]
-t : target (rk356x/rk3576/rk3588/rv1106/rv1126b/rv1126/rk1808)
-a : arch (aarch64/armhf)
-d : demo name
-b : build_type(Debug/Release)
-m : enable address sanitizer, build_type need set to Debug
-r : disable rga, use cpu resize image
-j : disable libjpeg to avoid conflicts between libjpeg and opencv
# 我们运行RK3576相关的命令即可。:
./build-linux.sh -t rk3576 -a aarch64 -d yolov8_pose2
3
4
5
6
7
8
9
10
11
注意
<demo name>这个参数要和rknn_model_zoo/examples中的目标文件夹名称保持一致,因为依靠此参数选择编译的Demo。
最终生成 install/ 文件目录如下:
(base) lipeng@host:~/workspace/yolo11/rknn_model_zoo$ tree install
install
`-- rk3576_linux_aarch64
|-- rknn_yolov8_pose_demo
| |-- lib
| | |-- librga.so
| | `-- librknnrt.so
| |-- model
| | |-- bus.jpg
| | |-- yolov8_pose.rknn
| | `-- yolov8_pose_labels_list.txt
| `-- rknn_yolov8_pose_demo2
3
4
5
6
7
8
9
10
11
12
板端Demo演示
转移文件
接下来我们需要将 rknn_model_zoo/install/rk3576_linux_aarch64/rknn_yolov8_pose_demo 目录转移到到板子上面:
推荐使用
adb工具,进行转移,泰山派3m默认开启ADB,或者使用TF卡,ssh或者U盘都可以。参考:https://wiki.lckfb.com/zh-hans/tspi-3-rk3576/system-usage/debian12-usage/adb-usage.html
adb push yolo11/rknn_model_zoo/install/rk3576_linux_aarch64/rknn_yolov8_pose_demo /home/lckfb/板端运行
详细请阅读:https://github.com/airockchip/rknn_model_zoo/blob/main/examples/yolo11/README.md
我们进入泰山派开发板的终端,然后进入 rknn_yolov8_pose_demo/ 目录:
# 进入目录
cd rknn_yolov8_pose_demo/2
设置动态库路径:(为当前目录下的 ./lib 目录下 )
# 设置动态库路径 (非常重要,否则会报错误)
export LD_LIBRARY_PATH=./lib2
赋予demo可执行权限
sudo chmod +x rknn_yolov8_pose_demo运行Demo:
# 命令格式:./rknn_yolov8_pose_demo <RKNN模型路径> <传入的图片路径>
sudo ./rknn_yolov8_pose_demo model/yolov8_pose.rknn model/bus.jpg2
最终会生成一个
out.png图片,保存有最终识别之后的成果。