Background
I came across an evaluation campaign for LISTENAI's CSK6 development board on WeChat. Since I regularly test models on all kinds of hardware platforms, I applied for the review and was fortunate to be selected.
The vendor provides a conversion example for an open-source classification model, but I rarely use classification models, so I tried converting an object-detection model instead.
Model architecture
The model follows the CenterNet approach: the backbone is built from modified plain VGG blocks, the FPN is a simple top-down structure, and the head outputs an hm map (object centers) and a wh map. As of this writing, however, the official board-side interface can only return a single head, so hm and wh are concatenated into one output. The network structure is shown below.
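Since the single-head constraint is the key point, here is a minimal PyTorch sketch of how the hm and wh branches can be concatenated into one output tensor. The input channel count and the 32×32 output resolution are assumptions taken from the post-processing described later, not the project's actual code:
import torch
import torch.nn as nn

class DetHead(nn.Module):
    """Illustrative single-output head: 1-channel hm + 4-channel wh, concatenated."""
    def __init__(self, in_channels=64):
        super().__init__()
        self.hm = nn.Conv2d(in_channels, 1, kernel_size=1)  # center-point heatmap
        self.wh = nn.Conv2d(in_channels, 4, kernel_size=1)  # left/top/right/bottom distances

    def forward(self, feat):
        hm = self.hm(feat)
        wh = self.wh(feat)
        # The board-side interface can only read one output head,
        # so both maps are merged into a single 5-channel tensor.
        return torch.cat([hm, wh], dim=1)

# shape check (assumed): 128x128 input, stride-4 feature map -> (1, 5, 32, 32) output
# out = DetHead()(torch.randn(1, 64, 32, 32))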
Process
Environment setup
Setting up linger and thinker
linger is used for quantization-aware training, and thinker converts the trained model for the board. I worked in an Ubuntu 18 environment under WSL.
linger setup
conda create -n linger-env python==3.7.0
conda activate linger-env
git clone https://github.com/LISTENAI/linger.git
cd linger && sh install.sh
pip install -U pip
cat requirements.txt |xargs -n 1 pip install
thinker setup
conda create -n thinker-env python==3.7.0
conda activate thinker-env
git clone https://github.com/LISTENAI/thinker.git
cd thinker
bash ./scripts/x86_linux.sh
pip install -U pip
cat requirements.txt |xargs -n 1 pip install
Set up the two environments separately; once both are ready, we can start training.
Model training and conversion
Training happens in two stages: first train a floating-point model, then fine-tune it as a fixed-point (quantized) model, and finally convert the result to .bin format.
linger does not support TensorBoard, so the related code has to be commented out; apart from that, only a few extra lines are needed.
Original code
model = create_model()
model = model.to(cfg.device)
optimizer = torch.optim.Adam(model.parameters(), cfg.lr, betas=(0.9, 0.999), eps=1e-08, weight_decay=1e-4)
Modified code
import linger
model = create_model()
model = model.to(cfg.device)
dummy_input = torch.randn(1, 3, 128, 128,requires_grad=True).cuda()
linger.trace_layers(model, model, dummy_input, fuse_bn=True)
type_modules = (nn.Conv2d,nn.BatchNorm2d,nn.ConvTranspose2d)
normalize_modules =(nn.Conv2d,nn.BatchNorm2d,nn.ConvTranspose2d)
linger.normalize_module(model, type_modules=type_modules, normalize_weight_value=16, normalize_bias_value=16,
normalize_output_value=16)
optimizer = torch.optim.Adam(model.parameters(), cfg.lr, betas=(0.9, 0.999), eps=1e-08, weight_decay=1e-4)
After the floating-point model has been trained, load the saved float checkpoint and continue with fixed-point (quantization-aware) training; note that a smaller learning rate should be used.
import linger
model = create_model(arch=cfg.arch, num_classes=train_dataset.num_classes, inference_mode=True, onnx_flag=False)
model = model.to(cfg.device)
dummy_input = torch.randn(1, 3, 128, 128, requires_grad=True).cuda()
type_modules = (nn.Conv2d,nn.BatchNorm2d,nn.ConvTranspose2d)
normalize_modules = (nn.Conv2d,nn.BatchNorm2d,nn.ConvTranspose2d)
linger.normalize_module(model, type_modules=type_modules, normalize_weight_value=16, normalize_bias_value=16,
normalize_output_value=16)
model = linger.normalize_layers(model, normalize_modules=normalize_modules, normalize_weight_value=8,
normalize_bias_value=8, normalize_output_value=8)
quant_modules = (nn.Conv2d,nn.BatchNorm2d,nn.ConvTranspose2d)
model = linger.init(model, quant_modules=quant_modules)
model.load_state_dict(torch.load(cfg.load_model)['state_dict'])
optimizer = torch.optim.Adam(model.parameters(), cfg.lr, betas=(0.9, 0.999), eps=1e-08, weight_decay=1e-4)
After the fixed-point model has been trained, it needs to be exported to ONNX:
import linger
model = create_model()
model = model.to(cfg.device)
dummy_input = torch.randn(1, 3, 128, 128, requires_grad=True).cuda()
linger.SetIQTensorCat(True)
type_modules = (nn.Conv2d,nn.BatchNorm2d,nn.ConvTranspose2d)
normalize_modules = (nn.Conv2d,nn.BatchNorm2d,nn.ConvTranspose2d)
linger.normalize_module(model, type_modules=type_modules, normalize_weight_value=16, normalize_bias_value=16,
normalize_output_value=16)
model = linger.normalize_layers(model, normalize_modules=normalize_modules, normalize_weight_value=8,
normalize_bias_value=8, normalize_output_value=8)
quant_modules = (nn.Conv2d,nn.BatchNorm2d,nn.ConvTranspose2d)
model = linger.init(model, quant_modules=quant_modules)
model.load_state_dict(torch.load(cfg.load_model)['state_dict'])
model.eval()
dummy_input = torch.ones(1, 3, 128, 128).cuda()
with torch.no_grad():
torch.onnx.export(model, dummy_input, 'lnn.onnx',input_names=['input'], output_names=['hm'],
export_params=True,opset_version=12,operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK)
Once the model has been exported to ONNX, convert it to .bin format in the thinker environment:
conda activate thinker-env
tpacker -g lnn.onnx -d True -o model.bin
If this last step fails, the model probably does not meet one of the conversion requirements; follow the official requirements and the packing will succeed.
Flashing the model to the board
First install the lisa environment; here I used Ubuntu 22.04 under WSL.
Install the lisa zep command-line tool, wasi-sdk, and wasm-sdk first.
Building the program
Configure the environment variables:
export WASM_THINKER_SDK="/path_to_sdk/wasm-sdk"
export WASI_TOOLCHAIN_PATH="/path_to_sdk/wasi-sdk-17.0"
Once everything is installed, download the official demo:
lisa zep create --from-git https://cloud.listenai.com/listenai/samples/camera_image_detect.git
Modify the WASM application
cd app_wasm/
vi main.c
Replace the file contents with the following:
#include <stdio.h>
#include "thinker/thinker.h"
static tModelHandle model_hdl;
static tExecHandle hdl;
int
main(int argc, char **argv)
{
printf("BOOT: WAMRn");
tStatus ret;
char version[30];
tGetVersion(0, version, sizeof(version));
printf("[WASM] tGetVersion: %sn", version);
ret = tInitialize();
printf("[WASM] tInitialize: %dn", ret);
if (ret != T_SUCCESS) return 1;
return 0;
}
int
set_model(void *ptr, uint32_t size)
{
tStatus ret;
uint32_t use_psram_size = 0;
uint32_t use_share_size = 0;
int num_memory = 0;
tMemory memory_list[7];
ret = tGetMemoryPlan(
memory_list, &num_memory, (int8_t *)ptr, size, &use_psram_size, &use_share_size);
printf("[WASM] tGetMemoryPlan: %dn", ret);
if (ret != T_SUCCESS) return 1;
printf("[WASM] * num_memory=%dn", num_memory);
printf("[WASM] * use_psram_size=%dn", use_psram_size);
printf("[WASM] * use_share_size=%dn", use_share_size);
for (int i = 0; i < num_memory; i++) {
printf("[WASM] * memory_list[%d].dev_type=%dn", i, memory_list[i].dev_type_);
printf("[WASM] * memory_list[%d].mem_type=%dn", i, memory_list[i].mem_type_);
printf("[WASM] * memory_list[%d].size=%dn", i, memory_list[i].size_);
printf("[WASM] * memory_list[%d].addr=0x%08llxn", i, memory_list[i].dptr_);
}
ret = tModelInit(&model_hdl, (int8_t *)ptr, size, memory_list, num_memory);
printf("[WASM] tModelInit: %d, model=0x%llxn", ret, model_hdl);
if (ret != T_SUCCESS) return 1;
ret = tCreateExecutor(model_hdl, &hdl, memory_list, num_memory);
printf("[WASM] tCreateExecutor: %d, hdl=0x%llxn", ret, hdl);
if (ret != T_SUCCESS) return 1;
return 0;
}
int
set_input(void *ptr, uint32_t size)
{
printf("[WASM] set_input(%p, %d)n", ptr, size);
tStatus ret;
int32_t in_c = 3;
int32_t in_h = 128;
int32_t in_w = 128;
tData input;
input.dtype_ = Int8;
input.scale_ = 5;
input.shape_.ndim_ = 4;
input.shape_.dims_[0] = 1;
input.shape_.dims_[1] = in_c;
input.shape_.dims_[2] = in_h;
input.shape_.dims_[3] = in_w;
input.dptr_ = ptr;
ret = tSetInput(hdl, 0, &input);
printf("[WASM] tSetInput: %dn", ret);
return ret;
}
int
get_output(void **ptr, uint32_t *size)
{
printf("[WASM] get_outputn");
tStatus ret;
ret = tForward(hdl);
printf("[WASM] tForward: %dn", ret);
if (ret != T_SUCCESS) return 1;
tData output;
ret = tGetOutput(hdl, 0, &output);
printf("[WASM] tGetOutput: %dn", ret);
if (ret != T_SUCCESS) return 1;
printf("[WASM] * output.dtype=%un", output.dtype_);
printf("[WASM] * output.shape.ndim=%un", output.shape_.ndim_);
printf("[WASM] * output.shape.ndim=%un", output.shape_.ndim_);
printf("[WASM] * output.dptr=0x%pn", output.dptr_);
int shape_size = (output.dtype_ & 0xF);
for (int i = 0; i < output.shape_.ndim_; i++) {
shape_size *= output.shape_.dims_[i];
}
printf("[WASM] * shape_size=%dn", shape_size);
*ptr = output.dptr_;
*size = shape_size;
return ret;
}
The main changes are in set_input: set in_h and in_w to your model's input height and width, and set input.scale_ to your own model's value. You can read it from the input Quant node of the exported ONNX model, where scale_x = pow(2, input.scale_), i.e. input.scale_ = log2(scale_x).
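As a rough illustration of that lookup, the sketch below loads the exported ONNX file and prints the scale of the first quantization node; the attribute name scale_x is an assumption based on the description above, so adjust it to whatever your linger export actually contains:
import math
import onnx

model = onnx.load("lnn.onnx")
for node in model.graph.node:
    if "Quant" in node.op_type:          # first quantization node on the input path
        for attr in node.attribute:
            if attr.name == "scale_x":   # attribute name assumed from the description above
                scale_x = onnx.helper.get_attribute_value(attr)
                print("scale_x =", scale_x, "-> input.scale_ =", int(math.log2(scale_x)))
        break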
Modify the main program
cd ..
cd camera_image_detect
vi main.c
The modified code is:
#include <zephyr/kernel.h>
#include <zephyr/device.h>
#include <zephyr/drivers/gpio.h>
#include <zephyr/drivers/video.h>
#include <zephyr/storage/flash_map.h>
#include <math.h>
#include <csk_malloc.h>
#include <lsf/services/thinker.h>
#include "lib_image.h"
#include "venus_ap.h"
#define THINKER_MODEL_ADDR (FLASH_BASE + FLASH_AREA_OFFSET(thinker_model))
double CIFAR100_TRAIN_MEAN[] = {0.5070751592371323, 0.48654887331495095, 0.4409178433670343};
double CIFAR100_TRAIN_STD[] = {0.2673342858792401, 0.2564384629170883, 0.27615047132568404};
void main(void)
{
int ret;
printk("Hello World! %sn", CONFIG_BOARD);
/* 加載 Thinker 模型,注意傳入模型的實(shí)際字節(jié)數(shù) */
lsf_thinker_set_model((void *)THINKER_MODEL_ADDR, 421520);
const struct device *video = device_get_binding(DT_LABEL(DT_NODELABEL(dvp)));
if (video == NULL) {
printk("Video device not foundn");
return;
}
struct video_format fmt;
fmt.pixelformat = VIDEO_PIX_FMT_VYUY;
fmt.width = 640;
fmt.height = 480;
fmt.pitch = fmt.width * 2;
if (video_set_format(video, VIDEO_EP_OUT, &fmt)) {
printk("Unable to set video formatn");
return;
}
// Image input region (full frame)
float box[4] = {0, 0, 0, 0};
box[2] = fmt.width;
box[3] = fmt.height;
struct video_buffer *buffers[2];
/* Size to allocate for each buffer */
int bsize = fmt.width * fmt.height * 2;
/* Alloc video buffers and enqueue for capture */
for (int i = 0; i < ARRAY_SIZE(buffers); i++) {
printk("#%d: Alloc video buffer: %dn", i, bsize);
buffers[i] = video_buffer_alloc(bsize);
if (buffers[i] == NULL) {
csk_heap_info();
printk("Unable to alloc video buffern");
return;
}
video_enqueue(video, VIDEO_EP_OUT, buffers[i]);
}
ret = video_stream_start(video);
if (ret != 0) {
printk("Unable to start video streamn");
return;
}
size_t result_size = 3 * 128 * 128;
size_t pixel_count = fmt.width * fmt.height;
uint8_t *result = csk_malloc(result_size); // resized image
assert(result != NULL);
uint8_t *rgb_buffer = csk_malloc(pixel_count * 3); // RGB array
assert(rgb_buffer != NULL);
double *input_data = csk_malloc(result_size * sizeof(double)); // normalized features
assert(input_data != NULL);
uint8_t *final_input = csk_malloc(result_size); // final quantized input
assert(final_input != NULL);
size_t one_third_result = result_size / 3;
int8_t *output;
uint32_t output_size;
// Start process
struct video_buffer *vbuf;
ret = video_dequeue(video, VIDEO_EP_OUT, &vbuf, K_MSEC(1000));
if (ret != 0) {
printk("Video buffer dequeued failed: %dn", ret);
return;
}
uint8_t *buffer = vbuf- >buffer;
printk("Processing...n");
vyuy_to_rgb24(buffer, rgb_buffer, pixel_count);
video_enqueue(video, VIDEO_EP_OUT, vbuf);
printk("Resizing...n");
ImagingResample(rgb_buffer, fmt.width, fmt.height, result, 128, 128, box);
// 特征換算
printk("Feature extraction...n");
for (int i = 0; i < result_size; i++) {
int index = i % 3;
uint8_t value = result[i];
input_data[i] = (double)(value / 255.0 - CIFAR100_TRAIN_MEAN[index]) /
CIFAR100_TRAIN_STD[index];
}
// Quantize the input data (HWC -> CHW)
printk("Input data to Thinker...\n");
for (int i = 0; i < result_size; i++) {
// final_input[i] = (int8_t)floor(input_data[i] * 64 + 0.5);
if (i < one_third_result) {
final_input[i] = (int8_t)floor(input_data[i * 3] * 64 + 0.5);
} else if (i < one_third_result * 2) {
final_input[i] = (int8_t)floor(
input_data[(i - one_third_result) * 3 + 1] * 64 + 0.5);
} else {
final_input[i] = (int8_t)floor(
input_data[(i - one_third_result) * 3 + 2] * 64 + 0.5);
}
}
// Pass the input data to Thinker
lsf_thinker_set_input(final_input, result_size);
// Fetch the output
lsf_thinker_get_output((void **)&output, &output_size);
int max_value = -129;
int cur_index = 0;
int xs = 0;
int ys = 0;
float score = 0;
int w = 32;
int h = 32;
for (int c = 0; c < 1; c++) {
for (int h1 = 0; h1 < h; h1++) {
for (int w1 = 0; w1 < w; w1++) {
int value = output[cur_index];
if (value > max_value) {
max_value = value;
xs = w1;
ys = h1;
score = value / pow(2.0, 3);
}
cur_index++;
}
}
}
int32_t idx_lx = w * h + ys * w + xs;
int32_t idx_ty = w * h + idx_lx;
int32_t idx_rx = w * h * 2 + idx_lx;
int32_t idx_by = w * h * 3 + idx_lx;
float a1 = xs - (float)(output[idx_lx]) / pow(2.0, 3);
float b1 = ys - (float)(output[idx_ty]) / pow(2.0, 3);
float c1 = xs + (float)(output[idx_rx]) / pow(2.0, 3);
float d1 = ys + (float)(output[idx_by]) / pow(2.0, 3);
float x1 = a1 * 4 * (640.0 / 128.0);
float y1 = b1 * 4 * (480.0 / 128.0);
float x2 = c1 * 4 * (640.0 / 128.0);
float y2 = d1 * 4 * (480.0 / 128.0);
printk("facebox: x1:%f, y1:%f, x2:%f, y2:%f, score:%f\n", x1, y1, x2, y2, score);
for (int j = 0; j < 128; j++) {
for (int i = 0; i < 128; i++) {
uint8_t pixe_r = result[j * 128 * 3 + i * 3];
uint8_t pixe_g = result[j * 128 * 3 + i * 3 + 1];
uint8_t pixe_b = result[j * 128 * 3 + i * 3 + 2];
printk("?33[0;38;2;%d;%d;%dm#", pixe_r, pixe_g, pixe_b);
}
printk("?33[00mn");
}
}
What needs to be changed:
1. Change double CIFAR100_TRAIN_MEAN[] and double CIFAR100_TRAIN_STD[] to the normalization values your own model was trained with.
2. In lsf_thinker_set_model((void *)THINKER_MODEL_ADDR, 421520), the second argument is the size of your model file in bytes.
3. Change result_size = 3 * 128 * 128; to your model's input size.
4. In ImagingResample(rgb_buffer, fmt.width, fmt.height, result, 128, 128, box);, change the second-to-last and third-to-last parameters to your model's input size.
5. Adapt the post-processing to your own model. My post-processing follows the CenterNet logic, except that the hm and wh outputs are concatenated, so they have to be split apart again: hm is a 1×32×32 feature map and wh is a 4×32×32 feature map. hm holds the score of each candidate object center, and wh holds the distances from that center to the left/top/right/bottom edges of the box, which together give the box's top-left and bottom-right corners. To save time, and since I simply assume only one face is present, I scan the hm map for its maximum value; that location is the object center, and the four wh values at the same location give the box. A Python sketch of this decode is shown after this list.
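Here is a host-side Python sketch of that single-face decode, mirroring the C loop above; the 2^3 output scale, the 4× stride, and the 640×480 frame size are taken from the code, everything else is illustrative:
import numpy as np

def decode_single_face(output, fw=32, fh=32, stride=4, scale=2**3,
                       in_size=128, img_w=640, img_h=480):
    """output: int8 array of length 5*fh*fw, laid out as [hm (1ch), wh (4ch)]."""
    hm = output[:fh * fw].reshape(fh, fw)
    wh = output[fh * fw:].reshape(4, fh, fw)
    ys, xs = np.unravel_index(np.argmax(hm), hm.shape)  # single peak = face center
    score = hm[ys, xs] / scale
    l, t, r, b = wh[:, ys, xs] / scale                  # distances to the box edges
    # feature-map coords -> 128x128 input coords -> original camera frame
    x1 = (xs - l) * stride * (img_w / in_size)
    y1 = (ys - t) * stride * (img_h / in_size)
    x2 = (xs + r) * stride * (img_w / in_size)
    y2 = (ys + b) * stride * (img_h / in_size)
    return x1, y1, x2, y2, score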
Flashing to the board
1. Build the WASM application
cd app_wasm
lisa zep exec python $WASM_THINKER_SDK/cmake/sdk.py build -p .
This generates thinker_resnet18.aot in the build folder under the current directory.
2. Build the main program
cd camera_image_detect
lisa zep build -b csk6011a_nano
This generates zephyr/zephyr.bin in the build folder under the current directory.
3. Flash to the board
lisa zep exec cskburn -s COMx -C 6 0x0 ./build/zephyr/zephyr.bin -b 748800
lisa zep exec cskburn -s COMx -C 6 0x100000 ./resource/cp.bin -b 748800
lisa zep exec cskburn -s COMx -C 6 0x200000 ./app_wasm/build/thinker_resnet18.aot -b 748800
lisa zep exec cskburn -s COMx -C 6 0x300000 ./resource/resnet18_model.bin -b 748800
COMx is your board's serial port (mine is COM3); change it to the matching port name.
./resource/resnet18_model.bin is your own model; change the path accordingly.
4. View the results over the serial port
Use the official serial-port tool to view the model output.
After connecting the board, press the reset button and the results will be printed.