Model size / speed / accuracy trade-offs | Current: ResNet50 + FPN | Target: faster inference
| Option | Backbone | Params | Est. img/s | Est. MAE | Model Size | Recommend |
|---|---|---|---|---|---|---|
| Current V2 | ResNet50 | 25M | 408 | 1.52 | 102 MB | — |
| A | ResNet18 | 11M | ~700 | ~1.60 | ~45 MB | ★★★ |
| B | EfficientNet-B0 | 5M | ~800 | ~1.65 | ~22 MB | ★★★ |
| C (recommended) | MobileNetV3-Large | 5M | ~1000 | ~1.70 | ~20 MB | ★★★ Best |
| D | MobileNetV3-Small | 2.5M | ~1500 | ~1.90 | ~10 MB | ★★ |
| E | GhostNet v2 | 4.9M | ~1200 | ~1.75 | ~20 MB | ★★ |
| F | TinyViT-5M | 5M | ~900 | ~1.65 | ~20 MB | ★★ |
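To make the trade-off in the table easier to compare, the estimates can be expressed relative to the current model. The sketch below only re-uses the table's estimated img/s and MAE figures; the helper name `relative_tradeoff` is illustrative, not part of any existing codebase.

```python
# Values copied from the table above: (estimated img/s, estimated MAE).
BASELINE_IMG_S = 408   # Current V2 (ResNet50)
BASELINE_MAE = 1.52

OPTIONS = {
    "A ResNet18": (700, 1.60),
    "B EfficientNet-B0": (800, 1.65),
    "C MobileNetV3-Large": (1000, 1.70),
    "D MobileNetV3-Small": (1500, 1.90),
    "E GhostNet v2": (1200, 1.75),
    "F TinyViT-5M": (900, 1.65),
}


def relative_tradeoff(img_s, mae, base_img_s=BASELINE_IMG_S, base_mae=BASELINE_MAE):
    """Return (speedup factor, absolute MAE increase) versus the baseline."""
    return img_s / base_img_s, mae - base_mae


for name, (img_s, mae) in OPTIONS.items():
    speedup, mae_delta = relative_tradeoff(img_s, mae)
    print(f"{name:<22} {speedup:4.2f}x faster, MAE +{mae_delta:.2f}")
```

Since every input is an estimate, the printed ratios are only as reliable as the table itself and should be re-derived from measured numbers once a candidate is trained.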
# Before: ResNet50 + FPN
import torchvision
backbone = torchvision.models.resnet50(weights="DEFAULT")  # assumes torchvision ImageNet weights
# layer1-4 outputs: 256, 512, 1024, 2048 channels
# 4-level FPN
# After: MobileNetV3-Large + simplified FPN
import timm
backbone = timm.create_model(
"mobilenetv3_large_100.ra_in1k",
pretrained=True, features_only=True
)
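# Feature outputs (stride 2/4/8/16/32): 16, 24, 40, 112, 960 channels
# Use blocks 2/4/6 (out_indices=(2, 3, 4): 40/112/960 channels) for a 3-level FPN
# Or only blocks.4 (out_indices=(3,): 112 channels, stride 16) for a single-level decoder (even faster)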
Current FPN: 4 lateral + 3 smooth + 3 head = 10 conv layers
Minimal version: 1 lateral + 1 conv + upsample + 1 head = 4 layers (3 convs plus one upsample), saving an estimated further 30% of inference time
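The minimal single-level decoder described above could look roughly like the following PyTorch sketch: one lateral 1x1 convolution, one 3x3 convolution, a bilinear upsample, and one head convolution. The class name `MinimalDecoder`, the intermediate width of 64 channels, the single output channel, the ReLU activations, and the 4x upsampling factor are all assumptions for illustration; the source does not specify them. A random tensor stands in for the 112-channel backbone feature.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MinimalDecoder(nn.Module):
    """Single-level decoder: 1 lateral conv + 1 3x3 conv + upsample + 1 head conv."""

    def __init__(self, in_channels=112, mid_channels=64, out_channels=1, scale=4):
        super().__init__()
        # Lateral 1x1 conv reduces the backbone feature to a narrow width.
        self.lateral = nn.Conv2d(in_channels, mid_channels, kernel_size=1)
        # One 3x3 conv refines the reduced feature.
        self.conv = nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1)
        # Head 1x1 conv maps to the prediction channels (assumed: 1).
        self.head = nn.Conv2d(mid_channels, out_channels, kernel_size=1)
        self.scale = scale

    def forward(self, feat):
        x = F.relu(self.lateral(feat))
        x = F.relu(self.conv(x))
        x = F.interpolate(x, scale_factor=self.scale, mode="bilinear", align_corners=False)
        return self.head(x)


decoder = MinimalDecoder()
# Stand-in for the stride-16 feature map of a 256x256 input (112 channels).
feat = torch.randn(1, 112, 16, 16)
out = decoder(feat)
print(tuple(out.shape))
```

In a real model, `feat` would come from the backbone's 112-channel output, and the width and upsampling factor would be tuned against the measured MAE.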
| Optimization | Speedup | Difficulty | Accuracy impact | Notes |
|---|---|---|---|---|
| TensorRT fp16 engine | 2-3x | Low | None | torch.onnx.export → trtexec --fp16 |
| TensorRT INT8 | A further 1.5-2x | Medium | +1-3% MAE | Requires calibration data (100-500 images) |
| Reduce input to 256x256 | 2.3x | Very low | +5-10% MAE | Only the preprocess resize size changes |
| torch.compile | 1.2-1.5x | Very low | None | One added line: model = torch.compile(model) |
| ONNX Runtime | 1.3-1.7x | Low | None | CPU/edge deployment |
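The speedups in this table are meant to stack: the TensorRT INT8 row is quoted as a gain on top of the fp16 engine. A rough way to see what that implies is to multiply the ranges together, as in the sketch below. It assumes the factors compose multiplicatively and independently, which is an optimistic simplification; the helper name `compound` is illustrative.

```python
def compound(base_img_s, *speedup_ranges):
    """Multiply a baseline throughput by successive (low, high) speedup ranges."""
    low = high = float(base_img_s)
    for range_low, range_high in speedup_ranges:
        low *= range_low
        high *= range_high
    return low, high


FP16 = (2.0, 3.0)        # TensorRT fp16 row
INT8_EXTRA = (1.5, 2.0)  # TensorRT INT8 row, applied on top of fp16

# MobileNetV3-Large is estimated at ~1000 img/s before runtime optimization.
fp16_low, fp16_high = compound(1000, FP16)
int8_low, int8_high = compound(1000, FP16, INT8_EXTRA)
print(f"fp16:        {fp16_low:.0f}-{fp16_high:.0f} img/s")
print(f"fp16 + INT8: {int8_low:.0f}-{int8_high:.0f} img/s")
```

The lower bound of the combined range is where a "3000+ img/s" target for MobileNetV3-Large with INT8 would come from under these assumptions; measured numbers may fall short if the pipeline is bottlenecked elsewhere, for example in preprocessing.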
| Scenario | Recommended setup | Target |
|---|---|---|
| Highest fps (multiple cameras processed concurrently) | MobileNetV3-Large + TensorRT INT8 | 3000+ img/s |
| Lowest latency (real-time single frame) | MobileNetV3-Small + TensorRT | < 1 ms/frame |
| Edge deployment (Jetson Nano/NX) | MobileNetV3-Small + INT8 | Low power, 100-300 img/s |
| Server deployment (save GPU) | Current V2 + TensorRT | ~2000 img/s, no retraining needed |
| Accuracy first | ResNet18 + TensorRT | 900+ img/s, MAE only +0.08 |
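For the multi-camera scenario, a throughput target translates into a camera budget once a per-camera frame rate is fixed. The sketch below assumes 30 fps per camera, which is a hypothetical figure not stated in the source, and ignores decoding and preprocessing overhead.

```python
def max_cameras(throughput_img_s, camera_fps=30):
    """Number of cameras a single model instance can keep up with.

    camera_fps=30 is an assumed per-camera frame rate, not a value from the report.
    """
    if camera_fps <= 0:
        raise ValueError("camera_fps must be positive")
    return int(throughput_img_s // camera_fps)


for label, img_s in [
    ("MobileNetV3-Large + TensorRT INT8", 3000),
    ("Current V2 + TensorRT", 2000),
    ("ResNet18 + TensorRT", 900),
]:
    print(f"{label:<36} ~{max_cameras(img_s)} cameras at 30 fps")
```

Leaving some headroom below these counts would be prudent, since the throughput figures are estimates rather than measurements.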
| Path | Time | Result |
|---|---|---|
| Path 1: TensorRT optimization (no retraining) | 30 minutes | V2 → ~2000 img/s, MAE unchanged |
| Path 2: Train MobileNetV3-Large | 1-2 hours | V3 → ~1000 img/s, MAE ~1.7 |
| Path 3: Combine both ⭐ | 2-3 hours | V3+TRT → ~3000 img/s, MAE ~1.7 |
Recommendation:
Generated 2026-04-14 | Benchmark estimates based on GB10/RTX 5090 profiles | Related: Models Guide | Current V2 Report