Continuing from last time.
Use cases
As the next step after HelloWorld, the documentation also lists example use cases.
DepthAI’s Documentation — DepthAI documentation | Luxonis
Looking through them, I spotted a Hand Tracking sample, so that is what I want to try.
Hand Tracking
Clicking "Try Now" above takes you to a GitHub repository.
Notes from reading through it:
- It uses Google's MediaPipe hand tracking models.
- It adds a feature called Body Pre Focusing so the tracker still works when the person is far from the camera (MediaPipe apparently tends to assume subjects close to the camera).
- Body detection uses MoveNet (developed by Google).
MoveNet: Ultra fast and accurate pose detection model | TensorFlow Hub
A recipe for detecting joint angles with MoveNet - BlazePose (used in MediaPipe's Holistic solution) was also considered, but the author chose MoveNet for its simpler architecture.
Retrieving similar dance motions with BlazePose - 電通総研 Tech Blog
- Naturally, the NNs run on the edge device.
- Demo Python code is included, so it can be tried right away (see the minimal usage sketch below).
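Before setting anything up, here is a minimal sketch of how the tracker seems to be driven, based on demo.py. Only next_frame() returning (frame, hands, bag) is confirmed from the source; the no-argument constructor, the exit() call, and the display loop are my assumptions from skimming the repo.
import cv2
from HandTracker import HandTracker  # class defined in the repo's HandTracker.py

tracker = HandTracker()  # constructor options (models, FPS, ...) are listed in demo.py
while True:
    frame, hands, bag = tracker.next_frame()  # same call demo.py makes
    if frame is None:
        break
    cv2.imshow("hand tracking", frame)  # demo.py draws landmarks via its renderer first
    if cv2.waitKey(1) == ord('q'):  # quit on 'q', like the demo
        break
tracker.exit()  # assumption: releases the OAK-D device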
Environment setup
First, clone the repository.
git clone https://github.com/geaxgx/depthai_hand_tracker
For the record, here is the commit I cloned:
Author: geaxgx <geaxgx@gmail.com>
Date: 2023/01/15 2:07:41
Commit hash: 97731232fffd467e8de2c3a34ab8382962bb385e
Message: Replacing np.int by np.int32 as np.int is no longer available starting from numpy 1.24
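If I need to get back to exactly this state later, checking out that hash should do it:
git checkout 97731232fffd467e8de2c3a34ab8382962bb385e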
Looking at requirements.txt, the pinned depthai version is slightly newer than the one from the previous hello-world.
I decided to update the depthai environment I made last time.
cd .\depthai_hand_tracker\
pip install -r .\requirements.txt
(depthai) PS C:\work\oak-d_test> cd .\depthai_hand_tracker\
(depthai) PS C:\work\oak-d_test\depthai_hand_tracker> pip install -r .\requirements.txt
Requirement already satisfied: opencv-python>=4.5.1.48 in c:\users\a\.conda\envs\depthai\lib\site-packages (from -r .\requirements.txt (line 1)) (4.5.1.48)
Collecting depthai>=2.13 (from -r .\requirements.txt (line 2))
Downloading depthai-2.25.0.0-cp38-cp38-win_amd64.whl.metadata (9.0 kB)
Requirement already satisfied: numpy>=1.17.3 in c:\users\a\.conda\envs\depthai\lib\site-packages (from opencv-python>=4.5.1.48->-r .\requirements.txt (line 1)) (1.24.4)
Downloading depthai-2.25.0.0-cp38-cp38-win_amd64.whl (10.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.6/10.6 MB 8.6 MB/s eta 0:00:00
Installing collected packages: depthai
Attempting uninstall: depthai
Found existing installation: depthai 2.11.1.1
Uninstalling depthai-2.11.1.1:
Successfully uninstalled depthai-2.11.1.1
Successfully installed depthai-2.25.0.0
(depthai) PS C:\work\oak-d_test\depthai_hand_tracker>
No problems.
Running it
First, with no options at all.
python demo.py
(depthai) PS C:\work\oak-d_test\depthai_hand_tracker> python demo.py
Palm detection blob : C:\work\oak-d_test\depthai_hand_tracker\models\palm_detection_sh4.blob
Landmark blob : C:\work\oak-d_test\depthai_hand_tracker\models\hand_landmark_lite_sh4.blob
Internal camera FPS set to: 23
Sensor resolution: (1920, 1080)
Internal camera image size: 1152 x 648 - crop_w:0 pad_h: 252
896 anchors have been created
Creating pipeline...
Creating Color Camera...
Creating Palm Detection Neural Network...
Creating Hand Landmark Neural Network (2 threads)...
Pipeline created.
[19443010819FF41200] [2.6] [1.057] [NeuralNetwork(4)] [warning] Network compiled for 4 shaves, maximum available 13, compiling for 6 shaves likely will yield in better performance
Pipeline started - USB speed: SUPER
[19443010819FF41200] [2.6] [1.058] [NeuralNetwork(6)] [warning] Network compiled for 4 shaves, maximum available 13, compiling for 6 shaves likely will yield in better performance
[19443010819FF41200] [2.6] [1.067] [NeuralNetwork(4)] [warning] The issued warnings are orientative, based on optimal settings for a single network, if multiple networks are running in parallel the optimal settings may vary
[19443010819FF41200] [2.6] [1.067] [NeuralNetwork(6)] [warning] The issued warnings are orientative, based on optimal settings for a single network, if multiple networks are running in parallel the optimal settings may vary
FPS : 22.3 f/s (# frames = 3149)
# frames w/ no hand : 1333 (42.3%)
# frames w/ palm detection : 1726 (54.8%)
# frames w/ landmark inference : 1816 (57.7%)- # after palm detection: 393 - # after landmarks ROI prediction: 1423
On frames with at least one landmark inference, average number of landmarks inferences/frame: 1.00
# lm inferences: 1816 - # failed lm inferences: 238 (13.1%)
(depthai) PS C:\work\oak-d_test\depthai_hand_tracker>
It worked: an image window opens and my hands are being recognized. Pressing the 'q' key exits.
That said, it sometimes failed, probably when I jostled the base of the camera's connector and the connection went flaky.
(depthai) PS C:\work\oak-d_test\depthai_hand_tracker> python demo.py
Palm detection blob : C:\work\oak-d_test\depthai_hand_tracker\models\palm_detection_sh4.blob
Landmark blob : C:\work\oak-d_test\depthai_hand_tracker\models\hand_landmark_lite_sh4.blob
Internal camera FPS set to: 23
Sensor resolution: (1920, 1080)
Internal camera image size: 1152 x 648 - crop_w:0 pad_h: 252
896 anchors have been created
Creating pipeline...
Creating Color Camera...
Creating Palm Detection Neural Network...
Creating Hand Landmark Neural Network (2 threads)...
Pipeline created.
[19443010819FF41200] [2.6] [1.769] [NeuralNetwork(4)] [warning] Network compiled for 4 shaves, maximum available 13, compiling for 6 shaves likely will yield in better performance
[19443010819FF41200] [2.6] [1.770] [NeuralNetwork(6)] [warning] Network compiled for 4 shaves, maximum available 13, compiling for 6 shaves likely will yield in better performance
Pipeline started - USB speed: HIGH
[19443010819FF41200] [2.6] [1.780] [NeuralNetwork(4)] [warning] The issued warnings are orientative, based on optimal settings for a single network, if multiple networks are running in parallel the optimal settings may vary
[19443010819FF41200] [2.6] [1.780] [NeuralNetwork(6)] [warning] The issued warnings are orientative, based on optimal settings for a single network, if multiple networks are running in parallel the optimal settings may vary
Traceback (most recent call last):
File "demo.py", line 85, in <module>
frame, hands, bag = tracker.next_frame()
File "C:\work\oak-d_test\depthai_hand_tracker\HandTracker.py", line 558, in next_frame
inference = self.q_lm_out.get()
RuntimeError: Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: 'lm_out' (X_LINK_ERROR)'
Stack trace (most recent call last):
#31 Object "", at 00007FFAE5C4B53B, in PyModule_ClearDict
#30 Object "", at 00007FFAE5C16B4B, in PyDict_Pop
#29 Object "", at 00007FFAE5C672E0, in PyType_GenericNew
#28 Object "", at 00007FFAE5C16A06, in PyDict_Pop
#27 Object "", at 00007FFAE5C672E0, in PyType_GenericNew
#26 Object "", at 00007FFAE5C16A06, in PyDict_Pop
#25 Object "", at 00007FFAC7AD7EE2, in pybind11::error_already_set::error_already_set
#24 Object "", at 00007FFAC7AD7DEE, in pybind11::error_already_set::error_already_set
#23 Object "", at 00007FFAC7B1B2B9, in PyInit_depthai
#22 Object "", at 00007FFAC7B2A554, in PyInit_depthai
#21 Object "", at 00007FFAC7D73DFC, in PyInit_depthai
#20 Object "", at 00007FFAC7D5365B, in PyInit_depthai
#19 Object "", at 00007FFAC7D74F30, in PyInit_depthai
#18 Object "", at 00007FFAC7D54157, in PyInit_depthai
#17 Object "", at 00007FFAC7D6468D, in PyInit_depthai
#16 Object "", at 00007FFAC7D3899E, in PyInit_depthai
#15 Object "", at 00007FFAC7D38CDB, in PyInit_depthai
#14 Object "", at 00007FFAC7D51DB1, in PyInit_depthai
#13 Object "", at 00007FFAC7D503AF, in PyInit_depthai
#12 Object "", at 00007FFB6FC33C66, in RtlCaptureContext2
#11 Object "", at 00007FFAC7E76CC5, in PyInit_depthai
#10 Object "", at 00007FFAC7E7FDB0, in PyInit_depthai
#9 Object "", at 00007FFAC7EB6A44, in PyInit_depthai
#8 Object "", at 00007FFAC7E741C3, in PyInit_depthai
#7 Object "", at 00007FFB6D2D5B0C, in RaiseException
#6 Object "", at 00007FFB6FBE4475, in RtlRaiseException
#5 Object "", at 00007FFB6FBAE466, in RtlFindCharInUnicodeString
#4 Object "", at 00007FFB6FC3441F, in _chkstk
#3 Object "", at 00007FFAC7E73529, in PyInit_depthai
#2 Object "", at 00007FFAC7E76845, in PyInit_depthai
#1 Object "", at 00007FFAC7E767E0, in PyInit_depthai
#0 Object "", at 00007FFAC7E75B42, in PyInit_depthai
I also got the error below. At runtime the camera made clicking sounds and seemed to be trying to start, but in the end it never came up.
(depthai) PS C:\work\oak-d_test\depthai_hand_tracker> python demo.py
Palm detection blob : C:\work\oak-d_test\depthai_hand_tracker\models\palm_detection_sh4.blob
Landmark blob : C:\work\oak-d_test\depthai_hand_tracker\models\hand_landmark_lite_sh4.blob
[19443010819FF41200] [2.6] [1711058217.082] [host] [warning] Device crashed, but no crash dump could be extracted.
Traceback (most recent call last):
File "demo.py", line 59, in <module>
tracker = HandTracker(
File "C:\work\oak-d_test\depthai_hand_tracker\HandTracker.py", line 130, in __init__
self.device = dai.Device()
RuntimeError: Device already closed or disconnected: io error
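Both failures surface as a RuntimeError from the depthai bindings, so when the connection is merely flaky, a retry wrapper around tracker creation might be enough to recover. A rough sketch, untested:
import time
from HandTracker import HandTracker

# Hypothetical helper: HandTracker.__init__ opens the device via dai.Device()
# (see the traceback above), and a bad USB link makes that raise RuntimeError,
# so simply retry a few times before giving up.
def open_tracker(attempts=3, delay_s=2.0):
    for i in range(attempts):
        try:
            return HandTracker()
        except RuntimeError as e:
            print(f"open failed ({e}), retrying {i + 1}/{attempts}")
            time.sleep(delay_s)
    raise RuntimeError("could not open the OAK-D")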
Video recording option
The output video can reportedly be saved with -o (video filename). I tried it.
python demo.py -o out.mp4
(depthai) PS C:\work\oak-d_test\depthai_hand_tracker> python demo.py -o out.mp4
Palm detection blob : C:\work\oak-d_test\depthai_hand_tracker\models\palm_detection_sh4.blob
Landmark blob : C:\work\oak-d_test\depthai_hand_tracker\models\hand_landmark_lite_sh4.blob
Internal camera FPS set to: 23
Sensor resolution: (1920, 1080)
Internal camera image size: 1152 x 648 - crop_w:0 pad_h: 252
896 anchors have been created
Creating pipeline...
Creating Color Camera...
Creating Palm Detection Neural Network...
Creating Hand Landmark Neural Network (2 threads)...
Pipeline created.
[19443010819FF41200] [2.6] [1.138] [NeuralNetwork(4)] [warning] Network compiled for 4 shaves, maximum available 13, compiling for 6 shaves likely will yield in better performance
[19443010819FF41200] [2.6] [1.139] [NeuralNetwork(6)] [warning] Network compiled for 4 shaves, maximum available 13, compiling for 6 shaves likely will yield in better performance
Pipeline started - USB speed: SUPER
[19443010819FF41200] [2.6] [1.148] [NeuralNetwork(4)] [warning] The issued warnings are orientative, based on optimal settings for a single network, if multiple networks are running in parallel the optimal settings may vary
[19443010819FF41200] [2.6] [1.148] [NeuralNetwork(6)] [warning] The issued warnings are orientative, based on optimal settings for a single network, if multiple networks are running in parallel the optimal settings may vary
OpenCV: FFMPEG: tag 0x47504a4d/'MJPG' is not supported with codec id 7 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
FPS : 19.7 f/s (# frames = 269)
# frames w/ no hand : 21 (7.8%)
# frames w/ palm detection : 49 (18.2%)
# frames w/ landmark inference : 248 (92.2%)- # after palm detection: 28 - # after landmarks ROI prediction: 220
On frames with at least one landmark inference, average number of landmarks inferences/frame: 1.00
# lm inferences: 249 - # failed lm inferences: 5 (2.0%)
For starters I used the .mp4 extension, but it didn't work: the file was created, but played back as a solid black image.
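Going by the OpenCV warnings above, the demo requests the MJPG fourcc, which FFMPEG refuses to put in an .mp4 container and silently swaps for mp4v; apparently that still produced an unplayable file here. I haven't dug into the root cause, but if I really wanted .mp4 output, one thing to try would be building the writer with an mp4-friendly fourcc from the start. An unverified sketch, reusing the demo's internal image size and FPS from the log:
import cv2

# mp4v is a fourcc FFMPEG accepts inside an .mp4 container;
# 1152x648 @ 23 fps matches the demo's internal camera settings above.
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
writer = cv2.VideoWriter("out.mp4", fourcc, 23, (1152, 648))
# then writer.write(frame) per frame, and writer.release() at the end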
Switching the extension to .avi did the trick.
python demo.py -o out.avi
(depthai) PS C:\work\oak-d_test\depthai_hand_tracker> python demo.py -o out.avi
Palm detection blob : C:\work\oak-d_test\depthai_hand_tracker\models\palm_detection_sh4.blob
Landmark blob : C:\work\oak-d_test\depthai_hand_tracker\models\hand_landmark_lite_sh4.blob
Internal camera FPS set to: 23
Sensor resolution: (1920, 1080)
Internal camera image size: 1152 x 648 - crop_w:0 pad_h: 252
896 anchors have been created
Creating pipeline...
Creating Color Camera...
Creating Palm Detection Neural Network...
Creating Hand Landmark Neural Network (2 threads)...
Pipeline created.
[19443010819FF41200] [2.6] [0.956] [NeuralNetwork(4)] [warning] Network compiled for 4 shaves, maximum available 13, compiling for 6 shaves likely will yield in better performance
Pipeline started - USB speed: SUPER
[19443010819FF41200] [2.6] [0.957] [NeuralNetwork(6)] [warning] Network compiled for 4 shaves, maximum available 13, compiling for 6 shaves likely will yield in better performance
[19443010819FF41200] [2.6] [0.967] [NeuralNetwork(4)] [warning] The issued warnings are orientative, based on optimal settings for a single network, if multiple networks are running in parallel the optimal settings may vary
[19443010819FF41200] [2.6] [0.967] [NeuralNetwork(6)] [warning] The issued warnings are orientative, based on optimal settings for a single network, if multiple networks are running in parallel the optimal settings may vary
FPS : 20.3 f/s (# frames = 337)
# frames w/ no hand : 18 (5.3%)
# frames w/ palm detection : 52 (15.4%)
# frames w/ landmark inference : 319 (94.7%)- # after palm detection: 34 - # after landmarks ROI prediction: 285
On frames with at least one landmark inference, average number of landmarks inferences/frame: 1.00
# lm inferences: 320 - # failed lm inferences: 4 (1.2%)
3D position option
Adding -xyz reportedly outputs the 3D coordinates of the wrist position.
python demo.py -o out_xyz.avi -xyz
(depthai) PS C:\work\oak-d_test\depthai_hand_tracker> python demo.py -o out_xyz.avi -xyz
Palm detection blob : C:\work\oak-d_test\depthai_hand_tracker\models\palm_detection_sh4.blob
Landmark blob : C:\work\oak-d_test\depthai_hand_tracker\models\hand_landmark_lite_sh4.blob
Internal camera FPS set to: 23
Sensor resolution: (1920, 1080)
Internal camera image size: 1152 x 648 - crop_w:0 pad_h: 252
896 anchors have been created
Creating pipeline...
Creating Color Camera...
Creating MonoCameras, Stereo and SpatialLocationCalculator nodes...
RGB calibration lens position: 131
Creating Palm Detection Neural Network...
Creating Hand Landmark Neural Network (2 threads)...
Pipeline created.
[19443010819FF41200] [2.6] [1.367] [NeuralNetwork(10)] [warning] Network compiled for 4 shaves, maximum available 12, compiling for 6 shaves likely will yield in better performance
[19443010819FF41200] [2.6] [1.369] [NeuralNetwork(12)] [warning] Network compiled for 4 shaves, maximum available 12, compiling for 6 shaves likely will yield in better performance
[19443010819FF41200] [2.6] [1.384] [NeuralNetwork(10)] [warning] The issued warnings are orientative, based on optimal settings for a single network, if multiple networks are running in parallel the optimal settings may vary
Pipeline started - USB speed: SUPER
[19443010819FF41200] [2.6] [1.385] [NeuralNetwork(12)] [warning] The issued warnings are orientative, based on optimal settings for a single network, if multiple networks are running in parallel the optimal settings may vary
FPS : 16.4 f/s (# frames = 300)
# frames w/ no hand : 7 (2.3%)
# frames w/ palm detection : 51 (17.0%)
# frames w/ landmark inference : 293 (97.7%)- # after palm detection: 44 - # after landmarks ROI prediction: 249
On frames with at least one landmark inference, average number of landmarks inferences/frame: 1.00
# lm inferences: 293 - # failed lm inferences: 17 (5.8%)
Spatial location requests round trip : 5.2 ms
The frame rate drops a little, but the wrist position coordinates are displayed.
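To consume those coordinates in a script instead of just on screen, something like the following might work on top of the earlier sketch. The xyz constructor flag and the hand.xyz attribute are my assumptions from skimming HandTracker.py; the values appear to be camera-relative and in millimeters.
from HandTracker import HandTracker

tracker = HandTracker(xyz=True)  # assumed flag mirroring demo.py's -xyz option
while True:
    frame, hands, bag = tracker.next_frame()
    if frame is None:
        break
    for hand in hands:
        x, y, z = hand.xyz  # wrist position (assumption: mm, camera frame)
        print(f"wrist: x={x:.0f} y={y:.0f} z={z:.0f}")
tracker.exit()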
That's it for now
Next time, I'll either tinker with this a bit more or try running another sample.