OpenVINO tour: Let us do some human pose estimation!



Running inference on myself: the BLUE dots mark the left side of the body, while the GREEN ones mark the right side


In this tutorial, we will run a human pose estimation model to get body key points.


Disclaimer:

- This tutorial is not about training a model; it is about using a pre-trained one
- You may have ZERO knowledge of deep learning :D and that is more than enough for this tutorial!

Note:
- The OpenVINO version used in this tutorial is 2020.1; any later version should be compatible
- Download and set up OpenVINO from the Intel website
- This demo runs a two-step inference: a detection model followed by a pose estimation model
- The original demo code can be found at
/opt/intel/openvino/inference_engine/demos/python_demos/single_human_pose_estimation_demo


As an app developer, it would be overwhelming to learn the basics of deep learning frameworks just to run a model! Come on, you just want to develop the app, and the core of it is: load the model, give it a frame, get the output! However, there are several implementations and several frameworks out there. Do I have to master them all?! Eh, I barely master my own development frameworks!


Thanks to the OpenVINO toolkit, that job goes away: it gives us an Intermediate Representation (IR) of the model, an .xml file (network topology) plus a .bin file (weights). This representation is all we need to load in order to develop a deep learning application!
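To make that concrete, here is a minimal sketch of the load-infer-output flow with the OpenVINO 2020.1 Python API. The file names (model.xml, model.bin, frame.jpg) are placeholders and the preprocessing is illustrative; the actual demo wraps this flow in its own classes.

import cv2
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")  # the IR pair
exec_net = ie.load_network(network=net, device_name="CPU")

input_blob = next(iter(net.inputs))        # name of the input layer
n, c, h, w = net.inputs[input_blob].shape  # expected input shape (NCHW)

frame = cv2.imread("frame.jpg")            # any BGR image
blob = cv2.resize(frame, (w, h)).transpose(2, 0, 1).reshape(n, c, h, w)

result = exec_net.infer(inputs={input_blob: blob})  # dict: output name -> ndarray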

We will work on running the OpenVINO sample demo "single-human-pose-estimation-demo" [4]. Personally, I don't like to mess with the original code, so I copied the required files and gave them user write permissions (provided in the linked repo for convenience).

So, first, we need to download the required models. The demo requires two different models: the first one detects a person, the second one estimates the key points of that same person, LOL!

For the detection model, I've chosen pedestrian-detection-adas-0002 [2], which is trained for a camera view at about 1 to 1.5 meters from the ground, so I could use my webcam for testing (similar to a laptop webcam view, well, if you have your laptop on the table, not on the ceiling :p).

The original models.lst has more models; I've kept just two of them in the list for the sake of simplicity. You may skip the "Converting the estimator model" step by downloading the already-converted IR from the [drive] link below.
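For reference, the trimmed models.lst simply names the two models, one per line (assuming the standard Open Model Zoo model names):

pedestrian-detection-adas-0002
single-human-pose-estimation-0001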

Preparing the environment

The following steps are automated in the download.sh and convert_shp_model.sh scripts in the GitHub repo provided.

Sourcing the OpenVINO environment

Source your OpenVINO environment:

source /opt/intel/openvino/bin/setupvars.sh

Downloading the models

Run the downloader to fetch the model files:

python3 /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --list ./models.lst

That command will download the models into the current directory, where you will find two folders: intel/ (the pedestrian detection model, already in IR) and public/ (the estimator in its original PyTorch format).



Converting the estimator model

The pedestrian detection model is already in IR with three different numeric precisions (FP16, FP32, and FP32-INT8). However, the estimator model is in PyTorch format (.pth). There is no direct conversion from PyTorch to IR, so we first convert it to ONNX (.onnx), and from ONNX we convert it into IR (.xml, .bin).



cd into the directory where the ONNX file lives, under public/single-human-pose-estimation-0001, and run the convert_shp_model.sh script, OR just download the resulting IR from here.
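The core of convert_shp_model.sh is a Model Optimizer call along these lines; this is a sketch, and the exact flags (input shape, output name) may differ, so check the script in the repo:

python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model single-human-pose-estimation-0001.onnx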

Running the inference

To run the demo, execute the run.sh script after configuring the following parameters (see the sketch after the list):

- DETECTOR_MODEL_PATH: path to the detector xml
- ESTIMATOR_MODEL_PATH: path to the estimator xml
- INPUT_FILE: (Video/Image file)
- ACCELERATOR: (CPU/GPU/VPU)
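
Under the hood, run.sh passes these values to the demo script. Assuming the flag names of the original OMZ demo (-m_od, -m_hpe, -i, -d; double-check against run.sh in the repo), the call looks roughly like:

python3 single_human_pose_estimation_demo.py -m_od $DETECTOR_MODEL_PATH -m_hpe $ESTIMATOR_MODEL_PATH -i $INPUT_FILE -d $ACCELERATOR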



Inference result




Github Repo:
https://github.com/Mohamad1994HD/OpenVINO-inference-pose-estimation

Estimator Model XML/Bin files:
https://drive.google.com/open?id=1SEBEIqcOy6pfNg2APLrTg7buJjdPEgJ3




References:

  1. https://software.intel.com/en-us/forums/intel-distribution-of-openvino-toolkit/topic/850603
  2. https://docs.openvinotoolkit.org/2019_R1/_pedestrian_detection_adas_0002_description_pedestrian_detection_adas_0002.html
  3. Gong, K., Liang, X., Zhang, D., Shen, X., & Lin, L. (2017). Look into person: Self-supervised structure-sensitive learning and a new benchmark for human parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 932-940).
  4. Osokin, D. (2019). Global Context for Convolutional Pose Machines. arXiv preprint arXiv:1906.04104.
