For the simple case of still images, you can use the TensorFlow Object Detection model directly.
This example runs detection on a single image. The current function only works with the boxes and classes; I recommend also using the detection score, so that low-confidence detections can be filtered out and the result is cleaner.
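As a rough illustration of using the score, here is a minimal sketch that keeps only detections above a threshold. The function name `filter_detections`, the `min_score` default of 0.5, and the sample values are assumptions for illustration, not part of the original project; the three inputs stand in for the boxes, classes, and scores that the detection step returns.

```python
def filter_detections(boxes, classes, scores, min_score=0.5):
    """Keep only detections whose score is at least min_score.

    Hypothetical helper: boxes, classes and scores are parallel sequences
    (plain lists or NumPy arrays), one entry per detection.
    """
    kept = []
    for box, cls, score in zip(boxes, classes, scores):
        if score >= min_score:
            kept.append((tuple(box), int(cls), float(score)))
    return kept


# Made-up values, only to show the shape of the data.
boxes = [
    [0.10, 0.20, 0.50, 0.60],
    [0.30, 0.30, 0.90, 0.80],
    [0.00, 0.00, 0.20, 0.20],
]
classes = [1, 18, 3]
scores = [0.92, 0.55, 0.08]

confident = filter_detections(boxes, classes, scores, min_score=0.5)
for box, cls, score in confident:
    print(cls, round(score, 2), box)
```

The right threshold depends on the model and the images, so it is worth trying a few values rather than treating 0.5 as fixed.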
The implementation uses the ssd_mobilenet_v1_coco_2017_11_17 model; you could swap in a lighter one if you want more speed.
For videos or a camera stream, this is where OpenCV is useful. I've included a modified version of the example that runs on a video, simulating the camera stream. Note: for production use, the processing would need to be parallelized.
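One possible way to approach the parallelism is to read frames in a background thread while detection runs on the main thread. The sketch below uses only the standard library; `run_pipeline`, `max_pending`, and the fake frames are assumptions for illustration and are not taken from the Colab project. In a real setup the frames would come from something like OpenCV's `cv2.VideoCapture`, and `detect` would call the model.

```python
import queue
import threading


def run_pipeline(frames, detect, max_pending=4):
    """Read frames in a background thread and run detect on the main thread.

    Hypothetical helper: frames is any iterable of frames, detect is any
    callable that takes one frame and returns a result.
    """
    pending = queue.Queue(maxsize=max_pending)  # bounded, so the reader cannot run far ahead
    done = object()  # sentinel marking the end of the stream

    def reader():
        try:
            for frame in frames:
                pending.put(frame)
        finally:
            pending.put(done)

    threading.Thread(target=reader, daemon=True).start()

    results = []
    while True:
        frame = pending.get()
        if frame is done:
            break
        results.append(detect(frame))
    return results


# Stand-ins for real frames and a real detector, just to exercise the flow.
fake_frames = range(5)
results = run_pipeline(fake_frames, detect=lambda frame: frame * 2)
print(results)  # [0, 2, 4, 6, 8]
```

The bounded queue keeps memory use in check when reading is faster than detection. For a live camera you would probably want to drop old frames instead of queuing them, which this sketch does not do.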
The complete project is in Colab; I'll add it to GitHub soon so it's easier to maintain.