Sample - Mono selfie maker - see your face in both mono cameras and save the images with the spacebar

This sample requires the Tk library (Python’s tkinter module) to run; it is used to open the file dialog.

It also requires a face detection model; see this tutorial to learn how to compile one.

A stereo camera pair is required to run this example. It can be an RPi Compute, an OAK-D, or any custom setup using these cameras.

Demo

Capturing process

Captured image

Source code

import cv2
import depthai
import consts.resource_paths

if not depthai.init_device(consts.resource_paths.device_cmd_fpath):
    raise RuntimeError("Error initializing device. Try to reset it.")

pipeline = depthai.create_pipeline(config={
    'streams': ['left', 'right', 'metaout'],
    'ai': {
        "blob_file": "/path/to/face-detection-retail-0004.blob",
        "blob_file_config": "/path/to/face-detection-retail-0004.json",
    },
    'camera': {'mono': {'resolution_h': 720, 'fps': 30}},
})

if pipeline is None:
    raise RuntimeError('Pipeline creation failed!')

entries_prev = []
face_frame_left = None
face_frame_right = None

while True:
    nnet_packets, data_packets = pipeline.get_available_nnet_and_data_packets()

    for nnet_packet in nnet_packets:
        entries_prev = []
        for e in nnet_packet.entries():
            if e[0]['id'] == -1.0 or e[0]['confidence'] == 0.0:
                break
            if e[0]['confidence'] > 0.5:
                entries_prev.append(e[0])

    for packet in data_packets:
        if packet.stream_name == 'left' or packet.stream_name == 'right':
            frame = packet.getData()

            img_h = frame.shape[0]
            img_w = frame.shape[1]

            for i, e in enumerate(entries_prev):
                left = int(e['left'] * img_w)
                top = int(e['top'] * img_h)
                right = int(e['right'] * img_w)
                bottom = int(e['bottom'] * img_h)

                face_frame = frame[top:bottom, left:right]
                if face_frame.size == 0:
                    continue
                cv2.imshow(f'face-{packet.stream_name}', face_frame)
                if packet.stream_name == 'left':
                    face_frame_left = face_frame
                else:
                    face_frame_right = face_frame

    key = cv2.waitKey(1)
    if key == ord('q'):
        break
    if key == ord(' ') and face_frame_left is not None and face_frame_right is not None:
        from tkinter import Tk, messagebox
        from tkinter.filedialog import asksaveasfilename
        root = Tk()
        root.withdraw()  # hide the empty root window, show only the dialogs
        filename = asksaveasfilename(defaultextension=".png", filetypes=(("Image files", "*.png"),("All Files", "*.*")))
        if filename:  # empty when the user cancels the dialog
            joined_frame = cv2.hconcat([face_frame_left, face_frame_right])
            if cv2.imwrite(filename, joined_frame):
                messagebox.showinfo("Success", "Image saved successfully!")
        root.destroy()  # destroy the same root window that was created above

del pipeline

Explanation

Our network returns bounding boxes of the faces it detects (we store them in the entries_prev list). In this sample, we therefore have to do two main things: crop the frame so that it contains only the face, and save it to the location specified by the user.
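The filtering loop from the sample can be sketched in isolation as shown here. Plain dictionaries stand in for the detection entries, and the function name filter_detections and its threshold parameter are hypothetical names introduced only for this illustration.

```python
def filter_detections(entries, threshold=0.5):
    """Return the entries whose confidence exceeds `threshold`.

    `entries` is assumed to be an iterable of dict-like detections with
    'id' and 'confidence' keys, mirroring e[0] in the sample.
    """
    kept = []
    for entry in entries:
        # The sample treats id == -1 or confidence == 0 as the end of valid results.
        if entry['id'] == -1.0 or entry['confidence'] == 0.0:
            break
        if entry['confidence'] > threshold:
            kept.append(entry)
    return kept


detections = [
    {'id': 0.0, 'confidence': 0.9},
    {'id': 0.0, 'confidence': 0.3},   # below the threshold, skipped
    {'id': -1.0, 'confidence': 0.0},  # end marker, stops the loop
    {'id': 0.0, 'confidence': 0.8},   # never reached
]
print(len(filter_detections(detections)))
```

Only the first entry survives here: the second is below the threshold and everything after the end marker is ignored.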

Performing the crop

Cropping the frame requires us to modify the minimal working code sample. Instead of two corner points for drawing a rectangle, we need four separate coordinates: two that mark where the crop starts (top starts the Y-axis crop and left starts the X-axis crop) and two that mark where it ends (bottom ends the Y-axis crop and right ends the X-axis crop).

                left = int(e['left'] * img_w)
                top = int(e['top'] * img_h)
                right = int(e['right'] * img_w)
                bottom = int(e['bottom'] * img_h)

Since our frame is in HWC format (Height, Width, Channels), we slice the Y-axis (height) first and then the X-axis (width). The cropping code looks like this:

                face_frame = frame[top:bottom, left:right]

There’s one additional thing to do. The network may sometimes produce a bounding box that yields an empty frame when cropped. We have to guard against this case, because cv2.imshow throws an error when invoked with an empty frame.

                if face_frame.size == 0:
                    continue
                cv2.imshow(f'face-{packet.stream_name}', face_frame)
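One way a crop can end up empty is a coordinate that falls slightly outside the 0..1 range: a negative pixel index in a NumPy slice counts from the end of the axis, so the slice may select nothing. The sketch below clamps the coordinates to the frame before cropping. It assumes the network can emit such out-of-range values, which depends on the model, and bbox_to_pixels is a hypothetical helper name, not part of the DepthAI API.

```python
def bbox_to_pixels(entry, img_w, img_h):
    """Convert a normalized bounding box to clamped pixel coordinates.

    Returns (left, top, right, bottom), or None when the clamped box
    has no area and the crop would be empty.
    """
    # Clamp every coordinate into the valid pixel range of the frame.
    left = max(0, min(img_w, int(entry['left'] * img_w)))
    top = max(0, min(img_h, int(entry['top'] * img_h)))
    right = max(0, min(img_w, int(entry['right'] * img_w)))
    bottom = max(0, min(img_h, int(entry['bottom'] * img_h)))
    if right <= left or bottom <= top:
        return None
    return left, top, right, bottom


# A box that sticks out of the frame on the left and at the bottom.
box = bbox_to_pixels(
    {'left': -0.1, 'top': 0.25, 'right': 0.5, 'bottom': 1.2}, 1280, 720)
print(box)
```

With the returned tuple, the crop becomes frame[top:bottom, left:right] as before, and a None result replaces the size check.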

Since two cameras are operating at the same time, we then assign the shown frame to either the left or the right face frame variable, which we will need later when saving the image.

                if packet.stream_name == 'left':
                    face_frame_left = face_frame
                else:
                    face_frame_right = face_frame

Storing the frame

To save the image we’ll need to do two things:

  • Merge the face frames from both left and right cameras into one frame
  • Save the prepared frame to the disk

Thankfully, OpenCV has it all sorted out, so each step takes just a single line of code: cv2.hconcat merges the frames and cv2.imwrite stores the image.
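Horizontal concatenation only works when both frames have the same height, and the left and right crops may differ if they were taken from different detections. A minimal sketch of one possible fix is to pad the shorter crop with black rows before merging. The helper pad_to_same_height is a hypothetical name, and np.hstack is used here in place of cv2.hconcat so the example depends only on NumPy.

```python
import numpy as np


def pad_to_same_height(first, second):
    """Pad the shorter of two frames with black rows at the bottom."""
    target_h = max(first.shape[0], second.shape[0])
    padded = []
    for frame in (first, second):
        missing = target_h - frame.shape[0]
        # Pad only along axis 0 (rows); the remaining axes are untouched.
        pad_width = [(0, missing)] + [(0, 0)] * (frame.ndim - 1)
        padded.append(np.pad(frame, pad_width, mode='constant'))
    return padded[0], padded[1]


# Synthetic grayscale crops of different sizes stand in for the face frames.
left_face = np.full((120, 100), 255, dtype=np.uint8)
right_face = np.full((90, 80), 128, dtype=np.uint8)

left_padded, right_padded = pad_to_same_height(left_face, right_face)
joined = np.hstack([left_padded, right_padded])
print(joined.shape)
```

Padding keeps the aspect ratio of each face intact; resizing one crop to the other's height would be an alternative if distortion is acceptable.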

The rest of the code, which uses the tkinter package, is optional and can be removed if you don’t require user interaction to save the frame.
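If the dialogs are removed, the save path has to come from somewhere else. One option, sketched here as an assumption rather than part of the original sample, is a timestamped file name. The helper default_selfie_path and the "captures" directory are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def default_selfie_path(directory, now=None):
    """Build a timestamped PNG path such as <directory>/selfie-YYYYMMDD-HHMMSS.png."""
    now = now or datetime.now()
    return Path(directory) / f"selfie-{now:%Y%m%d-%H%M%S}.png"


# A fixed timestamp makes the result reproducible for this illustration.
path = default_selfie_path("captures", datetime(2020, 1, 2, 3, 4, 5))
print(path.name)
```

The resulting path can be passed to cv2.imwrite as str(path). The target directory is assumed to exist already, since cv2.imwrite does not create it.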

In this sample, we use tkinter for two dialog boxes:

  • To obtain the destination file path (stored as filename), which cv2.imwrite requires as its first argument
  • To confirm that the file was saved successfully

    key = cv2.waitKey(1)
    if key == ord('q'):
        break
    if key == ord(' ') and face_frame_left is not None and face_frame_right is not None:
        from tkinter import Tk, messagebox
        from tkinter.filedialog import asksaveasfilename
        root = Tk()
        root.withdraw()  # hide the empty root window, show only the dialogs
        filename = asksaveasfilename(defaultextension=".png", filetypes=(("Image files", "*.png"),("All Files", "*.*")))
        if filename:  # empty when the user cancels the dialog
            joined_frame = cv2.hconcat([face_frame_left, face_frame_right])
            if cv2.imwrite(filename, joined_frame):
                messagebox.showinfo("Success", "Image saved successfully!")
        root.destroy()  # destroy the same root window that was created above

Do you have any questions/suggestions? Feel free to get in touch and let us know!