Perception

Two kinds:

Geometric Perception
Deep Perception

Geometric Perception

Classical methods: Harris corner detection, SIFT, the Hough transform, blob detectors

Look at an image and identify geometric primitives such as corners and edges (global and local), match similar points across images, track objects, and align point sets with iterative closest point (ICP)
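To make the corner idea concrete, here is a minimal pure-Python sketch of the Harris response, R = det(M) - k * trace(M)^2, where M is the gradient structure tensor summed over a small window. The function names, the 3x3 box window (instead of the usual Gaussian), k = 0.05, and the synthetic square image are all assumptions for illustration; a real pipeline would more likely call a library such as OpenCV.

```python
def gradients(img):
    """Central-difference gradients; border pixels are left at zero."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return ix, iy


def harris_response(img, k=0.05, r=1):
    """Harris score per pixel, with the structure tensor summed over a (2r+1)^2 box window."""
    h, w = len(img), len(img[0])
    ix, iy = gradients(img)
    resp = [[0.0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            sxx = syy = sxy = 0.0
            for v in range(y - r, y + r + 1):
                for u in range(x - r, x + r + 1):
                    sxx += ix[v][u] * ix[v][u]
                    syy += iy[v][u] * iy[v][u]
                    sxy += ix[v][u] * iy[v][u]
            # det(M) - k * trace(M)^2: large and positive only when both
            # gradient directions are strong inside the window (a corner).
            resp[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return resp


# Synthetic image (assumption): a bright 10x10 square on a dark 20x20 background.
img = [[1.0 if 5 <= y < 15 and 5 <= x < 15 else 0.0 for x in range(20)]
       for y in range(20)]
resp = harris_response(img)

# Location of the strongest response.
best = max((resp[y][x], (y, x)) for y in range(20) for x in range(20))[1]
```

On this image the response is positive at the square's corners, negative along its straight edges, and zero in flat regions, which is the behaviour that lets a threshold on R separate corners from edges.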

Deep Perception

A rapidly growing field: new papers appear every week that make models faster and more reliable/robust

Needs two things: massive compute power (GPUs) and carefully curated datasets

Fine-tune a pretrained model: replace the output layer and train on your own dataset, which can be smaller and tailored to the application. Alternatively, use recent models such as Segment Anything or CLIP straight out of the box; these are trained on very large datasets (\(10^6\) to \(10^9\) labelled samples).

Mask R-CNN for instance segmentation, semantic segmentation, and keypoint identification. Successful architectures are simple. Dex-Net 2.0, kPAM-SC, and Space Api are examples; more models exist for point cloud segmentation, shape completion, instance segmentation, etc. Deep learning methods implicitly describe uncertainty by giving probabilities of outcomes.
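As a small illustration of "probabilities of outcomes": a classifier's raw scores (logits) are typically passed through a softmax, and the spread of the resulting distribution gives a rough uncertainty signal. The sketch below is pure Python; the class labels and logit values are made up, and softmax scores from real networks are often overconfident, so treat them as a hint rather than a calibrated probability.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a more uncertain prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


labels = ["fastener", "flange", "clip"]  # hypothetical part categories

confident = softmax([6.0, 1.0, 0.5])  # one class clearly dominates
unsure = softmax([1.1, 1.0, 0.9])     # scores are nearly tied

for name, probs in (("confident", confident), ("unsure", unsure)):
    top = labels[probs.index(max(probs))]
    print(f"{name}: top={top}, p={max(probs):.3f}, entropy={entropy(probs):.3f}")
```

A downstream robot policy could, for example, skip or re-observe a part when the entropy exceeds a chosen threshold.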

Robotics and Disassembly Domain

Represent grasp poses as quaternions; use category-level pose estimation with keypoints defined per part category (fasteners, flanges, clips, building components) \(\rightarrow\) category-level disassembly actions
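A grasp pose stored as a position plus a unit quaternion can be turned into a rotation matrix to recover the gripper's approach direction. The sketch below assumes the (w, x, y, z) component order, a gripper whose approach axis is its local z axis, and made-up pose values; check the convention of whatever library or robot stack you use, since (x, y, z, w) is also common.

```python
import math


def normalize(q):
    """Scale a quaternion to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def quat_to_matrix(q):
    """Rotation matrix for a quaternion in (w, x, y, z) order."""
    w, x, y, z = normalize(q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate(q, v):
    """Rotate a 3-vector v by quaternion q."""
    R = quat_to_matrix(q)
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


# Hypothetical grasp: 90 degree rotation about the y axis at a made-up position.
half_angle = math.pi / 4
grasp = {
    "position": (0.4, 0.0, 0.2),
    "orientation": (math.cos(half_angle), 0.0, math.sin(half_angle), 0.0),
}

# Assumption: the gripper approaches along its local z axis.
approach = rotate(grasp["orientation"], (0.0, 0.0, 1.0))
```

With this pose the local z axis is rotated onto the world x axis, so the gripper would approach the part horizontally.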

Put some experiments in deep perception and geometric perception here:

SfM Tools:

COLMAP:

Extract features from each image
Match features across images
Reconstruct the scene (camera poses and sparse 3D points)
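The three steps map onto COLMAP's command-line tools `feature_extractor`, `exhaustive_matcher`, and `mapper`. The sketch below builds those commands from Python and only runs them if a `colmap` executable is on the PATH. The directory names (`images`, `workspace`) and the helper function names are assumptions; exhaustive matching is one of several matchers and is chosen here only because it suits small image sets.

```python
import shutil
import subprocess
from pathlib import Path


def colmap_commands(image_dir, workspace):
    """Build the three COLMAP CLI calls for a sparse reconstruction."""
    workspace = Path(workspace)
    database = str(workspace / "database.db")
    sparse = str(workspace / "sparse")
    images = str(image_dir)
    return [
        # 1. Detect and describe features in every image.
        ["colmap", "feature_extractor",
         "--database_path", database, "--image_path", images],
        # 2. Match features between all image pairs.
        ["colmap", "exhaustive_matcher", "--database_path", database],
        # 3. Incremental reconstruction: camera poses and sparse 3D points.
        ["colmap", "mapper",
         "--database_path", database, "--image_path", images,
         "--output_path", sparse],
    ]


def run_pipeline(image_dir, workspace):
    """Run the steps in order; raises if COLMAP is missing or a step fails."""
    if shutil.which("colmap") is None:
        raise RuntimeError("colmap executable not found on PATH")
    (Path(workspace) / "sparse").mkdir(parents=True, exist_ok=True)
    for cmd in colmap_commands(image_dir, workspace):
        subprocess.run(cmd, check=True)


# Hypothetical folder names; call run_pipeline("images", "workspace") to execute.
commands = colmap_commands("images", "workspace")
```

The same steps are available interactively in the COLMAP GUI, which may be easier for a first experiment.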

Find the prebuilt app here
and some documentation here