Local Image Descriptors: From SIFT to Learned Features
Author: Jared Chung
Local image descriptors are fundamental building blocks of computer vision, enabling robust matching between images despite changes in viewpoint, lighting, and scale. Unlike global descriptors that characterize entire images, local descriptors focus on distinctive regions or keypoints, making them ideal for tasks like image stitching, object recognition, and SLAM.
In this comprehensive guide, we'll explore the evolution from classical hand-crafted descriptors to modern learned approaches.
Introduction to Local Features
Local image features consist of two main components:
- Keypoint Detection: Finding interest points that are repeatable across different views
- Feature Description: Computing distinctive descriptors for each keypoint
Key Properties of Good Local Features
- Repeatability: Same features detected under different viewing conditions
- Distinctiveness: Descriptors allow reliable matching between correct correspondences
- Locality: Features describe small image regions, making them robust to occlusion and clutter
- Quantity: Sufficient number of features for robust matching
- Accuracy: Precise localization of feature positions
- Efficiency: Fast computation for real-time applications
import cv2
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Tuple, List, Optional
import time
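Before looking at individual algorithms, here is a minimal end-to-end sketch of the detect-describe-match pipeline outlined above, using OpenCV's SIFT together with Lowe's ratio test. The image paths img1.jpg and img2.jpg are placeholders.
def quick_match_demo(path1='img1.jpg', path2='img2.jpg'):
    """Minimal detect -> describe -> match pipeline (placeholder image paths)."""
    img1 = cv2.imread(path1, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(path2, cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, desc1 = sift.detectAndCompute(img1, None)  # keypoint detection + description
    kp2, desc2 = sift.detectAndCompute(img2, None)

    # Brute-force matching with Lowe's ratio test
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(desc1, desc2, k=2)
    good = [p[0] for p in knn if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    print(f"{len(kp1)} / {len(kp2)} keypoints, {len(good)} ratio-test matches")
    return good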
Classical Hand-Crafted Descriptors
1. SIFT (Scale-Invariant Feature Transform)
SIFT remains one of the most influential local feature descriptors, providing excellent invariance properties.
class SIFTDetector:
"""SIFT keypoint detector and descriptor"""
def __init__(self, n_features=0, n_octave_layers=3, contrast_threshold=0.04,
edge_threshold=10, sigma=1.6):
self.detector = cv2.SIFT_create(
nfeatures=n_features,
nOctaveLayers=n_octave_layers,
contrastThreshold=contrast_threshold,
edgeThreshold=edge_threshold,
sigma=sigma
)
def detect_and_compute(self, image):
"""Detect keypoints and compute descriptors"""
if len(image.shape) == 3:
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
else:
gray = image
keypoints, descriptors = self.detector.detectAndCompute(gray, None)
return keypoints, descriptors
def visualize_keypoints(self, image, keypoints, max_keypoints=100):
"""Visualize detected keypoints"""
# Limit number of keypoints for visualization
if len(keypoints) > max_keypoints:
# Sort by response and take top keypoints
keypoints = sorted(keypoints, key=lambda x: x.response, reverse=True)
keypoints = keypoints[:max_keypoints]
# Draw keypoints
img_with_kpts = cv2.drawKeypoints(
image, keypoints, None,
color=(0, 255, 0),
flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS
)
return img_with_kpts
def get_keypoint_info(self, keypoints):
"""Extract keypoint information"""
info = []
for kp in keypoints:
info.append({
'x': kp.pt[0],
'y': kp.pt[1],
'size': kp.size,
'angle': kp.angle,
'response': kp.response,
'octave': kp.octave
})
return info
# Example usage and visualization
def demonstrate_sift():
"""Demonstrate SIFT feature detection"""
# Load sample image
image = cv2.imread('sample_image.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Initialize SIFT
sift_detector = SIFTDetector(n_features=500)
# Detect keypoints and compute descriptors
keypoints, descriptors = sift_detector.detect_and_compute(image)
print(f"Detected {len(keypoints)} keypoints")
print(f"Descriptor shape: {descriptors.shape if descriptors is not None else None}")
# Visualize keypoints
img_with_kpts = sift_detector.visualize_keypoints(image, keypoints)
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.imshow(image)
plt.title('Original Image')
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(img_with_kpts)
plt.title(f'SIFT Keypoints ({len(keypoints)})')
plt.axis('off')
plt.tight_layout()
plt.show()
return keypoints, descriptors
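Each SIFT descriptor is a 128-dimensional float vector, so matching cost grows quickly with the number of keypoints. For larger sets, an approximate nearest-neighbour index is typically used instead of brute force; below is a minimal sketch with OpenCV's FLANN matcher (the index and search parameters are common defaults, not tuned values).
def match_sift_flann(desc1, desc2, ratio=0.7):
    """Approximate matching of SIFT descriptors with a FLANN kd-tree index."""
    FLANN_INDEX_KDTREE = 1
    flann = cv2.FlannBasedMatcher(
        dict(algorithm=FLANN_INDEX_KDTREE, trees=5),  # index parameters
        dict(checks=50)                               # search parameters
    )
    knn = flann.knnMatch(desc1.astype(np.float32), desc2.astype(np.float32), k=2)
    # Lowe's ratio test to discard ambiguous matches
    return [p[0] for p in knn if len(p) == 2 and p[0].distance < ratio * p[1].distance]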
2. SURF (Speeded-Up Robust Features)
SURF provides a faster alternative to SIFT while maintaining robustness:
class SURFDetector:
"""SURF keypoint detector and descriptor"""
def __init__(self, hessian_threshold=400, n_octaves=4, n_octave_layers=3, extended=False):
# Note: SURF is patented and only available in opencv-contrib builds
# compiled with the non-free modules enabled
self.hessian_threshold = hessian_threshold
self.n_octaves = n_octaves
self.n_octave_layers = n_octave_layers
self.extended = extended
try:
self.detector = cv2.xfeatures2d.SURF_create(
hessianThreshold=hessian_threshold,
nOctaves=n_octaves,
nOctaveLayers=n_octave_layers,
extended=extended
)
except AttributeError:
print("SURF not available. Using SIFT instead.")
self.detector = cv2.SIFT_create()
def detect_and_compute(self, image):
"""Detect keypoints and compute descriptors"""
if len(image.shape) == 3:
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
else:
gray = image
keypoints, descriptors = self.detector.detectAndCompute(gray, None)
return keypoints, descriptors
# SURF implementation using integral images and Hessian matrix
class SimpleSURF:
"""Simplified SURF implementation for educational purposes"""
def __init__(self, threshold=0.04):
self.threshold = threshold
def _integral_image(self, image):
"""Compute integral image"""
return np.cumsum(np.cumsum(image, axis=0), axis=1)
def _box_filter(self, integral_img, x, y, w, h):
"""Fast box filter using integral image"""
x1, y1 = max(0, x), max(0, y)
x2, y2 = min(integral_img.shape[1], x + w), min(integral_img.shape[0], y + h)
# Integral-image sum over the box [x1, x2) x [y1, y2)
A = integral_img[y1-1, x1-1] if x1 > 0 and y1 > 0 else 0
B = integral_img[y1-1, x2-1] if y1 > 0 else 0
C = integral_img[y2-1, x1-1] if x1 > 0 else 0
D = integral_img[y2-1, x2-1]
return D - B - C + A
def _hessian_determinant(self, integral_img, x, y, scale):
"""Compute Hessian determinant for keypoint detection"""
# Approximate second derivatives using box filters
size = int(2 * scale)
# Dxx approximation
dxx = (self._box_filter(integral_img, x-size, y-size//2, size, size) -
2 * self._box_filter(integral_img, x-size//2, y-size//2, size, size) +
self._box_filter(integral_img, x, y-size//2, size, size))
# Dyy approximation
dyy = (self._box_filter(integral_img, x-size//2, y-size, size, size) -
2 * self._box_filter(integral_img, x-size//2, y-size//2, size, size) +
self._box_filter(integral_img, x-size//2, y, size, size))
# Dxy approximation
dxy = (self._box_filter(integral_img, x-size//2, y-size//2, size//2, size//2) +
self._box_filter(integral_img, x, y, size//2, size//2) -
self._box_filter(integral_img, x, y-size//2, size//2, size//2) -
self._box_filter(integral_img, x-size//2, y, size//2, size//2))
# Hessian determinant
det = dxx * dyy - (0.9 * dxy) ** 2
return det
def detect_keypoints(self, image):
"""Detect SURF keypoints"""
if len(image.shape) == 3:
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
else:
gray = image
integral_img = self._integral_image(gray.astype(np.float32))
keypoints = []
scales = [1.2, 1.6, 2.0, 2.4] # Different scales
for scale in scales:
for y in range(int(3*scale), gray.shape[0] - int(3*scale), int(scale)):
for x in range(int(3*scale), gray.shape[1] - int(3*scale), int(scale)):
det = self._hessian_determinant(integral_img, x, y, scale)
# Responses are unnormalized here, so this threshold is only illustrative
if det > self.threshold:
# Create keypoint
kp = cv2.KeyPoint(x, y, scale*2, response=det)
keypoints.append(kp)
return keypoints
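The box filters in _hessian_determinant approximate the second-order Gaussian derivatives Dxx, Dyy, and Dxy at scale σ. Following Bay et al. (2006), the determinant response uses a relative weight of 0.9 to compensate for the box-filter approximation:

$$\det(\mathcal{H}_{\text{approx}}) = D_{xx} D_{yy} - (0.9\, D_{xy})^2$$

A location is kept as a keypoint when this response exceeds the threshold; the full SURF detector additionally applies non-maximum suppression across neighbouring scales, which the simplified implementation above omits.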
3. ORB (Oriented FAST and Rotated BRIEF)
ORB is a fast, patent-free alternative to SIFT and SURF that produces binary descriptors matched with the Hamming distance:
class ORBDetector:
"""ORB keypoint detector and descriptor"""
def __init__(self, n_features=500, scale_factor=1.2, n_levels=8,
edge_threshold=31, first_level=0, WTA_K=2, score_type=cv2.ORB_HARRIS_SCORE,
patch_size=31, fast_threshold=20):
self.detector = cv2.ORB_create(
nfeatures=n_features,
scaleFactor=scale_factor,
nlevels=n_levels,
edgeThreshold=edge_threshold,
firstLevel=first_level,
WTA_K=WTA_K,
scoreType=score_type,
patchSize=patch_size,
fastThreshold=fast_threshold
)
def detect_and_compute(self, image):
"""Detect keypoints and compute descriptors"""
if len(image.shape) == 3:
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
else:
gray = image
keypoints, descriptors = self.detector.detectAndCompute(gray, None)
return keypoints, descriptors
def match_features(self, desc1, desc2, distance_threshold=30):
"""Match ORB features using Hamming distance"""
if desc1 is None or desc2 is None:
return []
# Use BFMatcher with Hamming distance for binary descriptors
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(desc1, desc2)
# Filter matches by distance
good_matches = [m for m in matches if m.distance < distance_threshold]
# Sort by distance
good_matches = sorted(good_matches, key=lambda x: x.distance)
return good_matches
# Demonstrate ORB feature matching
def demonstrate_orb_matching(img1_path, img2_path):
"""Demonstrate ORB feature matching between two images"""
# Load images
img1 = cv2.imread(img1_path)
img2 = cv2.imread(img2_path)
img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2RGB)
img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2RGB)
# Initialize ORB
orb_detector = ORBDetector(n_features=1000)
# Detect features
kp1, desc1 = orb_detector.detect_and_compute(img1)
kp2, desc2 = orb_detector.detect_and_compute(img2)
print(f"Image 1: {len(kp1)} keypoints")
print(f"Image 2: {len(kp2)} keypoints")
# Match features
matches = orb_detector.match_features(desc1, desc2)
print(f"Found {len(matches)} matches")
# Draw matches
img_matches = cv2.drawMatches(
img1, kp1, img2, kp2, matches[:50], None,
flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
)
plt.figure(figsize=(15, 8))
plt.imshow(img_matches)
plt.title(f'ORB Feature Matches ({len(matches)})')
plt.axis('off')
plt.show()
return matches, kp1, kp2
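Each ORB descriptor is a 256-bit string packed into 32 bytes, so the distance between two descriptors is simply the number of differing bits, which is what cv2.NORM_HAMMING computes. A quick NumPy sketch of the same comparison:
def hamming_distance(desc_a, desc_b):
    """Bitwise Hamming distance between two packed binary descriptors (uint8 arrays)."""
    return int(np.unpackbits(np.bitwise_xor(desc_a, desc_b)).sum())

# Example (hypothetical descriptors from the matching demo above):
# dist = hamming_distance(desc1[0], desc2[0])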
4. FAST Corner Detector
FAST (Features from Accelerated Segment Test) for rapid corner detection:
class FASTDetector:
"""FAST corner detector"""
def __init__(self, threshold=20, nonmax_suppression=True, fast_type=cv2.FAST_FEATURE_DETECTOR_TYPE_9_16):
self.threshold = threshold
self.nonmax_suppression = nonmax_suppression
self.type = fast_type
self.detector = cv2.FastFeatureDetector_create(
threshold=threshold,
nonmaxSuppression=nonmax_suppression,
type=fast_type
)
def detect(self, image):
"""Detect FAST corners"""
if len(image.shape) == 3:
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
else:
gray = image
keypoints = self.detector.detect(gray)
return keypoints
def compute_brief_descriptors(self, image, keypoints):
"""Compute BRIEF descriptors for FAST keypoints"""
brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()
keypoints, descriptors = brief.compute(image, keypoints)
return keypoints, descriptors
# Manual FAST implementation for educational purposes
class SimpleFAST:
"""Simplified FAST corner detector implementation"""
def __init__(self, threshold=20, n_points=16):
self.threshold = threshold
self.n_points = n_points
# Bresenham circle offsets for 16-point circle
self.circle_offsets = [
(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
(0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)
]
def detect_corners(self, image):
"""Detect FAST corners"""
if len(image.shape) == 3:
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
else:
gray = image
corners = []
h, w = gray.shape
for y in range(3, h - 3):
for x in range(3, w - 3):
if self._is_corner(gray, x, y):
corners.append((x, y))
return corners
def _is_corner(self, image, x, y):
"""Check if point is a corner using FAST test"""
center_intensity = image[y, x]
# Sample intensities around circle
circle_intensities = []
for dx, dy in self.circle_offsets:
circle_intensities.append(image[y + dy, x + dx])
# Check for continuous arc of brighter or darker pixels
bright_count = sum(1 for intensity in circle_intensities
if intensity > center_intensity + self.threshold)
dark_count = sum(1 for intensity in circle_intensities
if intensity < center_intensity - self.threshold)
# Need at least 12 continuous pixels for corner
min_arc_length = 12
# Check for continuous arcs
if bright_count >= min_arc_length or dark_count >= min_arc_length:
return self._has_continuous_arc(circle_intensities, center_intensity, min_arc_length)
return False
def _has_continuous_arc(self, intensities, center, min_length):
"""Check for continuous arc of similar intensities"""
n = len(intensities)
# Check bright arc
bright_mask = [i > center + self.threshold for i in intensities]
if self._find_max_continuous(bright_mask) >= min_length:
return True
# Check dark arc
dark_mask = [i < center - self.threshold for i in intensities]
if self._find_max_continuous(dark_mask) >= min_length:
return True
return False
def _find_max_continuous(self, mask):
"""Find maximum continuous sequence in circular array"""
n = len(mask)
max_len = 0
current_len = 0
# Check twice to handle circular nature
for i in range(2 * n):
if mask[i % n]:
current_len += 1
max_len = max(max_len, current_len)
else:
current_len = 0
return max_len
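To see why FAST is attractive for real-time use, a rough timing comparison between the OpenCV detector and the pure-Python SimpleFAST above can be run on a small image (sample_image.jpg is a placeholder path, and absolute timings will vary by machine):
def compare_fast_speed(image_path='sample_image.jpg'):
    """Rough timing comparison: OpenCV FAST vs. the educational SimpleFAST."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    gray = cv2.resize(gray, (320, 240))  # keep the pure-Python loop manageable

    t0 = time.time()
    cv_corners = FASTDetector(threshold=20).detect(gray)
    t1 = time.time()
    py_corners = SimpleFAST(threshold=20).detect_corners(gray)
    t2 = time.time()

    print(f"OpenCV FAST: {len(cv_corners)} corners in {t1 - t0:.4f}s")
    print(f"SimpleFAST:  {len(py_corners)} corners in {t2 - t1:.4f}s")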
Advanced Local Descriptors
1. DAISY Descriptor
DAISY computes descriptors densely over a grid covering the whole image, rather than only at sparse keypoints:
class DAISYDescriptor:
"""DAISY descriptor for dense feature computation"""
def __init__(self, step=4, radius=15, rings=3, histograms=8,
orientations=8, normalization='l2'):
self.step = step
self.radius = radius
self.rings = rings
self.histograms = histograms
self.orientations = orientations
self.normalization = normalization
def compute_dense_descriptors(self, image):
"""Compute DAISY descriptors densely over the image"""
from skimage.feature import daisy
if len(image.shape) == 3:
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
else:
gray = image
# Normalize image
gray = gray.astype(np.float32) / 255.0
# Compute DAISY descriptors
descriptors = daisy(
gray,
step=self.step,
radius=self.radius,
rings=self.rings,
histograms=self.histograms,
orientations=self.orientations,
normalization=self.normalization
)
return descriptors
def visualize_descriptor_field(self, image, descriptors, subsample=20):
"""Visualize DAISY descriptor field"""
h, w = descriptors.shape[:2]
plt.figure(figsize=(12, 8))
plt.imshow(image, cmap='gray' if len(image.shape) == 2 else None)
# Draw descriptor locations
for y in range(0, h, subsample):
for x in range(0, w, subsample):
# Convert descriptor coordinates to image coordinates
img_x = x * self.step
img_y = y * self.step
if img_x < image.shape[1] and img_y < image.shape[0]:
plt.plot(img_x, img_y, 'ro', markersize=2)
plt.title('DAISY Descriptor Locations')
plt.axis('off')
plt.show()
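A short usage sketch: with the parameters above, skimage's daisy returns one descriptor every step pixels, each of length (rings × histograms + 1) × orientations = (3 × 8 + 1) × 8 = 200.
def demonstrate_daisy(image_path='sample_image.jpg'):
    """Compute dense DAISY descriptors and report the descriptor grid shape."""
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    daisy_extractor = DAISYDescriptor(step=4, radius=15)
    descriptors = daisy_extractor.compute_dense_descriptors(image)
    # Shape is (rows, cols, descriptor_length), e.g. (..., ..., 200)
    print(f"DAISY descriptor grid: {descriptors.shape}")
    return descriptors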
2. FREAK (Fast Retina Keypoint)
FREAK builds binary descriptors from a retinal sampling pattern inspired by the human visual system:
class FREAKDescriptor:
"""FREAK (Fast Retina Keypoint) descriptor"""
def __init__(self, orientation_normalized=True, scale_normalized=True,
pattern_scale=22.0, n_octaves=4):
try:
self.descriptor = cv2.xfeatures2d.FREAK_create(
orientationNormalized=orientation_normalized,
scaleNormalized=scale_normalized,
patternScale=pattern_scale,
nOctaves=n_octaves
)
except AttributeError:
print("FREAK not available in this OpenCV version")
self.descriptor = None
def compute(self, image, keypoints):
"""Compute FREAK descriptors for given keypoints"""
if self.descriptor is None:
return None, None
if len(image.shape) == 3:
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
else:
gray = image
keypoints, descriptors = self.descriptor.compute(gray, keypoints)
return keypoints, descriptors
# Simplified FREAK-like implementation
class SimpleFREAK:
"""Simplified FREAK-like descriptor"""
def __init__(self, num_pairs=512):
self.num_pairs = num_pairs
self.pairs = self._generate_retinal_pairs()
def _generate_retinal_pairs(self):
"""Generate retinal sampling pattern pairs"""
# Simplified retinal pattern - in practice, this would be more sophisticated
pairs = []
# Generate concentric circles (retinal-like pattern)
radii = [2, 4, 6, 8, 10, 12]
angles = np.linspace(0, 2*np.pi, 8, endpoint=False)
points = [(0, 0)] # Center point
for r in radii:
for a in angles:
x = int(r * np.cos(a))
y = int(r * np.sin(a))
points.append((x, y))
# Generate pairs with distance-based selection
for i in range(self.num_pairs):
# Select pairs with different distance preferences
idx1 = np.random.randint(0, len(points))
idx2 = np.random.randint(0, len(points))
pairs.append((points[idx1], points[idx2]))
return pairs
def compute_descriptor(self, image, keypoint):
"""Compute FREAK-like descriptor for a single keypoint"""
if len(image.shape) == 3:
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
else:
gray = image
x, y = int(keypoint.pt[0]), int(keypoint.pt[1])
descriptor = np.zeros(self.num_pairs, dtype=np.uint8)
h, w = gray.shape
for i, ((x1, y1), (x2, y2)) in enumerate(self.pairs):
# Apply keypoint scale and orientation
scale = keypoint.size / 30.0 # Normalize scale
angle = np.radians(keypoint.angle) if keypoint.angle != -1 else 0
# Rotate and scale offsets
cos_a, sin_a = np.cos(angle), np.sin(angle)
rx1 = scale * (x1 * cos_a - y1 * sin_a)
ry1 = scale * (x1 * sin_a + y1 * cos_a)
rx2 = scale * (x2 * cos_a - y2 * sin_a)
ry2 = scale * (x2 * sin_a + y2 * cos_a)
# Get pixel coordinates
px1, py1 = x + int(rx1), y + int(ry1)
px2, py2 = x + int(rx2), y + int(ry2)
# Check bounds and compare intensities
if (0 <= px1 < w and 0 <= py1 < h and
0 <= px2 < w and 0 <= py2 < h):
if gray[py1, px1] > gray[py2, px2]:
descriptor[i] = 1
return descriptor
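Because the descriptor above is a vector of 0/1 intensity comparisons, two keypoints are compared with the Hamming distance, i.e. the count of disagreeing bits. A short usage sketch (the keypoints could come from any detector, e.g. FAST or ORB):
def compare_freak_descriptors(image, kp_a, kp_b):
    """Hamming distance between SimpleFREAK descriptors of two keypoints."""
    freak = SimpleFREAK(num_pairs=512)
    d_a = freak.compute_descriptor(image, kp_a)
    d_b = freak.compute_descriptor(image, kp_b)
    return int(np.count_nonzero(d_a != d_b))  # number of differing comparisons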
Modern Learned Local Features
1. SuperPoint
SuperPoint is a self-supervised framework for joint keypoint detection and description:
class SuperPointNet(nn.Module):
"""SuperPoint Network Architecture"""
def __init__(self, detection_threshold=0.015, nms_radius=4, max_keypoints=1024):
super().__init__()
self.detection_threshold = detection_threshold
self.nms_radius = nms_radius
self.max_keypoints = max_keypoints
# Shared encoder
self.encoder = nn.Sequential(
# Conv Block 1
nn.Conv2d(1, 64, 3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(64, 64, 3, padding=1),
nn.ReLU(inplace=True),
nn.MaxPool2d(2, stride=2),
# Conv Block 2
nn.Conv2d(64, 64, 3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(64, 64, 3, padding=1),
nn.ReLU(inplace=True),
nn.MaxPool2d(2, stride=2),
# Conv Block 3
nn.Conv2d(64, 128, 3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(128, 128, 3, padding=1),
nn.ReLU(inplace=True),
nn.MaxPool2d(2, stride=2),
# Conv Block 4
nn.Conv2d(128, 128, 3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(128, 128, 3, padding=1),
nn.ReLU(inplace=True),
)
# Detection head
self.detector = nn.Sequential(
nn.Conv2d(128, 256, 3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(256, 65, 1), # 64 + 1 for dustbin
)
# Description head
self.descriptor = nn.Sequential(
nn.Conv2d(128, 256, 3, padding=1),
nn.ReLU(inplace=True),
nn.Conv2d(256, 256, 1),
)
def forward(self, x):
"""Forward pass"""
# Shared encoder
features = self.encoder(x)
# Detection head
detection_logits = self.detector(features)
detection_scores = F.softmax(detection_logits, dim=1)[:, :-1] # Remove dustbin
# Description head
descriptor_features = self.descriptor(features)
descriptor_features = F.normalize(descriptor_features, p=2, dim=1)
return detection_scores, descriptor_features
def extract_keypoints_and_descriptors(self, image):
"""Extract keypoints and descriptors from image"""
if isinstance(image, np.ndarray):
# Convert to tensor
if len(image.shape) == 3:
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
else:
gray = image
# Normalize and convert to tensor
gray = gray.astype(np.float32) / 255.0
tensor = torch.from_numpy(gray).unsqueeze(0).unsqueeze(0)
else:
tensor = image
with torch.no_grad():
detection_scores, descriptor_features = self.forward(tensor)
# Extract keypoints using NMS
keypoints = self._extract_keypoints(detection_scores[0])
# Sample descriptors at keypoint locations
descriptors = self._sample_descriptors(descriptor_features[0], keypoints)
return keypoints, descriptors
def _extract_keypoints(self, detection_scores):
"""Extract keypoints using NMS"""
# Reshape detection scores
h, w = detection_scores.shape[1:]
detection_scores = detection_scores.view(8, 8, h, w) # (row_offset, col_offset, Hc, Wc)
detection_scores = detection_scores.permute(2, 0, 3, 1) # (Hc, row_offset, Wc, col_offset)
detection_scores = detection_scores.contiguous().view(h * 8, w * 8)
# Apply threshold
keypoint_mask = detection_scores > self.detection_threshold
# Get keypoint coordinates
keypoint_coords = torch.nonzero(keypoint_mask, as_tuple=False)
scores = detection_scores[keypoint_coords[:, 0], keypoint_coords[:, 1]]
# Apply NMS
if len(keypoint_coords) > 0:
keypoint_coords, scores = self._nms(keypoint_coords.float(), scores, self.nms_radius)
# Limit number of keypoints
if len(keypoint_coords) > self.max_keypoints:
top_k_indices = torch.topk(scores, self.max_keypoints)[1]
keypoint_coords = keypoint_coords[top_k_indices]
scores = scores[top_k_indices]
return keypoint_coords, scores
def _nms(self, keypoints, scores, radius):
"""Non-Maximum Suppression for keypoints"""
if len(keypoints) == 0:
return keypoints, scores
# Sort by scores
sorted_indices = torch.argsort(scores, descending=True)
keypoints = keypoints[sorted_indices]
scores = scores[sorted_indices]
# Apply NMS
keep = []
for i in range(len(keypoints)):
if i == 0:
keep.append(i)
else:
# Check distance to all previously selected keypoints
distances = torch.norm(keypoints[i:i+1] - keypoints[keep], dim=1)
if torch.all(distances > radius):
keep.append(i)
return keypoints[keep], scores[keep]
def _sample_descriptors(self, descriptor_features, keypoints):
"""Sample descriptors at keypoint locations"""
if len(keypoints[0]) == 0:
return torch.empty(0, descriptor_features.shape[0])
# Convert keypoint coordinates (row, col) to feature map coordinates
coords = keypoints[0].clone() # (N, 2) in heatmap pixels
coords = coords / 8.0 # account for the encoder's 8x downsampling
# Normalize to [-1, 1] in (x, y) order, as expected by grid_sample
h, w = descriptor_features.shape[1:]
grid = torch.stack([
2 * coords[:, 1] / (w - 1) - 1, # x from column index
2 * coords[:, 0] / (h - 1) - 1, # y from row index
], dim=1)
# Sample descriptors using bilinear interpolation
coords = grid.unsqueeze(0).unsqueeze(0) # (1, 1, N, 2)
descriptors = F.grid_sample(
descriptor_features.unsqueeze(0), coords,
mode='bilinear', align_corners=True
)
descriptors = descriptors.squeeze().transpose(0, 1) # (N, 256)
return descriptors
# Example usage
def demonstrate_superpoint():
"""Demonstrate SuperPoint feature extraction"""
# Initialize model
model = SuperPointNet()
model.eval()
# Load and preprocess image
image = cv2.imread('sample_image.jpg', cv2.IMREAD_GRAYSCALE) # normalization happens inside the extractor
# Extract features
keypoints, descriptors = model.extract_keypoints_and_descriptors(image)
print(f"Detected {len(keypoints[0])} keypoints")
print(f"Descriptor shape: {descriptors.shape}")
return keypoints, descriptors
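Since SuperPoint descriptors are L2-normalized, descriptor similarity reduces to a dot product. Below is a minimal mutual-nearest-neighbour matcher for two descriptor sets (a sketch, not the matcher used in the original paper):
def mutual_nn_match(desc1, desc2):
    """Mutual nearest-neighbour matching for L2-normalized descriptors of shape (N, D)."""
    sim = desc1 @ desc2.t()           # cosine similarity matrix, (N1, N2)
    nn12 = sim.argmax(dim=1)          # best match in image 2 for each descriptor of image 1
    nn21 = sim.argmax(dim=0)          # best match in image 1 for each descriptor of image 2
    idx1 = torch.arange(desc1.shape[0])
    mutual = nn21[nn12] == idx1       # keep pairs that select each other
    return torch.stack([idx1[mutual], nn12[mutual]], dim=1)  # (M, 2) index pairs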
2. D2-Net (Detect-and-Describe)
D2-Net extracts keypoints and descriptors jointly from the same CNN feature maps:
class D2Net(nn.Module):
"""D2-Net: Detect-to-Describe Network"""
def __init__(self, model_file=None, multiscale=True, max_keypoints=1024):
super().__init__()
self.multiscale = multiscale
self.max_keypoints = max_keypoints
# Use VGG-like backbone
self.features = self._make_vgg_backbone()
if model_file:
self.load_state_dict(torch.load(model_file))
def _make_vgg_backbone(self):
"""Create VGG-like backbone"""
layers = []
# Block 1
layers += [nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True)]
layers += [nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True)]
layers += [nn.MaxPool2d(2, stride=2)]
# Block 2
layers += [nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True)]
layers += [nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True)]
layers += [nn.MaxPool2d(2, stride=2)]
# Block 3
layers += [nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True)]
layers += [nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True)]
layers += [nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True)]
layers += [nn.MaxPool2d(2, stride=2)]
# Block 4
layers += [nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(inplace=True)]
layers += [nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True)]
layers += [nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(inplace=True)]
return nn.Sequential(*layers)
def forward(self, x):
"""Forward pass through D2-Net"""
features = self.features(x)
# Compute detection scores and descriptors
detection_scores = self._compute_detection_scores(features)
descriptors = F.normalize(features, p=2, dim=1)
return detection_scores, descriptors
def _compute_detection_scores(self, features):
"""Compute detection scores using ratio-to-max"""
# Spatial max in local neighborhood
spatial_max = F.max_pool2d(features, kernel_size=3, stride=1, padding=1)
# Ratio to max
ratio_to_max = features / (spatial_max + 1e-8)
# Sum over channels for final detection score
detection_scores = torch.sum(ratio_to_max, dim=1, keepdim=True)
return detection_scores
def extract_multiscale_features(self, image, scales=[1.0, 0.7, 1.4]):
"""Extract features at multiple scales"""
all_keypoints = []
all_descriptors = []
for scale in scales:
# Resize image
if scale != 1.0:
h, w = image.shape[-2:]
new_h, new_w = int(h * scale), int(w * scale)
scaled_image = F.interpolate(image, size=(new_h, new_w),
mode='bilinear', align_corners=False)
else:
scaled_image = image
# Extract features
with torch.no_grad():
detection_scores, descriptors = self.forward(scaled_image)
# Extract keypoints
keypoints = self._extract_keypoints(detection_scores, scale)
all_keypoints.extend(keypoints)
all_descriptors.extend(descriptors)
return all_keypoints, all_descriptors
def _extract_keypoints(self, detection_scores, scale=1.0):
"""Extract keypoints from detection scores"""
# Apply threshold and NMS
threshold = 0.005
keypoint_mask = detection_scores > threshold
# Get coordinates
coords = torch.nonzero(keypoint_mask.squeeze(), as_tuple=False)
# Scale coordinates back to original image size
coords = coords.float() / scale
return coords
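A quick sanity check of the (untrained) architecture on a dummy input: three stride-2 poolings give a 1/8-resolution feature map, the descriptors have 512 channels, and the detection map has a single channel. The shapes below assume a 256×256 RGB input.
def d2net_shape_check():
    """Run the D2-Net backbone on a dummy image and print the output shapes."""
    model = D2Net()
    model.eval()
    dummy = torch.randn(1, 3, 256, 256)
    with torch.no_grad():
        scores, descriptors = model(dummy)
    print(f"Detection scores: {tuple(scores.shape)}")      # (1, 1, 32, 32)
    print(f"Descriptors:      {tuple(descriptors.shape)}") # (1, 512, 32, 32)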
Feature Matching and Applications
Robust Feature Matching
class FeatureMatcher:
"""Robust feature matching with various strategies"""
def __init__(self, descriptor_type='float', distance_metric='l2'):
self.descriptor_type = descriptor_type
self.distance_metric = distance_metric
if descriptor_type == 'binary':
self.matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
else:
if distance_metric == 'l2':
self.matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=False)
else:
self.matcher = cv2.BFMatcher(cv2.NORM_L1, crossCheck=False)
def match_features(self, desc1, desc2, ratio_threshold=0.8):
"""Match features using ratio test"""
if desc1 is None or desc2 is None:
return []
# K-NN matching
matches = self.matcher.knnMatch(desc1, desc2, k=2)
# Apply ratio test
good_matches = []
for match_pair in matches:
if len(match_pair) == 2:
m, n = match_pair
if m.distance < ratio_threshold * n.distance:
good_matches.append(m)
return good_matches
def match_features_mutual(self, desc1, desc2, ratio_threshold=0.8):
"""Bidirectional matching with ratio test"""
# Forward matches
matches12 = self.match_features(desc1, desc2, ratio_threshold)
# Backward matches
matches21 = self.match_features(desc2, desc1, ratio_threshold)
# Keep only mutual best matches
mutual_matches = []
for m12 in matches12:
for m21 in matches21:
if m12.queryIdx == m21.trainIdx and m12.trainIdx == m21.queryIdx:
mutual_matches.append(m12)
break
return mutual_matches
def geometric_verification(self, kp1, kp2, matches, method='homography'):
"""Geometric verification using RANSAC"""
if len(matches) < 4:
return [], None
# Extract matched points
src_pts = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst_pts = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
if method == 'homography':
# Find homography
H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
if mask is None:
return [], H
geometric_matches = [m for m, inlier in zip(matches, mask.ravel()) if inlier]
return geometric_matches, H
elif method == 'fundamental':
# Find fundamental matrix
F, mask = cv2.findFundamentalMat(src_pts, dst_pts, cv2.FM_RANSAC, 3.0, 0.99)
if mask is None:
return [], F
geometric_matches = [m for m, inlier in zip(matches, mask.ravel()) if inlier]
return geometric_matches, F
else:
return matches, None
# Image matching pipeline
class ImageMatchingPipeline:
"""Complete pipeline for image matching"""
def __init__(self, detector_type='sift', matcher_type='ratio'):
self.detector_type = detector_type
self.matcher_type = matcher_type
# Initialize detector
if detector_type == 'sift':
self.detector = SIFTDetector()
self.matcher = FeatureMatcher('float', 'l2')
elif detector_type == 'orb':
self.detector = ORBDetector()
self.matcher = FeatureMatcher('binary', 'hamming')
else:
raise ValueError(f"Unsupported detector: {detector_type}")
def match_images(self, img1, img2, visualize=True):
"""Complete image matching pipeline"""
# Detect features
kp1, desc1 = self.detector.detect_and_compute(img1)
kp2, desc2 = self.detector.detect_and_compute(img2)
print(f"Image 1: {len(kp1)} features")
print(f"Image 2: {len(kp2)} features")
# Match features
if self.matcher_type == 'ratio':
matches = self.matcher.match_features(desc1, desc2)
elif self.matcher_type == 'mutual':
matches = self.matcher.match_features_mutual(desc1, desc2)
print(f"Initial matches: {len(matches)}")
# Geometric verification
geometric_matches, transform = self.matcher.geometric_verification(
kp1, kp2, matches, method='homography'
)
print(f"Geometric matches: {len(geometric_matches)}")
if visualize:
self._visualize_matches(img1, img2, kp1, kp2, geometric_matches)
return {
'keypoints1': kp1,
'keypoints2': kp2,
'matches': geometric_matches,
'transform': transform
}
def _visualize_matches(self, img1, img2, kp1, kp2, matches):
"""Visualize feature matches"""
img_matches = cv2.drawMatches(
img1, kp1, img2, kp2, matches[:50], None,
flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS
)
plt.figure(figsize=(15, 8))
plt.imshow(img_matches)
plt.title(f'Feature Matches ({len(matches)})')
plt.axis('off')
plt.show()
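Putting the pipeline together is then a few lines; the image paths below are placeholders:
def run_matching_example(path1='scene_a.jpg', path2='scene_b.jpg'):
    """Example driver for ImageMatchingPipeline (placeholder image paths)."""
    img1 = cv2.cvtColor(cv2.imread(path1), cv2.COLOR_BGR2RGB)
    img2 = cv2.cvtColor(cv2.imread(path2), cv2.COLOR_BGR2RGB)

    pipeline = ImageMatchingPipeline(detector_type='sift', matcher_type='ratio')
    result = pipeline.match_images(img1, img2, visualize=True)

    # The estimated homography can then be used for e.g. image stitching
    print("Estimated transform:\n", result['transform'])
    return result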
Performance Evaluation
Benchmark Framework
class LocalFeatureBenchmark:
"""Benchmark local feature detectors and descriptors"""
def __init__(self, detectors, image_pairs):
self.detectors = detectors
self.image_pairs = image_pairs
def evaluate_repeatability(self, detector_name, homography_threshold=3.0):
"""Evaluate keypoint repeatability under viewpoint changes"""
detector = self.detectors[detector_name]
repeatability_scores = []
for img1_path, img2_path, H in self.image_pairs:
# Load images
img1 = cv2.imread(img1_path, cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread(img2_path, cv2.IMREAD_GRAYSCALE)
# Detect keypoints
kp1, _ = detector.detect_and_compute(img1)
kp2, _ = detector.detect_and_compute(img2)
# Transform keypoints from img1 to img2 using homography
if len(kp1) > 0 and H is not None:
pts1 = np.array([kp.pt for kp in kp1], dtype=np.float32).reshape(-1, 1, 2)
pts1_transformed = cv2.perspectiveTransform(pts1, H)
# Find correspondences within threshold
correspondences = 0
for i, pt_transformed in enumerate(pts1_transformed):
min_distance = float('inf')
for kp in kp2:
distance = np.linalg.norm(pt_transformed[0] - np.array(kp.pt))
min_distance = min(min_distance, distance)
if min_distance < homography_threshold:
correspondences += 1
repeatability = correspondences / len(kp1) if len(kp1) > 0 else 0
repeatability_scores.append(repeatability)
return np.mean(repeatability_scores)
def evaluate_matching_performance(self, detector_name):
"""Evaluate matching performance"""
detector = self.detectors[detector_name]
matcher = FeatureMatcher()
precision_scores = []
recall_scores = []
for img1_path, img2_path, H in self.image_pairs:
# Load images and detect features
img1 = cv2.imread(img1_path, cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread(img2_path, cv2.IMREAD_GRAYSCALE)
kp1, desc1 = detector.detect_and_compute(img1)
kp2, desc2 = detector.detect_and_compute(img2)
if desc1 is None or desc2 is None:
continue
# Match features
matches = matcher.match_features(desc1, desc2)
# Evaluate matches using ground truth homography
correct_matches = 0
total_matches = len(matches)
if H is not None and total_matches > 0:
for match in matches:
pt1 = np.array(kp1[match.queryIdx].pt, dtype=np.float32).reshape(1, 1, 2)
pt2_gt = cv2.perspectiveTransform(pt1, H)[0, 0]
pt2_matched = np.array(kp2[match.trainIdx].pt)
distance = np.linalg.norm(pt2_gt - pt2_matched)
if distance < 3.0: # Threshold for correct match
correct_matches += 1
precision = correct_matches / total_matches if total_matches > 0 else 0
precision_scores.append(precision)
return np.mean(precision_scores)
def run_benchmark(self):
"""Run complete benchmark"""
results = {}
for detector_name in self.detectors.keys():
print(f"Evaluating {detector_name}...")
repeatability = self.evaluate_repeatability(detector_name)
matching_precision = self.evaluate_matching_performance(detector_name)
results[detector_name] = {
'repeatability': repeatability,
'matching_precision': matching_precision
}
print(f" Repeatability: {repeatability:.3f}")
print(f" Matching Precision: {matching_precision:.3f}")
return results
def visualize_results(self, results):
"""Visualize benchmark results"""
detectors = list(results.keys())
repeatability = [results[d]['repeatability'] for d in detectors]
precision = [results[d]['matching_precision'] for d in detectors]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
# Repeatability
bars1 = ax1.bar(detectors, repeatability)
ax1.set_title('Keypoint Repeatability')
ax1.set_ylabel('Repeatability Score')
ax1.set_ylim(0, 1)
# Matching precision
bars2 = ax2.bar(detectors, precision)
ax2.set_title('Matching Precision')
ax2.set_ylabel('Precision Score')
ax2.set_ylim(0, 1)
# Add value labels
for bars in [bars1, bars2]:
for bar in bars:
height = bar.get_height()
bar.axes.text(bar.get_x() + bar.get_width() / 2., height + 0.01,
f'{height:.3f}', ha='center', va='bottom')
plt.tight_layout()
plt.show()
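A minimal driver for the benchmark: image_pairs is a list of (path1, path2, H) tuples, where H is the 3×3 ground-truth homography mapping the first image to the second. The paths and homography below are placeholders you would replace with a real evaluation set (e.g. the Oxford or HPatches viewpoint sequences); binary descriptors such as ORB would also need a Hamming-distance matcher inside evaluate_matching_performance, so this sketch sticks to SIFT variants.
def run_benchmark_example():
    """Example benchmark setup with placeholder image pairs and homographies."""
    detectors = {
        'SIFT-500': SIFTDetector(n_features=500),
        'SIFT-2000': SIFTDetector(n_features=2000),
    }
    image_pairs = [
        ('graffiti_1.jpg', 'graffiti_2.jpg', np.eye(3, dtype=np.float32)),  # placeholder pair
    ]
    benchmark = LocalFeatureBenchmark(detectors, image_pairs)
    results = benchmark.run_benchmark()
    benchmark.visualize_results(results)
    return results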
Conclusion
Local image descriptors have evolved significantly from classical hand-crafted methods to modern learned approaches. Key insights:
Classical vs. Modern Methods
Classical strengths:
- Proven robustness: Well-understood invariance properties
- Computational efficiency: Fast extraction and matching
- No training required: Work out-of-the-box
- Interpretability: Clear understanding of feature properties
Modern advantages:
- Better performance: Higher accuracy on challenging datasets
- Learned invariances: Adapt to specific domains and conditions
- Joint optimization: Detection and description learned together
- Generalization: Transfer across different image types
Best Practices
- Task-specific selection: Choose based on application requirements
- Proper evaluation: Use appropriate metrics and datasets
- Robustness considerations: Test under various conditions
- Computational constraints: Balance accuracy vs. speed
- Domain adaptation: Consider fine-tuning for specific applications
Future Directions
- Self-supervised learning: Learning without manual annotations
- Real-time deployment: Efficient architectures for mobile devices
- Domain adaptation: Features that work across different environments
- Multi-modal fusion: Combining different types of features
The choice between classical and modern local features depends on your specific requirements: accuracy needs, computational constraints, training data availability, and deployment environment.
References
- Lowe, D. G. (2004). "Distinctive Image Features from Scale-Invariant Keypoints."
- Bay, H., et al. (2006). "SURF: Speeded Up Robust Features."
- Rublee, E., et al. (2011). "ORB: An efficient alternative to SIFT or SURF."
- DeTone, D., et al. (2018). "SuperPoint: Self-Supervised Interest Point Detection and Description."
- Dusmanu, M., et al. (2019). "D2-Net: A Trainable CNN for Joint Description and Detection of Local Features."