OpenCV Tutorial: Computer Vision with Python
Author: Jared Chung
OpenCV (Open Source Computer Vision Library) is a general-purpose, batteries-included library for working with images and video. Whether you want to enhance photos or track moving objects, OpenCV provides the tools to make it happen.
This post teaches OpenCV's core concepts through practical examples, focusing on what each technique does and when to use it.
Introduction to OpenCV
What is OpenCV?
Think of OpenCV as your computer vision toolkit that provides:
Image Processing: Like having Photoshop tools in code
- Blur, sharpen, adjust brightness and contrast
- Convert between color spaces (RGB, grayscale, HSV)
- Apply artistic filters and effects
Feature Detection: Teaching computers to "see" important parts
- Find corners, edges, and interesting points in images
- Detect patterns that humans recognize easily
- Track these features as they move in video
Object Detection: Recognizing things in images
- Face detection (like your phone's camera)
- Find specific objects using templates
- Identify shapes, text, or custom objects
Video Analysis: Understanding motion and change
- Track objects as they move through video
- Detect when something new appears or disappears
- Analyze movement patterns and speed
Getting Started: The Essentials
# Install OpenCV (run this in your terminal)
# pip install opencv-python
import cv2
import numpy as np
# The three things you'll do most often:
# 1. Load an image
image = cv2.imread('photo.jpg')
# 2. Do something with it (example: convert to grayscale)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# 3. Save or display the result
cv2.imwrite('result.jpg', gray_image)
print("OpenCV is ready to go!")
Key Concept: To OpenCV, images are just arrays of numbers. A color image is a 3D array (height × width × 3 color channels), while a grayscale image is a 2D array (height × width).
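To make this concrete, here is a small sketch that builds synthetic images with NumPy instead of loading a file. The array sizes are arbitrary values chosen for illustration.

```python
import numpy as np

# A synthetic color image: 4 rows (height), 6 columns (width), 3 channels
color = np.zeros((4, 6, 3), dtype=np.uint8)
color[:, :, 2] = 255  # channel index 2 is red, because OpenCV stores BGR

# A grayscale image has no channel axis at all
gray = np.zeros((4, 6), dtype=np.uint8)

print(color.shape)  # (4, 6, 3)
print(gray.shape)   # (4, 6)
print(color[0, 0])  # [  0   0 255]
```

Note that the first index is the row (y) and the second is the column (x), which is the reverse of the usual (x, y) order.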
Basic Image Operations
The Essential Image Operations
Think of these as the fundamental "verbs" of computer vision - the basic actions you can perform on any image:
1. Loading and Saving Images
import cv2
# Load an image (OpenCV's way of opening a file)
image = cv2.imread('my_photo.jpg')  # Load in color (the default)
gray_image = cv2.imread('my_photo.jpg', cv2.IMREAD_GRAYSCALE)  # Load as grayscale
# Check that loading worked: imread returns None instead of raising an error
if image is None:
    print("Error: Could not load image")
else:
    print(f"Image loaded successfully! Shape: {image.shape}")
    # Save the image (the file extension selects the output format)
    cv2.imwrite('output.jpg', image)
Quick Tip: OpenCV stores color channels in BGR (Blue-Green-Red) order instead of RGB. This only matters when you pass images to libraries that expect RGB, such as matplotlib or PIL.
2. Resizing Images
# Get original dimensions
height, width = image.shape[:2]
print(f"Original size: {width} x {height}")
# Resize to specific dimensions
resized = cv2.resize(image, (800, 600)) # (width, height)
# Resize by scale factor (easier for proportional resizing)
half_size = cv2.resize(image, None, fx=0.5, fy=0.5) # 50% of original
double_size = cv2.resize(image, None, fx=2.0, fy=2.0) # 200% of original
# Smart resize that maintains aspect ratio
def smart_resize(image, target_width):
    """Resize image to target width while keeping proportions"""
    height, width = image.shape[:2]
    ratio = target_width / width
    target_height = int(height * ratio)
    return cv2.resize(image, (target_width, target_height))
# Use it like this:
thumbnail = smart_resize(image, 300)  # Make thumbnail 300 pixels wide
3. Cropping Images
# Cropping is just array slicing in Python!
# Syntax: image[y1:y2, x1:x2]
# Crop a 200x200 square from top-left corner
crop = image[0:200, 0:200]
# Crop from center
height, width = image.shape[:2]
center_x, center_y = width // 2, height // 2
half_size = 150  # half the side length, so the crop is 300x300
center_crop = image[
    center_y - half_size:center_y + half_size,
    center_x - half_size:center_x + half_size
]
# Note: this assumes the image is at least 300x300; otherwise the slice start goes negative
print(f"Cropped size: {center_crop.shape}")
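Because cropping is plain slicing, two NumPy behaviors apply: a slice shares memory with the original image, and out-of-range indices do not raise an error. A minimal sketch of a safer helper follows. The name center_crop and its clamping behavior are my own choices for illustration, not an OpenCV function.

```python
import numpy as np

def center_crop(image, crop_width, crop_height):
    """Return a centered crop, clamped so it never exceeds the image."""
    height, width = image.shape[:2]
    crop_width = min(crop_width, width)
    crop_height = min(crop_height, height)
    x1 = (width - crop_width) // 2
    y1 = (height - crop_height) // 2
    # .copy() detaches the crop; a plain slice would share memory with `image`
    return image[y1:y1 + crop_height, x1:x1 + crop_width].copy()

image = np.zeros((200, 300, 3), dtype=np.uint8)  # synthetic 300x200 image
crop = center_crop(image, 100, 100)
print(crop.shape)  # (100, 100, 3)

too_big = center_crop(image, 1000, 1000)
print(too_big.shape)  # (200, 300, 3)
```

Without the .copy(), drawing on the crop would also draw on the original image, which is sometimes useful and sometimes a surprising bug.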
4. Color Space Conversions
The Big Three Color Spaces:
# Convert color image to grayscale (most common conversion)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Convert to HSV (great for color-based detection)
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
# Convert to RGB (if you need it for other libraries)
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
When to use each color space:
- Grayscale: When color doesn't matter (edge detection, text analysis)
- HSV: When detecting specific colors (finding red objects, green screens)
- RGB: When interfacing with other libraries (matplotlib, PIL)
Putting It All Together
def process_image_basic(input_path, output_path):
    """A practical example combining all basic operations"""
    # 1. Load the image
    image = cv2.imread(input_path)
    if image is None:
        print(f"Error: Cannot load {input_path}")
        return
    # 2. Resize if it's too large (common preprocessing step)
    height, width = image.shape[:2]
    if width > 1920:  # If wider than Full HD
        image = smart_resize(image, 1920)
        print("Resized large image")
    # 3. Convert to grayscale for processing
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # 4. Save the result
    cv2.imwrite(output_path, gray)
    print(f"Processed image saved to {output_path}")
# Example usage
process_image_basic('input.jpg', 'processed_output.jpg')
Image Filtering and Enhancement
Think of image filters as Instagram effects, but with a specific purpose. Each filter transforms your image to highlight certain features or remove unwanted elements.
The Essential Filters You'll Use
1. Blurring Filters (Removing Details)
Gaussian Blur - The "Soft Focus" Effect
# Gentle blur (like background in portrait mode)
gentle_blur = cv2.GaussianBlur(image, (15, 15), 0)
# Strong blur (for privacy, backgrounds)
strong_blur = cv2.GaussianBlur(image, (51, 51), 0)
# The (15, 15) is the blur kernel size - bigger numbers = more blur
# Both values must be positive odd numbers: (5,5), (15,15), (31,31), etc.
# The final 0 tells OpenCV to derive the Gaussian sigma from the kernel size
When to use Gaussian blur:
- Remove image noise before further processing
- Create background blur effects
- Smooth out details you don't need
Median Blur - The "Noise Remover"
# Great for removing "salt and pepper" noise (random white/black dots)
denoised = cv2.medianBlur(image, 5)
# Median blur preserves edges better than Gaussian blur
2. Sharpening (Enhancing Details)
def sharpen_image(image):
    """Make image details more crisp and defined"""
    # Create a sharpening kernel (like a recipe for sharpening)
    kernel = np.array([[-1, -1, -1],
                       [-1,  9, -1],
                       [-1, -1, -1]])
    # Apply the kernel to the image (-1 keeps the same bit depth as the input)
    sharpened = cv2.filter2D(image, -1, kernel)
    return sharpened
# Use it like this:
sharp_image = sharpen_image(image)
When to sharpen:
- Photos that look slightly out of focus
- Scanned documents that need crisper text
- Enhancing details before feature detection
3. Edge Detection (Finding Boundaries)
Canny Edge Detection - The Gold Standard
# Convert to grayscale first (edges work better on grayscale)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Detect edges
edges = cv2.Canny(gray, 50, 150)
# The numbers (50, 150) are the two hysteresis thresholds:
# - Below 50: gradient too weak, discarded
# - Above 150: strong edge, always kept
# - In between: kept only if connected to a strong edge
# Lower thresholds = more edges detected, higher thresholds = fewer edges
What edge detection shows you:
- Outlines of objects
- Boundaries between different regions
- Useful for shape detection and analysis
4. Morphological Operations (Shape Processing)
These operations are most often applied to binary (black and white) images to clean up shapes:
# First, create a binary image (usually from edge detection or thresholding)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
# Erosion - makes white regions smaller (removes noise)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
eroded = cv2.erode(binary, kernel, iterations=1)
# Dilation - makes white regions larger (fills gaps)
dilated = cv2.dilate(binary, kernel, iterations=1)
# Opening - erosion followed by dilation (removes noise)
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
# Closing - dilation followed by erosion (fills gaps)
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
When to use morphological operations:
- Clean up binary images from thresholding
- Remove small noise blobs
- Fill gaps in detected shapes
- Separate touching objects
Practical Filter Combinations
def enhance_document(image):
    """Common pipeline for scanned documents"""
    # 1. Convert to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # 2. Remove noise
    denoised = cv2.medianBlur(gray, 3)
    # 3. Sharpen text
    kernel = np.array([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]])
    sharpened = cv2.filter2D(denoised, -1, kernel)
    return sharpened
def prepare_for_analysis(image):
    """Common preprocessing for computer vision tasks"""
    # 1. Slight blur to remove noise
    smooth = cv2.GaussianBlur(image, (3, 3), 0)
    # 2. Convert to grayscale
    gray = cv2.cvtColor(smooth, cv2.COLOR_BGR2GRAY)
    # 3. Detect edges
    edges = cv2.Canny(gray, 50, 150)
    return edges
Feature Detection: Finding Important Points
Feature detection is like teaching a computer to identify "landmarks" in images - distinctive points that are easy to recognize and track.
Corner Detection: Finding Interesting Points
Why corners matter: Corners are stable, distinctive features that don't change much when lighting or viewpoint changes slightly.
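# Harris Corner Detection - finds sharp corners
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)  # cornerHarris is usually given floating-point input
# Arguments: blockSize=2 (neighborhood size), ksize=3 (Sobel aperture), k=0.04 (Harris parameter)
corners = cv2.cornerHarris(gray, 2, 3, 0.04)
# Mark strong responses (above 1% of the maximum) on a copy of the image
image_with_corners = image.copy()
image_with_corners[corners > 0.01 * corners.max()] = [0, 0, 255]  # Red pixels (BGR)
# A high response means the image changes strongly in more than one direction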
When to use corner detection:
- Tracking objects in video (corners are stable reference points)
- Image stitching (finding common points between photos)
- 3D reconstruction (matching the same corner in different views)
Template Matching: Finding Specific Objects
Template matching is like playing "Where's Waldo?" - finding a specific pattern in a larger image.
def find_object_in_image(main_image, template):
    """Find where a template appears in the main image"""
    # Convert both to grayscale
    gray_main = cv2.cvtColor(main_image, cv2.COLOR_BGR2GRAY)
    gray_template = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    # Perform template matching
    result = cv2.matchTemplate(gray_main, gray_template, cv2.TM_CCOEFF_NORMED)
    # Find the best match location
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
    # If the match score is high enough, we found it!
    if max_val > 0.8:  # score threshold (1.0 is a perfect match)
        h, w = gray_template.shape
        top_left = max_loc
        bottom_right = (top_left[0] + w, top_left[1] + h)
        # Draw rectangle around found object (this modifies main_image in place)
        cv2.rectangle(main_image, top_left, bottom_right, (0, 255, 0), 3)
        print(f"Object found with match score {max_val:.2f}")
    else:
        print("Object not found in image")
    return main_image
Object Detection: Recognizing Faces and More
Face Detection with Haar Cascades
Face detection is one of the most practical computer vision applications:
def detect_faces_simple(image):
    """Detect faces in an image using pre-trained classifier"""
    # Load the pre-trained face detector
    face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    # Convert to grayscale (face detection works better on grayscale)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Detect faces
    faces = face_cascade.detectMultiScale(
        gray,
        scaleFactor=1.1,   # How much to reduce image size at each scale
        minNeighbors=5,    # How many neighbors each face needs to be valid
        minSize=(30, 30)   # Minimum face size to detect
    )
    # Draw rectangles around detected faces
    for (x, y, w, h) in faces:
        cv2.rectangle(image, (x, y), (x+w, y+h), (255, 0, 0), 2)
    print(f"Found {len(faces)} faces!")
    return image
# Use it like this:
image_with_faces = detect_faces_simple(image)
How face detection works:
- Haar cascades are pre-trained classifiers that learned facial patterns
- The algorithm slides a window across the image at different scales
- For each window, it checks if the pattern looks like a face
- Multiple detections are combined to find final face locations
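The "slide a window at different scales" idea can be sketched in a few lines of plain Python. This is only an illustration of the search pattern: OpenCV's real implementation is far more optimized, and the function names, window size, and step below are hypothetical values I picked.

```python
def pyramid_sizes(width, height, scale_factor=1.1, min_size=30):
    """Yield successively smaller image sizes, like repeated downscaling."""
    while width >= min_size and height >= min_size:
        yield width, height
        width = int(width / scale_factor)
        height = int(height / scale_factor)

def sliding_windows(width, height, window=30, step=10):
    """Yield the top-left corner of every window position on one scale."""
    for y in range(0, height - window + 1, step):
        for x in range(0, width - window + 1, step):
            yield x, y

sizes = list(pyramid_sizes(60, 60))
print(sizes[0])  # (60, 60)
print(len(list(sliding_windows(50, 50))))  # 9

# Total number of windows a classifier would need to examine
total = sum(len(list(sliding_windows(w, h))) for w, h in sizes)
print(total)
```

A scale factor closer to 1.0 produces more pyramid levels and more windows: detection becomes more thorough but slower. That is the trade-off behind the scaleFactor parameter.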
Video Processing: Working with Moving Images
Video is just a sequence of images (frames) processed one by one:
def process_video_simple(video_path):
    """Basic video processing example"""
    # Open the video file
    cap = cv2.VideoCapture(video_path)
    while True:
        # Read one frame
        ret, frame = cap.read()
        if not ret:  # No more frames
            break
        # Process this frame (example: detect faces)
        processed_frame = detect_faces_simple(frame)
        # Display the frame
        cv2.imshow('Video', processed_frame)
        # Press 'q' to quit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    # Clean up
    cap.release()
    cv2.destroyAllWindows()
# For webcam instead of file:
# cap = cv2.VideoCapture(0)  # 0 is usually the default camera
Practical OpenCV Projects
Project 1: Color-Based Object Tracking
def track_colored_object(video_source=0):
    """Track objects of a specific color (like a red ball)"""
    cap = cv2.VideoCapture(video_source)
    # Define color ranges for red objects (in HSV)
    # OpenCV hue runs 0-179 and red wraps around both ends, so two ranges are needed
    lower_red1, upper_red1 = np.array([0, 50, 50]), np.array([10, 255, 255])
    lower_red2, upper_red2 = np.array([170, 50, 50]), np.array([179, 255, 255])
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # Convert to HSV color space (better for color detection)
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Create mask for red objects (union of both hue ranges)
        mask = cv2.bitwise_or(cv2.inRange(hsv, lower_red1, upper_red1),
                              cv2.inRange(hsv, lower_red2, upper_red2))
        # Find contours (object boundaries)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        # Draw circles around detected objects
        for contour in contours:
            if cv2.contourArea(contour) > 500:  # Ignore small objects
                (x, y), radius = cv2.minEnclosingCircle(contour)
                center = (int(x), int(y))
                radius = int(radius)
                cv2.circle(frame, center, radius, (0, 255, 0), 2)
                cv2.putText(frame, "Red Object", (center[0]-50, center[1]-50),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow('Color Tracking', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()
Project 2: Motion Detection Security Camera
def motion_detection_camera(video_source=0):
    """Detect motion and highlight moving objects"""
    cap = cv2.VideoCapture(video_source)
    # Read first frame as background
    ret, background = cap.read()
    if not ret:
        print("Error: Could not read from video source")
        cap.release()
        return
    background = cv2.cvtColor(background, cv2.COLOR_BGR2GRAY)
    background = cv2.GaussianBlur(background, (21, 21), 0)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # Prepare current frame
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (21, 21), 0)
        # Find difference from background
        diff = cv2.absdiff(background, gray)
        # Convert difference to binary image
        _, thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        # Find moving objects
        contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        # Draw rectangles around moving objects
        for contour in contours:
            if cv2.contourArea(contour) > 1000:  # Ignore small movements
                x, y, w, h = cv2.boundingRect(contour)
                cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 0, 255), 2)
                cv2.putText(frame, "Motion Detected", (x, y-10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
        cv2.imshow('Motion Detection', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()
Troubleshooting Common Issues
Image Loading Problems
# Always check if image loaded successfully
image = cv2.imread('photo.jpg')
if image is None:
    print("Error: Could not load image. Check file path and format.")
Video Capture Issues
# Check if camera/video opened successfully
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Error: Could not open video source")
    exit()
Display Problems
# After cv2.imshow() on a still image, add these lines; without waitKey the window never renders
cv2.waitKey(0)  # Wait for key press
cv2.destroyAllWindows()  # Close all windows
Best Practices for OpenCV Projects
1. Always Preprocess Your Images
def preprocess_image(image):
    """Standard preprocessing pipeline"""
    # 1. Resize if too large (speeds up processing), keeping the aspect ratio
    height, width = image.shape[:2]
    if width > 1920:
        scale = 1920 / width
        image = cv2.resize(image, (1920, int(height * scale)))
    # 2. Remove noise
    image = cv2.GaussianBlur(image, (3, 3), 0)
    # 3. Convert to grayscale if color not needed
    if len(image.shape) == 3:
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        return gray
    return image
2. Optimize for Real-Time Processing
# For video processing, reduce frame size for speed
def resize_for_speed(frame):
    height, width = frame.shape[:2]
    if width > 640:
        scale = 640 / width
        new_height = int(height * scale)
        return cv2.resize(frame, (640, new_height))
    return frame
3. Error Handling
def safe_imread(path):
    """Load image with error handling"""
    try:
        image = cv2.imread(path)
        if image is None:
            raise ValueError(f"Could not load image: {path}")
        return image
    except Exception as e:
        print(f"Error loading image: {e}")
        return None
Conclusion: Your OpenCV Journey
OpenCV transforms you from someone who just views images to someone who can teach computers to see and understand them. Here's what you've learned:
Core Concepts Mastered
- Image Fundamentals: Understanding images as arrays of numbers
- Basic Operations: Loading, resizing, cropping, and color conversion
- Filtering: Blurring, sharpening, edge detection, and noise removal
- Feature Detection: Finding corners, keypoints, and distinctive patterns
- Object Detection: Recognizing faces and tracking objects
- Video Processing: Working with real-time camera feeds
From Here to Computer Vision Expert
Beginner Projects (Start Here):
- Photo enhancement app (blur, sharpen, filters)
- Simple face detection in photos
- Color-based object tracking
Intermediate Projects:
- Motion detection security system
- Document scanner with perspective correction
- Real-time object counting
Advanced Applications:
- Custom object detection with machine learning
- Augmented reality applications
- 3D reconstruction from multiple views
Key Takeaways
- Start Simple: Master basic operations before attempting complex projects
- Preprocess Everything: Clean, resize, and enhance images before analysis
- Experiment with Parameters: Most OpenCV functions have tunable parameters
- Combine Techniques: Real applications use multiple techniques together
- Practice with Real Data: Work with your own images and videos
The Bigger Picture
OpenCV is your gateway to computer vision, but it's just the beginning. As you advance, you'll discover:
- Machine Learning Integration: Using OpenCV with TensorFlow, PyTorch
- Deep Learning: Modern neural networks for object detection and recognition
- Specialized Libraries: Domain-specific tools for medical imaging, robotics, etc.
The computer vision field is rapidly evolving, but the OpenCV fundamentals you've learned provide a solid foundation for any future developments.
Remember: Every expert was once a beginner. Start with simple projects, experiment fearlessly, and gradually tackle more complex challenges. Computer vision is both a technical skill and a creative tool - use it to build something amazing!
Quick Reference
Essential OpenCV Functions
# Loading and saving
cv2.imread('image.jpg')
cv2.imwrite('output.jpg', image)
# Basic operations
cv2.resize(image, (width, height))
cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image[y1:y2, x1:x2] # Cropping
# Filtering
cv2.GaussianBlur(image, (15, 15), 0)
cv2.Canny(gray, 50, 150)
# Detection
cv2.CascadeClassifier(cascade_path).detectMultiScale(gray)
cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
# Video
cv2.VideoCapture(0) # Camera
cv2.VideoCapture('video.mp4') # File