Scikit-Image Docs
I wanted to explore the scikit-image docs to see what this library has to offer in the ways of image processing.
Scikit-Image
scikit-image is a collection of algorithms for image processing.
References
Getting Started
scikit-image is an image processing Python package that works with numpy arrays. The package is imported as skimage. Most of the functions of skimage are found within submodules. Within skikit-image, images are represented as NumPy arrays, for example 2-D arrays for grayscale 2-D images. The skimage.data submodule provides a set of functions returning example images, that can be used to get started quickly on using scikit-image's functions. You can use natsort to load multiple files.
import skimage as ski
import matplotlib.pyplot as plt
camera = ski.data.camera()
# Images in Library
coins = ski.data.coins()
threshold_value = ski.filters.threshold_otsu(coins)
print(type(camera),camera.shape)
# Load Own Images
import os
keynes = ski.io.imread(os.path.join(os.getcwd(),'..','..','images','keynes.jpg'))
fig, ax = plt.subplots(1,2,layout="constrained")
ax[0].imshow(coins,cmap="gray")
ax[0].set_title("Data From scikit-image")
ax[1].imshow(keynes)
ax[1].set_title("Own Data (Keynes)")
plt.show()
# Load Multiple Images
path_to_img_dir = os.path.join(os.getcwd(),'..','..','images')
from natsort import natsorted, ns
list_files = os.listdir(path_to_img_dir)
list_files = natsorted(list_files)
fig, ax = plt.subplots(3,2,layout="constrained")
image_list = []
file_names = []
for filename in list_files:
image_list.append(ski.io.imread(os.path.join(path_to_img_dir,filename)))
file_names.append(filename.split(".")[0])
print(file_names)
curr = 0
for j in range(2):
for i in range(3):
if curr==1 and i == 2:
break
ax[i][j].imshow(image_list[curr*2 + i+j])
ax[i][j].set_title(file_names[curr*2 + i+j])
if i==2:
curr+=1
ax[2,1].remove()
plt.show()
A Crash Course on NumPy for Images
Images in scikit-image are represented as NumPy ndarrays. Hence, many common operations can be achieved using standard NumPy methods for manipulating arrays. I have already gone over the common methods of indexing of NumPy arrays, so I won't go into depth here. A color image is a NumPy array with an addition trailing dimension for the channels. You can set which axis contains the color with the channel_axis argument. Two-dimensional grayscale images are indexed (row,column). In the case of multichannel images, any dimension can be used for color channels, but the dimension is usually last (channel_axis=-1 is default).
When processing must be done plane wise and since planes are stacked along teh leasing dimensions, teh following syntax can be used:
for pln, image in enumerate(im3d):
img3d[pln] = ski.filters.sobel(image)
import numpy as np
camera = ski.data.camera()
print("Geometry of the image (px):",camera.shape)
print("Number of Pixels:",camera.size)
print("Statistical Information about Intensity Values (min,max,mean):",(camera.min(),camera.max(),camera.mean()))
print("Pixel value:",camera[10,20])
fig, ax = plt.subplots(1,2,layout="constrained")
ax[0].imshow(camera,cmap="gray")
camera[3,10] = 0 # Setting values
camera [:100] = 0
camera[:,-100:] = 255
# Fancy Indexing
inds_r = np.arange(len(camera))
inds_c = 4 * inds_r % len(camera)
camera[inds_r, inds_c] = 0
nrows, ncols = camera.shape
row, col = np.ogrid[:nrows, :ncols]
cnt_row, cnt_col = nrows / 2, ncols / 2
outer_disk_mask = ((row - cnt_row)**2 + (col - cnt_col)**2 >
(nrows / 2)**2)
camera[outer_disk_mask] = 0
ax[1].imshow(camera,cmap="gray")
ax[0].set_title("Before Transformation")
ax[1].set_title("After Transformation")
plt.show()
## Color Images
cat = ski.data.chelsea()
print("Color Images:",type(cat),cat.shape)
print("Color Image Channel Values:",cat[10,20])
fig, ax = plt.subplots(2,2,layout="constrained")
ax[0,0].imshow(cat)
ax[0,0].set_title("Original Image")
red = cat[:,:,0].copy()
green = cat[:,:,0].copy()
blue = cat[:,:,0].copy()
cat[:,:,0] = 0
ax[0,1].imshow(cat)
ax[0,1].set_title("No Red")
cat[:,:,0] = red
cat[:,:,1] = 0
ax[1,0].imshow(cat)
ax[1,0].set_title("No Green")
cat[:,:,1] = green
cat[:,:,2] = 0
ax[1,1].imshow(cat)
ax[1,1].set_title("No Blue")
cat[:,:,2] = blue
plt.show()
eagle = ski.data.eagle()
Image Data Types and What They Mean
Note that float images should be restriced to range [−1,1] even through the data itself can exceed this range. The integer data types can span their whole ranges though. Functions in scikit-image usually can accept any data type but may return a different data type. If the image needs to be converted to a different data type to run the function, it will be converted and a warning message will be printed to the console. Conversion of types can result in a loss in precision. Note that OpenCV uses BGR format instead of RBG. Conversion to RGB from BGR and vice-versa: image = image[:, :, ::-1].When possible, functions should avoid blindly stretching image intensities (e.g. rescaling a float image so that the min and max intensities are 0 and 1), since this can heavily distort an image. Negative values are clipped to 0 when converting from signed to unsigned dtypes.
Handling Video Files
Sometimes it is necessary to read a sequence of images from a standard video file, such as .avi and .mov files.
In a scientific context, it is usually better to avoid these formats in favor of a simple directory of images or a multi-dimensional TIF. Video formats are more difficult to read piecemeal, typically do not support random frame access or research-minded meta data, and use lossy compression if not carefully configured.
Converting Video To Image Sequence
The simplest, surest route is to convert the video to a collection of sequentially numbered image files, often called image sequence. Then the image files can be read into an ImageCollection by skimage.io.read_collection. In FFmpeg, the following command generated an image file from each frame in a video. The files are numbered with five digits, passed on the left with zeros.
ffmpeg -i "video.mov" -f image2 "video-frame%05d.png"
Generating an image sequence can take some time. It is generally preferable to work directly with the original video file. There are several ways to use FFmpeg and LibAV, two open-source project that decode video from a sprawling number of formats, in Python:
Data Visualization
Data visualization takes an important place in image processing. Data can be a single 2D grayscale image or a more complex one with multidimensional aspects.Tools to look into:
Image Adjustment: Transforming Image Content
Color Manipulation
Most functions for manipulating color channels are found in the submodule skimage.color.
Conversion Between Color Models
skimage.color provides utility functions to convert images to and from different color spaces (RGB to HSV). Integer-type arrays can be transformed to floating point type by the conversion operation.
An inverted image is also called complementary image. For binary images, True values become False and conversely. For grayscale images, pixel values are replaced by the difference of the maximum value of the data type and the actual value. For RGB images, the same operation is done for each channel. This operation can be achieved with skimage.util.invert()
from skimage import color
# bright saturated red
red_pixel_rgb = np.array([[[255, 0, 0]]], dtype=np.uint8)
print(color.rgb2hsv(red_pixel_rgb))
# darker saturated blue
dark_blue_pixel_rgb = np.array([[[0, 0, 100]]], dtype=np.uint8)
print(color.rgb2hsv(dark_blue_pixel_rgb))
# less saturated pink
pink_pixel_rgb = np.array([[[255, 100, 255]]], dtype=np.uint8)
print(color.rgb2hsv(pink_pixel_rgb))
# Conversion from RGBA to RGB
img_rgba = ski.data.logo()
img_rgb = ski.color.rgba2rgb(img_rgba)
fig, ax = plt.subplots(1,2,layout="constrained")
ax[0].imshow(img_rgba)
ax[1].imshow(img_rgb)
ax[0].set_title("RGBA")
ax[1].set_title("RGB")
plt.show()
# Conversion From Color to Greyscale
img = ski.data.astronaut()
img_gray = ski.color.rgb2gray(img)
fig, ax = plt.subplots(1,2,layout="constrained")
ax[0].imshow(img)
ax[1].imshow(img_gray,cmap="gray")
ax[0].set_title("Color")
ax[1].set_title("Greyscale")
plt.show()
Geometric Transformations of Images
Cropping, Resizing, and Rescaling Images
Images being NumPy arrays, cropping an image can be done with simple slicing operations. In order to change the shape of the image, skimage.color provides several functions
Projective Transforms (Homographies)
Homographies are transformations of a Euclidean space that preserve the alignment of points. Specific cases of homographies correspond to the conservation of more properties, such as parallelism (affine transformation), shape (similar transformation) or distances (Euclidean transformation).
# Resizing and Scaling
from skimage import data, color
from skimage.transform import rescale, resize, downscale_local_mean
image = color.rgb2gray(data.astronaut())
image_rescaled = rescale(image, 0.25, anti_aliasing=False)
image_resized = resize(
image, (image.shape[0] // 4, image.shape[1] // 4), anti_aliasing=True
)
image_downscaled = downscale_local_mean(image, (4, 3))
Tutorials
Image Segmentation
Image segmentation is the task of labeling the pixels of objects of interest in an image. In this tutorial, we will see how to segment objects in a background. With this image, the segmentation of coins cannot be done directly from the histogram of gray values, because the background shared enough gray levels with the coins that a thresholding segmentation is not sufficient.Simply thresholding the image leads either to missing significant parts of the coins, or to merging parts of the background with the coins. This is due to inhomogeneous lighting. A first idea is to take advantage of the local contrast, that is, to use gradients rather than gray values.
Edge Based Segmentation
To detect edges, we will use the Canny detector of skimage.feature.canny. As the baground is smooth, almost all edges are found at the boundary or inside the coins. When we have the edges, we can fill the inner part of the holes using the scipy.ndimage.binary_fill_holes() function, which uses mathematical morphology to fill the holes. You can use the ndi.label function to remove objects smaller than a threshold (remove noise). One of the coins is not detected: its edge is not closed. Therefor, this segmentation method is not robust.
Region based Segmentation
First determine markers that you can unambiguously label as object or background. These markers will be used n watershed segmentation. The watershed transform floods an image of elevation starting from markers, in order to determine the catchment basins of these markers. Watershed lines separate these catchment basins, and correspond to the desired segmentation. The choice of an elevation map critical for segmentation. Here, the amplitude of the gradient provides a good elevation map. With this method, the result is satisfying for all coins. Even if the markers for the background were not well distributed, the barriers in the elevation map were high enough for these markers to flood the entire background.
coins = ski.data.coins()
hist, hist_centers = ski.exposure.histogram(coins)
fig, ax = plt.subplots(1,2,layout="constrained",figsize=(8,4))
ax[0].imshow(coins,cmap="gray")
ax[1].plot(hist)
ax[1].set_title("histogram of gray values")
plt.show()
coins_100 = coins.copy()
coins_100[coins > 100] = 255
coins_100[coins <= 100] = 0
coins_150 = coins.copy()
coins_150[coins > 150] = 255
coins_150[coins <= 150] = 0
fig, ax = plt.subplots(1,2,layout="constrained",figsize=(8,4))
ax[0].imshow(coins_100,cmap="gray")
ax[0].set_title("coins > 100")
ax[1].imshow(coins_150,cmap="gray")
ax[1].set_title("coins > 150")
plt.show()
edges = ski.feature.canny(coins / 255.)
import scipy as sp
fill_coins = sp.ndimage.binary_fill_holes(edges)
fig, ax = plt.subplots(1,2,layout="constrained",figsize=(8,4))
filled_coins = np.zeros((fill_coins.shape[0],fill_coins.shape[1]))
filled_coins[fill_coins==True] = 255
filled_coins[fill_coins==False] = 0
ax[0].imshow(filled_coins,cmap="gray")
ax[0].set_title("Before Removing Small Objects")
# Removing Small Objects
label_objects, nb_labels = sp.ndimage.label(fill_coins)
sizes = np.bincount(label_objects.ravel())
mask_sizes = sizes > 20
mask_sizes[0] = 0
coins_cleaned = mask_sizes[label_objects]
filled_coins = np.zeros((coins_cleaned.shape[0],coins_cleaned.shape[1]))
filled_coins[coins_cleaned==True] = 255
filled_coins[coins_cleaned==False] = 0
ax[1].imshow(filled_coins,cmap="gray")
ax[1].set_title("After Removing Small Objects")
plt.show()
### Region Based Segmentation
markers = np.zeros_like(coins)
markers[coins < 30] = 1
markers[coins > 150] = 2
elevation_map = ski.filters.sobel(coins)
from matplotlib.ticker import LinearLocator
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
X = np.array([np.ones(coins.shape[1])*i for i in range(coins.shape[0])])
Y = np.array([np.arange(coins.shape[1]) for i in range(coins.shape[0])])
# Plot the surface.
surf = ax.plot_surface(X, Y, elevation_map, cmap=plt.cm.nipy_spectral,
linewidth=0, antialiased=False)
ax.set_zlim(0, 0.5)
ax.zaxis.set_major_locator(LinearLocator(0))
ax.zaxis.set_major_formatter('{x:.02f}')
ax.view_init(80, 90)
plt.show()
fig, ax = plt.subplots(1,1,layout="constrained")
ax.imshow(elevation_map,cmap="gray")
ax.set_title("elevation map")
plt.show()
markers = np.zeros_like(coins)
markers[coins < 30] = 1
markers[coins > 150] = 2
fig, ax = plt.subplots(1,1,layout="constrained")
ax.imshow(markers,cmap=plt.cm.nipy_spectral)
ax.set_title("markers")
plt.show()
segmentation = ski.segmentation.watershed(elevation_map, markers)