Sandipan DeyApr 4, 2018
Some Reinforcement Learning: The Greedy and ExploreExploit Algorithms for the MultiArmed Bandit Framework in Python
In this article the multiarmed bandit framework problem and a few algorithms to solve the problem is going to be discussed. This problem appeared as a lab assignment in the edX course DAT257x: Reinforcement Learning Explained by Microsoft. The problem description is taken from the assignment itself.
The Problem Statement and Some Theory
Given a set of actions with some unknown reward distributions, maximize the cumulative reward by taking the actions sequentially, one action at each time step and obtaining a reward immediately. This is the traditional exploreexploit problem in reinforcement learning. In order to find the optimal action, one needs to explore all the actions but not too much. At the same time, one needs to exploit the best action found sofar by exploring.
The following figure defines the problem mathematically and shows the explorationexploitation dilemma in a general setting of the reinforcement learning problems with sequential decision with incomplete information.
The following figure shows a motivating application of the multiarmed bandit problem in drug discovery. Given a set of experimental drugs (each of which can be considered an arm in the bandit framework) to be applied on a set of patients sequentially, with a reward 1 if a patient survives after the application of the drug and 0 if he dies, the goal is to save as many patients as we can.
The following figures show the naive and a few variants of the greedy algorithms for maximizing the cumulative rewards.
 The Naive RoundRobin algorithm basically chooses every action once to complete a round and repeats the rounds. Obviously it’s far from optimal since it explores too much and exploits little.
 The greedy algorithm tries to choose the arm that has maximum average reward, with the drawback that it may lockon to a suboptimal action forever.
 The epsilon greedy and optimistic greedy algorithms are variants of the greedy algorithm that try to recover from the drawback of the greedy algorithm. Epsilongreedy chooses an action uniformly at random with probability epsilon, whereas
the optimistic greedy algorithm initialized the estimated reward for each action to a high value, in order to prevent locking to a suboptimal action.
The following code from the github repository of the same course shows how the basic bandit Framework can be defined:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103

import numpy as np
import sys
class Environment( object ):
def reset( self ):
raise NotImplementedError( 'Inheriting classes must override reset.' )
def actions( self ):
raise NotImplementedError( 'Inheriting classes must override actions.' )
def step( self ):
raise NotImplementedError( 'Inheriting classes must override step' )
class ActionSpace( object ):
def __init__( self , actions):
self .actions = actions
self .n = len (actions)
class BanditEnv(Environment):
def __init__( self , num_actions = 10 , distribution = "bernoulli" , evaluation_seed = "387" ):
super (BanditEnv, self ).__init__()
self .action_space = ActionSpace( range (num_actions))
self .distribution = distribution
np.random.seed(evaluation_seed)
self .reward_parameters = None
if distribution = = "bernoulli" :
self .reward_parameters = np.random.rand(num_actions)
elif distribution = = "normal" :
self .reward_parameters = (np.random.randn(num_actions),
np.random.rand(num_actions))
elif distribution = = "heavytail" :
self .reward_parameters = np.random.rand(num_actions)
else :
print ( "Please use a supported reward distribution" )
sys.exit( 0 )
if distribution ! = "normal" :
self .optimal_arm = np.argmax( self .reward_parameters)
else :
self .optimal_arm = np.argmax( self .reward_parameters[ 0 ])
def reset( self ):
self .is_reset = True
return None
def compute_gap( self , action):
if self .distribution ! = "normal" :
gap = np.absolute( self .reward_parameters[ self .optimal_arm] 
self .reward_parameters[action])
else :
gap = np.absolute( self .reward_parameters[ 0 ][ self .optimal_arm] 
self .reward_parameters[ 0 ][action])
return gap
def step( self , action):
self .is_reset = False
valid_action = True
if (action is None or action = self .action_space.n):
print ( "Algorithm chose an invalid action; reset reward to inf" )
reward = float ( "inf" )
gap = float ( "inf" )
valid_action = False
if self .distribution = = "bernoulli" :
if valid_action:
reward = np.random.binomial( 1 , self .reward_parameters[action])
gap = self .reward_parameters[ self .optimal_arm] 
self .reward_parameters[action]
elif self .distribution = = "normal" :
if valid_action:
reward = self .reward_parameters[ 0 ][action] + self .reward_parameters[ 1 ][action] * np.random.randn()
gap = self .reward_parameters[ 0 ][ self .optimal_arm]  self .reward_parameters[ 0 ][action]
elif self .distribution = = "heavytail" :
if valid_action:
reward = self .reward_parameters[action] + np.random.standard_cauchy()
gap = self .reward_parameters[ self .optimal_arm]  self .reward_parameters[action]
else :
print ( "Please use a supported reward distribution" )
sys.exit( 0 )
return ( None , reward, self .is_reset, '')
class Policy:
def __init__( self , num_actions):
self .num_actions = num_actions
def act( self ):
pass
def feedback( self , action, reward):
pass

Now in order to implement an algorithm we need to just extend (inherit from) the Policy base class (interface) and implement the functions act() and feedback() for that algorithm (policy).
In order to theoretically analyze the greedy algorithms and find algorithms that have better performance guarantees, let’s define regret as the gap in between the total expected reward with the action chosen by the optimal policy and the cumulative reward with a set of actions chosen by any algorithm (assuming that the reward distributions are known), as shown in the following figure. Hence, maximizing cumulative reward is equivalent to minimizing the regret.
Given that greedy exploits too much an epsilon greedy explores too much, it can be shown that all the greedy variants have regrets linear in the number of timesteps T.
Also, it was theoretically proven by Lai and Robbins,that the lower bound on the regret is logarithmic in the number of timesteps T.
 The next figure shows the reward distributions for a 5armed bandit framework.
 If we take an action A_i, with probability p_i we shall get a reward of 1 and with probability 1p_i we shall get a reward of 0.
 Hence, each arm A_i has reward that is distributed as a Bernoulli random variable with some parameter p_i, so we have R_i ~ B(p_i), i=0..4.
 We first generate the parameters for the distribution corresponding to each arm randomly.
 As we can see, the arm 2 has the highest p_i, so if we choose arm 2, we have the highest probability to get a reward of 1 at any timestep.
 Obviously, the arm 2 is the arm/action that gives the most promising reward and any optimal policy should choose that arm at all timesteps.
Now, let’s assume that we don’t know p_i values and we use the Greedy and the Naive RoundRobin algorithms to maximize the cumulative rewards over 10000 timesteps.
As can be seen from the next animations and figures
 Greedy algorithm locks into the arm/action 0 and can’t find the optimal action.
 Round robin algorithm chooses the actions uniformly and can’t find the optimal action.
 Both Greedy and RoundRobin has linear regrets (w.r.t. timesteps).
 In this case the greedy algorithm even with suboptimal action still performs relatively better than the roundrobin.
Greedy
RoundRobin
 The next figure shows the reward distributions for a 10armed bandit framework.
 If we take an action A_i, with probability p_i we shall get a reward of 1 and with probability 1p_i we shall get a reward of 0.
 Hence, each arm A_i has reward that is distributed as a Bernoulli random variable with some parameter p_i, so we have R_i ~ B(p_i), i=0..9.
 We first generate the parameters for the distribution corresponding to each arm randomly.
 As we can see, the arm 6 has the highest p_i, so if we choose arm 6, we have the highest probability to get a reward of 1 at any timestep.
 Obviously, the arm 6 is the arm/action that gives the most promising reward and any optimal policy should choose that arm at all timesteps.
Again, let’s assume that we don’t know p_i values and we use the EpsilonGreedy (with different values of the hyperparameter ε) and the OptimisticGreedy (with different values of the hyperparameter R) algorithms to maximize the cumulative rewards over 10000 timesteps.
As can be seen from the next animations and figures
 EpsilonGreedy algorithm behaves exactly like Greedy when ε = 0 and behaves randomly when ε = 1. For both of these cases, the εgreedy algorithm has linear regret.
 ε = 0.1 and ε = 0.15 find the optimal arm 6 eventually and they have sublinear regrets.
 OptimisticGreedy algorithm behaves exactly like Greedy when R = 0 and behaves randomly when R = 10000. For both of these cases, the εgreedy algorithm has linear regret.
 R = 3 and R = 5 find the optimal arm 6 eventually and they have sublinear regrets(w.r.t. timesteps).
EpsilonGreedy
ε = 0
ε = 0.05
ε = 0.1
ε = 0.15
ε = 1.0
Optimistic Greedy
R = 0
R = 1
R = 3
R = 5
R = 10000
The next figure shows two algorithms (UCB and Bayesian ThompsonBeta Posterior Sampling) that achieve logarithmic regret.
 The next figure shows the reward distributions for a 10armed bandit framework.
 If we take an action A_i, with probability p_i we shall get a reward of 1 and with probability 1p_i we shall get a reward of 0.
 Hence, each arm A_i has reward that is distributed as a Bernoulli random variable with some parameter p_i, so we have R_i ~ B(p_i), i=0..9.
 We first generate the parameters for the distribution corresponding to each arm randomly.
 As we can see, the arm 4 has the highest p_i, so if we choose arm 4, we have the highest probability to get a reward of 1 at any timestep.
 Obviously, the arm 4 is the arm/action that gives the most promising reward and any optimal policy should choose that arm at all timesteps.
Again, let’s assume that we don’t know p_i values, we implement and use the UCB1 and ThompsonBeta algorithms to maximize the cumulative rewards over 10000 timesteps.
As can be seen from the next animations and figures
 Both the algorithms find the optimal arm 4 pretty quickly without much of exploration.
 Both the algorithms achieve logarithmic regret when the (unknown) reward distribution is Bernoulli.
 For posterior sampling with Thompson Beta, each arm’s reward is sampled from the posterior β distribution from the same exponential family with the unknown Bernoulli distributed rewards, starting with the noninformative flat prior.
 Posterior R_i ~ β(a_i, b_i) where a_i is the number of times we obtained a reward 0 and b_i is the number of times we obtained a reward 1, when the arm A_i was drawn, as shown in the figure.
 The Thompson sampling gives linear regret when the (unknown) reward distribution is normal.
UCB
Thompson Beta
Thompson Normal
>
Sandipan DeyFeb 28, 2018
Implementing LucasKanade Optical Flow algorithm in Python
In this article an implementation of the LucasKanade optical flow algorithm is going to be described. This problem appeared as an assignment in a computer vision course from UCSD. The inputs will be sequences of images (subsequent frames from a video) and the algorithm will output an optical flow field (u, v) and trace the motion of the moving objects. The problem description is taken from the assignment itself.
Problem Statement
SingleScale Optical Flow
 Let’s implement the singlescale LucasKanade optical flow algorithm. This involves finding the motion (u, v) that minimizes the sumsquared error of the brightness constancy equations for each pixel in a window. The algorithm will be implemented as a function with the following inputs:
def optical_flow(I1, I2, window_size, tau) # returns (u, v)
 Here, u and v are the x and y components of the optical flow, I1 and I2 are two images taken at times t = 1 and t = 2 respectively, and window_size is a 1 × 2 vector storing the width and height of the window used during flow computation.
 In addition to these inputs, a theshold τ should be added, such that if τ is larger than the smallest eigenvalue of A’A, then the the optical flow at that position should not be computed. Recall that the optical flow is only valid in regions where
has rank 2, which is what the threshold is checking. A typical value for τ is 0.01.
 We should try experimenting with different window sizes and find out the tradeoffs associated with using a small vs. a large window size.
 The following figure describes the algorithm, which considers a nxn (n>=3) window around each pixel and solves a leastsquare problem to find the best flow vectors for the pixel.
 The following codesnippet shows how the algorithm is implemented in python for a graylevel image.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33

import numpy as np
from scipy import signal
def optical_flow(I1g, I2g, window_size, tau = 1e  2 ):
kernel_x = np.array([[  1. , 1. ], [  1. , 1. ]])
kernel_y = np.array([[  1. ,  1. ], [ 1. , 1. ]])
kernel_t = np.array([[ 1. , 1. ], [ 1. , 1. ]])
w = window_size / 2
I1g = I1g / 255.
I2g = I2g / 255.
mode = 'same'
fx = signal.convolve2d(I1g, kernel_x, boundary = 'symm' , mode = mode)
fy = signal.convolve2d(I1g, kernel_y, boundary = 'symm' , mode = mode)
ft = signal.convolve2d(I2g, kernel_t, boundary = 'symm' , mode = mode) +
signal.convolve2d(I1g,  kernel_t, boundary = 'symm' , mode = mode)
u = np.zeros(I1g.shape)
v = np.zeros(I1g.shape)
for i in range (w, I1g.shape[ 0 ]  w):
for j in range (w, I1g.shape[ 1 ]  w):
Ix = fx[i  w:i + w + 1 , j  w:j + w + 1 ].flatten()
Iy = fy[i  w:i + w + 1 , j  w:j + w + 1 ].flatten()
It = ft[i  w:i + w + 1 , j  w:j + w + 1 ].flatten()
nu = ...
u[i,j] = nu[ 0 ]
v[i,j] = nu[ 1 ]
return (u,v)

Some Results
 The following figures and animations show the results of the algorithm on a few image sequences. Some of these input image sequences / videos are from the course and some are collected from the internet.
 As can be seen, the algorithm performs best if the motion of the moving object(s) in between consecutive frames is slow. To the contrary, if the motion is large, the algorithm fails and we should implement / use multiplescale version LucasKanade with image pyramids.
 Finally, with small window size, the algorithm captures subtle motions but not large motions. With large size it happens the other way.
Input Sequences
Output Optical Flow with different window sizes
window size = 15
window size = 21
Input Sequences
Output Optical Flow
Input Sequences (hamburg taxi)
Output Optical Flow
Input Sequences
Output Optical Flow
Input Sequences
Output Optical Flow
Input Sequences
Output Optical Flow
Input Sequences
Output Optical Flow
Input Sequences
Output Optical Flow
Input Sequences
Output Optical Flow
Input Sequences
Output Optical Flow
Output Optical Flow
Input Sequences
Output Optical Flow with window size 45
Output Optical Flow with window size 10
Output Optical Flow with window size 25
Output Optical Flow with window size 45
Implementing LucasKanade Optical Flow algorithm in Python
Sandipan DeyFeb 28, 2018
GraphBased Image Segmentation in Python
In this article, an implementation of an efficient graphbased image segmentation technique will be described, this algorithm was proposed by Felzenszwalb et. al. from MIT. The slides on this paper can be found from Stanford Vision Lab.. The algorithm is closely related to Kruskal’s algorithm for constructing a minimum spanning tree of a graph, as stated by the author and hence can be
implemented to run in O(m log m) time, where m is the number of edges in the graph.
Problem Definition and the basic idea (from the paper)
 Let G = (V, E) be an undirected graph with vertices vi ∈ V, the set of elements to be segmented, and edges (vi, vj ) ∈ E corresponding to pairs of neighboring vertices. Each edge (vi, vj ) ∈ E has a corresponding weight w((vi, vj)), which is a nonnegative measure of the dissimilarity between neighboring elements vi and vj.

In the case of image segmentation, the elements in V are pixels and the weight of an edge is some measure of the dissimilarity between the two pixels connected by that edge (e.g., the difference in intensity, color, motion, location or some other local attribute).

Particularly for the implementation described here, an edge weight functionbased on the absolute intensity difference (in the yiq space) between the pixels connected by an edge, w((vi, vj )) = I(pi) − I(pj ).

In the graphbased approach, a segmentation S is a partition of V into components
such that each component (or region) C ∈ S corresponds to a connected component
in a graph G0 = (V, E0), where E0 ⊆ E.

In other words, any segmentation is induced by a subset of the edges in E. There are different ways to measure the quality of a segmentation but in general we want the elements in a component to be similar, and elements in different components to be dissimilar.

This means that edges between two vertices in the same component should have relatively low weights, and edges between vertices in different components should have higher weights.

The next figure shows the steps in the algorithm. The algorithm is very similar to Kruskal’s algorithm for computing the MST for an undirected graph.

The threshold function τ controls the degree to which the difference between two
components must be greater than their internal differences in order for there to be
evidence of a boundary between them.

For small components, Int(C) is not a good estimate of the local characteristics of the data. In the extreme case, when C = 1, Int(C) = 0. Therefore, a threshold function based on the size of the component, τ (C) = k/C is needed to be used, where C denotes the size of C, and k is some constant parameter.

That is, for small components we require stronger evidence for a boundary. In practice k sets a scale of observation, in that a larger k causes a preference for larger components.

In general, a Gaussian filter is used to smooth the image slightly before computing the edge weights, in order to compensate for digitization artifacts. We always use a Gaussian with σ = 0.8, which does not produce any visible change to the image but helps remove artifacts.

The following python code shows how to create the graph.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

import numpy as np
from scipy import signal
import matplotlib.image as mpimg
def gaussian_kernel(k, s = 0.5 ):
probs = [exp(  z * z / ( 2 * s * s)) / sqrt( 2 * pi * s * s) for z in range (  k,k + 1 )]
return np.outer(probs, probs)
def create_graph(imfile, k = 1. , sigma = 0.8 , sz = 1 ):
rgb = mpimg.imread(imfile)[:,:,: 3 ]
gauss_kernel = gaussian_kernel(sz, sigma)
for i in range ( 3 ):
rgb[:,:,i] = signal.convolve2d(rgb[:,:,i], gauss_kernel, boundary = 'symm' , mode = 'same' )
yuv = rgb2yiq(rgb)
(w, h) = yuv.shape[: 2 ]
edges = {}
for i in range (yuv.shape[ 0 ]):
for j in range (yuv.shape[ 1 ]):
for i1 in range (i  1 , i + 2 ):
for j1 in range (j  1 , j + 2 ):
if i1 = = i and j1 = = j: continue
if i1 > = 0 and i1 = 0 and j1 < h:
wt = np. abs (yuv[i,j, 0 ]  yuv[i1,j1, 0 ])
n1, n2 = ij2id(i,j,w,h), ij2id(i1,j1,w,h)
edges[n1, n2] = edges[n2, n1] = wt
return edges

Some Results
 The images are taken from the paper itself or from the internet. The following figures and animations show the result of segmentation as a result of iterative merging of the components (by choosing least weight edges), depending on the internal difference of the components.
 Although in the paper the author described the best value of the parameter k to be around 300, but since in this implementation the pixel RGB values are normalized (to have values in between 0 – 1) and then converted to YIQ values and the YIQ intensities are used for computing the weights (which are typically very small), the value of k that works best in this scenario is 0.0010.01.
 As we can see from the below results, higher the value of the parameter k, larger the size of the final component and lesser the number of components in the result.
 The minimum spanning tree creation is also shown, the red edges shown in the figures are the edges chosen by the algorithm to merge the components.
Input Image
Output Images for two different values of the parameter k
The forest created after a few iterations
Input Image
Output Images for two different values of the parameter k
The forest created after a few iterations
Input Image
Output Segmented Images
Input Image
Segmented Output Images
Input Image
Output Images for two different values of the parameter k
The forest created after a few iterations
Input Image (UMBC)
Segmented Output Images
Input Image
Segmented Output Image with k=0.001
Input Image
Segmented Output Image with k=0.001
Input Image
Segmented Output Image with k=0.001
Input Image (liver)
Segmented Output Images
Input Image
Segmented Output Images with different values of k
Input Image
Segmented Output Image
Input Image
Segmented Output Images for different k
GraphBased Image Segmentation in Python
Sandipan DeyFeb 13, 2018
Image Colorization Using Optimization in Python
This article is inspired by this SIGGRAPH paper by Levin et. al, for which they took this patent , the paper was referred to in the course CS1114 from Cornell. This method is also discussed in the coursera online image processing course by NorthWestern University. Some part of the problem description is taken from the paper itself. Also, one can refer to the implementation provided by the authors in matlab, the following link and the following python implementation in github.
The Problem
Colorization is a computerassisted process of adding color to a monochrome image or movie. In the paper the authors presented an optimizationbased colorization method that is based on a simple premise: neighboring pixels in spacetime that have similar intensities should have similar colors.
This premise is formulated using a quadratic cost function and as an optimization problem. In this approach an artist only needs to annotate the image with a few color scribbles, and the indicated colors are automatically propagated in both space and time to produce a fully colorized image or sequence.
In this article the optimization problem formulation and the way to solve it to obtain the automatically colored image will be described for the images only.
The Algorithm
YUV/YIQ color space is chosen to work in, this is commonly used in video, where Y is the monochromatic luminance channel, which we will refer to simply as intensity, while Uand V are the chrominance channels, encoding the color.
The algorithm is given as input an intensity volume Y(x,y,t) and outputs two color volumes U(x,y,t) and V(x,y,t). To simplify notation the boldface letters are used (e.g. r,s) to denote triplets. Thus, Y(r) is the intensity of a particular pixel.
Now, One needs to impose the constraint that two neighboring pixels r,s should have similar colors if their intensities are similar. Thus, the problem is formulated as the following optimization problem that aims to minimize the difference between the colorU(r) at pixel r and the weighted average of the colors at neighboring pixels, where w(r,s)is a weighting function that sums to one, large when Y(r) is similar to Y(s), and small whenthe two intensities are different.
When the intensity is constant the color should be constant, and when the intensity is an edge the color should also be an edge. Since the cost functions are quadratic and
the constraints are linear, this optimization problem yields a large, sparse system of linear equations, which may be solved using a number of standard methods.
As discussed in the paper, this algorithm is closely related to algorithms proposed for other tasks in image processing. In image segmentation algorithms based on normalized cuts [Shi and Malik 1997], one attempts to find the second smallest eigenvector of the matrix D − W where W is a npixels×npixels matrix whose elements are the pairwise affinities between pixels (i.e., the r,s entry of the matrix is w(r,s)) and D is a diagonal matrix whose diagonal elements are the sum of the affinities (in this case it is always 1). The second smallest eigenvector of any symmetric matrix A is a unit norm vector x that minimizes and is orthogonal to the first eigenvector. By direct inspection, the quadratic form minimized by normalized cuts is exactly the cost function J, that is . Thus, this algorithm minimizes the same cost function but under different constraints. In image denoising algorithms based on anisotropic diffusion[Perona and Malik 1989; Tang et al. 2001] one often minimizes a function
similar to equation 1, but the function is applied to the image intensity as well.
The following figures show an original grayscale image and the marked image with colorscribbles that are going to be used to compute the output colored image.
Original Grayscale Image Input
Grayscale image Input Marked with ColorScribbles
Implementation of the Algorithm
Here are the the steps for the algorithm:
 Convert both the original grayscale image and the marked image (marked with color scribbles for a few pixels by the artist) to from RGB to YUV / YIQ color space.
 Compute the difference image from the marked and the grayscale image. The pixels that differ are going to be pinned and they will appear in the output, they are directly going to be used as standalone constraints in the minimization problem.
 We need to compute the weight matrix W that depends on the similarities in the neighbor intensities for a pixel from the original grayscale image.
 The optimization problem finally boils down to solving the system of linear equations of the form that has a closed form leastsquare solution
. Same thing is to be repeated for the V channel too.
 However the W matrix is going to be very huge and sparse, hence sparsematrix based implementations must be used to obtain an efficient solution. However, in this python implementation in github, the scipy sparse lil_matrixwas used when constructing the sparse matrices, which is quite slow, we can construct more efficient scipy csc matrix rightaway, by using a dictionary to store the weights initially. It is much faster. The python code in the next figure shows my implementation for computing the weight matrix W.
 Once W is computed it’s just a matter of obtaining the leastsquare solution, by computing the pseudoinverse, which can be more efficiently computed with LU factorization and a sparse LU solver, as in this python implementation in github.
 Once the solution of the optimization problem is obtained in YUV / YIQ space, it needs to be converted back to RGB. The following formula is used for conversion.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29

import scipy.sparse as sp
from collections import defaultdict
def compute_W(Y, colored):
(w, h) = Y.shape
W = defaultdict()
for i in range (w):
for j in range (h):
if not (i, j) in colored:
(N, sigma) = get_nbrs(Y, i, j, w, h)
Z = 0.
id1 = ij2id(i,j,w,h)
for (i1, j1) in N:
id2 = ij2id(i1,j1,w,h)
W[id1,id2] = np.exp(  (Y[i,j]  Y[i1,j1]) * * 2 / (sigma * * 2 )) if sigma > 0 else 0.
Z + = W[id1,id2]
if Z > 0 :
for (i1, j1) in N:
id2 = ij2id(i1,j1,w,h)
W[id1,id2] / =  Z
for i in range (w):
for j in range (h):
id = ij2id(i,j,w,h)
W[ id , id ] = 1.
rows, cols = zip ( * (W.keys()))
data = W.values()
return sp.csc_matrix((data, (rows, cols)), shape = (w * h, w * h))

Results
The following images and animations show the results obtained with the optimization algorithm. Most of the following images are taken from the paper itself.
Original Grayscale Image Input Grayscale image Input Marked
Color image Output
The next animations show how the incremental scribbling results in better and better color images.
Original Grayscale Image Input
As can be seen from the following animation, the different parts of the building get colored as more and more colorhints are scribbled / annotated.
Grayscale image Input Marked
Color image Output
Original Grayscale Image Input
Grayscale image Input Marked
Color image Output
Original Grayscale Image Input (me)
Grayscale image Input Marked incrementally
Color image Output
Original Grayscale Image Input
Grayscale image Input Marked
Color image Output
Original Grayscale Image Input
Grayscale image Input Marked
https://sandipanweb.files.wordpress.com/2018/01/madurga_gray_marked.png?w=150&h=128 150w” sizes=”(maxwidth: 413px) 100vw, 413px” />
Color image Output
Image Colorization Using Optimization in Python
Sandipan DeyFeb 13, 2018
Interactive Image Segmentation with GraphCut in Python
In this article, interactive image segmentation with graphcut is going to be discussed. and it will be used to segment the source object from the background in an image. This segmentation technique was proposed by Boycov and Jolli in this paper.
Problem Statement: Interactive graphcut segmentation
Let’s implement “intelligent paint” interactive segmentation tool using graph cuts algorithm on a weighted image grid. Our task will be to separate the foreground object from the background in an image.
Since it can be difficult sometimes to automatically define what’s foreground and what’s background for an image, the user is going to help us with a few interactive scribble lines using which our algorithm is going to identify the foreground and the background, after that it will be the algorithms job to obtain a complete segmentation of the foreground from the background image.
The following figures show how an input image gets scribbling from a user with two different colors (which is also going to be input to our algorithm) and the ideal segmented image output.
Scribbled Input Image Expected Segmented Output Image
The GraphCut Algorithm
The following describes how the segmentation problem is transformed into a graphcut problem:
 Let’s first define the Directed Graph G = (V, E) as follows:
 Each of the pixels in the image is going to be a vertex in the graph. There will be another couple of special terminal vertices: a source vertex (corresponds to the foreground object) and a sink vertex (corresponds to the background object in the image). Hence, V(G) = width x height + 2.
 Next, let’s defines the edges of the graph. As obvious, there is going to be two types of edges: terminal (edges that connect the terminal nodes to the nonterminal nodes) and nonterminal (edges that connect the nonterminal nodes only).
 There will be a directed edge from the terminal node source to each of nonterminal nodes in the graph. Similarly, a directed edge will be there from each nonterminal node (pixel) to the other terminal node sink. These are going to be all the terminal edges and hence, E_T(G) = 2 x width x height.
 Each of the nonterminal nodes (pixels) are going to be connected by edges with the nodes corresponding to the neighboring pixels (defined by 4 or 8 neighborhood of a pixel). Hence, E_N(G) = Nbd x width x height.
 Now let’s describe how to compute the edge weights in this graph.
 In order to compute the terminal edge weights, we need to estimate the feature distributions first, i.e., starting with the assumption that each of the nodes corresponding to the scribbled pixels have the probability 1.0 (since we want the solution to respect the regional hard constraints marked by the userseeds / scribbles) to be in foreground or background object in the image (distinguished by the scribble color, e.g.), we have to compute the probability that a node belongs to the foreground (or background) for all the other nonterminal nodes.
 The simplest way to compute and is to first fit a couple of Gaussian distributions on the scribbles by computing the parameters
(μ, ∑) with MLE from the scribbled pixel intensities and then computing the (classconditional) probabilities from the individual pdfs (followed by a normalization) for each of the pixels as shown in the next figures. The following code fragment show how the pdfs are being computed.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

import numpy as np
from collections import defaultdict
def compute_pdfs(imfile, imfile_scrib):
rgb = mpimg.imread(imfile)[:,:,: 3 ]
yuv = rgb2yiq(rgb)
rgb_s = mpimg.imread(imfile_scrib)[:,:,: 3 ]
yuv_s = rgb2yiq(rgb_s)
scribbles = find_marked_locations(rgb, rgb_s)
imageo = np.zeros(yuv.shape)
comps = defaultdict( lambda :np.array([]).reshape( 0 , 3 ))
for (i, j) in scribbles:
imageo[i,j,:] = rgbs[i,j,:]
comps[ tuple (imageo[i,j,:])] = np.vstack([comps[ tuple (imageo[i,j,:])], yuv[i,j,:]])
mu, Sigma = {}, {}
for c in comps:
mu[c] = np.mean(comps[c], axis = 0 )
Sigma[c] = np.cov(comps[c].T)
return (mu, Sigma)

3. In order to compute the nonterminal edge weights, we need to compute the similarities in between a pixel node and the nodes corresponding to its neighborhood pixels, e.g., with the formula shown in the next figures (e.g., how similar neighborhood pixels are in RGB / YIQ space).
 Now that the underlying graph is defined, the segmentation of the foreground from the background image boils down to computing the mincut in the graph or equivalently computing the maxflow (the dual problem) from the source to sink.
 The intuition is that the mincut solution will keep the pixels with high probabilities to belong to the side of the source (foreground) node and likewise the background pixels on the other side of the cut near the sink (background) node, since it’s going to respect the (relatively) highweight edges (by not going through the highlysimilar pixels).
 There are several standard algorithms, e.g., FordFulkerson (by finding an augmenting path with O(E max f ) time complexity) or EdmondsKarp (by using bfs to find the shortest augmenting path, with O(VE^{2}) time complexity) to solve the maxflow problem that run fast, in polynomial time in V or E. Here we are going to use an implementation (with pymaxflow) based on Vladimir Kolmogorov, which is shown to run faster on some images empirically.
Results
The following figures / animations show the interactivesegmentation results (computed probability densities, subset of the flowgraph & mincut, final segmented image) on a few images, some of them taken from the abovementioned courses / videos, some of them taken from Berkeley Vision dataset.
Input Image
Input Image with Scribbles
Fitted Densities from Color Scribbles
A Tiny Subgraph with MinCut
Input Image
Input Image with Scribbles
Fitted Densities from Color Scribbles
A Tiny Subgraph with MinCut
Input Image (liver)
Input Image with Scribbles
Input Image
Input Image with Scribbles
Input Imagehttps://sandipanweb.files.wordpress.com/2018/02/bunny.png?w=150&h=94 150w, https://sandipanweb.files.wordpress.com/2018/02/bunny.png?w=300&h=187 300w” sizes=”(maxwidth: 681px) 100vw, 681px” />
Input Image with Scribbles
Fitted Densities from Color Scribbles
A Tiny Subgraph of the flowgraph with MinCut
Input Image
Input Image with Scribbles
A Tiny Subgraph of the flowgraph with MinCut
Input Image Input Image with Scribbles
Input Image
Input Image with Scribbles
Fitted Densities from Color Scribbles
Input Image
Input Image with Scribbles
Fitted Densities from Color Scribbles
A Tiny Subgraph of the flowgraph with MinCut
Input Image (UMBC Campus Map)
Input Image with Scribbles
Input Image
Input Image with Scribbles
A Tiny Subgraph of the flowgraph with MinCut (with blue foreground nodes)
Changing the background of an image (obtained using graphcut segmentation) with another image’s background with cut & paste
The following figures / animation show how the background of a given image can be replaced by a new image using cut & paste (by replacing the corresponding pixels in the new image corresponding to foreground), once the foreground in the original image gets identified after segmentation.
Original Input Image
New Background
Interactive Image Segmentation with GraphCut in Python
Sandipan DeyOct 24, 2017
Some Computational Photography: Image Quilting (Texture Synthesis) with Dynamic Programming and Texture Transfer in Python
The following problems appeared as a programming assignment in the Computation Photography course (CS445) at UIUC. The description of the problem is taken from the assignment itself. In this assignment, a python implementation of the problems will be described instead of matlab, as expected in the course.
The Problems
 The goal of this assignment is to implement the image quilting algorithm for
texture synthesis and transfer, described in this SIGGRAPH 2001 paper by Efros
and Freeman.
 Texture synthesis is the creation of a larger texture image from a small sample.
 Texture transfer is giving an object the appearance of having the
same texture as a sample while preserving its basic shape.
 For texture synthesis, the main idea is to sample patches and lay them down in overlapping patterns, such that the overlapping regions are similar.
 The overlapping regions may not match exactly, which will result in noticeable
edges artifact. To fix this, we need to compute a path along pixels with similar intensities through the overlapping region and use it to select which overlapping patch from which to draw each pixel.
 Texture transfer is achieved by encouraging sampled patches to have similar appearance to a given target image, as well as matching overlapping regions of already sampled patches.
 In this project, we need to apply important techniques such as template matching, finding seams, and masking. These techniques are also useful for image stitching, image completion, image retargeting and blending.
Randomly Sampled Texture
First let’s randomly samples square patches of size patchsize from sample in order to create an output image of size outsize. Start from the upperleft corner, and tile samples until the image is full. If the patches don’t fit evenly into the output image, we can leave black borders at the edges. This is the simplest but least effective method. Save a result from a sample image to compare to the next two methods.
Overlapping Patches
Let’s start by sampling a random patch for the upperleft corner. Then sample new patches to overlap with existing ones. For example, the second patch along the top row will overlap by patchsize pixels in the vertical direction and overlap pixels in the horizontal direction. Patches in the first column will overlap by patchsize pixels in the horizontal direction and overlap pixels in the vertical direction. Other patches will have two overlapping regions (on the top and left) which should both be taken into account. Once the cost of each patch has been computed, randomly choose on patch whose cost is
less than a threshold determined by some tolerance value.
As described in the paper, the size of the block is the only parameter controlled by the user and it depends on the properties of a given texture; the block must be big enough to capture the relevant structures in the texture, but small enough so that the interaction between these structures is left up to the algorithm. The overlap size is taken to be onesixth of the block size (B/6) in general.
Seam Finding
Next we need to find the mincost contiguous path from the left to right side of the patch according to the cost. The cost of a path through each pixel is the square differences (summed over RGB for color images) of the output image and the newly
sampled patch. Use dynamic programming to find the mincost path.
The following figure describes the algorithm to be implemented for image quilting.
Texture Transfer
The final task is to implement texture transfer, based on the quilt implementation for creating a texture sample that is guided by a pair of sample/target correspondence images (section 3 of the paper). The main difference between this function and
quilt function is that there is an additional cost term based on the difference between
the sampled source patch and the target patch at the location to be filled.
Image quilting (texture synthesis) results
The following figures and animations show the results of the outputs obtained with the quilting algorithm. The input texture images are mostly taken from the paper .
Input sample Texture
100 sampled blocks of a fixed size (e.g. 50×50) from the input sample
The next animation shows how the large output texture gets created (100 times larger than the input sample texture) with the quilting algorithm.
Output Texture (10×10 times larger than the input) created with texture synthesis (quilting)
Input Texture
Output Texture (25 times larger than the input) created with texture synthesis (quilting) with the minimum cost seams (showed as red lines) computed with dynamic programming
Output Texture (25 times larger than the input) created with quilting
Input Texture
Output Texture (25 times larger than the input) created with quilting
Input Texture
Output Texture (25 times larger than the input) created with quilting
https://sandipanweb.files.wordpress.com/2017/10/quilt_q3.png?w=150 150w, https://sandipanweb.files.wordpress.com/2017/10/quilt_q3.png?w=300 300w” sizes=”(maxwidth: 632px) 100vw, 632px” />
Input Texture
Output Texture (12 times larger than the input) created with quilting
Input Texture
Output Texture (25 times larger than the input) created with quilting
Input Texture
Output Texture (25 times larger than the input) created with quilting
Input Texture
Output Texture (36 times larger than the input) created with quilting
Input Texture
Output Texture (9 times larger than the input) created with quilting
Input Texture
Output Texture (25 times larger than the input) created with quilting
Input Texture
Output Texture (9 times larger than the input) created with quilting along with the mincost seams (shown as red lines) computed with dynamic programming
Output Texture (9 times larger than the input) created with quilting
Texture transfer results
The following figures and animations show how the texture from an input image can be transferred to the target image using the abovementioned simple modification of the quilting algorithm. Again, some of the images are taken from the paper.
Input Texture (milk)
Target Image
Output Image after Texture Transfer
Input Texture (milk)
Target Image
Output Image after Texture Transfer
The following figure show the output image obtained when a few textures were transferred to my image.
Target Image (me)
Input Texture (fire)
Output Image after Texture Transfer (with small block size)
Input Texture (cloud)
Output Image after Texture Transfer (with small block size)
Input Texture (toast)
Output Image after Texture Transfer (with small block size)
Input Texture
Output Image after Texture Transfer (with small block size)
Input Texture
Output Image after Texture Transfer (with small block size)
The following animation shows how the milk texture is being transformed to the target image with the quilting algorithm with modified code.
Input Texture
Target Image (me)
The next figures and animations show the output image obtained after milk texture gets transferred to the target image of mine, for different block size of the samples(shown in red). As can be seen from the following outputs, the target image gets more and more prominent as the sampling block size gets smaller and smaller.
Some Computational Photography: Image Quilting (Texture Synthesis) with Dynamic Programming and Texture Transfer in Python
Recent Comments