We will look at three computer vision tasks, namely:
Some important key points:
Question 1: How to think of this as a Sequential Decision Making Problem?
Answer 1: At each time step, the agent should decide in which region of the image to focus its attention so that it can find objects in a few steps.
Question 2: How to cast this as a Markov Decision Process?
Answer 2: We cast the problem of object localization as a Markov decision process (MDP), since this setting provides a formal framework for modeling an agent that makes a sequence of decisions. We will identify the components of the MDP: a set of actions A, a set of states S, and a reward function R.
The set of actions A is composed of eight transformations that can be applied to the box and one action to terminate the search process.
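The action set described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text says there are eight box transformations plus one terminating action, but the specific split into moves, scale changes, and aspect-ratio changes, the action names, the `(x1, y1, x2, y2)` box layout, and the step size `ALPHA = 0.2` are all hypothetical choices made for this example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)

# Eight transformations plus one terminating action (names are hypothetical).
ACTIONS = (
    "right", "left", "down", "up",   # translations
    "bigger", "smaller",             # scale changes
    "fatter", "taller",              # aspect-ratio changes
    "trigger",                       # terminate the search
)
ALPHA = 0.2  # assumed step size, relative to the current box size


def apply_action(box: Box, action: str, alpha: float = ALPHA) -> Box:
    """Return the box obtained by applying one action to `box`."""
    x1, y1, x2, y2 = box
    dw = alpha * (x2 - x1)  # horizontal step
    dh = alpha * (y2 - y1)  # vertical step
    if action == "right":
        return (x1 + dw, y1, x2 + dw, y2)
    if action == "left":
        return (x1 - dw, y1, x2 - dw, y2)
    if action == "down":
        return (x1, y1 + dh, x2, y2 + dh)
    if action == "up":
        return (x1, y1 - dh, x2, y2 - dh)
    if action == "bigger":
        return (x1 - dw / 2, y1 - dh / 2, x2 + dw / 2, y2 + dh / 2)
    if action == "smaller":
        return (x1 + dw / 2, y1 + dh / 2, x2 - dw / 2, y2 - dh / 2)
    if action == "fatter":   # shrink the height, keep the width
        return (x1, y1 + dh / 2, x2, y2 - dh / 2)
    if action == "taller":   # shrink the width, keep the height
        return (x1 + dw / 2, y1, x2 - dw / 2, y2)
    if action == "trigger":  # terminating action leaves the box unchanged
        return box
    raise ValueError(f"unknown action: {action}")
```

Making each step proportional to the current box size means the agent takes large steps while the box is large and finer steps once it has zoomed in.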
Motivation behind the history vector: It helps stabilize search trajectories that might otherwise get stuck in repetitive cycles, improving average precision by approximately 3 percentage points.
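One way to picture a history vector is as a concatenation of one-hot encodings of the most recent actions. The sketch below assumes nine possible actions and a history of ten steps; both numbers, the class name `ActionHistory`, and the slot ordering are assumptions for illustration, not details given in the text.

```python
from collections import deque
from typing import Deque, List

NUM_ACTIONS = 9    # assumed: eight transformations plus the terminating action
HISTORY_LEN = 10   # assumed number of past actions to remember


class ActionHistory:
    """Fixed-length record of past actions, exposed as a binary vector."""

    def __init__(self, num_actions: int = NUM_ACTIONS, length: int = HISTORY_LEN):
        self.num_actions = num_actions
        self.length = length
        self._past: Deque[int] = deque(maxlen=length)  # oldest entries fall off

    def push(self, action_index: int) -> None:
        if not 0 <= action_index < self.num_actions:
            raise ValueError("action index out of range")
        self._past.append(action_index)

    def vector(self) -> List[float]:
        # One one-hot slot per remembered step; the most recent action
        # occupies the first slot, and unused slots stay all-zero.
        vec = [0.0] * (self.num_actions * self.length)
        for slot, action_index in enumerate(reversed(self._past)):
            vec[slot * self.num_actions + action_index] = 1.0
        return vec
```

Appending this vector to the visual features of the current region lets the agent see, for example, that it has just alternated between "left" and "right", which is the kind of cycle the history is meant to break.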
Motivation:
where b = the box of an observable region, and
g = the ground-truth box for a target object
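A common way to measure how well a box b covers a ground-truth box g is Intersection-over-Union (IoU). The sketch below computes IoU and a simple reward for a box-moving action. Note the assumptions: the reward rule used here (+1 if the IoU with g improves, -1 otherwise) is an illustrative choice, and the function names and the `(x1, y1, x2, y2)` box layout are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(b: Box, g: Box) -> float:
    """Intersection-over-Union between box b and ground-truth box g."""
    ix1, iy1 = max(b[0], g[0]), max(b[1], g[1])
    ix2, iy2 = min(b[2], g[2]), min(b[3], g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    area_g = (g[2] - g[0]) * (g[3] - g[1])
    union = area_b + area_g - inter
    return inter / union if union > 0 else 0.0


def move_reward(b: Box, b_next: Box, g: Box) -> float:
    """Assumed reward: +1 if the move increased the IoU with g, else -1."""
    return 1.0 if iou(b_next, g) > iou(b, g) else -1.0
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` has an intersection of 1 and a union of 7, so it evaluates to 1/7.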
Explanation:
Question 1: How to cast this as a Markov Decision Process?
Answer 1: The problem is cast as an MDP, in which the agent interacts with the environment and makes a sequence of decisions to achieve a set goal.
The idea behind this irregular action is to translate the temporal window to a new position away from the current site, so that the agent does not trap itself around the present location when no motion is occurring nearby.
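The effect of such an action on a temporal window can be sketched as follows. This is a minimal illustration, assuming a window is a `(start, end)` pair in frames or seconds and that the new position is supplied by the caller; the function name `jump` and the clamping behaviour are hypothetical choices, not details from the text.

```python
from typing import Tuple

Window = Tuple[float, float]  # assumed layout: (start, end)


def jump(window: Window, new_start: float, video_length: float) -> Window:
    """Translate the window to `new_start`, keeping its length unchanged.

    The new position is clamped so the window stays inside the video.
    """
    start, end = window
    length = end - start
    if length > video_length:
        raise ValueError("window is longer than the video")
    new_start = max(0.0, min(new_start, video_length - length))
    return (new_start, new_start + length)
```

Unlike small left or right shifts, a single call can move the window far from its current site, for example `jump((10.0, 20.0), 50.0, 100.0)` moves it to `(50.0, 60.0)`.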
Motivation behind the history vector: It helps stabilize search trajectories that might otherwise get stuck in repetitive cycles, improving average precision by approximately 3 percentage points.
Motivation:
where w and w' are the attended windows corresponding to states s and s', respectively, and
n = the number of ground truths within the video
1) Present a better algorithm than the existing inefficient search algorithms, which explore the region of interest and select the best candidate by matching against the tracking model, and
2) Present a method to train using unlabeled frames in a semi-supervised setting.
Visual tracking solves the problem of finding the position of the target in a new frame from the current position.
The proposed tracker dynamically pursues the target through sequential actions.
The tracker is defined as an agent whose goal is to capture the target with a bounding box.