ETIS / ANR Saturn - Artificial Vision     



Artificial Vision

The goal of this work is to design an embedded, intelligent implementation of a vision-based controller. Thanks to the Embodied Computing approach, this controller will be able to adapt the size of its different hardware maps according to the state of its inputs; for vision, that means the saliency of the visual information coming from the external environment (Fig. 1).

Inspired by the feature integration theory [1], a solution drawn from psychological models of human visual attention consists in detecting, from the features of the image, a set of particularly interesting points called saliency points.
In our application scenario, we are interested in the points extracted from the edges of the image [2].
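
To make this step concrete, the following sketch extracts edge-based saliency points from a grayscale frame by ranking the local maxima of a difference-of-Gaussians response. This is a minimal NumPy/SciPy illustration written for this page, not the project's actual detector; the function name and the sigma, k and n_points values are assumptions chosen only for the example.

import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def edge_saliency_points(frame, sigma=1.0, k=1.6, n_points=64):
    """Return the n_points strongest edge-based saliency points of a
    grayscale frame (2-D float array). Edge strength is approximated by
    the magnitude of a difference-of-Gaussians response; sigma, k and
    n_points are illustrative defaults, not project settings."""
    # Difference of Gaussians: a band-pass response that highlights edges.
    dog = gaussian_filter(frame, sigma) - gaussian_filter(frame, k * sigma)
    magnitude = np.abs(dog)
    # Keep only the local maxima of the response (3x3 neighbourhood).
    local_max = magnitude == maximum_filter(magnitude, size=3)
    coordinates = np.argwhere(local_max)
    # Rank the candidates by response strength and keep the n_points best.
    strengths = magnitude[local_max]
    order = np.argsort(strengths)[::-1][:n_points]
    return coordinates[order], strengths[order]

The returned coordinates can then be used to sample the frame only at its most salient locations, which keeps the number of processed locations small.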

Thanks to these saliency points, the application focuses only on the most important information, which allows it to build a sparse representation of its external world with a limited number of processing resources.
Our vision system uses three saliency maps [3], as depicted in Fig. 1. These three maps provide the robot with a sensorimotor cognitive capability so that it can react and adapt its behavior to the environment.

Fig. 1. Three maps provide the robot with a sensorimotor cognitive capability so that it can react and adapt its behavior to the environment. From left to right, the maps extract 1) static edges, 2) movement detection in the visual scene, and 3) the position of the motors.

For instance, the artificial vision system of our robot is based on a spatio-temporal visual saliency model [20]. In this model, the data contained in the input frames are divided into two types: static and dynamic.
The first step of the vision process consists in extracting the salient regions for each type of data. This information is then used to balance the resource allocation through the self-organizing map (SOM). This is done with a retina-like neural network in which a first ON-OFF layer (see Fig. 2) computes the magnitude of the spatial gradient by convolving the input with a difference-of-Gaussians kernel, and a second layer computes the temporal gradient by differentiating the magnitudes over two consecutive temporal units.
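
Assuming grayscale frames stored as 2-D float arrays, the two ON-OFF layers described above can be sketched as follows; the Gaussian scales are placeholders, not the values used by the project.

import numpy as np
from scipy.ndimage import gaussian_filter

def on_off_layers(frame_t, frame_t_minus_1, sigma_center=1.0, sigma_surround=2.0):
    """Two-layer ON-OFF preprocessing sketch.

    Layer 1 (static map): magnitude of the spatial gradient, obtained by
    convolving the frame with a difference-of-Gaussians (center-surround)
    kernel. Layer 2 (dynamic map): temporal gradient, obtained by
    differentiating the spatial magnitudes of two consecutive frames.
    Parameter values are illustrative, not the project's settings."""
    def spatial_magnitude(frame):
        dog = gaussian_filter(frame, sigma_center) - gaussian_filter(frame, sigma_surround)
        return np.abs(dog)

    static_map = spatial_magnitude(frame_t)                                # layer 1
    dynamic_map = np.abs(static_map - spatial_magnitude(frame_t_minus_1))  # layer 2
    return static_map, dynamic_map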


Fig. 2. The ON-OFF neurons. The first layer computes the magnitude of the spatial gradient by convolving the input with a difference-of-Gaussians kernel, and the second layer computes the temporal gradient by differentiating the magnitudes over two consecutive temporal units.

In a robotic scenario, the robot can also be aware of the state of its actuators. It takes this information into account through another neural network, which feeds the third input of the system, as shown in Fig. 1. This preprocessed information is then used as the input of the self-adaptation layer.
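
The way the three preprocessed maps feed the self-adaptation layer can be pictured with a generic self-organizing-map update step, sketched below. This is a stand-in written for illustration, not the project's SOM: the map size, learning rate, neighbourhood radius and the shapes of the three inputs are all assumptions.

import numpy as np

def som_step(weights, sample, learning_rate=0.1, radius=1.0):
    """One generic self-organizing-map update: find the best-matching unit
    for the concatenated input and pull neighbouring units toward it.
    Shapes and constants are illustrative only."""
    grid_h, grid_w, _ = weights.shape
    # Best-matching unit (BMU): the unit whose weight vector is closest to the sample.
    dists = np.linalg.norm(weights - sample, axis=2)
    bmu = np.unravel_index(np.argmin(dists), (grid_h, grid_w))
    # Gaussian neighbourhood around the BMU on the map grid.
    ys, xs = np.indices((grid_h, grid_w))
    grid_dist2 = (ys - bmu[0]) ** 2 + (xs - bmu[1]) ** 2
    neighbourhood = np.exp(-grid_dist2 / (2.0 * radius ** 2))
    weights += learning_rate * neighbourhood[..., None] * (sample - weights)
    return bmu

# Hypothetical input: the three preprocessed maps (static edges, movement,
# motor positions) flattened and concatenated into one sample vector.
rng = np.random.default_rng(0)
static_map = rng.random((16, 16))
movement_map = rng.random((16, 16))
motor_state = rng.random(4)            # e.g. pan/tilt positions (illustrative)
sample = np.concatenate([static_map.ravel(), movement_map.ravel(), motor_state])

weights = rng.random((8, 8, sample.size))   # 8x8 SOM grid, randomly initialised
bmu = som_step(weights, sample)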

 


[1] A. Treisman, “A feature-integration theory of attention,” Cognitive Psychology, vol. 12, no. 1, pp. 97–136, Jan. 1980. [Online]. Available: http://dx.doi.org/10.1016/0010-0285(80)90005-5

[2] J. L. Crowley, O. Riff, and J. H. Piater, “Fast computation of characteristic scale using a half-octave pyramid,” in 4th International Workshop on Cognitive Computing, 2002.

[3] L. Itti and C. Koch, “Computational modelling of visual attention,” Nature Reviews Neuroscience, vol. 2, no. 3, pp. 194–203, Mar. 2001.