Current Models of Object Recognition to Explain Object Constancy

Object recognition is a complex phenomenon: it requires a constant representation of an object despite inconsistent visual stimuli. Our perception of an object's identity endures whatever happens to the object in the environment. The object may be reoriented, seen from a different perspective or even disappear fleetingly from view. A number of models account for object recognition, but a good model should also explain how changes in the stimulus can hinder object constancy.

We can react or respond to an object only once we have identified what it is, recognised its main features and brought its meaning to mind. This is 'object recognition'. When a lion approaches us, we realise through prior knowledge that the animal is ferocious, dangerous and may not hesitate to kill. With this information we can make instant decisions about the next course of action. A meal on a dinner table tells us that it is edible and that we can approach it. 'Spatial localisation', the establishing of where objects are in the surrounding space and time, is also important for survival. Another factor necessary for survival is 'perceptual constancy': objects are perceived as constant in appearance even when the eyes register them as being in motion. Object recognition, spatial localisation and perceptual constancy are the three main characteristics of perception.

One theory that aims to explain object recognition and constancy is Marr's theory of visual processing, also called the computational approach. It involves taking two-dimensional images and extracting valuable three-dimensional information from them. The theory proceeds by examining the levels of grey in an image, creating a rough sketch, then a 2.5D sketch, and finally representing the image as a 3D model.
Marr's model of object recognition is concerned with drafting representations of objects that carry increasing amounts of information. The first step is creating the primal sketch, in its raw and full forms. The raw primal sketch contains data on the light intensity variations of a shape or scene; these intensity changes give the levels of grey contained within the pixels of the image. The full primal sketch uses this data to determine how many outlines and objects the scene contains.

Computation on the properties of a shape can begin once the shape has been 'coloured', i.e. separated from the background. Properties such as symmetry, centre of mass, size and aspect ratio are likely to offer clues to the object's identity. The centre of mass, or the medial axis derived from the skeleton of the object, is crucial to analysing its shape. Representing an object through structural primitives and their spatial relationships depends on determining the medial axis, and this in turn enables a 3D model of the object to be constructed. Marr explains that a 2.5D surface sketch helps to represent the visible surface, and a computer vision system could reconstruct the surface with this process. Boundary detection is difficult even with advanced edge locators, but can be achieved by surface reconstruction.

Another model is Biederman's 'recognition by components' (RBC) theory, which views all objects and forms as composed of basic geometrical forms or 'geons'. Pattern recognition is therefore simply the identification of these separate components. But objects need not be composed of different components to be recognisable. Simple line drawings may suffice: drawings of matchstick men, or outlines of cars or buildings, are still recognisable.
Additional information such as size, colour, orientation and surface quality completes the picture, but it is the overall shape that is of primary importance (Biederman in Atkinson et al. 2000, p. 164). When a silhouette of a four-legged animal is shown, it is fairly easy to identify whether it is a dog, cat, rabbit or some other mammal. Although dogs come in different shapes, the silhouette of a dog is still distinguishable from that of a lion or tiger. Further changes in size, colour or texture do not prevent recognition either.

These shapes consist of subcomponents which together form the object. Biederman suggests the existence of 36 different types of geons, such as wedges, arcs, blocks, spheres, triangles and cylinders. This may seem a relatively small number with which to describe all objects, but geons may be compared to the phonemes of a language. The English language has 44 different phonemes, yet various combinations of these can adequately handle all spoken words. Hence 36 geons, together with their spatial connections, may be enough to describe all objects. When the RBC theory is applied to visual objects, the description of the geons and their relationships is compared with stored information about objects and their attributes. For example, a mug may be described as a cylinder with an arc attached to the side.

However, some objects are unclassifiable, such as a vertical spiral or a cylinder that curves inwards towards the middle. Other shapes differ from one another, yet geon coding categorises them as identical: half a cylinder cut along its long axis and half a ring cut along its cross section are, according to geon coding, an undifferentiated pair of identical geons. Although this model explains how objects can be recognised or identified, it does little to address object organisation and differentiation.
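
The idea that an object is described by its geons and the relations between them, as in the mug example above, can be sketched in a few lines of Python. This is an illustrative sketch only and not Biederman's formalism: the geon names, the relation labels (`side-attached`, `top-attached`, `end-attached`), the `bucket` and `torch` entries and the `recognise` helper are all assumptions introduced for the example.

```python
# Hypothetical structural descriptions: each object is a set of
# (geon, relation, geon) triples. Names and relations are illustrative only.
STORED_OBJECTS = {
    "mug": frozenset({("arc", "side-attached", "cylinder")}),
    "bucket": frozenset({("arc", "top-attached", "cylinder")}),
    "torch": frozenset({("cylinder", "end-attached", "cone")}),
}


def recognise(description):
    """Return the names of stored objects whose description matches exactly."""
    target = frozenset(description)
    return sorted(
        name for name, stored in STORED_OBJECTS.items() if stored == target
    )


# The same two geons give different objects depending on their relation.
print(recognise([("arc", "side-attached", "cylinder")]))  # ['mug']
print(recognise([("arc", "top-attached", "cylinder")]))   # ['bucket']

# A description with no stored counterpart is left unclassified.
print(recognise([("spiral", "alone", "spiral")]))          # []
```

Because a description is compared as a whole set of parts and relations, two objects built from the same geons are told apart only by how the geons are connected. That is also why shapes falling outside the geon vocabulary simply go unmatched in this toy version.
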
Along with its inability to classify certain shapes, the RBC theory also contains a redundancy that causes confusion when one tries to establish a single description of a shape across the variety of shapes. The RBC definitions of a shape's axis and cross section are: (i) the length of the axis is greater than the width of the cross section; (ii) the cross section does not alter and is constant; (iii) the cross section is proportional. An example of the confusion arises when considering a cone. This shape has a vertical axis and a circular cross section that increases in size. Yet the RBC model says that the cone can also be depicted as having a circular axis with a triangular cross section.

Biederman's theory provides an analysis of how the geons of an object are established. Edge extraction is the initial step. The next step explores the ways in which the object can be divided in order to discover the geons or component parts. Here the RBC theory agrees with Marr's theory in its approach to segmenting an image into component parts: both concur on the importance of the concave, inward-curving parts of the shape. One of the key aspects of the theory is determining which edge information of a shape remains consistent when viewed from different angles. The theory names five invariant characteristics of edges. These are: points on a curve (curvilinearity), parallel edges (parallelism), edges ending at a common point (cotermination), points on a straight line (collinearity), and symmetry.

The template matching model of object recognition holds that the image of an object projected on to the retina is deciphered by the brain and compared against a store of images, associations and patterns. The stored patterns are known as templates, and to find an equivalent, the stored patterns are sifted through until a match is found. Template matching techniques are also used in computer vision applications. Templates can be learned from standard images or generated from models.
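
The template idea just described can be illustrated with a minimal Python sketch. It is a toy under stated assumptions rather than any particular computer vision system: the images are tiny binary grids, the similarity score (the fraction of matching pixels) is one simple choice among many, and the names `match_score`, `find_matches`, `TEMPLATE`, `ROTATED` and `IMAGE` are hypothetical.

```python
def match_score(image, template, top, left):
    """Fraction of template pixels equal to the image pixels beneath them."""
    rows, cols = len(template), len(template[0])
    hits = sum(
        image[top + r][left + c] == template[r][c]
        for r in range(rows)
        for c in range(cols)
    )
    return hits / (rows * cols)


def find_matches(image, template, threshold=1.0):
    """Slide the template over every location; keep scores >= threshold."""
    rows, cols = len(template), len(template[0])
    matches = []
    for top in range(len(image) - rows + 1):
        for left in range(len(image[0]) - cols + 1):
            score = match_score(image, template, top, left)
            if score >= threshold:
                matches.append((top, left, score))
    return matches


# A small "L" shape stored as the template (assumed example data).
TEMPLATE = [
    [1, 0],
    [1, 1],
]

# The same template rotated by 90 degrees clockwise.
ROTATED = [list(row) for row in zip(*TEMPLATE[::-1])]

# An image containing the "L" shape in its original orientation.
IMAGE = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

print(find_matches(IMAGE, TEMPLATE))  # [(1, 2, 1.0)]
print(find_matches(IMAGE, ROTATED))   # []
```

The stored template is found at row 1, column 2, but a rotated copy of the same template fits nowhere exactly. A single fixed template therefore says nothing about the same object seen in a different orientation unless further templates or transformations are tried.
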
There are various matching methods, such as using edge pixels or examining an image's intensity patches. Template matching entails determining similarities between the characteristics of the template image and those of the image in question. To detect an object, the template is compared with the image at all locations, orientations and scales. If the value obtained by these methods reaches a threshold value, the probability of a match is high. The drawback of template matching models is that most of them use ad hoc methods of comparison. Humphreys says that template matching theories face the obstacle of object constancy, since minor variations in the retinal description of an object caused by changes in viewpoint decrease the likelihood of a good fit between the template and the object in view. But this can be overcome by adjusting the template matching attributes to support object constancy (Humphreys & Riddoch 1987, p. 45).

Object constancy is the perception of an object as the same from any angle, at any distance and in any orientation or rotation. It is a relatively simple task for humans, but making a machine or computer vision system do the same is a difficult endeavour. When an object is perceived, its image falls on the retina, yet the brain knows it is the same object whatever the angle or direction from which it is seen. The brain does not need to memorise every single view of the object for it to be considered the same. One of the things a mind can do is perceive invariants, i.e. the features of an object that remain constant while others change. For example, rotting fruit is still perceived as fruit even though its skin colour, texture and size change. A key aspect of object constancy is identifying the invariants in visual stimuli that remain the same when the object's position alters. The brain is capable of using these invariants to retrieve the object's shape.
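
The notion of an invariant can be made concrete with a small numerical sketch in Python. It applies a rotation, a translation and a scaling to a unit square and reports the side lengths each time. The helper names (`side_lengths`, `rotate`, `translate`, `scale`, `all_equal`) and the particular angles and offsets are assumptions chosen for illustration.

```python
import math


def side_lengths(points):
    """Lengths of the edges joining consecutive vertices of a closed polygon."""
    return [
        math.dist(points[i], points[(i + 1) % len(points)])
        for i in range(len(points))
    ]


def rotate(points, degrees):
    """Rotate points anticlockwise about the origin."""
    a = math.radians(degrees)
    return [
        (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))
        for x, y in points
    ]


def translate(points, dx, dy):
    """Shift every point by (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]


def scale(points, factor):
    """Grow or shrink the shape about the origin."""
    return [(x * factor, y * factor) for x, y in points]


def all_equal(values, tol=1e-9):
    """True when all values agree to within a small tolerance."""
    return max(values) - min(values) <= tol


square = [(0, 0), (1, 0), (1, 1), (0, 1)]

for name, shape in [
    ("original", square),
    ("rotated 45 degrees", rotate(square, 45)),
    ("translated", translate(square, 3, -2)),
    ("scaled x2", scale(square, 2)),
]:
    lengths = side_lengths(shape)
    print(name, [round(v, 6) for v in lengths], "sides equal:", all_equal(lengths))
```

Rotation and translation leave the side lengths themselves unchanged, whereas scaling changes the lengths but keeps them equal to one another. The property that survives depends on the transformation applied.
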
A transformation of an object will cause some aspects of it to change while keeping other aspects constant. Invariants are those aspects that remain the same. Take the example of a square, a 2D shape with four sides of equal length. Rotating it does not change the shape, since all four sides keep their length and are thus invariant under rotation. Moving the square vertically or horizontally likewise keeps the length of the sides invariant. Depending on the observer's viewpoint, however, a rotated square may not be seen as unchanged, since it can resemble a diamond. If a scaling transformation is performed on the square, i.e. it is made to grow or contract, the length of the sides varies, but the equality of the four sides remains invariant.

View Based vs. Object Based Invariants

The difference between view-based and object-based perception can be illustrated with an example. Young children seem to consider only their own viewpoint and do not see an object in any other way. When they are shown an object that is blue on one side and red on the other, and it is then placed in front of them with the blue side in view, they call the object blue. Scientific studies of the brain suggest that two types of spatial perception exist: viewpoint based (egocentric) and object based (allocentric). In the example above, children view objects from their own viewpoint, or egocentrically. Although 3D objects cast varying projections on our retina, they are seen as invariant. A theorem in solid geometry posits that when three points are marked on a 3D object's surface and its 2D shadow is observed, the entire shape of the object can be retrieved from two separate 2D views. Such view-based invariance indicates that a computer vision system can ascertain a 3D object's shape from two views.

Reference Frame

Object constancy can be achieved in a number of ways. One is to place the object in an imaginary reference frame.
Generally, when an object is observed, observers identify its primary axis of elongation to obtain a reference frame. In fact, many theories of object recognition stress the importance of obtaining a frame of reference from an object's primary axis of elongation (Biederman 1987). But other studies suggest that the function of axis elongation is not as crucial as originally assumed. A frame of reference can help us understand when object constancy fails and when it holds. Mach (1886) showed that a square can look like a diamond depending on the orientation of the assigned reference frame. Jolicoeur (1985) demonstrated that the further an object deviates from its upright position, the longer it takes a person to name it. This is especially true for irregularly shaped objects that have no real orientation. Research indicates that the axes and symmetry of an object are more easily perceived when the object is oriented so that its axis of symmetry is vertical (Palmer & Hemenway, 1978).

But what are the criteria used for assigning a reference frame? Marr and Biederman assumed that axis elongation was an important factor, but studies by Wiser (1981) and Quinlan & Humphreys (1993) on feature visibility and symmetry suggested that elongation was not crucial for deriving a frame of reference in order to recognise an object. More recent studies, however, indicate that assigning a reference frame for object recognition does depend on axis elongation, and that symmetry in an object may strengthen the effect of elongation. Most objects have vertical and horizontal axes of elongation and exist in an environment where vertical and horizontal are generally taken as the default frames of reference. Since an object's axes of elongation coincide with the axes of the frame of reference, object constancy can be perceived.
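
One way a computer vision system might derive a reference frame from an object's axis of elongation is sketched below in Python. The covariance-based estimate is a standard geometric technique chosen here as an assumption; it is not drawn from the studies cited above. The names `principal_axis_angle`, `rotate`, `to_object_frame` and `RECT` are hypothetical.

```python
import math


def principal_axis_angle(points):
    """Angle in degrees, in [0, 180), of the axis of greatest spread.

    Uses the closed-form orientation of the 2x2 covariance matrix of the points.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.degrees(angle) % 180


def rotate(points, degrees):
    """Rotate points anticlockwise about the origin."""
    a = math.radians(degrees)
    return [
        (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))
        for x, y in points
    ]


def to_object_frame(points):
    """Rotate the points so that the axis of elongation lies along the x axis."""
    return rotate(points, -principal_axis_angle(points))


# An elongated rectangle, four units long and one unit wide (assumed data).
RECT = [(-2, -0.5), (2, -0.5), (2, 0.5), (-2, 0.5)]

for view_angle in (30, 120):
    view = rotate(RECT, view_angle)
    estimated = principal_axis_angle(view)
    print("view rotated by", view_angle, "-> estimated axis at", round(estimated, 3))
```

Once the axis is known, `to_object_frame` turns each view back to a common orientation, so differently rotated views of the rectangle can be compared in the same frame. For a shape with no elongation, such as a square, the spread is the same in every direction and this estimate is ill-defined, which fits the suggestion above that other cues such as symmetry also play a part.
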
It has been suggested that object recognition requires various mental processes such as mental rotation, mental translation and mental zooming. In mental rotation, the human or computer vision system must already know what the 3D object looks like, since rotation brings into view parts of the object that were previously hidden. But effectiveness in mental rotation depends on the complexity of the image. Attneave (1957) defines image complexity as the number of points needed to build a polygon. Whether the complexity of the object affects the speed of mental rotation is disputed: Bethell-Fox and Shepard (1988) and Folk and Luce (1987) found that speed of rotation does depend on complexity, but Cooper (1975) and Shepard and Metzler (1988) said that it does not. At the cognitive level, mental rotation is similar to object recognition. Biederman (1987) and Marr (1982) suggest that object recognition does not depend on viewing angle or orientation, but others have found that irregularly shaped wire objects show weak orientation invariance. Clay surfaces, on the other hand, showed superior orientation invariance.

All these theories of object recognition have provided useful insight for explaining object constancy in humans, and the models offer ways of improving current research into computer vision. Most of the theories share similarities. Biederman's theory of recognition by components is related to Marr's visual processing theory: the RBC theory was founded on Marr's theory. Jolicoeur showed that frames of reference and an object's orientation were a factor in determining the response times of participants asked to identify objects. The other competing theories of object recognition all view an object's spatial and dimensional properties as crucial to the development of models and to the perception of object constancy.

References

Atkinson, R. L., Atkinson, R. C., Smith, E. E., Bem, D. J., & Nolen-Hoeksema, S. (2000). Hilgard's Introduction to Psychology, Thirteenth Edition.
Attneave, F. (1957). Physical determinants of the judged complexity of shapes. Journal of Experimental Psychology, 53, 221-227.
Bethell-Fox, C. E., & Shepard, R. N. (1988). Mental rotation: Effects of stimulus complexity and familiarity. Journal of Experimental Psychology: Human Perception and Performance, 14(1), 12-23.
Biederman, I. (1987). Recognition by components: A theory of human image understanding. Psychological Review, 94, 115-147.
Cooper, L. A. (1975). Mental rotation of random two-dimensional shapes. Cognitive Psychology, 7, 20-43.
Folk, M. D., & Luce, R. D. (1987). Effects of stimulus complexity on mental rotation rate of polygons. Journal of Experimental Psychology: Human Perception and Performance, 13(3), 395-404.
Humphreys, G. W., & Riddoch, M. J. (1987). Visual Object Processing. Psychology Press. p. 45.
Jolicoeur, P. (1985). The time to name disoriented natural objects. Memory & Cognition, 13(4), 289-303.
Mach, E. (1886/1959). The analysis of sensations and the relation of the physical to the psychical. New York: Dover.
Marr, D. (1982). Vision. San Francisco: Freeman.
Palmer, S. E., & Hemenway, K. (1978). Orientation and symmetry: Effects of multiple, rotational, and near symmetries. Journal of Experimental Psychology: Human Perception and Performance, 4, 691-702.
Quinlan, P. T., & Humphreys, G. W. (1993). Perceptual frames of reference and two-dimensional shape recognition: Further examination of internal axes. Perception, 22, 1343-1364.
Shepard, S., & Metzler, D. (1988). Mental rotation: Effects of dimensionality of objects and type of task. Journal of Experimental Psychology: Human Perception and Performance, 14(1), 3-11.
Wiser, M. (1981). The role of intrinsic axes in shape recognition. Cognitive Science Society, Berkeley, California.
