2-D Sensor

An image sensor that discerns the horizontal and vertical location of objects in front of it, but not their distance from it.

3-D Sensor

An image sensor that discerns not only objects' horizontal and vertical locations, but also their distance (i.e. depth) from it, by means of techniques such as stereo sensor arrays, structured light or time-of-flight.

4-D Sensor

See Plenoptic Camera

Active-Pixel Sensor

An APS, also commonly known as a CMOS sensor, is an image sensor that consists of an array of pixels, each containing a photodetector and an active amplifier. An APS is typically fabricated on a conventional CMOS semiconductor process, unlike the CCD image sensor, which requires a specialised process.

Adaptive Cruise Control

An ADAS system that dynamically varies an automobile's speed in order to maintain an appropriate distance from vehicles ahead of it.


ADAS

Advanced Driver Assistance Systems, an 'umbrella' term for the various technologies that assist a driver in navigating a vehicle.


Algorithm

A method, expressed as a list of instructions, for calculating a function. Beginning with an initial state and initial input, the instructions describe a computation that, when executed, proceeds through a finite number of defined states, eventually producing an output and terminating at a final state.


Analytics

The discovery, analysis and reporting of meaningful patterns in data. In embedded vision, the input data consists of still images and/or video frames.


API

Application Programming Interface, a specification that allows software components to communicate with each other. An API is typically source code-based, unlike an ABI (Application Binary Interface) which, as its name implies, is a binary interface.

Application Processor

A highly integrated system-on-chip, typically comprising a high-performance CPU core and a constellation of specialised co-processors, which may include a DSP, a GPU, a video processing unit (VPU), an image acquisition processor, etc. The specialised co-processors found in application processors are usually not user-programmable, which limits their utility for vision applications.

Augmented Reality

A live view of a physical, real-world environment whose elements are augmented by computer-generated sensory input such as sound, video, graphics or GPS data. The technology functions by enhancing one’s current perception of reality. In contrast, virtual reality replaces the real world with a simulated one.

Background Subtraction

A computer vision process that separates the foreground objects in a scene from its background, in order to improve the subsequent analysis of those objects.
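
As a minimal sketch of the idea, the following Python compares each pixel of a greyscale frame with a stored background image and marks it as foreground when the difference exceeds a threshold. The function name, the threshold value and the sample pixel values are assumptions made for illustration; practical systems usually maintain an adaptive background model rather than a fixed image.

```python
def subtract_background(frame, background, threshold=30):
    """Return a binary mask: 1 where the frame differs from the background."""
    return [
        [1 if abs(pixel - bg_pixel) > threshold else 0
         for pixel, bg_pixel in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Hypothetical 2x3 greyscale images (0-255); a bright object occupies column 1.
background = [[10, 10, 10], [10, 10, 10]]
frame = [[12, 200, 11], [9, 180, 10]]
mask = subtract_background(frame, background)
print(mask)
```

Small differences caused by sensor noise fall below the threshold, so only the column containing the object is marked.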

Barrel Distortion

An optical system distortion effect that causes objects to become 'spherised' or 'inflated', i.e. resulting in the bulging outward of normally straight lines at image margins. Such distortion is typically caused by wide-angle lenses, such as the fisheye lenses commonly found in automotive backup cameras. Embedded vision techniques can be used to reduce or eliminate barrel distortion effects.
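
Radial distortion of this kind is often approximated with a polynomial in the distance from the optical axis. The sketch below assumes a single-coefficient model, with a hypothetical coefficient `K1` and normalised coordinates; neither comes from this glossary. It shows how a straight vertical line bows outward in the middle when the coefficient is negative.

```python
def radial_distort(x, y, k1):
    """Apply a one-coefficient radial distortion to normalised coordinates.

    (x, y) are measured from the optical axis. A negative k1 reduces
    magnification towards the image margins (barrel distortion); a
    positive k1 increases it (pincushion distortion).
    """
    r_squared = x * x + y * y
    scale = 1.0 + k1 * r_squared
    return x * scale, y * scale


K1 = -0.2  # hypothetical coefficient, chosen only for illustration

# Three points on the straight vertical line x = 0.8.
line = [radial_distort(0.8, y, K1) for y in (-0.8, 0.0, 0.8)]
for x, y in line:
    print(round(x, 4), round(y, 4))
```

The middle point keeps a larger x value than the two end points, so the formerly straight line bulges away from the image centre.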

Bayer Pattern

A common colour filter pattern used to extract chroma information from a nominally monochrome photo detector array, via filters placed in front of the image sensor. The Bayer Pattern contains twice as many green filters as either red or blue filters, mimicking the physiology of the human eye, which is most sensitive to green-frequency light. Interpolation generates an approximation of the remainder of each photo detector's full colour spectrum.
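
The two-to-one ratio of green filters can be checked with a short Python sketch. It assumes the RGGB arrangement of the repeating 2x2 tile, which is one of several orderings in use; the function name is hypothetical.

```python
from collections import Counter


def bayer_colour(row, col):
    """Return the filter colour at a photosite, assuming an RGGB tile."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


# Count the filters of each colour over a 4x4 patch of the sensor.
counts = Counter(bayer_colour(r, c) for r in range(4) for c in range(4))
print(counts)
```

Each photosite records only one colour, which is why the two missing colour values at every pixel have to be interpolated from neighbouring photosites.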


Biometrics

The identification of humans by their characteristics or traits. Embedded vision-based biometric schemes include facial recognition, fingerprint matching, and retina scanning.


Camera

A device used to record and store images: still, video, or both. Cameras typically contain several main subsystems: an optics assembly, an image sensor, and a high-speed data transfer bus to the remainder of the system. Image processing can occur in the camera, the system, or both. Cameras can also include supplemental illumination sources.


CCD

Charge-Coupled Device, a device used to store and subsequently transfer charge elsewhere for digital-value conversion and other analysis purposes. CCD-based image sensors employ specialised analog semiconductor processes and were the first technology to achieve widespread usage. They remain popular in comparatively cost-insensitive applications where high-quality image data is required, such as professional, medical and scientific settings.


CImg

An open-source C++ toolkit for image processing, useful in embedded vision implementations.

CMOS Sensor

See Active-Pixel Sensor

Collision Avoidance

An ADAS system that employs embedded vision, radar and/or other technologies to react to an object ahead of a vehicle. Passive collision avoidance systems alert the driver via sound, light, vibration of the steering wheel, etc. Active collision avoidance systems override the driver's manual control of the steering wheel, accelerator and/or brakes in order to prevent a collision.

Computer Vision

The use of digital processing and intelligent algorithms to interpret meaning from images or video. Computer vision has mainly been a field of academic research over the past several decades.


Any of a number of algorithms used to delineate the outline of an object contained within a 2-D image.

Core Image

The pixel-accurate non-destructive image processing technology in Mac OS X (10.4 and later) and iOS (5 and later). Implemented as part of the QuartzCore framework, Core Image provides a plugin-based architecture for applying filters and effects within the Quartz graphics rendering layer.


CPU

Central Processing Unit, the hardware within a computer system that carries out programme instructions by performing the basic arithmetical, logical and input/output operations of the system. Two common CPU functional units are the Arithmetic Logic Unit (ALU), which performs arithmetic and logical operations, and the Control Unit (CU), which extracts instructions from memory and decodes and executes them.


CUDA

Compute Unified Device Architecture, a parallel computing 'engine' developed by NVIDIA, found in its Graphics Processing Units (GPUs), and accessible to software developers through variants of industry-standard programming languages. Programmers use 'C for CUDA' (C with NVIDIA extensions and certain restrictions), compiled through a PathScale Open64 C compiler, to code algorithms for execution on the GPU. AMD's competitive approach is known as Stream.


DirectCompute

An application programming interface (API) that supports general-purpose computing on graphics processing units on Microsoft Windows Vista and Windows 7. DirectCompute is part of the Microsoft DirectX collection of APIs. It was initially released with the DirectX 11 API, but runs on both DirectX 10 and DirectX 11 graphics processing units.


DSP

Digital Signal Processor, a specialised microprocessor with an architecture optimised for the fast operational needs of digital signal processing. Digital signal processing algorithms typically require a large number of mathematical operations to be performed quickly and repeatedly on a set of data. Many DSP applications have constraints on latency; that is, for the system to work, the DSP operation must be completed within a fixed time, and deferred (or batch) processing is not viable.

Edge Detection

A fundamental tool in image processing, machine vision and computer vision, particularly in the areas of feature detection and feature extraction. Edge detection aims to identify points in a digital image where the image brightness changes sharply or has discontinuities.
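
One widely used way to find sharp brightness changes is to estimate the image gradient with a pair of small convolution kernels. The sketch below uses the Sobel kernels as an example; the choice of operator, the function name and the test image are assumptions for illustration, and other operators exist.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude at every interior pixel of a 2-D list."""
    height, width = len(image), len(image[0])
    result = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            gx = sum(SOBEL_X[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            row.append(math.hypot(gx, gy))
        result.append(row)
    return result


# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 0, 100, 100] for _ in range(5)]
edges = sobel_magnitude(image)
print(edges)
```

The response is zero in the uniform region and large in the two columns that straddle the dark-to-bright step.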

Emotion Discernment

The use of embedded vision image processing to discern the emotional state of a person in front of a camera, by means of facial expression, skin colour and pattern, eye movement, etc. One rudimentary example of the concept is the 'smile' feature of some cameras, which automatically takes a picture when the subject smiles.

Epipolar Geometry

The geometry of stereo vision. When two cameras view a 3-D scene from two distinct positions, a number of geometric relations exist between the 3-D points and their projections onto the 2-D images that lead to constraints between the image points. Epipolar geometry describes these relations between the two resulting views.
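
The constraint between corresponding image points is commonly written as x2^T F x1 = 0, where F is a 3x3 matrix (the fundamental matrix) and x1, x2 are the points in homogeneous coordinates. That formulation is standard but is not stated in this entry. The sketch below evaluates it for the special case of a rectified camera pair separated by a purely horizontal shift, where F takes the simple form shown; the function name and sample points are hypothetical.

```python
def epipolar_residual(F, point1, point2):
    """Evaluate x2^T F x1 for pixel coordinates given as (x, y) pairs."""
    x1 = (point1[0], point1[1], 1.0)
    x2 = (point2[0], point2[1], 1.0)
    F_x1 = [sum(F[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * F_x1[i] for i in range(3))


# Fundamental matrix for an idealised rectified pair (horizontal shift only).
F_RECTIFIED = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]

on_line = epipolar_residual(F_RECTIFIED, (10, 5), (7, 5))   # same image row
off_line = epipolar_residual(F_RECTIFIED, (10, 5), (7, 8))  # different row
print(on_line, off_line)
```

For this configuration the constraint reduces to both points lying on the same image row, which is why stereo matching on rectified images only needs to search along one row.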

Face Detection

The use of embedded vision algorithms to determine that one or more (usually human) faces are present in a scene, and then to take appropriate action. A camera that incorporates face detection features might, for example, adjust focus and exposure settings for optimum image capture of people found in a scene.

Face Recognition

An extension of face detection, which attempts to recognise the person or people in an image. In the most advanced cases, biometric face recognition algorithms might attempt to explicitly identify an individual by comparing a captured image against a database of already identified faces. On a more elementary level, face recognition can be used in ascertaining a person's age, gender, ethnicity, etc.


Framework

A universal, reusable software platform used to develop applications, products and solutions. Frameworks include support programmes, compilers, code libraries, an application programming interface (API) and tool sets that bring together all the different components needed to enable development of a project or solution.


Function

Also known as a subroutine, a segment of source code within a larger computer programme that performs a specific task and is relatively independent of the remaining code.

Gaze Tracking

Also known as eye tracking, the process of measuring the eye position and therefore the point of gaze (i.e. where the subject is looking). Embedded vision-based gaze tracking systems employ non-contact cameras in conjunction with infrared light reflected from the eye. Gaze tracking can be used as a computer user interface scheme, for example, with cursor location and movement that tracks eye movement, and it can also be used to assess driver alertness in ADAS applications.

Gesture Interface

The control of a computer or other electronic system by means of gestures incorporating the position and movement of fingers, hands, arms and other parts of the human body. Successive images are captured and interpreted via embedded vision cameras. Conventional 2-D image sensors enable elementary gesture interfaces; more advanced 3-D sensors, which discern not only horizontal and vertical movement but also per-image depth (distance), allow for more complex gestures, at the expense of increased cost and computational requirements.


GPGPU

General-Purpose Computing on Graphics Processing Units, the design technique of using a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU).


GPU

Graphics Processing Unit, a specialised electronic circuit designed to rapidly manipulate and alter memory to accelerate the building of images in a frame buffer intended for output to a display. GPUs are very efficient at manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for algorithms where processing of large blocks of data is done in parallel.


HDR

High Dynamic Range imaging, a set of methods used to allow a greater dynamic range between the lightest and darkest areas of an image. This wide dynamic range allows HDR images to represent more accurately the range of intensity levels found in real scenes.

Image Processor

A specialised digital signal processor used as a component of a digital camera. The image processing engine can perform a range of tasks, including Bayer-to-full RGB per-pixel transformation, de-mosaic techniques, noise reduction and image sharpening.

Image Search

The process of searching through a database of existing images to find a match between objects contained within one or some of them (such as a face) and content in a newly captured image.

Image Sensor

A semiconductor device that converts an optical image into an electronic signal, commonly used in digital cameras, camera modules and other imaging devices. The most common image sensors are Charge-Coupled Device (CCD) and Complementary Metal–Oxide–Semiconductor (CMOS) active pixel sensors.

Image Warping

The process of digitally manipulating an image so that any shapes portrayed in the image are notably altered. In embedded vision applications, warping may be used either for correcting image distortion or to further distort an image as a means of assisting subsequent processing.


IMGLIB

The Image Processing Library, Texas Instruments' DSP-optimised still image processing function library for C programmers.

Industrial Vision

See Computer Vision

Infrared Sensor

An image sensor that responds to light in the infrared (and near-infrared, in some cases) frequency spectrum. The use of infrared light transmitters to assist in determining object distance from a camera can be useful in embedded vision applications because infrared light is not visible to the human eye. However, ambient infrared light in outdoor settings, for example, can interfere with the function of infrared-based embedded vision systems.

Intelligent Video

A term commonly used in connection with surveillance systems. It covers any solution in which the system automatically analyses the captured video.

Lane Transition Alert

An ADAS system that employs embedded vision and/or other technologies to react to a vehicle in the process of transitioning from one roadway lane to another, or off the roadway to either side. Passive lane transition alert systems alert the driver via sound, light, vibration of the steering wheel, etc. Active lane transition alert systems override the driver's manual control of the steering wheel in order to return the vehicle to the previously occupied roadway lane.

Lens Distortion Correction

The use of embedded vision algorithms to compensate for the image distortions caused by sub-optimal optics systems or those with inherent deformations, such as the barrel distortion of fisheye lenses.
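
As a rough sketch of how such compensation can work, the Python below assumes the lens follows a one-coefficient radial model (distorted = undistorted x (1 + k1 r^2)) and inverts it by fixed-point iteration. The model, the coefficient `K1` and the function name are assumptions for illustration; real systems calibrate several coefficients per lens.

```python
def undistort(x_d, y_d, k1, iterations=20):
    """Recover undistorted normalised coordinates from distorted ones.

    Assumes distorted = undistorted * (1 + k1 * r^2), where r is the
    undistorted distance from the optical axis, and refines the estimate
    by repeated substitution.
    """
    x_u, y_u = x_d, y_d  # initial guess: no distortion
    for _ in range(iterations):
        scale = 1.0 + k1 * (x_u * x_u + y_u * y_u)
        x_u, y_u = x_d / scale, y_d / scale
    return x_u, y_u


K1 = -0.1  # hypothetical barrel-distortion coefficient

# Distort a known point, then check that the correction recovers it.
x_true, y_true = 0.5, 0.4
scale = 1.0 + K1 * (x_true ** 2 + y_true ** 2)
x_d, y_d = x_true * scale, y_true * scale
x_rec, y_rec = undistort(x_d, y_d, K1)
print(round(x_rec, 6), round(y_rec, 6))
```

The iteration converges quickly when the distortion is mild; strongly distorting lenses may need a more robust solver.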


Library

A collection of resources used by programs, often to develop software. Libraries may include configuration data, documentation, help data, message templates, pre-written code and subroutines, classes, values and type specifications. Libraries contain code and data that provide services to independent programs. These resources encourage the sharing and changing of code and data in a modular fashion and ease the distribution of the code and data.

Machine Vision

See Computer Vision

Middleware

Computer software that provides services to software applications beyond those available from the operating system. Middleware, which can be described as 'software glue', makes it easier for software developers to perform communication and input/output, so they can focus on the specific purpose of their application.

Motion Capture

Also known as motion analysis, motion tracking and mocap, the process of recording the movement of one or more objects or persons. It is used in military, entertainment, sports and medical applications, and for validation of computer vision and robotics. In filmmaking and games, it refers to recording the movements (but not the visual appearance) of human actors via image samples taken many times per second, and using that information to animate digital character models in 2D or 3D computer animation. When it includes face and fingers or captures subtle expressions, it is often referred to as performance capture.


NPP

The NVIDIA Performance Primitives library, a collection of GPU-accelerated image, video, and signal processing functions. NPP comprises over 1,900 image processing primitives and approximately 600 signal processing primitives.

Object Tracking

The process of locating a moving object (or multiple objects) over time using a camera. The objective is to associate target objects in consecutive video frames. However, this association can be especially difficult when the objects move quickly relative to the frame rate. Another situation that increases the complexity of the problem is when the tracked object changes orientation over time.
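
The association step can be illustrated with a deliberately simple strategy: match each tracked object to the nearest detection in the next frame. This greedy nearest-neighbour scheme, the function name and the sample coordinates are assumptions for illustration only; it also shows why fast motion is a problem, since a large jump between frames can make the wrong detection the nearest one.

```python
import math


def associate(tracks, detections):
    """Greedily match each tracked object to its nearest unused detection.

    tracks maps an object id to its last known (x, y) position;
    detections is a list of (x, y) positions found in the new frame.
    """
    remaining = list(detections)
    updated = {}
    for track_id, position in tracks.items():
        if not remaining:
            break
        nearest = min(remaining, key=lambda d: math.dist(position, d))
        remaining.remove(nearest)
        updated[track_id] = nearest
    return updated


# Hypothetical positions from two consecutive frames.
tracks = {"a": (0, 0), "b": (10, 10)}
detections = [(11, 10), (1, 0)]
updated = associate(tracks, detections)
print(updated)
```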


OCR

Optical Character Recognition, the conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. OCR is widely used as a form of data entry from some sort of original paper data source, whether documents, sales receipts, mail, or any number of printed records. It is crucial to the computerisation of printed texts so that they can be electronically searched, stored more compactly, displayed on-line and used in machine processes such as machine translation, text-to-speech and text mining.


OpenCL

Open Computing Language, a framework for writing programmes that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs) and other processors. OpenCL includes a language, based on C99, for writing kernels (functions that execute on OpenCL devices), plus application programming interfaces (APIs) that are used to define and then control the platforms. OpenCL provides parallel computing using task-based and data-based parallelism.


OpenCV

A library of programming functions mainly aimed at real-time image processing, originally developed by Intel and now supported by Willow Garage and Itseez. It is free to use under the open source BSD licence. The library is cross-platform.


OpenGL

Open Graphics Library, a standard specification defining a cross-language, multi-platform API for writing applications that produce 2D and 3D computer graphics. The interface consists of over 250 different function calls, which can be used to draw complex three-dimensional scenes from simple primitives. OpenGL functions can also be used to implement some GPGPU operations.


OpenNI

Open Natural Interaction, an industry-led, non-profit organisation focused on certifying and improving the interoperability of natural user interfaces and organic user interfaces for natural interaction devices, the applications that use those devices, and the middleware that facilitates access to and use of such devices.


OpenVL

A modular, extensible and high-performance library for handling volumetric datasets. It provides a standard, uniform and easy-to-use API for accessing volumetric data. It allows the volumetric data to be laid out in different ways to optimise memory usage and speed. It supports reading/writing of volumetric data from/to files in different formats using plug-ins. It provides a framework for implementing various algorithms as plug-ins that can be easily incorporated into user applications. The plug-ins are implemented as shared libraries, which can be dynamically loaded as needed. OpenVL software is developed openly and is freely available on the web.

Operating System

A set of software that manages computer hardware resources and provides common services for computer programs. For hardware functions such as input and output and memory allocation, the operating system acts as an intermediary between programs and the computer hardware, although the application code is usually executed directly by the hardware and will frequently make a system call to an operating system function, or be interrupted by it.

Optical Flow

The pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene. Optical flow techniques such as motion detection, object segmentation, time-to-collision and focus of expansion calculations, motion compensated encoding, and stereo disparity measurement utilise the motion of the objects' surfaces and edges.


Photogrammetry

The practice of determining the geometric properties of objects from photographic images. Algorithms for photogrammetry typically express the problem as that of minimising the sum of the squares of a set of errors. This minimisation is known as bundle adjustment and is often performed using the Levenberg–Marquardt algorithm.

Pincushion Distortion

The opposite of barrel distortion; image magnification increases with the distance from the optical axis. The visible effect is that lines that do not go through the centre of the image are bowed inwards, towards the centre of the image, like a pincushion. Embedded vision techniques can be used to reduce or eliminate pincushion distortion effects.

Plenoptic Camera

Also known as a light-field camera, it uses an array of microlenses, placed at the focal plane of the main lens and slightly ahead of the image sensor, to capture light field information about a scene. The displacement of image parts that are not in focus can be analysed and depth information can be extracted. Such a camera system can therefore be used to refocus an image on a computer after the picture has been taken.

Point Cloud

A set of vertices in a three-dimensional system, usually defined by X, Y, and Z coordinates, and typically intended to be representative of the external surface of an object. Point clouds are often created by 3D scanners. These devices measure in an automatic way a large number of points on the surface of an object, and often output a point cloud as a data file. The point cloud represents the set of points that the device has measured.



Reference Design

A technical blueprint of a system that is intended for others to copy. It contains the essential elements of the system; recipients may enhance or modify the design as required. Reference designs enable customers to shorten their time to market, thereby supporting the development of next generation products using latest technologies. The reference design is proof of the platform concept and is usually targeted for specific applications. Hardware and software technology vendors create reference designs in order to increase the likelihood that their products will be used by OEMs, thereby resulting in a competitive advantage for the supplier.


Resolution

The amount of detail that an image holds. Higher resolution means more image detail. Resolution quantifies how close lines can be to each other and still be visibly resolved. Resolution units can be tied to physical sizes (e.g. lines per mm, lines per inch), to the overall size of a picture (lines per picture height), to angular subtense, or to the number of pixels in an image sensor. Line pairs are sometimes used instead of lines; a line pair comprises a dark line and an adjacent light line.


SoC

System-On-A-Chip, an integrated circuit (IC) that integrates most if not all components of an electronic system into a single chip. It may contain digital, analog, mixed-signal, and often radio-frequency functions, all on a single chip substrate.

Stereo Vision

The use of multiple cameras, each viewing a scene from a slightly different perspective, to discern the perceived depth of various objects in the scene. Stereo vision is employed by the human vision system via the two eyes. The difference in an object's apparent position between the two views is known as binocular disparity.
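
For two identical, parallel cameras, depth is commonly recovered from disparity with the relation depth = focal length x baseline / disparity. The sketch below applies that relation; the idealised camera arrangement, the function name and the sample values are assumptions and do not come from this glossary.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Return depth in metres for an idealised, rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 700-pixel focal length, cameras 10 cm apart.
near = depth_from_disparity(700.0, 0.1, 35.0)
far = depth_from_disparity(700.0, 0.1, 7.0)
print(near, far)
```

Nearby objects produce large disparities and distant objects small ones, so depth precision falls off as distance grows.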


Stream

A set of hardware and software technologies originally developed by ATI Technologies and now managed by acquiring company AMD (and renamed APP, for Accelerated Parallel Processing). Stream enables AMD graphics processors (GPUs), working with the system's central processor (CPU), to accelerate applications beyond just graphics (i.e. GPGPU). The Stream Software Development Kit (SDK) allows development of applications in a high-level language called Brook+. Brook+ is built on top of ATI Compute Abstraction Layer (CAL), providing low-level hardware control and programmability. NVIDIA's competitive approach is known as CUDA.

Structured Light

A method of determining the depths of various objects in a scene, by projecting a predetermined pattern of light onto the scene for the purpose of analysis. 3-D sensors based on the structured light method use a projector to create the light pattern and a camera to sense the result. In the case of the Microsoft Kinect, the projector employs infrared light. Kinect uses an astigmatic lens with different focal lengths in the X and Y direction. An infrared laser behind the lens projects an image consisting of a large number of dots that transform into ellipses, whose particular shape and orientation in each case depends on how far the object is from the lens.

Surveillance System

A camera-based system that implements scene monitoring and security functions. Historically, a surveillance system's camera outputs were viewed by humans via television monitors. Embedded vision-based surveillance systems are now replacing the often-unreliable human surveillance factor, via automated 'tripwire', facial detection and other techniques.


Time-of-Flight

A method of determining the depths of various objects in a scene. A time-of-flight camera contains an image sensor, a lens and an active illumination source. The camera derives distance from the time it takes for projected light to travel from the transmission source to the object and back to the image sensor. The illumination source is typically either a pulsed laser or a modulated beam, depending on the image sensor type employed in the design.
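
The distance calculation described above can be sketched in a few lines of Python for the pulsed case: the light covers the camera-to-object distance twice, so the distance is half the round-trip time multiplied by the speed of light. The function name and the sample timing are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_round_trip(round_trip_seconds):
    """Return the object distance in metres from a measured round-trip time."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


# A hypothetical round trip of 20 nanoseconds.
distance = distance_from_round_trip(20e-9)
print(round(distance, 3))
```

A round trip of 20 ns corresponds to roughly 3 m, which illustrates how fine the timing resolution has to be for centimetre-level depth accuracy.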

Video Analytics

Also known as video content analysis, it is the capability of automatically analysing video to detect and determine temporal events not based on a single image. The algorithms can be implemented on general-purpose computers or specialised embedded vision systems. Functions that can be implemented include motion detection against a fixed background scene. More advanced functions include tracking, identification, behaviour analysis, and other forms of situation awareness.


VLIB

The Video Processing Library, Texas Instruments' DSP-optimised video processing function library for C programmers.


VXL

A collection of open source C++ libraries for vision applications. The intent is to replace the X with one of many letters: G (VGL) is a geometry library, N (VNL) is a numerics library, I (VIL) is an image processing library, etc. These libraries can also be used for general scientific computing applications.