Deep learning has driven crucial advances in machine learning accuracy. Can it also solve the future challenges of intelligent computer vision systems?
Scientists have always been on a quest to build intelligent tools that mimic human capabilities. One of the most mind-blowing advances is machine learning techniques that are transforming machine vision, or the ability of computers to see. But is it really possible to artificially create an intelligent vision system similar to the human one: a system built with millions of neurons and shaped by emotion, illusion, color perception, and psychology? When engineers first tried creating a machine vision system that mimics the human vision system, they addressed images first in terms of 2D signals, and later in terms of patterns. But is this enough?
We know that intelligent machines recognize objects or actions based on features, which are essentially mathematical representations of an abstract description of the target. At one time, selecting suitable features for specific problems was a formidable task; researchers spent years modeling better features for object classification without favorable results. Consider the example of a robot lifting objects such as silicon wafers, then sorting and stacking them on racks. To perform this task, researchers split the challenges around ‘sensing’ from those around control. Sensing challenges, such as ‘where and what is the object of interest, and which is the closest rack in the robot’s course’, were segregated from control challenges, such as ‘what motion pattern, dynamics, and 3D arm movements are required to perform each action independently’.
But the fact is human vision is a combination of sensing and control systems. The most recent trend in AI is aimed at bridging this gap between machine and human vision. In the emerging scenario, ‘sensing’ and ‘control’ algorithms are not separate entities anymore. The system is a unified, intelligent whole — one that directly senses the environment and ensures control.
Breaking new ground in intelligent machine vision
Deep learning transformed AI over the last decade, driving an increase of roughly 20 percentage points (from 75% to 95%) in machine learning accuracy between 2010 and 2015. A deep learning architecture can decide the best-suited features for recognizing the target object or action, making human-driven feature selection unnecessary. It is as if the architecture itself develops both the features and the classifier required to tackle the challenge.
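As a toy illustration of features being learned rather than hand-designed, consider the classic XOR problem: no single hand-crafted linear feature separates the two classes, but a small network discovers a hidden representation that does. The following is a minimal NumPy sketch under assumed, arbitrary choices of layer size, seed, and learning rate; it is not production code.

```python
import numpy as np

# XOR: no single linear (hand-crafted) feature separates these classes,
# but a learned hidden layer finds a representation that does.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # hidden layer: learned features
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # classifier on those features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    h = np.tanh(X @ W1 + b1)          # learned feature representation
    p = sigmoid(h @ W2 + b2)          # classifier output
    # Backpropagate the binary cross-entropy loss.
    dp = p - y
    dW2 = h.T @ dp; db2 = dp.sum(0)
    dh = (dp @ W2.T) * (1 - h ** 2)   # tanh derivative
    dW1 = X.T @ dh; db1 = dh.sum(0)
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 0.1 * grad           # in-place gradient step

preds = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print(preds.ravel())  # should recover [0 1 1 0] once training converges
```

The point of the sketch is that nobody specified which features the hidden layer should compute; gradient descent selected them, which is the same principle that lets deep networks replace hand-engineered vision features at scale.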
This development is helping researchers break new ground in intelligent automation and robotics. The sensor is trained with the final machine dynamics, eliminating the need for a separate sensing algorithm that detects the object of interest and passes information to the control module: environment sensing is mapped directly to the final action taken by the autonomous machine. Sensors are also being trained on all types of actions to build long short-term memory (LSTM) and 3D convolutional neural network (CNN) based deep learning models. Such architectures can identify and predict environmental catastrophes, territorial intrusions, and accidents, among other things.
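The end-to-end idea above can be sketched in miniature: instead of a hand-written "detect, then plan" pipeline, a single model is fit from raw sensor readings straight to a motor command. The sensor dimensions, action space, and linear least-squares fit below are invented for illustration; a real robot would use a deep network and logged demonstration data.

```python
import numpy as np

# Hypothetical setup: 4 raw "sensor" readings map directly to 2 motor
# commands, with no separate detection or planning stage in between.
rng = np.random.default_rng(1)

true_policy = rng.normal(0, 1, (4, 2))        # unknown sensing-to-action mapping
sensors = rng.normal(0, 1, (200, 4))          # logged sensor readings
actions = sensors @ true_policy + 0.01 * rng.normal(0, 1, (200, 2))  # demonstrated actions

# "Training the sensor with the final machine dynamics": fit one model
# end-to-end from raw input to action (here, a linear least-squares fit).
learned_policy, *_ = np.linalg.lstsq(sensors, actions, rcond=None)

new_reading = np.array([0.5, -1.0, 0.2, 0.8])
command = new_reading @ learned_policy        # sensing -> action in one step
print(command.shape)  # prints (2,)
```

Note that nothing in the fitted model answers 'where' or 'what' the object is; the intermediate representation is implicit, which is exactly the shift from separate sensing and control modules to a unified system.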
However, deep learning frameworks come with multiple challenges. The embedded processors require large amounts of power and data, and training can take months to complete. Researchers therefore reuse the weight vector learned by one network (which could take a month or two to train) by freezing most of the layers and training only a few (which takes only seconds). This concept of ‘transfer learning’ has drastically reduced training time from several months to a matter of seconds. Take Facebook’s DeepFace and Google’s FaceNet, which recognize a user’s image to tag them in the photos they share. These systems use deep neural networks with more than 100 hidden layers, trained over months on billions of users’ faces to extract features. Today, by unfreezing just the last layer of such a deep neural net, researchers can reuse the learned weight matrix to achieve greater accuracy and performance in object recognition.
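The freeze-and-retrain recipe can be shown in a few lines. This is a minimal NumPy sketch, assuming a tiny pre-trained hidden layer standing in for a large network: the transferred weights stay frozen, and only the final classifier layer is trained on the new task. All shapes, seeds, and the synthetic task are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Pretend these hidden-layer weights were learned on a large source task
# over a long training run; in practice they would be loaded from disk.
W1 = rng.normal(0, 1, (10, 64)); b1 = rng.normal(0, 1, 64)

def features(X):
    """Frozen feature extractor: the transferred, pre-trained layer."""
    return np.tanh(X @ W1 + b1)

# New target task (synthetic): only the final layer is trained, which is
# fast because the frozen features already do most of the work.
X_new = rng.normal(0, 1, (300, 10))
y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(float).reshape(-1, 1)

W2 = np.zeros((64, 1)); b2 = np.zeros(1)
H = features(X_new)                       # computed once; W1, b1 never updated
for _ in range(2000):
    p = sigmoid(H @ W2 + b2)
    dp = (p - y_new) / len(y_new)         # cross-entropy gradient
    W2 -= 0.5 * (H.T @ dp)                # only the last layer moves
    b2 -= 0.5 * dp.sum(0)

acc = ((sigmoid(H @ W2 + b2) > 0.5) == y_new).mean()
print(f"accuracy on new task: {acc:.2f}")
```

Training only `W2` and `b2` is a convex problem over a precomputed feature matrix, which is why, as the article notes, this step takes seconds where full training takes months.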
Engineering human perception in machine intelligence
Researchers are studying the limitations of vision systems by exploring the psycho-visual perception response to different optical illusions. Perception engineering aims to model the vision system with digital image processors whose output matches how humans perceive a scene, including optical illusions. The new intelligent machines can understand a scene hierarchically: for instance, they can recognize people not only from the texture of their eyebrows or eyes but also in totality. This development benefits not only vision engineers but also data scientists and Big Data analysts.
According to MarketsandMarkets research, the overall machine vision market is expected to grow from USD 8.12 billion in 2015 to USD 14.43 billion by 2022. The integration of deep learning and cloud-based analytics is leading the industry toward a superior computer vision framework. This sets the stage for designing a generic framework that can address vision-related challenges across domains, from context-based search and multi-threaded real-time video analytics to coordinated action and incident recognition and prediction. What does this mean? One can now design a detailed, integrated framework that is generic, yet defines new deep learning network architectures to solve any challenge in intelligent vision sensing.