Computer vision and analytics is at a tipping point: it is poised to go commercial. According to a Cisco study, IP video traffic is expected to account for 82% of all consumer Internet traffic globally by 2020, growing fourfold from 2015 to 2020 at a compound annual growth rate (CAGR) of 31%. These huge volumes of video data will require real-time or near-real-time processing, and the good news is that this increasingly looks feasible. Nvidia research shows that advances in deep learning have brought the classification error rate of machine learning below 5%.
This accuracy gain, together with other recent advances in computer science, creates significant opportunities for commercial applications of computer vision and analytics. Here is a quick look at some of the advances in this emerging technology and their applications.
Saliency in contextual or motion data is advancing real-time video processing
Saliency helps algorithms decide ‘what’ to process in a video and ‘where’ to process it. Saliency models are built from cues in human perception, both behavioral (motion, color, and scale-space) and architectural (the dorsal and ventral pathways that carry visual information out of the primary visual cortex of the human brain). For instance, motion close to the observer is a strong cue for a potentially harmful event, and the same principle can drive an intelligent machine such as an automotive forward collision avoidance system. Context matters, though: our brain filters out the movement of a car’s wipers even though it happens right in front of us. Ongoing research focuses on modeling this context-based saliency to create highly adaptive, evolving filters, both temporal and spatial, for computer vision.
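To make the ‘what’ and ‘where’ idea concrete, here is a minimal motion-saliency sketch in Python, assuming grayscale frames held as nested lists. It is not the saliency model described above: the class name `AdaptiveMotionSaliency`, the moving-average rate `alpha`, and the "lower rows are closer to the camera" weighting are hypothetical choices for illustration. Frame differencing supplies the motion cue, and a per-pixel exponential moving average of past motion acts as a simple adaptive temporal filter that learns to discount motion recurring in the same place, loosely mirroring how the brain ignores wiper movement.

```python
from typing import List, Optional

Frame = List[List[float]]  # grayscale intensities, row-major


def motion_map(prev: Frame, curr: Frame) -> Frame:
    """Per-pixel absolute intensity change between two consecutive frames."""
    return [[abs(c - p) for p, c in zip(prow, crow)] for prow, crow in zip(prev, curr)]


class AdaptiveMotionSaliency:
    """Toy context-adaptive motion saliency (illustrative, not a published model).

    Temporal filter: an exponential moving average (EMA) of past motion forms a
    per-pixel "habitual motion" baseline; motion that keeps recurring in the same
    place (e.g. a wiper sweep) rises into the baseline and is suppressed.
    Spatial weight (hypothetical): lower rows of a forward-facing frame are
    weighted higher, as a crude stand-in for "closer to the sensor".
    """

    def __init__(self, alpha: float = 0.2) -> None:
        self.alpha = alpha  # EMA update rate for the baseline
        self.baseline: Optional[Frame] = None

    def update(self, prev: Frame, curr: Frame) -> Frame:
        motion = motion_map(prev, curr)
        if self.baseline is None:
            self.baseline = [[0.0] * len(row) for row in motion]
        n_rows = len(motion)
        saliency: Frame = []
        for r, (m_row, b_row) in enumerate(zip(motion, self.baseline)):
            weight = (r + 1) / n_rows  # bottom row gets weight 1.0
            # Score against history first, then fold this frame into the baseline.
            saliency.append([weight * max(0.0, m - b) for m, b in zip(m_row, b_row)])
            for i, m in enumerate(m_row):
                b_row[i] = (1 - self.alpha) * b_row[i] + self.alpha * m
        return saliency
```

Fed a pixel that flickers on every frame, the score at that pixel decays as the baseline catches up, while a new object appearing in the lower part of the frame scores at full strength. A real system would replace the single moving average with a richer, learned context model.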
Combining specific algorithms with machine learning is enabling correct object classification
Classical machine learning requires a broad set of features and large datasets to train the intelligent machine. For instance, the core module of a forward collision alert system must reliably detect objects of interest throughout the video. Take the example of a low- or mid-segment car for which a monocular camera is the only affordable sensor. Just to detect vehicles, the system must be trained on all possible vehicle types, which means thousands of samples. Automotive OEMs would then incur significant costs, because collecting video data for training and validation involves driving data-collection vehicles for several hundred thousand miles. Even if they can bear the cost, no machine learning algorithm can be guaranteed to detect every vehicle on the road, because it relies on patterns learned from a finite image gallery. One solution is an algorithm that, without any training, detects vehicles on a potential collision course from a static or moving monocular or stereo camera.
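One well-known family of training-free cues for this is "looming": an object on a collision course grows in the image, and its rate of expansion gives the time to collision (TTC) directly, without knowing what kind of vehicle it is. The sketch below is that textbook technique, not necessarily the algorithm the article has in mind. It assumes the object's image width has already been measured in two frames (for example by a generic tracker); the function names and the 2.5 s warning threshold are hypothetical.

```python
import math


def time_to_collision(w_prev: float, w_curr: float, dt: float) -> float:
    """Time to collision (s) from the image-width change of a tracked object.

    Pinhole model: image width w = f * W / Z. If the closing speed is constant,
    the scale ratio s = w_curr / w_prev = Z_prev / Z_curr gives
    TTC = Z_curr / speed = dt / (s - 1). Only a ratio of measurements is used,
    so neither the focal length f nor the real width W has to be known.
    Returns math.inf when the object is not expanding (not approaching).
    """
    if w_prev <= 0 or dt <= 0:
        raise ValueError("widths and dt must be positive")
    s = w_curr / w_prev
    if s <= 1.0:
        return math.inf
    return dt / (s - 1.0)


def collision_warning(w_prev: float, w_curr: float, dt: float,
                      threshold_s: float = 2.5) -> bool:
    """Warn when the estimated TTC drops below a (hypothetical) threshold."""
    return time_to_collision(w_prev, w_curr, dt) < threshold_s
```

For example, a 2 m wide car seen through a 1000 px focal length that closes from 21 m to 20 m in 0.1 s grows from about 95.2 px to 100 px, giving a TTC of 2 s.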
Using smart cameras connected to the cloud is driving real-time video analytics
Real-time video analytics offers a definite edge in traffic surveillance, medical research, and forensics. Smart cameras can analyze entire video frames, or specific regions of them, on the device and select what needs further processing. With a cloud connection, the computationally light part of the analytics runs at the edge, on the camera itself, while complex, detailed algorithms run in the cloud. This means businesses can do away with on-premises blade servers or high-performance computers entirely.
For traffic analytics, smart cameras can be programmed to activate on an incident, capturing the scene (what) while restricting themselves to the region of interest (where). Only the relevant frames are streamed to the cloud in real time for complex analysis. Using the parallel architecture of the cloud servers, the results are then fed back to the camera or control unit in near-real time.
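The edge/cloud division of labour described here can be sketched in Python. This is a minimal illustration under stated assumptions, not a production design: frames are small grayscale arrays held as nested lists, simple frame differencing stands in for the camera's incident trigger, and `heavy_analysis` is a hypothetical placeholder for whatever complex model runs in the cloud. All function and class names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Frame = List[List[float]]          # grayscale frame, row-major
BBox = Tuple[int, int, int, int]   # (top, left, bottom, right); bottom/right exclusive


@dataclass
class EdgePayload:
    """What the camera sends upstream: the frame index and only its region of interest."""
    frame_index: int
    bbox: BBox
    roi: Frame


def changed_bbox(prev: Frame, curr: Frame, pixel_thresh: float) -> Optional[BBox]:
    """Bounding box of pixels whose intensity changed by more than pixel_thresh."""
    hits = [(y, x)
            for y, (p_row, c_row) in enumerate(zip(prev, curr))
            for x, (p, c) in enumerate(zip(p_row, c_row))
            if abs(c - p) > pixel_thresh]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys) + 1, max(xs) + 1)


def edge_filter(frames: List[Frame], pixel_thresh: float = 30.0) -> List[EdgePayload]:
    """Edge stage (cheap): trigger on change ('what') and crop to it ('where')."""
    payloads = []
    for i in range(1, len(frames)):
        box = changed_bbox(frames[i - 1], frames[i], pixel_thresh)
        if box is not None:  # no incident -> nothing is streamed
            top, left, bottom, right = box
            roi = [row[left:right] for row in frames[i][top:bottom]]
            payloads.append(EdgePayload(i, box, roi))
    return payloads


def heavy_analysis(payload: EdgePayload) -> float:
    """Placeholder for the expensive cloud model; here just the mean ROI intensity."""
    values = [v for row in payload.roi for v in row]
    return sum(values) / len(values)


def cloud_analyze(payloads: List[EdgePayload], workers: int = 4) -> Dict[int, float]:
    """Cloud stage: analyse ROIs in parallel and return results keyed by frame index.

    A local thread pool stands in for a pool of cloud workers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(heavy_analysis, payloads))
    return {p.frame_index: r for p, r in zip(payloads, results)}
```

The design point is that the camera never ships unchanged frames or full frames: the bandwidth and cloud compute spent scale with the number and size of incidents rather than with the raw video rate.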
Video data for traffic analytics has to be compressed for faster communication and processing. However, compression degrades quality even at a tolerable compression ratio, which may make it unsuitable for medical and forensic applications. Using the ‘where’ and ‘what’ context during compression can help retain the finer details of the region or object of interest while the rest of the frame is compressed more aggressively.
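A toy version of context-aware compression is region-dependent quantization: spend precision where the object of interest is and save it elsewhere. The sketch below assumes the region of interest is already known (for example from the camera's ‘where’ step) and uses plain uniform quantization as a stand-in for a real codec; real encoders typically express the same idea through per-block quantization settings. The function name and step sizes are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]
BBox = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def roi_quantize(frame: Frame, roi: BBox, fine_step: float = 2.0,
                 coarse_step: float = 32.0) -> Frame:
    """Uniformly quantize pixels: small step inside the ROI, large step outside.

    Rounding to the nearest multiple of `step` bounds each pixel's error by
    step / 2, so ROI detail is kept to within fine_step / 2 while the background
    collapses to far fewer distinct levels, which is where an encoder saves bits.
    """
    top, left, bottom, right = roi
    out: Frame = []
    for y, row in enumerate(frame):
        out_row = []
        for x, v in enumerate(row):
            inside = top <= y < bottom and left <= x < right
            step = fine_step if inside else coarse_step
            out_row.append(round(v / step) * step)
        out.append(out_row)
    return out
```

With the default steps, a pixel of intensity 13.3 inside the ROI is reconstructed as 14.0, while the same value in the background is flattened to 0.0.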
Creating application-specific algorithms ensures production agility
Sometimes the ideal setup that an algorithm assumes is not feasible in production. For example, driver alertness monitoring ideally requires a camera installed right in front of the driver’s face; for production agility, however, manufacturers plan to place the camera on the car’s A-pillar. To resolve this, a new algorithm can model a 2.5D face from the asymmetric profile view captured by a monocular IR camera. The abstract pattern of facial movement is translated into numbers, on which detailed data analytics is performed. This makes the system robust enough to handle shaking and variations in illumination and scale.
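The article does not spell out which numbers are extracted, so here is a hedged illustration of the general idea of turning facial movement into numbers, using a different, well-known technique rather than the 2.5D profile model: the eye aspect ratio (EAR) computed from six eye landmarks, aggregated over time into a PERCLOS-style fraction of closed-eye frames. It assumes some landmark detector already supplies the points; the function names and the 0.2 closed-eye threshold are illustrative assumptions. EAR is usually computed on near-frontal views, so treat this only as an illustration of the numeric step, not of the A-pillar geometry.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """Eye aspect ratio (EAR) from six eye landmarks p1..p6.

    p1/p4 are the eye corners, p2/p3 the upper lid, p6/p5 the lower lid:
    EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|).
    As a ratio of distances it is unchanged by translation, in-plane rotation
    and uniform scaling of the landmarks (shaking, distance to the camera).
    """
    p1, p2, p3, p4, p5, p6 = eye
    return (_dist(p2, p6) + _dist(p3, p5)) / (2.0 * _dist(p1, p4))


def perclos(ear_series: List[float], closed_thresh: float = 0.2) -> float:
    """Fraction of frames in which the eye counts as closed (EAR below threshold)."""
    if not ear_series:
        return 0.0
    return sum(1 for e in ear_series if e < closed_thresh) / len(ear_series)
```

Because EAR depends only on landmark geometry, moving or scaling the face in the image leaves it unchanged, which is the kind of invariance the paragraph above attributes to working on numbers rather than raw pixels.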
What does the future of computer vision and analytics look like?
The computer vision and analytics industry is moving towards a generic framework that can solve the computer vision problems of individual domains. These problems currently range from context-based search, multithreaded real-time video analytics, and coordination to action or event recognition and prediction. Encouragingly, the integration of Apache Spark on Hadoop, deep learning, and cloud-based analytics is set to produce a computer vision framework well suited to these challenges.