In the Information and Communication Technologies category

The Frontiers of Knowledge Award goes to Takeo Kanade for developing mathematical foundations for computer vision and robot perception

The BBVA Foundation Frontiers of Knowledge Award in the Information and Communication Technologies category has gone in this sixteenth edition to Professor Takeo Kanade for developing mathematical foundations that underlie the current capabilities of computers and robots to “comprehend and interpret visual images and scenes,” in the words of the selection committee. The emerging generation of self-driving vehicles, robots that assist surgeons in all kinds of operations, facial recognition systems for accessing our cell phones, and sports broadcasts offering panoramic replays of match highlights from multiple angles: all these advances owe a large debt to the contributions of this Japanese researcher, Founders University Professor of Computer Science and Robotics at Carnegie Mellon University (Pittsburgh, United States).

7 February, 2024

Over four decades, Professor Kanade “has pioneered the scientific study of computer vision,” by devising “foundational algorithms for image understanding, motion processing and robotics perception,” said the committee in its citation. His contributions, it continued, “have not only shaped the scientific disciplines of artificial intelligence and robotics, but have also significantly transformed the technological world in which we live.”

Committee member Oussama Khatib, Professor of Computer Science and Director of the Robotics Laboratory at Stanford University (United States), describes Kanade’s work as pivotal: “Robotics relies on computer vision for perception. We can in fact define robotics as the intelligent connection between perception and action. Without perception, a robot cannot operate in an unpredicted, unstructured environment. For example, we couldn’t build any autonomous vehicle without the vision that enables it to avoid collisions. Professor Kanade has pushed the frontiers of work in this field in unprecedented ways, and his school of thought has been vitally important for machine vision and its applications in robot perception.”

“I feel deeply humbled at being selected for the prestigious Frontiers of Knowledge Award and to have my name added to the illustrious roll of past recipients,” said Professor Kanade in an interview granted shortly after hearing of the award. “As is evidenced by the fact that the visual cortex occupies the dominant portion of the human brain, vision or visual information processing provides humans with our richest and the most important information channel for understanding and communication. AI and robots with similar or even better computer vision capabilities contribute to the betterment of our lives. I see a lot of opportunities.”

The algorithms that revolutionized 3D computer vision

Kanade catalyzed the field of three-dimensional computer vision with a series of algorithms far faster than any previously available, opening up a new wealth of practical applications. Just as humans and animals need two eyes to see in depth, three-dimensional artificial vision requires the merging of images from at least two cameras. But the first artificial vision algorithms were designed to process just one image, and using them to combine several images was too slow a process to be useful in practice.
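Why two views are enough to recover depth can be illustrated with the standard triangulation relation for an idealized stereo pair. The sketch below is a textbook simplification, not one of Kanade’s algorithms: it assumes two identical, perfectly aligned (rectified) pinhole cameras, and the function name and the numbers in the example are hypothetical.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Distance (in meters) to a point seen by two side-by-side cameras.

    Assumes an idealized, rectified stereo pair:
      focal_length_px  focal length of both cameras, in pixels
      baseline_m       distance between the two camera centers, in meters
      disparity_px     horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    # Similar triangles: depth = focal length * baseline / disparity.
    return focal_length_px * baseline_m / disparity_px


# Hypothetical numbers: cameras 10 cm apart, 700-pixel focal length.
near = depth_from_disparity(700.0, 0.10, 35.0)  # large shift between views
far = depth_from_disparity(700.0, 0.10, 7.0)    # small shift between views
print(near, far)
```

The closer a point is, the more it shifts between the two views. Finding the matching point in both images for every pixel is the expensive part, which is why early approaches to combining several images were so slow.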

Kanade’s algorithms would prove instrumental in enabling practical applications for three-dimensional computer vision. “When Professor Kanade started working on machine vision, computing power was nowhere near what we have now,” Oussama Khatib points out. “And he was among the first to propose these very rich algorithms that could solve intractable problems in computer vision.”

To process a video recorded with a single camera (that is, in two dimensions) and automatically recognize the images it contains, one possible way is to sort through it frame by frame trying to reconstruct the objects appearing and deduce how they move. Fast, accurate tracking of the movement of points within an image, optical flow as it is known, is essential for tasks such as video compression or for a robot to find its way round a given space.

But this method is ruled out if each frame consists of merged images from multiple cameras, given the amount of computing power required. Kanade realized that, rather than merging each frame then tracking the movement of the objects, it would be far quicker to use the object motion information recorded by each camera to understand how the image is moving, even before combining the videos. “Once we understand that, there’s no need to send all the color or video information, we can just send the motion only,” said Kanade.

Along with his doctoral student Bruce Lucas, he developed a new method to estimate optical flow, presented at the 7th International Joint Conference on Artificial Intelligence (IJCAI), held in Vancouver, Canada in 1981. What has since become known as the Lucas-Kanade method also recovers the shape of objects and allows the direction and speed of their movement to be deduced. “That is the basis of video coding,” the new laureate explains, “and my optical flow algorithm can be used for basically any moving image data compression technique.”
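For readers curious about the mechanics, the central idea of a Lucas-Kanade style estimate can be sketched in a few lines of plain Python. The sketch assumes that brightness is preserved between two frames and that every pixel in a small window moves by the same small amount, then solves a two-by-two least-squares system for that shift. The function name and the synthetic test pattern are assumptions made for illustration; practical implementations add iteration and image pyramids to handle larger motions.

```python
def lucas_kanade_window(frame_a, frame_b):
    """Estimate one (u, v) shift, in pixels, for a small grayscale window.

    frame_a, frame_b: equal-sized lists of rows of floats.
    Assumes brightness constancy and a small, uniform motion in the window.
    """
    rows, cols = len(frame_a), len(frame_a[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Spatial gradients by central differences, temporal by frame difference.
            ix = (frame_a[y][x + 1] - frame_a[y][x - 1]) / 2.0
            iy = (frame_a[y + 1][x] - frame_a[y - 1][x]) / 2.0
            it = frame_b[y][x] - frame_a[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Normal equations: [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt].
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to determine the motion")
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def pattern(x, y):
    # Hypothetical smooth brightness pattern used only for this demonstration.
    return x * x + x * y + 2.0 * y * y


# Second frame: the same pattern shifted by 0.2 pixels in x and -0.1 pixels in y.
frame_a = [[pattern(x, y) for x in range(10)] for y in range(10)]
frame_b = [[pattern(x - 0.2, y + 0.1) for x in range(10)] for y in range(10)]
print(lucas_kanade_window(frame_a, frame_b))
```

The estimate comes from only five running sums per window, which hints at why the approach was practical on the limited hardware of the early 1980s.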

Even so, 3D images require much more computing power than their 2D equivalents, and Kanade also devised a way to drastically cut down the calculations involved in their processing. This contribution, co-authored with his doctoral student Carlo Tomasi and published in the International Journal of Computer Vision in 1992, made it possible for the computers of the time to work with three-dimensional images. This “feat” as Khatib describes it, “required a really good understanding of mathematics, a rigorous approach to problem solving, and also creativity in the way he used mathematical tools to solve physical problems.”

Autonomous cars, helicopters and drones

With the techniques Kanade devised, two Carnegie Mellon researchers crossed the U.S. by freeway, coast to coast, in one of the first autonomous vehicles ever built, manually engaging the accelerator and the brake but hardly touching the steering wheel. This was back in 1995, and “No Hands Across America”, as the project was called, showed that a van could steer itself autonomously, relying solely on the information from its cameras.

Although the autonomous cars that are beginning to appear in our cities come with added functionalities to deal with not knowing the intentions of pedestrians and other human drivers, this van was a blueprint for robots operating in controlled settings like restaurants, airports or museums. And Kanade has recently been working on an autonomous helicopter capable of tracking an object. “If an autonomous helicopter has to track a target within a scene, 3D vision capability is used to locate and track the target,” he explains.

The techniques proposed by the awardee are also built into the drones in use today and all robots equipped with visual capabilities.

“Virtualized reality” for a 360-degree view of sporting highlights

In 2001, the Super Bowl, the most watched program on American TV, featured a technological breakthrough in the field of computer vision that forever changed the way sports are broadcast, and Professor Kanade himself appeared on screen to explain it to the viewers.

The new technique, in effect, enabled 360-degree reproduction. To get this all-round view, a scene has to be recorded with a number of cameras, but with Kanade’s methods it is possible to obtain images of that scene independently of the cameras’ actual viewing angles, or to reconstruct any vantage point from a video recorded by a moving camera. “If we have a camera that takes shots from four angles, every 90 degrees, using that information it can reconstruct what the scene would look like from another viewpoint that doesn’t exist in the real image,” explains Kanade’s nominator Fernando Torres Medina, Professor of Systems Engineering and Automation and Director of the Automation, Robotics and Computer Vision Research Group at the University of Alicante (Spain). This is the basis of the “virtualized reality” that has transformed sporting events by allowing viewers, for instance, to follow a football match from the ball’s point of view or to use Hawk-Eye in tennis.

“When the term virtual reality appeared in the 1980s,” Kanade recalls, “people worked mostly on creating artificial worlds with computer graphics. But I thought it would be more interesting to start from reality; in other words, to input reality to the computer so it becomes virtual.” It was to highlight this aspect and distinguish his proposal from the artificial worlds beginning to emerge that the researcher coined the concept of “virtualized reality.”

The system’s debut at the 2001 Super Bowl, under the name EyeVision, was the first time viewers could enjoy replays of the game’s highlights in panoramic mode. “The stadium had 33 cameras mounted on the upper deck, looking onto the field, and when there was a beautiful move, the broadcaster could replay it, spinning around the main player. It was like the big scene in The Matrix, where the camera seems to circle the protagonist,” remarks Kanade. “And now this 360-degree replay is used in almost every sport.”

Advances in medical scanners and robotic surgery

Computer vision is also a core enabling technology for robotic surgery, a burgeoning field whose expansion owes much to the techniques invented by Kanade. “Any robot-assisted operation conducted nowadays is based on his contributions,” says Torres.

It was in fact Kanade himself who, together with his team, developed the first robotized system for hip replacement surgery. Called HipNav, it achieved much greater precision in the placement of the prosthesis with a far less invasive procedure than traditional surgery, reducing the risk of side effects like dislocation. The possibility of real-time tracking of the exact position of the patient’s pelvis was key to this success.

Moreover, thanks in no small measure to Kanade’s contributions, it is now possible to design robots capable of performing simple medical tests, like certain ultrasound scans, and detecting possible pathological regions. “Many villages have no hospitals,” Oussama Khatib points out. “So we are trying to set up small clinics with a robot that can perform very simple scanning, and can detect via an algorithm if there is anything suspicious that requires further tests.” This same robot, he adds, can be connected to a hospital, however distant, and controlled remotely by a radiologist, who can run more detailed tests without the patient having to travel.

Technologies conceived to “improve quality of life”

Kanade is confident that, in a few years’ time, his work will facilitate the spread of “quality-of-life technologies,” particularly through robots and other devices that “can help older people or people with disabilities to live independently.” He also predicts that his research in “virtualized reality” will offer spectators an increasingly realistic immersive experience of sports, concerts or other cultural events from the comfort of their own homes. He insists, however, that “this technology is not just for leisure and entertainment. It may also be useful, for example, in coordinating the response to humanitarian emergencies through virtual reconstructions of disaster-hit zones.”

He has concerns, he admits, about the possible misuse of some of the technologies his work has helped develop: “I hate to see how artificial intelligence and computer vision are being applied to create the likes of deep fake videos.” In fact, in 2010, Kanade and his team made a video in which President Obama speaks Japanese, with the images generated from a recording of the researcher himself. “It was a fun experiment, but the intention was serious, and we had meaningful applications in mind,” he recalls today. “For example, we wanted to better understand human facial expressions and the effects of certain gestures, like head or eye movements, to help people who have difficulty communicating smoothly, and we were also working on the creation of avatars to participate virtually in video conferences.”

In any case, Kanade is convinced that technology will be able to detect artificially generated videos to prevent their malicious use: “It should be easy to certify whether an image is genuine or non-genuine, and add a watermark to identify frauds. That said, it saddens me that this technology has the potential for harm, due to misuse by certain people.”

Nominators

A total of 25 nominations were received in this edition. The awardee was nominated by Prof. Fernando Torres Medina, full professor in the Systems Engineering and Automation Area and Director of the Automation, Robotics and Computer Vision Research Group at the University of Alicante (Spain).

Information and Communication Technologies committee and evaluation support panel

The committee in this category was chaired by Joos Vandewalle, Honorary President of the Royal Flemish Academy of Belgium for Science and the Arts and Emeritus Professor in the Department of Electrical Engineering (ESAT) at KU Leuven (Belgium), with Ron Ho, Corporate Vice President, Hardware, at Lattice Semiconductor (United States) acting as secretary. Remaining members were Georg Gottlob, Professor of Informatics at the University of Calabria (Italy) and Emeritus Professor of Informatics at the University of Oxford (United Kingdom); Oussama Khatib, Professor of Computer Science and Director of the Robotics Laboratory at Stanford University (United States); Rudolf Kruse, Emeritus Professor in the Faculty of Computer Science at Otto von Guericke University Magdeburg (Germany); Mario Piattini, Professor of Computer Languages and Systems at the University of Castilla-La Mancha (Spain); and Bernhard Schölkopf, Director of the Max Planck Institute for Intelligent Systems where he heads the Empirical Inference Department, and 2019 Frontiers of Knowledge Laureate in Information and Communication Technologies.

The evaluation support panel was coordinated by Marisol Martín González, Coordinator of the MATERIA Global Area and Research Professor at the Institute of Micro and Nanotechnology (INM-CNM, CSIC), and formed by Alberto Ibáñez Rodríguez, Tenured Scientist at the Leonardo Torres Quevedo Institute of Physical and Information Technologies (ITEFI, CSIC); Luis Fonseca Chácharo, Research Professor and Director at the Institute of Microelectronics of Barcelona (IMB-CNM, CSIC); Felip Manya Serres, Scientific Researcher and Vice-Director at the Artificial Intelligence Research Institute (IIIA, CSIC); and José Javier Ramasco Sukia, Deputy Coordinator of the MATERIA Global Area and Research Professor at the Institute for Interdisciplinary Physics and Complex Systems (IFISC, CSIC-UIB).

About the BBVA Foundation Frontiers of Knowledge Awards

The BBVA Foundation centers its activity on the promotion of world-class scientific research and cultural creation, and the recognition of talent.

The BBVA Foundation Frontiers of Knowledge Awards, funded with 400,000 euros in each of their eight categories, recognize and reward contributions of singular impact in physics and chemistry, mathematics, biology and biomedicine, technology, environmental sciences (climate change, ecology and conservation biology), economics, social sciences, the humanities and music, privileging those that significantly enlarge the stock of knowledge in a discipline, open up new fields, or build bridges between disciplinary areas. The goal of the awards, established in 2008, is to celebrate and promote the value of knowledge as a public good without frontiers, the best instrument to take on the great global challenges of our time and expand the worldviews of each individual. Their eight categories address the knowledge map of the 21st century, from basic knowledge to fields devoted to understanding and interrelating the natural environment by way of closely connected domains such as biology and medicine or economics, information technologies, social sciences and the humanities, and the universal art of music.

The BBVA Foundation has been aided in the evaluation of nominees for the Frontiers Award in Information and Communication Technologies by the Spanish National Research Council (CSIC), the country’s premier public research organization. CSIC appoints evaluation support panels made up of leading experts in the corresponding knowledge area, who are charged with undertaking an initial assessment of the candidates proposed by numerous institutions across the world, and drawing up a reasoned shortlist for the consideration of the award committees. CSIC is also responsible for designating each committee’s chair across the eight prize categories and participates in the selection of remaining members, helping to ensure objectivity in the recognition of innovation and scientific excellence.