At Last, a Self-Driving Car That Can Explain Itself - IEEE Spectrum

Mitsubishi Electric’s AI not only improves performance, it also fosters trust

Rather than tell the driver that there’s an intersection 30 meters ahead, the navigation system refers to landmarks, such as the crosswalk in this illustration, or a tree farther down the street.

For all the recent improvements in artificial intelligence, the technology still cannot take the place of human beings in situations where it must frame its perceptions of the world in words that people can understand.

You might have thought that the many apparent advances in speech recognition would have solved the problem already. After all, Apple’s Siri, Microsoft’s Cortana, Amazon’s Alexa and Google Home are all very impressive, but these systems function solely on voice input: They can’t understand or react to the environment around them.

To bridge this communications gap, our team at Mitsubishi Electric Research Laboratories has developed and built an AI system that does just that. We call the system scene-aware interaction, and we plan to include it in cars.

As we drive down a street in downtown Los Angeles, our system’s synthesized voice provides navigation instructions. But it doesn’t give the sometimes hard-to-follow directions you’d get from an ordinary navigation system. Our system understands its surroundings and provides intuitive driving instructions, the way a passenger sitting in the seat beside you might do. It might say, “Follow the black car to turn right” or “Turn left at the building with a billboard.” The system will also issue warnings, for example: “Watch out for the oncoming bus in the opposite lane.”

To support improved automotive safety and autonomous driving, vehicles are being equipped with more sensors than ever before. Cameras, millimeter-wave radar, and ultrasonic sensors are used for automatic cruise control, emergency braking, lane keeping, and parking assistance. Cameras inside the vehicle are being used to monitor the health of drivers, too. But beyond the beeps that alert the driver to the presence of a car in their blind spot or the vibrations of the steering wheel warning that the car is drifting out of its lane, none of these sensors does much to alter the driver’s interaction with the vehicle.

Voice alerts offer a much more flexible way for the AI to help the driver. Some recent studies have shown that spoken messages are the best way to convey what an alert is about and are the preferable option in low-urgency driving situations. The auto industry is beginning to embrace technology that works in the manner of a virtual assistant: Some carmakers have announced plans to introduce conversational agents that both assist drivers with operating their vehicles and help them organize their daily lives.

Video: Scene-Aware Interaction Technology (www.youtube.com)

The idea for building an intuitive navigation system based on an array of automotive sensors came up in 2012 during discussions with our colleagues at Mitsubishi Electric’s automotive business division in Sanda, Japan. We noted that when you’re sitting next to the driver, you don’t say, “Turn right in 20 meters.” Instead, you’ll say, “Turn at that Starbucks on the corner.” You might also warn the driver of a lane that’s clogged up ahead or of a bicycle that’s about to cross the car’s path. And if the driver misunderstands what you say, you’ll go on to clarify what you meant. While this approach to giving directions or guidance comes naturally to people, it is well beyond the capabilities of today’s car-navigation systems.

Although we were keen to construct such an advanced vehicle-navigation aid, many of the component technologies, including the vision and language aspects, were not sufficiently mature. So we put the idea on hold, expecting to revisit it when the time was ripe. We had been researching many of the technologies that would be needed, including object detection and tracking, depth estimation, semantic scene labeling, vision-based localization, and speech processing. And these technologies were advancing rapidly, thanks to the deep-learning revolution.

Soon, we developed a system that was capable of viewing a video and answering questions about it. To start, we wrote code that could analyze both the audio and video features of a clip posted on YouTube and produce automatic captioning for it. One of the key insights from this work was that in some parts of a video the audio may carry more information than the visual features, and vice versa in other parts. Building on this research, members of our lab organized the first public challenge on scene-aware dialogue in 2018, with the goal of building and evaluating systems that can accurately answer questions about a video scene.
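
As a rough sketch of that idea (our illustration, not the code behind the system described here), the PyTorch module below learns a per-segment weight that says how much to trust the audio features versus the visual ones before they are fused and passed to a caption decoder; the module name and feature dimensions are assumptions made up for the example.

```python
# Minimal sketch (not the authors' code): weighting audio vs. visual
# features per video segment before captioning, as described above.
import torch
import torch.nn as nn

class ModalityFusion(nn.Module):
    """Learns, for each video segment, how much to trust audio vs. video."""
    def __init__(self, audio_dim=128, video_dim=512, hidden_dim=256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        # A shared scorer rates each modality; softmax turns scores into weights.
        self.gate = nn.Linear(hidden_dim, 1)

    def forward(self, audio_feat, video_feat):
        a = torch.tanh(self.audio_proj(audio_feat))   # (T, hidden)
        v = torch.tanh(self.video_proj(video_feat))   # (T, hidden)
        scores = torch.cat([self.gate(a), self.gate(v)], dim=-1)  # (T, 2)
        weights = torch.softmax(scores, dim=-1)       # per-segment modality weights
        fused = weights[..., :1] * a + weights[..., 1:] * v
        return fused, weights  # fused features would feed a caption decoder

# Example: 10 one-second segments of made-up audio/video embeddings.
fusion = ModalityFusion()
fused, w = fusion(torch.randn(10, 128), torch.randn(10, 512))
print(w[0])  # e.g. tensor([0.48, 0.52]): how much each modality counts
```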

We then decided it was finally time to revisit the sensor-based navigation concept. At first we thought the component technologies were up to it, but we soon realized that the capability of AI for fine-grained reasoning about a scene was still not good enough to create a meaningful dialogue.

Strong AI that can reason generally is still very far off, but a moderate level of reasoning is now possible, so long as it is confined within the context of a specific application. We wanted to make a car-navigation system that would help the driver by providing its own take on what is going on in and around the car.

One challenge that quickly became apparent was how to get the vehicle to determine its position precisely. GPS sometimes wasn’t good enough, particularly in urban canyons. It couldn’t tell us, for example, exactly how close the car was to an intersection and was even less likely to provide accurate lane-level information.

We therefore turned to the same mapping technology that supports experimental autonomous driving, where camera and lidar (laser radar) data help to locate the vehicle on a three-dimensional map. Fortunately, Mitsubishi Electric has a mobile mapping system that provides the necessary centimeter-level precision, and the lab was testing and marketing this platform in the Los Angeles area. That program allowed us to collect all the data we needed.

The navigation system judges the movement of vehicles, using an array of vectors [arrows] whose orientation and length represent the direction and velocity. Then the system conveys that information to the driver in plain language. Mitsubishi Electric Research Laboratories

A key goal was to provide guidance based on landmarks. We knew how to train deep-learning models to detect tens or hundreds of object classes in a scene, but getting the models to choose which of those objects to mention—“object saliency”—needed more thought. We settled on a regression neural-network model that considered object type, size, depth, and distance from the intersection, the object’s distinctness relative to other candidate objects, and the particular route being considered at the moment. For instance, if the driver needs to turn left, it would likely be useful to refer to an object on the left that is easy for the driver to recognize. “Follow the red truck that’s turning left,” the system might say. If it doesn’t find any salient objects, it can always offer up distance-based navigation instructions: “Turn left in 40 meters.”
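
To make the idea concrete, here is a minimal sketch of such a saliency scorer, assuming a hand-picked feature vector per detected object; the feature set, values, and fallback threshold are hypothetical and stand in for the model described above rather than reproduce it.

```python
# Minimal sketch (our illustration, not the production model): a small
# regression network that scores how useful each detected object would be
# as a landmark, given features like those listed above.
import torch
import torch.nn as nn

FEATURES = ["object_type_id", "size", "depth_m", "dist_to_intersection_m",
            "distinctness", "on_turn_side"]  # hypothetical feature set

class SaliencyScorer(nn.Module):
    def __init__(self, n_features=len(FEATURES)):
        super().__init__()
        # In practice the categorical object type would be embedded; a raw id
        # is used here only to keep the sketch short.
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(),
            nn.Linear(32, 1))  # single regression output per object

    def forward(self, object_features):               # (num_objects, n_features)
        return self.net(object_features).squeeze(-1)  # saliency score per object

scorer = SaliencyScorer()
# Three candidate objects near the next left turn (made-up feature values).
candidates = torch.tensor([[3., 0.8, 25., 5., 0.9, 1.],    # red truck, on the left
                           [1., 0.3, 40., 20., 0.4, 0.],   # sedan, far right
                           [7., 0.5, 30., 8., 0.2, 1.]])   # one tree among many
scores = scorer(candidates)
best = int(scores.argmax())
# If no candidate scores above a threshold, fall back to distance-based
# guidance such as "Turn left in 40 meters."
print("mention object", best, "scores:", scores.detach().tolist())
```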

We wanted to avoid such robotic talk as much as possible, though. Our solution was to develop a machine-learning network that captures the relative depth and spatial locations of all the objects in the scene in a graph, then bases the language generation on this scene graph. This technique enables us not only to reason about the objects at a particular moment but also to capture how they’re changing over time.
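
A toy version of such a scene graph might look like the sketch below; the node fields and relation labels are our own simplification, meant only to show how objects, their relative depths, and their spatial relations could be organized for language generation.

```python
# Minimal sketch of a scene graph (our simplification, not the authors'
# data structure): nodes are detected objects, edges carry relative spatial
# relations, and successive frames let the system reason about how those
# relations change over time. Requires Python 3.9+ for builtin generics.
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    track_id: int
    label: str            # e.g. "black car", "crosswalk"
    depth_m: float        # estimated distance from the ego vehicle
    bearing_deg: float    # left (-) / right (+) of the camera axis

@dataclass
class SceneGraph:
    timestamp: float
    nodes: dict[int, ObjectNode] = field(default_factory=dict)
    edges: list[tuple[int, str, int]] = field(default_factory=list)  # (a, relation, b)

    def relate(self, a: ObjectNode, b: ObjectNode) -> str:
        side = "left_of" if a.bearing_deg < b.bearing_deg else "right_of"
        return f"{side}_and_{'nearer' if a.depth_m < b.depth_m else 'farther'}"

car = ObjectNode(1, "black car", 22.0, 4.0)
crosswalk = ObjectNode(2, "crosswalk", 30.0, -2.0)
g = SceneGraph(timestamp=12.4, nodes={1: car, 2: crosswalk})
g.edges.append((1, g.relate(car, crosswalk), 2))
print(g.edges)   # [(1, 'right_of_and_nearer', 2)]
```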

Such dynamic analysis helps the system understand the movement of pedestrians and other vehicles. We were particularly interested in being able to determine whether a vehicle up ahead was following the desired route, so that our system could say to the driver, “Follow that car.” To a person in a vehicle in motion, most parts of the scene will themselves appear to be moving, which is why we needed a way to remove the static objects in the background. This is trickier than it sounds: Simply distinguishing one vehicle from another by color is itself challenging, given the changes in illumination and the weather. That is why we expect to add other attributes besides color, such as the make or model of a vehicle or perhaps a recognizable logo, say, that of a U.S. Postal Service truck.
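
One simple way to frame that "follow that car" check, assuming the tracked positions have already been converted to map coordinates with the ego vehicle's own motion removed, is to compare the lead vehicle's recent motion vector against the heading of the planned route. The function below is an illustrative sketch under those assumptions, not the system's actual logic.

```python
# Minimal sketch (our illustration) of the "follow that car" decision:
# compare a tracked vehicle's recent motion vector with the direction of
# the planned route, once ego motion and static background are removed.
import math

def is_following_route(track_positions, route_heading_deg, tol_deg=25.0):
    """track_positions: recent (x, y) map positions of the tracked vehicle,
    already in world coordinates. Returns True if the vehicle is heading
    roughly along the planned route."""
    (x0, y0), (x1, y1) = track_positions[0], track_positions[-1]
    heading = math.degrees(math.atan2(y1 - y0, x1 - x0))
    # Signed angular difference wrapped into [-180, 180).
    diff = (heading - route_heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= tol_deg

# A hypothetical lead vehicle drifting toward the right turn the route takes.
lead_track = [(0.0, 0.0), (4.0, 0.5), (8.0, 2.0), (11.0, 4.5)]
if is_following_route(lead_track, route_heading_deg=30.0):
    print("Follow the black car to turn right.")
else:
    print("Turn right in 30 meters.")  # distance-based fallback
```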

Natural-language generation was the final piece in the puzzle. Eventually, our system could generate the appropriate instruction or warning in the form of a sentence using a rules-based strategy.

The car’s navigation system works on top of a 3D representation of the road—here, multiple lanes bracketed by trees and apartment buildings. The representation is constructed by the fusion of data from radar, lidar, and other sensors. Mitsubishi Electric Research Laboratories

Rules-based sentence generation can already be seen in simplified form in computer games in which algorithms deliver situational messages based on what the game player does. For driving, a large range of scenarios can be anticipated, and rules-based sentence generation can therefore be programmed in accordance with them. Of course, it is impossible to know every situation a driver may experience. To bridge the gap, we will have to improve the system’s ability to react to situations for which it has not been specifically programmed, using data collected in real time. Today this task is very challenging. As the technology matures, the balance between the two types of navigation will lean further toward data-driven observations.
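
In a rules-based scheme of this kind, each anticipated situation maps onto a sentence template. The snippet below is a deliberately simple sketch of that idea, with templates drawn from the example sentences earlier in this article; the function and its arguments are ours, for illustration only.

```python
# Minimal sketch of rules-based sentence generation in the spirit described
# above (the template rules here are illustrative, not the authors' rules).
def generate_instruction(maneuver, landmark=None, distance_m=None, hazard=None):
    if hazard is not None:
        return f"Watch out for the {hazard}."
    if landmark is not None:
        if landmark.get("moving"):
            return f"Follow the {landmark['name']} to {maneuver}."
        return f"{maneuver.capitalize()} at the {landmark['name']}."
    return f"{maneuver.capitalize()} in {distance_m} meters."  # "robotic" fallback

print(generate_instruction("turn right",
                           landmark={"name": "black car", "moving": True}))
print(generate_instruction("turn left",
                           landmark={"name": "building with a billboard"}))
print(generate_instruction("turn left", distance_m=40))
print(generate_instruction(None, hazard="oncoming bus in the opposite lane"))
```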

It would be comforting for the passenger to know, for instance, that the car is suddenly changing lanes to avoid an obstacle on the road or to skirt a traffic jam up ahead by getting off at the next exit. Additionally, we expect natural-language interfaces to be useful when the vehicle detects a situation it has not seen before, a problem that may require a high level of cognition. If, for instance, the car approaches a road blocked by construction, with no clear path around it, the car could ask the passenger for advice. The passenger might then say something like, “It seems possible to make a left turn after the second traffic cone.”

Because the vehicle makes its awareness of its surroundings transparent to passengers, they are able to interpret and understand the actions being taken by the autonomous vehicle. Such understanding has been shown to establish a greater level of trust and perceived safety.

We envision this new pattern of interaction between people and their machines as enabling a more natural—and more human—way of managing automation. Indeed, it has been argued that context-dependent dialogues are a cornerstone of human-computer interaction.

Mitsubishi’s scene-aware interactive system labels objects of interest and locates them on a GPS map. Mitsubishi Electric Research Laboratories

Cars will soon come equipped with language-based warning systems that alert drivers to pedestrians and cyclists as well as inanimate obstacles on the road. Three to five years from now, this capability will advance to route guidance based on landmarks and, ultimately, to scene-aware virtual assistants that engage drivers and passengers in conversations about surrounding places and events. Such dialogues might reference Yelp reviews of nearby restaurants or engage in travelogue-style storytelling, say, when driving through interesting or historic regions.

Truck drivers, too, could get help navigating an unfamiliar distribution center or hitching up a trailer. In other domains, mobile robots could help weary travelers with their luggage and guide them to their rooms, or clean up a spill in aisle 9, and human operators could provide high-level guidance to delivery drones as they approach a drop-off location.

This technology also reaches beyond the problem of mobility. Medical virtual assistants might detect the possible onset of a stroke or an elevated heart rate, communicate with a user to confirm whether there is indeed a problem, relay a message to doctors to seek guidance, and if the emergency is real, alert first responders. Home appliances might anticipate a user’s intent, say, by turning down an air conditioner when the user leaves the house. Such capabilities would constitute a convenience for the typical person, but they would be a game-changer for people with disabilities.

Natural-voice processing for machine-to-human communication has come a long way. Achieving the kind of fluid interaction between robots and humans that is portrayed on TV or in movies may still be some distance off. But now, at least, it’s visible on the horizon.

Chiori Hori is a senior principal research scientist at Mitsubishi Electric Research Laboratories in Cambridge, Mass., specializing in multimodal scene-aware interaction for human–robot communications. She earned a Ph.D. in computer science from the Tokyo Institute of Technology.

Anthony Vetro is a vice president and director at Mitsubishi Electric Research Laboratories, in charge of AI research in computer vision, speech and audio processing, and data analytics. He earned a Ph.D. in electrical engineering from Polytechnic University, in New York (now the NYU Tandon School of Engineering).

Assistive technologies are often designed without involving the people these technologies are supposed to help. That needs to change.

Harry Goldstein is Acting Editor in Chief of IEEE Spectrum. 

Before we redesigned our website a couple of years ago, we took pains to have users show us how they navigate our content or complete specific tasks, like leaving a comment or listening to a podcast. We queried them about what they liked or didn’t like about how our content is presented. And we took their experiences on board and designed a site and a magazine based on that feedback.

So when I read this month’s cover story by Britt Young about using a variety of high- and low-tech prosthetic hands, I was surprised to learn that much bionic-hand development is conducted without taking the lived experience of people who use artificial hands into account.

I shouldn’t have been. While user-centered design is a long-standing practice in Web development, it doesn’t seem to have spread as deeply into other product-development practices. A quick search of the IEEE Xplore Digital Library turned up fewer than 2,000 papers (out of 5.7 million) on “user-centered design.” Five papers bubbled up when searching “user-centered design” and “prosthesis.”

Young, who is working on a book about the prosthetics industry, was in the first cohort of toddlers fitted with a myoelectric prosthetic hand, which users control by tensing and relaxing their muscles against sensors inside the device’s socket. Designed by people Young characterizes as “well-intentioned engineers,” these technologically dazzling hands try to recreate in all its complex glory what Aristotle called “the instrument of instruments.”

While high-tech solutions appeal to engineers, Young makes the case that low-tech solutions like the split hook are often more effective for users. “Bionic hands seek to make disabled people ‘whole,’ to have us participate in a world that is culturally two-handed. But it’s more important that we get to live the lives we want, with access to the tools we need, than it is to make us look like everyone else.”

As Senior Editor Stephen Cass pointed out to me, one of the rallying cries of the disabled community is “nothing about us, without us.” It is a response to a long and often cruel history of able-bodied people making decisions for people with disabilities. Even the best intentions don’t make up for doing things for disabled people instead of with them, as we see in Young’s article.

Assistive and other technologies can indeed have huge positive impacts on the lives of people with disabilities. IEEE Spectrum has covered many of these developments over the decades, but, generally speaking, that coverage has involved able-bodied journalists writing about assistive technology, often with the perspective of disabled people relegated to a quote or two, if it was included at all.

We are fortunate now to have the chance to break that pattern, thanks to a grant from the IEEE Foundation and the Jon C. Taenzer Memorial Fund. With the grant, Spectrum is launching a multiyear fellowship program for disabled writers. The goal is to develop writers with disabilities as technology journalists and provide practical support for their reporting. These writers will investigate not just assistive technologies but also other technologies with ambitions for mass adoption, viewed through a disability lens. Will these technologies be built with inclusion in mind, or will disabled people be a literal afterthought? Our first step will be to involve people with disabilities in the design of the program, and we hope to begin publishing articles by fellows early next year.

This article appears in the October 2022 print issue.

Amanda Davis is a freelance writer and creative services manager at Measurabl, an ESG software and professional services provider based in San Diego.

Nick Holonyak, Jr. holds a part of a stoplight that utilizes a newer LED designed by his students. Ralf-Finn Hestoft/Getty Images

Nick Holonyak Jr., a prolific inventor and longtime professor of electrical engineering and computing, died on 17 September at the age of 93. In 1962, while working as a consulting scientist at General Electric’s Advanced Semiconductor Laboratory, he invented the first practical visible-spectrum LED. It is now used in light bulbs and lasers.

Holonyak left GE in 1963 to become a professor of electrical and computer engineering and researcher at his alma mater, the University of Illinois Urbana-Champaign. He retired from the university in 2013.

He received the 2003 IEEE Medal of Honor for “a career of pioneering contributions to semiconductors, including the growth of semiconductor alloys and heterojunctions, and to visible light-emitting diodes and injection lasers.”

After Holonyak earned bachelor’s, master’s, and doctoral degrees in electrical engineering from the University of Illinois, he was hired in 1954 as a researcher at Bell Labs, in Murray Hill, N.J. There he investigated silicon-based electronic devices.

He left in 1955 to serve in the U.S. Army Signal Corps, and was stationed at Fort Monmouth, N.J., and Yokohama, Japan. After being discharged in 1957, he joined GE’s Advanced Semiconductor Laboratory, in Syracuse, N.Y.

While at the lab, he invented the shorted-emitter thyristor. The four-layered semiconductor device is now found in light dimmers and power tools. In 1962 he invented the red-light semiconductor laser, known as a laser diode, which is now found in cellphones as well as CD and DVD players.

Later that year, he demonstrated the first visible LED—a semiconductor source that emits light when current flows through it. LEDs previously had been made of gallium arsenide and emitted only infrared light. He created crystals of gallium arsenide phosphide to make LEDs that would emit visible, red light. His work led to the development of the high-brightness, high-efficiency white LEDs found today in a wide range of applications, including smartphones, televisions, headlights, traffic signals, and aviation lighting.

Holonyak left GE in 1963 and joined the University of Illinois as a professor of electrical and computer engineering.

In 1977 he and his doctoral students demonstrated the first quantum well laser, which later found applications in fiber optics, CD and DVD players, and medical diagnostic tools.

The university named him an endowed-chair professor of electrical and computer engineering and physics in 1993. The position was named for John Bardeen, an honorary IEEE member who had received two Nobel Prizes in Physics as well as the 1971 IEEE Medal of Honor. Bardeen was Holonyak’s professor in graduate school. The two men collaborated on research projects until Bardeen’s death in 1991.

Together with IEEE Life Fellow Milton Feng, Holonyak led the university’s transistor laser research center, which was funded by the U.S. Defense Advanced Research Projects Agency. There they developed transistor lasers that had both light and electric outputs. The innovation enabled high-speed communications technologies.

More recently, Holonyak developed a technique to bend light within gallium arsenide chips, allowing them to transmit information by light rather than electricity.

He supervised more than 60 graduate students, many of whom went on to become leaders in the electronics field.

Holonyak received the 2021 Queen Elizabeth Prize for Engineering; the National Academy of Engineering’s 2015 Draper Prize; the 2005 Japan Prize; and the 1989 IEEE Edison Medal. In 2008 he was inducted into the National Inventors Hall of Fame, in Akron, Ohio.

He was a fellow of the American Academy of Arts and Sciences, the American Physical Society, and Optica, and a foreign member of the Russian Academy of Sciences. In addition, Holonyak was a member of the U.S. National Academies of Engineering and Sciences.

Read the full story about Holonyak’s LED breakthrough in IEEE Spectrum.

Download this free poster to learn how developments in Advanced Driver-Assistance Systems (ADAS) are creating a new approach to In-Vehicle Network design

Developments in Advanced Driver-Assistance Systems (ADAS) are creating a new approach to In-Vehicle Network (IVN) architecture design. With today's vehicles containing at least a hundred ECUs, the current distributed network architecture has reached the limit of its capabilities. The automotive industry is now focusing on a domain or zonal controller architecture to simplify network design, reduce weight and cost, and maximize performance.

A domain controller can replace the functions of many ECUs to enable high-speed communications, sensor fusion, and decision-making, as well as supporting high-speed interfaces for cameras, radar, and lidar sensors. This poster graphically represents the development of IVNs from the past to the present and the future, then provides guidance on how to test them.