Tribe AI hosts roughly weekly forums for our community of 125+ machine learning researchers, data scientists, and engineers to share insights and talk shop.
Recently, we hosted a discussion on autonomous driving technology. Below is an excerpt of a follow-up Q&A with 5 Tribe experts who have helped develop technology for companies like Waymo, Uber, and Cruise. Some engineers currently working in self-driving have chosen to remain anonymous so they don’t get sued.
The Tribe
• PG. Co-founder and CEO of Aquarium Learning, formerly an early employee at Cruise Automation.
• KK. ML engineer and researcher in computer vision.
• N. Senior software engineer in perception at a top 5 self-driving car company.
• E. Robotics and AI product manager and engineer. Currently working on onboard software at a self-driving car company.
• YJ. Data labeling expert and researcher. Formerly at a unicorn AI company.
During the forum, we talked about predicting the behavior of road actors as one of the most unsolved problems of self-driving technology. What do you think the most unsolved problem is?
PG: I think that’s a little vague. Because really it’s about what you’re trying to build. And that turns into what do we need to solve? For example, if you’re looking at building for urban self driving – the Waymos and Cruises – probably predicting the behavior of these dynamic actors is one of the more challenging problems. But if you’re going to go outside that and look at trucks, the most unsolved problem turns into identifying very small objects at long distances and reacting accordingly with this giant vehicle loaded with thousands of pounds of cargo. So, to me, that question is situationally dependent.
KK: My experience is really just with the perception piece, so for that we have a lot of different models we’re trying to develop simultaneously. Traffic light detection. Pedestrian detection. For every model, you have to be able to trace issues back to their root cause. Why didn’t the system detect a pedestrian in this area when it should have? Then you have to go back and see what data it was trained on and how it was trained. It becomes a logistical nightmare to do this at scale with hundreds of different capabilities you’re trying to build incrementally. So I’d say the most unsolved problem is really the complexity of the system itself.
"I’d say the most unsolved problem is really the complexity of the system itself."
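The traceability problem KK describes – linking a bad prediction back to the model version, the data it was trained on, and the training configuration – is often handled with explicit lineage metadata. Below is a minimal sketch of that idea in Python; the names, fields, and registry are hypothetical illustrations, not any particular company's tooling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingRun:
    """Lineage record for one trained model artifact (hypothetical fields)."""
    model_name: str        # e.g. "pedestrian_detector"
    model_version: str     # e.g. "v14"
    dataset_version: str   # snapshot of labeled data used for training
    config_digest: str     # hash of hyperparameters / training config

# Registry mapping a deployed model artifact to its lineage record.
RUNS = {
    "pedestrian_detector:v14": TrainingRun(
        model_name="pedestrian_detector",
        model_version="v14",
        dataset_version="ped-labels-2021-06-01",
        config_digest="a1b2c3",
    ),
}

def trace(artifact_id: str) -> TrainingRun:
    """Answer 'what data and config produced this model?' for a failure report."""
    return RUNS[artifact_id]

if __name__ == "__main__":
    run = trace("pedestrian_detector:v14")
    print(f"Missed-pedestrian issue traces back to dataset "
          f"{run.dataset_version} and config {run.config_digest}")
```

Multiply this by hundreds of capabilities, each with its own datasets and retraining cadence, and the bookkeeping itself becomes the hard part KK is pointing at.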
N: From a technical perspective, I think behavior prediction and evaluation is a big one. It’s a hard technical problem because there is no right answer. If I show ten people a scene, they can all point you to the ambulance and tell you how many cars are in the picture. But behavior prediction is less objective. In many cases, those same ten people wouldn’t agree if the question was what action you should take based on the scene. Determining metrics to understand and evaluate a behavior prediction system is a really hard problem and one that, in my opinion, none of the companies have solved yet.
E: In terms of the product itself, I think one thing that’s really hard to scale is understanding a good framing for how to represent the world. If you ask two drivers to drive the same road, have the same thing happen to each of them, and then ask them to describe it, you’re going to get wildly different answers. Right now, we have that problem with five different self-driving car companies. What you end up with is five different mental models, and it’s hard to say which is right or wrong. Because of that, I think there’s potentially an operational risk that not everyone is speaking the same language as they start to commercialize. There are some very interesting practical problems there that may need to be addressed through collaboration and legislation.
YJ: So my team has labeled tens of thousands of LIDAR annotations. And one of the biggest problems I saw is there’s no standard for data labeling. Do we include side mirrors or not? Do we include the open door for a parked car? Do you include the limbs on a pedestrian? I’ve seen data from ten different self-driving companies and I can tell you there’s no unified standard of how to properly label this data. And every company’s data is different because it depends on how dense the LIDAR point cloud is.
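The ambiguities YJ lists (side mirrors, open doors, pedestrian limbs) are exactly the kind of decisions a written labeling spec has to pin down so every annotator and QA tool applies the same rules. Here is a minimal sketch of encoding such a spec in Python; the fields and defaults are hypothetical examples, not an actual industry standard.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class LidarBoxSpec:
    """Project-wide rules for drawing 3D bounding boxes on LIDAR point clouds."""
    include_side_mirrors: bool = True      # extend vehicle boxes to cover mirrors
    include_open_doors: bool = False       # parked car with an open door: door excluded
    include_pedestrian_limbs: bool = True  # box covers extended arms and legs
    min_points_per_box: int = 5            # skip objects with too few LIDAR returns

SPEC = LidarBoxSpec()

# Exporting the spec alongside the dataset tells downstream teams exactly
# what the labels do and do not include.
print(asdict(SPEC))
```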
At the forum, we also touched on the philosophy of responsibility for self-driving technology. Who decides how safe is safe enough? Who should be responsible for making these critical safety decisions – regulators, engineers, or someone else?
PG: My experience in the field is that regulators have a very broad, vague set of requirements. It just has to be as safe as or safer than a human driver. They don’t require you to demonstrate that in a certain way, so it becomes up to the individual company to make the case for why they think their technology is safe enough to deploy without a driver. There are parallels to this in the medical devices industry, which is where our head of safety comes from, where you have to go and talk to the FDA. It becomes more of a conversation than checking boxes. So, at least how it works in practice now, the onus is really on these companies to make their own case.
KK: The interface between humans and machines is changing. There are bound to be misunderstandings. Before self-driving, the car had understandable behaviors and interfaces, so it was easier to know where the line of responsibility would lie. But with new capabilities coming out – like autopilot – humans need to be adaptive and understand what’s now different about their vehicle. With updates, there’s the potential for things to change under you without you even knowing. It becomes a tricky problem. But definitely there should be someone whose job is to find where the failure points may be and be ultimately responsible for the safety of the system as a whole. I can tell you as an engineer, you have specific goals, and this larger question of whole-system safety is not top of mind, which could be a recipe for problems.
"I think this question really isn’t about self driving cars. It’s a proxy for the larger question of who ultimately bears responsibility for reducing the harm a technology might cause within society. We could just as easily be talking about Facebook and elections."
N: I think this question really isn’t about self driving cars. It’s a proxy for the larger question of who ultimately bears responsibility for reducing the harm a technology might cause within society. We could just as easily be talking about Facebook and elections. Or BP and oil and carbon. And we’ve designed these systems – policy, legal systems – to express our will as a society. So ultimately that’s where the responsibility should lie in a perfect world.
But, because it’s not a perfect world, it’s important for the people working on these systems to do their best in the meantime. For example, when I as an engineer am deciding how to tune my system to recognize pedestrians, there’s always going to be a tradeoff between the number of false positives and always seeing every pedestrian. You are ultimately making a safety tradeoff. And I would like to see people thinking about it more thoughtfully. I think it’s okay that we in self driving admit that we don’t have all the answers.
And it’s not just a technology question. I’ve seen inside some of these small companies that are always on the verge of extinction and about to run out of funding. And those incentives are not good. When you’re faced with the choice of deploying an unsafe system or ceasing to exist as a company, what happens then? I feel strongly that not taking engineering ethics seriously is a problem, and I don’t think it applies just to self driving.
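The tradeoff N describes usually comes down to where you set a detection confidence threshold. Here is a toy sketch of sweeping that threshold; the scores and labels are invented purely to show the mechanics, not taken from any real detector.

```python
# Toy sweep of a detection confidence threshold: lowering it catches more
# pedestrians (fewer misses) at the cost of more false positives.
scores = [0.95, 0.90, 0.72, 0.65, 0.40, 0.35, 0.20, 0.10]
is_pedestrian = [True, True, True, False, True, False, False, False]

for threshold in (0.8, 0.5, 0.3):
    predicted = [s >= threshold for s in scores]
    false_positives = sum(p and not t for p, t in zip(predicted, is_pedestrian))
    missed = sum(t and not p for p, t in zip(predicted, is_pedestrian))
    print(f"threshold={threshold:.1f}  "
          f"false_positives={false_positives}  missed_pedestrians={missed}")
```

No threshold makes both numbers zero at once, which is why N frames the choice as a safety tradeoff rather than a purely technical one.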
E: What is confident enough? What is good enough? My personal view is that we should approach this as one global team. I don’t think regulators versus companies is the right framing. Sure, there might be incentive structures that bias people in a certain way, but we should be able to set up those incentives in such a way that we get the best safety outcome. That means the most qualified people with the most technical or relevant background doing the best work on the ground. In the end, I think it’s going to be an iterative feedback loop between developers, companies, and regulators to define what’s reasonable.
At the forum, we discussed trucking as a hard space to make progress in because of the lack of interesting scenarios to train models on. Can you think of other examples of limitations?
PG: So the flip side with urban driving is there are too many interesting scenarios. You have all these different actors – cars, pedestrians. Every city has its own weather conditions or norms of driving behavior. There’s this thing known as the Pittsburgh left that you don’t really see anywhere else in the nation. Machine learning is good at handling common cases, but when you get to this longer tail of very interesting but uncommon cases, it tends to struggle. That’s the overarching problem in both cases: how do you build a system that’s robust to a lot of different scenarios?
KK: With trucking, it really becomes a question of: what are the difficulties in acquiring useful training data? I think the biggest challenge is accurate interpretation of the entire physical and temporal context. And not just for ML models, but for human annotators too. You need a human to come up with accurate labels, and they need so much context – temporal context, spatial context – that it’s very hard to convey the entire driving experience to someone whose job it is to go and label data they might have no experience with.
N: The problem with trucking is a problem with ML in general, which is the long tail of issues. ML systems are known for being good at interpolating, so if they’ve seen something similar, they’re really good at generalizing. But they’re not good at extrapolating. The consequence is that essentially you need to build up a library of all the things a car might see in order to teach the car how to operate. So in these pockets where a car hasn’t seen something similar, there’s undefined behavior. Finding and eliminating those pockets is the second 90% of self driving development. That’s why it took Waymo from 2015 to 2020 to deploy a service in the real world. You know those pockets exist, but you don’t know where they are until you have the data.
"The biggest limitations are going to be social and cultural. People are going to protest robots on the street."
E: I would actually argue that solving the trucking problem is theoretically not as hard as dense urban driving. There are ways to do that with simulation or synthetic data if you have a good understanding of how to represent the space. Say you need to make sure your model can recognize armadillos. You can either drive a thousand miles and hope for some armadillos, or you can augment your data with 500 examples of armadillos. The problem is that with people there are an infinite number of combinations of behaviors. The real question becomes: how do you represent that so that you can actually augment long tail interactions, not just the long tail presence of something on the road?
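E's armadillo example is, at its core, targeted data augmentation: instead of waiting to encounter a rare class on the road, you inject synthetic examples of it into the training mix. Here is a hypothetical sketch of that oversampling step; the frame names, counts, and repeat factor are placeholders, not a recommendation.

```python
import random

# Placeholder datasets: frames collected on the road vs. rendered synthetically.
real_frames = [f"real_frame_{i}" for i in range(10_000)]           # almost no armadillos
synthetic_armadillos = [f"sim_armadillo_{i}" for i in range(500)]  # rendered rare cases

def build_training_mix(real, synthetic, synthetic_repeat=4, seed=0):
    """Oversample the synthetic rare-class frames so the model actually sees them."""
    mix = list(real) + list(synthetic) * synthetic_repeat
    random.Random(seed).shuffle(mix)
    return mix

training_set = build_training_mix(real_frames, synthetic_armadillos)
print(len(training_set), "frames,",
      sum("armadillo" in f for f in training_set), "of them synthetic armadillos")
```

This works for the long-tail *presence* of an object; E's point is that long-tail *interactions* between people are much harder to enumerate and synthesize.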
YJ: I think the biggest limitations aren’t the technology. It’s very hard to make something like this happen in the US. The technology is going to be there in the next five years. In ten years it’s going to be mature enough to drive on a crazy street. The biggest limitations are going to be social and cultural. People are going to protest robots on the street.
Are there niches where you see a big opportunity to apply this technology in the future? Outside of consumers looking to get from A to B without driving themselves.
PG: Self-driving to me is kind of like the space program. There was so much investment in terms of capital and the smartest people of that generation. The result of that was a lot of technology that was developed initially for spacecraft that made it into all aspects of human society. Computer development, mathematics, hardware – even ballpoint pens.
With self-driving, the interesting thing for me is seeing how it’s affecting so many other spheres of the economy. You have the hardware. LIDARs have essentially been bootstrapped as an industry by self driving. And now you can basically get something you can fit onto a vacuum robot. Then you have the robotics software stack. For example, we have one client who does analytics on trash. So these technologies of deep learning and perception that were refined for self driving are now being used to sort recycling. And a lot of these other applications are much easier and more constrained problems that are just as economically impactful.
That’s what really excites me. And that’s why I left self driving – to help develop that other layer in the stack – the infrastructure to support these robotics and intelligent use cases that interact with the world. I think the second order effects of the self-driving diaspora will be way bigger than the impact of self-driving by itself.
KK: I think the technology is immediately applicable in other areas that don’t have the same critical safety requirements, but can still really improve people’s lives. Drones are one example. I’m also seeing the computer vision and 3-D reconstruction techniques created for self-driving being applied in other areas like construction or even digital dentistry. Another area is wildfire detection. You can use the same kinds of computer vision techniques combined with weather sensors to detect smoke and put out fires before they spread. It’s a really exciting time to work in vision and perception.
"I think the second order effects of the self-driving diaspora will be way bigger than the impact of self-driving by itself."
N: With a lot of new technology it’s easier to imagine a slight tweak on what we have rather than an entirely different future. One thing that comes up a lot is that self-driving is incompatible with cities and our climate goals. But that assumes that self-driving will be the same as driving now, but you push a button and the car drives itself. That doesn’t have to be true.
Take buses, for example. The main cost is drivers, which is why buses are so big and the routes are sparse and specifically chosen. Imagine a future where a bus is one-fifth the size and the routes are totally different, far more numerous, and responsive to user demand. You can imagine all these different ways to apply the technology that will require a shift from how we view driving now.
E: This is part of the reason I work in this space – it’s all of the hard problems in AI and robotics combined into one system, deployed at scale. I think there are tons of applications. Computer vision in tax document recognition. Quality control. If you’re going to focus more on the robotics side – logistics and manufacturing. I think there will be a lot more in the space of human assistance soon. Think cleaning robots or more advanced in-home robots (Roomba++).
YJ: AI is a buzzword. There are a lot of things it can do. Agriculture and farming – finding the ripe strawberry. Filtering content and content moderation. Medical imaging. Tracking and predicting carbon and geographic changes. There are so many areas that are going to change.
What do you think is the most interesting challenge in this space right now?
PG: For me it’s variability, which is why we started Aquarium. You have to make sure your ML model can handle a lot of scenarios. In self driving, it’s about how you handle someone dressed in normal clothes versus a dinosaur costume. But when you look at different fields, there are all sorts of different variables that systems have to handle.
So really the challenge we’re trying to solve at Aquarium is adapting to all these edge cases to make that iteration process easier: being able to understand what the model is doing and where it’s failing, and to feed that back to the model with the right data, so it adapts to these cases and does better in the future.
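The iteration loop PG describes – find where the model fails, then feed the right data back – is often bootstrapped by simply mining disagreements between predictions and labels. Here is a minimal, hypothetical sketch of that idea; the data structures and threshold are invented for illustration and are not Aquarium's product.

```python
# Minimal failure-mining loop: surface examples the current model gets wrong
# (or is unsure about) so they can be reviewed and added to the next training set.
examples = [
    {"id": "frame_001", "label": "pedestrian", "prediction": "pedestrian", "confidence": 0.97},
    {"id": "frame_002", "label": "pedestrian", "prediction": "background", "confidence": 0.81},
    {"id": "frame_003", "label": "cyclist",    "prediction": "cyclist",    "confidence": 0.41},
]

def mine_failures(examples, min_confidence=0.5):
    """Return misclassified or low-confidence examples for the next labeling pass."""
    return [
        ex for ex in examples
        if ex["prediction"] != ex["label"] or ex["confidence"] < min_confidence
    ]

for ex in mine_failures(examples):
    print("needs review:", ex["id"])
```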
KK: For me, the most interesting challenge is the development of sensor capabilities. Right now, different sensors have different limitations, so you can never get 100% reliability with any one sensor alone. You have to combine them to address this. But there are new sensors coming out that I think are pretty exciting and address a lot of these limitations. Single-photon avalanche diodes (SPADs) can detect through fog. All you need is one photon to get to the sensor and you’ll be able to detect something. There’s also 4D radar that will open the door to higher levels of safety. And that’s just what exists right now.
"I don’t think the ML industry in general has figured out how to build these large systems from an organizational perspective."
N: For me, it’s the organizational challenges. I think self-driving companies in general have not figured out how to get people working together on one coherent system. When you have a 2,000 person company working on one thing, it becomes really hard to work together effectively and productively. I don’t think the ML industry in general has figured out how to build these large systems from an organizational perspective. Multiple companies in the space have seen a lot of growing pains there.
If I had to pick a technical challenge that I think is really interesting, it would be the evaluation side of things. How do we actually know the system is safe? And how do we know that without driving 5 million miles? I think that’s really hard and one that a lot of companies have not cracked.
E: I think large scale tech adoption of something this substantial is going to be a challenge. Everyone has their theories about how it will go, but how humans actually integrate self driving into their lives is going to be pretty interesting to watch.
Okay, what’s your bet: how many years are we from level 5 automation?
PG: Level 5 is a very vague term. Essentially what this means is a car that can drive as well as a human. But what does that mean? I can tell you right now that I can drive very well in the Bay Area, but if I were to go to Vietnam or Peru – I would be terrible. It’s a completely different driving environment.
Self driving works. Right now you could go to Phoenix and hail a car that would be safer than having a human driver. A lot of systems are set up so they might call for help from remote human operators occasionally, but that doesn’t really change anything from an end user or economic perspective. But it doesn’t work everywhere. So if you’re talking about these existing driving conditions where you’re in Phoenix and you want to completely remove a human from the loop – I think that can be done in the next decade. But to me that’s not really that interesting because it doesn’t really affect its economic viability. So to me it’s more of a scientific question than a practical one.
KK: Personally, I feel like it will be 5 years at a minimum to level 5. And that’s for early adopters willing to take some risk. For the masses it will be at least 10 or 15 years out before we see a shift happening.
"If the definition of level 5 is that a car can drive anywhere a human can, we’re probably pretty far. But I think that humans drive a lot when they shouldn’t anyways."
N: If the definition of level 5 is that a car can drive anywhere a human can, we’re probably pretty far. But I think that humans drive a lot when they shouldn’t anyways. Safely driving in a snowstorm where you shouldn’t be on the road? I think that’s a long way off and not a domain we should worry about in self driving.
E: I would say the distinction between level 4 and 5 isn’t that interesting. Five means you can drive actually anywhere, and four is under specific conditions. I’m not sure that the last X% is even worth going for. Do I care if I can take my self-driving car up a dirt road to the base of Everest? Is that really worth three years and hundreds of millions of dollars of development? We’re at level four now. We’re going to be at level four for a very long time, if not forever. But to get to the sentiment of the question: when can we drive in complicated city environments? If I had to guess, maybe five years, but that is a very low confidence estimate as there are just so many considerations.
YJ: I think the technology will be mature enough for it to happen in ten years. But I don’t think society is ready. Maybe it could happen in China. Because they can create a self driving car lane or just say, “no more gas cars” and everyone would have to do it. In the US there are so many regulations and I think there will be more resistance from people, so I don’t know if I could see it happen here in the next twenty years.