Dr. Know-it-all Knows it all sits down for a vlog to share his thoughts on Tesla’s Full Self-Driving (FSD) Beta and why it is such a big deal. We have been dabbling in the science of full self-driving and think we understand it, but behind closed doors we are only scratching the surface. The doctor explains trajectory prediction and 4D data continuity, how they relate to FSD, and how they move us toward it. He also elaborates on why Andrej Karpathy and Elon Musk carry on so heavily about Tesla’s new 4D training and processing, and why it matters.
The doctor wrote his master’s thesis, Image-Based Content Retrieval via Class-Based Histogram Comparisons, at the University of Georgia, Athens. He offers this insight: “The two biggest items are data continuity and trajectory projection.” He then goes on to explain, “Together, these two bring about a massive leap forward and open a path to true level four autonomy.”
“Semantically, for a computer to recognize a cat doesn’t mean that it understands that it’s a cat. It just means that that arrangement of edges and pixels and colors and so forth is associated with a label that it knows is called ‘cat’. And so that’s how it’s doing it.” He explains that quasi-semantics is a way of identifying objects via a neural network and then finding other images of those objects within a database.
It is called quasi-semantics because a computer does not understand things the way humans do. As humans, we learn through language and attach a word to a specific ‘thing’. We understand that a specific shape or outline is our parent, or that a specific texture is our pet. We imprint language and objects in memory and then recognize those items for what they are. We have evolution helping us with this process. He goes on to say, “what we think of as basic knowledge like this is really difficult for humans to understand. It’s also super difficult for computers to understand that. Therefore, this is actually still kind of cutting-edge research.”
How Driving Comes Into Play
The doctor uses two examples: a black Labrador Retriever and a black trash bag. Semantically speaking, the computer does not understand what a dog physically is, or how a dog behaves. A computer just represents the image as ‘blobs’ of data. The tricky part is that a computer might take a black trash bag for the same object, since it too is just ‘blobs’ of data.
For safe and effective driving at level four or five autonomy, a computer must know the clear difference between the two objects. The doctor goes on to explain that when the computer identified a dog in one picture, that did not actually help it figure out what the object was in the next picture. He adds, “that’s continuity.” “If I know what something is in one picture, I should be able to know that it’s that in the next picture and the next picture – in other words, four-dimensionally over time.” This brings us to why 4D matters for FSD.
Google Search vs. Video Sequence Processing – A Dog
To show the difference between a simple Google search and video sequence processing, the doctor uses an image search for the label ‘dog’ and frame-by-frame video stills of a dog. A search engine uses the label to find images whose pixels and shapes match it. In video sequence processing, by contrast, the computer has to process every single frame.
The doctor uses three video stills to break down the process a computer goes through for each sequence. He explains that at each new frame the computer restarts its thinking process and has to ‘relabel’ the image in the frame. In the first frame the dog is standing with a branch in his mouth; the second is similar, but the dog has moved slightly; in the third the dog is in a new stance, and he points out that the computer may not know at this point that it is still a dog. “This is very inefficient because it has to reprocess everything at every single frame and it’s prone to error at any given frame, because it might just decide that that’s not a dog on one given frame,” the doctor explains. This creates errors in driving, because the label can flip on any frame: dog, dog, dog, not a dog, dog, etc.
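One simple way to picture what continuity buys is to let each frame's label be influenced by the frames just before it. The sketch below is only an illustration of that idea, assuming a hypothetical per-frame classifier has already produced a label for every frame; the function name, the window size, and the majority-vote rule are assumptions made for this example and are not a description of Tesla's software.

```python
from collections import Counter, deque


def smooth_labels(frame_labels, window=3):
    """Replace each per-frame label with the majority label of the last `window` frames."""
    recent = deque(maxlen=window)  # keeps only the most recent labels
    smoothed = []
    for label in frame_labels:
        recent.append(label)
        # Pick the label that occurs most often among the recent frames.
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


# Per-frame output with one isolated misclassification, as in the article.
frames = ["dog", "dog", "dog", "not a dog", "dog"]
print(smooth_labels(frames))  # ['dog', 'dog', 'dog', 'dog', 'dog']
```

With a window of three frames, the single "not a dog" frame is outvoted by its neighbours, so the sequence stays consistent instead of flipping on one bad frame.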
The doctor goes on to describe this as a “new student driver” effect. “It’s kind of like having an early student driver in the car. They’re super cautious. They make dumb mistakes. It hugs the center of the lane. If it gets confused, it slows down in the middle of the road for no reason whatsoever – just phantom braking, etc.”
For the car to really drive itself, these problem areas must be resolved. “Continuity of data over a long sequence of frames or videos plus a temporally semantic understanding of what the car is seeing,” he explains. 4D training comes to the forefront to begin resolving these issues before we get the FSD we are all expecting. Tesla’s FSD previously read eight individual cameras, alongside radar and sonar, about every thirtieth of a second. “It processed all of this information separately. It identified the objects and then it acted on this,” he adds.
The New FSD Beta Combined with 4D Result
The doctor explains, “Obviously, this data continuity over time is exactly what the 4D that Musk and Karpathy are talking about. What this 4D training opens the door to is trajectory projection.” As Elon has previously said, this requires a complete software rewrite: new models, new neural network models, and new algorithms, in combination with the Tesla inference engine – the chip Tesla has in its vehicles. This gives the computer the ability to understand what these objects are, more like a human driver does.
In the frames of the dog, the computer recognizes in the first frame that this object is a dog, which can move of its own will. In frame two, we know the dog moved slightly. And in the third, we know the dog moved enough to change position towards the car. The computer is now able to recognize these movements and project whether they will intersect the path of what the doctor calls the “ego car” – the car carrying the computer. “Thus, the computer knows now that it needs to take immediate steps to avoid this object which can move on its own or, at the very least, it needs to brake quickly to avoid collision with the object.”
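The three-frame reasoning above can be sketched as a constant-velocity extrapolation. This is a minimal illustration under stated assumptions, not Tesla's method: positions are hypothetical (x, y) coordinates in meters relative to the ego car (x is sideways, y is straight ahead), the frame interval is taken as one thirtieth of a second, and the corridor width, look-ahead distance, and function names are all invented for the example.

```python
def project_trajectory(positions, dt, horizon_steps):
    """Extrapolate future positions, assuming the velocity between the last two frames stays constant."""
    (x0, y0), (x1, y1) = positions[-2], positions[-1]
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return [(x1 + vx * dt * k, y1 + vy * dt * k) for k in range(1, horizon_steps + 1)]


def crosses_ego_path(predicted, half_width=1.0, max_ahead=30.0):
    """True if any predicted point falls inside the corridor directly ahead of the ego car."""
    return any(abs(x) <= half_width and 0.0 <= y <= max_ahead for x, y in predicted)


DT = 1 / 30  # assumed time between frames, in seconds

# Hypothetical tracks over three frames. Both objects get closer in y because
# the ego car is driving forward; only the dog also moves sideways.
dog = [(4.1, 12.2), (3.9, 11.8), (3.7, 11.4)]
trash_bag = [(4.0, 12.2), (4.0, 11.8), (4.0, 11.4)]

print(crosses_ego_path(project_trajectory(dog, DT, 30)))        # True
print(crosses_ego_path(project_trajectory(trash_bag, DT, 30)))  # False
```

The projection only works because the same object is tracked from frame to frame; without that continuity there would be no pair of positions to take a velocity from.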
As for the trash bag, the computer knows the difference between the two objects. It knows that a trash bag does not necessarily move on its own and can adjust its calculations to avoid collision with the object. Though, I think we would all rather collide with a trash bag than a dog, any day.
“This type of trajectory projection is what humans do extremely well – at least, if we’re paying attention when we’re driving. Distracted driving is a whole other thing, but computers have been just really, really unable to do this before, which is why they’re not very good at self-driving. That’s why you have companies like Waymo and GM Cruise and so forth that are mapping out exact environments around them with LiDAR and they can follow those things,” the doctor explains. He adds that a roller coaster can also detect objects in the way on the track and know when to brake, but it lacks the understanding of the semantics of objects that Tesla’s new FSD software has.
Any number of objects can come into the path of a car. The computer must detect whether these objects are moving or still, judge their level of danger, and then react to whatever the situation may be. On top of that, the computer must work out how far away a specific object or situation is, which determines how much time it has to react. If you venture into the crosswalk and a driver has their foot on the gas, they may hit you. With FSD, the computer would instead recognize the stop light and would probably have stopped before you entered the crosswalk, allowing you to cross safely and the car to come to a full stop.
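The link between distance and reaction time can be made concrete with the standard stopping-distance formula: distance travelled during the reaction delay plus braking distance, or v * t + v^2 / (2 * a). The sketch below is an illustration only; the speed, reaction times, and deceleration are assumed round numbers chosen for the example, not measured figures for any driver or for Tesla's system.

```python
def stopping_distance(speed, reaction_time, decel):
    """Meters needed to stop: travel during the reaction delay plus braking distance."""
    return speed * reaction_time + speed ** 2 / (2 * decel)


def can_stop_before(distance_to_object, speed, reaction_time, decel=7.0):
    """True if the car can come to a full stop before reaching the object."""
    return stopping_distance(speed, reaction_time, decel) <= distance_to_object


SPEED = 13.9  # assumed speed in m/s, roughly 50 km/h

# Same speed and same 20 m gap, with two assumed reaction delays (seconds).
print(can_stop_before(20.0, SPEED, reaction_time=0.1))  # True
print(can_stop_before(20.0, SPEED, reaction_time=1.5))  # False
```

At the same speed and distance, the only thing that changes the outcome here is how quickly the braking decision is made.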
Full video from Doctor Know-it-all Knows-it-all down below.