How an Autonomous Drone Flies With Deep Learning: Page 2 of 3

A team of engineers at Nvidia share how they created a drone capable of fully autonomous flight in the forest.

(ResNets) were first introduced by Microsoft in 2015 and are specifically targeted at image recognition applications. The key advantage ResNet has over other DNNs is that it is easier to train and optimize. Using openly available image datasets captured by other researchers in the Swiss Alps and the pacfiic northwest (the Nvidia team tested their drone in Seattle), Kamenev said they were able to quickly train the drone.


A first-person demo of the drone's flight shows its ability to recognize a path (green), as well as obstacles (red). 


There were however challenges that required the team to make adjustments to the ResNet architecture, Kamenev said. “We found the network can be overconfident [when navigating],” he said. “When the drone is pointing left, right, or center it should be confident. But what happens if it's facing in between?” The solution to this was to implement a loss function into the programming that helped the drone find its position at those other angles.

In addition, the team found that the neural network was easily confused by bright spots in the forest and changes in lighting that came from tree cover. They also needed to account for disturbances like wind that might push the drone off course and require it to adjust. “Some suggested just hitting the drone with a stick,” Kamenev laughed. “But we just used manual override to turn the drone and see how well it gets back onto the path.”

“In addition to low level navigation we also wished to have some kind of system to help us avoid obstacles,” Jeffrey Smith, a senior computer vision software engineer at Nvidia, said, explaining the use of SLAM in the the drone. “Unexpected hazards have to also be identified.”

Smith said that while the SLAM system can very well estimate the camera's position as well as its position in real world space it also has issues with accuracy and missing scale information. “We can track objects, but can't tell how far away they are in real world units,” Smith said.


The Nvidia team experimented with several deep neural networks and got the best results from S-ResNet-18. (Image source: Design News)


Smith said that by using an algorithm based on Procrustes analysis , a type of statistical shape analysis, the team was able to determine translation, rotation, and scaling factors. “The real world intrudes in the form of measurement error,” he said. “ There was a 10-20% error in the distance estimates.”

Ultimately Smith said the team realized they didn't need to calculate from SLAM space to real world space at all, “Because we aren't doing mapping we don't care about scale, we are about not hitting things.”

The type of camera also presented challenges. “We used a simple webcam,” Smith said. “The problem with inexpensive digital cameras is their rolling shutter. SLAM assumes an imagine is being taken all at once. That's not true for rolling shutter.” Since rolling shutter captures a scene by rapdily scanning it vertically or horizontally, it introduces anomalies that aren't visible to the naked eye, but which


Jerald Cogswell's picture
There are non visual methods of navigation used by animals. Bats can hitting wires stretched across their path using their sonar. Owls can sense the sound of a mouse on the ground in darkness and seize it just by listening. But how can it know to avoid restricted or classified areas? How about avoiding solar concentrator paths? There is much to teach a young drone.

Add new comment

By submitting this form, you accept the Mollom privacy policy.