To understand why deep learning attracts so much attention, look to industrial robotics.

Deep learning—a technology loosely modeled on how the human brain makes decisions based on new knowledge—now empowers robots to better solve problems and recognize complex real-world patterns. It heralds a new age for robots, one researcher contends.

Nowhere is this more evident than in Amazon’s annual competition called the “Amazon Picking Challenge,” which features robots performing industrial tasks.

In 2016, the competition, which awarded prizes of up to a whopping $80,000, required robots to use deep learning to pick items of various shapes and sizes stored on shelves. The winner, Team Delft, produced a robot that demonstrated successful problem-solving by recognizing specific consumer items, selecting them, and placing them in a box. The challenge in all of these competitions is to complete the tasks both accurately and quickly.

Team Delft wins the 2016 Amazon Picking Challenge

This technology is highlighted in a recent analysis of how deep learning will revolutionize robotics. In fact, deep learning is expected to have such an enormous impact on our lives that it is ranked No. 1 on the IEEE Computer Society’s top 10 tech trends for 2018.

“One reason deep learning has attracted the attention of so many researchers and engineers, even outside of the AI community, is because it can capture abstract features and recognize patterns in ways many once thought impossible for computers,” writes Ryo Miyajima of Preferred Networks, author of “Deep Learning Triggers a New Era in Industrial Robotics,” which appears in the November/December 2017 issue of IEEE MultiMedia.

Another example of a deep-learning robot

Miyajima’s own company, Preferred Networks, achieved notable success by teaming up with FANUC to create a deep learning system that could select steel cylinders out of a bin with as much accuracy as conventional systems that require a great deal of human input.

bin picking system uses a depth camera image

Overview of our novel bin picking system. It uses a depth camera image around a given suction hand position as input, together with the output representing whether that suction was successful. Through trial and error, we collect this input and output pair with the actual robot, initially starting with a random policy. We then train the deep neural network with thousands of these inputs and outputs.

“By training the deep neural network with thousands of these inputs and outputs, we have achieved 90 percent accuracy, which is comparable to a conventional system whose parameters are tuned by experienced operators. Furthermore, our deep learning approach doesn’t require us to predefine the object’s appearance or shape,” he writes.
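The input/output pairing Miyajima describes—a depth-image patch around a candidate suction point in, a success/failure label out—can be sketched as a supervised classifier. The sketch below is a deliberately simplified stand-in (a logistic-regression “network” on synthetic 8×8 depth patches, with a made-up flatness rule as the label), not Preferred Networks’ actual model; all data and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 8x8 depth patches around a candidate suction point.
# Made-up labeling rule: suction "succeeds" when the patch centre is close to
# the camera, i.e. the mean depth of the central 2x2 region is below 0.5.
n, side = 400, 8
patches = rng.uniform(0.0, 1.0, size=(n, side, side))
centre = patches[:, 3:5, 3:5].reshape(n, -1).mean(axis=1)
labels = (centre < 0.5).astype(float)

X = patches.reshape(n, -1)          # flatten each patch into a feature vector
w = np.zeros(X.shape[1])
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Full-batch gradient descent on the logistic (cross-entropy) loss.
for _ in range(3000):
    p = sigmoid(X @ w + b)          # predicted success probability
    grad = p - labels               # dLoss/dlogit for each sample
    w -= 0.1 * (X.T @ grad) / n
    b -= 0.1 * grad.mean()

accuracy = ((sigmoid(X @ w + b) > 0.5) == (labels > 0.5)).mean()
```

The real system replaces the linear model with a deep network and the synthetic rule with outcomes collected by trial and error on the actual robot, but the training loop has the same shape: image patch in, success probability out.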

The winning team behind Amazon’s deep-learning robotics contest

The success of Team Delft depended on a deep learning system called “The Faster Region-based Convolutional Neural Network,” which was one of the more popular deep neural-network-based object detection systems when the team started working on their system.

“It had shown state-of-the-art accuracy when it was released on datasets such as Pascal’s visual object challenge (VOC) 2007 and 2012 and the Microsoft Common Objects in Context (MS COCO) dataset,” writes Miyajima.

object detection in Team Delft robotics system

Team Delft used deep learning in their object detection component to classify the objects in the camera image and output the bounding box for each object: detection for the (a) picking task and (b) stowing task.
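Detection pipelines like Faster R-CNN lean on intersection-over-union (IoU) between bounding boxes, both to match predictions to ground truth and to suppress duplicate detections. A minimal, self-contained IoU helper (the `(x1, y1, x2, y2)` corner convention is an assumption, not taken from Team Delft’s code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5, the criterion used by the Pascal VOC benchmark mentioned above.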

The team developed a highly sophisticated camera system to detect objects by shape, position, and orientation.

“The Team Delft system estimated the 6D pose of the detected object and matched a premade CAD model of the object against the real-time point cloud retrieved from the camera,” says Miyajima.
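Matching a CAD model against an observed point cloud comes down to estimating a rigid transform (rotation R, translation t). The core least-squares step for a set of corresponding point pairs is the Kabsch/SVD algorithm sketched below; a full pipeline such as ICP would also establish those correspondences iteratively, which this sketch assumes are already known, and the model points and pose are invented for illustration.

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q.

    P, Q: (N, 3) arrays of corresponding points. Returns R (3x3) and t (3,)
    such that R @ P[i] + t ~= Q[i]  (Kabsch algorithm).
    """
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)        # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Hypothetical model points and their observed (rotated + shifted) copies.
model = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1.0]])
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1.0]])
t_true = np.array([0.2, -0.1, 0.5])
observed = model @ R_true.T + t_true

R, t = rigid_transform(model, observed)
```

With exact correspondences and no noise, the recovered transform matches the true pose; real depth-camera data would require robust matching and outlier rejection on top.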

Next, the team created a robotic arm that, once the camera recognized an object, positioned itself before moving in to grasp it.

“The team used shape primitives to describe the object geometry and predefine the grasps. Because an estimate of the object pose was available, the predefined grasp poses could be transformed with the object pose, providing the robot with poses to move into so it could pick up the item,” Miyajima says.
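The transformation Miyajima describes is a composition of homogeneous 4×4 poses: once the object’s pose in the robot frame is estimated, each grasp pose predefined in the object’s frame can be mapped into the robot frame with a single matrix product. A minimal sketch with invented numbers:

```python
import numpy as np

def pose(R, t):
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Estimated object pose in the robot frame (illustrative values):
# rotated 90 degrees about z, sitting 0.4 m in front of the robot.
Rz90 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1.0]])
T_robot_object = pose(Rz90, [0.4, 0.0, 0.1])

# A grasp predefined relative to the object: approach 5 cm above its origin.
T_object_grasp = pose(np.eye(3), [0.0, 0.0, 0.05])

# The same grasp expressed in the robot frame--the pose the arm moves into.
T_robot_grasp = T_robot_object @ T_object_grasp
```

Because the grasps are predefined once per shape primitive, re-estimating the object pose is all that is needed to obtain fresh, reachable grasp targets at runtime.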

Lastly, the team used MoveIt!, an open-source software package for manipulation and motion planning, for offline motions, such as moving outside the shelf without colliding with other objects.

“The team distinguished between two types of trajectories: offline and online motions. Offline motions were used for motions outside of the shelf, which could be pre-generated using RRT-Connect of MoveIt,” writes Miyajima.
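RRT-Connect, the planner MoveIt exposes, grows one tree from the start configuration and one from the goal, repeatedly extending one tree toward a random sample and then greedily connecting the other tree to the new node. The sketch below is a bare-bones 2D version with a single circular obstacle; the world, step size, and obstacle are all invented for illustration, and the real planner works in the robot’s joint space.

```python
import math
import random

STEP = 0.08                       # maximum extension per step
OBSTACLE = ((0.5, 0.5), 0.2)      # circular obstacle: centre, radius

def collision_free(a, b, samples=10):
    """Check interpolated points on segment a-b against the obstacle."""
    (cx, cy), r = OBSTACLE
    for i in range(samples + 1):
        t = i / samples
        x = a[0] + t * (b[0] - a[0])
        y = a[1] + t * (b[1] - a[1])
        if math.hypot(x - cx, y - cy) <= r:
            return False
    return True

def steer(q_near, q_target):
    """Move from q_near toward q_target, by at most STEP."""
    d = math.dist(q_near, q_target)
    if d <= STEP:
        return q_target
    t = STEP / d
    return (q_near[0] + t * (q_target[0] - q_near[0]),
            q_near[1] + t * (q_target[1] - q_near[1]))

def extend(tree, parent, q_target):
    """One RRT extension of `tree` toward q_target; returns the new node."""
    q_near = min(tree, key=lambda v: math.dist(v, q_target))
    q_new = steer(q_near, q_target)
    if q_new not in parent and collision_free(q_near, q_new):
        tree.append(q_new)
        parent[q_new] = q_near
        return q_new
    return None

def trace(parent, q):
    """Walk parent pointers from q back to the tree root."""
    path = [q]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path                    # q ... root

def rrt_connect(start, goal, max_iters=3000):
    tree_a, parent_a = [start], {start: None}
    tree_b, parent_b = [goal], {goal: None}
    for _ in range(max_iters):
        q_rand = (random.random(), random.random())
        q_new = extend(tree_a, parent_a, q_rand)
        if q_new is None:
            continue
        # Greedily connect the goal tree toward the new node.
        q = extend(tree_b, parent_b, q_new)
        while q is not None and q != q_new:
            q = extend(tree_b, parent_b, q_new)
        if q == q_new:             # the two trees met at q_new
            a_side = trace(parent_a, q_new)[::-1]   # start ... q_new
            b_side = trace(parent_b, q_new)         # q_new ... goal
            return a_side + b_side[1:]
    return None

random.seed(42)
path = rrt_connect((0.05, 0.05), (0.95, 0.95))
```

Pre-generating such paths offline is cheap precisely because nothing about the query changes between runs; only motions that reach into the shelf had to be planned online.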

A montage of robotics systems using the MoveIt! manipulation software 

According to Miyajima, two considerations will shape the future of industrial robotics.

Deep-learning robots vs conventionally programmed robots

With the current technology, end-to-end trained systems will not be as precise and accurate as conventional systems that are tweaked and tuned for a very specific task, such as controlling the position of a robotic arm. However, being able to teach abstract tasks (opening the cap of a bottle, for example) to robots end-to-end was not something experts thought was practical until a few years ago. With new technology, we might see robotic system designs radically reshaped in the near future.

Let’s say you want to pick up a cube with two fingers. You immediately choose to place your fingers on two opposing sides and not on any other combination of sides. Your intuition about grasp quality is in line with what grasp analytics predict. Therefore, instead of spending time and using numerous robots to test the outcome of every possible grasp, it makes sense to take our knowledge about grasp quality and inject it into the dataset used to train a deep neural network.
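That intuition can be turned into an analytic labeling rule. The toy sketch below scores two-finger grasps on a cube by how antipodal the contact normals are—the kind of analytic grasp-quality knowledge that could label a training set instead of running physical trials. The scoring rule and face names are illustrative, not Miyajima’s.

```python
from itertools import combinations

# Outward unit normals of the six faces of a cube, keyed by face name.
NORMALS = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}

def antipodal_score(face1, face2):
    """1.0 when contact normals directly oppose (best two-finger grasp),
    0.0 for perpendicular faces, -1.0 when they point the same way."""
    n1, n2 = NORMALS[face1], NORMALS[face2]
    return -sum(a * b for a, b in zip(n1, n2))

# Label every candidate face pair; the analytic score becomes training data.
labels = {pair: antipodal_score(*pair) for pair in combinations(NORMALS, 2)}
best_pair = max(labels, key=labels.get)
```

The highest-scoring pairs are exactly the opposing faces a human would pick, so the scores can stand in for labels that would otherwise require many physical grasp attempts to collect.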

For the time being, humans can do some tasks much better than robots, especially those demanding manual dexterity. However, Miyajima believes that will change in the coming years.

“First, there is a growing demand for automation in mass customization solutions (as opposed to mass production). Second, even in some mass production factories that will benefit from automation, we still see human workers performing repetitive tasks that are technologically challenging (requiring dexterity, for example) or not worth the investment to automate. In either case, deep learning is a promising technology for cultivating undeveloped areas and providing us with more robust, adaptive, and reliable systems,” Miyajima says.

