IBM’s Breakthrough Distributed Computation for Deep Learning Workloads (Update)

“We’ve been using GPU (graphics processing units) accelerators to accelerate deep learning ‘training,’” said Sumit Gupta of IBM Cognitive Systems. “What we do is give these computer models millions of images, but then we have to train them on computers with powerful GPUs (to record and understand what the images entail). This is hard to do!”

Gupta said IBM Research posted close to ideal scaling with its new distributed deep learning (DDL) software, which achieved record low communication overhead and 95 percent scaling efficiency on the open-source Caffe deep learning framework across 256 GPUs in 64 IBM Power systems. The previous best scaling, demonstrated by Facebook AI Research for a training run on Caffe2, was 89 percent, at higher communication overhead.

IBM demonstrated the DDL software by training a ResNet-101 deep learning model on 7.5 million images from the ImageNet-22K data set, with an image batch size of 5,120. Progress in accuracy and the practicality of deploying deep learning at scale has been gated by the technical challenges of running massive deep learning-based AI models, with training times measured in days and weeks, Gupta said. IBM’s cluster with the new DDL library finished the run in 7 hours and achieved 33.8 percent accuracy.
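The article does not spell out how the 95 percent figure is computed, but scaling efficiency is conventionally the measured throughput expressed as a fraction of ideal linear scaling over the number of GPUs. The sketch below illustrates that calculation with hypothetical throughput numbers chosen only to reproduce the 95 percent figure; it is not IBM's measurement code.

```python
def scaling_efficiency(throughput_n_gpus: float, throughput_1_gpu: float, n_gpus: int) -> float:
    """Measured throughput as a fraction of ideal (linear) scaling."""
    ideal_throughput = throughput_1_gpu * n_gpus
    return throughput_n_gpus / ideal_throughput

# Hypothetical illustration: if a single GPU sustained 100 images/sec,
# perfect scaling on 256 GPUs would be 25,600 images/sec. A measured
# 24,320 images/sec would correspond to 95 percent efficiency.
print(scaling_efficiency(24_320.0, 100.0, 256))  # -> 0.95
```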

NEWS ANALYSIS: Why deep learning is a literal ‘killer app’ for computers, and how IBM has figured out how to distribute computing for much faster processing of big-data artificial intelligence workloads.


Off the top, it sounds simple enough: You have one big, fast server processing an artificial-intelligence-related, big data workload. Then the requirements change; much more data needs to be added to the process to get the project done in a reasonable span of time. Logic says that all you need to do is add more horsepower to do the job.

As Dana Carvey used to say in his comedy act when satirizing President George H.W. Bush: “Not gonna do it.”

That’s right: Until today, adding more servers would not have solved the problem. Deep-learning analytics systems up to now have only been able to run on a single server; use cases simply haven’t been scalable by adding more servers, and there are major backend reasons for that.

All that is now history. IBM on Aug. 8 announced that its researchers have changed this by coming up with new distributed deep learning software that has taken quite a while to develop. This is very probably the biggest step forward in artificial intelligence computing in at least the last decade.

Connecting Servers for AI Jobs Sounds Easy, but Isn’t

Simply by making it possible to connect a group of servers to work in concert on a single problem, IBM Research has reached a milestone in making deep learning much more practical at scale: AI models can be trained on millions of photos, drawings or even medical images, with faster training and significant gains in image-recognition accuracy, as evidenced in IBM's initial results.
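The article does not detail how IBM's DDL library coordinates the servers, but the general pattern such systems accelerate is data-parallel training: every server keeps a copy of the model, computes gradients on its own slice of the data, and the gradients are averaged across the cluster at every step. That per-step averaging is where communication overhead comes from. The sketch below simulates the idea in a single process with made-up linear-regression data; it is an illustration of the concept, not IBM's implementation.

```python
import numpy as np

# Minimal single-process simulation of synchronous data-parallel training.
# Each "worker" holds the same weights and its own data shard; gradients
# are averaged across workers every step (the "allreduce" that real
# distributed systems must perform over the network).

rng = np.random.default_rng(0)
n_workers, n_features, lr = 4, 8, 0.1

true_w = rng.normal(size=n_features)          # hypothetical ground truth
shards = []
for _ in range(n_workers):                    # one data shard per worker
    X = rng.normal(size=(64, n_features))
    shards.append((X, X @ true_w))

w = np.zeros(n_features)                      # replicated model weights
for step in range(200):
    # 1) Each worker computes a gradient on its local shard (in parallel on real hardware).
    grads = [X.T @ (X @ w - y) / len(y) for X, y in shards]
    # 2) Communication step: average the gradients across all workers.
    g = np.mean(grads, axis=0)
    # 3) Every worker applies the identical update, keeping the replicas in sync.
    w -= lr * g

print("distance from true weights:", np.linalg.norm(w - true_w))
```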

Also on Aug. 8, IBM released a beta version of its PowerAI software to help cognitive and AI developers build more accurate AI models that deliver better predictions. The software will help shorten the time it takes to train AI models from days and weeks to hours.

What exactly makes deep learning so time-consuming to process? First of all, it involves many gigabytes or terabytes of data. Secondly, the software that can comb through all of this information is only now being optimized for workloads of this kind.

One thing a lot of people haven’t yet gotten straight is what sets deep learning apart from machine learning, artificial intelligence and cognitive intelligence.

Deep Learning a Subset of Machine Learning

“Deep learning is considered to be a subset, or a particular method, within this bigger term, which is machine learning,” Sumit Gupta, IBM Cognitive Systems Vice-President of High Performance Computing and Data Analytics, told eWEEK.

“The best example I always give about deep learning is this: When we’re teaching a kid how to recognize dogs and cats, we show them lots of images of dogs, and eventually one day the baby says ‘dog.’ The baby doesn’t look at the fact that the dog has four legs and a tail, or other details about it; the baby is actually perceiving a dog. That’s the big difference between the traditional computer models, where they were sort of ‘if and else’-type models versus perception. Deep learning tries to mimic…
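Gupta's contrast between "if and else"-type models and models that perceive can be sketched with a toy example; the features, thresholds and data below are invented purely for illustration and have nothing to do with IBM's software.

```python
import numpy as np

# "If and else"-type model: a person writes the decision rule by hand.
def rule_based_is_dog(weight_kg: float, ear_length_cm: float) -> bool:
    if weight_kg > 8.0 and ear_length_cm > 6.0:    # invented thresholds
        return True
    else:
        return False

# Learned model: a perceptron infers its own rule from labeled examples,
# roughly the way a child learns "dog" from seeing many dogs.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                      # invented feature vectors
y = (X @ np.array([1.5, -0.7]) > 0).astype(float)  # invented labels
w = np.zeros(2)
for _ in range(100):                               # simple perceptron updates
    preds = (X @ w > 0).astype(float)
    w += 0.01 * X.T @ (y - preds)

print("hand-written rule says dog:", rule_based_is_dog(12.0, 7.5))
print("learned model's training accuracy:", np.mean((X @ w > 0).astype(float) == y))
```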
