Deep learning: why should you care?
Deep learning is currently taking the IT world by storm. Facebook, Bloomberg and Google are actively exploring deep learning techniques, and new deep learning startups in all kinds of industries seem to raise seed funding every week. If you like traveling and have the budget, there are so many events dedicated to the topic that you could attend one almost weekly.
In fact, deep learning is simply a family of machine learning methods that use neural networks with more than one hidden layer, so some people are probably wondering why expectations are set so high.
Like many other machine-learning approaches, it is inspired by what we know about the biological brain. Like humans, it tries to extract higher-level features from the input (observations) automatically. However, thanks to its multilayer architecture trained iteratively, it can learn to represent high-level features more accurately than the previous generation of neural networks. The key point is that instead of hand-engineered features, deep learning algorithms use trainable features that are extracted from the observations automatically. This requires more computation and more training data, but it allows us to discover dependencies that were previously out of reach.
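To make the idea of trainable, layered features a bit more concrete, here is a minimal sketch of such a network: a tiny feed-forward net with two hidden layers, written in plain NumPy and trained on the classic XOR problem. Everything in it (the layer sizes, the learning rate, the toy dataset) is an illustrative assumption of mine, not a description of any particular framework or production system.

```python
# A minimal sketch, not production code: a tiny feed-forward network with two
# hidden layers, trained with plain gradient descent on the XOR problem.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR is the classic dependency a single linear layer cannot learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two hidden layers: the "deep" part. Every weight matrix is a trainable
# feature extractor, learned from the data rather than hand-engineered.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 1)), np.zeros(1)

lr = 0.5
for step in range(5000):
    # Forward pass: each layer re-represents the output of the previous one.
    h1 = sigmoid(X @ W1 + b1)
    h2 = sigmoid(h1 @ W2 + b2)
    out = sigmoid(h2 @ W3 + b3)

    # Backward pass: gradients of the squared error, layer by layer.
    d_out = (out - y) * out * (1 - out)
    d_h2 = (d_out @ W3.T) * h2 * (1 - h2)
    d_h1 = (d_h2 @ W2.T) * h1 * (1 - h1)

    W3 -= lr * h2.T @ d_out; b3 -= lr * d_out.sum(axis=0)
    W2 -= lr * h1.T @ d_h2;  b2 -= lr * d_h2.sum(axis=0)
    W1 -= lr * X.T @ d_h1;   b1 -= lr * d_h1.sum(axis=0)

print(out.round(3))  # should approach [[0], [1], [1], [0]]
```

The interesting part is that nobody tells the network what the intermediate features should be: the hidden layers discover them from the data, which is exactly what separates this family of methods from pipelines built on hand-engineered features.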
The idea of multilayer neural networks is not new: the deep learning paradigm is a reincarnation of the traditional neural networks of the early 1990s, which were proposed to improve machine-driven quantitative prediction. Back then, neural networks were used for predictive analysis and classification.
I remember that when I was writing my Master's thesis in 1998, neural networks were a buzzword in the IT community. Naturally, I took the opportunity to train my first neural network to predict FOREX currency exchange rates, with the aim of using the results in my thesis. Training took two weeks, accuracy was thoroughly disappointing, and the system could not operate in real time, which drastically reduced its practical value. It was a big fiasco, and I had to change the topic of my thesis swiftly.
Other people were a bit luckier than I was: there were some successful and many mediocre attempts to use neural networks in financial analysis, image recognition and non-linear classification. Until the mid-2000s, however, using full-scale neural networks remained extremely challenging everywhere, because the lack of sufficient computational resources made training them prohibitively time-consuming. Another problem that limited the widespread practical use of neural networks was the absence of a fully fledged methodology for integrating them into statistical frameworks.
These limitations were overcome in the mid-2000s for two reasons: (1) progress in computer hardware significantly reduced the time needed to train neural networks; and (2) new training methods made it possible to optimize the training process, dramatically boosting the final quality. Modern deep learning systems are widely used to make sense of raw data: in object classification (for example, text recognition), in computer vision (for example, estimating a person's age from a photo) and even in text understanding.
With regard to machine translation, a deep learning system can learn a hierarchy of features in which higher levels of abstraction (semantics) are derived from lower levels (lexical features) via intermediate steps (morphology, syntax, etc.). In fact, it is powerful enough to help us reach the holy grail of computational linguistics: a cheap way of incorporating semantics into the machine translation process.
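As a small, hedged illustration of what "cheap semantics" can look like in practice, the sketch below trains word embeddings with word2vec via the gensim library (an assumed dependency, not something mentioned in this post). Word2vec is a shallow neural model rather than a full deep hierarchy, but it shows the basic mechanism: words that occur in similar contexts end up close together in a learned vector space, and that geometry can be consumed by a translation system without any hand-written semantic rules. The toy corpus here is only a placeholder; real systems train on millions of sentences.

```python
# A minimal sketch of learned "semantics": word vectors trained with word2vec.
# gensim, the corpus and all hyperparameters are illustrative assumptions.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "chased", "a", "mouse"],
    ["a", "dog", "chased", "a", "ball"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,   # dimensionality of the learned word vectors
    window=2,         # context window used to define "similar usage"
    min_count=1,      # keep every word in this tiny corpus
    epochs=200,
    seed=1,
)

# Cosine similarity between learned vectors acts as a rough semantic distance;
# on a toy corpus the numbers are noisy, but the mechanism is the same at scale.
print(model.wv.similarity("cat", "dog"))
print(model.wv.most_similar("cat", topn=3))
```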
In the next post, I will cover some of the ways deep learning can be used to improve the quality of machine translation in industrial settings.