Why is Big Data Becoming So Popular?
Big Data is thought to be a powerful new technology that answers many of the consumer-focused and point-of-sale questions companies are asking nowadays. Moreover, it offers insights into new questions and tasks that data owners had not even considered in the past. What exactly are the main components of Big Data? Why did it become so popular only a few years ago? You will find out about these issues in the article written by Natalia Konstantinova, an expert in information extraction and data analysis.
Big data is becoming a very popular trend in both research and business, so it is worth discussing the reasons for this popularity.
What is big data?
The classic and best-known definition of ‘big data’ describes it as being characterised by three ‘Vs’: volume, variety and velocity.
Big Data is a massive amount of varied data that is constantly changing and being updated. For example, records of users' activity in a social network, or of their use of an information retrieval engine, can be considered big data. Such data can also be drawn from the enormous number of clinical trials of new medications, or from information about natural disasters all over the world.
There are many different ways to define big data; however, that is not the main topic of this article (a large collection of definitions can be found here).
So why now?
A lot of research dealing with big data is appearing now. Plenty of startups have begun offering to extract value from your data and provide valuable insights. Surprisingly, the methods that make it possible to process data automatically, such as the statistical techniques of regression, classification and clustering, have been known for many decades. So what are the reasons behind such a spike in the popularity of this field? Why is it happening now?
Several explanations of this trend can be identified.
1. Technological progress
Technological progress has led to the rapid development of computers, and computing power has increased considerably. Computing power that was previously accessible only to large companies is becoming available to a wider market.
This progress has also driven down the price of hardware, and computing power can even be rented at an acceptable rate. Previously, processing a large amount of data would have required mainframes that cost a lot of money. Today one can rent Amazon Elastic Compute Cloud (Amazon EC2) capacity to speed up computation drastically, at an affordable price and only for as long as it is needed. The problem of storage is also becoming less and less critical with the appearance of cloud storage and falling prices for storage devices.
2. Development of infrastructure
The infrastructure for the analysis of big data is developing rapidly. Software tools and libraries for many programming languages are becoming accessible to everyone who wants to explore this field.
This chart shows a set of instruments already available for the analysis of data. Some of them are free and can be used by all interested parties. The advent of NoSQL databases, which became an important addition to classical relational databases, was also a powerful spur to the popularity of big data.
The open-source project Hadoop has also made it much easier to implement distributed computing.
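Hadoop's core programming model is MapReduce. As a rough illustration of that model, and not of Hadoop's actual API, the following single-process Python sketch counts words in a few documents. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are hypothetical and chosen only for this example; in a real cluster the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence (hypothetical helper, single process)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input only; not data from the article.
documents = ["big data is big", "data is growing"]
counts = word_count(documents)
print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work can be split across machines, which is the part a framework such as Hadoop takes care of.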
3. Accessibility of data
Rapid progress is being made, and today one can collect data from a wide variety of sources. It is relatively easy to install sensors in all kinds of devices; indeed, data can be collected from fridges and toasters, to say nothing of cars and phones.
Businesses have started to realise that data can be used to accurately predict the needs of customers, which in turn can increase profits significantly. Data is invaluable, and the ability to use it correctly can be a crucial factor in the success or failure of a business.
The amount of data is growing rapidly, and we need to think about how it should be processed and what kind of information to extract. When processing data, one needs to ask questions and define what to look for. However, enormous amounts of data bring a new challenge: understanding what the right questions are and what information it is feasible to obtain.
At present, we are not limited to one type of data. It can come in different forms: video, text, images, links between people. Because the volume keeps growing, automatic processing is becoming essential, which is driving greater interest in techniques such as Machine Learning, Natural Language Processing and distributed computing.
About the author:
Dr Natalia Konstantinova received her PhD in Information Extraction for Interactive Question Answering from the University of Wolverhampton (UK). Her interests lie in the fields of NLP (Natural Language Processing), machine translation, information extraction, dialogue systems, speech, data analytics and machine learning, as well as project management.