Machine Learning basics for a newbie

Introduction: There has been renewed interest in machine learning in the last few years. This revival seems to be driven by strong fundamentals – loads of data being emitted by sensors across the globe, along with cheap storage and the lowest-ever computational costs! However, not everyone understands what machine learning is. Here are a

Continue Reading

Big Data = Hype; But Why That Doesn’t Matter

From time to time, you still come across someone with the opinion that Big Data is nothing more than a fad, which will be forgotten about soon enough. You might not expect to hear this from me, but they’re actually right. Well – half right, at least! As I’ve written before, I’m not actually a

Continue Reading

10 data science predictions for 2015

These predictions were published by the International Institute for Analytics (IIA). They produced a nice infographic, featured below, which was retweeted many times by various bloggers using the hashtag #2015Analytics. Other interesting predictions include those by Tableau, those by Pivotal, as well as my own predictions. Here are IIA’s predictions for 2015, in plain text:

Continue Reading

Coursera online course: Introduction to Natural Language Processing

This course provides an introduction to the field of Natural Language Processing. It includes relevant background material in Linguistics, Mathematics, Statistics, and Computer Science. Some of the topics covered in the class are Text Similarity, Part of Speech Tagging, Parsing, Semantics, Question Answering, Sentiment Analysis, and Text Summarization. The course includes quizzes, programming assignments in

Continue Reading

Is the pivot language approach ever a good option?

A pivot language is a third, intermediate language that bridges the gap between a language pair. For example, if translations are available between English and French and between English and Spanish, then translations between French and Spanish can be generated through the pivot language, English. The major drawback and concern of generated translations

Continue Reading
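To make the pivot idea concrete, here is a minimal sketch in Python. It assumes two tiny hand-written word dictionaries (`FR_TO_EN` and `EN_TO_ES`) and a helper named `pivot_translate`; all three are hypothetical and only illustrate chaining two translation steps through English. Real systems work with full translation models rather than word lookups.

```python
# Hypothetical toy dictionaries, for illustration only.
FR_TO_EN = {"chat": "cat", "chien": "dog", "maison": "house"}
EN_TO_ES = {"cat": "gato", "dog": "perro", "house": "casa"}


def pivot_translate(word, source_to_pivot, pivot_to_target):
    """Translate a word via the pivot language.

    Returns None if either the source-to-pivot or the pivot-to-target
    step has no entry for the word.
    """
    pivot_word = source_to_pivot.get(word)
    if pivot_word is None:
        return None
    return pivot_to_target.get(pivot_word)


if __name__ == "__main__":
    for french_word in ["chat", "maison", "oiseau"]:
        print(french_word, "->", pivot_translate(french_word, FR_TO_EN, EN_TO_ES))
```

Note that a gap in either dictionary leaves the word untranslated, which hints at how errors and omissions in the two steps can compound.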

What are support vector machines?

Support vector machines (SVMs) use an approach based on constructing decision planes (lines). Such a plane (line) is built to separate objects belonging to different classes. Suppose we have two sets of objects: red ones and green ones. Sometimes it is possible to draw a line that separates the red objects from the green ones: See more at:

Continue Reading
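The idea of a separating line can be sketched in a few lines of Python. This is not an SVM training routine: the sample points and the line (weights `w` and offset `b`) are assumptions picked by hand, and the helper names `side_of_line` and `separates` are hypothetical. The sketch only shows what it means for a line to separate two sets of objects.

```python
def side_of_line(point, w, b):
    """Return +1 or -1 depending on which side of the line
    w[0]*x + w[1]*y + b = 0 the point lies on."""
    x, y = point
    score = w[0] * x + w[1] * y + b
    return 1 if score >= 0 else -1


def separates(w, b, negatives, positives):
    """Check that every negative point is on the -1 side of the line
    and every positive point is on the +1 side."""
    return (all(side_of_line(p, w, b) == -1 for p in negatives)
            and all(side_of_line(p, w, b) == 1 for p in positives))


# Hand-picked example data (assumed, not from the article).
red = [(1.0, 1.0), (2.0, 0.5), (1.5, 2.0)]
green = [(4.0, 4.0), (5.0, 3.5), (4.5, 5.0)]

# The line x + y = 6, chosen by hand rather than learned.
w, b = (1.0, 1.0), -6.0

if __name__ == "__main__":
    print("Line separates red from green:", separates(w, b, red, green))
```

An actual SVM would choose `w` and `b` automatically, preferring the line that leaves the widest margin between the two classes.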

The famed MongoDB document database and its benefits

Over the past few years, there has been much talk about data management and database administration. Data must be managed properly to get the best results. Businesses enjoy many benefits when they manage their data properly because data is an asset, and a very important one for that

Continue Reading

Top-5 trends in Big Data Analysis

In the near future, the following major trends are expected in Big Data analysis: Big Data technology will be based on a mixture of cloud and on-premises computing. Many corporations are starting to migrate from in-house database infrastructures to cloud-based data warehouses. Distributed frameworks for data analytics like MapReduce are turning into distributed managers of resources and

Continue Reading