1. AN OVERVIEW OF MACHINE LEARNING


These days, when people hear about machine learning, they tend to picture a Terminator trying to wipe out the human species! Trust me, we are nowhere near building that kind of robot yet. If we could, we would already have solved artificial intelligence. That is still a long way down the road, if we can ever solve AI at all.


Machine learning has been around for decades. Early on, it was used in very specialised programs such as optical character recognition (OCR). Machine learning techniques became popular and mainstream during the 1990s, when they were used for spam e-mail detection. They were pretty good at it!


The spam filter isn't exactly a self-aware Skynet, but it does technically qualify as Machine Learning. It has learned so well that we seldom have to mark an e-mail as spam ourselves. It was followed by hundreds of other ML applications that now quietly power things like voice recognition and better search recommendations.


An engineering-oriented definition of Machine Learning by Tom Mitchell:
A computer program is said to learn from experience 'E' with respect to some task 'T' and some performance measure 'P' if its performance on 'T', as measured by 'P', improves with experience 'E'.
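
To make the definition concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the task 'T' is deciding whether a number is above a hidden threshold, the experience 'E' is the number of labelled examples seen, and the performance 'P' is the accuracy on the numbers 0 to 99. The learner simply guesses the midpoint between the largest "below" example and the smallest "above" example.

```python
TRUE_THRESHOLD = 60  # hidden rule the program has to discover (assumed for this toy example)

def label(x):
    """Ground truth for task T: is x at or above the hidden threshold?"""
    return x >= TRUE_THRESHOLD

def learn_threshold(examples):
    """Guess the threshold as the midpoint between the two classes seen so far."""
    negatives = [x for x, y in examples if not y]
    positives = [x for x, y in examples if y]
    return (max(negatives) + min(positives)) / 2

def accuracy(threshold, test_points):
    """Performance P: fraction of test points classified correctly."""
    correct = sum((x >= threshold) == label(x) for x in test_points)
    return correct / len(test_points)

training_stream = [10, 90, 30, 80, 50, 66, 58, 62]  # experience E arrives in this order
test_points = list(range(100))

def learning_curve():
    """Measure P after the learner has seen 2, 4, 6 and 8 examples."""
    curve = []
    for n in (2, 4, 6, 8):
        examples = [(x, label(x)) for x in training_stream[:n]]
        curve.append((n, accuracy(learn_threshold(examples), test_points)))
    return curve

for n, acc in learning_curve():
    print(n, acc)
```

With this particular data the accuracy climbs from 0.9 to 0.95 to 0.98 to 1.0 as more examples arrive, which is exactly "performance on T, as measured by P, improves with E".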

Well, how does this happen? If you download a .pdf of your textbook, has your computer learned everything in it? Can your computer now do your homework? NO! Your data has definitely increased, but your computer can't read or understand the book. In Mitchell's terms, the experience has grown but the performance on any task has not improved.

Machine Learning is a field of AI (Artificial Intelligence) that uses statistical techniques to give computer systems the ability to learn from data without being explicitly programmed. In general terms, Machine Learning is the science of programming computers so that they can learn from data.

WHAT IS MACHINE LEARNING?

The problem with the traditional approach to programming is that, if the problem is not trivial, the program becomes a long list of complex rules that is pretty hard to maintain. A Machine Learning approach often makes the program much shorter, easier to maintain and, most likely, more accurate.

The Machine Learning approach is great for:

  • Problems for which existing solutions require a lot of hand-tuning or long lists of rules.
  • Problems that keep changing, so that no fixed set of rules stays valid for long (example: spam filters).
  • Getting insights about large datasets.

There are various types of ML systems. They can be broadly classified based on:

  • Whether they are trained with human supervision or not (supervised, unsupervised, semi-supervised and reinforcement learning).
  • Whether they can learn incrementally on the fly or not (online versus offline learning).
  • Whether they compare new data to known data or build a predictive model (instance-based versus model-based learning).
These criteria are not exclusive and can be combined in any way.

In supervised learning, the training data includes the desired solutions, called labels. A typical supervised learning task is classification. Another typical task is regression: predicting a numeric value from a set of features called predictors. Some of the important supervised learning algorithms are listed below:
  • k-Nearest Neighbours
  • Linear Regression
  • Logistic Regression
  • Support Vector Machines (SVM)
  • Decision Trees and Random Forests
  • Neural Networks (some NNs can be unsupervised or semi-supervised)
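
As a rough illustration of supervised classification, here is a tiny k-Nearest Neighbours sketch in plain Python. The data is invented: each e-mail is described by two hypothetical features (number of exclamation marks, number of links) and carries a label of "spam" or "ham". The function name `knn_predict` is my own, not something from a library.

```python
import math
from collections import Counter

def knn_predict(training_data, query, k=3):
    """Predict the label of `query` by majority vote among its k closest labelled examples."""
    by_distance = sorted(training_data, key=lambda item: math.dist(item[0], query))
    nearest_labels = [lbl for _, lbl in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Labelled training data: (features, label). The features are
# (exclamation marks, links), a made-up representation for this sketch.
emails = [
    ((8, 5), "spam"),
    ((7, 4), "spam"),
    ((9, 6), "spam"),
    ((1, 0), "ham"),
    ((0, 1), "ham"),
    ((2, 0), "ham"),
]

print(knn_predict(emails, (6, 5)))  # spam
print(knn_predict(emails, (1, 1)))  # ham
```

The labels are what make this supervised: the program is told the right answer for every training e-mail and uses those answers to classify new ones.
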
In unsupervised learning, the training dataset is unlabelled, that is, it doesn't include any information about the desired output. Some of the important unsupervised learning algorithms are:
  • Clustering
    • K-Means
    • Hierarchical Cluster Analysis (HCA)
    • Expectation Maximisation
  • Visualisation and Dimensionality reduction
    • Principal Component Analysis (PCA)
    • Kernel PCA
    • Locally-Linear Embedding (LLE)
    • t-distributed Stochastic Neighbour Embedding (t-SNE)
  • Association rule learning
    • Apriori
    • Eclat
Clustering algorithms detect groups of similar instances. Visualisation algorithms output a 2D or 3D representation of the unlabelled data. Dimensionality reduction algorithms simplify the data without losing too much information, for example by merging several correlated features into one. This is called feature extraction. It is often a good idea to run a dimensionality reduction algorithm on the data before training another algorithm on it, since the training then runs faster and the data takes up less memory. Association rule learning is often used to discover interesting relations between attributes in the data.
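
Here is a small sketch of clustering with K-Means, again in plain Python and with invented points. Note that no labels are given anywhere. To keep the run reproducible I start from the first two points as centroids, which is an assumption of this sketch; real implementations usually pick the starting centroids more carefully.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and moving each centroid."""
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=points[:2])

print(clusters[0])  # the three points near (1, 1)
print(clusters[1])  # the three points near (8, 8)
```

Even though both starting centroids sit in the same corner, the algorithm separates the two groups after a couple of iterations.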

Some algorithms can deal with partially labelled data (usually a lot of unlabelled data and only a few labelled instances). This is called semi-supervised learning.

The learning system in reinforcement learning is called an agent. The agent can observe its environment, perform actions, and get rewards or penalties in return. Over time it learns the best strategy, called a policy, for collecting the most reward.
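
A toy sketch of this idea, using Q-learning (one common reinforcement learning method, chosen here only as an example). The environment is made up: a corridor of five cells where the agent earns a reward of 1 for reaching the rightmost cell. To keep the result reproducible, the agent tries every action in every cell in turn instead of exploring randomly, which is a simplification on my part.

```python
ACTIONS = {"left": -1, "right": +1}
GOAL = 4                  # rightmost cell of a 5-cell corridor (cells 0..4)
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (arbitrary choices)

def step(state, action):
    """Environment: move one cell, stay inside the corridor, reward 1.0 on reaching the goal."""
    next_state = min(max(state + ACTIONS[action], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def train(sweeps=200):
    """Learn action values Q[state][action] from observed rewards."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(GOAL + 1)}
    for _ in range(sweeps):
        for state in range(GOAL):        # the goal cell is terminal, so it is never updated
            for action in ACTIONS:       # exhaustive exploration instead of random moves
                next_state, reward = step(state, action)
                target = reward + GAMMA * max(q[next_state].values())
                q[state][action] += ALPHA * (target - q[state][action])
    return q

def policy(q):
    """The learned policy: in each cell, pick the action with the highest value."""
    return {s: max(q[s], key=q[s].get) for s in range(GOAL)}

print(policy(train()))
```

The learned policy is "right" in every cell: the agent was never told where the goal is, it worked that out from the rewards alone.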

Another popular criterion used to categorise Machine Learning systems is whether or not the system can learn from an incoming stream of data. If the system is capable of learning incrementally on the go, it is called online learning; otherwise it is called offline (or batch) learning. If resources are limited, online learning systems are preferred over offline ones. An online learning system is fed the data in small chunks, called mini-batches, and learns from each chunk in turn. Using this approach to train on a dataset that is too large to fit in memory is known as out-of-core learning. One should be careful with the learning rate of an online learning system: a very high learning rate makes the system adapt quickly to new data but also forget old data quickly, so a run of bad data can lead to poor performance.
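
The sketch below shows the flavour of online learning in plain Python. The "stream" is a generator that hands out small batches, standing in for data read from disk or a live feed, and the model is a straight line trained by gradient descent one batch at a time. The data, the hidden rule y = 2x + 1 and the learning rate of 0.1 are all assumptions made up for this example.

```python
def batch_stream(passes, batch_size=5):
    """Yield small batches of (x, y) pairs; the full dataset is never handed over at once."""
    xs = [i / 10 for i in range(10)]
    for _ in range(passes):
        for start in range(0, len(xs), batch_size):
            chunk = xs[start:start + batch_size]
            yield [(x, 2 * x + 1) for x in chunk]  # hidden rule: y = 2x + 1

def train_online(stream, learning_rate=0.1):
    """Update the line y = w*x + b a little after every batch (mini-batch gradient descent)."""
    w, b = 0.0, 0.0
    for batch in stream:
        # Gradients of the mean squared error on this batch only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = train_online(batch_stream(passes=2000))
print(round(w, 3), round(b, 3))
```

The model ends up very close to w = 2 and b = 1 even though it only ever held five examples in memory at a time. The `learning_rate` argument is the knob the paragraph above warns about: the larger it is, the more each new batch overrides what was learned before.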

One more way to categorise Machine Learning systems is by how they generalise. There are two main approaches to generalisation: instance-based learning and model-based learning. In instance-based learning, the program memorises the training examples and uses a measure of similarity to them to make predictions. In the model-based approach, a model is built from the dataset and that model is used for prediction.
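
The difference shows up clearly when both approaches are asked about a point far from the training data. In this sketch (invented data following y = 2x, and function names of my own), the instance-based predictor copies the answer of the most similar known example, while the model-based predictor fits a straight line by least squares and uses the line.

```python
def nearest_neighbour_predict(data, x_new):
    """Instance-based: return the y of the stored example whose x is closest to x_new."""
    _, nearest_y = min(data, key=lambda pair: abs(pair[0] - x_new))
    return nearest_y

def fit_line(data):
    """Model-based: fit y = slope * x + intercept by ordinary least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    return slope, mean_y - slope * mean_x

data = [(1, 2), (2, 4), (3, 6), (4, 8)]
slope, intercept = fit_line(data)

print(nearest_neighbour_predict(data, 10))  # 8, copied from the closest example (x = 4)
print(slope * 10 + intercept)               # 20.0, read off the fitted line
```

Neither answer is automatically "right". The instance-based predictor stays close to what it has seen, and the model-based one trusts that the pattern it found continues.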

Now that you have a general idea of Machine Learning, try to think about which kind of learning technique you would use in a vehicle sent to Mars. Please answer in the comment box.


This is a small overview of Machine Learning. My next blog post will be about the main challenges of the Machine Learning approach.
