Exploratory Data Analysis is a key step in any machine learning project. In this blog we will analyse the credit card data to draw key insights from it.
The data comes from the UCI Machine Learning Repository and is available at https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients.
XGBoost is one of the most celebrated libraries in Kaggle competitions. It has dominated the space of tabular, or structured, datasets. The model became so popular that top Kagglers relied mainly on this library to win competitions. In this blog we shall see the basics of the algorithm and apply it to a Kaggle case study.
Understanding Gradient Boosting:
Boosting is a strategy for combining a set of weak models into a strong model. Here we shall use decision trees as the weak models. Unlike bagging, where the models are trained independently of each other, boosting trains them in sequence so that each new learner corrects the mistakes made by the earlier ones. …
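To make the idea of learning from the mistakes of weak learners concrete, here is a minimal sketch of boosting for regression with squared error. It is not XGBoost itself: the weak learner is a one-split tree (a stump), and the synthetic data, the helper names (fit_stump, predict_stump, boost), the number of rounds and the learning rate are all assumptions made for illustration.

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a one-split regression tree (a stump) to the current residuals."""
    best = None
    # Skip the largest value so the right side of the split is never empty.
    for threshold in np.unique(x)[:-1]:
        left = residual[x <= threshold]
        right = residual[x > threshold]
        # Squared error when each side predicts its own mean.
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to what is still wrong."""
    base = y.mean()
    prediction = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - prediction          # mistakes of the ensemble so far
        stump = fit_stump(x, residual)     # weak learner trained on those mistakes
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, prediction

# Synthetic data (an assumption for this sketch): a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 6.0, 80)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

base, stumps, prediction = boost(x, y)
mse_start = np.mean((y - base) ** 2)
mse_end = np.mean((y - prediction) ** 2)
print(f"MSE with mean only: {mse_start:.4f}, after boosting: {mse_end:.4f}")
```

Each stump on its own is a poor model, but because every round is fitted to the residuals left by the previous rounds, the training error of the combined model keeps shrinking.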
Object tracking is an important application in the field of computer vision. It has use cases in surveillance systems, defence, self-driving cars and more. In this blog we shall discuss one of the basic tracking algorithms, known as the mean-shift algorithm, and see it applied to tracking a car in a video.
Before moving to mean-shift tracking, let us understand the histogram and how it is used to create the pre-processed input for mean-shift tracking.
An image is composed of pixels with different values. The distribution of pixel values is an important characteristic of every image. A histogram is a useful way to characterize an image's content: it counts the number of pixels that take each value. To generate a histogram for an image we can use the OpenCV library. …
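As a rough illustration of what an image histogram contains, the sketch below counts how many pixels take each intensity value. It uses NumPy on a synthetic grayscale image rather than OpenCV, so it runs without an image file; the image size and random content are assumptions. In OpenCV the corresponding function is cv2.calcHist.

```python
import numpy as np

# Synthetic 64x64 grayscale image (an assumption; a real image would be loaded from disk).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# hist[v] is the number of pixels whose intensity equals v, for v in 0..255.
hist = np.bincount(image.ravel(), minlength=256)

print("number of bins:", hist.size)
print("pixels counted:", hist.sum())
```

The counts always add up to the number of pixels in the image, which is a quick sanity check on any histogram.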
Image quality enhancement is a very common application, and many apps today provide image enhancement modules. In this blog we will see how we can convert a low-quality image into a higher-quality image (to some extent) using image processing.
Consider the image below.
Deep learning techniques have been quite useful for detecting and counting the objects in an image. However, we can use traditional image processing methods to do the same with less complexity. This blog explains the step-by-step process of counting the number of pizza slices on a plate.
Let us consider the following image, which has four slices.
Transfer learning is a useful strategy for image tasks such as classification and detection when we do not have enough data to train a model. It lets us reuse a pre-trained architecture with little modification, for example by adding a layer on top of the model and training only the added layer. Compared with training a model from scratch this is really handy, because far fewer parameters need to be trained, which lowers the computational cost. This blog explains how to apply transfer learning to detect whether an image contains a cactus. …
Color quantization is the process of reducing the number of colors in an image while keeping its visual appearance intact. It is a useful image compression technique for devices that can display only a limited number of colors because of memory restrictions.
Every pixel in an image can be represented by three features: its B, G and R values. If each channel takes values from 0 to 255, there are 256 * 256 * 256 possible colors. …
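The arithmetic above, and the effect of reducing the palette, can be sketched as follows. This uses simple uniform quantization (a fixed number of levels per channel) as a stand-in, which is not necessarily the method the post goes on to use; the function name quantize_uniform, the synthetic image and the choice of 4 levels per channel are assumptions.

```python
import numpy as np

# Number of distinct colors with 256 values in each of the B, G and R channels.
total_colors = 256 ** 3
print("possible colors:", total_colors)

def quantize_uniform(image, levels):
    """Map every channel value to the centre of one of `levels` equal-width bins.

    Assumes `levels` divides 256 evenly (for example 2, 4, 8 or 16).
    """
    step = 256 // levels
    binned = (image.astype(np.int32) // step) * step + step // 2
    return binned.astype(np.uint8)

# Synthetic 32x32 color image (an assumption for this sketch).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

quantized = quantize_uniform(image, levels=4)
n_colors = len(np.unique(quantized.reshape(-1, 3), axis=0))
print("colors after quantization:", n_colors, "(at most", 4 ** 3, ")")
```

With 4 levels per channel the image can contain at most 4 * 4 * 4 = 64 distinct colors, compared with the roughly 16.7 million possible before quantization.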
Semantic segmentation is a computer vision problem in which we try to assign a class to each pixel. Unlike the classic image classification task, where only one class is predicted for the whole image (assuming single-label classification), here we predict a class for every pixel. Image segmentation is applied predominantly in the medical field, but it is now being used in other domains as well, e.g. self-driving cars.
In image classification we are interested in what is in the image. Semantic segmentation answers two questions: what is in the image, and where.
U-Net is one of the most popular models for semantic segmentation. Although other models can accomplish this task, U-Net is widely regarded as a de facto standard for it. …
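The difference between "what" and "what and where" shows up in the shape of the model output. The sketch below uses random numbers in place of real model scores, so the class count, image size and variable names are all assumptions; it only illustrates that classification yields one label per image while segmentation yields one label per pixel.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, height, width = 3, 4, 5

# Classification: one score per class for the whole image -> a single label ("what").
image_scores = rng.normal(size=n_classes)
image_label = int(np.argmax(image_scores))

# Segmentation: one score per class for every pixel -> a label map ("what" and "where").
pixel_scores = rng.normal(size=(n_classes, height, width))
label_map = np.argmax(pixel_scores, axis=0)

print("classification output:", image_label)
print("segmentation output shape:", label_map.shape)
```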
Quora Insincere Questions Classification was a natural language processing challenge organized by Kaggle. The main aim of the challenge was to identify toxic and divisive content. It is a binary classification problem in which class 1 represents an insincere question and class 0 a sincere one. This blog deals specifically with the data modelling part.
In the first step we read the data using pandas. This code snippet reads the file into a pandas data frame.
We can check the shape of the data using the shape attribute.
Initially we divide the training dataset into two parts: train and validation. To do so we can use sklearn. The following code snippet helps us achieve it. …
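A minimal sketch of such a split is shown here using a plain NumPy shuffle instead of sklearn, so that it runs without extra dependencies; sklearn's train_test_split provides the same idea with more options. The helper name train_validation_split, the row count of 1000, the 20% validation fraction and the seed are assumptions for illustration.

```python
import numpy as np

def train_validation_split(n_rows, validation_fraction=0.2, seed=0):
    """Return shuffled row indices for the train part and the validation part."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    n_validation = int(round(n_rows * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]

train_idx, val_idx = train_validation_split(n_rows=1000, validation_fraction=0.2)
print("train rows:", len(train_idx), "validation rows:", len(val_idx))
```

The two index arrays can then be used to select rows from the data frame, for example with its iloc indexer.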
Because of their high computational requirements, we usually run deep learning models on powerful servers. However, depending on the application, it is sometimes necessary to run these models on customer devices such as smartphones and cars. These devices have little computational power and a tight power budget. Since they sit at the end of the data life cycle, we refer to this as edge computing, or machine learning on the edge.
Traditionally, machine learning workloads run in a data center. Take Google reverse image search as an example: whenever we search by image, similar images are returned as the result, and all of these computations are performed in Google's data centers. On the other hand, a phone's camera can identify human faces directly on the device. This is a classic example of on-device machine learning. However, model training is still carried out in data centers. …