By definition, machine learning is the field of AI that lets us program machines to learn from existing data and improve at performing certain tasks.
It’s a lot like human learning – we learn and get better.
Observing my niece, it struck me that humans are also a big, complex, intelligent system, and at the core we use the same kind of learning to improve at each task. You teach a toddler a simple task such as holding a spoon, and after trying again and again the toddler is able to hold a spoon or a fork.
We perform the task again and again and observe the outcomes and train ourselves to perform better.
In the same way, in machine learning, we use data to train our ‘models’. Models are nothing but mathematical equations which solve a certain problem.
Machine learning algorithms can be broadly categorized into the following types:
Supervised learning algorithms try to model the relationships and dependencies between the target prediction output and the input features, so that we can predict the output values for new data based on the relationships they learned from previous datasets. In short, we train the models against a set of training data. The better the training data, the better the model.
Unsupervised learning algorithms are used for pattern recognition and descriptive modeling. There are no output categories or labels here on which the algorithm can model relationships. Instead, these algorithms apply techniques to the input data to mine for rules, detect patterns, and summarise and group the data points, which helps in deriving meaningful insights and describing the data better to the users.
A reinforcement learning algorithm (called the agent) continuously learns from its environment in an iterative fashion. In the process, the agent learns from its experiences of the environment until it explores the full range of possible states.
One of the simplest examples of machine learning is linear regression. It falls under supervised learning and is used in predictive analysis. It tries to predict the value of a dependent variable from an independent one when the relationship between them is linear.
Statistically, it can be written as:
y = mx + c
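It is the equation of a line in a plane, where x is the independent variable, y is the dependent variable, m is the slope of the line and c is its intercept on the y-axis.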
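To make the roles of m (the slope) and c (the intercept) concrete, here is a minimal sketch that fits a line to a handful of points using the ordinary least-squares formulas, in plain Python. The data points and the helper name fit_line are made up for illustration and are not part of the bitcoin dataset.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x
    # Intercept: the fitted line always passes through (mean_x, mean_y).
    c = mean_y - m * mean_x
    return m, c


# Hypothetical points that lie exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
m, c = fit_line(xs, ys)
print(m, c)  # 2.0 1.0
```

With real data the points will not sit exactly on a line, and the same formulas give the line that minimises the squared vertical distances to the points.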
To understand the concept of linear regression we will try to predict the value of bitcoin prices based on the bitcoin prices in 2017.
We will use sklearn and pandas to perform the linear regression and matplotlib to plot the results.
Let’s load the bitcoin dataset using pandas-
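import pandas as pd
from dateutil.parser import parse  # assumption: parse is dateutil's date parser

input_dataframe = pd.read_csv('bitcoin_dataset.csv')
# Convert each column to a NumPy array before reshaping; recent pandas versions have no Series.reshape
Y = input_dataframe["btc_market_price"].apply(lambda x: int(x)).to_numpy().reshape(-1, 1)
X = input_dataframe["Date"].apply(lambda x: parse(x).toordinal()).to_numpy().reshape(-1, 1)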
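We reshape each column from a pandas Series into a two-dimensional ndarray with one column because scikit-learn expects the input features (X) as a two-dimensional array with one row per sample. The output variable (y) may stay one-dimensional, but here it is reshaped the same way for consistency.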
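As a small illustration of the two conversions involved, the sketch below turns a few made-up date strings into ordinals and reshapes the result into a single column. It uses datetime.date.fromisoformat from the standard library instead of a general date parser, and so assumes the dates are written as YYYY-MM-DD.

```python
import datetime

import numpy as np

# Made-up sample dates, assumed to be in ISO (YYYY-MM-DD) format.
dates = ["2017-01-01", "2017-01-02", "2017-01-03"]

# toordinal() maps a date to an integer day count that a regression can treat as a number.
ordinals = np.array([datetime.date.fromisoformat(d).toordinal() for d in dates])

# reshape(-1, 1) turns the flat array into one column; -1 lets NumPy infer the row count.
column = ordinals.reshape(-1, 1)

print(ordinals.shape)  # (3,)
print(column.shape)    # (3, 1)

# fromordinal() reverses the conversion, which is handy for axis labels later on.
print(datetime.date.fromordinal(int(column[0, 0])))  # 2017-01-01
```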
We then split our data into training and test sets: one part of the data is used to train the model and the other part to test it. Let’s split the data in a 40:60 ratio of test to training data.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.4, shuffle=True)
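To see what the split does, here is a minimal sketch on ten made-up samples rather than the real dataset. The names X_demo and Y_demo are hypothetical, and random_state is an addition here so that the shuffle gives the same split on every run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples standing in for the real dataset.
X_demo = np.arange(10).reshape(-1, 1)
Y_demo = 2 * X_demo + 1

# test_size=0.4 keeps 40% of the rows for testing and the remaining 60% for training.
x_tr, x_te, y_tr, y_te = train_test_split(
    X_demo, Y_demo, test_size=0.4, shuffle=True, random_state=0
)

print(len(x_tr), len(x_te))  # 6 4

# Each input row stays paired with its own output row after shuffling.
print(np.array_equal(y_tr, 2 * x_tr + 1))  # True
```

One thing to keep in mind: with shuffle=True on dated data, the test rows are dates scattered among the training dates, so the model is checked on how well the line fits the period rather than on how well it forecasts future prices.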
Next we will use the Linear Regression model from sklearn:
from sklearn import linear_model

linear_regression = linear_model.LinearRegression()
linear_regression.fit(x_train, y_train)  # train on the training split
btc_predict = linear_regression.predict(x_test)  # predict prices for the test dates
Each supervised learning model in sklearn has two methods, fit and predict. ‘fit’ trains the model against sample data and ‘predict’ predicts the output for a set of inputs. Right now we are predicting the values for the test data.
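The fit and predict pair can be tried on a tiny made-up example where the answer is known in advance. This is only a sketch: x_demo and y_demo are hypothetical data on the line y = 3x + 2, and the fitted slope and intercept are read back from the model's coef_ and intercept_ attributes.

```python
import numpy as np
from sklearn import linear_model

# Made-up training data lying exactly on the line y = 3x + 2.
x_demo = np.array([[0], [1], [2], [3]])
y_demo = 3 * x_demo + 2

model = linear_model.LinearRegression()
model.fit(x_demo, y_demo)  # learn the slope and intercept from the samples

# After fitting, the slope is stored in coef_ and the intercept in intercept_.
m = model.coef_[0][0]
c = model.intercept_[0]

# Predict the output for an input the model has not seen.
prediction = model.predict(np.array([[10]]))
print(round(m, 6), round(c, 6), round(prediction[0][0], 6))
```

Because the output was passed as a two-dimensional column, coef_, intercept_ and the prediction come back as arrays, which is why the values are indexed before printing.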
We would now plot these using matplotlib-
import datetime
import matplotlib.pyplot as plt

plt.scatter(x_test, y_test, color='black')  # actual prices
plt.plot(x_test, btc_predict, color='blue', linewidth=3)  # fitted line
plt.suptitle('Bitcoin prediction using Linear Regression', fontsize=14, fontweight='bold')
plt.xlabel("Date")
plt.ylabel("Price")
x_ticks, x_labels = plt.xticks()
# Convert the ordinal tick values back to readable dates (a list, because map() is lazy in Python 3)
x_labels = [datetime.date.fromordinal(int(x)).strftime('%Y-%m-%d') for x in x_ticks]
plt.xticks(x_ticks, x_labels, rotation=40)
plt.subplots_adjust(bottom=0.22)  # so that we can see the label properly
plt.show()
This creates a graph of the actual values and our predicted values from the linear regression model.
Here’s the graph-
I hope this blog was helpful. Thank you for reading it. If you have any queries, please feel free to drop a comment.