Polynomial regression

Introduction

If Y depends on X in a non-linear way, modeling the correlation between them with Simple Linear Regression isn't the best idea: it will not represent the data well. To see what I mean, take a look at the image below.

The line does not capture the correlation between the x and y values…

In this case, we can use Polynomial Regression, which is an extension of Linear Regression.

To recall, the equation of Simple Linear Regression was:

\(y = b_0 + b_1*x_1\)

Polynomial Regression adds successive powers of the variable \(x_1\):

\(y = b_0 + b_1*x_1 + b_2*x_1^2 + b_3*x_1^3 + \dots + b_n*x_1^n\)

So as you can see, the equation above is not linear in \(x_1\). However, when I talk about Linear Regression I mean that the model is linear in the coefficients \(b_0, b_1, \dots, b_n\); the same variable simply appears at different powers. That is why polynomial regression is still a type of Linear Regression.
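This "linear in the coefficients" point can be checked directly: if we build the matrix of powers of x ourselves, ordinary least squares recovers the coefficients of a polynomial exactly. A minimal sketch (the data here is made up for illustration):

```python
import numpy as np

# Synthetic noise-free data generated by y = 2 + 3x - x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 + 3 * x - x**2

# Design matrix with columns [1, x, x^2]:
# the model is linear in the coefficients b0, b1, b2
X_design = np.vander(x, N=3, increasing=True)

# Ordinary least squares recovers the true coefficients exactly
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(np.round(coeffs, 6))  # → [ 2.  3. -1.]
```

Even though the curve is non-linear in x, the fitting problem itself is an ordinary linear least-squares problem.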

Dataset and business problem

The data can be found here. It’s a really simple set.

The set contains a position in the company, a level (which could be treated as years of experience) and a salary.

The business problem is to predict the salary for an intermediate level (for example, something between 6 and 7), and we saw that this correlation is not linear.

Code

The whole code is very simple, so here is the listing; below it, I will try to explain it.

# Polynomial Regression

import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('polynomial_regression.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values


# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_poly, y)


# Visualising the Polynomial Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, model.predict(X_poly), color = 'blue')
plt.title('Polynomial Regression Example')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

What is new here is the creation of additional columns from X (stored in X_poly):

from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)

After this operation we have a new array with powers of the X values:

The first column is the constant term (a column of ones, corresponding to \(b_0\)), the next column is \(x_1\), and the following columns are \(x_1^2\), \(x_1^3\) and \(x_1^4\).
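The column layout is easy to verify on a toy input (the values below are made up; only the shape of the expansion matters):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# A single feature with three sample values
X = np.array([[1.0], [2.0], [3.0]])

# degree=4 expands each value x into [1, x, x^2, x^3, x^4]
poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(X)
print(X_poly[1])  # row for x = 2: [ 1.  2.  4.  8. 16.]
```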

The next thing done here is fitting a Simple Linear Regression model on these polynomial features to predict values.

After training the model on 4th-degree polynomial features, we can see that the curve representing our model fits the data much better than a straight line.
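With the trained model, answering the business question is one extra step: transform the query level the same way as the training data before predicting. A sketch, using stand-in salary values since the CSV isn't reproduced here:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Stand-in for the salary dataset: levels 1-10 with sharply growing salaries
X = np.arange(1, 11).reshape(-1, 1).astype(float)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float) * 1000

# Same pipeline as in the listing above
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
model = LinearRegression()
model.fit(X_poly, y)

# Predict the salary for an intermediate level such as 6.5;
# the query must go through the same polynomial transform
level = np.array([[6.5]])
salary = model.predict(poly_reg.transform(level))
print(salary)
```

The key detail is `poly_reg.transform(level)`: passing the raw level straight to `model.predict` would fail, because the model was trained on five columns, not one.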

Summary

Polynomial regression is a trick that lets Linear Regression fit non-linear data.

To download code and the dataset you can check my GitHub page:

https://github.com/kamilpavlick/Learning-Machine-Learning/tree/master/Part%202%20-%20Regression
