# Polynomial regression

### Introduction

If Y depends on X in a non-linear way, modeling the relationship between them with Simple Linear Regression isn't the best idea – it won't represent the data as well as possible. To see what I mean, take a look at the image below.

The straight line does not capture the correlation between the x and y values…

In this case, we can use Polynomial Regression, which is an extension of Linear Regression.

To recall – The equation of Simple Linear Regression was:

$$y = b_0 + b_1*x_1$$

Polynomial Regression extends this equation with successive powers of the variable x1:

$$y = b_0 + b_1*x_1 + b_2*x_1^2 + b_3*x_1^3 + \dots + b_n*x_1^n$$

So as you can see, the equation above is not linear in x. But when I talk about Linear Regression, I mean that the model is linear in the coefficients b0, b1, …, bn – the same variable simply appears at different powers. That is why polynomial regression is still a form of Linear Regression.
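To make this concrete, here is a minimal sketch (with made-up toy data) showing that once x is expanded into its powers, an ordinary linear least-squares fit recovers the coefficients – the model really is linear in b0, b1, b2:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Toy data generated from y = 1 + 2x + 3x^2 (values chosen only for illustration)
x = np.arange(-5, 6).reshape(-1, 1).astype(float)
y = 1 + 2 * x.ravel() + 3 * x.ravel() ** 2

# Expand x into the columns [1, x, x^2]; the model stays linear in b0, b1, b2
X_poly = PolynomialFeatures(degree=2).fit_transform(x)

# fit_intercept=False because the bias column of ones is already in X_poly
model = LinearRegression(fit_intercept=False).fit(X_poly, y)

print(model.coef_)  # close to [1, 2, 3]
```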

The data can be found here. It’s a really simple set.

We have a position in the company here, level (which could be treated as working years) and salary.

The business problem is to predict the salary for a level that falls between two positions (for example, somewhere between 6 and 7), and we saw that this correlation is not linear.

### Code

The whole code is very simple, so here is the listing; below it, I will try to explain it.

```python
# Polynomial Regression

import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset (the CSV from the linked repository)
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_poly, y)

# Visualising the Polynomial Regression results
plt.scatter(X, y, color='red')
plt.plot(X, model.predict(X_poly), color='blue')
plt.title('Polynomial Regression Example')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
```


What is new here is the step that adds new columns to X (stored in X_poly):

```python
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
```

After this operation, we have a new array whose columns are the powers of X:

The first column is a column of ones (it corresponds to the intercept b0), the next column is x1, and the remaining columns are x1^2, x1^3 and x1^4.
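As a quick standalone check (with toy input values of my own), this is what `fit_transform` produces for a single feature:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# A single feature with three samples
X = np.array([[1], [2], [3]])

# degree=3 produces the bias column (x^0) plus x, x^2 and x^3
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)

print(X_poly)
# The row for x=2 is [1, 2, 4, 8]
```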

The next step is fitting an ordinary Simple Linear Regression model on these expanded features.

After training the model on 4th-degree polynomial features, we can see that the curve representing our model fits the data much better than a straight line.
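To answer the original business question – a salary for a level such as 6.5 – the fitted model can be evaluated at new points. Here is a sketch using a stand-in dataset shaped like the level/salary data described above (the numbers are illustrative, not necessarily the real values):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Stand-in for the level/salary data: 10 levels with steeply rising salaries
levels = np.arange(1, 11).reshape(-1, 1).astype(float)
salaries = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000]) * 1000.0

# Fit the same degree-4 pipeline as in the article
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(levels)
model = LinearRegression()
model.fit(X_poly, salaries)

# Predict a salary for a level between two known positions, e.g. 6.5
pred = model.predict(poly_reg.transform([[6.5]]))
print(pred)
```

Note that the new input has to go through the same `transform` step as the training data before prediction.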

### Summary

Polynomial regression is a trick that lets us use Linear Regression to fit non-linear data.

To download the code and the dataset, check my GitHub page:

https://github.com/kamilpavlick/Learning-Machine-Learning/tree/master/Part%202%20-%20Regression