Y depends on X in a non-linear way, so modeling the correlation between them with Simple Linear Regression isn't the best idea: it will not represent our data well. To see what I mean, take a look at the image below.
The straight line does not capture the correlation between the x and y values…
In this case, we can use Polynomial Regression, which is another type of Linear Regression.
To recall, the equation of Simple Linear Regression was: \(y = b_0 + b_1 x_1\)
Polynomial Regression extends this with a sum of successive powers of the same variable: \(y = b_0 + b_1 x_1 + b_2 x_1^2 + \dots + b_n x_1^n\)
As you can see, the equation above is not linear in \(x_1\). However, when I say Linear Regression I mean that the model is linear in the coefficients \(b_0, b_1, b_2, \dots, b_n\); we simply have the same variable raised to different powers. That is why Polynomial Regression is still a type of Linear Regression.
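To make the "linear in the coefficients" point concrete, here is a minimal sketch (using NumPy and made-up data, not the article's dataset) that builds the power columns by hand and solves for the coefficients with ordinary least squares:

```python
import numpy as np

# Made-up data generated from a known quadratic: y = 2 + 3*x + 1*x^2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 2 + 3 * x + 2

# Design matrix [1, x, x^2]: each column is the same variable at a
# different power, so the model stays linear in the coefficients b.
A = np.column_stack([np.ones_like(x), x, x ** 2])

# Ordinary least squares recovers b0=2, b1=3, b2=1
b, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(b, 6))
```

Even though the fitted curve is a parabola, the solver only ever deals with a linear system in \(b_0, b_1, b_2\).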
Dataset and business problem
The data can be found here. It’s a really simple set.
Each row contains a position in the company, a level (which could be treated as years of experience) and a salary.
The business problem is to predict the salary for an intermediate level (for example, something between 6 and 7), and as we saw, this correlation is not linear.
The whole code is very simple, so here is the listing; below it, I will explain it.
# Polynomial Regression
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('polynomial_regression.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_poly, y)

# Visualising the Polynomial Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, model.predict(X_poly), color = 'blue')
plt.title('Polynomial Regression Example')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
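As a sketch of how a trained model like the one above would answer the business question, here is a self-contained version that predicts a salary for level 6.5. Note the data below is a made-up placeholder curve, not the real dataset:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Placeholder data (assumption): levels 1..10 with a cubic salary curve
X = np.arange(1, 11).reshape(-1, 1)
y = 1000.0 * X.ravel() ** 3

poly_reg = PolynomialFeatures(degree=4)
model = LinearRegression().fit(poly_reg.fit_transform(X), y)

# Use transform (not fit_transform) so the new input gets the same
# power columns that the model was trained on
level = np.array([[6.5]])
prediction = model.predict(poly_reg.transform(level))
print(prediction[0])
```

The key detail is calling `transform` on new inputs: the transformer was already fitted on the training data, and the prediction input must pass through the same column expansion.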
What is new here is adding new polynomial columns to X:

from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
After this operation we have a new array whose columns are powers of the variable up to the 4th degree. The first column is the constant term corresponding to b0 in our equation, the next column is x1, and the following columns are x1^2, x1^3 and x1^4.
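A tiny illustration of what `fit_transform` produces for a single-column X (the values here are arbitrary):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0], [3.0]])
X_poly = PolynomialFeatures(degree=4).fit_transform(X)

# Columns are x^0, x^1, x^2, x^3, x^4:
# [[1, 2, 4, 8, 16], [1, 3, 9, 27, 81]]
print(X_poly)
```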
The next step is to use a Simple Linear Regressor on these polynomial features to predict values.
After training the model using polynomial features of the 4th degree, we can see that the curve representing our model fits the data much better than a straight line.
Polynomial regression is, in essence, a trick that lets us use Linear Regression to fit non-linear data.
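One way to see how much the trick helps is to compare the R² score of a straight-line fit against a polynomial fit on the same data. The data below is a synthetic noisy quadratic, used only for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption): a noisy quadratic trend
rng = np.random.default_rng(0)
X = np.linspace(1, 10, 30).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 2, 30)

# Straight-line fit on the raw variable
linear = LinearRegression().fit(X, y)

# Polynomial fit on the expanded power columns
poly = PolynomialFeatures(degree=4)
poly_model = LinearRegression().fit(poly.fit_transform(X), y)

print(round(linear.score(X, y), 3))                           # straight-line R²
print(round(poly_model.score(poly.fit_transform(X), y), 3))   # polynomial R²
```

On curved data like this, the polynomial model's R² is consistently higher than the straight line's.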
To download code and the dataset you can check my GitHub page: