Regression
regression.linear
LinearRegression
Defines a Linear Regression model class in Python with methods for setting parameters, fitting the model to data, predicting values, evaluating the model, and retrieving the coefficients.
Linear regression is a supervised learning algorithm used to predict the value of a continuous target variable (y) from one or more predictor variables (x). The goal of linear regression is to find the best-fitting straight line through the points, called the regression line. For a single predictor, the regression line is defined by the equation y = mx + b, where m is the slope of the line and b is the y-intercept.
The linear regression model can be represented as y = X * w + b, where X is the input data, y is the target variable, w is the model coefficients, and b is the bias term.
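As a minimal NumPy sketch of this equation (the values of `X`, `w`, and `b` below are hypothetical, chosen only to illustrate the shapes involved):

```python
import numpy as np

# Hypothetical example: 3 samples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -0.25])  # model coefficients, one per feature
b = 1.0                     # bias term

# y = X * w + b
y = X @ w + b               # predictions: [1.0, 1.5, 2.0]
```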
evaluate(X, y)
The evaluate function calculates the mean squared error between the predicted values and the actual values.
Parameters
X : np.ndarray
The parameter `X` represents the input data or features. It is a matrix or array-like object
with shape (n_samples, n_features), where n_samples is the number of samples or observations
and n_features is the number of features or variables.
y : np.ndarray
The parameter `y` represents the true values of the target variable. It is an array-like
object containing the actual values that we are trying to predict for the corresponding
samples in the input data `X`.
Returns
mse : float
The mean squared error (mse) between the predicted and true values.
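The metric can be sketched as follows; `mean_squared_error` here is a standalone illustration of the formula, not the class method itself:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # MSE = mean of the squared residuals
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])
mse = mean_squared_error(y_true, y_pred)  # (0.25 + 0 + 1) / 3 ≈ 0.4167
```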
fit(X, y, X_val=None, y_val=None)
The fit function trains a regression model using mini-batch gradient descent with early stopping based on validation loss.
Parameters
X : np.ndarray
The parameter `X` is a numpy array that represents the input features for training the
model. It has shape `(num_samples, num_features)`, where `num_samples` is the number of
training samples and `num_features` is the number of features for each sample.
y : np.ndarray
The parameter `y` represents the target variable or the dependent variable in the
supervised learning problem. It is a numpy array that contains the true values of the
target variable for the corresponding samples in the input data `X`. The shape of `y` is
`(num_samples,)`, where `num_samples` is the number of training samples.
X_val : np.ndarray
X_val is the validation set features. It is an optional parameter that allows you to
evaluate the model's performance on a separate validation set during training. It should
be a numpy array of shape (num_samples, num_features), where num_samples is the number of
samples in the validation set and num_features is the number of features per sample.
y_val : np.ndarray
`y_val` is the validation set labels. It is an optional array containing the true values of the
target variable for the validation set.
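The training loop described above can be sketched as follows. This is a simplified illustration, not the library's implementation; the hyperparameter names `lr`, `epochs`, `batch_size`, and `patience` are assumptions:

```python
import numpy as np

def fit_sketch(X, y, X_val=None, y_val=None,
               lr=0.01, epochs=100, batch_size=32, patience=5):
    """Mini-batch gradient descent with optional early stopping."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])   # prepend a bias column
    w = np.zeros(d + 1)
    best_loss, best_w, wait = np.inf, w.copy(), 0
    for _ in range(epochs):
        idx = rng.permutation(n)           # shuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            # gradient of the batch MSE with respect to w
            grad = 2 * Xb[batch].T @ (Xb[batch] @ w - y[batch]) / len(batch)
            w -= lr * grad
        if X_val is not None:
            Xvb = np.hstack([np.ones((len(X_val), 1)), X_val])
            val_loss = np.mean((Xvb @ w - y_val) ** 2)
            if val_loss < best_loss:
                best_loss, best_w, wait = val_loss, w.copy(), 0
            else:
                wait += 1
                if wait >= patience:       # early stopping on validation loss
                    return best_w
    return w if X_val is None else best_w

# Hypothetical usage on noiseless data y = 2x + 1
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = 2 * X.ravel() + 1
w = fit_sketch(X, y, lr=0.5, epochs=200, batch_size=10)
```

Here `w` converges toward `[1, 2]`, i.e. the bias and slope of the underlying line.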
get_coeff()
The function get_coeff returns the coefficients of a trained model as a flattened numpy array.
Returns
coefficients : np.ndarray
The coefficients of the model as a flattened numpy array.
get_params()
The function get_params returns a dictionary containing the values of various parameters.
Returns
params : dict
A dictionary containing the values of the learning rate, number of epochs, batch size,
regularization strength, and tolerance.
predict(X)
The predict function takes an input array X, adds a bias term to it, performs matrix multiplication with the model coefficients, and returns the predictions as a flattened numpy array.
Parameters
X : np.ndarray
The parameter X is an input array of shape (num_samples, num_features), where num_samples
is the number of samples and num_features is the number of features for each sample.
Returns
predictions : np.ndarray
A numpy array of predictions.
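A minimal sketch of this behavior, assuming the bias coefficient is stored first in the coefficient vector (an assumption about the internal layout):

```python
import numpy as np

def predict_sketch(X, coeff):
    """Prepend a bias column to X and multiply by the coefficients."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (Xb @ coeff).ravel()

coeff = np.array([1.0, 2.0])        # hypothetical [bias, slope]
X = np.array([[0.0], [1.0], [2.0]])
preds = predict_sketch(X, coeff)    # predictions: [1.0, 3.0, 5.0]
```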
score(X, y)
The function calculates the R-squared score between the predicted and true values of a regression model.
Parameters
X : np.ndarray
The parameter X is an ndarray (numpy array) containing the input data for which we
want to calculate the score. It has shape (num_samples, num_features), where num_samples
is the number of samples or observations and num_features is the number of features or
variables. For example, if we are trying to predict the price of a house based on its
size and number of bedrooms, then X holds one row per house and one column per feature.
y : np.ndarray
The parameter `y` represents the true labels or target values of the dataset. It is an
array-like object containing the actual values that we are trying to predict for the
corresponding samples in the input data `X`. For example, if we are trying to predict the
price of a house based on its size and number of bedrooms, then y will be a vector of
shape (num_samples,), where num_samples is the number of samples or observations in the
dataset.
Returns
r2 : float
The R-squared score, which is a measure of how well the predicted values (y_pred) match
the true values (y_true) in the given dataset.
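The R-squared computation can be sketched as follows; `r2_score_sketch` is a standalone illustration of the formula, not the class method itself:

```python
import numpy as np

def r2_score_sketch(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2) # total sum of squares
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
r2 = r2_score_sketch(y_true, y_pred)  # 1 - 0.10/5.0 = 0.98
```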
set_params(params)
The function set_params sets the parameters for a linear regression model.
Parameters
params : dict
A dictionary containing the values of the learning rate, number of epochs, batch size,
regularization strength, tolerance, and patience.
regression.lasso
LassoRegression
Bases: LinearRegression
Defines a Lasso Regression model using Mini-batch Gradient Descent with early stopping based on validation loss.
Lasso Regression is a linear regression model with L1 regularization. It is used to prevent overfitting and perform feature selection. The L1 regularization term is the sum of the absolute values of the coefficients. It can shrink some of the coefficients of the model exactly to zero, thereby reducing the number of features used in the model. This is useful when the dataset has a large number of features and only a few of them are important for the model.
regression.ridge
RidgeRegression
Bases: LinearRegression
Defines a Ridge Regression model using Mini-batch Gradient Descent with early stopping based on validation loss.
Ridge Regression is a linear regression model with L2 regularization. It is used to prevent overfitting. The L2 regularization term is the sum of the squares of the coefficients. It shrinks the coefficients of the model toward zero, but unlike L1 regularization it does not drive them exactly to zero, so it reduces model variance rather than performing feature selection. This is useful when the dataset has many features that each contribute a small amount to the prediction.
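For comparison with the L1 case, the gradient of the L2 penalty can be sketched as follows (`alpha` is the regularization strength; excluding the bias term from regularization is again an assumption):

```python
import numpy as np

def l2_penalty_gradient(w, alpha):
    """Gradient of the L2 penalty alpha * sum(w**2)."""
    g = 2 * alpha * w  # proportional to the coefficient itself
    g[0] = 0.0         # assumed convention: bias is not regularized
    return g

w = np.array([1.0, -2.0, 4.0])
g = l2_penalty_gradient(w, alpha=0.1)  # [0, -0.4, 0.8]
```

Because this term is proportional to the coefficient, the shrinkage force fades as a coefficient approaches zero, which is why ridge shrinks coefficients toward zero without eliminating them.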
regression.polynomial
PolynomialRegression
Bases: LinearRegression
Defines a PolynomialRegression class that extends the LinearRegression class and adds functionality for polynomial regression.
Polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth degree polynomial. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x), and has been used to describe nonlinear phenomena such as the growth rate of tissues, the distribution of carbon isotopes in lake sediments, and the progression of disease epidemics.
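The polynomial feature expansion that makes this possible can be sketched as follows (a standalone illustration for a single input variable; the actual class may construct its features differently):

```python
import numpy as np

def polynomial_features(x, degree):
    """Expand a 1-D input into columns [x, x^2, ..., x^degree]."""
    x = np.asarray(x).ravel()
    return np.column_stack([x ** d for d in range(1, degree + 1)])

x = np.array([1.0, 2.0, 3.0])
features = polynomial_features(x, degree=3)
# rows: [1, 1, 1], [2, 4, 8], [3, 9, 27]
```

Fitting a linear model on these expanded features yields an nth-degree polynomial in x, which is why polynomial regression is still a form of linear regression: the model remains linear in its coefficients.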