Regression

regression.linear

LinearRegression

Defines a Linear Regression model class in Python with methods for setting parameters, fitting the model to data, predicting values, evaluating the model, and retrieving the coefficients.

Linear regression is a supervised learning algorithm used to predict the value of a continuous variable (y) from one or more predictor variables (x). The goal of linear regression is to find the best-fitting straight line through the data points, called the regression line. In the single-variable case, the regression line is defined by the equation y = mx + b, where m is the slope of the line and b is the y-intercept.

More generally, the linear regression model can be represented as y = X * w + b, where X is the input data matrix, y is the target variable, w is the vector of model coefficients, and b is the bias (intercept) term.
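As a minimal NumPy sketch of this equation (the variable names here are illustrative, not the class's internals):

```python
import numpy as np

# Toy data: 4 samples, 2 features
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [0.0, 4.0]])
w = np.array([0.5, -1.0])  # model coefficients
b = 2.0                    # bias term

# y = X * w + b, computed for all samples at once
y = X @ w + b
```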

evaluate(X, y)

The evaluate function calculates the mean squared error between the predicted values and the actual values.

Parameters
X : np.ndarray
    The parameter `X` represents the input data or features. It is a matrix or array-like object
    with shape (n_samples, n_features), where n_samples is the number of samples or observations
    and n_features is the number of features or variables.
y : np.ndarray
    The parameter `y` represents the true values of the target variable. It is an array-like
    object containing the actual values that we are trying to predict for the corresponding
    samples in the input data `X`.
Returns
mse : float
    The mean squared error (mse) between the predicted and true values.
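The computation can be sketched in plain NumPy (an illustration, not the class's actual code):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # actual values
y_pred = np.array([2.5,  0.0, 2.0, 8.0])  # predicted values

# Mean squared error: the average of the squared residuals
mse = np.mean((y_pred - y_true) ** 2)
```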

fit(X, y, X_val=None, y_val=None)

The fit function trains a regression model using mini-batch gradient descent with early stopping based on validation loss.

Parameters
X : np.ndarray
    The input features for training the model, as a numpy array of shape
    `(num_samples, num_features)`, where `num_samples` is the number of training
    samples and `num_features` is the number of features for each sample.

y : np.ndarray
    The target (dependent) variable: a numpy array containing the true values
    for the corresponding samples in the input data `X`. The shape of `y` is
    `(num_samples,)`, where `num_samples` is the number of training samples.

X_val : np.ndarray, optional
    The validation set features. An optional parameter that allows you to
    evaluate the model's performance on a separate validation set during
    training. If provided, it should be a numpy array of shape
    (num_samples, num_features), where num_samples is the number of samples in
    the validation set and num_features is the number of features for each sample.

y_val : np.ndarray, optional
    The validation set labels: an array containing the true values of the
    target variable for the validation set.
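A simplified sketch of mini-batch gradient descent with early stopping, using the documented hyperparameter names (learning rate, number of epochs, batch size, tolerance, patience). This illustrates the technique, not the class's actual implementation:

```python
import numpy as np

def fit_sketch(X, y, X_val=None, y_val=None,
               learning_rate=0.01, num_epochs=200,
               batch_size=16, tolerance=1e-4, patience=5):
    """Mini-batch gradient descent for y ~ X @ w + b, with early
    stopping on validation MSE. A sketch, not the library's code."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    best_val, wait = np.inf, 0
    for epoch in range(num_epochs):
        idx = rng.permutation(n)  # shuffle samples each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            err = X[batch] @ w + b - y[batch]
            # Gradient of the batch MSE with respect to w and b
            w -= learning_rate * 2 * X[batch].T @ err / len(batch)
            b -= learning_rate * 2 * err.mean()
        if X_val is not None:
            val_mse = np.mean((X_val @ w + b - y_val) ** 2)
            if best_val - val_mse > tolerance:  # still improving
                best_val, wait = val_mse, 0
            else:                               # improvement stalled
                wait += 1
                if wait >= patience:
                    break
    return w, b

# Example: recover y = 3x + 1 from noiseless data
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = 3 * X[:, 0] + 1
w, b = fit_sketch(X, y, learning_rate=0.1, num_epochs=300)
```

With a validation set supplied, training stops once validation MSE fails to improve by more than `tolerance` for `patience` consecutive epochs.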

get_coeff()

The function get_coeff returns the coefficients of a trained model as a flattened numpy array.

Returns
coefficients : np.ndarray
    The coefficients of the model as a flattened numpy array.

get_params()

The function get_params returns a dictionary containing the values of various parameters.

Returns
params : dict
    A dictionary containing the values of the learning rate, number of epochs, batch size,
    regularization strength, and tolerance.

predict(X)

The predict function takes an input array X, adds a bias term to it, performs matrix multiplication with the model coefficients, and returns the predictions as a flattened numpy array.

Parameters
X : np.ndarray
    The input array of shape (num_samples, num_features), where num_samples is
    the number of samples and num_features is the number of features for each sample.
Returns
predictions : np.ndarray
    A numpy array of predictions.
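The bias-term trick can be sketched as follows; the layout (a prepended column of ones, with the intercept as the first coefficient) is an assumption for illustration and may differ from the class's internal storage:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
# Assumed layout: intercept first, then one weight per feature
coefficients = np.array([[0.5],
                         [2.0],
                         [-1.0]])

# Prepend a column of ones so the intercept joins the matrix product
X_bias = np.hstack([np.ones((X.shape[0], 1)), X])
predictions = (X_bias @ coefficients).flatten()
```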

score(X, y)

The function calculates the R-squared score between the predicted and true values of a regression model.

Parameters
X : np.ndarray
    The input data for which to compute the score, as a numpy array of shape
    (num_samples, num_features), where num_samples is the number of samples
    and num_features is the number of features. For example, when predicting
    the price of a house from its size and number of bedrooms, X is a matrix
    with one row per house and one column per feature.
y : np.ndarray
    The true labels or target values: an array of shape (num_samples,)
    containing the actual values we are trying to predict for the
    corresponding samples in `X`. In the house-price example, y is a vector
    containing the actual price of each house.
Returns
r2 : float
    The R-squared score, which is a measure of how well the predicted values (y_pred) match
    the true values (y_true) in the given dataset.
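The R-squared score is 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. A small NumPy sketch (not the class's code):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot
```

A score of 1.0 means the predictions match the true values exactly; a model that always predicts the mean of `y` scores 0.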

set_params(params)

The function set_params sets the parameters for a linear regression model.

Parameters
params : dict
    A dictionary containing the values of the learning rate, number of epochs, batch size,
    regularization strength, tolerance, and patience.

regression.lasso

LassoRegression

Bases: LinearRegression

Defines a Lasso Regression model using Mini-batch Gradient Descent with early stopping based on validation loss.

Lasso Regression is a linear regression model with L1 regularization, used to prevent overfitting and to perform feature selection. The L1 regularization term is the sum of the absolute values of the coefficients. It can shrink some coefficients exactly to zero, effectively removing the corresponding features from the model. This is useful when the dataset has a large number of features and only a few of them are important for the model.
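A sketch of the L1-penalized objective and its subgradient (illustrative names and an unpenalized-intercept-omitted simplification; the class's actual update rule may differ):

```python
import numpy as np

def lasso_loss(w, X, y, alpha):
    """MSE plus L1 penalty. The intercept is typically left
    unpenalized; it is omitted here for brevity."""
    residual = X @ w - y
    return np.mean(residual ** 2) + alpha * np.sum(np.abs(w))

def lasso_subgradient(w, X, y, alpha):
    """Subgradient of the L1-penalized MSE; |w| is not differentiable
    at 0, and np.sign conveniently returns 0 there."""
    residual = X @ w - y
    return 2 * X.T @ residual / len(y) + alpha * np.sign(w)

# Tiny numeric check on a 2-sample, 2-feature problem
w = np.array([0.0, 0.0])
X = np.eye(2)
y = np.array([1.0, 2.0])
loss = lasso_loss(w, X, y, alpha=0.1)
grad = lasso_subgradient(w, X, y, alpha=0.1)
```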


regression.ridge

RidgeRegression

Bases: LinearRegression

Defines a Ridge Regression model using Mini-batch Gradient Descent with early stopping based on validation loss.

Ridge Regression is a linear regression model with L2 regularization, used to prevent overfitting. The L2 regularization term is the sum of the squares of the coefficients. Unlike the L1 penalty, it shrinks the coefficients toward zero but rarely makes any of them exactly zero, so it does not perform feature selection; instead it stabilizes the model, which is especially useful when features are highly correlated or the dataset has many features relative to the number of samples.
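For comparison with the L1 case, a sketch of the L2-penalized objective and its gradient (illustrative, not the class's actual code):

```python
import numpy as np

def ridge_loss(w, X, y, alpha):
    """MSE plus L2 penalty on the coefficients."""
    residual = X @ w - y
    return np.mean(residual ** 2) + alpha * np.sum(w ** 2)

def ridge_gradient(w, X, y, alpha):
    """Unlike the L1 penalty, the L2 term is differentiable
    everywhere, so an ordinary gradient exists."""
    residual = X @ w - y
    return 2 * X.T @ residual / len(y) + 2 * alpha * w

# Tiny numeric check on a 2-sample, 2-feature problem
w = np.array([1.0, 1.0])
X = np.eye(2)
y = np.array([1.0, 2.0])
loss = ridge_loss(w, X, y, alpha=0.1)
grad = ridge_gradient(w, X, y, alpha=0.1)
```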


regression.polynomial

PolynomialRegression

Bases: LinearRegression

Defines a PolynomialRegression class that extends the LinearRegression class and adds functionality for polynomial regression.

Polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth degree polynomial. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x), and has been used to describe nonlinear phenomena such as the growth rate of tissues, the distribution of carbon isotopes in lake sediments, and the progression of disease epidemics.
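The feature expansion that turns polynomial regression into a linear regression problem can be sketched as follows (`polynomial_features` is a hypothetical helper for illustration, not the class's API):

```python
import numpy as np

def polynomial_features(x, degree):
    """Expand a 1-D input into columns [x, x^2, ..., x^degree].
    A linear model fit on these columns is an nth-degree polynomial
    in the original variable x."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

x = np.array([1.0, 2.0, 3.0])
X_poly = polynomial_features(x, degree=3)  # shape (3, 3)
```

The model stays linear in the coefficients; only the features are nonlinear in x, which is why the LinearRegression machinery applies unchanged.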