[Kaggle] sklearn Linear Regression function

아나엘 2023. 9. 30. 16:08

LinearRegression(*, fit_intercept=True, copy_X=True, n_jobs=None, positive=False)

model.score() -> 회귀모델의 성능지표인 R^2(R-squared) score, 결정계수입니다. 상관계수를 제곱한 값으로 보면 되며 변수간 영향을 주는 정도 또는 인과관계를 정량화해서 나타낸 것입니다.

사이킷런의 회귀 모델 클래스들은 RegressorMixin 클래스를 상속합니다. 결정 계수 Coefficient of determination는 -1~1 사이의 값을 가지며 공식은 다음과 같습니다.

여기에서 y바는 타깃 데이터의 평균입니다. 이 공식을 직접 쓰면 다음과 같습니다

y_pred = lr.predict(X_test)
r2 = 1 - ((y_test - y_pred)**2).sum() / ((y_test - y_test.mean())**2).sum()
print(r2)

LinearRegression 클래스가 구한 모델 파라미터는 가중치와 절편이 coef_와 intercept_ 인스턴스 변수에 따로 저장되어 있습니다. 하나의 특성만 사용했기 때문에 구해진 가중치 배열의 원소가 하나입니다.

print(lr.coef_, lr.intercept_)

[0.41788087] 0.44967564199686194

Ordinary least squares Linear Regression.

LinearRegression fits a linear model with coefficients w = (w1, ..., wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.

Parameters

fit_intercept: bool, default=True Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered). -> 모형에 상수항이 있는가. 상수항 없으면 0으로.
copy_X: bool, default=True If True, X will be copied; else, it may be overwritten.
n_jobs: int, default=None The number of jobs to use for the computation. This will only provide speedup in case of sufficiently large problems, that is if firstly n_targets > 1 and secondly X is sparse or if positive is set to True. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary <n_jobs> for more details. -> 빠르게 만들 때
positive: bool, default=False When set to True, forces the coefficients to be positive. This option is only supported for dense arrays.
Added in 0.24

Attributes

coef_: array of shape (n_features, ) or (n_targets, n_features) Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.
rank_: int Rank of matrix X. Only available when X is dense.
singular_: array of shape (min(X, y),) Singular values of X. Only available when X is dense.
intercept_: float or array of shape (n_targets,) Independent term in the linear model. Set to 0.0 if fit_intercept = False.
n_features_in_: int Number of features seen during fit.
Added in 0.24
feature_names_in_: ndarray of shape (n_features_in_,) Names of features seen during fit. Defined only when X has feature names that are all strings.
Added in 1.0

Notes

From the implementation point of view, this is just plain Ordinary Least Squares (scipy.linalg.lstsq) or Non Negative Least Squares (scipy.optimize.nnls) wrapped as a predictor object.

Examples

import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# y = 1 * x_0 + 2 * x_1 + 3
y = np.dot(X, np.array([1, 2])) + 3
reg = LinearRegression().fit(X, y)
reg.score(X, y)
1.0
reg.coef_
array([1., 2.])
reg.intercept_
3.0...
reg.predict(np.array([[3, 5]]))
array([16.])

추후 목록링크정리

from sklearn.linear_model import LinearRegression

저작자표시

티스토리