Scikit-Learn GridSearch custom scoring function
I need to perform kernel PCA on a dataset of dimension (5000, 26421) to get a lower-dimensional representation. To choose the number of components (say k), I am reducing the data, reconstructing it in the original space, and computing the mean square error between the reconstructed and the original data for different values of k.

I came across sklearn's GridSearch functionality and want to use it for the above parameter estimation. Since there is no score function for kernel PCA, I have implemented a custom scoring function and am passing it to GridSearch.
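The reduce / reconstruct / measure-error procedure described above can be sketched as a plain loop over k. This is a minimal sketch under stated assumptions: a small random matrix stands in for the real (5000, 26421) data, and the RBF kernel with `gamma=0.01`, the candidate k values, and the helper name `reconstruction_rmse` are all hypothetical choices. One detail worth noting: `inverse_transform` expects data in the reduced space (the output of `transform`, with `n_components` columns), not the original feature matrix.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Hypothetical stand-in for the real (5000, 26421) data matrix
rng = np.random.RandomState(0)
X = rng.rand(200, 50)


def reconstruction_rmse(kpca, X):
    """Hypothetical helper: RMSE between X and its KernelPCA reconstruction."""
    Z = kpca.transform(X)              # reduced space: (n_samples, n_components)
    X_rec = kpca.inverse_transform(Z)  # back to original space: (n_samples, n_features)
    return np.sqrt(np.mean((X - X_rec) ** 2))


results = {}
for k in (5, 10, 20):  # hypothetical candidate values of k
    kpca = KernelPCA(n_components=k, kernel="rbf", gamma=0.01,
                     fit_inverse_transform=True)
    kpca.fit(X)
    results[k] = reconstruction_rmse(kpca, X)

for k, err in results.items():
    print(k, err)
```

Here the error is measured on the same data used for fitting; evaluating on held-out rows would give a less optimistic estimate.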
```python
from sklearn.decomposition.kernel_pca import KernelPCA
from sklearn.model_selection import GridSearchCV
import numpy as np
import math

def scorer(clf, X):
    y1 = clf.inverse_transform(X)
    error = math.sqrt(np.mean((X - y1)**2))
    return error

param_grid = [
    {'degree': [1, 10], 'kernel': ['poly'], 'n_components': [100, 400, 100]},
    {'gamma': [0.001, 0.0001], 'kernel': ['rbf'], 'n_components': [100, 400, 100]},
]

kpca = KernelPCA(fit_inverse_transform=True, n_jobs=30)
clf = GridSearchCV(estimator=kpca, param_grid=param_grid, scoring=scorer)
clf.fit(X)
```
However, this results in the error below:
```
/usr/lib64/python2.7/site-packages/sklearn/metrics/pairwise.py in check_pairwise_arrays(X=array([[ 2., 2., 1., ..., 0., 0., 0.], ...., 0., 1., ..., 0., 0., 0.]], dtype=float32), Y=array([[-0.05904257, -0.02796719, 0.00919842, .... 0.00148251, -0.00311711]], dtype=float32), precomputed=False, dtype=<type 'numpy.float32'>)
    117                              "for %d indexed." %
    118                              (X.shape[0], X.shape[1], Y.shape[0]))
    119     elif X.shape[1] != Y.shape[1]:
    120         raise ValueError("Incompatible dimension for X and Y matrices: "
    121                          "X.shape[1] == %d while Y.shape[1] == %d" % (
--> 122             X.shape[1], Y.shape[1]))
        X.shape = (1667, 26421)
        Y.shape = (112, 100)
    123
    124     return X, Y
    125

ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 26421 while Y.shape[1] == 100
```
Can anyone point out what I am doing wrong?
The syntax of your scoring function is incorrect. You need to pass only the true and predicted values for the classifier. This is how you declare your custom scoring function:
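```python
import math
import numpy as np

def my_scorer(y_true, y_predicted):
    # Root-mean-square error between true and predicted values
    error = math.sqrt(np.mean((y_true - y_predicted)**2))
    return error
```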
Then you can use the `make_scorer` function in sklearn to pass it to GridSearch. Be sure to set the `greater_is_better` parameter accordingly:
> Whether score_func is a score function (default), meaning high is good, or a loss function, meaning low is good. In the latter case, the scorer object will sign-flip the outcome of the score_func.
I am assuming you are calculating an error, so this parameter should be set to `False`, since the lower the error, the better:
```python
from sklearn.metrics import make_scorer

my_func = make_scorer(my_scorer, greater_is_better=False)
```
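To see the sign flip quoted above in action, here is a minimal sketch. It assumes a supervised setting: the toy regression data and the `LinearRegression` model are hypothetical stand-ins, since a scorer built with `make_scorer` calls the estimator's `predict` and needs a target `y`. The scorer object is called as `scorer(estimator, X, y)` and returns the negated RMSE.

```python
import math
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer


def my_scorer(y_true, y_predicted):
    # Root-mean-square error: lower is better
    return math.sqrt(np.mean((y_true - y_predicted) ** 2))


# greater_is_better=False makes the scorer object negate my_scorer's result
my_func = make_scorer(my_scorer, greater_is_better=False)

# Hypothetical toy regression data and estimator
rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(50)
model = LinearRegression().fit(X, y)

raw = my_scorer(y, model.predict(X))  # plain RMSE, positive
flipped = my_func(model, X, y)        # same value with the sign flipped
print(raw, flipped)
```

The negation lets GridSearchCV always maximize the score, whether the underlying metric is a score or a loss.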
Then you can pass it to GridSearch:
```python
GridSearchCV(estimator=my_clf, param_grid=param_grid, scoring=my_func)
```
where `my_clf` is your classifier.
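A complete, runnable version of that call might look like the sketch below. The `Ridge` estimator, its `alpha` grid, `cv=5`, and the random data are hypothetical stand-ins for `my_clf` and your own data. Because the scorer negates the RMSE, `best_score_` comes out negative, and values closer to 0 are better.

```python
import math
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV


def my_scorer(y_true, y_predicted):
    # Root-mean-square error between true and predicted values
    return math.sqrt(np.mean((y_true - y_predicted) ** 2))


my_func = make_scorer(my_scorer, greater_is_better=False)

# Hypothetical regression data standing in for the real problem
rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = X @ rng.rand(5) + 0.1 * rng.randn(100)

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}  # hypothetical grid
search = GridSearchCV(estimator=Ridge(), param_grid=param_grid,
                      scoring=my_func, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)  # negated RMSE averaged over the CV folds
```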
One more thing: I don't think `GridSearchCV` is exactly what you are looking for. It accepts data in the form of train and test splits, whereas here you only want to transform your input data. You need to use a `Pipeline` in sklearn. Look at the example mentioned here of combining PCA and `GridSearchCV`.
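A pipeline that chains kernel PCA with a downstream estimator and grid-searches the PCA parameters can be sketched as follows. This assumes there is a supervised target `y` to predict, which the question does not mention; the `Ridge` step, the step names `kpca` and `reg`, the parameter values, and the random data are all hypothetical. Parameters of a pipeline step are addressed as `<step name>__<parameter>`.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical data and target
rng = np.random.RandomState(0)
X = rng.rand(120, 30)
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(120)

pipe = Pipeline([
    ("kpca", KernelPCA(kernel="rbf")),  # dimensionality reduction step
    ("reg", Ridge()),                   # downstream estimator that gets scored
])

# Grid over the KernelPCA step's parameters via the "kpca__" prefix
param_grid = {
    "kpca__n_components": [5, 10, 20],
    "kpca__gamma": [0.01, 0.1],
}

search = GridSearchCV(pipe, param_grid=param_grid,
                      scoring="neg_mean_squared_error", cv=3)
search.fit(X, y)
print(search.best_params_)
```

In this setup the number of components is chosen by how well the downstream estimator predicts `y`, not by reconstruction error.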