Scikit-Learn GridSearch custom scoring function


I need to perform kernel PCA on a dataset of dimension (5000, 26421) to get a lower-dimensional representation. To choose the number of components (say k), I reduce the data, reconstruct it in the original space, and compute the mean squared error between the reconstructed and original data for different values of k.

I came across sklearn's GridSearchCV functionality and want to use it for the above parameter estimation. Since there is no score function for kernel PCA, I have implemented a custom scoring function and am passing it to GridSearchCV.

    from sklearn.decomposition import KernelPCA
    from sklearn.model_selection import GridSearchCV
    import numpy as np
    import math

    def scorer(clf, x):
        y1 = clf.inverse_transform(x)
        error = math.sqrt(np.mean((x - y1)**2))
        return error

    param_grid = [
        {'degree': [1, 10], 'kernel': ['poly'], 'n_components': [100, 400, 100]},
        {'gamma': [0.001, 0.0001], 'kernel': ['rbf'], 'n_components': [100, 400, 100]},
    ]

    kpca = KernelPCA(fit_inverse_transform=True, n_jobs=30)
    clf = GridSearchCV(estimator=kpca, param_grid=param_grid, scoring=scorer)
    clf.fit(x)

However, this results in the error below:

    /usr/lib64/python2.7/site-packages/sklearn/metrics/pairwise.py in check_pairwise_arrays(X=array([[ 2.,  2.,  1., ...,  0.,  0.,  0.],
        ....,  0.,  1., ...,  0.,  0.,  0.]], dtype=float32), Y=array([[-0.05904257, -0.02796719,  0.00919842, ....
           0.00148251, -0.00311711]], dtype=float32), precomputed=False, dtype=<type 'numpy.float32'>)
        117                              "for %d indexed." %
        118                              (X.shape[0], X.shape[1], Y.shape[0]))
        119     elif X.shape[1] != Y.shape[1]:
        120         raise ValueError("Incompatible dimension for X and Y matrices: "
        121                          "X.shape[1] == %d while Y.shape[1] == %d" % (
    --> 122                              X.shape[1], Y.shape[1]))
            X.shape = (1667, 26421)
            Y.shape = (112, 100)
        123
        124     return X, Y
        125
        126

    ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 26421 while Y.shape[1] == 100

Can someone point out what I am doing wrong?

The scoring function is not declared the way make_scorer expects. A metric wrapped with make_scorer takes only the true and predicted values. This is how you declare your custom scoring function:

    import math
    import numpy as np

    def my_scorer(y_true, y_predicted):
        # Root mean squared error between true and predicted values.
        error = math.sqrt(np.mean((y_true - y_predicted)**2))
        return error

Then you can use the make_scorer function in sklearn to pass it to GridSearchCV. Be sure to set the greater_is_better attribute accordingly:

"Whether score_func is a score function (default), meaning high is good, or a loss function, meaning low is good. In the latter case, the scorer object will sign-flip the outcome of the score_func."

I am assuming you are calculating an error, so this attribute should be set to False, since the lower the error, the better:

    from sklearn.metrics import make_scorer
    my_func = make_scorer(my_scorer, greater_is_better=False)
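To illustrate the sign flip, here is a minimal sketch. The LinearRegression model, the synthetic data, and the names rmse and neg_rmse are assumptions made only for illustration and are not part of the question. A scorer built with make_scorer calls the estimator's predict method, so the sketch uses an estimator that has one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer


def rmse(y_true, y_pred):
    # Root mean squared error: a loss, so lower is better.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# greater_is_better=False makes the scorer return the negated loss.
neg_rmse = make_scorer(rmse, greater_is_better=False)

# Hypothetical toy regression data, for illustration only.
rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(50)

model = LinearRegression().fit(X, y)

raw = rmse(y, model.predict(X))   # plain loss value, non-negative
flipped = neg_rmse(model, X, y)   # scorer value, equal to -raw

print(raw, flipped)
```

Because the value is negated, a search that maximizes the score ends up choosing the parameters with the smallest error.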

Then pass it to GridSearchCV:

    GridSearchCV(estimator=my_clf, param_grid=param_grid, scoring=my_func)

where my_clf is your classifier.

One more thing: I don't think GridSearchCV is exactly what you are looking for. It accepts data in the form of train and test splits, but here you only want to transform your input data. You need to use Pipeline in sklearn. Look at the example of combining PCA and GridSearchCV.
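As a hedged sketch of scoring the reconstruction error inside GridSearchCV: the traceback in the question shows inverse_transform receiving data with 26421 columns where 100 were expected, because inverse_transform takes data in the reduced space. KernelPCA exposes transform and inverse_transform rather than predict, so one option is a scorer callable with the signature (estimator, X, y=None), which GridSearchCV also accepts. It transforms the held-out fold first and then maps it back. The name neg_reconstruction_rmse, the small random dataset, and the grid values below are hypothetical and chosen only to keep the example fast.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import GridSearchCV


def neg_reconstruction_rmse(estimator, X, y=None):
    # Project the held-out data to the reduced space, then map it back.
    X_reduced = estimator.transform(X)
    X_back = estimator.inverse_transform(X_reduced)
    # Negate the error because GridSearchCV maximizes the score.
    return -float(np.sqrt(np.mean((X - X_back) ** 2)))


# Hypothetical small dataset standing in for the (5000, 26421) matrix.
rng = np.random.RandomState(0)
X = rng.rand(60, 8)

# Illustrative grid; the values are assumptions, not recommendations.
param_grid = [
    {"kernel": ["rbf"], "gamma": [0.1, 1.0], "n_components": [2, 4]},
    {"kernel": ["poly"], "degree": [2, 3], "n_components": [2, 4]},
]

kpca = KernelPCA(fit_inverse_transform=True)
search = GridSearchCV(kpca, param_grid, scoring=neg_reconstruction_rmse, cv=3)
search.fit(X)  # unsupervised: no y is passed

print(search.best_params_, search.best_score_)
```

Each candidate is fitted on the training folds and scored on the held-out fold, so the reported error reflects reconstruction of data the model did not see during fitting.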

