Eigen + MKL or OpenBLAS slower than NumPy/SciPy + OpenBLAS
I'm starting with C++ at the moment and want to work with matrices and speed things up in general. I have worked with Python + NumPy + OpenBLAS before. I thought C++ + Eigen + MKL might be faster, or at least not slower.
My C++ code:
    #define EIGEN_USE_MKL_ALL

    #include <iostream>
    #include <Eigen/Dense>
    #include <Eigen/LU>
    #include <chrono>
    #include <ctime>   // clock(), clock_t, CLOCKS_PER_SEC

    using namespace std;
    using namespace Eigen;

    int main()
    {
        int n = Eigen::nbThreads( );
        cout << "#threads: " << n << endl;

        uint16_t size = 4000;
        MatrixXd a = MatrixXd::Random(size,size);

        clock_t start = clock ();
        PartialPivLU<MatrixXd> lu = PartialPivLU<MatrixXd>(a);

        float timeElapsed = double( clock() - start ) / CLOCKS_PER_SEC;
        cout << "elasped time " << timeElapsed << " seconds." << endl ;
    }
My Python code:
    import numpy as np
    from time import time
    from scipy import linalg as la

    size = 4000

    a = np.random.random((size, size))

    t = time()
    lu, piv = la.lu_factor(a)
    print(time()-t)
My timings:

    C++     2.4 s
    Python  1.2 s

Why is C++ slower than Python?
I'm compiling the C++ code using:

    g++ main.cpp -o main -lopenblas -O3 -fopenmp -DMKL_LP64 -I/usr/local/include/mkl/include
MKL is definitely working: if I disable it, the running time is around 13 s.

I also tried C++ + OpenBLAS, which gives me around 2.4 s as well.

Any ideas why C++ and Eigen are slower than NumPy/SciPy?
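The timing is wrong. This is a typical symptom of measuring CPU time instead of wall-clock time: clock() returns the processor time used by the program, and on Linux that time is summed over all of the process's threads, so a factorization running on several MKL/OpenBLAS threads reports more time than actually elapsed. When I use system_clock from the <chrono> header instead, the program “magically” becomes faster: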
    #define EIGEN_USE_MKL_ALL

    #include <iostream>
    #include <Eigen/Dense>
    #include <Eigen/LU>
    #include <chrono>

    int main()
    {
        int const n = Eigen::nbThreads( );
        std::cout << "#threads: " << n << std::endl;

        int const size = 4000;
        Eigen::MatrixXd a = Eigen::MatrixXd::Random(size,size);

        auto start = std::chrono::system_clock::now();

        Eigen::PartialPivLU<Eigen::MatrixXd> lu(a);

        auto stop = std::chrono::system_clock::now();

        std::cout << "elasped time "
                  << std::chrono::duration<double>{stop - start}.count()
                  << " seconds." << std::endl;
    }
I compile with

    icc -O3 -mkl -std=c++11 -DNDEBUG -I/usr/include/eigen3/ test.cpp

and get this output:

    #threads: 1
    elasped time 0.295782 seconds.
Your Python version reports 0.399146080017 on my machine.
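Alternatively, to obtain comparable timings, use time.clock() (CPU time) in Python instead of time.time() (wall-clock time). Note that time.clock() was deprecated in Python 3.3 and removed in Python 3.8; on current versions, time.process_time() is the CPU-time counterpart.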