Eigen + MKL or OpenBLAS slower than NumPy/SciPy + OpenBLAS


I'm starting with C++ at the moment and want to work with matrices and speed things up in general. I've worked with Python + NumPy + OpenBLAS before. I thought C++ + Eigen + MKL might be faster, or at least not slower.

My C++ code:

```cpp
#define EIGEN_USE_MKL_ALL
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/LU>
#include <chrono>
#include <ctime>    // clock(), CLOCKS_PER_SEC
#include <cstdint>  // uint16_t

using namespace std;
using namespace Eigen;

int main()
{
    int n = Eigen::nbThreads( );
    cout << "#threads: " << n << endl;

    uint16_t size = 4000;
    MatrixXd a = MatrixXd::Random(size,size);

    clock_t start = clock ();
    PartialPivLU<MatrixXd> lu = PartialPivLU<MatrixXd>(a);

    float timeElapsed = double( clock() - start ) / CLOCKS_PER_SEC;

    cout << "elasped time " << timeElapsed << " seconds." << endl ;
}
```

My Python code:

```python
import numpy as np
from time import time
from scipy import linalg as la

size = 4000

a = np.random.random((size, size))

t = time()
lu, piv = la.lu_factor(a)
print(time() - t)
```
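For reference, both snippets benchmark an LU factorization with partial (row) pivoting, P·A = L·U: SciPy's lu_factor wraps LAPACK's getrf, and with EIGEN_USE_MKL_ALL Eigen can forward PartialPivLU to MKL's getrf. The following is a naive, unblocked, single-threaded sketch written only to show what is being computed; the function names (lu_partial_pivot, multiply_lu) and the flat row-major storage are illustrative assumptions, and it is far slower than the blocked, multithreaded library routines being timed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Dense row-major n x n matrix stored in a flat vector (illustrative only).
using Matrix = std::vector<double>;

// In-place LU factorization with partial (row) pivoting, P*A = L*U.
// On return, `a` holds U on and above the diagonal and the multipliers of L
// (unit diagonal implied) below it; perm[i] is the original row now at row i.
std::vector<std::size_t> lu_partial_pivot(Matrix& a, std::size_t n) {
    assert(a.size() == n * n);
    std::vector<std::size_t> perm(n);
    for (std::size_t i = 0; i < n; ++i) perm[i] = i;

    for (std::size_t k = 0; k < n; ++k) {
        // Choose the row with the largest |a(i,k)| at or below the diagonal.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(a[i * n + k]) > std::fabs(a[p * n + k])) p = i;
        if (p != k) {
            // Swap whole rows (including stored L multipliers) and record it.
            for (std::size_t j = 0; j < n; ++j) std::swap(a[k * n + j], a[p * n + j]);
            std::swap(perm[k], perm[p]);
        }
        double pivot = a[k * n + k];
        if (pivot == 0.0) continue;  // singular column: nothing to eliminate
        for (std::size_t i = k + 1; i < n; ++i) {
            double m = a[i * n + k] / pivot;  // multiplier, stored as L(i,k)
            a[i * n + k] = m;
            for (std::size_t j = k + 1; j < n; ++j) a[i * n + j] -= m * a[k * n + j];
        }
    }
    return perm;
}

// Multiply the packed factors back together; the result equals P*A.
Matrix multiply_lu(const Matrix& lu, std::size_t n) {
    Matrix out(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double s = 0.0;
            // L(i,k) is 1 for k == i and lu(i,k) for k < i; U(k,j) is lu(k,j) for k <= j.
            for (std::size_t k = 0; k <= std::min(i, j); ++k) {
                double l = (k == i) ? 1.0 : lu[i * n + k];
                s += l * lu[k * n + j];
            }
            out[i * n + j] = s;
        }
    return out;
}
```

Row i of multiply_lu(a, n) should match row perm[i] of the original matrix, which is a quick way to check the factorization on a small input.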

My timings:

```
C++     2.4 s
Python  1.2 s
```

Why is C++ slower than Python?

I'm compiling the C++ code using:

```
g++ main.cpp -o main -lopenblas -O3 -fopenmp -DMKL_LP64 -I/usr/local/include/mkl/include
```

MKL is definitely working: if I disable it, the running time is around 13 s.

I also tried C++ + OpenBLAS, which gives me around 2.4 s as well.

Any ideas why C++ and Eigen are slower than NumPy/SciPy?

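The timing is wrong. That's a typical symptom of wall-clock time vs. CPU time: on Linux, clock() returns the processor time consumed by the whole process, summed over all of its threads, so a multithreaded MKL or OpenBLAS factorization is reported as taking longer than it actually does. When I use system_clock from the <chrono> header, it “magically” becomes faster.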

```cpp
#define EIGEN_USE_MKL_ALL
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/LU>
#include <chrono>

int main()
{
    int const n = Eigen::nbThreads( );
    std::cout << "#threads: " << n << std::endl;

    int const size = 4000;
    Eigen::MatrixXd a = Eigen::MatrixXd::Random(size,size);

    auto start = std::chrono::system_clock::now();

    Eigen::PartialPivLU<Eigen::MatrixXd> lu(a);

    auto stop = std::chrono::system_clock::now();

    std::cout << "elasped time "
              << std::chrono::duration<double>{stop - start}.count()
              << " seconds." << std::endl;
}
```

I compile with

```
icc -O3 -mkl -std=c++11 -DNDEBUG -I/usr/include/eigen3/ test.cpp
```

and get the output

```
#threads: 1
elasped time 0.295782 seconds.
```

Your Python version reports 0.399146080017 on my machine.
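As a rough sanity check, the standard leading-order operation count for LU factorization of an n × n matrix is (2/3)n³ floating-point operations, so the two wall-clock timings from the same machine correspond to the following throughputs (lower-order terms ignored, figures rounded):

```latex
\begin{aligned}
\tfrac{2}{3}n^3\big|_{n=4000} &= \tfrac{2}{3}\times 6.4\times 10^{10} \approx 4.27\times 10^{10}\ \text{flop}\\
\text{Eigen + MKL: } \frac{4.27\times 10^{10}\ \text{flop}}{0.2958\ \text{s}} &\approx 144\ \text{GFLOP/s}\\
\text{SciPy: } \frac{4.27\times 10^{10}\ \text{flop}}{0.3991\ \text{s}} &\approx 107\ \text{GFLOP/s}
\end{aligned}
```

The 2.4 s clock() figure from the question cannot be plugged in this way, since it is CPU time summed over threads rather than elapsed time.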


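Alternatively, to obtain comparable timings, use time.clock() (CPU time) in Python instead of time.time() (wall-clock time). Note that time.clock() was deprecated in Python 3.3 and removed in Python 3.8; time.process_time() is its CPU-time replacement.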
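The same wall-clock vs. CPU-time effect can be reproduced without Eigen. This is a minimal sketch, assuming Linux/glibc semantics for std::clock() (process CPU time across all threads); busy_work is a hypothetical stand-in for a multithreaded BLAS/LAPACK call, and the wall-clock side uses std::chrono::steady_clock, which is monotonic and therefore generally better suited to measuring intervals than system_clock.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <functional>
#include <thread>
#include <vector>

// Result of timing one call: elapsed wall-clock seconds and process CPU seconds.
struct Timing {
    double wall_seconds;
    double cpu_seconds;
};

// Run `work` once and measure it with two clocks:
//  - std::chrono::steady_clock: elapsed (wall-clock) time, monotonic
//  - std::clock(): processor time of the whole process (all threads on Linux)
Timing time_both(const std::function<void()>& work) {
    auto wall_start = std::chrono::steady_clock::now();
    std::clock_t cpu_start = std::clock();

    work();

    std::clock_t cpu_stop = std::clock();
    auto wall_stop = std::chrono::steady_clock::now();

    return Timing{
        std::chrono::duration<double>(wall_stop - wall_start).count(),
        static_cast<double>(cpu_stop - cpu_start) / CLOCKS_PER_SEC};
}

// Hypothetical multithreaded workload: each of `num_threads` threads spins
// on a dummy loop of `iterations` additions.
void busy_work(unsigned num_threads, long iterations) {
    assert(num_threads > 0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) {
        pool.emplace_back([iterations] {
            volatile double x = 0.0;  // volatile keeps the loop from being optimized away
            for (long i = 0; i < iterations; ++i) x = x + 1e-9;
        });
    }
    for (auto& th : pool) th.join();
}
```

Calling time_both([] { busy_work(4, 200000000L); }) on a machine with at least four idle cores should report a CPU time roughly four times the wall-clock time, the same pattern as the clock()-based measurement in the question. Older toolchains may need -pthread when compiling.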

