python - Eigen + MKL or OpenBLAS slower than NumPy/SciPy + OpenBLAS


I'm starting with C++ at the moment and want to work with matrices and speed things up in general. I worked with Python + NumPy + OpenBLAS before, and thought C++ + Eigen + MKL might be faster, or at least not slower.

My C++ code:

    #define EIGEN_USE_MKL_ALL
    #include <iostream>
    #include <Eigen/Dense>
    #include <Eigen/LU>
    #include <chrono>

    using namespace std;
    using namespace Eigen;

    int main()
    {
        int n = Eigen::nbThreads( );
        cout << "#Threads: " << n << endl;

        uint16_t size = 4000;
        MatrixXd a = MatrixXd::Random(size,size);

        clock_t start = clock ();
        PartialPivLU<MatrixXd> lu = PartialPivLU<MatrixXd>(a);

        float timeElapsed = double( clock() - start ) / CLOCKS_PER_SEC;

        cout << "Elapsed time is " << timeElapsed << " seconds." << endl ;
    }

My Python code:

    import numpy as np
    from time import time
    from scipy import linalg as la

    size = 4000

    a = np.random.random((size, size))

    t = time()
    lu, piv = la.lu_factor(a)
    print(time()-t)

My timings:

    C++     2.4s
    Python  1.2s

Why is C++ slower than Python?

I am compiling the C++ code using:

    g++ main.cpp -o main -lopenblas -O3 -fopenmp  -DMKL_LP64 -I/usr/local/include/mkl/include

MKL is definitely working: if I disable it, the running time is around 13s.

I also tried C++ + OpenBLAS, which gives me around 2.4s as well.

Any ideas why C++ and Eigen are slower than NumPy/SciPy?

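The timing is wrong. That's a typical symptom of wall clock time vs. CPU time: on POSIX systems clock() reports the CPU time used by the whole process, summed over all threads, so a multi-threaded factorization appears to take longer than it actually does. When I use the system_clock from the <chrono> header, it “magically” becomes faster.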

    #define EIGEN_USE_MKL_ALL
    #include <iostream>
    #include <Eigen/Dense>
    #include <Eigen/LU>
    #include <chrono>

    int main()
    {
        int const n = Eigen::nbThreads( );
        std::cout << "#Threads: " << n << std::endl;

        int const size = 4000;
        Eigen::MatrixXd a = Eigen::MatrixXd::Random(size,size);

        // Wall-clock time stamps around the factorization only
        auto start = std::chrono::system_clock::now();

        Eigen::PartialPivLU<Eigen::MatrixXd> lu(a);

        auto stop = std::chrono::system_clock::now();

        std::cout << "Elapsed time is "
                  << std::chrono::duration<double>{stop - start}.count()
                  << " seconds." << std::endl;
    }

I compile with

    icc -O3 -mkl -std=c++11 -DNDEBUG -I/usr/include/eigen3/ test.cpp

and get the output

    #Threads: 1
    Elapsed time is 0.295782 seconds.

Your Python version reports 0.399146080017 on my machine.


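Alternatively, to obtain comparable timings, you could use time.clock() (CPU time) in Python instead of time.time() (wall clock time). Note that time.clock() was removed in Python 3.8; time.process_time() is the current function for CPU time.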
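To see the two clocks disagree on the Python side, here is a minimal sketch, assuming Python 3.3 or newer. The workload is a hypothetical stand-in for the multi-threaded LU factorization: the helper names burn and measure, the 4 MiB buffer and the thread count are all made up for illustration. hashlib is used only because it is documented to release the GIL while hashing large buffers, so the threads can actually run in parallel.

```python
import hashlib
import threading
import time


def burn(data, rounds):
    # Hash a large buffer repeatedly; hashlib releases the GIL for large
    # inputs, so several threads can do this in parallel on multiple cores.
    for _ in range(rounds):
        hashlib.sha256(data).digest()


def measure(num_threads=4, rounds=20):
    """Return (wall_seconds, cpu_seconds) for a threaded workload."""
    data = b"x" * (4 * 1024 * 1024)  # 4 MiB buffer, arbitrary size
    threads = [
        threading.Thread(target=burn, args=(data, rounds))
        for _ in range(num_threads)
    ]

    wall_start = time.perf_counter()  # wall-clock time
    cpu_start = time.process_time()   # CPU time of the whole process

    for t in threads:
        t.start()
    for t in threads:
        t.join()

    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure()
    print(f"wall clock: {wall:.3f} s")
    print(f"CPU time:   {cpu:.3f} s")
```

On a multi-core machine the CPU time should come out noticeably larger than the wall clock time, which is the same effect that made the clock()-based C++ measurement look slow. On a single core the two numbers will be close to each other.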

