Introduction
In this article I would like to present performance difference between object-oriented iterators (runtime polymorphism) and generic iterators (compile-time polymorphism).
All standard library iterators (stl) are implemented according to generic programming principles. Because in generic programming everything must be known during compilation, this technique has some limitations. It increases necessity of code recompilation and prevents from providing independent library.
For example you cannot have library (in terms of shared library like "so" or "dll") which has exported standard algorithm like accumulate, you could do it only for known in advance iterator types (value types and containers).
Fortunately, we can probably "do better" with object-oriented and type erasure approach. Thanks to runtime polymorphism it's possible to implement such iterator, which depends only on it's value type but not on underplaying container type.
Thomas Becker wrote great article about such object-oriented iterator with type erasure (http://www.artima.com/cppsource/type_erasure.html). This article is based on his implementation called "any_iterator".
Object-oriented iterator is good step forward, but still depends on it's value type, so it makes a difference if function takes any_iterator<int> or any_iterator<double>. Because of that, if we want to expose algorithm like accumulate in shared library, we would need to provide declarations for all possible value types.Performance issues
Lack of flexibility is not the only disadvantage of object-oriented iterators. Another problem is performance cost of runtime polymorphism. How significant it might be ? Two times slower, maybe five times slower?
Well, if we talk about performance, intuition will be probably wrong, so let's measure it!
I've prepared a c++11 program which is testing performance of accumulate algorithm on vector<int> and list<int> containers, using standard generic iterators and runtime polymorphic any_iterator.
Surprisingly performance cost of object oriented approach is extraordinary, for list it is 40 and for vector it is 70 times slower!
Such extraordinary performance degradation shows what is the real cost of runtime polymorphism. It's not only one virtual method call, which after all shouldn't be so costly. The main cost is in preventing compiler from performing any optimisation (inlining, vectorisation, etc...).Test results
Compilation flags
clang++ -c -pipe -O2 -std=c++11 -Wall -W -fPIE
Results
Filling list, elapsed: 7439ms
Filling vector, elapsed: 1936ms
Accumulate list:144118291114003667, elapsed: 1150ms
Accumulate vector:144118291114003667, elapsed: 176ms
Accumulate list any_iterator:144118291114003667, elapsed: 44447ms
Accumulate vector any_iterator:144118291114003667, elapsed: 12226ms
Accumulate vector backwards:144118291114003667, elapsed: 217ms
Accumulate vector backwards any_iterator:144118291114003667, e..: 12773ms
Source code
#include <iostream> #include <list> #include <vector> #include <algorithm> #include <chrono> #include <sstream> #include <any_iterator/any_iterator.hpp> using namespace std; typedef IteratorTypeErasure::any_iterator< int const, // Value boost::bidirectional_traversal_tag, int const // Reference > any_number_iterator; const size_t MAX_NUMBERS = 1<<27; int main() { list<int> l; vector<int> v; vector<pair<string, chrono::nanoseconds>> results; // fill list { auto start = chrono::high_resolution_clock::now(); srand(0);generate_n(back_inserter(l), MAX_NUMBERS, rand); auto end = chrono::high_resolution_clock::now(); results.push_back(make_pair("Filling list", end-start)); } // fill vector { auto start = chrono::high_resolution_clock::now(); srand(0);generate_n(back_inserter(v), MAX_NUMBERS, rand); auto end = chrono::high_resolution_clock::now(); results.push_back(make_pair("Filling vector", end-start)); } // accumulate list { auto start = chrono::high_resolution_clock::now(); auto result = accumulate (l.begin(), l.end(), 0L); auto end = chrono::high_resolution_clock::now(); stringstream ss;ss<<"Accumulate list:"<<result; results.push_back(make_pair(ss.str(), end-start)); } // accumulate vector { auto start = chrono::high_resolution_clock::now(); auto result = accumulate (v.begin(), v.end(), 0L); auto end = chrono::high_resolution_clock::now(); stringstream ss;ss<<"Accumulate vector:"<<result; results.push_back(make_pair(ss.str(), end-start)); } // accumulate list any_iterator { auto start = chrono::high_resolution_clock::now(); auto result = accumulate (any_number_iterator(l.begin()), any_number_iterator(l.end()), 0L); auto end = chrono::high_resolution_clock::now(); stringstream ss;ss<<"Accumulate list any_iterator:"<<result; results.push_back(make_pair(ss.str(), end-start)); } // accumulate vector any_iterator { auto start = chrono::high_resolution_clock::now(); auto result = accumulate (any_number_iterator(v.begin()), any_number_iterator(v.end()), 0L); auto end = chrono::high_resolution_clock::now(); stringstream ss;ss<<"Accumulate vector any_iterator:"<<result; results.push_back(make_pair(ss.str(), end-start)); } // accumulate vector backwards { auto start = chrono::high_resolution_clock::now(); auto result = accumulate (v.rbegin(), v.rend(), 0L); auto end = chrono::high_resolution_clock::now(); stringstream ss;ss<<"Accumulate vector backwards:"<<result; results.push_back(make_pair(ss.str(), end-start)); } // accumulate vector backwards any_iterator { auto start = chrono::high_resolution_clock::now(); auto result = accumulate (any_number_iterator(v.rbegin()), any_number_iterator(v.rend()), 0L); auto end = chrono::high_resolution_clock::now(); stringstream ss;ss<<"Accumulate vector backwards any_iterator:"<<result; results.push_back(make_pair(ss.str(), end-start)); } for (const auto &i:results) cout << i.first<<", elapsed: "<<chrono::duration_cast<std::chrono::milliseconds>(i.second).count()<<"ms"<<endl; return 0; }
http://www.artima.com/cppsource/type_erasure.html
Brak komentarzy:
Prześlij komentarz