środa, 18 lutego 2015

What is performance cost of object oriented iterator in c++


  1. Introduction

    In this article I would like to present performance difference between object-oriented iterators (runtime polymorphism) and generic iterators (compile-time polymorphism).
    All standard library iterators (stl) are implemented according to generic programming principles. Because in generic programming everything must be known during compilation, this technique has some limitations. It increases necessity of code recompilation and prevents from providing independent library.
    For example you cannot have library (in terms of shared library like "so" or "dll") which has exported standard algorithm like accumulate, you could do it only for known in advance iterator types (value types and containers).

    Fortunately, we can probably "do better" with object-oriented and type erasure approach. Thanks to runtime polymorphism it's possible to implement such iterator, which depends only on it's value type but not on underplaying container type.
    Thomas Becker wrote great article about such object-oriented iterator with type erasure (http://www.artima.com/cppsource/type_erasure.html). This article is based on his implementation called "any_iterator".
    Object-oriented iterator is good step forward, but still depends on it's value type, so it makes a difference if function takes any_iterator<int> or any_iterator<double>. Because of that, if we want to expose algorithm like accumulate in shared library, we would need to provide declarations for all possible value types.
  2. Performance issues

    Lack of flexibility is not the only disadvantage of object-oriented iterators. Another problem is performance cost of runtime polymorphism. How significant it might be ? Two times slower, maybe five times slower?

    Well, if we talk about performance, intuition will be probably wrong,  so let's measure it!
    I've prepared a c++11 program which is testing performance of accumulate algorithm on vector<int> and list<int> containers, using standard generic iterators and runtime polymorphic any_iterator.
    Surprisingly performance cost of object oriented approach is extraordinary, for list it is 40 and for vector it is 70 times slower!

    Such extraordinary performance degradation shows what is the real cost of runtime polymorphism. It's not only one virtual method call, which after all shouldn't be so costly. The main cost is in preventing compiler from performing any optimisation (inlining, vectorisation, etc...).
  3. Test results

  • Compilation flags

    clang++ -c -pipe -O2 -std=c++11 -Wall -W -fPIE
  • Results

    Filling list, elapsed: 7439ms
    Filling vector, elapsed: 1936ms
    Accumulate list:144118291114003667, elapsed: 1150ms
    Accumulate vector:144118291114003667, elapsed: 176ms
    Accumulate list any_iterator:144118291114003667, elapsed: 44447ms
    Accumulate vector any_iterator:144118291114003667, elapsed: 12226ms
    Accumulate vector backwards:144118291114003667, elapsed: 217ms
    Accumulate vector backwards any_iterator:144118291114003667, e..: 12773ms
  • Source code

    #include <iostream>
    #include <list>
    #include <vector>
    #include <algorithm>
    #include <chrono>
    #include <sstream>
    
    #include <any_iterator/any_iterator.hpp>
    
    using namespace std;
    
    typedef IteratorTypeErasure::any_iterator<
      int const, // Value
      boost::bidirectional_traversal_tag,
      int const // Reference
    >
    any_number_iterator;
    
    const size_t MAX_NUMBERS = 1<<27;
    
    int main()
    {
        list<int> l;
        vector<int> v;
        vector<pair<string, chrono::nanoseconds>> results;
    
        // fill list
        {
            auto start = chrono::high_resolution_clock::now();
            srand(0);generate_n(back_inserter(l), MAX_NUMBERS, rand);
            auto end = chrono::high_resolution_clock::now();
            results.push_back(make_pair("Filling list", end-start));
        }
    
        // fill vector
        {
            auto start = chrono::high_resolution_clock::now();
            srand(0);generate_n(back_inserter(v), MAX_NUMBERS, rand);
            auto end = chrono::high_resolution_clock::now();
            results.push_back(make_pair("Filling vector", end-start));
        }
    
        // accumulate list
        {
            auto start = chrono::high_resolution_clock::now();
            auto result = accumulate (l.begin(), l.end(), 0L);
            auto end = chrono::high_resolution_clock::now();
            stringstream ss;ss<<"Accumulate list:"<<result;
            results.push_back(make_pair(ss.str(), end-start));
        }
    
        // accumulate vector
        {
            auto start = chrono::high_resolution_clock::now();
            auto result = accumulate (v.begin(), v.end(), 0L);
            auto end = chrono::high_resolution_clock::now();
            stringstream ss;ss<<"Accumulate vector:"<<result;
            results.push_back(make_pair(ss.str(), end-start));
        }
    
        // accumulate list any_iterator
        {
            auto start = chrono::high_resolution_clock::now();
            auto result = accumulate (any_number_iterator(l.begin()), any_number_iterator(l.end()), 0L);
            auto end = chrono::high_resolution_clock::now();
            stringstream ss;ss<<"Accumulate list any_iterator:"<<result;
            results.push_back(make_pair(ss.str(), end-start));
        }
    
        // accumulate vector any_iterator
        {
            auto start = chrono::high_resolution_clock::now();
            auto result = accumulate (any_number_iterator(v.begin()), any_number_iterator(v.end()), 0L);
            auto end = chrono::high_resolution_clock::now();
            stringstream ss;ss<<"Accumulate vector any_iterator:"<<result;
            results.push_back(make_pair(ss.str(), end-start));
        }
    
        // accumulate vector backwards
        {
            auto start = chrono::high_resolution_clock::now();
            auto result = accumulate (v.rbegin(), v.rend(), 0L);
            auto end = chrono::high_resolution_clock::now();
            stringstream ss;ss<<"Accumulate vector backwards:"<<result;
            results.push_back(make_pair(ss.str(), end-start));
        }
    
        // accumulate vector backwards any_iterator
        {
            auto start = chrono::high_resolution_clock::now();
            auto result = accumulate (any_number_iterator(v.rbegin()), any_number_iterator(v.rend()), 0L);
            auto end = chrono::high_resolution_clock::now();
            stringstream ss;ss<<"Accumulate vector backwards any_iterator:"<<result;
            results.push_back(make_pair(ss.str(), end-start));
        }
    
        for (const auto &i:results)
            cout << i.first<<", elapsed: "<<chrono::duration_cast<std::chrono::milliseconds>(i.second).count()<<"ms"<<endl;
    
        return 0;
    }
    
References:

http://www.artima.com/cppsource/type_erasure.html