środa, 18 lutego 2015

What is performance cost of object oriented iterator in c++


  1. Introduction

    In this article I would like to present performance difference between object-oriented iterators (runtime polymorphism) and generic iterators (compile-time polymorphism).
    All standard library iterators (stl) are implemented according to generic programming principles. Because in generic programming everything must be known during compilation, this technique has some limitations. It increases necessity of code recompilation and prevents from providing independent library.
    For example you cannot have library (in terms of shared library like "so" or "dll") which has exported standard algorithm like accumulate, you could do it only for known in advance iterator types (value types and containers).

    Fortunately, we can probably "do better" with object-oriented and type erasure approach. Thanks to runtime polymorphism it's possible to implement such iterator, which depends only on it's value type but not on underplaying container type.
    Thomas Becker wrote great article about such object-oriented iterator with type erasure (http://www.artima.com/cppsource/type_erasure.html). This article is based on his implementation called "any_iterator".
    Object-oriented iterator is good step forward, but still depends on it's value type, so it makes a difference if function takes any_iterator<int> or any_iterator<double>. Because of that, if we want to expose algorithm like accumulate in shared library, we would need to provide declarations for all possible value types.
  2. Performance issues

    Lack of flexibility is not the only disadvantage of object-oriented iterators. Another problem is performance cost of runtime polymorphism. How significant it might be ? Two times slower, maybe five times slower?

    Well, if we talk about performance, intuition will be probably wrong,  so let's measure it!
    I've prepared a c++11 program which is testing performance of accumulate algorithm on vector<int> and list<int> containers, using standard generic iterators and runtime polymorphic any_iterator.
    Surprisingly performance cost of object oriented approach is extraordinary, for list it is 40 and for vector it is 70 times slower!

    Such extraordinary performance degradation shows what is the real cost of runtime polymorphism. It's not only one virtual method call, which after all shouldn't be so costly. The main cost is in preventing compiler from performing any optimisation (inlining, vectorisation, etc...).
  3. Test results

  • Compilation flags

    clang++ -c -pipe -O2 -std=c++11 -Wall -W -fPIE
  • Results

    Filling list, elapsed: 7439ms
    Filling vector, elapsed: 1936ms
    Accumulate list:144118291114003667, elapsed: 1150ms
    Accumulate vector:144118291114003667, elapsed: 176ms
    Accumulate list any_iterator:144118291114003667, elapsed: 44447ms
    Accumulate vector any_iterator:144118291114003667, elapsed: 12226ms
    Accumulate vector backwards:144118291114003667, elapsed: 217ms
    Accumulate vector backwards any_iterator:144118291114003667, e..: 12773ms
  • Source code

    #include <iostream>
    #include <list>
    #include <vector>
    #include <algorithm>
    #include <chrono>
    #include <sstream>
    
    #include <any_iterator/any_iterator.hpp>
    
    using namespace std;
    
    typedef IteratorTypeErasure::any_iterator<
      int const, // Value
      boost::bidirectional_traversal_tag,
      int const // Reference
    >
    any_number_iterator;
    
    const size_t MAX_NUMBERS = 1<<27;
    
    int main()
    {
        list<int> l;
        vector<int> v;
        vector<pair<string, chrono::nanoseconds>> results;
    
        // fill list
        {
            auto start = chrono::high_resolution_clock::now();
            srand(0);generate_n(back_inserter(l), MAX_NUMBERS, rand);
            auto end = chrono::high_resolution_clock::now();
            results.push_back(make_pair("Filling list", end-start));
        }
    
        // fill vector
        {
            auto start = chrono::high_resolution_clock::now();
            srand(0);generate_n(back_inserter(v), MAX_NUMBERS, rand);
            auto end = chrono::high_resolution_clock::now();
            results.push_back(make_pair("Filling vector", end-start));
        }
    
        // accumulate list
        {
            auto start = chrono::high_resolution_clock::now();
            auto result = accumulate (l.begin(), l.end(), 0L);
            auto end = chrono::high_resolution_clock::now();
            stringstream ss;ss<<"Accumulate list:"<<result;
            results.push_back(make_pair(ss.str(), end-start));
        }
    
        // accumulate vector
        {
            auto start = chrono::high_resolution_clock::now();
            auto result = accumulate (v.begin(), v.end(), 0L);
            auto end = chrono::high_resolution_clock::now();
            stringstream ss;ss<<"Accumulate vector:"<<result;
            results.push_back(make_pair(ss.str(), end-start));
        }
    
        // accumulate list any_iterator
        {
            auto start = chrono::high_resolution_clock::now();
            auto result = accumulate (any_number_iterator(l.begin()), any_number_iterator(l.end()), 0L);
            auto end = chrono::high_resolution_clock::now();
            stringstream ss;ss<<"Accumulate list any_iterator:"<<result;
            results.push_back(make_pair(ss.str(), end-start));
        }
    
        // accumulate vector any_iterator
        {
            auto start = chrono::high_resolution_clock::now();
            auto result = accumulate (any_number_iterator(v.begin()), any_number_iterator(v.end()), 0L);
            auto end = chrono::high_resolution_clock::now();
            stringstream ss;ss<<"Accumulate vector any_iterator:"<<result;
            results.push_back(make_pair(ss.str(), end-start));
        }
    
        // accumulate vector backwards
        {
            auto start = chrono::high_resolution_clock::now();
            auto result = accumulate (v.rbegin(), v.rend(), 0L);
            auto end = chrono::high_resolution_clock::now();
            stringstream ss;ss<<"Accumulate vector backwards:"<<result;
            results.push_back(make_pair(ss.str(), end-start));
        }
    
        // accumulate vector backwards any_iterator
        {
            auto start = chrono::high_resolution_clock::now();
            auto result = accumulate (any_number_iterator(v.rbegin()), any_number_iterator(v.rend()), 0L);
            auto end = chrono::high_resolution_clock::now();
            stringstream ss;ss<<"Accumulate vector backwards any_iterator:"<<result;
            results.push_back(make_pair(ss.str(), end-start));
        }
    
        for (const auto &i:results)
            cout << i.first<<", elapsed: "<<chrono::duration_cast<std::chrono::milliseconds>(i.second).count()<<"ms"<<endl;
    
        return 0;
    }
    
References:

http://www.artima.com/cppsource/type_erasure.html

czwartek, 1 stycznia 2015

Technical papers for programmers

  1. Out of the Tar Pit (by Ben Moseley and Peter Marks)
    • Accidental vs. Essential complexity
    •  "We have argued that complexity causes more problems in large software
      systems than anything else. We have also argued that it can be tamed
      — but only through a concerted effort to avoid it where possible, and to
      separate it where not. Specifically we have argued that a system can usefully
      be separated into three main parts: the essential state, the essential logic,
      and the accidental state and control." 
  2. Dynamo: Amazon’s Highly Available Key-value Store(by Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels)
  3. Time, Clocks, and the Ordering of Events in a Distributed System (by Leslie Lamport (1978))
  4. Anchoring and Adjustment in Software Estimation. (By Jorg Aranda,  Steve Easterbrook).
    • Although the patterns on each condition are visible on the chart,
      the following numbers help to clarify it. The “2 months”
      participants had a mean estimate of 6.8 months. 
    • The control condition has a slightly higher mean estimate, at 8.3 months
    • and the “20 months” condition’s mean estimate is 17.4 months.
       
  5.  State the Problem Before Describing the Solution (By Leslie Lamport)
    .
    • a brief informal statement of the problem
    • the precise correctness conditions required of solution (this is the most important point that allows verify that we solve the right problem)
    • the solution
    • a proof that the solution satisfies the requisite condition