<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>RasterGrid Blog &#187; OpenMP</title>
	<atom:link href="http://rastergrid.com/blog/tag/openmp/feed/" rel="self" type="application/rss+xml" />
	<link>http://rastergrid.com/blog</link>
	<description>A technical blog from Daniel Rákos (aka aqnuep)</description>
	<lastBuildDate>Fri, 04 Nov 2011 18:10:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Synchronizable objects for C++</title>
		<link>http://rastergrid.com/blog/2010/02/synchronizable-objects-for-c/</link>
		<comments>http://rastergrid.com/blog/2010/02/synchronizable-objects-for-c/#comments</comments>
		<pubDate>Tue, 02 Feb 2010 19:01:56 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Multiprocessing]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Samples]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[lock]]></category>
		<category><![CDATA[macro]]></category>
		<category><![CDATA[multithreading]]></category>
		<category><![CDATA[mutex]]></category>
		<category><![CDATA[OOP]]></category>
		<category><![CDATA[OpenMP]]></category>
		<category><![CDATA[synchronization]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=120</guid>
		<description><![CDATA[Previously I talked about how one can easily take advantage of multiprocessing using OpenMP. Even though the C pragmas introduced by the parallel programming API standard are very straightforward for simple programs, they simply don&#8217;t fit nicely into a complex C++ application that is built from the ground up with OOP in mind. To smoothly]]></description>
			<content:encoded><![CDATA[
<p>Previously I talked about how one can easily take advantage of multiprocessing using OpenMP. Even though the C pragmas introduced by the parallel programming API standard are very straightforward for simple programs, they simply don&#8217;t fit nicely into a complex C++ application that is built from the ground up with OOP in mind. To smoothly introduce OpenMP into such projects, one needs higher-level constructs that hide the actual implementation details. This is the first article of a series that will try to provide reference implementations of such abstractions. We will start with synchronizable primitives that try to reflect the functionality provided by the &#8220;synchronized&#8221; statement of Java.</p>
<p><span id="more-120"></span>This article is heavily inspired by an article written by <a title="A &quot;synchronized&quot; statement for C++ like in Java" href="http://www.codeproject.com/KB/threads/cppsyncstm.aspx" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.codeproject.com/KB/threads/cppsyncstm.aspx?referer=');">Achilleas Margaritis</a> and largely follows his ideas. My article tries to provide a portable reference implementation of a slightly modified version of the trick presented by Margaritis, using OpenMP as the multiprocessing API back-end.</p>
<h2>Motivation</h2>
<p>According to the OO paradigm, classes, and consequently objects, provide an abstract interface to the internal data or services of the modeled entity or entity class. When it comes to parallel programming, we should provide facilities that enable safe concurrent access to shared resources, which in this case are objects. Using plain OpenMP can be satisfactory; however, when used extensively, the OpenMP pragmas and API function calls can greatly hurt the readability and maintainability of the code. Furthermore, some platforms may use other APIs for handling race conditions. Therefore, we need to encapsulate these facilities and provide an abstract tool set instead.</p>
<h2>Implementation</h2>
<p>The very first building block of such a framework can be a mutex class that provides mutually exclusive access to certain resources. With OpenMP, it could look similar to the following:</p>
<pre class="brush: cpp">#include &lt;omp.h&gt;

class Mutex {
public:
    Mutex() { omp_init_lock(&amp;_mutex); }
    ~Mutex() { omp_destroy_lock(&amp;_mutex); }
    void lock() { omp_set_lock(&amp;_mutex); }
    void unlock() { omp_unset_lock(&amp;_mutex); }
private:
    // non-copyable: an omp_lock_t must not be duplicated (declared, never defined)
    Mutex(const Mutex&amp;);
    Mutex&amp; operator=(const Mutex&amp;);
    omp_lock_t _mutex;
};</pre>
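<p>A minimal usage sketch, assuming a C++11 compiler: the mutex guards a shared counter that is incremented from an OpenMP parallel loop. The #ifdef _OPENMP switch with a std::mutex fallback is my own addition (not part of the class above) so that the sketch also builds without OpenMP, and countInParallel is a hypothetical helper name.</p>
<pre class="brush: cpp">#include &lt;cassert&gt;
#ifdef _OPENMP
#include &lt;omp.h&gt;
#else
#include &lt;mutex&gt;
#endif

class Mutex {
public:
#ifdef _OPENMP
    Mutex() { omp_init_lock(&amp;_mutex); }
    ~Mutex() { omp_destroy_lock(&amp;_mutex); }
    void lock() { omp_set_lock(&amp;_mutex); }
    void unlock() { omp_unset_lock(&amp;_mutex); }
#else
    // assumed fallback for builds without OpenMP: delegate to std::mutex
    Mutex() {}
    void lock() { _mutex.lock(); }
    void unlock() { _mutex.unlock(); }
#endif
    Mutex(const Mutex&amp;) = delete;             // a lock must never be duplicated
    Mutex&amp; operator=(const Mutex&amp;) = delete;
private:
#ifdef _OPENMP
    omp_lock_t _mutex;
#else
    std::mutex _mutex;
#endif
};

// hypothetical helper: n increments issued from an OpenMP parallel loop
int countInParallel(int n) {
    Mutex mutex;       // declared outside the loop, so shared by all threads
    int counter = 0;   // shared as well, and only touched while the mutex is held
    #pragma omp parallel for
    for (int i = 0; i &lt; n; ++i) {
        mutex.lock();
        ++counter;     // this read-modify-write can no longer interleave
        mutex.unlock();
    }
    return counter;
}</pre>
<p>Without the lock() and unlock() calls, concurrent increments could be lost, and countInParallel(10000) might return less than 10000.</p>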
<p>The Mutex class alone already seems enough to build our Java-like &#8220;synchronized&#8221; statement; however, we would like to create a framework that makes usage as easy and safe as possible. To get closer to this goal, we apply the RAII (Resource Acquisition Is Initialization) design pattern to create our lock class:</p>
<pre class="brush: cpp">class Lock {
public:
    // acquires the mutex on construction...
    Lock(Mutex&amp; mutex) : _mutex(mutex), _release(false) { _mutex.lock(); }
    // ...and always releases it on destruction, even during stack unwinding
    ~Lock() { _mutex.unlock(); }
    // true until release() is called, so the lock can drive a loop condition
    operator bool() const { return !_release; }
    void release() { _release = true; }
private:
    Mutex&amp; _mutex;
    bool _release;
};</pre>
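<p>To illustrate the exception-safety argument, here is a sketch assuming C++11 and, for builds without OpenMP, a std::mutex fallback of my own. The tryLock() method (built on omp_test_lock() or std::mutex::try_lock()) and the releasedAfterException helper are hypothetical additions, used only to check that the mutex is free again after an exception has left the locked scope.</p>
<pre class="brush: cpp">#include &lt;cassert&gt;
#include &lt;stdexcept&gt;
#ifdef _OPENMP
#include &lt;omp.h&gt;
#else
#include &lt;mutex&gt;
#endif

class Mutex {
public:
#ifdef _OPENMP
    Mutex() { omp_init_lock(&amp;_mutex); }
    ~Mutex() { omp_destroy_lock(&amp;_mutex); }
    void lock() { omp_set_lock(&amp;_mutex); }
    void unlock() { omp_unset_lock(&amp;_mutex); }
    bool tryLock() { return omp_test_lock(&amp;_mutex) != 0; }  // hypothetical addition
#else
    // assumed fallback for builds without OpenMP
    Mutex() {}
    void lock() { _mutex.lock(); }
    void unlock() { _mutex.unlock(); }
    bool tryLock() { return _mutex.try_lock(); }           // hypothetical addition
#endif
    Mutex(const Mutex&amp;) = delete;
    Mutex&amp; operator=(const Mutex&amp;) = delete;
private:
#ifdef _OPENMP
    omp_lock_t _mutex;
#else
    std::mutex _mutex;
#endif
};

class Lock {
public:
    Lock(Mutex&amp; mutex) : _mutex(mutex), _release(false) { _mutex.lock(); }
    ~Lock() { _mutex.unlock(); }
    operator bool() const { return !_release; }
    void release() { _release = true; }
private:
    Mutex&amp; _mutex;
    bool _release;
};

// hypothetical helper: throws while the lock is held, then reports whether
// the mutex can be acquired again
bool releasedAfterException(Mutex&amp; mutex) {
    try {
        Lock lock(mutex);
        throw std::runtime_error("failure inside the locked scope");
    } catch (const std::runtime_error&amp;) {
        // ~Lock() has already run during stack unwinding
    }
    const bool acquired = mutex.tryLock();
    if (acquired)
        mutex.unlock();
    return acquired;
}</pre>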
<p>Our goal is to provide an inheritable interface for objects that need synchronization. However, this step requires careful consideration of the provided interface, as it must conform to the following requirements:</p>
<ul>
<li>The interface shall not expose the interface of the underlying synchronization primitive, in our case the methods of the mutex class.</li>
<li>The interface shall be available only to the synchronizable objects and not to the outside world: we would like not just to hide the implementation details of our abstract entity but also to prevent users from synchronizing our objects, as that should be the responsibility of the object itself.</li>
<li>For convenience, the interface shall expose methods whose names are unlikely to collide with other names.</li>
</ul>
<p>If we follow these conventions, we end up with an interface similar to the following:</p>
<pre class="brush: cpp">class Synchronizable: protected Mutex {
protected:
	void enterSyncBlock() { this-&gt;lock(); }
	void exitSyncBlock() { this-&gt;unlock(); }
};</pre>
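<p>A sketch of how a class could use this interface directly, assuming C++11. The Account class and the depositInParallel helper are hypothetical, and the std::mutex fallback for builds without OpenMP is my own addition. Because Mutex is a protected base, code outside the class cannot call lock() on an Account.</p>
<pre class="brush: cpp">#include &lt;cassert&gt;
#ifdef _OPENMP
#include &lt;omp.h&gt;
#else
#include &lt;mutex&gt;
#endif

class Mutex {
public:
#ifdef _OPENMP
    Mutex() { omp_init_lock(&amp;_mutex); }
    ~Mutex() { omp_destroy_lock(&amp;_mutex); }
    void lock() { omp_set_lock(&amp;_mutex); }
    void unlock() { omp_unset_lock(&amp;_mutex); }
#else
    // assumed fallback for builds without OpenMP
    Mutex() {}
    void lock() { _mutex.lock(); }
    void unlock() { _mutex.unlock(); }
#endif
    Mutex(const Mutex&amp;) = delete;
    Mutex&amp; operator=(const Mutex&amp;) = delete;
private:
#ifdef _OPENMP
    omp_lock_t _mutex;
#else
    std::mutex _mutex;
#endif
};

class Synchronizable: protected Mutex {
protected:
    void enterSyncBlock() { this-&gt;lock(); }
    void exitSyncBlock() { this-&gt;unlock(); }
};

// hypothetical class: every access to _balance is bracketed by the sync calls
class Account: public Synchronizable {
public:
    Account() : _balance(0) {}
    void deposit(int amount) {
        enterSyncBlock();
        _balance += amount;
        exitSyncBlock();   // must be reached on every path out of the block
    }
    int balance() {
        enterSyncBlock();
        const int result = _balance;
        exitSyncBlock();
        return result;
    }
private:
    int _balance;
};

// hypothetical helper: n deposits of 1 issued from a parallel loop
int depositInParallel(int n) {
    Account account;
    #pragma omp parallel for
    for (int i = 0; i &lt; n; ++i)
        account.deposit(1);
    return account.balance();
}</pre>
<p>The comment in deposit() hints at the weakness of calling these methods by hand: an early return or an exception between the two calls would leave the mutex locked.</p>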
<p>Now we are almost at the finish line. An object that needs synchronization only has to inherit from Synchronizable to get the required facilities. However, using this interface directly is neither the most comfortable nor the safest option. To get a Java-like &#8220;synchronized&#8221; statement we need some additional help. Fortunately, the not-so-well-respected C preprocessor comes to the rescue, as we can use it to create pseudo-language extensions. The simplest way to define our new statement is the following line:</p>
<pre class="brush: cpp">#define synchronized(obj)  for(Lock obj##_lock = *obj; obj##_lock; obj##_lock.release())</pre>
<p>From now on, we can use object synchronization in C++ as easily as in Java; we just need the following syntax in the methods of our shared objects:</p>
<pre class="brush: cpp">synchronized(this) {
    // some code that needs synchronization
}</pre>
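<p>Putting the pieces together, a self-contained sketch of the for-based statement might look as follows, assuming C++11. The Counter class and the incrementInParallel helper are hypothetical, and the std::mutex fallback for builds without OpenMP is my own addition. With obj set to this, the macro declares a variable named this_lock.</p>
<pre class="brush: cpp">#include &lt;cassert&gt;
#ifdef _OPENMP
#include &lt;omp.h&gt;
#else
#include &lt;mutex&gt;
#endif

class Mutex {
public:
#ifdef _OPENMP
    Mutex() { omp_init_lock(&amp;_mutex); }
    ~Mutex() { omp_destroy_lock(&amp;_mutex); }
    void lock() { omp_set_lock(&amp;_mutex); }
    void unlock() { omp_unset_lock(&amp;_mutex); }
#else
    // assumed fallback for builds without OpenMP
    Mutex() {}
    void lock() { _mutex.lock(); }
    void unlock() { _mutex.unlock(); }
#endif
    Mutex(const Mutex&amp;) = delete;
    Mutex&amp; operator=(const Mutex&amp;) = delete;
private:
#ifdef _OPENMP
    omp_lock_t _mutex;
#else
    std::mutex _mutex;
#endif
};

class Lock {
public:
    Lock(Mutex&amp; mutex) : _mutex(mutex), _release(false) { _mutex.lock(); }
    ~Lock() { _mutex.unlock(); }
    operator bool() const { return !_release; }
    void release() { _release = true; }
private:
    Mutex&amp; _mutex;
    bool _release;
};

class Synchronizable: protected Mutex {
protected:
    void enterSyncBlock() { this-&gt;lock(); }
    void exitSyncBlock() { this-&gt;unlock(); }
};

#define synchronized(obj)  for(Lock obj##_lock = *obj; obj##_lock; obj##_lock.release())

// hypothetical shared object
class Counter: public Synchronizable {
public:
    Counter() : _value(0) {}
    void increment() {
        synchronized(this) {
            ++_value;          // runs exactly once, with the mutex held
        }
    }
    int value() {
        int result = 0;
        synchronized(this) {
            result = _value;
        }
        return result;
    }
private:
    int _value;
};

// hypothetical helper: n increments issued from a parallel loop
int incrementInParallel(int n) {
    Counter counter;
    #pragma omp parallel for
    for (int i = 0; i &lt; n; ++i)
        counter.increment();
    return counter.value();
}</pre>
<p>If the loop body ran more than once per increment() call, the result would exceed n; if the lock were ineffective, increments could be lost.</p>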
<p>Now it is clearly visible how handy the RAII pattern is in our case. Besides making this statement very straightforward to use, it provides additional benefits:</p>
<ul>
<li>It makes the code more readable and as a result it is easier to maintain.</li>
<li>No need to call inconveniently named methods and use lock variables.</li>
<li>The synchronized code has its own scope.</li>
<li>It is exception-safe as the mutex is unlocked upon destruction.</li>
</ul>
<p>Additionally, we can turn a well-known complication of multiple inheritance in C++ to our advantage. If our class inherits from two synchronizable classes, a simple type cast lets us explicitly specify which ancestor we would like to synchronize in a particular block. To make this easier, we can define our synchronization statement with the following line instead of the Java-like one:</p>
<pre class="brush: cpp">#define synchronized(cls)  for(Lock obj##_lock = *static_cast&lt;cls*&gt;(this); obj##_lock; obj##_lock.release())</pre>
<p>In this case we pass the class name instead of the object pointer <em>this</em>. Using this latter construct, we can easily specify the correct ancestor to synchronize when dealing with multiple inheritance. Personally, I prefer the latter syntax, as it is much better tailored to C++ use cases.</p>
<p>As we no longer need a direct interface for entering and exiting the synchronization block, we can simplify our synchronizable interface to the following:</p>
<pre class="brush: cpp">class Synchronizable: protected Mutex {
};</pre>
<p>This is now enough to provide the facilities needed for a synchronization block, while still complying with the requirement to hide the details of the synchronization primitive.</p>
<p>Besides this, Jörg came up with the idea today to replace the for loop in our macro with a single if statement. This seems reasonable, as we don&#8217;t have to sacrifice any of the scoping and safety benefits of our framework. It also simplifies our lock class to the following:</p>
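<p>A sketch of the class-name variant with multiple inheritance, assuming C++11. Inventory, Ledger, Store and updatesAreCounted are hypothetical names, and the std::mutex fallback for builds without OpenMP is my own addition. Each base carries its own mutex through Synchronizable, so the two kinds of updates lock independent mutexes. Note that static_cast&lt;cls*&gt;(this) does not compile inside a const member function, so the accessors below are non-const.</p>
<pre class="brush: cpp">#include &lt;cassert&gt;
#ifdef _OPENMP
#include &lt;omp.h&gt;
#else
#include &lt;mutex&gt;
#endif

class Mutex {
public:
#ifdef _OPENMP
    Mutex() { omp_init_lock(&amp;_mutex); }
    ~Mutex() { omp_destroy_lock(&amp;_mutex); }
    void lock() { omp_set_lock(&amp;_mutex); }
    void unlock() { omp_unset_lock(&amp;_mutex); }
#else
    // assumed fallback for builds without OpenMP
    Mutex() {}
    void lock() { _mutex.lock(); }
    void unlock() { _mutex.unlock(); }
#endif
    Mutex(const Mutex&amp;) = delete;
    Mutex&amp; operator=(const Mutex&amp;) = delete;
private:
#ifdef _OPENMP
    omp_lock_t _mutex;
#else
    std::mutex _mutex;
#endif
};

class Lock {
public:
    Lock(Mutex&amp; mutex) : _mutex(mutex), _release(false) { _mutex.lock(); }
    ~Lock() { _mutex.unlock(); }
    operator bool() const { return !_release; }
    void release() { _release = true; }
private:
    Mutex&amp; _mutex;
    bool _release;
};

class Synchronizable: protected Mutex {
};

#define synchronized(cls)  for(Lock obj##_lock = *static_cast&lt;cls*&gt;(this); obj##_lock; obj##_lock.release())

// hypothetical bases, each with its own mutex
class Inventory: public Synchronizable {
protected:
    Inventory() : _items(0) {}
    int _items;
};

class Ledger: public Synchronizable {
protected:
    Ledger() : _entries(0) {}
    int _entries;
};

// a plain *this would be ambiguous here: Store contains two Mutex subobjects
class Store: public Inventory, public Ledger {
public:
    void addItem()  { synchronized(Inventory) { ++_items; } }   // Inventory's mutex only
    void addEntry() { synchronized(Ledger) { ++_entries; } }    // Ledger's mutex only
    int items()   { int n = 0; synchronized(Inventory) { n = _items; } return n; }
    int entries() { int n = 0; synchronized(Ledger) { n = _entries; } return n; }
};

// hypothetical helper: both kinds of updates issued from one parallel loop
bool updatesAreCounted(int n) {
    Store store;
    #pragma omp parallel for
    for (int i = 0; i &lt; n; ++i) {
        store.addItem();
        store.addEntry();
    }
    return store.items() == n &amp;&amp; store.entries() == n;
}</pre>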
<pre class="brush: cpp">class Lock {
public:
    Lock(Mutex&amp; mutex) : _mutex(mutex) { _mutex.lock(); }
    ~Lock() { _mutex.unlock(); }
    // always true, so the body of the if statement is always executed
    operator bool() const { return true; }
private:
    Mutex&amp; _mutex;
};</pre>
<p>This definition of the lock class is sufficient if we redefine our synchronized macro to use an if statement instead:</p>
<pre class="brush: cpp">/* Java-like synchronized statement */
#define synchronized(obj)  if (Lock obj##_lock = *obj)
/* alternative synchronized statement to support multiple inheritance
   (both macros share the same name, so define only one of them) */
#define synchronized(cls)  if (Lock obj##_lock = *static_cast&lt;cls*&gt;(this))</pre>
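<p>A self-contained sketch of the if-based variant (Java-like form only), assuming C++11. BoundedCounter and incrementsUpToLimit are hypothetical names, and the std::mutex fallback for builds without OpenMP is my own addition. The early return inside the block is safe because ~Lock() runs whenever the block is left.</p>
<pre class="brush: cpp">#include &lt;cassert&gt;
#ifdef _OPENMP
#include &lt;omp.h&gt;
#else
#include &lt;mutex&gt;
#endif

class Mutex {
public:
#ifdef _OPENMP
    Mutex() { omp_init_lock(&amp;_mutex); }
    ~Mutex() { omp_destroy_lock(&amp;_mutex); }
    void lock() { omp_set_lock(&amp;_mutex); }
    void unlock() { omp_unset_lock(&amp;_mutex); }
#else
    // assumed fallback for builds without OpenMP
    Mutex() {}
    void lock() { _mutex.lock(); }
    void unlock() { _mutex.unlock(); }
#endif
    Mutex(const Mutex&amp;) = delete;
    Mutex&amp; operator=(const Mutex&amp;) = delete;
private:
#ifdef _OPENMP
    omp_lock_t _mutex;
#else
    std::mutex _mutex;
#endif
};

class Lock {
public:
    Lock(Mutex&amp; mutex) : _mutex(mutex) { _mutex.lock(); }
    ~Lock() { _mutex.unlock(); }
    operator bool() const { return true; }
private:
    Mutex&amp; _mutex;
};

class Synchronizable: protected Mutex {
};

#define synchronized(obj)  if (Lock obj##_lock = *obj)

// hypothetical shared object that counts up to a fixed limit
class BoundedCounter: public Synchronizable {
public:
    explicit BoundedCounter(int limit) : _value(0), _limit(limit) {}
    bool tryIncrement() {
        synchronized(this) {
            if (_value &gt;= _limit)
                return false;   // ~Lock() still releases the mutex here
            ++_value;
        }
        return true;
    }
    int value() {
        int result = 0;
        synchronized(this) {
            result = _value;
        }
        return result;
    }
private:
    int _value;
    const int _limit;
};

// hypothetical helper: counts successful increments made from a parallel loop
int incrementsUpToLimit(int attempts, int limit) {
    BoundedCounter counter(limit);
    int successes = 0;
    #pragma omp parallel for reduction(+:successes)
    for (int i = 0; i &lt; attempts; ++i)
        if (counter.tryIncrement())
            ++successes;
    return successes;
}</pre>
<p>One consequence of this design choice: because the block is no longer a loop, a break or continue written inside it refers to the enclosing loop, whereas in the for-based version it only left the synchronized block.</p>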
<p>Thanks to the useful comments we even managed to further optimize and minimize the support code needed for our new pseudo-language extension.</p>
<h2>Conclusion</h2>
<p>We have seen an example of how one can implement an easy-to-use synchronizable interface for C++, together with a concrete implementation based on OpenMP. This library is still far from an API that provides all the constructs one needs for parallel programming in C++ projects; however, we have made the first step, and I will return to the subject in subsequent articles to further extend this framework.</p>
<p>Credits go to Achilleas Margaritis whose article inspired me to write mine and to Jörg for the useful improvement ideas.</p>
<h3>Full source code</h3>
<p><strong>Language:</strong> C++<br />
<strong> Platform:</strong> cross-platform<br />
<strong> Dependency:</strong> OpenMP<br />
<strong> Download link:</strong> <a title="omp_sync.h" href="/blog/wp-content/uploads/2010/02/files/omp_sync.h" target="_blank">omp_sync.h</a><br />
<strong> Comments:</strong> To use it as is, you will need a C++ compiler that supports OpenMP, such as GCC 4.2 or Visual C++ 2008.</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/02/synchronizable-objects-for-c/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Exploit parallelism with the least effort</title>
		<link>http://rastergrid.com/blog/2010/01/exploit-parallelism-with-the-least-effort/</link>
		<comments>http://rastergrid.com/blog/2010/01/exploit-parallelism-with-the-least-effort/#comments</comments>
		<pubDate>Tue, 19 Jan 2010 21:12:15 +0000</pubDate>
		<dc:creator>Daniel Rákos</dc:creator>
				<category><![CDATA[Multiprocessing]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[Fortran]]></category>
		<category><![CDATA[OpenMP]]></category>

		<guid isPermaLink="false">http://rastergrid.com/blog/?p=84</guid>
		<description><![CDATA[Multiprocessing has been around for decades as a premium feature of enterprise applications, but adopting this technology still places a huge burden on software companies that maintain and develop legacy code. Nowadays, as most commodity hardware already has highly parallel architectures, a modern application is almost unimaginable without proper multi-threading capabilities, even if we talk]]></description>
			<content:encoded><![CDATA[
<p>Multiprocessing has been around for decades as a premium feature of enterprise applications, but adopting this technology still places a huge burden on software companies that maintain and develop legacy code. Nowadays, as most commodity hardware already has highly parallel architectures, a modern application is almost unimaginable without proper multi-threading capabilities, even if we are talking about a text editor or a multimedia application. The transition from traditional software development to multiprocessing is neither easy nor painless. Fortunately, we have tools such as OpenMP at hand.</p>
<p><span id="more-84"></span>Currently the biggest hit is OpenCL, as it seems to be the ultimate solution for harnessing the power of highly parallel architectures like multi-core CPUs and DSPs, and, probably most importantly, it can leverage the huge raw computational capabilities of GPUs. However, even though it is one of the most important standards to come out lately, it is not the answer to every question. For those who would like to bring multiprocessing technology to their legacy code, it may be better advice to look around for other solutions.</p>
<p>My intention was not related to this when I started searching for a multiprocessing framework. I just wanted to find something that provides an easy-to-use interface for introducing multi-threading and the needed shared-memory semantics into my hobby projects. This is how I found <a title="OpenMP Homepage" href="http://www.openmp.org/" target="_blank">OpenMP</a>.</p>
<h2>What is OpenMP?</h2>
<p>Basically, OpenMP is an API specification for parallel programming that extends the programming languages most commonly used for computationally heavy and scientific calculations with a tool set that enables cross-platform multi-threading support tightly integrated into the language itself. Namely, OpenMP adds shared-memory parallel programming capabilities to the C, C++ and Fortran languages.</p>
<p>While OpenMP is limited to these particular programming languages, it is a truly open and multi-platform API that is very well supported by different compilers (at least as far as I can tell). The standard itself is developed and maintained in a fashion similar to OpenGL, as it has its own <a title="OpenMP Architecture Review Board" href="http://www.openmp.org/wp/about-openmp/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.openmp.org/wp/about-openmp/?referer=');">Architecture Review Board</a> with representatives from all major hardware and software vendors like AMD, HP, IBM, Intel, Sun Microsystems, Microsoft and others.</p>
<p>The specification itself is maintained in two different versions: one for C/C++ and another for Fortran. As I have never developed in Fortran, I only dug deeper into the C/C++ specific details; however, the facilities provided by the API are basically the same for Fortran as well.</p>
<p>The language extensions are introduced through OpenMP-specific pragmas and a run-time library. At first sight this does not seem to be the most elegant solution, but it fits very well into all versions of the programming language specifications, so there are no further interworking issues and the OpenMP standard can be maintained completely separately from the underlying language. Considering how the C and C++ programming languages evolve, this makes sense.</p>
<h2>Say Hello World to parallel programming</h2>
<p>I think the best way to demonstrate the power and simplicity of OpenMP is a basic example showing how easy it is to add parallel computing capabilities even to the most straightforward algorithms:</p>
<pre class="brush:c">void quicksort(int *a, int lo, int hi) {
    int i=lo, j=hi, h;
    int x=a[(lo+hi)/2];

    do {
        while (a[i] &lt; x) i++;
        while (a[j] &gt; x) j--;
        if (i &lt;= j) {
            h=a[i]; a[i]=a[j]; a[j]=h;
            i++; j--;
        }
    } while (i &lt;= j);

    #pragma omp parallel sections
    {
        #pragma omp section
        if (lo &lt; j) quicksort(a, lo, j);
        #pragma omp section
        if (i &lt; hi) quicksort(a, i, hi);
    }
}</pre>
<p>This is the quicksort algorithm in OpenMP fashion. As you may have already observed, this function is hardly different from the original sequential version of the famous sorting technique. The only additions are the three OpenMP-specific pragmas and an extra block.</p>
<p>I will now explain how we exploited parallel programming with just these few added lines, without going into details, as it is always better to read the specification itself before starting to use OpenMP heavily. First, we created &#8220;parallel sections&#8221;, which expresses our intention to distribute the tasks in the following code block among multiple threads. Next, we specified the actual &#8220;sections&#8221;, each of which is executed by a single thread.</p>
<p>This way, each time we split the array into two pieces, we sort the two regions using separate threads. Of course, for a very large array this does not mean that the number of threads grows exponentially, as it saturates at some point. In fact, with nested parallelism disabled, which is typically the default, the parallel sections encountered in the recursive calls each run on a single thread, so only the outermost split is actually executed in parallel. However, this is just one parameter that is fully controlled by the programmer.</p>
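<p>One way to keep the amount of parallelism under explicit control is sketched below, assuming a compiler with OpenMP 3.0 task support (for example GCC 4.4 or later; Visual C++ 2008 implements only OpenMP 2.0). It swaps the sections for tasks, a different construct from the one used above, and stops deferring work below a size threshold. The names quicksortTasks and parallelQuicksort and the cutoff parameter are hypothetical. Without OpenMP, the pragmas are ignored and the code runs as the plain sequential quicksort.</p>
<pre class="brush: cpp">#include &lt;cassert&gt;
#include &lt;vector&gt;

// same partitioning scheme as quicksort() above; the two halves become tasks
void quicksortTasks(int* a, int lo, int hi, int cutoff) {
    int i = lo, j = hi, h;
    int x = a[(lo + hi) / 2];

    do {
        while (a[i] &lt; x) i++;
        while (a[j] &gt; x) j--;
        if (i &lt;= j) {
            h = a[i]; a[i] = a[j]; a[j] = h;
            i++; j--;
        }
    } while (i &lt;= j);

    // when the if() clause is false the task runs immediately in the current
    // thread, so small ranges do not pay for deferred task creation
    #pragma omp task if(hi - lo &gt; cutoff)
    if (lo &lt; j) quicksortTasks(a, lo, j, cutoff);
    #pragma omp task if(hi - lo &gt; cutoff)
    if (i &lt; hi) quicksortTasks(a, i, hi, cutoff);
    #pragma omp taskwait   // both halves are sorted when this call returns
}

void parallelQuicksort(std::vector&lt;int&gt;&amp; v, int cutoff) {
    if (v.size() &lt; 2) return;
    int* a = v.data();
    const int n = static_cast&lt;int&gt;(v.size());
    #pragma omp parallel       // create the thread team once...
    {
        #pragma omp single     // ...and let one thread start the recursion
        quicksortTasks(a, 0, n - 1, cutoff);
    }
}</pre>
<p>Here the number of threads is fixed by the single parallel region, while the cutoff bounds how finely the work is split among them.</p>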
<h2>Parallelize loops with minimal effort</h2>
<p>It often happens that the performance bottleneck is inside a for loop that moves data or performs calculations on huge arrays. One example is an algorithm that interpolates two float arrays into a third one. This could of course be parallelized using the &#8220;sections&#8221; semantics presented earlier; however, that would require modifying the original algorithm, which afterwards would no longer clearly reflect its purpose. OpenMP supports such cases very elegantly as well:</p>
<pre class="brush:c">#pragma omp parallel for
for(int i = 0; i &lt; size; ++i)
    C[i] = A[i] * alpha + B[i] * (1 - alpha);</pre>
<p>Notice that there are no loop-carried dependencies: no iteration of the loop depends on the result of another iteration. This makes the loop well suited for parallelization. By adding just a single pragma, the execution time of this loop may scale down almost perfectly on multi-core systems.</p>
<p>For more control over how many threads carry out the interpolation loop, one can request a specific number of threads by adding another clause to the pragma:</p>
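<p>A self-contained version of the interpolation loop, as a sketch: interpolate is a hypothetical function name, and it assumes that A, B and C have the same length. Because each iteration writes only C[i] and reads only A[i] and B[i], the iterations can be distributed among threads in any order.</p>
<pre class="brush: cpp">#include &lt;cassert&gt;
#include &lt;vector&gt;

// hypothetical helper: C[i] = A[i] * alpha + B[i] * (1 - alpha) for every i
void interpolate(const std::vector&lt;float&gt;&amp; A, const std::vector&lt;float&gt;&amp; B,
                 std::vector&lt;float&gt;&amp; C, float alpha) {
    const int size = static_cast&lt;int&gt;(C.size());   // assumes equal lengths
    // no loop-carried dependencies: iteration i touches only index i
    #pragma omp parallel for
    for (int i = 0; i &lt; size; ++i)
        C[i] = A[i] * alpha + B[i] * (1 - alpha);
}</pre>
<p>By contrast, a loop such as C[i] = C[i - 1] + A[i] carries a dependency from one iteration to the next and cannot be parallelized by simply adding the pragma.</p>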
<pre class="brush:c">#pragma omp parallel for num_threads(4)</pre>
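<p>Strictly speaking, num_threads is a request: the runtime may provide fewer threads, for example when dynamic adjustment of the team size is enabled. The following sketch, with a hypothetical teamSize helper, reports how many threads actually executed a region; the _OPENMP guard keeps it compilable without OpenMP, in which case it reports 1.</p>
<pre class="brush: cpp">#include &lt;cassert&gt;
#ifdef _OPENMP
#include &lt;omp.h&gt;
#endif

// hypothetical helper: size of the team that ran a region requesting
// `requested` threads
int teamSize(int requested) {
    int size = 1;                      // a build without OpenMP runs sequentially
#ifdef _OPENMP
    #pragma omp parallel num_threads(requested)
    {
        #pragma omp single             // one thread records the team size
        size = omp_get_num_threads();
    }
#endif
    return size;
}</pre>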
<p>Of course, there are plenty of other configuration options that control how the parallelized code actually executes but, again, this article is not meant to be a thorough guide to OpenMP; it is just a foretaste to raise interest in this prominent tool.</p>
<h2>More than just threads</h2>
<p>We&#8217;ve seen so far that OpenMP enables adding basic work-sharing support to an existing project with minimal effort. However, OpenMP is more than just another way to run separate threads; it also provides easy-to-use facilities for synchronization and shared data handling that can serve as the building blocks of any multiprocessing application, including, but not limited to, the following features:</p>
<ul>
<li>Explicitly scoped variables to indicate shared and thread-private storage</li>
<li>Atomic operations and critical sections</li>
<li>Execution barriers for fine-grained synchronization</li>
</ul>
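<p>A sketch of these three features in one place, assuming C++11; all function names are hypothetical. The _OPENMP guard supplies a single-thread id and count when OpenMP is disabled, in which case the pragmas are ignored and the functions run sequentially with the same results.</p>
<pre class="brush: cpp">#include &lt;cassert&gt;
#include &lt;vector&gt;
#ifdef _OPENMP
#include &lt;omp.h&gt;
#endif

// atomic: each += on the shared sum is one indivisible update
long sumWithAtomic(const std::vector&lt;int&gt;&amp; v) {
    long sum = 0;
    const int n = static_cast&lt;int&gt;(v.size());
    #pragma omp parallel for shared(sum)
    for (int i = 0; i &lt; n; ++i) {
        #pragma omp atomic
        sum += v[i];
    }
    return sum;
}

// private + critical: every thread has its own candidate, and the
// compare-and-update of the shared maximum runs inside a critical section
int maxWithCritical(const std::vector&lt;int&gt;&amp; v) {
    int best = v[0];                    // assumes v is not empty
    int candidate;
    const int n = static_cast&lt;int&gt;(v.size());
    #pragma omp parallel for private(candidate) shared(best)
    for (int i = 0; i &lt; n; ++i) {
        candidate = v[i];
        #pragma omp critical
        {
            if (candidate &gt; best) best = candidate;
        }
    }
    return best;
}

// barrier: no thread starts phase 2 before all threads have finished phase 1
bool barrierSeparatesPhases() {
    std::vector&lt;int&gt; slots(4, 0);       // one slot per thread (at most 4 requested)
    bool ok = true;
    #pragma omp parallel num_threads(4) shared(slots, ok)
    {
#ifdef _OPENMP
        const int id = omp_get_thread_num(), count = omp_get_num_threads();
#else
        const int id = 0, count = 1;
#endif
        slots[id] = id + 1;             // phase 1: each thread writes its own slot
        #pragma omp barrier
        for (int t = 0; t &lt; count; ++t) // phase 2: read every thread's slot
            if (slots[t] != t + 1) {
                #pragma omp critical
                ok = false;
            }
    }
    return ok;
}</pre>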
<p>The best thing about these features is that you just specify the appropriate pragmas for the affected statements or variables and OpenMP takes care of the rest. For more information on their usage, please refer to the <a title="OpenMP specification" href="http://www.openmp.org/wp/openmp-specifications/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.openmp.org/wp/openmp-specifications/?referer=');">OpenMP specification</a>.</p>
<h2>Compiler support</h2>
<p>One of the best things about OpenMP is that it is well supported by most major C/C++ compiler vendors:</p>
<ul>
<li><strong>GCC</strong> version 4.3.2 and later (enabled with the -fopenmp compiler switch)</li>
<li><strong>Visual C++</strong> 2008 and later (enabled with the /openmp compiler switch)</li>
<li><strong>Intel C/C++</strong> compiler version 10.1 and later (using -Qopenmp on Windows or -openmp on Linux or Mac OS X)</li>
</ul>
<p>For a <a title="OpenMP compilers" href="http://www.openmp.org/wp/openmp-compilers/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.openmp.org/wp/openmp-compilers/?referer=');">complete list</a> of supported compilers, please refer to the official OpenMP site.</p>
<p>Another advantage, stemming from the way the language integration of OpenMP was designed, is that code usually degrades gracefully on compilers without OpenMP support, as the pragmas can simply be ignored. I intentionally used the word &#8220;usually&#8221;: if the business logic of the application consciously relies on multi-threaded semantics, it won&#8217;t execute in exactly the same way with and without OpenMP. However, monitoring such situations is the developer&#8217;s responsibility.</p>
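<p>In practice, only the pragmas degrade gracefully: direct calls into the run-time library are not ignored and typically fail to compile or link without OpenMP. A common approach, sketched below with hypothetical function names, is to guard such calls with the standard _OPENMP macro, which OpenMP-enabled compilers define (for example when building with -fopenmp or /openmp). The reduction clause in the second function keeps its result identical with and without OpenMP.</p>
<pre class="brush: cpp">#include &lt;cassert&gt;
#ifdef _OPENMP
#include &lt;omp.h&gt;      // only needed, and only guaranteed, with OpenMP enabled
#endif

// hypothetical helper: run-time library calls are guarded, pragmas are not
int availableThreads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;         // sequential build
#endif
}

// hypothetical helper: each thread sums into a private copy of `sum`,
// and the copies are added together at the end of the loop
long sumOfSquares(int n) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i &lt;= n; ++i)
        sum += static_cast&lt;long&gt;(i) * i;
    return sum;
}</pre>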
<h2>Conclusion</h2>
<p>My personal opinion is that OpenMP best suits situations where legacy code needs a gradual transition towards a parallelized system, or where one is looking for the easiest possible way to take advantage of multiprocessing-capable environments. Still, OpenMP is suitable for almost all the tasks needed to implement completely new applications designed with parallel programming in mind, so I recommend it to everybody, even for general use.</p>

]]></content:encoded>
			<wfw:commentRss>http://rastergrid.com/blog/2010/01/exploit-parallelism-with-the-least-effort/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

