The OpenMP “vs” OpenACC debate: by an insider

(Note: I served on both the OpenACC and OpenMP committees.)

Let me take you back to 2008-ish. OpenMP 3.0 had just been released in May. DOE supercomputers were dominated by multicore CPUs, Blue Gene, and the Cell architecture.

They are all very different, but at their core (ha) is the concept that once you reach a compute unit, you thread ~4/8/16 ways. So to the developer it looked a lot like scaling across discrete CPU nodes. You had to tweak things based on FPUs or cache, but at a high level it was pretty much the same.

With low thread counts, OpenMP 3.0 is perfect. You can tweak the number of threads to, say, lower memory access pressure or ALU use. Jaguar, for example, had shared FPUs, so to get performance on some apps you had to cut the number of threads in half.
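To make the thread-count tweak concrete, here is a minimal sketch in C of an OpenMP 3.0-style loop where the caller chooses how many threads to fan out to. The function name `sum_with_threads` and its `threads` parameter are made up for illustration; they are not from any real application mentioned here.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array with an OpenMP 3.0-style parallel loop.
 * `threads` is a hypothetical tuning knob: the caller decides how many
 * threads to use, e.g. half the core count on a node with shared FPUs.
 * If the compiler is run without OpenMP support, the pragma is ignored
 * and the loop simply runs serially. */
double sum_with_threads(const double *x, int n, int threads)
{
    assert(x != NULL && n >= 0 && threads > 0);

    double sum = 0.0;
    int i;

    /* Fan out at the loop, combine partial sums at the end. */
    #pragma omp parallel for num_threads(threads) reduction(+:sum)
    for (i = 0; i < n; ++i) {
        sum += x[i];
    }
    return sum;
}
```

Calling it once with the full core count and once with half is the kind of experiment described above: same code, different `threads` value, and you keep whichever runs faster.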

Then Oak Ridge plans an upgrade to Jaguar using hot new things called GPGPUs. In 2010, compiler vendors like PGI, Cray, PathScale, and CAPS all start playing around with “what does it mean to thread across hundreds or thousands of cores?” They prototype pragmas as part of this exploration. Two years away from Titan’s launch, the field is still grappling with pragmas as an alternative to CUDA. PGI develops CUDA Fortran and OLCF’s early science program is underway. The OpenMP ARB is having intense arguments about where to take the standard next.

At the heart of the argument are strong egos who shall remain nameless. OpenMP wanted to be conservative, and justifiably didn’t want to react too quickly to what could possibly be a one-off architecture in DOE (Titan). User feedback back then was basically “WTF DOE, why do you hate us?!” OpenMP wasn’t budging and ORNL was getting nervous because egos were blocking a needed exploration. So ORNL asks Cray (the vendor), NVIDIA, AMD, PGI, and CAPS to coalesce their individual efforts into a single standard, with the hope that something usable would exist while the OpenMP ARB warmed up.

So a year before Titan’s launch, in Nov 2011, OpenACC is born with the subset of OpenMP members who believed the future lay in 1000s of threads on accelerators. Boy, were they right… (shout out to the Aurora project for finally seeing the light.)

The members went to work combining their individual accelerator directive prototypes into one open standard. It was modeled after OpenMP by design: OpenMP is easy to learn, plays nice with legacy apps, and has clear syntax, and there was the intention that one day the two would merge.

OpenMP leadership at the time never accepted the split, even though these members continued to serve in OpenMP. It was infantile, to put it mildly. ORNL now has an alternative to CUDA. An S3D fork becomes the first application to be ported to a premature OpenACC 1.0.

I joined ORNL in March 2012 and began to serve on the OpenACC committee. OpenMP finally launched an accelerator committee and I began to serve there too. By this time the philosophies of these two standards couldn’t have been more divergent.

OpenMP 3.x assumes the application will basically find its way to a compute unit (think MPI to a node, or a Cell processor), and when a loop region appears it fans out a bunch of threads to accomplish the task. Data movement? Non-loop regions that aren’t tasks?

Titan’s GPUs needed data movement first. And the concept of kernels, aka chunks of work that would get shipped to the GPU. Need an atomic? Tough luck: once the threads launched, they were unreachable until complete. Take the longest loop and put it up top, move tiny loops inside.
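As a rough illustration of that shape (data movement first, then a kernel with the long loop on the outside), here is a minimal OpenACC sketch in C. The function `scale_rows` and its arguments are invented for this example, and it assumes `rows` is the long dimension. It is a sketch of the pattern, not code from any Titan application.

```c
#include <assert.h>
#include <stddef.h>

/* Scale each row of a rows x cols matrix stored as a flat row-major array.
 * Hypothetical example: assumes `rows` is the long dimension.
 * Without an OpenACC compiler the pragmas are ignored and this runs
 * as ordinary serial C on the host. */
void scale_rows(const double *in, double *out, const double *factor,
                int rows, int cols)
{
    assert(in != NULL && out != NULL && factor != NULL);
    assert(rows >= 0 && cols >= 0);

    /* Step 1: data movement. Copy inputs to the device on entry and
     * copy the result back to the host on exit. */
    #pragma acc data copyin(in[0:rows*cols], factor[0:rows]) \
                     copyout(out[0:rows*cols])
    {
        /* Step 2: the kernel. The long loop goes on the outside (gangs),
         * the short loop goes on the inside (vector lanes). */
        #pragma acc parallel loop gang \
                    present(in[0:rows*cols], factor[0:rows], out[0:rows*cols])
        for (int r = 0; r < rows; ++r) {
            #pragma acc loop vector
            for (int c = 0; c < cols; ++c) {
                out[r * cols + c] = factor[r] * in[r * cols + c];
            }
        }
    }
}
```

Note how the data region wraps the compute region: everything the kernel touches is staged before launch, and nothing comes back until the region ends.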

But something else happened. The first ORNL Hackathon was sponsored by OpenACC. They wanted to see their users implement it. We noticed OpenACC wasn’t something you sprinkled on your code and turned on/off with a flag. Everyone had to change their code.

We realized that the basic promise, directives placed in comments that accelerate an application, wasn’t what was happening, because legacy apps were developed for low-core-count, high-cache architectures. But when developers changed their code to work with OpenACC, the apps ran better.

Enter a short-lived architecture that rhymes with Tights Manding. An accelerator somewhere between Blue Gene and a GPU, with wide vectors. Finally OpenMP is ready to indulge the accelerator fangirls, except that it basically took OpenMP 3.x and added data movement and SIMD wide vectors.
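For a feel of what “OpenMP 3.x plus data movement and SIMD” looks like in source, here is a minimal sketch in C using OpenMP 4.x-style `target`, `map`, and `simd` on a SAXPY loop. The function name `saxpy_target` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* y = a * x + y, written with OpenMP 4.x-style offload directives.
 * Hypothetical example. Without OpenMP enabled (or without an offload
 * device) this runs on the host, serially or threaded. */
void saxpy_target(int n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL && n >= 0);

    /* map(...) is the data movement that 3.x lacked: x goes to the
     * device, y goes to the device and comes back. The rest of the
     * directive is the familiar loop fan-out, plus simd for wide vectors. */
    #pragma omp target teams distribute parallel for simd \
                map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body is untouched from what you would write for OpenMP 3.x; only the directive grew.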

Cray refocused on that architecture (for understandable reasons). It wasn’t until 2018, a full six years after Titan was launched, that the first OpenMP 4.5 compiler was working, thanks to Cori… and finally an OpenMP 4.5 based application ran on Titan.

Conclusion: So when you see the OMP/ACC “debate,” what you’re really seeing is leftover heat from the Directives universe “Big Bang,” aka people picking sides. But as a user liaison for many years, here’s my take: pick what works for you. Try them both. Try them all.

What I like about OpenMP is that its member base brings a lot of different architecture considerations. Short/wide vectors, discrete, embedded, big cache, are all there. At the same time, what I don’t like is the legacy-induced Franken-directives it’s become, where nothing ever gets deprecated.

What I like about OpenACC is something I, along with @sunitachandra29 and others, put in place: its user-centric model. Users present on the first day of every F2F meeting. The committee prioritizes features based on requests; it doesn’t turn into a CS exercise in completeness.

We have user groups on Slack, user group meetups at every major HPC conference. Hackathons. F2F meetings. User user user user… see a pattern? I tried this approach at OpenMP but frankly it didn’t take.

So at the heart of the philosophy, it’s about a standard that is slower to adapt and able to take a longer-term perspective on architecture (OpenMP) vs one that jumps into new architectures and prioritizes based on current user needs (OpenACC). Will OpenACC turn into Frankenstein?

Who knows. Will they merge? Very, very unlikely. Is OpenACC growing? Yes. Did OpenMP get influenced (and basically pushed into action) by OpenACC? Yes. Which one should you use? Try them both, try them all. Do what works for you. /fin #apologiesfortypos

I want to add one more thing. The current OpenMP board is filled with great pragmatic people.

This post was adapted from a Twitter thread I wrote before I left the site.

Fernanda Foertter
