# Thinking Parallel

## Mutual Exclusion with Locks – an Introduction

Do you know the difference between a mutex and a spinlock? You know exactly what kind of locks are employed in e.g. OpenMP? You find hierarchical locks boring and recursive spinlocks only make you yawn? Then you will most likely NOT enjoy reading this article, and I encourage you to stop reading now, as it is way too basic for you – don’t worry, though, as I have a way more advanced article in the pipeline :-). Seriously, stop!

Now wouldn’t that be an interesting statistic as to how many of my readers just stopped reading at this point, because they already know everything I have to offer in this article :-). Unfortunately, I will have to cope without this knowledge for now. Anyways, since you are still reading I will assume that I can tell you something interesting today (or so I hope) – and if not, you would CFLink, right? (Now, that’s an innovation! A call to action right at the beginning of my post! Will see how this turns out ๐ )

I will start out this article with a quick introduction as to why mutual exclusion is necessary and how it is done with locks and will then turn to some interesting variations of it. I will not cover semaphores, monitors or all the other techniques to guarantee mutual exclusion, but will concentrate on locks and their variations (which includes mutex variables and spinlocks).

### Why Mutual Exclusion is Needed for Shared Memory

Let’s assume we do not use Mutual Exclusion for a second and two of our threads want to increment a shared variable, conveniently called i with a nice and simple command such as i++. Now what’s really happening is, that (depending on your architecture), not one command is carried out by your processor, but most likely three:

1. load value of i into a register (lets call it r1)
2. increment r1 by 1
3. store value in r1 back into i

And we did not even need assembler for that (have I mentioned that I do not particularly like assembler?)! Anyways, let’s see what could happen in the worst case with our example and two threads (assuming i had a value of 1 beforehand):

1. Thread 1: load value of i into a register on processor 1 (lets call it r1) [i=1, r1=1]
2. Thread 2: load value of i into a register on processor 2 (lets call it r2) [i=1, r1=1, r2=1]
3. Thread 1: increment r1 by 1 [i=1, r1=2, r2=1]
4. Thread 2: increment r2 by 1 [i=1, r1=2, r2=2]
5. Thread 1: store value in r1 back into i [i=2, r1=2, r2=2]
6. Thread 2: store value in r1 back into i [i=2, r1=2, r2=2]

The values in brackets are there to remind you what exactly is stored inside the variable and registers at this point in time. Remember, steps 1 and 2, 3 and 4, 5 and 6 are happening in parallel, on two different CPUs! i should have been incremented two times, so the final result should be i=3. Yet, it is not, as can be seen at the end of step 6. And that’s why we need mutual exclusion, as two threads working on the same data at the same time is a recipe for disaster! By the way, you do not even need two processors for this to happen: When you have two threads running on a single processor and the scheduler decides to stop thread 1 after it has just loaded the value of i into its register, you have the same dilemma.

### How Mutual Exclusion is Done

Now that we have seen the problem, lets take a look at the solution. Simply put, we need to stop the two threads from working on the same data at the same time (that’s why it’s called mutual exclusion after all). The most common way to do this today is by using locks. A lock can be either locked or unlocked (sometimes also referred to as set or unset). And it really behaves like a lock on an ordinary door in the real world: You enter a room, you lock the door because you do not want to be disturbed, do whatever you need to do in there, unlock the door and only then can somebody else enter. If anybody tries to enter the room while you are still in there, he has to wait. As long as you do not forget to lock or unlock the door, this algorithms guarantees mutual exclusion and protects the so called critical region. And this is what the whole thing would look like using a parallel programming system (OpenMP in this case):

1. omp_set_lock (&my_lock);
2. i++;
3. omp_unset_lock (&my_lock);

Actually, the lock needs to be initialized beforehand and destroyed sometime afterwards as well, but that’s not too difficult either. Furthermore, in OpenMP there is a way to do this even easier, like this:

1. #pragma omp critical
2. {
3.     i++;
4. }

Or even simpler, like this:

1. #pragma omp atomic
2. i++;

This basically does the same thing, except you do not need to worry about initialization and destruction of the lock and you cannot forget to unlock the mutex accidentally, but this is not what this article is about, so I will leave it at that and return to our regularly scheduled locks. Actually, that is all I wanted to say about how locks work in general. If you stick to the algorithm (lock, change, unlock) at any time the shared resource is worked on, you are fine. If you don’t, you code may still break ;-). In my experience, the most difficult thing to get right is not to understand the reason why mutual exclusion is necessary or even how it is done, but the most difficult thing is to spot all the places in the code where it is needed. But then again, even that is not that difficult and my students usually do get it right the second or third time at the latest. OK, now that you know the basics, on to the more interesting variations I promised earlier.

### Mutex

The mutex is the most well-known way to enforce mutual exclusion (it better be, with a name like that ๐ ). It basically behaves like the lock sketched above: When a thread tries to lock a mutex, it is either acquired (if no other thread presently owns the mutex) or the requesting thread is put to sleep until the lock is available again (in case another thread presently owns the lock). When there are multiple threads waiting on a single lock, the order in which they are awoken is usually not determined. So much for the mutex, let’s now take a look at spinlocks.

### Spinlocks

The functions associated with spinlocks are syntactically equivalent to the ones mentioned for mutexes, the subtle difference lays in the way the wait for the lock is handled: Contrary to mutexes, threads are not put to sleep on spinlocks, but instead continue to spin (that means trying to acquire the lock in this context). Therefore, they usually have a quicker response time (as no thread needs to be woken up as soon as the lock is unlocked), but also do waste processor cycles (a process commonly known as busy waiting). Spinlocks are very common in High Performance Computing, because in this field most of the time each thread is scheduled on its own processor anyways and therefore there is not much to gain by putting threads to sleep (which is a quite time-consuming process after all).

It depends on your threading system, which lock-variant is provided to you. Pthreads for example used to have only mutexes, but now provides both. OpenMP is silent about the issue in the specification, therefore the compiler vendors are free to use whatever variety they wish (as far as I remember, many use a mixture of both, where a thread is spinning for a predefined amount of time and if the lock could not be acquired then, it is put to sleep). Check the documentation of your threading system for details about your locks, with the information provided above it should be pretty easy to find out.

### Recursive Locks

There is another, orthogonal dimension to locking, though. What happens, when a thread that already owns a lock tries to lock it again? The answer is usually a deadlock – the thread will hang. Why would you want to do a thing like that? The common answer to that is: inside recursive functions or functions calling each other from inside the critical region (while both setting the lock). But honestly, I have not seen this used in practice a whole lot and it is usually a sign that something is wrong with your synchronisation patterns anyways – remember, you want your critical regions to be as small and fast as possible, so it is usually not the hottest idea to be calling any functions in there!

But just in case you still need it or encounter code that already employs recursive locks I will say a few words about them: In addition to the normal locking functionality described above, recursive locks employ an internal counter to monitor, how many times they have been locked by the same thread (oh, did I forget to mention that? – It is possible to set the same lock multiple times from the same thread with recursive locks and that is the main difference between normal locks and them). The counter is also decreased every time the thread unsets the lock and only if the counter reaches zero, the lock is made available to the other threads again. Recursive locks can be combined with the mutex or the spinlock-variety, so e.g. recursive spinlocks are possible as well.

### Timed Locks

There are times, when a thread wants to enter a critical region, but has other things to do if the region is presently locked. A very common way to deal with this situation is to test the lock beforehand. A very simple example (you guessed it, in OpenMP ๐ ) will make this clearer:

1. while (!omp_test_lock (&my_lock)) {
2.     /* we did not get the lock!
3.      * -> therefore, do some other work
4.      * then try again... */
5. }
6. /* at this point in the code, omp_test_lock returned
7.  * successfully and has therefore acquired the lock
8.  * -> do something with the locked resource ...
9.  * ... and finally unlock it again */
10. omp_unset_lock (&my_lock);

But actually, this is not what timed locking is about. An alternative to the way of dealing with the problem described above would be, to give the normal set_lock-function an additional timing-value (such as e.g. 200ms). The function will block normally, but if the lock is not acquired by that time, the function will return anyways and indicate somehow, that it has failed to acquire the lock. This can be useful e.g. in embedded systems, where it is required that certain timing constraints are obeyed (such as no thread shall wait more than 200ms on a lock, or something has gone wrong – although you would probably phrase that constraint a little different in your specification ๐ ). And in that case, a watchdog thread (or process) could be notified that makes sure that the system is working properly.

Timed locks are not very common (I think POSIX has them now), but still your threading system may have them and they are an important building block in certain niches (e.g. embedded systems). Theoretically, they can be combined with all the other variations suggested above and therefore a timed, recursive mutex is possible.

### Hierarchical Locks

I first heard about hierarchical locks at Europar last week. This does not necessarily mean they are the newest invention in town (the first paper I could find on them dates back to 2002 – the locks are called RH-locks in there and it is a very good read if you want to know more implementation details of locks). It is way more likely that there is much more to locking than I know about or can describe in one article, especially since this is not my field of research :-). Anyways, hierarchical locks lock exactly like normal locks from the outside, their difference is in the implementation. I apologize for mixing in an implementation detail here, after all I have not told you about other such details, such as queue locks vs. backoff locks – because this is an article for programmers, not for compiler writers.

Nevertheless, I found the concept interesting enough to mention it here, especially after my last article about locality. Hierarchical locks operate under the premise that it is a good idea to let threads with a high memory locality (e.g. threads scheduled on the same node, working on the same data or data that are close together in memory) acquire the locks consecutively, as this will reduce cache misses. This sounds logical, although I do not have any experience whatsoever about it. The idea was invented for CC-NUMA machines and that’s where its strengths are. And the best thing about it is: the programmer does not need to care, as the locality improvements are managed automatically by the runtime system. I do not see why hierarchical looks cannot be combined with any of the locks described above, so maybe you are in the market for a hierarchical, timed, nested spinlock ๐ ?

If you want to know more about locking and thread-programming in general, I wholeheartedly recommend what I like to call the POSIX Threads Bible by David Butenhof – Programming with POSIX Threads. Or if you are more into Java, I hear Java Concurrency in Practice is a very good read. I have not really found a great book on Thread-Programming with C# – the best one appears to be Programming C#, although it contains only a single chapter on threaded programming. If you find a better one, please let me know.

### Locking Summary

This article has grown so big that I need a summary for it ;-). I am inclined to promise anyone who has managed to read this far a beer, but since my blog audience has grown considerably during the last month (with about 150 regular subscribers and about 600 unique visitors a day) and this article will be on the net for a long time, this may not be such a great idea. Especially, since I don’t even drink beer. Therefore, I will promise to answer any on topic comments you will leave here, instead :P.

Anyways, the topic of this article was mutual exclusion with locks. After a not so short introduction to locking and why it is needed, I have showed some variations among them you may encounter when dealing with locking. First of all, I looked at spinlocks and mutex variables. The major difference between them is, that mutex variables put waiting threads to sleep, while with spinlocks waiting threads actively continue to try to acquire the lock.

The second variety I pointed out were recursive locks, which make it possible to set the same lock multiple times by the same thread. Next were timed locks, that do exactly that: they time out after a certain period of waiting time, returning control back to the calling thread. And finally, I looked at hierarchical locks, an implementation detail that promises better performance by exploiting locality, especially on bigger machines.

What I have not covered in this article are more implementation details (not that I know many more, as I said this field is not my research area), as well as more exotic concepts for mutual exclusion (lockless data-structures, anyone?). If there is sufficient interest, I might do that later.

Have I forgotten anything? Or did I get a detail wrong? Feel free to leave a comment!

#### 25 Responses to Mutual Exclusion with Locks – an Introduction »»

1. Comment by Cobo | 2006/09/10 at 04:14:02

Wow! Your students must feel lucky… You should see how I got all these things explained the first time. Great article which serves as a quick summary of the basics.

I’ve liked the simplicity of all explanations given and those little interesting notes about other things not strictly related to the topic (specially locality, which seems quite interesting).

So, please continue with the effort on writing. It’s really appreciated.

Cheers!

2. Comment by Michael Suess | 2006/09/10 at 10:43:59

Cobo,

I promised to answer every response to this aricle, so here I go: Thanks for the kind words, its always nice to see that people are actually reading and enjoying what I write…

3. Comment by Erik | 2006/09/25 at 22:15:39

I needed a review on these topics as well, thanks for taking the time to write the article!

4. Comment by Ashok | 2006/10/17 at 23:07:53

I think the article has a pretty good description about locks and any starter can use it. I find one thing still missing and that is about read-write locks.

5. Comment by Michael Suess | 2006/10/18 at 09:55:29

@Ashok: you are right, I totally forgot about rw-locks in this article. And you know what? I have no idea why…

6. Comment by Paul | 2007/01/17 at 22:33:22

A very enjoyable article. You write very clearly, and with good humour – quite an achievement for such a technical (and fascinating) subject.

Now, I’m no compiler writer, but I’m still wondering how a lock is obtained. In my mind it almost seems like a thread needs a lock to get a lock ๐ How else can it be done safely? I gather the magical Compare and Swap (CAS) instruction is often used, to atomically set a variable, but what about architectures which don’t have the CAS instruction?

Ciao!

7. Comment by shane | 2007/01/18 at 22:22:50

s

8. Comment by Michael Suess | 2007/01/18 at 22:44:16

@Paul: I am not a hardware-architect or low-level compiler writer, so I am not really the best person to ask this question to. I will try my best to answer it anyways :-). To my understanding, you are right and you really need a lock to implement a lock. The only way around this dilemma is to use special, low-level machine instructions. Compare-and-Swap is often used for this purpose, as is Test-and-Set or Test and Test-and-Set. You can find informations on these e.g. in the Wikipedia. To my knowledge, there is no way around these instructions and every parallel architecture provides at least one of these or a similar one.

But maybe there is a hardware architect or a compiler-guy around that can answer this question with more authority?

9. Comment by John S | 2007/02/20 at 10:12:32

I find all the smilies gratuitous, and the writing style a tad informal. I must sound like a grouch.. but I take the time to criticize because I enjoy this article otherwise.

10. Comment by Refik | 2007/05/17 at 02:21:40

Great article Michael, I used to learn this in my operating systems class and as I remember we used first the book by Tanenbaum however we switched to an older one by Douglas Comer; and it was much better explained than in Tanenbaum’s book, you had some nice code examples, if you don’t understand something you would just have to try them out yourself. Great article, keep up the good work!

Schรยถne Grรยผรลธe aus Sarajewo ๐
Refik

11. Comment by etenv | 2007/09/13 at 20:52:07

Thank you Michael for providing that kind of article in such a pleasant way of reviewing the concepts.
I’m currently learning multithreading programming and I’ll surely take a look at the rest of your blog.

A with/tip i would suggest -> provide more code examples.

else{
Keep up the good work.
}

12. Comment by vgdti | 2007/09/26 at 08:42:08

Thanks for the nice article.

I am just wondering if Pthread really supports the spinlocks as what your said “It depends on your threading system, which lock-variant is provided to you. Pthreads for example used to have only mutexes, but now provides both.”

13. Comment by Michael Suess | 2007/09/26 at 10:33:15

@vgdti: pthreads supports spinlocks now, look here for the description. Beware that this is an optional part and does not need to be provided by all implementations.

14. Comment by Joe Aebischer | 2007/11/17 at 17:05:32

Mutual Exclusion with Locks – an Introduction

Enjoyed your article very much I’m a real dummy and need explanations that are easy to read, understand, and possible use for my final project MS in CIS. Keep up the good work. You might make a few more online references. I’m getting poor pursueing book purchasing.

15. Comment by Deepa | 2007/12/04 at 10:31:36

Good article for a beginner. I was trying to look up the meaning of Mutual Exclusion and locking and was glad I got to this page.

The article proved cent percent useful.

Keep writing,
Deepa

16. Comment by varun | 2008/03/30 at 11:32:15

Can you please tel me when the mutex and spinlock is useful,

I mean, in case of uniprocessor and multiprocessor environment.

As per as my knowledge is concerned, in uniprocessor , we should use mutex and in multiprocessor, we should use spin lock.??

Please explain is it right or wrong?

17. Comment by pallavi | 2008/08/05 at 09:21:45

Hi ,
I am Pallavi. I have gone through the article , its very good for brgineer . Can you provide me some idea on how the threading and mutual exclusion is handled in Linux o/s ? What are the differences as compared to windows ? Can you send some good links or book names ?

Thanks And Regards

18. Comment by Yuri | 2008/11/17 at 20:04:54

Great article!
Way to go

19. Comment by chintam | 2009/04/27 at 19:16:04

Hello fren…
I dint completed the whole one except till first of spin lock…
But it was on of the most simplest and understandable words i ever heared and understood related to parallel computing.I would be pleasured to learn all the concepts of parallel computing form you..Hope i get all the articles am in need off…Thanks alot…
Keep writing….

20. Comment by Somehting | 2014/10/02 at 09:35:28

I have worked with C++ for more than 3 years now. But i didn’t get to use much of threads , since we are working as a team. So recently , i had the need to understand everything about using threads safely. While searching for something good to read , i found this. And this article , explains the need of locks very simply. And also it points out ‘Where to put locks’ as the most difficult part , even though we know how to use it theoretically. I have experienced this a lot , and i hope i may be able to solve that after this. Thank you.

21. Comment by Manish kumar | 2014/11/23 at 13:58:08

Good article ๐

22. Comment by Hiba Latifee | 2015/04/15 at 09:38:54

Hi there.
I thoroughly enjoyed reading this. It was as though someone was teaching me all this in person. Such a fun teacher you are. Anyway, there were some grammatical errors (like ‘have showed’) and some spelling errors too (like ‘sychronisation’). Apart from that, I would suggest you write more on semaphore and monitors too. I intend on reading those too.
Thank you.

HTML allowed: `<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> `