Process Synchronization – II (atomic operations)

Let us continue our discussion from the previous post, where we discussed a software solution to the Critical section problem. Here, we discuss about a simple hardware solution to the critical section problem, followed by a discussion on  atomic operations like test_and_set() and compare_and_swap().

SYNCHRONIZATION HARDWARE

  • Peterson’s solution → software based → not guaranteed to work on modern computer architectures
  • In the following sections, we explore more solutions based on the premise of locking protecting critical regions through the use of locks

Simple hardware solution for a single processor environment

  • Basis for solution: Critical section problem could be solved in a single processor environment by simply preventing interrupts from happening while a shared variable is being modified.
  • Why does it work? Since no other instructions would run, we can be sure that no unexpected modifications will be made to the shared variable.
  • This approach is often taken by non-preemptive kernels.
  • Feasibility
    • Impractical solution [not scalable]→ not feasible in a multi-processor environment
    • Reason 
      • Disabling interrupts in multi-processor environment → need to pass the message to all processors → time consuming
      • This message passing delays entry into each critical section → decrease in system’s efficiency

Hardware solution to Critical Section Problem using Atomic instructions

  • Modern machines provide special atomic hardware instructions
  • atomic = uninterruptible
    • An operation is atomic if it completes in a single step.
    • i.e., When an atomic operation is performed on a variable in a shared memory location, no other thread can observe the operation half-complete.
    • Without atomic CPU instructions, lock-free programming would be impossible, since there would be no guarantee of multiple threads not operating on a shared variable at the same time.
  • These instructions help in solving the critical section problem in a relatively simple manner.
  • They allow us to either
    • test and modify the content of a word
    • swap the contents of a word

Why are atomic instructions needed?

[Source]

Consider a simple example:

  • This simple program takes time to execute 10^12 instructions. 
  • How to speed up?
    • Have multiple iterations running in parallel, using a multi-core processor, say on 2 separate cores
    • However, instead of speeding things up, it will actually slow it down. Why?
      • overhead involved in forking and joining multiple threads
      • As each processor updates loop, its value bounces between their L1 caches → multiple bouncing slows performance
      • Also, we will get a race condition, which we discussed in the previous post, where the final value of variable loop will depend on sequence of the execution by the two processors.
    • Due to the race condition, the final value obtained can be as low as 2, instead of the desired 10^12
  • Fixing this requires load-add-store to be executed in a single step → atomic operations needed 

We understand atomic instructions using two examples : the test_and_set() function and compare_and_swap() function.

test_and_set()

  • executed atomically
    • this means that if two test_and_set() functions are executed simultaneously, they will be executed sequentially in some arbitrary order.
  • We can implement mutual exclusion by declaring a boolean variable lock, initialized to false.

Compare_and_swap()

  • executed atomically
  • operates on three operands
  • Mutual exclusion is implemented as follows:
    • declare a global variable lock and initialize it to 0
    • first process to invoke compare_and_swap() → sets lock to 1
    • This process will enter its critical section, since value of lock was equal to the expected value of 0.
    • Future calls to compare_and_swap() will not succeed, since lock is now 1 → no other process can enter their critical section.
    • When the first process exits from critical section, lock is set to false → allows other processes to enter critical section.
  • This solution, though satisfies mutual exclusion, does not guarantee bounded waiting [the same process can execute critical section again and again]

In the next post, we will discuss another algorithm using test_and_set() which satisfies all conditions required by the critical section problem. We will also start an interesting discussion on mutexes. Till then , keep reading and sharing 🙂