Transactional Memory

info

Definition: Transactional Memory Transactional memory allows a group of statements (memory operations) to be atomic and serializable without using locks. Transactional memory is:

  • Atomic: either all of the transaction's memory operations are performed and become visible, or none of them are. A transaction is committed once its memory operations take effect.
  • Serializable: the global view of memory is consistent with an ordering in which each transaction's operations execute as a group in an interleaved serial execution of the program.

Compared to the LL/SC instructions mentioned in Lecture - Consistency and Synchronization Problems, which are more limited in scope but lightweight and well suited to fine-grained synchronization, transactional memory offers broader support for complex atomic regions and simplifies programming.

A code region wrapped in a transaction appears to execute fully or not at all (atomicity), without being affected by interference from other threads (isolation):

Figure: the same critical section written with a lock (left) vs. as a transaction (right).

From the programmer's perspective, using locks requires explicitly declaring, acquiring, and releasing lock variables. Transactional memory enables atomic regions in parallel software by letting blocks of code execute as transactions without explicit locks, as the sketch below illustrates.
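A minimal sketch of that contrast in C, assuming GCC's language-level TM extension `__transaction_atomic` (compiled with `-fgnu-tm`); the counter and lock names are illustrative, not from the lecture:

```c
/* Sketch: the same critical section with an explicit lock vs. a transactional
 * atomic block. __transaction_atomic is GCC's TM extension (-fgnu-tm), used
 * here only to illustrate "atomic region without an explicit lock". */
#include <pthread.h>

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lock-based version: the programmer names, acquires, and releases the lock. */
void increment_with_lock(void) {
    pthread_mutex_lock(&counter_lock);
    counter = counter + 1;
    pthread_mutex_unlock(&counter_lock);
}

/* Transactional version: the block is declared atomic; the runtime or
 * hardware guarantees all-or-nothing visibility, with no lock variable. */
void increment_with_tm(void) {
    __transaction_atomic {
        counter = counter + 1;
    }
}
```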

Under the programming layer, transactional memory provides the following (see the sketch after this list):

  • Speculative Execution: Transactions execute speculatively, tracking memory reads and writes in special hardware structures (e.g., cache or transactional buffers).
  • Conflict Detection: The hardware monitors for conflicts (e.g., another thread modifies memory accessed by the transaction).
  • Commit or Abort:
    • If no conflicts occur, the transaction commits, making its updates globally visible atomically.
    • If conflicts occur, the transaction aborts, and all changes are discarded, requiring retry or fallback logic.
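A minimal sketch of this commit-or-abort flow using Intel's RTM intrinsics (`_xbegin`/`_xend`/`_xabort` from `<immintrin.h>`, x86 TSX, compiled with `-mrtm`); the retry count and the test-and-set fallback lock are illustrative assumptions, not part of the lecture:

```c
#include <immintrin.h>   /* _xbegin, _xend, _xabort (x86 TSX; compile with -mrtm) */
#include <stdatomic.h>

/* Illustrative fallback lock: a simple test-and-set spinlock, so the
 * transactional path can read its state. */
static atomic_int fallback_lock = 0;

static void lock_fallback(void)   { while (atomic_exchange(&fallback_lock, 1)) ; }
static void unlock_fallback(void) { atomic_store(&fallback_lock, 0); }

void atomic_increment(long *x) {
    for (int attempt = 0; attempt < 3; attempt++) {
        unsigned status = _xbegin();              /* start speculative execution */
        if (status == _XBEGIN_STARTED) {
            if (atomic_load(&fallback_lock))      /* someone is on the lock path: */
                _xabort(0xff);                    /*   abort to stay serializable */
            *x += 1;                              /* tracked in the cache / write set */
            _xend();                              /* no conflict: commit atomically */
            return;
        }
        /* Abort: all speculative changes were discarded; retry. */
    }
    lock_fallback();                              /* fallback after repeated aborts */
    *x += 1;
    unlock_fallback();
}
```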

HTM simplifies programming by avoiding manual lock management and reducing overhead for uncontended regions, but limited hardware resources and conflict-detection capacity can cause transactions to abort. A comparison between HTM and LL/SC:

| Aspect | Transactional Memory (HTM) | LL/SC |
| --- | --- | --- |
| Scope | Supports multiple memory locations within a transaction. | Operates on a single memory location. |
| Granularity | Detects conflicts for large memory regions using cache-line tracking. | Limited to a single address with explicit Load/Store operations. |
| Conflict Handling | Detects and handles conflicts automatically. | Relies on retries when the store-conditional (SC) fails. |
| Programming Model | High-level abstraction for atomic regions (e.g., transactional blocks). | Requires manual loops and retries around LL/SC operations. |
| Ease of Use | Easier for programmers; reduces explicit locking. | More low-level, requiring careful management. |
| Hardware Requirements | Requires dedicated transactional buffers or cache tracking. | Relatively simpler hardware support. |
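For contrast, the "manual loop and retry" style the table attributes to LL/SC looks like the sketch below. A C11 compare-and-swap is used as a portable stand-in for LL/SC (on ARM or RISC-V, compilers typically lower it to an LL/SC loop), and only a single memory location is updated:

```c
#include <stdatomic.h>

/* Retry loop on a single location, in the style of LL/SC-based code. */
void fetch_and_add(atomic_long *counter, long delta) {
    long old = atomic_load(counter);
    /* Keep retrying until the conditional update succeeds; on failure,
     * 'old' is reloaded with the current value of *counter. */
    while (!atomic_compare_exchange_weak(counter, &old, old + delta)) {
        /* retry */
    }
}
```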

The implementation of transactional memory can be either:

  • Software Transactional Memory (STM)
  • Hardware Transactional Memory (HTM)

Granularity of Consistency

Transactional memory provides a different perspective on atomicity and a different view of granularity.

Granularity with Locks

| Granularity | Parallelism exposed |
| --- | --- |
| Coarse-grained lock | Does not expose parallelism at all; all threads get stuck waiting for each other. |
| Fine-grained lock 1 (per-histogram lock) | Exposes a little parallelism between operations on different histograms. |
| Fine-grained lock 2 (per-column lock) | Exposes a lot of parallelism by allowing simultaneous writes to different columns of each histogram. |
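A sketch of these three lock granularities for an array of histograms; the array sizes, names, and pthread mutexes are illustrative assumptions, not taken from the lecture:

```c
#include <pthread.h>

#define NUM_HISTOGRAMS 8
#define NUM_BINS       64

static int histogram[NUM_HISTOGRAMS][NUM_BINS];

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;   /* coarse-grained */
static pthread_mutex_t histogram_lock[NUM_HISTOGRAMS];            /* per-histogram  */
static pthread_mutex_t bin_lock[NUM_HISTOGRAMS][NUM_BINS];        /* per-column/bin */

/* Initialize the per-histogram and per-bin mutexes once at startup. */
void init_locks(void) {
    for (int h = 0; h < NUM_HISTOGRAMS; h++) {
        pthread_mutex_init(&histogram_lock[h], NULL);
        for (int b = 0; b < NUM_BINS; b++)
            pthread_mutex_init(&bin_lock[h][b], NULL);
    }
}

/* Coarse-grained: every update serializes on one lock; no parallelism. */
void add_coarse(int h, int bin) {
    pthread_mutex_lock(&global_lock);
    histogram[h][bin]++;
    pthread_mutex_unlock(&global_lock);
}

/* Per-histogram lock: threads updating different histograms proceed in parallel. */
void add_per_histogram(int h, int bin) {
    pthread_mutex_lock(&histogram_lock[h]);
    histogram[h][bin]++;
    pthread_mutex_unlock(&histogram_lock[h]);
}

/* Per-column (per-bin) lock: threads only serialize when they hit the same bin. */
void add_per_bin(int h, int bin) {
    pthread_mutex_lock(&bin_lock[h][bin]);
    histogram[h][bin]++;
    pthread_mutex_unlock(&bin_lock[h][bin]);
}
```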

Granularity with Transactional Memory

Figure: a coarse-grained transaction vs. a fine-grained transaction. The code shown is fetch-and-increment, but a transaction is more general.
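A corresponding sketch of coarse- vs. fine-grained transactions over the same histogram data, again using GCC's `__transaction_atomic` extension (`-fgnu-tm`) purely as an illustration; sizes and names are illustrative assumptions:

```c
#define NUM_HISTOGRAMS 8
#define NUM_BINS       64

static int histogram[NUM_HISTOGRAMS][NUM_BINS];

/* Coarse-grained: one transaction around the whole pass; a concurrent update
 * to any touched bin conflicts with the entire transaction. */
void add_samples_coarse(int h, const int *bins, int n) {
    __transaction_atomic {
        for (int i = 0; i < n; i++)
            histogram[h][bins[i]]++;
    }
}

/* Fine-grained: one small transaction per bin (essentially fetch-and-increment);
 * two threads only conflict if they increment the same bin at the same time. */
void add_samples_fine(int h, const int *bins, int n) {
    for (int i = 0; i < n; i++) {
        __transaction_atomic {
            histogram[h][bins[i]]++;
        }
    }
}
```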

Hardware Implementation

Instead of acquiring a lock, the hardware tracks memory reads and writes in the cache (or in transactional buffers). Hardware transactional memory typically marks an address as being read or written by adding a read bit and a write bit per tracked address (e.g., per cache line).
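A conceptual software model (not real hardware) of a cache line extended with such read/write bits; all field names are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE 64

struct cache_line {
    uint64_t tag;              /* address tag                                   */
    uint8_t  data[LINE_SIZE];  /* cached data (speculative if tx_write is set)  */
    uint8_t  mesi_state;       /* M/E/S/I coherence state                       */
    bool     tx_read;          /* set if read inside the current transaction    */
    bool     tx_write;         /* set if written inside the current transaction */
};

/* Called for every load/store issued inside a transaction: records the line
 * in the transaction's read set or write set. */
static void track_transactional_access(struct cache_line *line, bool is_store) {
    if (is_store)
        line->tx_write = true;   /* W bit: line joins the write set */
    else
        line->tx_read = true;    /* R bit: line joins the read set  */
}
```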

note

Example: one possible implementation in hardware (figure omitted).

Conflict Detection and Aborting

Conflict detection is naturally achieved via the cache coherence protocol. In the example above:

  • Invalidation for a line in the read set = conflict
  • Intervention/invalidation for a line in the write set = conflict

On detecting a conflict, there are two strategies:

  • eager: the conflict is detected as soon as the conflicting remote request happens
  • lazy: the conflict is only detected when a transaction commits

The policy for responding to a conflict is usually one of:

  • requester wins: Receiver of conflicting request provides data and aborts.
  • requester stalls: Receiver either defers (buffers request) or NACKs (requester retries). Requester only aborts if there's a possibility of deadlock.
  • committer wins: Whoever commits first wins. This is compatible with lazy conflict detection.

On committing and aborting transactions:

  • On aborting a transaction, the hardware can (a sketch follows this list):
    • simply transition all corresponding cache lines in the M state to the I state, which discards all speculative writes to any address;
    • clear the R/W bits;
    • signal ABORT to the processor, which must roll back execution.
  • On retry, the new transaction will load non-speculative data as needed.
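A conceptual sketch of this abort path, reusing a minimal version of the cache-line model from the earlier sketch; the MESI constants and names are illustrative, not a real implementation:

```c
#include <stdbool.h>
#include <stdint.h>

enum mesi_state { MESI_M, MESI_E, MESI_S, MESI_I };

struct cache_line {
    uint8_t mesi_state;   /* M/E/S/I coherence state            */
    bool    tx_read;      /* in the transaction's read set (R)  */
    bool    tx_write;     /* in the transaction's write set (W) */
};

/* Abort: discard speculative writes by invalidating M-state lines in the
 * write set, clear all R/W bits, then signal the processor to roll back. */
static void abort_transaction(struct cache_line *lines, int n) {
    for (int i = 0; i < n; i++) {
        if (lines[i].tx_write && lines[i].mesi_state == MESI_M)
            lines[i].mesi_state = MESI_I;   /* M -> I: speculative data dropped */
        lines[i].tx_read  = false;          /* clear R bit */
        lines[i].tx_write = false;          /* clear W bit */
    }
    /* Signal ABORT to the processor, which rolls back and may retry; the
     * retried transaction reloads non-speculative data on demand. */
}
```
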
note

Consider two threads performing operations on a shared variable x using a lock.

Pseudo-Code:

Thread 1:                  Thread 2:
lock(L);                   lock(L);
x = x + 1;                 x = x + 2;
unlock(L);                 unlock(L);
  1. Start Transaction: Both threads detect the lock operation and treat it as a transactional start.
  2. Speculative Execution:
    • Each thread executes the critical section speculatively.
    • Hardware tracks memory operations (x and L) in the cache.
  3. Conflict Detection:
    • If Thread 1 writes x while Thread 2 is also accessing x, the cache coherence protocol detects the conflict.
  4. Conflict Management:
    • Abort: One thread (e.g., Thread 2) aborts its transaction.
    • Retry: The aborted thread retries, either re-entering the speculative region or falling back to the lock.

The above example is typical of Speculative Lock Elision (SLE).

Speculative Lock Elision (SLE)

The lock is treated as a "version number." Instead of acquiring the lock, SLE records a snapshot of memory locations accessed during the critical section. This avoids modifying the lock's state in memory, which reduces contention.

The hardware monitors memory accesses (via cache coherence mechanisms). If another thread modifies a memory location accessed speculatively, a conflict is detected.

On a conflict, SLE aborts the speculative execution, rolls back changes, and retries execution by falling back to the traditional locking mechanism.
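A sketch of the lock-elision idea, using Intel RTM intrinsics as a software stand-in (real SLE is transparent to software and needs no code changes); the spinlock type and single-attempt policy are illustrative assumptions:

```c
#include <immintrin.h>   /* _xbegin, _xend, _xabort; compile with -mrtm */
#include <stdatomic.h>

typedef struct { atomic_int held; } spinlock_t;

void elided_lock(spinlock_t *l) {
    if (_xbegin() == _XBEGIN_STARTED) {
        if (!atomic_load(&l->held))    /* lock looks free: elide it; the read   */
            return;                    /*   puts it in our read set as a        */
                                       /*   "version number" to monitor         */
        _xabort(0x01);                 /* lock actually held: give up speculation */
    }
    /* Speculation failed or aborted: fall back to really acquiring the lock.
     * Writing 'held' also aborts any thread still eliding this lock. */
    while (atomic_exchange(&l->held, 1))
        ;
}

void elided_unlock(spinlock_t *l) {
    if (!atomic_load(&l->held))
        _xend();                       /* we elided the lock: commit the transaction */
    else
        atomic_store(&l->held, 0);     /* we really held the lock: release it */
}
```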

Transactional Lock Removal (TLR)

TLR uses transactional memory to maintain a private "working copy" of the data modified during the critical section. The actual lock is ignored during speculative execution.

Similar to SLE, TLR uses hardware mechanisms to detect conflicts through cache coherence protocols. Any write-set or read-set conflict triggers detection.

On a conflict, TLR aborts the transaction and retries execution, potentially reverting to locking. It ensures that only one thread makes forward progress in case of contention.

How SLE and TLR Address Bottlenecks

  • Avoid Overhead: By avoiding actual lock acquisition, both techniques reduce memory contention and cache invalidation associated with locks.
  • Conflict Resolution: They rely on hardware's ability to detect conflicts at fine granularity, leading to more efficient resolution than traditional lock contention.
  • Scalability: Both improve scalability in multi-core systems by reducing the serialization imposed by locks.

| Feature | Speculative Lock Elision (SLE) | Transactional Lock Removal (TLR) | Locks |
| --- | --- | --- | --- |
| Lock Interpretation | Ignores lock but speculates mutual exclusion | Ignores lock entirely; full transactional semantics | Strictly acquires and releases lock to ensure mutual exclusion |
| Fallback Mechanism | Acquires the lock on abort | Retries transaction or falls back to lock | - |
| Assumptions | Assumes lock is not critical for logic | Assumes critical section can complete without conflict | Assumes critical section needs strict serialized access |
| Conflict Detection | Hardware-monitored via cache coherence | Hardware-monitored via cache coherence | No explicit conflict detection; contention leads to blocking or spinning |
| Conflict Management | Aborts and retries with locking | Aborts and retries transaction or locks | Threads wait |
| Version Control | Implicit (speculative snapshots of memory) | Maintained via transactional memory | Explicit |
| Performance Overhead | Low unless conflicts occur frequently | Low unless conflicts occur frequently | High under contention (due to lock contention and thread blocking) |
| Scalability | High for low-conflict scenarios | High for low-conflict scenarios | Poor scalability under high contention |