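Below I analyze spin locks, compare the performance of a spin lock and a mutex with real code, and implement a spin lock using C++11.

I: Spin locks (spin lock)

A spin lock is a lock that protects resources shared among threads. Unlike an ordinary mutex, a thread trying to acquire a spin lock busy-waits: it loops, repeatedly checking whether the lock has become available.

On a multi-CPU system, for code that holds the lock only briefly, using a spin lock instead of an ordinary mutex can often improve performance.

That last sentence is the important one; this article sets out to verify it.

Here is how the man page describes pthread_spin_lock():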

DESCRIPTION The pthread_spin_lock() function shall lock the spin lock referenced by lock. The calling thread shall acquire the lock if it is not held by another thread. Otherwise, the thread shall spin (that is, shall not return from the pthread_spin_lock() call) until the lock becomes available. The results are undefined if the calling thread holds the lock at the time the call is made. The pthread_spin_trylock() function shall lock the spin lock referenced by lock if it is not held by any thread. Otherwise, the function shall fail. The results are undefined if any of these functions is called with an uninitialized spin lock.

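So the main property of a spin lock is this: while one thread holds it, no other thread can acquire it. If another thread tries to take it with pthread_spin_lock(), that call does not return; the thread keeps spinning until the lock becomes available.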

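Things to keep in mind when using spin locks:

Because a spinning thread does not give up the CPU, the thread holding a spin lock should release it as quickly as possible; otherwise the waiting threads keep spinning and waste CPU time.
A thread holding a spin lock should release it before sleeping so that other threads can acquire it. In kernel code, if code holding a spin lock sleeps, the whole system may hang (explained below).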

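Using any lock costs system resources (memory and CPU time). These costs fall into two categories:

1. Resources needed to set up the lock

2. Resources consumed while a thread is blocked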

POSIX provides the following spin-lock functions, all declared in <pthread.h>.

int pthread_spin_init(pthread_spinlock_t *lock, int pshared);

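Initializes a spin lock. Calling it on a spin lock that is uninitialized or has been destroyed is valid. The function allocates the resources the spin lock needs and puts it in the unlocked state.

About the second parameter, pshared, the man page says: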

If the Thread Process-Shared Synchronization option is supported and the value of pshared is PTHREAD_PROCESS_SHARED, the implementation shall permit the spin lock to be operated upon by any thread that has access to the memory where the spin lock is allocated, even if it is allocated in memory that is shared by multiple processes. If the Thread Process-Shared Synchronization option is supported and the value of pshared is PTHREAD_PROCESS_PRIVATE, or if the option is not supported, the spin lock shall only be operated upon by threads created within the same process as the thread that initialized the spin lock. If threads of differing processes attempt to operate on such a spin lock, the behavior is undefined.

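So if the initializing thread passes PTHREAD_PROCESS_SHARED as the second argument, the spin lock can be used not only by all threads in that thread's process but also by threads in other processes, provided the lock lives in memory those processes share. With PTHREAD_PROCESS_PRIVATE, only threads in the same process may use it. If the process-shared option is not supported, the lock always behaves as PTHREAD_PROCESS_PRIVATE.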
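To make the PTHREAD_PROCESS_SHARED case concrete, here is a minimal sketch, assuming Linux/glibc with process-shared support. The lock and a counter live in an anonymous shared mapping (mmap with MAP_SHARED | MAP_ANONYMOUS), and a parent and its forked child both increment the counter under the lock. The names run_shared_counter and shared_state are hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical layout: lock and counter placed together in shared memory.
struct shared_state {
    pthread_spinlock_t lock;
    long counter;
};

// Parent and forked child each increment the shared counter `iterations`
// times under a PTHREAD_PROCESS_SHARED spin lock; returns the final value.
long run_shared_counter(long iterations) {
    void* mem = mmap(nullptr, sizeof(shared_state), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    assert(mem != MAP_FAILED);
    auto* st = static_cast<shared_state*>(mem);
    st->counter = 0;

    // PTHREAD_PROCESS_SHARED: threads of other processes may use this lock.
    int rc = pthread_spin_init(&st->lock, PTHREAD_PROCESS_SHARED);
    assert(rc == 0);
    (void)rc;

    pid_t pid = fork();
    assert(pid >= 0);

    // Both processes run this loop against the same shared memory.
    for (long i = 0; i < iterations; ++i) {
        pthread_spin_lock(&st->lock);
        ++st->counter;
        pthread_spin_unlock(&st->lock);
    }

    if (pid == 0) _exit(0);       // child: finished its share of the work
    waitpid(pid, nullptr, 0);     // parent: wait for the child to finish

    long result = st->counter;
    pthread_spin_destroy(&st->lock);
    munmap(mem, sizeof(shared_state));
    return result;
}
```

Calling run_shared_counter(100000) should return 200000. Had the lock been initialized with PTHREAD_PROCESS_PRIVATE, using it from the child process like this would be undefined behavior.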

int pthread_spin_destroy(pthread_spinlock_t *lock);

Destroys a spin lock. It behaves like the corresponding mutex function:

The pthread_spin_destroy() function shall destroy the spin lock referenced by lock and release any resources used by the lock. The effect of subsequent use of the lock is undefined until the lock is reinitialized by another call to pthread_spin_init(). The results are undefined if pthread_spin_destroy() is called when a thread holds the lock, or if this function is called with an uninitialized thread spin lock.

Like the mutex destroy function, it also has the following property (which once caused me a lot of trouble):

The result of referring to copies of that object in calls to pthread_spin_destroy(), pthread_spin_lock(), pthread_spin_trylock(), or pthread_spin_unlock() is undefined.
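One practical way to make sure a copy is never used is to wrap the lock in a class that cannot be copied. The sketch below assumes a hypothetical wrapper named posix_spinlock: it calls pthread_spin_init() in its constructor and pthread_spin_destroy() in its destructor, deletes the copy operations, and exposes lock()/unlock() so it works with std::lock_guard. The helper count_with_wrapper is likewise hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <pthread.h>
#include <thread>

// Hypothetical RAII wrapper: owns one pthread_spinlock_t and forbids copies,
// so the pthread_spin_* functions are only ever called on the original object.
class posix_spinlock {
public:
    posix_spinlock() {
        int rc = pthread_spin_init(&lock_, PTHREAD_PROCESS_PRIVATE);
        assert(rc == 0);
        (void)rc;
    }
    ~posix_spinlock() { pthread_spin_destroy(&lock_); }

    posix_spinlock(const posix_spinlock&) = delete;
    posix_spinlock& operator=(const posix_spinlock&) = delete;

    // lock()/unlock() make the class usable with std::lock_guard.
    void lock() { pthread_spin_lock(&lock_); }
    void unlock() { pthread_spin_unlock(&lock_); }

private:
    pthread_spinlock_t lock_;
};

// Two threads share the same lock by reference (never by copy).
long count_with_wrapper(long iterations) {
    posix_spinlock lock;
    long counter = 0;
    auto work = [&] {
        for (long i = 0; i < iterations; ++i) {
            std::lock_guard<posix_spinlock> guard(lock);
            ++counter;
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

Because copying is deleted, an accidental pass-by-value such as a function taking posix_spinlock instead of posix_spinlock& fails to compile instead of silently operating on a copy.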

int pthread_spin_lock(pthread_spinlock_t *lock);

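The locking function. Its behavior was described above, but one point deserves attention: it never fails with EINTR (the EBUSY error in the quote is the one reported by pthread_spin_trylock()):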

EBUSY A thread currently holds the lock. These functions shall not return an error code of [EINTR].

int pthread_spin_trylock(pthread_spinlock_t *lock);

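There is also this function. Instead of spinning, it fails immediately with EBUSY if the lock is held. It is rarely used.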

int pthread_spin_unlock(pthread_spinlock_t *lock);

The unlock function. Calling it from a thread that does not hold the lock, or on a spin lock that is not locked, is undefined behavior.
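The calls above fit together as sketched below (the function name trylock_demo is hypothetical). It walks through the lifecycle init, lock, unlock, destroy, and shows the one behavioral difference of pthread_spin_trylock(): while another thread holds the lock, it returns EBUSY at once instead of spinning. The trylock is issued from a second thread so that the caller is never the current holder.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <thread>
#include <utility>

// Returns {trylock result from a second thread while the lock is held,
//          trylock result on the free lock}.
std::pair<int, int> trylock_demo() {
    pthread_spinlock_t lock;
    int rc = pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
    assert(rc == 0);
    (void)rc;

    pthread_spin_lock(&lock);                   // main thread takes the lock
    int busy_rc = -1;
    std::thread other([&] {
        busy_rc = pthread_spin_trylock(&lock);  // does not spin: fails at once
        if (busy_rc == 0) pthread_spin_unlock(&lock);
    });
    other.join();
    pthread_spin_unlock(&lock);                 // only the owner unlocks

    int free_rc = pthread_spin_trylock(&lock);  // lock is free now: returns 0
    if (free_rc == 0) pthread_spin_unlock(&lock);

    pthread_spin_destroy(&lock);                // no thread holds it any more
    return {busy_rc, free_rc};
}
```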

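II: Differences between spin locks and mutexes

In terms of implementation, a mutex is a sleep-waiting lock. Suppose a dual-core machine runs two threads, A and B, on Core0 and Core1 respectively. If thread A tries to acquire a critical-section lock with pthread_mutex_lock while thread B holds it, thread A blocks: Core0 performs a context switch and puts thread A on a wait queue, so Core0 can run other work (say, a thread C) instead of busy-waiting. A spin lock is different: it is a busy-waiting lock. If thread A requests the lock with pthread_spin_lock, it keeps busy-waiting on Core0, retrying the lock over and over until it gets it.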

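If you read the source of NPTL (Native POSIX Thread Library), glibc's implementation of the pthreads API on Linux (the command "getconf GNU_LIBPTHREAD_VERSION" prints the NPTL version on your system), you will find that when pthread_mutex_lock() fails to acquire the lock, it makes a system call (futex with FUTEX_WAIT) that puts the current thread on the mutex's wait queue. A spin lock, by contrast, can be thought of as a lock operation implemented with inline assembly inside a while(1) loop (as I recall, a paper noted that in the Linux kernel acquiring a spin lock takes only two CPU instructions and releasing it only one). If you are interested, look at the pthreads API implementation in another microkernel, sanos: mutex.c and spinlock.c. Its code differs from NPTL, but it is very simple and easy to follow, which helps a lot in understanding how spin locks and mutexes behave.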
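The "loop around one atomic instruction" idea can be sketched in portable C++11. This is not NPTL's or the kernel's code: as an assumption, std::atomic_flag::test_and_set() stands in for the inline assembly, and the names tas_spinlock and count_with_tas are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Test-and-set spin lock: one atomic read-modify-write inside a loop.
class tas_spinlock {
public:
    void lock() {
        // test_and_set() atomically sets the flag and returns its old value.
        // While the old value is true, another thread holds the lock: spin.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Two threads increment a shared counter under the spin lock.
long count_with_tas(long iterations) {
    tas_spinlock lock;
    long counter = 0;
    auto work = [&] {
        for (long i = 0; i < iterations; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

Since test_and_set() is atomic, exactly one thread can observe the flag change from clear to set; every other thread keeps looping on the CPU, which is the busy-waiting described above.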

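A spin lock needs very few resources to set up. When a thread is blocked, however, it keeps checking whether the lock is available, so a spin lock consumes CPU time for the whole time it is waiting.

A mutex, compared with a spin lock, needs considerably more resources to set up. When a thread blocks, its scheduling state is changed and it is put on a wait queue; when the lock becomes available, the thread is taken off the queue and its scheduling state is changed back before it acquires the lock. While the thread is blocked, however, it consumes no CPU.

Spin locks and mutexes therefore suit different situations: spin locks suit cases where threads block only very briefly, and mutexes suit cases where they may block for a long time.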

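III: Spin locks and Linux kernel process scheduling

Now back to the earlier question: if a critical section may contain code that sleeps, you must not use a spin lock, or you risk deadlock.

So why can code protected by a semaphore sleep, while sleeping under a spin lock can deadlock?

Let's first look at how spin locks are implemented. The basic usage pattern is: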

spin_lock(&mr_lock);
    // critical region
spin_unlock(&mr_lock);

Tracing the implementation of spin_lock(&mr_lock) (the macro chain below is the uniprocessor version from older kernels):

#define spin_lock(lock)   _spin_lock(lock)
#define _spin_lock(lock)  __LOCK(lock)
#define __LOCK(lock) \
    do { preempt_disable(); __acquire(lock); (void)(lock); } while (0)

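Note the preempt_disable() call: it disables preemption (spin_unlock re-enables it). So a region protected by a spin lock runs with preemption disabled, and preemption stays disabled even while a thread is spinning because it could not get the lock. With this in mind, it becomes clear why code protected by a spin lock must not sleep. Suppose the code sleeps in the middle of the protected region: the scheduler runs, and another process may then call the same spin-lock-protected code. That process cannot get the lock, so it spins. Spinning is active and never sleeps, and preemption is disabled, so no further scheduling can happen on this processor. The lock holder never runs again to release the lock, and the result is deadlock.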

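To summarize the behavior of spin locks:

Uniprocessor, non-preemptible kernel: spin locks are compiled away (with one CPU and no preemption, no process switch can occur inside the critical section, so only one process is ever in it and the spin lock would do nothing).
Uniprocessor, preemptible kernel: a spin lock acts only as a switch that disables preemption (with one CPU there is no truly concurrent access to the critical section, so disabling preemption is enough to guarantee exclusive ownership).
Multiprocessor: only here does a spin lock do its full job. In the kernel, spin locks mainly prevent concurrent access to critical sections from multiple processors and prevent races caused by kernel preemption.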

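IV: When preemption happens in Linux

Preemption in Linux comes in two kinds: user preemption and kernel preemption.

User preemption happens:

When returning to user space from a system call
When returning to user space from an interrupt handler

Kernel preemption can happen:

When returning to kernel space from an interrupt handler, if the kernel is preemptible at that point
When kernel code becomes preemptible again (for example, at spin_unlock)
When a task in the kernel explicitly calls schedule(), that is, voluntarily gives up the CPU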

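Basic process scheduling happens after a timer interrupt: if the current process's time slice is used up, it is preempted. We often exploit the fact that kernel preemption can occur on return from an interrupt handler to kernel space in order to make I/O more responsive. For example, when an I/O event occurs, the corresponding interrupt handler runs; if it finds a process waiting for that event, it wakes the waiting process and sets the need_resched flag of the currently running process. When the interrupt handler returns, the scheduler runs, and the process that was waiting for the I/O event (very likely) gets the CPU, so the event is handled relatively quickly (on the order of milliseconds). In other words, when an I/O event occurs, the process handling it preempts the current process, and the system's response time does not depend on the length of the scheduling time slice.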

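V: Measured efficiency of spin_lock vs. mutex

1. Does ++i need a lock?

I wrote two counting programs, one using the POSIX spin lock and one using a mutex. Each starts two threads and uses timestamps to measure how long the increments take.

The spin_lock version starts two threads that increment num concurrently, with the critical section protected by the spin lock. You might ask why ++i (in this article ++i and ++num mean the same thing) needs a lock at all.

i++ obviously needs a lock. The usual picture of i++ is three steps: load i from memory into a register, increment it in the register, and store it back to memory. These three steps can clearly be interrupted, so a lock is needed.

With ++i you might hesitate. Intuition is not enough here, so look at the assembly for i++ and ++i: it is in fact identical, the same three steps in both cases. One figure is enough to show it:

So ++i is not atomic either. On a multi-core machine, several threads may read the same value of i from memory, each add 1, and write back the same result, so i ends up incremented only once. One more point: this assembly also shows that ++i and i++ are equally efficient, but only for built-in POD types. For a class, as anyone who has overloaded operator++ knows, writing i++ instead of ++i in a standalone statement can be noticeably more expensive, because the postfix form has to copy and return the old value. (A bit off topic.)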
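When the shared state is just one integer, an alternative to locking (a different technique from the spin lock discussed here) is to make the increment itself atomic. A minimal sketch, with the hypothetical function name atomic_increment_twice: on std::atomic<int>, ++ is a single atomic read-modify-write (equivalent to fetch_add(1)), so the lost update described above cannot happen. The racy plain-int version is deliberately not shown as runnable code, because a data race on a non-atomic int is undefined behavior in C++.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Two threads each increment an atomic counter `iterations` times.
int atomic_increment_twice(int iterations) {
    std::atomic<int> num{0};
    auto work = [&] {
        for (int i = 0; i < iterations; ++i) {
            ++num;  // one atomic read-modify-write, sequentially consistent
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return num.load();
}
```

atomic_increment_twice(100000) always returns 200000, with no lock at all.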

2. spin_lock code

First, the spin_lock version: two threads each run ++num on the same counter, and the program measures the elapsed time.

#include <cstdint>
#include <iostream>
#include <thread>

#include <pthread.h>
#include <sys/time.h>
#include <unistd.h>

int num = 0;
pthread_spinlock_t spin_lock;

// Current wall-clock time in microseconds
int64_t get_current_timestamp()
{
    struct timeval now = {0, 0};
    gettimeofday(&now, NULL);
    return (int64_t)now.tv_sec * 1000 * 1000 + now.tv_usec;
}

void thread_proc()
{
    for(int i=0; i< 1000000; ++i){
        pthread_spin_lock(&spin_lock);
        ++num;
        pthread_spin_unlock(&spin_lock);
    }
}

int main()
{
    pthread_spin_init(&spin_lock, PTHREAD_PROCESS_PRIVATE); // or PTHREAD_PROCESS_SHARED

    int64_t start = get_current_timestamp();

    std::thread t1(thread_proc), t2(thread_proc);
    t1.join();
    t2.join();

    int64_t end = get_current_timestamp();
    std::cout << "num:" << num << ", cost:" << end - start << std::endl;

    pthread_spin_destroy(&spin_lock);
    return 0;
}

3. mutex code

#include <cstdint>
#include <iostream>
#include <thread>

#include <pthread.h>
#include <sys/time.h>
#include <unistd.h>

int num = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

// Current wall-clock time in microseconds
int64_t get_current_timestamp()
{
    struct timeval now = {0, 0};
    gettimeofday(&now, NULL);
    return (int64_t)now.tv_sec * 1000 * 1000 + now.tv_usec;
}

void thread_proc()
{
    for(int i=0; i< 1000000; ++i){
        pthread_mutex_lock(&mutex);
        ++num;
        pthread_mutex_unlock(&mutex);
    }
}

int main()
{
    int64_t start = get_current_timestamp();

    std::thread t1(thread_proc), t2(thread_proc);
    t1.join();
    t2.join();

    int64_t end = get_current_timestamp();
    std::cout << "num:" << num << ", cost:" << end - start << std::endl;

    pthread_mutex_destroy(&mutex);
    return 0;
}

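4. Analysis of results

The results are shown in the figure: num is the final value, cost is the elapsed time in microseconds (us), and main2 is the spin lock version.

Clearly, when the critical section contains only ++num, the spin lock takes less time. The two can come close, depending on how the CPU schedules the threads, but in this case the spin lock consistently takes less time.

I then changed the critical section in both programs to: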

for(int i=0; i< 1000000; ++i){
    pthread_spin_lock(&spin_lock);
    ++num;
    for(int j=0; j< 100; ++j){
        // do nothing (an optimizing compiler may remove this empty loop)
    }
    pthread_spin_unlock(&spin_lock);
}

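The mutex program got the same extra loop, and the result was completely different:

The result is striking: merely adding a 100-iteration loop inside the critical section made the spin lock take longer than the mutex.

So although a spin lock's lock/unlock is cheaper (it takes very few CPU instructions), it is only suitable when the critical section runs for a very short time. In practice, unless you understand your program's locking behavior well, a spin lock is not a good choice. The safer approach is to use a mutex and consider a spin lock only if you need more performance.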
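To repeat both experiments in one program, here is a sketch of a timing harness, assuming C++11 std::mutex and a minimal std::atomic_flag spin lock as stand-ins for the pthread versions above. run_bench, bench_spinlock and bench_result are hypothetical names. The work parameter adds extra loop rounds inside the critical section; a volatile variable keeps the compiler from removing that loop.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Minimal test-and-set spin lock (hypothetical helper for this benchmark).
class bench_spinlock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

struct bench_result {
    long num;           // final counter value (should be 2 * iterations)
    long long cost_us;  // elapsed wall-clock time in microseconds
};

// Two threads each enter the critical section `iterations` times;
// `work` extra loop rounds run inside it to lengthen the critical section.
template <typename Lock>
bench_result run_bench(long iterations, int work) {
    Lock lock;
    long num = 0;
    volatile long sink = 0;  // volatile: keeps the extra loop from being removed
    auto body = [&] {
        for (long i = 0; i < iterations; ++i) {
            std::lock_guard<Lock> guard(lock);
            ++num;
            for (int j = 0; j < work; ++j) {
                sink = sink + 1;
            }
        }
    };
    auto start = std::chrono::steady_clock::now();
    std::thread t1(body), t2(body);
    t1.join();
    t2.join();
    auto end = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    return {num, static_cast<long long>(us.count())};
}
```

For example, compare run_bench<std::mutex>(1000000, 0) with run_bench<bench_spinlock>(1000000, 0), then repeat both with work = 100. The cost_us values depend on the machine, the number of cores and the compiler flags, so no particular numbers should be expected.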

VI: Implementing a spin lock yourself in C++11

Since the principle should be clear by now, here is the code:

#pragma once

#include <atomic>

class spin_lock {
private:
    std::atomic<bool> flag{false};   // false = unlocked
public:
    spin_lock() = default;
    spin_lock(const spin_lock&) = delete;
    spin_lock& operator=(const spin_lock&) = delete;
    void lock() {   // acquire spin lock
        bool expected = false;
        // A failed compare_exchange stores the current value (true) into
        // expected, so it must be reset before every retry.
        while (!flag.compare_exchange_weak(expected, true, std::memory_order_acquire)) {
            expected = false;
        }
    }
    void unlock() {   // release spin lock
        flag.store(false, std::memory_order_release);
    }
};

The test file (key parts only):

int num = 0;
spin_lock sm; 

void thread_proc()
{
    for(int i=0; i< 10000000; ++i){
        sm.lock();
        ++num;
        sm.unlock();
    }   
}
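For readers who want to run the test end to end, here is a self-contained version. It assumes the spin_lock class with expected reset inside the loop body; the wrapper function run_spin_test is hypothetical, and std::lock_guard works because spin_lock provides lock() and unlock().

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

class spin_lock {
public:
    spin_lock() = default;
    spin_lock(const spin_lock&) = delete;
    spin_lock& operator=(const spin_lock&) = delete;
    void lock() {
        bool expected = false;
        while (!flag.compare_exchange_weak(expected, true,
                                           std::memory_order_acquire)) {
            expected = false;  // the failed CAS wrote the current value (true)
        }
    }
    void unlock() { flag.store(false, std::memory_order_release); }

private:
    std::atomic<bool> flag{false};
};

// Two threads each increment `num` `iterations` times under the spin lock.
int run_spin_test(int iterations) {
    spin_lock sm;
    int num = 0;
    auto thread_proc = [&] {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<spin_lock> guard(sm);  // lock()/unlock() via RAII
            ++num;
        }
    };
    std::thread t1(thread_proc), t2(thread_proc);
    t1.join();
    t2.join();
    return num;
}
```

run_spin_test(10000000) uses the same iteration count as the snippet above and should return 20000000.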

That's it for this summary of spin locks.

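Disclaimer: This article and its images were written by a contributing author or reprinted with authorization from a partner site. The views expressed are the author's own and do not represent the position of 電子發燒友網 (Elecfans). The article and images are provided for engineers' learning only; for infringement or other issues, please contact the site.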