
Tsecer's Echo Island

Tsecer's Blog

 
 
 


 
 

The i386 Spinlock and Atomic Operations on PPC

2011-09-18 02:17:52 | Category: Linux kernel


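I. The Questions

These two questions are not closely related. They share a post only because both are small details that do not justify two separate articles, so for now they are squeezed together here.

①. The 386 spinlock

The spinlock in the i386 kernel code contains a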

rep;nop;

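instruction. To anyone familiar with 386 assembly this looks rather odd, and since the spinlock is one of the kernel's important synchronization primitives, it is worth getting to the bottom of it.

②. Atomic operations on PowerPC

The futex mechanism in newer kernels relies on an atomic compare-and-exchange instruction executed in user mode (cmpxchg on x86). It checks whether the value at a memory location mem equals Val; if so, it replaces the value with NewVal; in either case it returns the original value. This is a fairly complex operation. The i386 architecture (from the 486 onward) provides it as a single instruction, so the implementation there is simple. What remains is to see how a RISC machine such as PowerPC implements it.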
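The compare-and-exchange semantics described above can be sketched in portable C11. This is only an illustration of the semantics, not the kernel's or any C library's futex code; the helper name `cmpxchg_val` is hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical helper: if *mem == val, atomically store newval.
 * In either case, return the value *mem held before the call. */
static int cmpxchg_val(atomic_int *mem, int val, int newval)
{
    int expected = val;
    /* On failure, expected is overwritten with the current value of *mem.
     * On success it still holds val, which was the previous value. */
    atomic_compare_exchange_strong(mem, &expected, newval);
    return expected;
}
```

A caller can tell whether the exchange happened by comparing the return value with the `val` it passed in.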

II. The 386 spinlock

linux-2.6.21\include\asm-i386\spinlock.h

static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
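 asm volatile("\n1:\t"
       LOCK_PREFIX " ; decb %0\n\t"   /* atomically decrement the lock byte */
       "jns 3f\n"                      /* result not negative: lock acquired */
       "2:\t"
       "rep;nop\n\t"                   /* this is the famous rep;nop instruction */
       "cmpb $0,%0\n\t"
       "jle 2b\n\t"                    /* still held: keep spinning */
       "jmp 1b\n"                      /* looks free: retry the decrement */
       "3:\n\t"
       : "+m" (lock->slock) : : "memory");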
}
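The same algorithm can be sketched in C11 to make the control flow easier to follow. This is a rough sketch, not the kernel's code: the names `sketch_spinlock_t`, `sketch_cpu_relax`, `sketch_spin_lock` and `sketch_spin_unlock` are hypothetical, and the unlock side assumes that releasing the lock simply stores 1 back into the lock byte.

```c
#include <stdatomic.h>

/* Hypothetical lock type: 1 means free, 0 or negative means held. */
typedef struct { atomic_schar slock; } sketch_spinlock_t;

static inline void sketch_cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" ::: "memory");  /* assembles to F3 90 */
#else
    atomic_signal_fence(memory_order_seq_cst);      /* compiler barrier only */
#endif
}

static void sketch_spin_lock(sketch_spinlock_t *lock)
{
    for (;;) {
        /* "lock; decb" + "jns 3f": ours if the result is not negative. */
        if (atomic_fetch_sub(&lock->slock, 1) - 1 >= 0)
            return;
        /* Label 2 + "jle 2b": spin with plain reads until the value is positive. */
        while (atomic_load(&lock->slock) <= 0)
            sketch_cpu_relax();
        /* "jmp 1b": go back and try the atomic decrement again. */
    }
}

static void sketch_spin_unlock(sketch_spinlock_t *lock)
{
    atomic_store(&lock->slock, 1);  /* assumed release: mark the lock free */
}
```

Spinning on a plain read rather than on the locked decrement keeps waiters from issuing bus-locked writes on every iteration. The atomic decrement is retried only once the lock looks free.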

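For this instruction I found a professional explanation on the ever-helpful www.stackoverflow.com. I call it professional because it cites the documentation and the instruction reference, with nothing vague about it.

Two articles explain the question. One is a Chinese write-up: http://www.groad.net/bbs/read.php?tid-3373.html

The other is the explanation on Stack Overflow: http://stackoverflow.com/questions/7086220/what-does-rep-nop-mean-in-x86-assembly

The Intel developer's manual description of the instruction is quoted further down. Roughly: PAUSE was introduced with the Pentium 4 to give the processor a spin-loop hint. It tells the processor that the code is looping while waiting for an event, so aggressive pipelining and instruction reordering are wasted effort here; without the hint they achieve nothing except raising the CPU temperature. The manual also says the encoding stays compatible on earlier processors that do not recognize PAUSE, where it is exactly equivalent to a single nop. Seen that way the trick is a little cheeky: on those processors, putting rep in front of nop does nothing at all, so it was a false alarm.

The remaining question is why rep nop produces the machine code for pause.

Searching the GNU Binutils source for rep turns up the definition of this instruction: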

binutils-2.21.1\opcodes\i386-tbl.h

  { "rep", 0, 0xf3, None, 1,
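In other words, the machine code byte for rep is 0xf3. Next, look at the machine code for nop:

Opcode   Instruction   64-Bit Mode   Compat/Leg Mode   Description
90       NOP           Valid         Valid             One byte no-operation instruction.

So what the assembler emits for rep;nop is F3 90, which is exactly the machine code for pause.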
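The byte-level argument can be written down as a tiny C sketch. The helper names `encode_rep_nop` and `is_pause` are hypothetical and exist only to show that the rep prefix byte followed by the nop opcode byte is the two-byte sequence documented for PAUSE.

```c
#include <stddef.h>

enum { REP_PREFIX = 0xF3, NOP_OPCODE = 0x90 };

/* Hypothetical helper: writes the bytes an assembler emits for "rep; nop"
 * into out[0..1] and returns the number of bytes written. */
static size_t encode_rep_nop(unsigned char out[2])
{
    out[0] = REP_PREFIX;   /* prefix byte for rep */
    out[1] = NOP_OPCODE;   /* one-byte nop */
    return 2;
}

/* Returns 1 if the two bytes at code are the PAUSE encoding F3 90. */
static int is_pause(const unsigned char *code)
{
    return code[0] == 0xF3 && code[1] == 0x90;
}
```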

PAUSE—Spin Loop Hint
Opcode   Instruction   64-Bit Mode   Compat/Leg Mode   Description
F3 90    PAUSE         Valid         Valid             Gives hint to processor that improves performance of spin-wait loops.

Description
Improves the performance of spin-wait loops. When executing a “spin-wait loop,” a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.
An additional function of the PAUSE instruction is to reduce the power consumed by a Pentium 4 processor while executing a spin loop. The Pentium 4 processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop greatly reduces the processor’s power consumption.
This instruction was introduced in the Pentium 4 processors, but is backward compatible with all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP instruction. The Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a pre-defined delay. The delay is finite and can be zero for some processors. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying no-op operation).
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.

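III. Atomic Operations on PowerPC

PowerPC has no single powerful cmpxchg-style instruction, but it does have the lwarx instruction and its counterpart stwcx., and these require hardware support. Roughly, the principle is this. With lwarx the program tells the processor: "I want to reserve this memory address; stand guard for me and snoop the bus, and if anyone modifies this location, let me know afterwards." The notification arrives through stwcx.: if the reservation has been lost, the store is not performed and the failure is recorded in condition register field CR0, so the program simply tries again until nobody has modified the location in between.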
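The load, attempt-store, retry pattern can be sketched in C11 with a weak compare-exchange, which is allowed to fail spuriously in the same way a conditional store fails after a lost reservation. This is a sketch under assumptions: the helper name `fetch_add_retry` is hypothetical, and the expectation that a PowerPC compiler lowers the loop to lwarx/stwcx. is an assumption about the toolchain, not something stated in the source.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically add delta to *mem and return the old
 * value, using a load / attempt-store / retry loop. */
static int fetch_add_retry(atomic_int *mem, int delta)
{
    int old = atomic_load(mem);            /* plays the role of lwarx */
    /* The weak form may fail even when *mem == old, much as stwcx. fails
     * after a lost reservation; on failure old is refreshed from *mem. */
    while (!atomic_compare_exchange_weak(mem, &old, old + delta))
        ;                                  /* retry, like "bne- loop" */
    return old;
}
```

The new value `old + delta` is recomputed on every iteration, so a retry always works from the value that was actually observed last.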

For reference, here is the description of the instruction from IBM: http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.aixassem%2Fdoc%2Falangref%2Flwarx.htm

lwarx (Load Word and Reserve Indexed) Instruction

Purpose

Used in conjunction with a subsequent stwcx. instruction to emulate a read-modify-write operation on a specified memory location.

Note: The lwarx instruction is supported only in the PowerPC® architecture.

Syntax

Bits Value
0-5 31
6-10 RT
11-15 RA
16-20 RB
21-30 20
31 /
PowerPC®
lwarx RT, RA, RB
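The field table can be turned into a small encoder to check the layout. This is a sketch with hypothetical helper names (`ppc_field`, `encode_lwarx`). It assumes PowerPC bit numbering, in which bit 0 is the most significant bit of the 32-bit instruction word, and it leaves the reserved bit 31 as zero.

```c
#include <stdint.h>

/* Hypothetical helper: place value so that its least significant bit
 * lands on PowerPC bit number last_bit (bit 0 is the most significant). */
static uint32_t ppc_field(uint32_t value, unsigned last_bit)
{
    return value << (31u - last_bit);
}

/* Encode "lwarx RT, RA, RB" from the field table: primary opcode 31 in
 * bits 0-5, extended opcode 20 in bits 21-30, reserved bit 31 left zero. */
static uint32_t encode_lwarx(unsigned rt, unsigned ra, unsigned rb)
{
    return ppc_field(31, 5)
         | ppc_field(rt & 31u, 10)
         | ppc_field(ra & 31u, 15)
         | ppc_field(rb & 31u, 20)
         | ppc_field(20, 30);
}
```

With these definitions, `encode_lwarx(5, 0, 3)` yields 0x7CA01828, the instruction word for `lwarx r5,0,r3`.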

Description

The lwarx and stwcx. instructions are primitive, or simple, instructions used to perform a read-modify-write operation to storage. If the store is performed, the use of the lwarx and stwcx. instructions ensures that no other processor or mechanism has modified the target memory location between the time the lwarx instruction is executed and the time the stwcx. instruction completes.

If general-purpose register (GPR) RA = 0, the effective address (EA) is the content of GPR RB. Otherwise, the EA is the sum of the content of GPR RA plus the content of GPR RB.

The lwarx instruction loads the word from the location in storage specified by the EA into the target GPR RT. In addition, a reservation on the memory location is created for use by a subsequent stwcx. instruction.

The lwarx instruction has one syntax form and does not affect the Fixed-Point Exception Register. If the EA is not a multiple of 4, the results are boundedly undefined.
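The effective-address rule and the alignment requirement can be modeled in a few lines of C. This is an illustrative model only: the register file type `ppc_regs_t` and both function names are hypothetical, and 32-bit registers are assumed.

```c
#include <stdint.h>

/* Hypothetical model of the 32 general-purpose registers (32-bit assumed). */
typedef struct { uint32_t gpr[32]; } ppc_regs_t;

/* EA = GPR[RB] when RA is 0, otherwise GPR[RA] + GPR[RB]. */
static uint32_t lwarx_effective_address(const ppc_regs_t *regs,
                                        unsigned ra, unsigned rb)
{
    uint32_t base = (ra == 0) ? 0 : regs->gpr[ra & 31u];
    return base + regs->gpr[rb & 31u];
}

/* lwarx expects a word-aligned EA; anything else is boundedly undefined. */
static int lwarx_ea_is_aligned(uint32_t ea)
{
    return (ea & 3u) == 0;
}
```

Note that RA = 0 selects the constant zero rather than the content of GPR 0, which is why the examples can write `lwarx r5,0,r3` to address memory through r3 alone.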

Parameters

RT Specifies target general-purpose register where result of operation is stored.
RA Specifies source general-purpose register for EA calculation.
RB Specifies source general-purpose register for EA calculation.

Examples

  1. The following code performs a "Fetch and Store" by atomically loading and replacing a word in storage:

    # Assume that GPR 4 contains the new value to be stored.
    # Assume that GPR 3 contains the address of the word
    # to be loaded and replaced.
    loop:   lwarx   r5,0,r3          # Load and reserve
            stwcx.  r4,0,r3          # Store new value if still
                                     # reserved
            bne-    loop             # Loop if lost reservation
    # The new value is now in storage.
    # The old value is returned to GPR 5.

  2. The following code performs a "Compare and Swap" by atomically comparing a value in a register with a word in storage:

    # Assume that GPR 5 contains the new value to be stored after
    # a successful match.
    # Assume that GPR 3 contains the address of the word
    # to be tested.
    # Assume that GPR 4 contains the value to be compared against
    # the value in memory.
    loop:   lwarx   r6,0,r3          # Load and reserve
            cmpw    r4,r6            # Are the first two operands
                                     # equal?
            bne-    exit             # Skip if not equal
            stwcx.  r5,0,r3          # Store new value if still
                                     # reserved
            bne-    loop             # Loop if lost reservation
    exit:   mr      r4,r6            # Return value from storage
    # The old value is returned to GPR 4.
    # If a match was made, storage contains the new value.

    If the value in the register equals the word in storage, the value from a second register is stored in the word in storage. If they are unequal, the word from storage is loaded into the first register and the EQ bit of the Condition Register field 0 is set to indicate the result of the comparison.
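The "Compare and Swap" sequence can be wrapped in a C function with GCC-style inline assembly. This is a sketch, not any particular kernel's or library's implementation: the name `sketch_cmpxchg_u32` is hypothetical, the non-PowerPC branch falls back to the GCC/Clang `__sync_val_compare_and_swap` builtin purely so the example builds elsewhere, and the PowerPC branch contains no memory barriers, so it provides atomicity but no ordering guarantees.

```c
#include <stdint.h>

/* Hypothetical helper: if *p == old, store new_val; return the value read. */
static uint32_t sketch_cmpxchg_u32(volatile uint32_t *p,
                                   uint32_t old, uint32_t new_val)
{
#if defined(__powerpc__) || defined(__PPC__)
    uint32_t prev;
    __asm__ __volatile__(
        "1: lwarx   %0,0,%2\n"   /* load and reserve */
        "   cmpw    0,%0,%3\n"   /* compare with the expected value */
        "   bne-    2f\n"        /* skip the store if not equal */
        "   stwcx.  %4,0,%2\n"   /* store new value if still reserved */
        "   bne-    1b\n"        /* loop if the reservation was lost */
        "2:"
        : "=&r" (prev), "+m" (*p)
        : "r" (p), "r" (old), "r" (new_val)
        : "cc", "memory");
    return prev;
#else
    /* Assumption: on other targets, use the compiler builtin instead. */
    return __sync_val_compare_and_swap(p, old, new_val);
#endif
}
```

As with the assembly example, the caller learns whether the swap happened by comparing the returned value with `old`.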

stwcx. (Store Word Conditional Indexed) Instruction

Purpose

Used in conjunction with a preceding lwarx instruction to emulate a read-modify-write operation on a specified memory location.

Syntax

Bits Value
0-5 31
6-10 RS
11-15 RA
16-20 RB
21-30 150
31 1

stwcx. RS, RA, RB

Description

The stwcx. and lwarx instructions are primitive, or simple, instructions used to perform a read-modify-write operation to storage. If the store is performed, the use of the stwcx. and lwarx instructions ensures that no other processor or mechanism has modified the target memory location between the time the lwarx instruction is executed and the time the stwcx. instruction completes.

Related Information

lwarx (Load Word and Reserve Indexed) instruction.
