Tsecer's Echo Island

Tsecer's Blog

 
 
 


 
 

Why older versions of gdb misidentify C++ scopes

2016-11-11 21:49:14 | Category: gdb source code analysis

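I. The problem
Setting breakpoints and inspecting variables are routine steps when debugging with gdb. With some older gdb releases, such as the 6.6 I am using now, names inside a C++ namespace or class often fail to be recognized. Below is a reduced reproduction: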
tsecer@harry: cat gdbnamespace.cpp 
#include <stdio.h>

namespace tsecer
{

template<class C> 
struct T
{
static C& inst()
{
    static C st;
    return st;
}
};

struct S 
{
int i;
void show()
{
    printf("in singleton\n");
}
};

};

using namespace tsecer;

int main()
{
    T<S>::inst().show();
   return 0; 
}
tsecer@harry: g++ gdbnamespace.cpp -g
tsecer@harry: gdb ./a.out 
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...
Using host libthread_db library "/lib64/libthread_db.so.1".
(gdb) b main
Breakpoint 1 at 0x4005ac: file gdbnamespace.cpp, line 31.
(gdb) r
Starting program: /data/harry/work/gdbnamespace/a.out 
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff22aab000
[Thread debugging using libthread_db enabled]
[New Thread 140188735997664 (LWP 7644)]
[Switching to Thread 140188735997664 (LWP 7644)]

Breakpoint 1, main () at gdbnamespace.cpp:31
31          T<S>::inst().show();
(gdb) p tsecer::T<tsecer::S>::inst()
$1 = (tsecer::S &) @0x500a84: {i = 0}
(gdb) p *(tsecer::S *)0x500a84
A syntax error in expression, near `)0x500a84'.
(gdb) p *('tsecer::S' *)0x500a84
$2 = {i = 0}
(gdb) 

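A few things stand out here:
1. A fairly complex template expression is recognized: p tsecer::T<tsecer::S>::inst().
2. A comparatively simple type is not: p *(tsecer::S *)0x500a84.

I actually ran into the second problem first. The first one turned up while I was reading the gdb source, and it is interesting in its own right.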
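II. How gdb parses expressions
gdb's C/C++ expression parsing lives mainly in gdb-6.6\gdb\c-exp.y; the C source generated from it is c-exp.c in the same directory. In c-exp.y the author left a long comment about exactly this problem (I am pasting it here, partly to pad the word count):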
%type <voidval> exp exp1 type_exp start variable qualified_name lcurly
%type <lval> rcurly
%type <tval> type typebase qualified_type
……
/* FIXME: carlton/2003-09-25: This next bit leads to lots of
   reduce-reduce conflicts, because the parser doesn't know whether or
   not to use qualified_name or qualified_type: the rules are
   identical.  If the parser is parsing 'A::B::x', then, when it sees
   the second '::', it knows that the expression to the left of it has
   to be a type, so it uses qualified_type.  But if it is parsing just
   'A::B', then it doesn't have any way of knowing which rule to use,
   so there's a reduce-reduce conflict; it picks qualified_name, since
   that occurs earlier in this file than qualified_type.

   There's no good way to fix this with the grammar as it stands; as
   far as I can tell, some of the problems arise from ambiguities that
   GDB introduces ('start' can be either an expression or a type), but
   some of it is inherent to the nature of C++ (you want to treat the
   input "(FOO)" fairly differently depending on whether FOO is an
   expression or a type, and if FOO is a complex expression, this can
   be hard to determine at the right time).  Fortunately, it works
   pretty well in most cases.  For example, if you do 'ptype A::B',
   where A::B is a nested type, then the parser will mistakenly
   misidentify it as an expression; but evaluate_subexp will get
   called with 'noside' set to EVAL_AVOID_SIDE_EFFECTS, and everything
   will work out anyways.  But there are situations where the parser
   will get confused: the most common one that I've run into is when
   you want to do

     print *((A::B *) x)"

   where the parser doesn't realize that A::B has to be a type until
   it hits the first right paren, at which point it's too late.  (The
   workaround is to type "print *(('A::B' *) x)" instead.)  (And
   another solution is to fix our symbol-handling code so that the
   user never wants to type something like that in the first place,
   because we get all the types right without the user's help!)

   Perhaps we could fix this by making the lexer smarter.  Some of
   this functionality used to be in the lexer, but in a way that
   worked even less well than the current solution: that attempt
   involved having the parser sometimes handle '::' and having the
   lexer sometimes handle it, and without a clear division of
   responsibility, it quickly degenerated into a big mess.  Probably
   the eventual correct solution will give more of a role to the lexer
   (ideally via code that is shared between the lexer and
   decode_line_1), but I'm not holding my breath waiting for somebody
   to get around to cleaning this up...  */

qualified_type: typebase COLONCOLON name

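Here the author describes the parser's dilemma, which looks like the same one C++ compilers face with templates (and the reason C++ has the typename keyword): while parsing an expression, should a bare A::B be treated as a variable or as a type? For example, given A::B::x, we can see that A::B must be a type (a variable there would be a syntax error). But when the parser has seen only A::B, it cannot tell whether it is a type or a variable. With the grammar as written it reduces to qualified_name, that is, it treats A::B as a variable, because the qualified_name rule appears earlier in the grammar file than qualified_type. Bison resolves a reduce/reduce conflict in favor of the rule that appears first in the grammar; the %type declarations shown above only assign semantic value types and play no part in this choice.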
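To make the comparison with typename concrete, here is a minimal C++ sketch of the same ambiguity on the compiler side. The types HasType and HasValue and the functions below are hypothetical names invented for this illustration; none of them come from gdb.

```cpp
#include <cassert>

// Hypothetical types for illustration only.
struct HasType  { typedef int B; };           // here A::B names a type
struct HasValue { static const int B = 7; };  // here A::B names a value

// Inside a template the compiler cannot know what A::B is until A is
// known, so C++ assumes "value" unless the code says typename.
template <class A>
int use_as_type()
{
    typename A::B tmp = 42;   // typename: "A::B is a type"
    return tmp;
}

template <class A>
int use_as_value()
{
    return A::B * 2;          // no typename: "A::B is a value"
}

// Small usage check for both readings of A::B.
void self_check_typename()
{
    assert(use_as_type<HasType>() == 42);
    assert(use_as_value<HasValue>() == 14);
}
```

The C++ programmer resolves the ambiguity by writing typename; a gdb user typing an expression has no such keyword, which is why gdb 6.6 falls back on quoting the name.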
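III. So when is qualified_type used?
Going by that comment, qualified_type would seem pointless: it is identical to qualified_name but loses every conflict, so it would never be chosen. That is not the case. Some grammar contexts accept only a type, and there qualified_type is the only candidate. The typical example is the sizeof operator, whose rule is: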
exp : SIZEOF '(' type ')' %prec UNARY
{ write_exp_elt_opcode (OP_LONG);
 write_exp_elt_type (builtin_type (current_gdbarch)->builtin_int);
 CHECK_TYPEDEF ($3);
 write_exp_elt_longcst ((LONGEST) TYPE_LENGTH ($3));
 write_exp_elt_opcode (OP_LONG); }
;
This rule explicitly requires a type after SIZEOF, so the type interpretation is the one used. With the same example, sizeof shows the size of the struct:
(gdb) p sizeof(tsecer::S)
$3 = 4
(gdb) 
IV. Why the template is recognized
gdb-6.6\gdb\c-exp.y
static int
yylex ()
{
……
  /* It's a name.  See how long it is.  */
  namelen = 0;
  for (c = tokstart[namelen];
       (c == '_' || c == '$' || (c >= '0' && c <= '9')
|| (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '<');)
    {
      /* Template parameter lists are part of the name.
FIXME: This mishandles `print $a<4&&$a>3'.  */

      if (c == '<')
        {
               /* Scan ahead to get rest of the template specification.  Note
                  that we look ahead only when the '<' adjoins non-whitespace
                  characters; for comparison expressions, e.g. "a < b > c",
                  there must be spaces before the '<', etc. */
               
               char * p = find_template_name_end (tokstart + namelen);
               if (p)
                 namelen = p - tokstart;
               break;
        }
      c = tokstart[++namelen];
    }
……
    if (sym && SYMBOL_CLASS (sym) == LOC_TYPEDEF)
        {
 /* NOTE: carlton/2003-09-25: There used to be code here to
    handle nested types.  It didn't work very well.  See the
    comment before qualified_type for more info.  */
 yylval.tsym.type = SYMBOL_TYPE (sym);
 return TYPENAME;
        }
    yylval.tsym.type
      = language_lookup_primitive_type_by_name (current_language,
current_gdbarch, tmp);
    if (yylval.tsym.type != NULL)
      return TYPENAME;
……
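In the code above, when a name is immediately followed by '<', the lexer scans ahead to the matching '>' and looks up the whole span as a single symbol. For tsecer::T<tsecer::S>::inst(), the name that gets lexed is 'T<tsecer::S>'. The symbol table can find the type under that whole name, so the lexer returns TYPENAME, which drives the parser to reduce to a type and carry on without error. Wrapping a name in single quotes works for the same reason as the template case: it turns the whole string into one symbol that lookup_symbol can find.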
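The scanning rule described above can be sketched in a few lines of C++. This is a simplified stand-in written for this post, not gdb's find_template_name_end: the function name scan_name and its bracket-counting logic are assumptions, and it ignores many cases the real lexer handles.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the length of the name that starts at text[0].
// Simplification (an assumption of this sketch): a '<' that directly
// follows name characters starts a template argument list, and the name
// extends to the matching '>'. With no matching '>', the name stops
// before the '<'.
std::size_t scan_name(const std::string& text)
{
    std::size_t len = 0;
    while (len < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[len]);
        if (std::isalnum(c) || c == '_' || c == '$') {
            ++len;
            continue;
        }
        if (c == '<' && len > 0) {
            int depth = 0;
            for (std::size_t i = len; i < text.size(); ++i) {
                if (text[i] == '<')
                    ++depth;
                else if (text[i] == '>' && --depth == 0)
                    return i + 1;   // include the closing '>'
            }
        }
        break;                      // any other character ends the name
    }
    return len;
}

// Small usage check based on the example from this post.
void self_check_scan_name()
{
    // "T<tsecer::S>" is 12 characters and is taken as one name.
    assert(scan_name("T<tsecer::S>::inst()") == 12);
    // A '<' separated by a space is left for the comparison operator.
    assert(scan_name("a < b") == 1);
}
```

Like the original, this sketch swallows an input such as $a<4&&$a>3 as one name, which is the mishandling the FIXME in the gdb source warns about.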
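V. How newer gdb versions (e.g. 7.2) handle this
For input like A::B::x, newer versions keep treating the prefix as a type until they reach a token that is not a type. Taking tsecer::S again, the function below keeps scanning through S and returns the whole qualified name as one token, so the name-or-type question is settled in the lexer rather than in the parser. As the comment says, this is still not ideal, but it is a substantial improvement over the earlier implementation. The code is again in c-exp.y: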
/* The outer level of a two-level lexer.  This calls the inner lexer
   to return tokens.  It then either returns these tokens, or
   aggregates them into a larger token.  This lets us work around a
   problem in our parsing approach, where the parser could not
   distinguish between qualified names and qualified types at the
   right point.
   
   This approach is still not ideal, because it mishandles template
   types.  See the comment in lex_one_token for an example.  However,
   this is still an improvement over the earlier approach, and will
   suffice until we move to better parsing technology.  */
static int
yylex (void)
{
  token_and_value current;
  int first_was_coloncolon, last_was_coloncolon, first_iter;

  if (popping && !VEC_empty (token_and_value, token_fifo))
    {
      token_and_value tv = *VEC_index (token_and_value, token_fifo, 0);
      VEC_ordered_remove (token_and_value, token_fifo, 0);
      yylval = tv.value;
      return tv.token;
    }
  popping = 0;

  current.token = lex_one_token ();
  if (current.token == NAME)
    current.token = classify_name (expression_context_block);
  if (parse_language->la_language != language_cplus
      || (current.token != TYPENAME && current.token != COLONCOLON))
    return current.token;

  first_was_coloncolon = current.token == COLONCOLON;
  last_was_coloncolon = first_was_coloncolon;
  obstack_free (&name_obstack, obstack_base (&name_obstack));
  if (!last_was_coloncolon)
    obstack_grow (&name_obstack, yylval.sval.ptr, yylval.sval.length);
  current.value = yylval;
  first_iter = 1;
  while (1)
    {
      token_and_value next;

      next.token = lex_one_token ();
      next.value = yylval;

      if (next.token == NAME && last_was_coloncolon)
{
 int classification;

 classification = classify_inner_name (first_was_coloncolon
? NULL
: expression_context_block,
first_iter);
 /* We keep going until we either run out of names, or until
    we have a qualified name which is not a type.  */
 if (classification != TYPENAME)
   {
     /* Push the final component and leave the loop.  */
     VEC_safe_push (token_and_value, token_fifo, &next);
     break;
   }

 /* Update the partial name we are constructing.  */
 if (!first_iter)
   {
     /* We don't want to put a leading "::" into the name.  */
     obstack_grow_str (&name_obstack, "::");
   }
 obstack_grow (&name_obstack, next.value.sval.ptr,
next.value.sval.length);

 yylval.sval.ptr = obstack_base (&name_obstack);
 yylval.sval.length = obstack_object_size (&name_obstack);
 current.value = yylval;
 current.token = classification;

 last_was_coloncolon = 0;
}
      else if (next.token == COLONCOLON && !last_was_coloncolon)
last_was_coloncolon = 1;
      else
{
 /* We've reached the end of the name.  */
 VEC_safe_push (token_and_value, token_fifo, &next);
 break;
}

      first_iter = 0;
    }
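The aggregation idea in the yylex excerpt above can be sketched as a small standalone C++ function. Everything here is hypothetical: the Token struct, the aggregate function and the known_types set stand in for gdb's token_and_value, its token FIFO and its symbol lookups, and the sketch does not handle a leading "::".

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical token model for illustration; gdb's tokens carry more.
enum Kind { NAME, TYPENAME, COLONCOLON, OTHER };
struct Token { Kind kind; std::string text; };

// Merge a run like  A :: B :: x  into one TYPENAME token covering the
// longest prefix found in known_types (assumed here to hold every
// qualified name that should count as a type or namespace).
std::vector<Token> aggregate(const std::vector<Token>& in,
                             const std::set<std::string>& known_types)
{
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        Token cur = in[i++];
        if (cur.kind == NAME && known_types.count(cur.text))
            cur.kind = TYPENAME;
        // Extend while the next two tokens are "::" NAME and the longer
        // qualified name is still a known type.
        while (cur.kind == TYPENAME && i + 1 < in.size()
               && in[i].kind == COLONCOLON && in[i + 1].kind == NAME) {
            std::string longer = cur.text + "::" + in[i + 1].text;
            if (!known_types.count(longer))
                break;          // not a type: leave "::" NAME for the parser
            cur.text = longer;
            i += 2;
        }
        out.push_back(cur);
    }
    return out;
}

// Usage check: the cast "(tsecer::S *)" from the first section.
void self_check_aggregate()
{
    std::vector<Token> in = {{OTHER, "("}, {NAME, "tsecer"},
                             {COLONCOLON, "::"}, {NAME, "S"},
                             {OTHER, "*"}, {OTHER, ")"}};
    std::set<std::string> types = {"tsecer", "tsecer::S"};
    std::vector<Token> out = aggregate(in, types);
    assert(out.size() == 4);
    assert(out[1].kind == TYPENAME && out[1].text == "tsecer::S");
}
```

With the cast arriving at the parser as a single TYPENAME token, the reduce/reduce choice between qualified_name and qualified_type never comes up for it, which is the point of moving the decision into the lexer.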
