Tsecer's Echo Island (Tsecer's blog)

File renaming, matching the pipe character in awk, and other notes

2013-11-17 18:48:30 | Category: Linux knowledge

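I. File renaming
This is fairly easy on Linux. The Linux rename command is itself quite limited, but its source code already notes that the job can be done conveniently with the shell and an external tool such as sed; rename is merely a C implementation of a simple script. Renaming a single file on Linux is done with mv, which tends to look confusing at first, but one gets used to it: everyday renames normally go through mv.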
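As a rough illustration of the "shell plus sed" approach mentioned above, here is a minimal sketch of a batch rename. The function name rename_with_sed and the sed expression in the usage note are made up for this example; it assumes file names contain no newlines and skips any rename whose target already exists.

```shell
# rename_with_sed DIR SED_EXPR
# Hypothetical helper: renames each regular file in DIR by passing its
# name through sed. Assumes file names do not contain newlines.
rename_with_sed() {
    dir=$1
    expr=$2
    for path in "$dir"/*; do
        # Skip directories and the literal pattern left by an empty glob.
        [ -f "$path" ] || continue
        old=$(basename "$path")
        new=$(printf '%s\n' "$old" | sed "$expr")
        # Move only when the name changes and the target is not taken.
        if [ "$old" != "$new" ] && [ ! -e "$dir/$new" ]; then
            mv -- "$path" "$dir/$new"
        fi
    done
}
```

For instance, rename_with_sed ./photos 's/\.jpeg$/.jpg/' would turn every *.jpeg in ./photos into *.jpg, with mv doing the actual rename.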
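For renaming on Windows, the natural instinct is again to use a for loop to do the substitution, but the for command on Windows is very hard to use (as is cmd as a whole), and it comes with issues such as delayed variable expansion. Those are syntax problems; there are also differences in basic conventions. For example, cmd does not expand wildcards on the command line into file names before passing them to the executable; it hands them over verbatim. And for rename under cmd, a wildcard in the arguments means different things depending on where it appears.
I found an article online that explains the meaning of each argument of the Windows rename command in a very detailed and professional way. Since it is a foreign article, I am keeping a copy of the original here in addition to the link. Its main points are:
1. cmd does not expand the wildcards * and ? on the command line before passing them to the application, unlike the shell.
2. Wildcards mean different things in the rename source and in the rename target.
3. The '.' between the file name and the extension plays a very special role throughout the matching process.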
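Point 1 can be seen from the shell side with a small experiment. The sketch below uses a made-up helper, count_args, and a temporary directory: when the pattern is unquoted the shell expands it before the function runs, and when it is quoted the function receives the raw pattern, which is roughly what a program started from cmd always sees.

```shell
# count_args is a hypothetical helper that prints its argument count.
count_args() {
    echo "$#"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.txt"

# Unquoted: the shell expands *.txt, so three file names are passed.
expanded=$(cd "$demo_dir" && count_args *.txt)
# Quoted: the pattern is passed through as one literal argument.
literal=$(cd "$demo_dir" && count_args '*.txt')

echo "unquoted: $expanded argument(s), quoted: $literal argument(s)"
rm -rf "$demo_dir"
```

This prints "unquoted: 3 argument(s), quoted: 1 argument(s)".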

sourceMask

The sourceMask works as a filter to determine which files are renamed. The wildcards work here the same as with any other command that filters file names.

  • ? - Matches any 0 or 1 character except . This wildcard is greedy - it always consumes the next character if it is not a . However it will match nothing without failure if at name end or if the next character is a .

  • * - Matches any 0 or more characters including . (with one exception below). This wildcard is not greedy. It will match as little or as much as is needed to enable subsequent characters to match.

All non-wildcard characters must match themselves, with a few special case exceptions.

  • . - Matches itself or it can match the end of name (nothing) if no more characters remain. (Note - a valid Windows name cannot end with .)

  • {space} - Matches itself or it can match the end of name (nothing) if no more characters remain. (Note - a valid Windows name cannot end with {space})

  • *. at the end - Matches any 0 or more characters except . The terminating . can actually be any combination of . and {space} as long as the very last character in the mask is . This is the one and only exception where * does not simply match any set of characters.

The above rules are not that complex. But there is one more very important rule that makes the situation confusing: The sourceMask is compared against both the long name and the short 8.3 name (if it exists). This last rule can make interpretation of the results very tricky, because it is not always obvious when the mask is matching via the short name.

It is possible to use RegEdit to disable the generation of short 8.3 names on NTFS volumes, at which point interpretation of file mask results is much more straightforward. Any short names that were generated before disabling short names will remain.

targetMask

Note - I haven't done any rigorous testing, but it appears these same rules also work for the target name of the COPY command

The targetMask specifies the new name. It is always applied to the full long name; The targetMask is never applied to the short 8.3 name, even if the sourceMask matched the short 8.3 name.

The presence or absence of wildcards in the sourceMask has no impact on how wildcards are processed in the targetMask.

In the following discussion - c represents any character that is not *, ?, or .

The targetMask is processed against the source name strictly from left to right with no back-tracking.

  • c - Advances the position within the source name as long as the next character is not . and appends c to the target name. (Replaces the character that was in source with c, but never replaces .)

  • ? - Matches the next character from the source long name and appends it to the target name as long as the next character is not . If the next character is . or if at the end of the source name then no character is added to the result and the current position within the source name is unchanged.

  • * at end of targetMask - Appends all remaining characters from source to the target. If already at the end of source, then does nothing.

  • *c - Matches all source characters from current position through the last occurrence of c (case sensitive greedy match) and appends the matched set of characters to the target name. If c is not found, then all remaining characters from source are appended, followed by c. This is the only situation I am aware of where Windows file pattern matching is case sensitive.

  • *. - Matches all source characters from current position through the last occurrence of . (greedy match) and appends the matched set of characters to the target name. If . is not found, then all remaining characters from source are appended, followed by .

  • *? - Appends all remaining characters from source to the target. If already at end of source then does nothing.

  • . without * in front - Advances the position in source through the first occurrence of . without copying any characters, and appends . to the target name. If . is not found in the source, then advances to the end of source and appends . to the target name.

After the targetMask has been exhausted, any trailing . and {space} are trimmed off the end of the resulting target name because Windows file names cannot end with . or {space}

II. How awk handles regular expressions
tsecer@harry:/home/tsecer/split>echo "tripple|||pipe|||seperator|||" | awk -F'|||' '{print NF}'
1
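awk did not recognize the three consecutive pipes as the separator. My guess was that awk treated each | as the regular-expression alternation operator, so I tried a second version: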
tsecer@harry:/home/tsecer/split>echo "tripple|||pipe|||seperator|||" | awk -F'\|\|\|' '{print NF}'
awk: warning: escape sequence `\|' treated as plain `|'
1
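Judging from the awk source, awk runs its own pass of escape processing over the string first (separate from the escaping handled by the shell and by the regex library) and simply drops the backslash in front of the symbol. So a stronger dose is needed: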
tsecer@harry:/home/tsecer/split>echo "tripple|||pipe|||seperator|||" | awk -F'\\|\\|\\|' '{print NF}'
4
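Another option is a bracket expression, even though it contains only one element (this also shows that a bracket expression binds more tightly than alternation):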
tsecer@harry:/home/tsecer/split>echo "tripple|||pipe|||seperator|||" | awk -F'[|][|][|]' '{print NF}'
4
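The extra escaping layer applies when the regex reaches awk as a string, as it does with -F. A regex written as a /.../ constant inside the program is not put through that string pass, so one backslash per pipe should be enough there. The sketch below contrasts the two; the helper names count_fields_fs and count_fields_split are invented for this example.

```shell
# count_fields_fs: the separator is passed as a string via -F, so a
# bracket expression is used to avoid backslashes altogether.
count_fields_fs() {
    printf '%s\n' "$1" | awk -F'[|][|][|]' '{ print NF }'
}

# count_fields_split: the separator is a /.../ regex constant inside
# the awk program, where a single backslash escapes each pipe.
count_fields_split() {
    printf '%s\n' "$1" | awk '{ print split($0, parts, /\|\|\|/) }'
}

count_fields_fs    'tripple|||pipe|||seperator|||'
count_fields_split 'tripple|||pipe|||seperator|||'
```

Both calls are expected to report the same count for the same input, since split() follows the same splitting rules as field separation.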
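III. Terminal input and output buffers
When I first started with computers, printing hello world with printf was the standard beginner exercise. Later I used scanf to read terminal input. Because printf's output and the text read by scanf are interleaved on the same terminal, I sometimes wondered whether scanf might read back what printf had written. The question sounds absurd, and in fact it cannot happen, but I still wanted to know why. After reading the kernel implementation, I found the key point: showing what the user types is only a mechanism triggered as a side effect when terminal echo is enabled, whereas data written to the terminal leaves no trace in that terminal at all; it is written straight into the read buffer of the peer.
Take the most common case, the pseudo terminal. Both ends share the same underlying tty_driver operations, pty_ops in pty.c. On a write: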
static int pty_write(struct tty_struct * tty, const unsigned char *buf, int count)
{
    struct tty_struct *to = tty->link;
    int    c;

    if (!to || tty->stopped)
        return 0;

    c = to->receive_room;
    if (c > count)
        c = count;
    to->ldisc.receive_buf(to, buf, NULL, c);
   
    return c;
}
When one side writes, it directly calls the receive interface of the peer; the data leaves no trace in the writer's own tty_struct.