弹剑而歌

Thursday, May 17, 2007

bash while read line and variable scope?

Compare the following two bash scripts:

script (1)
echo -e "aaa\nbbb\nccc\nddd" >tmpfile
value=0
while read line; do
    value=`expr $value + 1`
    echo $value; 
done <tmpfile
echo "at last: $value"

result:
1
2
3
4
at last: 4

script (2):
value=0
echo -e "aaa\nbbb\nccc\nddd" | while read line; do 
    value=`expr $value + 1`
    echo $value
done
echo "at last: $value"

result:
1
2
3
4
at last: 0

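In the second script the change to the global variable is lost. This is because bash runs each element of a pipeline in a subshell, so the loop increments a copy of value that disappears when the pipeline ends. To keep the result, feed the loop through redirection instead.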

Reference: comp.unix.shell FAQ

This is because in the Bourne shell redirected control structures
run in a subshell
In shells other than ksh (not pdksh) and
zsh elements of a pipeline are run in subshells.
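
The same effect can be reproduced outside the shell. The following is a minimal Python 3 sketch, not part of the original scripts: the counting loop runs in a separate process (playing the role of the subshell), so the parent's value stays at 0 unless the parent reads the result back from the child's output. The names CHILD and count_in_child are made up for this illustration.

```python
import subprocess
import sys

# Code run by the child process: count the lines arriving on stdin.
CHILD = """
import sys
value = 0
for line in sys.stdin:
    value += 1
print(value)
"""


def count_in_child(text):
    """Run the counting loop in a separate process and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


value = 0
child_value = count_in_child("aaa\nbbb\nccc\nddd\n")
print("child counted:", child_value)  # child counted: 4
print("at last:", value)              # at last: 0
```

The only way to get the count back is through the child's output, which is also the usual workaround in the shell when a pipeline cannot be avoided.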

This can be verified with the following script:

#!/bin/sh

echo -e "aaa" >tmpfile
while read line; do
        ps aux | grep "$0" | grep -v grep
done <tmpfile
rm -f tmpfile

echo "---"
echo -e "aaa" | while read line; do
        ps aux | grep "$0" | grep -v grep
done

a MySQL charset problem

I have a dump1.sql file whose Chinese text is encoded in UTF-8, while its CREATE statements declare DEFAULT CHARSET=GBK, so the import fails:

sh$ mysql -u root test <dump1.sql
ERROR 1062 (23000) at line 5653: Duplicate entry '????' for key 2

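Inside the database every Chinese field shows up as ?, so any two values with the same number of Chinese characters collide as duplicates. To fix this, first transcode the file, then import it with the correct parameters.

The server must of course support the GBK character set in the first place. SHOW CHARACTER SET lists every supported character set; if GBK is missing, add --with-extra-charsets=gbk to ./configure when building MySQL.

Then proceed as follows:

Transcode:
sh$ iconv -f UTF-8 -t GBK -o dump2.sql dump1.sql
Import:
sh$ mysql -u root -p --default-character-set=gbk test <dump2.sql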
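
Where iconv is not available, the transcoding step can be approximated in Python 3. This is only a sketch under assumptions: the function name transcode is hypothetical, the file names follow the ones used above, and the demo writes a one-line stand-in for dump1.sql into a temporary directory rather than touching a real dump.

```python
import os
import tempfile


def transcode(src, dst, src_enc="utf-8", dst_enc="gbk"):
    """Re-encode a text file, similar to `iconv -f UTF-8 -t GBK -o dst src`."""
    # newline="" keeps the original line endings untouched.
    with open(src, encoding=src_enc, newline="") as fin, \
         open(dst, "w", encoding=dst_enc, newline="") as fout:
        for line in fin:
            fout.write(line)


# Demo on a tiny stand-in for the dump file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "dump1.sql")
dst = os.path.join(workdir, "dump2.sql")
text = "INSERT INTO t VALUES ('中文');\n"
with open(src, "w", encoding="utf-8", newline="") as f:
    f.write(text)

transcode(src, dst)
with open(dst, "rb") as f:
    converted = f.read()
print(converted == text.encode("gbk"))  # True
```

A character that has no GBK representation raises UnicodeEncodeError here, which at least makes the problem visible before the import.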

Tuesday, May 15, 2007

sed -n "$i{p}"

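To pick line $i out of a file or a command's output inside a loop, sed -n "$ip" cannot work, because the shell reads $ip as a variable named ip. Wrapping the command in {} separates the variable from the command: sed -n "$i{p}" (sed -n "${i}p" works too). For example: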

sh$ snmpwalk -v2c -c demo 192.168.0.98 UCD-SNMP-MIB:prNames | sed -n "$i{p}"
UCD-SNMP-MIB::prNames.1 = STRING: vsftpd
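
For comparison, the same "give me line i" filter can be sketched in Python 3. The helper name nth_line is made up for this example; it uses 1-based numbering to match sed addresses.

```python
from itertools import islice


def nth_line(lines, i):
    """Return the i-th line (1-based, like a sed address), or None if it does not exist."""
    return next(islice(lines, i - 1, i), None)


lines = ["first\n", "second\n", "third\n"]
print(nth_line(lines, 2), end="")  # second
print(nth_line(lines, 5))          # None
```

Because islice consumes the input lazily, this also works on an open file or on a pipe without reading everything into memory.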

bash getopt

A few simple examples are enough to get a rough idea of how getopt is used:

sh$ args=`getopt abc:d -a -b -c opt`
sh$ echo $args
-a -b -c opt --
sh$ args=`getopt -l long1,long2: abc:d -a -b -c opt1 --long1 opt2 arg1 arg2`
sh$ echo $args
-a -b -c 'opt1' --long1 -- 'opt2' 'arg1' 'arg2'
sh$ args=`getopt -l long1,long2: abc:d -a -b -c opt1 --long2 opt2 arg1 arg2`
sh$ echo $args
-a -b -c 'opt1' --long2 'opt2' -- 'arg1' 'arg2'

In a script, it is typically used like this:

args=`getopt abc:d $*`
echo $args

A more complete example:

args=`getopt -l help,item:,items-list: c:h:m:i:I: "$@"`
if [ $? -gt 0 ]; then
    strerr="Invalid options"
    echo "$strerr" >&2
    # -t takes the tag; the message is passed as a separate argument
    logger -i -t "$PROGRAM" "$strerr"
    exit 1
fi

# Replace the positional parameters with getopt's normalized output.
eval set -- "$args"

while [ $# -gt 0 ]; do
    case $1 in
    -c) community=$2; shift 2;;
    -h) host=$2; shift 2;;
    -m) mailto=$2; shift 2;;
    -i|--item)
        # Collect repeated items, one per line.
        if [ -n "$items" ]; then
            items=`echo -e "$items\n$2"`
        else
            items="$2"
        fi
        shift 2
        ;;
    -I|--items-list)
        ilist=$2; shift 2;;
    --help)
        echo "usage: $PROGRAM [-c|-h|-m] [--item|--help]
    -c community
    -h host
    -m mailto
    -i|--item item_map, 'community host' map
    --help, print this message"
        exit 0
        ;;
    --) shift; break;;   # end of options; the rest are plain arguments
    esac
done
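
Python's standard getopt module follows the same model, which may make the shell version easier to read. Below is a hedged Python 3 sketch that mirrors the option set of the script above; the parse function and the settings dictionary are illustrative names, not something the original script defines.

```python
import getopt


def parse(argv):
    """Parse the same options as the shell example; returns (settings, remaining args)."""
    opts, rest = getopt.getopt(
        argv, "c:h:m:i:I:", ["help", "item=", "items-list="]
    )
    settings = {"items": []}
    for opt, value in opts:
        if opt == "-c":
            settings["community"] = value
        elif opt == "-h":
            settings["host"] = value
        elif opt == "-m":
            settings["mailto"] = value
        elif opt in ("-i", "--item"):
            settings["items"].append(value)  # may be given several times
        elif opt in ("-I", "--items-list"):
            settings["ilist"] = value
        elif opt == "--help":
            settings["help"] = True
    return settings, rest


settings, rest = parse(
    ["-c", "demo", "-h", "192.168.0.98", "--item", "a b", "-i", "c d", "extra"]
)
print(settings["items"])  # ['a b', 'c d']
print(rest)               # ['extra']
```

In the long-option list a trailing = plays the role of the trailing : in the shell getopt's -l argument: it marks an option that takes a value.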

python datetime object from time.localtime()

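datetime is a class, and working with it is more intuitive and direct than working with the time module. The one inconvenience is that the datetime constructor needs explicit arguments, whereas time.localtime() returns the current time directly. The two can be combined: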

>>> import datetime, time
>>> d = datetime.datetime(*time.localtime()[0:6])
>>> print d
2007-05-15 09:35:06
>>>

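Note the * in the argument list, which unpacks the sequence into separate positional arguments. It is a useful feature that I had not paid proper attention to before.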
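
A small Python 3 sketch of the same unpacking, using a fixed tuple so the result is reproducible. The variable names are illustrative only. Note that datetime.datetime.now() also returns the current time directly, without going through the time module.

```python
import datetime
import time

# Current time: the first six fields of struct_time are
# year, month, day, hour, minute, second.
now_tuple = time.localtime()
d = datetime.datetime(*now_tuple[0:6])

# The same call with a fixed tuple: * spreads it over the positional parameters.
parts = (2007, 5, 15, 9, 35, 6)
fixed = datetime.datetime(*parts)
print(fixed)  # 2007-05-15 09:35:06
```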

Friday, May 11, 2007

python merge regular expression from file

#!/usr/bin/env python

import re

# record = open('test.txt', 'r').read()
record = '60.176.247.144 - - [09/May/2007:16:40:54 +0800] "GET /bbs/thread-19160-1-7.html HTTP/1.1" 200 22481 "http://www.google.com/search?hl=en&q=%E6%94%AF%E4%BB%98%E5%AE%9D+HAS_NO_PRIVILEGE&btnG=Google+Search" "Mozilla/5.0 (Windows; U; Windows NT 5.2; zh-CN; rv:1.8.1.3) Gecko/20070309 (FoxPlus) Firefox/2.0.0.3"'

# rlink is the only group of regexp1 that is referenced by name below;
# the other groups are written as plain groups.
regexp1 = r'([\d\.]+) - - (\[.*\]) "GET (.*) HTTP/[\d\.]+" \d+ \d+ "(?P<rlink>http://.*)" ".*"'
recmo = re.match(regexp1, record)
recmo.group('rlink')

r = open('regexps.conf', 'r').read()
# content:
#    google.com, r'search.*[?&]q=(?P<word>.*)&.*'
r = r.strip()
print r
rmo = re.match('\s*([\w\.]+)\s*,\s*r\'(.*)\'', r)
print rmo.groups()
site, regexp2 = rmo.groups()
print site, regexp2
# Merge: wrap the site and the per-site pattern in groups of their own
regexp3 = r'http://[\w\.]*(%s)/(%s)' % (site, regexp2)
print recmo.group('rlink')
print regexp3
rc = re.compile(regexp3)
mo = rc.match(recmo.group('rlink'))
print mo.groups()
print mo.group('word')

sh$ python test.py
google.com, r'search.*[?&]q=(?P<word>.*)&.*'
('google.com', 'search.*[?&]q=(?P<word>.*)&.*')
google.com search.*[?&]q=(?P<word>.*)&.*
http://www.google.com/search?hl=en&q=%E6%94%AF%E4%BB%98%E5%AE%9D+HAS_NO_PRIVILEGE&btnG=Google+Search
http://[\w\.]*(google.com)/(search.*[?&]q=(?P<word>.*)&.*)
('google.com', 'search?hl=en&q=%E6%94%AF%E4%BB%98%E5%AE%9D+HAS_NO_PRIVILEGE&btnG=Google+Search', '%E6%94%AF%E4%BB%98%E5%AE%9D+HAS_NO_PRIVILEGE')
%E6%94%AF%E4%BB%98%E5%AE%9D+HAS_NO_PRIVILEGE

If the content of regexps.conf is changed to:

google.com, r'search.*[?&]q=.*&.*'

that is, with the named group removed, the run becomes:

sh$ python test.py
google.com, r'search.*[?&]q=.*&.*'
('google.com', 'search.*[?&]q=.*&.*')
google.com search.*[?&]q=.*&.*
http://www.google.com/search?hl=en&q=%E6%94%AF%E4%BB%98%E5%AE%9D+HAS_NO_PRIVILEGE&btnG=Google+Search
http://[\w\.]*(google.com)/(search.*[?&]q=.*&.*)
('google.com', 'search?hl=en&q=%E6%94%AF%E4%BB%98%E5%AE%9D+HAS_NO_PRIVILEGE&btnG=Google+Search')
Traceback (most recent call last):
  File "test.py", line 26, in ?
    print mo.group('word')
IndexError: no such group

Make the condition a little stricter so that the word group is guaranteed to be present:

rmo = re.match(r'\s*([\w\.]+)\s*,\s*r\'(.*=\(\?P<word>\.\*\).*)\'', r)
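
Putting the pieces together, here is a self-contained Python 3 sketch of the merge with the stricter check. It is an illustration under assumptions, not the original test.py: the helper name build_referrer_re and the CONF_RE constant are made up, the configuration line is passed as a string instead of being read from regexps.conf, and the site name is escaped with re.escape, which the original does not do.

```python
import re

# A config line must contain a named group called "word".
CONF_RE = re.compile(r"\s*([\w.]+)\s*,\s*r'(.*\(\?P<word>.*\).*)'")


def build_referrer_re(conf_line):
    """Merge 'site, r"pattern"' from the config into one referrer regex."""
    m = CONF_RE.match(conf_line.strip())
    if m is None:
        raise ValueError("config line has no (?P<word>...) group: %r" % conf_line)
    site, pattern = m.groups()
    return re.compile(r"http://[\w.]*%s/%s" % (re.escape(site), pattern))


referrer = ("http://www.google.com/search?hl=en"
            "&q=%E6%94%AF%E4%BB%98%E5%AE%9D+HAS_NO_PRIVILEGE&btnG=Google+Search")
rc = build_referrer_re(r"google.com, r'search.*[?&]q=(?P<word>.*)&.*'")
word = rc.match(referrer).group("word")
print(word)  # %E6%94%AF%E4%BB%98%E5%AE%9D+HAS_NO_PRIVILEGE
```

Rejecting a bad line when the configuration is loaded moves the failure from the IndexError at match time to a clear error at startup.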

Wednesday, May 09, 2007

python list extend

>>> L = [1, 2, 3]
>>> L1 = ['a', 'b']
>>> i = 0
>>> for l in L:
...     if i == 0:
...         i += 1
...         L.extend(L1)
...         continue
...     print l
...
2
3
a
b

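A rather useful property: a list can be extended while a for loop is iterating over it, and the loop goes on to visit the new items. In find_topfiles, for example, the loop can call glob() on a path such as /usr/local/* and append the resulting list, so the expanded entries are processed later in the same loop.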
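
The worklist idea can be sketched in Python 3 without touching the filesystem. Everything here is hypothetical: instead of calling glob(), an entry starting with "dir:" stands in for a path that expands into several new entries.

```python
# Entries beginning with "dir:" stand in for paths that need expanding.
items = ["a", "dir:b,c", "d"]
seen = []

for item in items:
    if item.startswith("dir:"):
        # Append the expansion; the running loop will reach these later.
        items.extend(item[4:].split(","))
        continue
    seen.append(item)

print(seen)  # ['a', 'd', 'b', 'c']
```

This only works for appending. Removing or inserting items before the current position while iterating makes the loop skip or repeat entries.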

python module __variable

sh$ cat module.py
#!/usr/bin/python

_var = 0
__var1 = 1
var2 = 2

def __func():
    print "__func"
func = __func

sh$ cat main.py
#!/usr/bin/python

import module

v = module.__var1
print v
v = module._var
print v
f = module.__func
f()

class test:
    def __init__(self):
        self.v = module._var
        # self.v = module.__var1    # (1)
        # self.f = module.__func    # (2)
        self.v = module.var2
        self.f = module.func        # (3)
    def run(self):
        print self.v
        self.f()

ti = test()
ti.run()

Output with (1) enabled:

sh$ python main.py
1
0
__func
Traceback (most recent call last):
  File "main.py", line 23, in ?
    ti = test()
  File "main.py", line 15, in __init__
    self.v = module.__var1
AttributeError: 'module' object has no attribute '_test__var1'

Output with (2) enabled:

sh$ python main.py
1
0
__func
Traceback (most recent call last):
  File "main.py", line 23, in ?
    ti = test()
  File "main.py", line 16, in __init__
    self.f = module.__func
AttributeError: 'module' object has no attribute '_test__func'

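At module level both forms work. Inside a class, _var works but __var does not, because it collides with the class's pseudo-private name mangling: within the class body __var1 is rewritten to _test__var1, which is exactly the name in the AttributeError. And since __var starts with _, from module import * will not import it either.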
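
The mangling can be demonstrated in a single file. The Python 3 sketch below builds a throwaway module object with types.ModuleType instead of a separate module.py; the class and method names are made up. It also shows one way around the problem: getattr with a string name, since only identifiers are mangled, never string literals.

```python
import types

# Stand-in for module.py, built in memory.
module = types.ModuleType("module")
setattr(module, "__var1", 1)


class Test:
    def read_direct(self):
        try:
            # Inside a class body this is compiled as module._Test__var1.
            return module.__var1
        except AttributeError as exc:
            return "AttributeError: %s" % exc

    def read_getattr(self):
        # A string name is passed through unchanged.
        return getattr(module, "__var1")


t = Test()
print(t.read_direct())   # the message names '_Test__var1'
print(t.read_getattr())  # 1
```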

vim split window

Several windows can be opened in the same vim session for editing side by side, which is easier to follow:

sh$ vim -O main.py module.py

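This uses a vertical split; for a horizontal split use -o. With -O3, three windows are opened at once. In an open window, CTRL-WV duplicates it as a vertical split. You can then edit a different file in the new window; otherwise both windows show the same buffer and change together.

Switch between windows with CTRL-TAB (Windows gvim); on Linux, CTRL-W,> works as well.

If you use :sp newfile to get a horizontal split, CTRL-W,j moves down and CTRL-W,k moves up (the same keys as the cursor movement commands). CTRL-W,_ (needs SHIFT) minimizes the other windows. By default the minimum height is 1 (one line); :set wmh=0 sets it to 0, so that only the file name is shown. With that, a few maps can be defined in ~/.vimrc: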

sh$ vi ~/.vimrc
set wmh=0
map <C-J> <C-W>j<C-W>_
map <C-K> <C-W>k<C-W>_

With these maps, pressing CTRL-J switches to the next window below and minimizes the others.

For vertical splits a similar approach works:

sh$ vi ~/.vimrc
set wmw=0
nmap <C-H> <C-W>h<C-W><bar>
nmap <C-L> <C-W>l<C-W><bar>

Then use :vsp newfile (:vs also works) to open a new file in a vertical split. To switch back, for example after CTRL-L, only CTRL-W,H works, so wmw=16 is a better setting: at least the file name stays visible.

Switch between splits very fast (for multi-file editing)
Vim documentation: vim_faq
Vim documentation: windows

The resulting ~/.vimrc:

set number
set tabstop=4
au BufNewFile,BufRead *.t2t set ft=txt2tags
" colorscheme pablo
" colorscheme desert
set wmh=0
" winminheight
" set wh=999
" winheight
map <C-J> <C-W>j<C-W>_
map <C-K> <C-W>k<C-W>_
set wmw=16
" winminwidth
nmap <C-H> <C-W>h<C-W><bar>
nmap <C-L> <C-W>l<C-W><bar>

Tuesday, May 08, 2007

mplayer with srt subtitle

How to use subtitles with mplayer:

sh$ mplayer -sub 'The Terminal.2004.HDRip.CD1.CHS.srt' \
/mnt/fedora/opt/.aMule/Incoming/*Terminal.2004.HDRip.XviD-CNXP.CD1.avi \
-font ~roc/simsun.ttf \
-subcp cp936 \
-subfont-text-scale 4

Press v to turn subtitles on or off. Two subtitle files can be loaded at the same time and cycled with b/j:

sh$ mplayer -sub 'The Terminal.2004.HDRip.CD1.CHS.srt','The Terminal.2004.HDRip.CD1.ENG.srt' \
/mnt/fedora/opt/.aMule/Incoming/*Terminal.2004.HDRip.XviD-CNXP.CD1.avi \
-font ~roc/simsun.ttf \
-subcp cp936 \
-subfont-text-scale 4

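-subfont-text-scale defaults to 5, and -subcp cp936 enables Chinese support. So far I only know how to use TTF (TrueType) fonts and have not worked out bitmap fonts such as WenQuanYi, so simsun.ttf is one option. For copyright reasons, though, Fedora's uming.ttf can be used instead, and it actually renders better.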

Sunday, May 06, 2007

python import(module)

If the module to import is given as a string (for example, passed in as an argument), you can use the builtin __import__() function. For example:

def __init__(self, module='__main__', defaultTest=None,
             argv=None, testRunner=None, testLoader=defaultTestLoader):
    if type(module) == type(''):
        self.module = __import__(module)
        for part in module.split('.')[1:]:
            self.module = getattr(self.module, part)
    else:
        self.module = module

Another example:

mod = __import__('random')

Of course, you can also use

exec "import %s as mod" % module

to achieve the same thing.
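
The getattr loop in the first example exists because __import__ with a dotted name returns the top-level package, not the submodule. A short Python 3 sketch of that behavior follows; the helper name load is made up, and importlib.import_module from the standard library is shown as the direct alternative.

```python
import importlib


def load(name):
    """Import a dotted module name given as a string and return the leaf module."""
    mod = __import__(name)            # returns the top-level package
    for part in name.split(".")[1:]:  # walk down to the submodule
        mod = getattr(mod, part)
    return mod


top = __import__("json.decoder")
leaf = load("json.decoder")
print(top.__name__)   # json
print(leaf.__name__)  # json.decoder
print(leaf is importlib.import_module("json.decoder"))  # True
```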

Looking at how unittest uses it gives a clearer picture:

class TestLoader:
    ...
    def loadTestsFromModule(self, module):
        """Return a suite of all tests cases contained in the given module"""
        tests = []
        for name in dir(module):
            obj = getattr(module, name)
            if (isinstance(obj, (type, types.ClassType)) and
                issubclass(obj, TestCase)):
                tests.append(self.loadTestsFromTestCase(obj))
        return self.suiteClass(tests)

......

class TestProgram:
    def __init__(self, module='__main__', defaultTest=None,
                 argv=None, testRunner=None, testLoader=defaultTestLoader):
        if type(module) == type(''):
            self.module = __import__(module)
            for part in module.split('.')[1:]:
                self.module = getattr(self.module, part)
        else:
            self.module = module
    ...
    def parseArgs(self, argv):
        import getopt
        try:
            options, args = getopt.getopt(argv[1:], 'hHvq',
                                          ['help','verbose','quiet'])
            for opt, value in options:
                if opt in ('-h','-H','--help'):
                    self.usageExit()
                if opt in ('-q','--quiet'):
                    self.verbosity = 0
                if opt in ('-v','--verbose'):
                    self.verbosity = 2
            if len(args) == 0 and self.defaultTest is None:
                self.test = self.testLoader.loadTestsFromModule(self.module)
                return
            if len(args) > 0:
                self.testNames = args
            else:
                self.testNames = (self.defaultTest,)
            self.createTests()
        except getopt.error, msg:
            self.usageExit(msg)
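
To see loadTestsFromModule at work without a separate test file, the following Python 3 sketch puts a TestCase into a module object built in memory and loads it from there. The module name fake_tests and the test class are made up for this illustration.

```python
import types
import unittest


class ExampleTest(unittest.TestCase):
    def test_add(self):
        self.assertEqual(1 + 1, 2)


# Stand-in for a real test module.
mod = types.ModuleType("fake_tests")
mod.ExampleTest = ExampleTest

# The loader scans dir(mod) and collects every TestCase subclass it finds.
suite = unittest.TestLoader().loadTestsFromModule(mod)
result = unittest.TestResult()
suite.run(result)

print(suite.countTestCases())  # 1
print(result.wasSuccessful())  # True
```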
