弹剑而歌: June 2007

Friday, June 22, 2007

python reload()

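When debugging a module in the interactive interpreter, if you have already imported the module and then change its source, you can call reload() to load it again and pick up the changes. Otherwise you have to quit and restart the interpreter, which is inconvenient because all other session state is lost: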

>>> import tree
>>> ...
>>> tree = reload(tree)

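For readers on Python 3, where reload() is no longer a builtin, the same workflow can be sketched with importlib.reload. The module name reload_demo_mod and the temporary directory are made up for this illustration, and bytecode caching is switched off so that the quick rewrite is not masked by a stale .pyc:

```python
import importlib
import os
import sys
import tempfile

# Hypothetical throwaway module, created only for this demonstration.
workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "reload_demo_mod.py")
sys.path.insert(0, workdir)
sys.dont_write_bytecode = True  # keep the demo independent of .pyc caching

with open(module_path, "w") as f:
    f.write("VALUE = 1\n")
importlib.invalidate_caches()

import reload_demo_mod
first = reload_demo_mod.VALUE

# Edit the source while the interpreter keeps running.
with open(module_path, "w") as f:
    f.write("VALUE = 'changed after import'\n")

reload_demo_mod = importlib.reload(reload_demo_mod)
second = reload_demo_mod.VALUE
```

Names bound earlier with "from module import name" still point at the old objects after a reload, so it is safer to access them through the module.
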
python Tree root.[inexistent].branch ?

Suppose:
root = Tree(value)
root.trunk.branch = value1
and trunk does not exist. Can this Tree node container be created automatically?

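Should this rely on root's __getattr__ or on __setattr__? Since trunk does not exist yet, its __setattr__ cannot be used at all. For root, however, the operation root.trunk.branch = value1 is certain to call __getattr__ (to look up trunk). Consider the following example: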

>>> class test:
...     def __init__(self):
...         self.x = 1
...     def __getattr__(self, attr_name):
...         try:
...             return self.__dict__[attr_name]
...         except KeyError:
...             self.__dict__[attr_name] = 'inexistent'
...             return self.__dict__[attr_name]
...
>>> t = test()
>>> t.x
1
>>> t.y
'inexistent'
>>> t.x.y = 2
Traceback (most recent call last):
 File "", line 1, in ?
AttributeError: 'int' object has no attribute 'y'
>>> t.z.x = 2
Traceback (most recent call last):
 File "", line 1, in ?
AttributeError: 'str' object has no attribute 'x'

>>> print t
Traceback (most recent call last):
 File "", line 1, in ?
TypeError: 'str' object is not callable

This shows that __repr__ has been affected. Why?

First look at the following example:

>>> class test:
...     def __init__(self):
...         self.x = 1
...     def __getattr__(self, attr_name):
...         print attr_name
...         if attr_name == 'y':
...             return 2
...         else:
...             raise AttributeError, attr_name
...
>>> t = test()
>>> t.x
1
>>> t.y
y
2
>>> print t.x
1
>>> print t
__str__
__repr__
<__main__.test>

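First, the return self.__dict__[attr_name] in the earlier example is unnecessary: Python performs that lookup itself, and does it better because it also checks the inheritance tree. In fact, __getattr__ is called only when an attribute cannot be found on the instance or anywhere in its inheritance tree.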

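The output of print t shows that self.__str__ and self.__repr__ are also looked up through __getattr__. The earlier example did not override __str__ and __repr__; instead, its __getattr__ assigned the string 'inexistent' to them, which naturally made them "not callable".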

So, to support the Tree operation above without breaking print, the code is as follows:

def __repr__(self):
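   # format string reconstructed to mimic the default "<... instance at 0x...>" display
   return "<Tree instance at %s>" % hex(id(self))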
   # for k, v in self.__traverse__(): print '%s = %s;' % (k, v)
def __str__(self):
   return self.__repr__()
def __getattr__(self, attr_name):
   setattr(self, attr_name, Tree(None))
   return self.__dict__[attr_name]
def __setattr__(self, attr_name, value):
   if attr_name in self.__used_names:
       raise TreeExc(_("Attribute name '%s' is reserved" % attr_name))
   try:
       # If self.attribute exists
       existed = self.__dict__[attr_name]
       if isinstance(value, Tree):
           subtree = value
           self.__dict__[attr_name] = subtree
           # Replace the node directly
       else:
           # self.__dict__[attr_name]._Tree__node_value = value
           # This will lead to raise TreeExc at #1, because the setattr operation of
           #    "self.__dict__[attr_name].attribute = value" has been affected by self.__setattr__()
           subtree = existed
           subtree.__dict__['_Tree__node_value'] = value
           # Only replace the node value
   except KeyError:
   # If self.attribute does not exist, assign it an EMPTY node
       if isinstance(value, Tree):
           subtree = value
           self.__dict__[attr_name] = subtree
       else:
           self.__dict__[attr_name] = Tree(value)

To match what the built-in print displays, __repr__ builds its string from hex(id(self)); id(self) returns the object's memory address.
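
The idea can be condensed into a small Python 3 sketch. This is not the author's Tree: the class name AutoTree, the reserved slot _value and the repr format are assumptions made for illustration. Dunder names are refused in __getattr__ so that protocol probes (copy, pickle and the like) do not create nodes by accident.

```python
class AutoTree:
    """Hypothetical minimal tree whose missing branches are created on access."""

    def __init__(self, value=None):
        self.__dict__["_value"] = value  # bypass __setattr__

    def __getattr__(self, name):
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        node = AutoTree()
        self.__dict__[name] = node
        return node

    def __setattr__(self, name, value):
        if name == "_value":
            raise AttributeError("Attribute name %r is reserved" % name)
        existing = self.__dict__.get(name)
        if isinstance(value, AutoTree):
            self.__dict__[name] = value                # replace the node
        elif isinstance(existing, AutoTree):
            existing.__dict__["_value"] = value        # only replace its value
        else:
            self.__dict__[name] = AutoTree(value)      # create a new node

    def __call__(self):
        return self.__dict__["_value"]

    def __repr__(self):
        cls = type(self)
        return "<%s.%s object at %s>" % (cls.__module__, cls.__name__, hex(id(self)))


root = AutoTree("root")
root.trunk.branch = "leaf"   # trunk is created by __getattr__
first = root.trunk.branch()
root.trunk.branch = "new"    # the existing node keeps its identity
second = root.trunk.branch()
```

Auto-creation has a cost: a mistyped attribute name silently creates an empty node instead of raising AttributeError.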

One question remains. In the previous example, the attr_name values __str__ and __repr__ were printed, and neither of them is "y", so why was no AttributeError raised?

Thursday, June 21, 2007

python module name contains '.'?

sh$ mv bin.py bin.test.py
sh$ python
Python 2.3.4 (#1, Feb  6 2006, 10:38:46)
[GCC 3.4.5 20051201 (Red Hat 3.4.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bin.test
Traceback (most recent call last):
  File "", line 1, in ?
ImportError: No module named test
>>> import bin
>>> dir(bin)
['__builtins__', '__doc__', '__file__', '__name__', 'dbin', 'dbin8', 'rdbin']
>>>
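
The ".test.py" part is effectively ignored! A dot in an import statement separates a package from its submodule, so import bin.test makes Python look for the file bin/test.py, which of course does not exist.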

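If a file with dots in its name really has to be loaded, Python 3 allows loading by path instead of by module name. A minimal sketch, assuming a throwaway file bin.test.py in a temporary directory and an arbitrary module name bin_test (both made up here):

```python
import importlib.util
import os
import tempfile

# Hypothetical file whose name contains a dot before the .py suffix.
workdir = tempfile.mkdtemp()
dotted_path = os.path.join(workdir, "bin.test.py")
with open(dotted_path, "w") as f:
    f.write("ANSWER = 42\n")

# Load the file by path; the module name is chosen freely and has no dots.
spec = importlib.util.spec_from_file_location("bin_test", dotted_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
```

Renaming the file (for example to bin_test.py) is usually simpler than this workaround.
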
python Tree: some related questions

1. Tree traverse:
@ path string to node/value map
@ path sequence to node/value map
@ list of nodes/values (reference)
@ deep list of nodes?
==> 2

2. Consistency between Tree and deep dict, and how it shows up (basic principles and design ideas)
to be included in the design document

3. Tree search:
@ include/exclude
@ return string or list sequence map
@ copy or assign?
@ key indexed items

4. Tree node value copy/deepcopy?
==> the design idea behind python mutable/immutable types?

5. add operation:
head.branch + root.br1 ?
head.branch + {['root', 'br1', 'br11'] : value1, ['root', 'br1', 'br12'] : value2, ...}
==> Tree update

6. Tree cyclic link problem

Wednesday, June 20, 2007

python itree = func(head.branch=value)

>>> class tree:
...     def __init__(self, **kwargs):
...             self.ka = kwargs
...
>>> t = tree()
>>> print t.ka
{}
>>> t = tree(a=1)
>>> print t.ka
{'a': 1}
>>> t = tree(a.b=1)
SyntaxError: keyword can't be an expression

Although class Tree has already been defined and is very convenient to operate on, for example:

head = Tree(value)
head = Tree(value, data=value1, extra=value2)
head.branch = value
head.branch = Tree(value)
head.branch[key] = value
value = head.branch()
value = head.branch[key]()
head.Node1(['branch', 'br1'], value)
head.Node1(['branch', {'br1' : key}, 'br2'], value)
tmap = head.branch('traverse')
other = Tree(value); head.update(other)

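something like func(head.branch=value) is still impossible. In practice there is no need for it anyway: if the function needs head.branch or its value, pass head.branch or head.branch() as the argument; if it needs to change the value of head.branch, do that inside the function. The form above would only cause confusion.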
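
Two alternatives can be sketched in Python 3. The helper names Node, assign and collect are hypothetical and not part of the Tree module. The first passes the dotted path as an ordinary string; the second relies on CPython accepting arbitrary string keys when a dict is unpacked into **kwargs.

```python
from functools import reduce


class Node:
    """Hypothetical stand-in for the post's Tree: a bare attribute container."""


def assign(head, path, value):
    """Set head.<a>.<b>...<leaf> = value, given the path as a dotted string."""
    *parents, leaf = path.split(".")
    target = reduce(getattr, parents, head)
    setattr(target, leaf, value)


def collect(**kwargs):
    return kwargs


head = Node()
head.branch = Node()
assign(head, "branch.leaf", 1)

# A dotted key cannot be written as a keyword, but it can be unpacked.
dotted = collect(**{"head.branch": 1})
```

Neither form is clearer than simply passing head.branch to the function, which supports the conclusion above.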

python **kwargs

>>> class tree:
...     def __init__(self, value, **kwargs):
...             exec "self.%s = kwargs" % value
...
>>> t = tree('x', k='v')
>>> print t
<__main__.tree instance at 0xb7ec464c>
>>> print t.x
{'k': 'v'}
>>> t = tree('x', **{'k' : 'v'})
>>> print t.x
{'k': 'v'}
>>> print **{'a' : 1, 'b' : 2}
  File "", line 1
    print **{'a' : 1, 'b' : 2}
           ^
SyntaxError: invalid syntax

See:
2007/05/python-datetime-object-from.html
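
The exec statement in the session above can be avoided: setattr does the same job without building source code from a string. A Python 3 sketch, where the class name Holder is made up for illustration:

```python
class Holder:
    def __init__(self, name, **kwargs):
        # Same effect as: exec "self.%s = kwargs" % name
        setattr(self, name, kwargs)


t = Holder("x", k="v")
u = Holder("x", **{"k": "v"})   # unpacking a dict gives the same kwargs

# ** unpacking is only valid inside a call (or a dict display in Python 3.5+).
merged = dict(a=1, **{"b": 2})
```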

python setattr

>>> class tree:
...     def __init__(self): pass
...
>>> t = tree
>>> t = tree()
>>> t.x = 1
>>> print t.__dict__
{'x': 1}
>>> t.'' = 1
  File "", line 1
    t.'' = 1
       ^
SyntaxError: invalid syntax
>>> t. = 1
  File "", line 1
    t. = 1
       ^
SyntaxError: invalid syntax
>>> setattr(t, '', 2)
>>> print t.__dict__
{'': 2, 'x': 1}
>>> setattr(t, '?', 2)
>>> print t.__dict__
{'': 2, 'x': 1, '?': 2}
>>> print t.?
  File "", line 1
    print t.?
            ^
SyntaxError: invalid syntax
>>> setattr(t, '.y', 3)
>>> print t.__dict__
{'': 2, 'x': 1, '?': 2, '.y': 3}
>>> print t..y
  File "", line 1
    print t..y
            ^
SyntaxError: invalid syntax

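As shown, setattr can create very irregular attribute names; the name is really just a string stored as a key in instance.__dict__. In:
2007/06/python-dict-pseudo-private-attributes.html
I have already discussed related __dict__ issues; this follows the same line.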
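
Attributes created this way cannot be read with dot syntax, but getattr reaches them. A short Python 3 sketch (the class name Bag is hypothetical):

```python
class Bag:
    pass


t = Bag()
setattr(t, "?", 2)
setattr(t, ".y", 3)

# Dot syntax cannot spell these names; getattr and vars() can.
question = getattr(t, "?")
stored = vars(t)
spellable = [name.isidentifier() for name in stored]
```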

python dict update

>>> zip(['a', 'b', 'c'], [1, 2, 3])
[('a', 1), ('b', 2), ('c', 3)]
>>> dict(zip(['a', 'b', 'c'], [1, 2, 3]))
{'a': 1, 'c': 3, 'b': 2}
>>> zip(['a', 'b', 'c'], [1, 2, 3], ['x', 'y', 'z'])
[('a', 1, 'x'), ('b', 2, 'y'), ('c', 3, 'z')]
>>> dict(zip(['a', 'b', 'c'], [1, 2, 3], ['x', 'y', 'z']))
Traceback (most recent call last):
  File "", line 1, in ?
ValueError: dictionary update sequence element #0 has length 3; 2 is required
>>> help(dict.update)

>>> d = {'a' : 1, 'b' : 2, 'c': 3}
>>> print d
{'a': 1, 'c': 3, 'b': 2}
>>> d.update(('x', 'y', 'z'))
Traceback (most recent call last):
  File "", line 1, in ?
ValueError: dictionary update sequence element #0 has length 1; 2 is required
>>> d.update(('x', 'y', 'z'), (4, 5, 6))
Traceback (most recent call last):
  File "", line 1, in ?
TypeError: update expected at most 1 arguments, got 2
>>> d.update([('x', 4), ('y', 5), ('z', 6)])
>>> print d
{'a': 1, 'c': 3, 'b': 2, 'y': 5, 'x': 4, 'z': 6}

So d.update([('x', 4), ('y', 5), ('z', 6)]) has the same effect as d.update({'x' : 4, 'y' : 5, 'z' : 6}).
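
The equivalence can be checked directly. A Python 3 sketch; the variable names are arbitrary:

```python
pairs = list(zip(["x", "y", "z"], [4, 5, 6]))

from_pairs = {"a": 1}
from_pairs.update(pairs)                       # sequence of 2-tuples

from_mapping = {"a": 1}
from_mapping.update({"x": 4, "y": 5, "z": 6})  # another mapping

from_keywords = {"a": 1}
from_keywords.update(x=4, y=5, z=6)            # keyword arguments

# Elements of any other length are rejected.
try:
    dict(zip("abc", [1, 2, 3], "xyz"))
except ValueError:
    triples_rejected = True
else:
    triples_rejected = False
```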

python doc and gettext

#!/usr/bin/env python
# -*- encoding: utf-8 -*-

import re
from gettext import gettext as _

#1
__doc__ = _("""
Author: Roc Zhou
Date: 2007-06-20
Email: chowroc.z@gmail.com

A new data structure: Tree
...""")

class Tree:
    #2
    __doc__ = _("""This class defines a builtin-like tree data structure,
thus now we can manipulate a tree very conveniently, like this:
head = Tree(value)
head = Tree(value, data=value1, extra=value2)
head.branch = value
...""")
......
    def __init__(self, value, **kwargs):
        _(""""head = Tree(value)" create a Tree instance as the root node with node 'value'.
...""")
......
    def Node1(self, pathseq, value):
        #3
        _("""Construct a Tree node from a path sequence, this path sequence
only represents a single node, for example:
...""")

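Here both #1 and #2 use an explicit "__doc__ = " assignment rather than the default bare-string definition. For #1, the likely reason is the preceding from gettext import gettext as _: it must come first for internationalization support, but then the docstring is no longer the first statement in the module, so __doc__ probably has to be assigned explicitly.

For #2, the likely reason is that the value returned by gettext cannot be passed to __doc__ through the default definition, since printing it gives None.

For #3, however, "__doc__ = " has no effect whether it is used or not, and only the default definition works. I do not know why.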
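
A likely explanation, offered here as a hedged reading rather than the author's conclusion: only a bare string literal in first position is compiled as a docstring, while _("...") is a call expression. Inside a module or class body, "__doc__ = ..." writes into that namespace, but inside a function body it merely creates a local variable. A Python 3 sketch with made-up class and function names:

```python
from gettext import gettext as _  # returns the message unchanged without a catalog


class Plain:
    _("not a docstring: this is a call expression")


class Assigned:
    __doc__ = _("class docstring set by assignment")


def plain_func():
    _("also just an expression statement")


def assigned_func():
    __doc__ = _("this only creates a local variable")


def patched_func():
    pass


# For a function, the attribute has to be set from outside the body.
patched_func.__doc__ = _("function docstring set after definition")
```

A translated function docstring is therefore usually attached after the def, or translated lazily at display time.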

Monday, June 18, 2007

python 'try/except' in unittest?

At first, the test flow I defined in a test function was as follows:


def setUp(self):
    try:
        self.head = Tree('root', data='head.data', extra='head.extra')
        self.failUnlessEqual(self.head._Tree__node_value, 'root')
        self.failUnlessEqual(self.head.data, 'head.data')  #(1)
        self.failUnlessEqual(self.head.extra, 'head.extra')
    except:
        self.fail(_("Unexpected exception catched"))

Then I ran the unit test:

sh$ python tree_ut.py
EEEE
======================================================================
ERROR: test__call__ (__main__.TestTree)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tree_ut.py", line 29, in setUp
    self.fail(_("Unexpected exception catched"))
  File "/usr/lib/python2.3/unittest.py", line 270, in fail
    raise self.failureException, msg
AssertionError: Unexpected exception catched

======================================================================
ERROR: test__setattr__ (__main__.TestTree)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tree_ut.py", line 29, in setUp
    self.fail(_("Unexpected exception catched"))
  File "/usr/lib/python2.3/unittest.py", line 270, in fail
    raise self.failureException, msg
AssertionError: Unexpected exception catched

======================================================================
ERROR: test__sgetitem__ (__main__.TestTree)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tree_ut.py", line 29, in setUp
    self.fail(_("Unexpected exception catched"))
  File "/usr/lib/python2.3/unittest.py", line 270, in fail
    raise self.failureException, msg
AssertionError: Unexpected exception catched

======================================================================
ERROR: test_tree (__main__.TestTree)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tree_ut.py", line 29, in setUp
    self.fail(_("Unexpected exception catched"))
  File "/usr/lib/python2.3/unittest.py", line 270, in fail
    raise self.failureException, msg
AssertionError: Unexpected exception catched

----------------------------------------------------------------------
Ran 4 tests in 0.006s

FAILED (errors=4)

Error messages like these are baffling: why is everything an Unexpected exception? Using:

def setUp(self):
    try:
        self.head = Tree('root', data='head.data', extra='head.extra')
        self.failUnlessEqual(self.head._Tree__node_value, 'root')
        self.failUnlessEqual(self.head.data, 'head.data')  #(1)
        self.failUnlessEqual(self.head.extra, 'head.extra')
    except Exception, ex:
        print 'DEBUG:', ex, ex.__class__
        self.fail(_("Unexpected exception catched"))

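to analyze it, it turns out that the exception raised at #(1) was caught by the except clause. The test case itself is wrong here, of course, but this also shows that wrapping unittest code in this kind of try/except does not work: it hides the real error and leaves you with no idea where to look.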
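
The alternative is to let the exception propagate so that the runner reports its real type. A Python 3 sketch; Resource and both test classes are hypothetical stand-ins, and the cases are defined inside a function only to keep them away from automatic test collectors. In Python 3 implicit exception chaining would reveal the original error even in the masked version, which was not the case in the Python 2.3 run shown above.

```python
import io
import unittest


class Resource:
    """Hypothetical stand-in for the object built in setUp."""

    def __init__(self):
        self.value = "root"


def run_case(case):
    """Run one TestCase class quietly and return (result, report text)."""
    stream = io.StringIO()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TextTestRunner(stream=stream).run(suite)
    return result, stream.getvalue()


def run_demo():
    class MaskedSetUp(unittest.TestCase):
        def setUp(self):
            try:
                self.head = Resource()
                self.assertEqual(self.head.data, "head.data")  # AttributeError
            except Exception:
                self.fail("Unexpected exception caught")

        def test_value(self):
            self.assertEqual(self.head.value, "root")

    class PlainSetUp(unittest.TestCase):
        def setUp(self):
            self.head = Resource()
            self.assertEqual(self.head.data, "head.data")  # AttributeError

        def test_value(self):
            self.assertEqual(self.head.value, "root")

    return run_case(MaskedSetUp), run_case(PlainSetUp)


(masked_result, masked_report), (plain_result, plain_report) = run_demo()
```

The plain version is reported as an ERROR carrying the AttributeError itself; when an exception is expected, assertRaises is the usual tool rather than a hand-written try/except.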

python dict && pseudo private attributes

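Earlier, while defining the Tree structure, I noted that once __setattr__ is redefined, assignment can only go through __dict__. In actual use, several __dict__ issues turned out to need clarifying.

First, the output of dir() differs from __dict__. For example, after defining an empty class tree, look at dir() and __dict__ respectively: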

>>> class tree:
...     def __init__(self): pass
...
>>> t = tree()
>>> dir(tree)
['__doc__', '__init__', '__module__']
>>> print tree.__dict__
{'__module__': '__main__', '__doc__': None, '__init__': }
>>> print t.__dict__
{}
>>> t.f = 'testing'
>>> print t.__dict__
{'f': 'testing'}
>>> print tree.__dict__
{'__module__': '__main__', '__doc__': None, '__init__': }

A class object can even be assigned to like this:

>>> class tree:
...     def __init__(self): pass
...
>>> d = {'a' : 1, 'b' : 2, 'c' : 3}
>>> tree.__dict__.update(d)
>>> print tree.__dict__
{'a': 1, '__module__': '__main__', 'b': 2, 'c': 3, '__doc__': None, '__init__': }
>>> t = tree()
>>> print t.__dict__
{}
>>> t.b
2

The next problem involves pseudo-private variables.

In class Tree, the node value and the key index are held in variables named __node_value and __node_items. So they were defined as:

self.__dict__['__node_value'] = value
self.__dict__['__node_items'] = {}

But in use these turned out to have no value at all, and an error was raised. The following examples illustrate the situation:

>>> class test:
...     def __init__(self):
...             self.__dict__['x'] = 1
...             self.__dict__['__node_value'] = 2
...
>>> t = test()
>>> print t.x
1
>>> print t._test__node_value
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: test instance has no attribute '_test__node_value'

>>> class test:
...     def __init__(self):
...             self.__dict__['x'] = 1
...             self.__dict__['__value'] = 2
...             self.__dict__['__node_value'] = 3
...
>>> t = test()
>>> print t.x
1
>>> print t.__value
2
>>> print t.__node_value
3
>>> print t.__dict__
{'__node_value': 3, 'x': 1, '__value': 2}

>>> class test:
...     def __init__(self):
...             self.__dict__['x'] = 1
...             self.__dict__['__node_value'] = 2
...             print self.x
...             print self.__node_value
...
>>> t = test()
1
Traceback (most recent call last):
  File "", line 1, in ?
  File "", line 7, in __init__
AttributeError: test instance has no attribute '_test__node_value'

>>> class test:
...     def __init__(self):
...             self.__dict__['x'] = 1
...             self.__node_value = 2
...             print self.x
...             print self.__node_value
...
>>> t = test()
1
2
>>> print t.__dict__
{'x': 1, '_test__node_value': 2}

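As shown, when assigning through self.__node_value, the key actually stored in __dict__ is '_test__node_value', not '__node_value'. If the key is '__node_value', it can be accessed as t.__node_value from outside the class, unaffected by the so-called pseudo-private attribute rule!

The same reasoning applies to variables defined in the class object: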

>>> class test:
...     __used_names = ['name1', 'name2']
...     def __init__(self): self.__dict__['x'] = 1
>>> t = test()
>>> print t.__dict__
{'x': 1}
>>> print t.__class__.__dict__
{'__module__': '__main__', '__init__': , '_test__used_names': ['name1', 'name2'], '__doc__': None}
>>> print t.__used_names
Traceback (most recent call last):
  File "", line 1, in 
    print t.__used_names
AttributeError: test instance has no attribute '__used_names'
>>> print t._test__used_names
['name1', 'name2']
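
The rule behind these sessions: name mangling is applied by the compiler to identifiers written inside a class body, never to strings used as dictionary keys or passed to getattr. A Python 3 sketch with a hypothetical class Node:

```python
class Node:
    __reserved = ["name1", "name2"]      # stored on the class as _Node__reserved

    def __init__(self):
        self.__value = 2                 # stored as _Node__value
        self.__dict__["__raw"] = 3       # string keys are never mangled

    def read_raw_by_attribute(self):
        try:
            return self.__raw            # compiled as self._Node__raw
        except AttributeError:
            return None

    def read_raw_by_name(self):
        return getattr(self, "__raw")    # plain string lookup, no mangling


n = Node()
instance_keys = sorted(vars(n))
```

So code that writes through __dict__ has to spell out the mangled key, such as '_Node__value' here, if methods are to read it back with the self.__value syntax.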

python redefined setattr

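I am now defining a new python data structure, Tree. The goal is for it to be very simple to operate on in python, like builtin variables; for example, root.branch1.branch11() should return the value of that node.

Now, when assigning to a tree node, as in: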

root.branch1.branch11 = value
or:
root.branch1.branch12 = Tree(value)

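the node name needs some restrictions (that is, the namespace conflict problem discussed earlier must be avoided): names already reserved by a Tree instance cannot be used, such as the method names __repr__, __getitem__, __setitem__, as well as __node_value for the node value and __node_items for the key index. The method called in this case is Tree.__setattr__. To reserve those names, my first attempt was: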

class Tree:
    _("""A Tree instance can only contains sub Tree instances as its attributes""")
    # (1)
    __used_names = None
    __used_names = Tree.__dict__.copy()
    __used_names.update(object.__dict__.copy())
    __used_names = __used_names.keys()
    def __init__(self, value=None, **kwargs):
        self.__node_value = value
        # __node_value should always be Non-Tree
        self.__node_items = {}
        for k, v in kwargs.items(): self.__setattr__(k, v)
    ...
    def __setattr__(self, attr_name, value):
        _("""The operation 'root.br1.br2 = value' works on root.br1 rather than root.br1.br2""")
        if attr_name in self.__used_names:
            raise TreeExc(_("Attribute name '%s' is reserved" % attr_name))
        try:
            existed = getattr(self, attr_name)
            # self.attribute exists
            if isinstance(value, Tree):
                subtree = value
                if not subtree._Tree__node_value:
                    subtree._Tree__node_value = existed._Tree__node_value
                setattr(self, attr_name, subtree)
                # replace directly after the tree __node_value has been reserved
            else:
                setattr(self, '%s._Tree__node_value' % attr_name, value)
        except AttributeError:
        # If self.attribute does not exist, assign it an EMPTY node
            if isinstance(value, Tree):
                subtree = value
                setattr(self, attr_name, subtree)
            else:
                setattr(self, attr_name, Tree(value))

Then run:

sh$ python -c "import tree; t = tree.Tree(1, data='head')"
Traceback (most recent call last):
  File "", line 1, in ?
  File "tree.py", line 16, in ?
    class Tree:
  File "tree.py", line 20, in Tree
    __used_names = Tree.__dict__.copy()
NameError: name 'Tree' is not defined

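Using self instead reports the same "not defined" error. This is easy to understand: until the class body of Tree has finished assigning all its attributes, the name Tree is of course not defined. So this approach does not work.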
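
One way around this, sketched in Python 3 under assumed names (reserved, traverse, _node_value and _node_items are illustrative, not the author's final code), is to compute the reserved names right after the class statement, when the class name is bound:

```python
def define_too_early():
    class Tree:
        reserved = list(Tree.__dict__)   # Tree is not bound yet: NameError
    return Tree


try:
    define_too_early()
except NameError:
    early_failed = True
else:
    early_failed = False


class Tree:
    def traverse(self):
        return []


# The class exists now, so its own attributes can be inspected.
Tree.reserved = frozenset(dir(Tree)) | {"_node_value", "_node_items"}
```

The set is built once per class rather than once per instance or per assignment.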

So I redefined it:

class Tree:
    _("""A Tree instance can only contains sub Tree instances as its attributes""")
    # (1)
    # __used_names = None
    # __used_names = Tree.__dict__.copy()
    # __used_names.update(object.__dict__.copy())
    # __used_names = __used_names.keys()
    # ------------------------------------------------------------------------
    def __init__(self, value=None, **kwargs):
        # (2)
        self.__used_names = None
        self.__used_names = Tree.__dict__.copy()
        self.__used_names.update(object.__dict__)
        self.__used_names = self.__used_names.keys()
        self.__node_value = value
        # __node_value should always be Non-Tree
        self.__node_items = {}
        for k, v in kwargs.items(): self.__setattr__(k, v)
    def __getattr__(self, attr_name): pass
    def __setattr__(self, attr_name, value):
        _("""The operation 'root.br1.br2 = value' works on root.br1 rather than root.br1.br2""")
        if attr_name in self.__used_names:
            raise TreeExc(_("Attribute name '%s' is reserved" % attr_name))
        try:
            existed = getattr(self, attr_name)
            # self.attribute exists
            if isinstance(value, Tree):
                subtree = value
                if not subtree._Tree__node_value:
                    subtree._Tree__node_value = existed._Tree__node_value
                setattr(self, attr_name, subtree)
                # replace directly after the tree __node_value has been reserved
            else:
                setattr(self, '%s._Tree__node_value' % attr_name, value)
        except AttributeError:
        # If self.attribute does not exist, assign it an EMPTY node
            if isinstance(value, Tree):
                subtree = value
                setattr(self, attr_name, subtree)
            else:
                setattr(self, attr_name, Tree(value))

Result:

sh$ python -c "import tree; t = tree.Tree(1, data='head')"
Traceback (most recent call last):
  File "", line 1, in ?
  File "tree.py", line 30, in __init__
    self.__used_names = None
  File "tree.py", line 55, in __setattr__
    if attr_name in self.__used_names:
TypeError: iterable argument required

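As shown, __setattr__ was called. Once __setattr__ is redefined, it affects not only external assignments such as root.branch1.branch11 = value but also internal ones such as self.attribute = value: every assignment target is treated as a Tree node. This also means that the next two statements: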

self.__node_value = value
# __node_value should always be Non-Tree
self.__node_items = {}

are problematic as well. In fact, they lead to infinite recursion in the assignment, and at run time an error like this may be reported:

......
  File "tree.py", line 67, in __setattr__
    setattr(self, '%s._Tree__node_value' % attr_name, value)
  File "tree.py", line 67, in __setattr__
    setattr(self, '%s._Tree__node_value' % attr_name, value)
  File "tree.py", line 67, in __setattr__
    setattr(self, '%s._Tree__node_value' % attr_name, value)
  File "tree.py", line 67, in __setattr__
    setattr(self, '%s._Tree__node_value' % attr_name, value)
  File "tree.py", line 67, in __setattr__
    setattr(self, '%s._Tree__node_value' % attr_name, value)
  File "tree.py", line 67, in __setattr__
    setattr(self, '%s._Tree__node_value' % attr_name, value)
  File "tree.py", line 67, in __setattr__
    setattr(self, '%s._Tree__node_value' % attr_name, value)
  File "tree.py", line 67, in __setattr__
    setattr(self, '%s._Tree__node_value' % attr_name, value)
  File "tree.py", line 48, in __setattr__
    _("""The operation 'root.br1.br2 = value' works on root.br1 rather than root.br1.br2""")
  File "/usr/lib/python2.3/gettext.py", line 472, in gettext
    return dgettext(_current_domain, message)
  File "/usr/lib/python2.3/gettext.py", line 454, in dgettext
    t = translation(domain, _localedirs.get(domain, None))
  File "/usr/lib/python2.3/gettext.py", line 403, in translation
    mofiles = find(domain, localedir, languages, all=1)
  File "/usr/lib/python2.3/gettext.py", line 366, in find
    val = os.environ.get(envar)
  File "/usr/lib/python2.3/UserDict.py", line 51, in get
    if not self.has_key(key):
RuntimeError: maximum recursion depth exceeded

What about using the built-in setattr() function? Change (2) above to (3):

        # (3)
        setattr(self, '__used_names', None)
        setattr(self, '__used_names', Tree.__dict__.copy())
        self.__used_names.update(object.__dict__)
        setattr(self, '__used_names', self.__used_names.keys())

Run:

sh$ python -c "import tree; t = tree.Tree(1, data='head')"
Traceback (most recent call last):
  File "", line 1, in ?
  File "tree.py", line 35, in __init__
    setattr(self, '__used_names', None)
  File "tree.py", line 55, in __setattr__
    if attr_name in self.__used_names:
TypeError: iterable argument required

The same error, which shows that setattr(self, attr_name, value) goes through self.__setattr__() just like self.attribute = value.
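
In Python 3, where every class is new-style, the usual way to store a value without re-entering an overridden __setattr__ is object.__setattr__ or a direct write to the instance dict. A minimal sketch; the class Guarded and its reserved names are hypothetical:

```python
class Guarded:
    _reserved = frozenset(["_value", "_items"])

    def __init__(self, value=None):
        # Both forms bypass Guarded.__setattr__.
        object.__setattr__(self, "_value", value)
        vars(self)["_items"] = {}

    def __setattr__(self, name, value):
        if name in self._reserved:
            raise AttributeError("Attribute name %r is reserved" % name)
        vars(self)[name] = value


g = Guarded(1)
g.branch = "ok"

blocked = []
for assign in (lambda: setattr(g, "_value", 2),
               lambda: g.__setattr__("_items", None)):
    try:
        assign()
    except AttributeError:
        blocked.append(True)
```

Both the builtin setattr() and an explicit g.__setattr__ call are intercepted, matching the behavior seen in the traceback above.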

Another option is to add the corresponding initialization statements to each __init__:

class Tree:
    _("""A Tree instance can only contains sub Tree instances as its attributes""")
    # (1)
    # __used_names = None
    # __used_names = Tree.__dict__.copy()
    # __used_names.update(object.__dict__.copy())
    # __used_names = __used_names.keys()
    # ------------------------------------------------------------------------
    # __used_names = object.__dict__.copy().keys() + [
    #   '__used_names', '__node_value', '__node_items',
    #   '__getattr__', '__setitem__', '__getitem__', '__call__',
    #   'tree', '__traverse__'   ]
    def __init__(self, value=None, **kwargs):
        # (2)
        # self.__used_names = None
        # self.__used_names = Tree.__dict__.copy()
        # self.__used_names.update(object.__dict__)
        # self.__used_names = self.__used_names.keys()
        # (3)
        # setattr(self, '__used_names', None)
        # setattr(self, '__used_names', Tree.__dict__.copy())
        # self.__used_names.update(object.__dict__)
        # setattr(self, '__used_names', self.__used_names.keys())
        # --------------------------------------------------------------------
        # self.__node_value = value
        self.__dict__['__node_value'] = value
        # __node_value should always be Non-Tree
        # self.__node_items = {}
        self.__dict__['__node_items'] = {}
        for k, v in kwargs.items(): self.__setattr__(k, v)
    def __setattr__(self, attr_name, value):
        _("""The operation 'root.br1.br2 = value' works on root.br1 rather than root.br1.br2""")
        # (4)
        __used_names = Tree.__dict__.copy()
        print __used_names
        __used_names.update(object.__dict__.copy())
        __used_names = __used_names.keys() + ['__node_value', '__node_items']
        if attr_name in __used_names:
        # --------------------------------------------------------------------
        # if attr_name in self.__used_names:
            raise TreeExc(_("Attribute name '%s' is reserved" % attr_name))
        try:
            existed = getattr(self, attr_name)
            # self.attribute exists
            if isinstance(value, Tree):
                subtree = value
                if not subtree._Tree__node_value:
                    subtree._Tree__node_value = existed._Tree__node_value
                setattr(self, attr_name, subtree)
                # replace directly after the tree __node_value has been reserved
            else:
                setattr(self, '%s._Tree__node_value' % attr_name, value)
        except AttributeError:
        # If self.attribute does not exist, assign it an EMPTY node
            if isinstance(value, Tree):
                subtree = value
                # setattr(self, attr_name, subtree)
                self.__dict__[attr_name] = subtree
            else:
                # setattr(self, attr_name, Tree(value))
                self.__dict__[attr_name] = Tree(value)

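As shown, once __setattr__ has been redefined, assignment can only be done through self.__dict__.

A potential problem with this approach is the overhead, because the work has to be repeated every time a Tree instance is created. Going back to the idea of a Tree class static variable, the final definition is: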

class Tree:
    _("""A Tree instance can only contains sub Tree instances as its attributes""")
    # (1)
    # __used_names = None
    # __used_names = Tree.__dict__.copy()
    # __used_names.update(object.__dict__.copy())
    # __used_names = __used_names.keys()
    # ------------------------------------------------------------------------
    __used_names = object.__dict__.copy().keys() + [
        '__used_names', '__node_value', '__node_items',
        '__getattr__', '__setitem__', '__getitem__', '__call__',
        'tree', '__traverse__'   ]
    def __init__(self, value=None, **kwargs):
        # (2)
        # self.__used_names = None
        # self.__used_names = Tree.__dict__.copy()
        # self.__used_names.update(object.__dict__)
        # self.__used_names = self.__used_names.keys()
        # (3)
        # setattr(self, '__used_names', None)
        # setattr(self, '__used_names', Tree.__dict__.copy())
        # self.__used_names.update(object.__dict__)
        # setattr(self, '__used_names', self.__used_names.keys())
        # --------------------------------------------------------------------
        # self.__node_value = value
        self.__dict__['__node_value'] = value
        # __node_value should always be Non-Tree
        # self.__node_items = {}
        self.__dict__['__node_items'] = {}
        for k, v in kwargs.items(): self.__setattr__(k, v)
    def __setattr__(self, attr_name, value):
        _("""The operation 'root.br1.br2 = value' works on root.br1 rather than root.br1.br2""")
        # (4)
        # __used_names = Tree.__dict__.copy()
        # __used_names.update(object.__dict__.copy())
        # __used_names = __used_names.keys() + ['__node_value', '__node_items']
        # if attr_name in __used_names:
        # --------------------------------------------------------------------
        if attr_name in self.__used_names:
            raise TreeExc(_("Attribute name '%s' is reserved" % attr_name))
        try:
            existed = getattr(self, attr_name)
            # self.attribute exists
            if isinstance(value, Tree):
                subtree = value
                if not subtree._Tree__node_value:
                    subtree._Tree__node_value = existed._Tree__node_value
                # setattr(self, attr_name, subtree)
                self.__dict__[attr_name] = subtree
                # replace directly after the tree __node_value has been reserved
            else:
                # setattr(self, '%s._Tree__node_value' % attr_name, value)
                self.__dict__[attr_name]._Tree__node_value = value
                # self.__dict__[attr_name] = Tree(value)
        except AttributeError:
        # If self.attribute does not exist, assign it an EMPTY node
            if isinstance(value, Tree):
                subtree = value
                # setattr(self, attr_name, subtree)
                self.__dict__[attr_name] = subtree
            else:
                # setattr(self, attr_name, Tree(value))
                self.__dict__[attr_name] = Tree(value)

Thursday, June 14, 2007

python reassign repr or normal method?

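In python, variable names are references to objects, so a name can point to any object regardless of type. For example, the name of a function/method can be rebound to an ordinary value. What effect does that have on builtin methods such as __getattr__? For example: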

sh$ expand -t4 test001.py
#!/usr/bin/env python

class tree:
    def __init__(self):
        self.data = 1
    def func(self):
        return self.data
    def __custom__(self):
        return self.data
    def __private(self):
        return self.data

t = tree()
print "t", t
print "t.data", t.data
print "t.__custom__()", t.__custom__()
print "t._tree__private__()", t._tree__private()
# print "t.__repr__()", t.__repr__()              #(1)
# print "t._tree__repr__()", t._tree__repr__()    #(2)
# print "type(t.__repr__)", type(t.__repr__)
print "t.func()", t.func()
t.__repr__ = 2
t.__str__ = 2
t.func = 2
t.__custom__ = 2
t._tree__private = 2
print "*** After resign ***"
# print "t", t                                    #(3)
# t.__repr__() is used
print "t.data", t.data
# print "t.__repr__()", t.__repr__()              #(4)
# print "type(t.__repr__)", type(t.__repr__)
# print "t.func()", t.func()                      #(5)
# print "t.__custom__()", t.__custom__()          #(6)
print "t._tree__private__()", t._tree__private()  #(7)

The error output for each of (1)-(7) is as follows:

(1)
t <__main__.tree instance at 0xb7f0418c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t.__repr__()
Traceback (most recent call last):
  File "test001.py", line 18, in ?
    print "t.__repr__()", t.__repr__()              #(1)
AttributeError: tree instance has no attribute '__repr__'

(2)
t <__main__.tree instance at 0xb7faa18c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t._tree__repr__()
Traceback (most recent call last):
  File "test001.py", line 19, in ?
    print "t._tree__repr__()", t._tree__repr__()    #(2)
AttributeError: tree instance has no attribute '_tree__repr__'

(3)
t <__main__.tree instance at 0xb7f6d18c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t.func() 1
*** After resign ***
t Traceback (most recent call last):
  File "test001.py", line 28, in ?
    print "t", t                                    #(3)
TypeError: 'int' object is not callable

(4)
t <__main__.tree instance at 0xb7f0e18c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t.func() 1
*** After resign ***
t.data 1
t.__repr__()
Traceback (most recent call last):
  File "test001.py", line 31, in ?
    print "t.__repr__()", t.__repr__()              #(4)
TypeError: 'int' object is not callable

(5)
t <__main__.tree instance at 0xb7fb418c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t.func() 1
*** After resign ***
t.data 1
t.func()
Traceback (most recent call last):
  File "test001.py", line 33, in ?
    print "t.func()", t.func()                      #(5)
TypeError: 'int' object is not callable

(6)
t <__main__.tree instance at 0xb7f6818c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t.func() 1
*** After resign ***
t.data 1
t.__custom__()
Traceback (most recent call last):
  File "test001.py", line 34, in ?
    print "t.__custom__()", t.__custom__()          #(6)
TypeError: 'int' object is not callable

(7)
t <__main__.tree instance at 0xb7f8618c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t.func() 1
*** After resign ***
t.data 1
t._tree__private__()
Traceback (most recent call last):
  File "test001.py", line 35, in ?
    print "t._tree__private__()", t._tree__private()  #(7)
TypeError: 'int' object is not callable

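So builtin operators behave like ordinary methods: after t.__repr__ and t.__str__ are reassigned above, print t no longer works (note: not print t.data!). The same goes for other builtin types such as list/dict.

What I do not quite understand is why the exception AttributeError: tree instance has no attribute '__repr__' appears above, since __repr__ is not a private variable. And even if it were private, it should still be accessible through _tree__repr__.

So I defined __repr__ explicitly: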

sh$ expand -t4 test002.py
#!/usr/bin/env python

class tree:
    def __init__(self):
        self.data = 1
    def __repr__(self):
        return "test: %d" % self.data

t = tree()
print "t", t
print "t.data", t.data
print "t.__repr__()", t.__repr__()              #(1)
print "type(t.__repr__)", type(t.__repr__)
t.__repr__ = "string"
print "*** After reassign ***"
# print "t", t                                  #(3)
# t.__repr__() is used
print "t.data", t.data
# print "t.__repr__()", t.__repr__()            #(4)
print "type(t.__repr__)", type(t.__repr__)

(3)
t test: 1
t.data 1
t.__repr__() test: 1
type(t.__repr__) <type 'instancemethod'>
*** After reassign ***
t Traceback (most recent call last):
  File "test002.py", line 16, in ?
    print "t", t                                    #(3)
TypeError: 'str' object is not callable

(4)
t test: 1
t.data 1
t.__repr__() test: 1
type(t.__repr__) <type 'instancemethod'>
*** After reassign ***
t.data 1
t.__repr__()
Traceback (most recent call last):
  File "test002.py", line 19, in ?
    print "t.__repr__()", t.__repr__()              #(4)
TypeError: 'str' object is not callable

As you can see, once __repr__ is defined explicitly, the attribute t.__repr__ exists!

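But why did the earlier example raise an undefined-attribute exception, and if __repr__ was not defined, how did print t manage to work at all? Answering that requires a clearer understanding of Python's fundamentals; see the notes on "python namespace".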
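Part of the answer, as far as I can tell, is that the examples here use old-style Python 2 classes, which look up __repr__ on the instance and fall back to a built-in default representation when it is missing. The sketch below is Python 3, where every class is new-style, and shows the contrasting behavior: repr() and print() consult the type, so an instance attribute named __repr__ only affects explicit calls. The class name Tree is just an illustration.

```python
class Tree:
    def __init__(self):
        self.data = 1


t = Tree()

# __repr__ is inherited from object and lives on the type, not in the instance.
in_instance = "__repr__" in vars(t)
on_type = hasattr(type(t), "__repr__")

t.__repr__ = 1                   # shadow it on the instance only
shown = repr(t)                  # still works: looked up on type(t)

try:
    t.__repr__()                 # an explicit call hits the instance attribute
    explicit_call_error = None
except TypeError as exc:
    explicit_call_error = str(exc)
```

In other words, on Python 3 the assignment breaks only the explicit t.__repr__() call, while print(t) keeps working.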

Another fairly important example:

>>> class tree:
...     def __init__(self):  self.__repr__ = 1
...
>>> t = tree()
>>> print t
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'int' object is not callable

The reason for discussing all this is that if I want to define a class Tree, there is a latent risk of namespace collisions!
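One way to reduce that risk is to keep child nodes out of the instance __dict__ altogether. The following is only a sketch of the idea in Python 3, not the Tree implementation from this post; the class layout and the _children and value names are assumptions.

```python
class Tree:
    """Hypothetical tree node whose children live in a separate dict."""

    def __init__(self, value=None):
        # Bypass our own __setattr__ for the two real instance attributes.
        object.__setattr__(self, "_children", {})
        object.__setattr__(self, "value", value)

    def __getattr__(self, name):
        # Only called when normal lookup fails. Never invent underscore
        # names, so special methods such as __repr__ cannot be shadowed.
        if name.startswith("_"):
            raise AttributeError(name)
        return self._children.setdefault(name, Tree())

    def __setattr__(self, name, value):
        if name.startswith("_"):
            raise AttributeError("reserved name: %s" % name)
        if name == "value":
            object.__setattr__(self, name, value)
        else:
            self._children[name] = value if isinstance(value, Tree) else Tree(value)

    def __repr__(self):
        return "Tree(%r, children=%r)" % (self.value, sorted(self._children))


root = Tree("root")
root.trunk.branch = "leaf"      # trunk is created on first access
```

The trade-off is that any misspelled attribute silently creates a new node, and even hasattr(root, "anything") does so, which may or may not be acceptable.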

Wednesday, June 13, 2007

logcheck hack for logfiles updated by rsync

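Because of hardware constraints, the logs for our log analysis cannot be centralized with a remote logging mechanism such as syslog-ng: network bandwidth would become the bottleneck, and I don't know how to deal with lost log packets. A safer approach is to rsync the logs to a central storage host and run the analysis there, for example logcheck as the log filter.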

This runs into a problem, though. When logcheck calls logtail, logtail records the original log file's inode and size in an offset_file, whose format is:

$inode
$last_size

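On its next run, logcheck first compares the file's current size (wc -c) with $last_size from the offset_file. If $current_size < $last_size, logcheck concludes that the log has been rotated and switches to logfile.1; otherwise it keeps checking logfile. It then calls logtail, which applies the offset_file to the chosen log file and uses the recorded $last_size to resume reading where the previous run stopped...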
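The bookkeeping described above can be modeled roughly as follows. This is a simplified Python 3 sketch of the idea, not logtail's actual code; the function name read_new_data and the reset-to-zero policy are my assumptions.

```python
import os


def read_new_data(logfile, offsetfile):
    """Return the bytes appended to logfile since the last recorded offset."""
    st = os.stat(logfile)
    inode, offset = 0, 0
    if os.path.exists(offsetfile):
        with open(offsetfile) as f:
            fields = f.read().split()
        if len(fields) >= 2:
            inode, offset = int(fields[0]), int(fields[1])

    # A different inode or a shrunken file means this is not the file
    # (or not the same contents) we read last time: start from the top.
    if inode != st.st_ino or st.st_size < offset:
        offset = 0

    with open(logfile, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Record the inode and the new end position, one value per line.
    with open(offsetfile, "w") as f:
        f.write("%d\n%d\n" % (st.st_ino, offset + len(data)))
    return data
```

With this model it is easy to see why a changed inode is a problem: the stored offset is thrown away even though the contents simply grew.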

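But when the logs are synchronized with rsync, the inode is not preserved: after every rsync the log's inode changes, so logcheck cannot find where the previous run ended. With cp -f the inode is preserved, but cp is too expensive, especially for files as large as logs. So I made another small hack to logcheck and added an --rsynced option, spelled -y: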
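The inode difference comes from how the file is updated: by default rsync writes a temporary file and renames it over the destination, whereas an overwrite in place reuses the existing file. A small Python 3 sketch of the two update styles (the file names are made up; this does not call rsync or cp):

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
log = os.path.join(workdir, "messages")

with open(log, "w") as f:
    f.write("first sync\n")
inode_before = os.stat(log).st_ino

# In-place overwrite: the existing file is truncated and rewritten.
with open(log, "w") as f:
    f.write("overwritten in place\n")
inode_in_place = os.stat(log).st_ino

# Replace by rename: write a temporary file, then move it over the target.
tmp = log + ".tmp"
with open(tmp, "w") as f:
    f.write("replaced by rename\n")
os.replace(tmp, log)
inode_renamed = os.stat(log).st_ino
```

Comparing inode_in_place and inode_renamed against inode_before shows which update style keeps the inode that the offset file remembers.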

-y    = adjust the inode number if the logfiles are updated by rsync

The final patch looks like this:

sh$ expand -t4 logcheck-1.2.45-rsynced.patch
diff -Naur logcheck-1.2.45.old/src/logcheck logcheck-1.2.45.new/src/logcheck
--- logcheck-1.2.45.old/src/logcheck    2006-07-06 18:16:42.000000000 +0800
+++ logcheck-1.2.45.new/src/logcheck    2007-06-15 10:50:34.000000000 +0800
@@ -49,7 +49,7 @@
 ATTACK=0

 # Set the getopts string
-GETOPTS="c:dhH:l:L:m:opr:RsS:tTuvw"
+GETOPTS="c:dhH:l:L:m:opr:RsS:tTuvwy"

 # Get the details for the email message
 DATE="$(date +'%Y-%m-%d %H:%M')"
@@ -90,6 +90,10 @@
 LOCKDIR=/var/lock/logcheck
 LOCKFILE="$LOCKDIR/logcheck"

+RSYNCED=0
+# If the logfiles are centralized by rsync, the inode number will not be
+#  reserved, so add an option for this condition
+
 # Carry out the clean up tasks
 cleanup() {

@@ -208,7 +212,9 @@
        mkdir $cleaned \
            || error "Could not make dir $cleaned for cleaned rulefiles."
    fi
-   for rulefile in $(run-parts --list $dir); do
+   debug "find $dir, pwd: `pwd`"
+   for rulefile in $(find $dir -type f -perm +0100); do
+       debug "rulefile $rulefile"
        rulefile=$(basename $rulefile)
        if [ -f ${dir}/${rulefile} ]; then
        debug "cleanrules: ${dir}/${rulefile}"
@@ -406,6 +412,18 @@
     fi
 }

+rsynced() {
+   logfile=$1
+   offsetfile=$2
+   if [ $RSYNCED -eq 1 ]; then
+       debug "$logfile rsync specified"
+       new_inode=$(ls -i $logfile | awk '{print $1}')
+       old_inode=$(head -n1 $offsetfile)
+       debug "replace $old_inode with $new_inode"
+       sed -i "1s/^.*$/$new_inode/" $offsetfile
+   fi
+}
+
 # Get the yet unseen part of one logfile.
 logoutput() {
     file=$1
@@ -415,11 +433,13 @@
     if [ -f "$file" ]; then
    offsetfile="$STATEDIR/offset$(echo $file | tr / .)"
    if [ -s "$offsetfile" -a -r "$offsetfile" ]; then
+       rsynced $file $offsetfile
        if [[ $(wc -c < "$file") -lt $(tail -n 1  "$offsetfile") ]]; then
            # assume the log is rotated by savelog(8)
        # syslog-ng leaves old files here
        if [ -e "$file.0" -a "$file.0" -nt "$file.1.gz" ]; then
            debug "Running logtail on rotated: $file.0"
+           rsynced $file.0 $offsetfile
            $LOGTAIL -f "$file.0" -o "$offsetfile" $LOGTAIL_OPTS > \
            $TMPDIR/logoutput/$(basename "$file") 2>&1 \
            || error "Could not run logtail or save output"
@@ -429,6 +449,7 @@
        # should also probably check if file is still fresh
        elif [ -e "$file.1" ]; then
            debug "Running logtail on rotated: $file.1"
+           rsynced $file.1 $offsetfile
            $LOGTAIL -f "$file.1" -o "$offsetfile" $LOGTAIL_OPTS > \
            $TMPDIR/logoutput/$(basename "$file") 2>&1 \
            || error "Could not run logtail or save output"
@@ -452,7 +473,7 @@
     debug "usage: Printing usage and exiting"
     cat <<EOF
 usage: logcheck [-c CFG] [-d] [-h] [-H HOST] [-l LOG] [-L CFG] [-m MAIL] [-o]
-                [-r DIR] [-s|-p|-w] [-R] [-S DIR] [-t] [-T] [-u]
+                [-r DIR] [-s|-p|-w] [-R] [-S DIR] [-t] [-T] [-u] [-y]
  -c CFG       = override default configuration file
  -d           = debug mode
  -h           = print this usage information and exit
@@ -471,6 +492,7 @@
  -u           = enable syslog-summary
  -v           = print version
  -w           = use the "workstation" runlevel
+ -y           = adjust the offset inode number if the logfiles are updated by rsync remotely
 EOF
 }

@@ -600,6 +622,10 @@
        debug "Setting REPORTLEVEL to workstation"
        REPORTLEVEL="workstation"
        ;;
+   y)
+       debug "Setting RSYNCED"
+       RSYNCED=1
+       ;;
    \?)
        usage
        exit 1

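I used the short option -y here rather than the long option --rsynced (-r and -s were already taken) because logcheck parses its options with the bash built-in getopts rather than GNU getopt, and getopts does not support long options. Switching to getopt would have required too many changes, so I settled for the simpler approach.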

Note that the rotated logs are also synchronized by rsync, so rsynced() has to be applied to them as well to adjust the inode.
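For readers who prefer to see the adjustment outside the patch, here is a rough Python 3 equivalent of what rsynced() does to the offset file. The function name adjust_offset_inode is hypothetical, and it assumes the two-line $inode / $last_size format described earlier.

```python
import os


def adjust_offset_inode(logfile, offsetfile):
    """Rewrite line 1 of offsetfile with logfile's current inode.

    Returns (old_inode, new_inode); the recorded size on line 2 is kept.
    """
    new_inode = os.stat(logfile).st_ino
    with open(offsetfile) as f:
        lines = f.read().splitlines()
    old_inode = int(lines[0])
    lines[0] = str(new_inode)
    with open(offsetfile, "w") as f:
        f.write("\n".join(lines) + "\n")
    return old_inode, new_inode
```

Like the shell version, this trusts that the file at that path is the continuation of the one read last time; it does nothing to verify that.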

Also, when invoking logcheck from root you can use sudo or su, but remember to change the $HOME directory: with su, use su - logcheck; with sudo, use sudo -u logcheck -H. If $HOME is not changed, the find command (find $dir -type f -perm +0100, the small hack made earlier to avoid run-parts --list) fails with:
find: cannot get current directory: Permission denied
So the calling script can be written as:

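#!/bin/sh

PATH=$PATH:/usr/sbin
datadir=/data/hosts
echo "DEBUG: $datadir"

# Give the logcheck user read access to files (4 = r--) and
# read/search access to directories (5 = r-x).
find $datadir -type f -print0 | xargs -0 setfacl -m user:logcheck:4
find $datadir -type d -print0 | xargs -0 setfacl -m user:logcheck:5
# sudo -u logcheck -H /usr/sbin/logcheck "$@"
# Use $* rather than $@ here: su -c needs the whole command as one string.
su - logcheck -c "/usr/sbin/logcheck $*"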

Monday, June 11, 2007

cifs uid/gid overwrite

By default, after mounting a Samba share with cifs, the file ownership on the mounted filesystem is wrong:

smbclient# ls /mnt/host/ -l
total 24
-rw-r--r--   1 root  root  372 May 25 09:55 adjust
drwxr-xr-x   5 10003 10003   0 Jun 11 11:37 fs_backup
drwxr-xr-x  12 10003 10014   0 Jun 11 11:37 logs

This means the files belong to users they should not belong to. Yet even mounting with:

smbclient# mount //store/homes -t cifs /mnt/host -o uid=0,gid=0,username=host_p01,password='********'

does not help; the file ownership is still wrong.

According to the man page, the Unix Extensions are what gets in the way. To turn them off, run the following on the Samba client:

smbclient# echo "0">/proc/fs/cifs/LinuxExtensionsEnabled

Then remount the cifs filesystem. Even without the uid and gid options, ownership is now mapped to root:

smbclient# mount //store/homes -t cifs /mnt/host -o username=host_p01,password='********'
smbclient# ls /mnt/host/ -l
total 24
-rwxrwSrwt  1 root root 372 May 25 09:55 adjust
drwxrwxrwx  1 root root   0 Jun 11 11:37 fs_backup
drwxrwxrwx  1 root root   0 Jun 11 11:37 logs

But now the file permissions are wrong! The only fix is to add more mount options:

smbclient# mount //store/homes -t cifs /mnt/host -o file_mode=0644,dir_mode=0755,username=host_p01,password='********'

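Symbolic links still cannot be used, though. You can't have it both ways, so this will have to do for now. Fortunately nothing in our setup strictly needs symlinks yet; hopefully a later version will solve this problem.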


smbclient# cd /mnt/hosts/
smbclient# ln logs/ -s test
ln: creating symbolic link `test' to `logs/': Operation not supported

To turn off Unix Extensions on the server side instead, edit /etc/samba/smb.conf:

unix extensions = no

Friday, June 08, 2007

A small hack for logcheck-1.2.45

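logcheck is a good choice for a log filter, but a lot changed between 1.1.1 and 1.2.45; for example, logtail went from a C program to a perl script. Most importantly, logcheck-1.2.45 has a more orthogonal design and ships many more patterns for various services and applications, which gives administrators more freedom of choice. That saves much of the trouble of writing your own patterns, as you had to with 1.1.1, and pattern matching in 1.2.45 is also more precise.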

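The main problem is that logcheck-1.2.45 is awkward to install. Because it is a shell script it is rather platform-dependent, and its dependency checking is poor. logcheck-1.2.45 depends on lockfile-progs, but you will not find that out until you have installed logcheck and run it. lockfile-progs in turn fails to compile because the header lockfile.h is missing, and it does not tell you that this is because the liblockfile-1.06.2 package must be installed first.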

With all of these installed, copy /var/log to /tmp/log, then:

sh# chown logcheck.logcheck /tmp/log -R
sh# vi /etc/logcheck/logfiles
/tmp/log/messages
/tmp/log/maillog
/tmp/log/secure
sh# su - logcheck
sh$ /usr/sbin/logcheck

There is no output, and judging from the mail received, no filtering was done at all, even though /etc/logcheck/ignore.d.server/* does contain the relevant patterns!

So I went through the /usr/sbin/logcheck shell script and found the part that locates the pattern files:

......
cleanrules "$RULEDIR/cracking.d" $TMPDIR/cracking
cleanrules "$RULEDIR/violations.d" $TMPDIR/violations
cleanrules "$RULEDIR/violations.ignore.d" $TMPDIR/violations-ignore

# Now clean the ignore rulefiles for the report levels
for level in $REPORTLEVELS; do
    cleanrules "$RULEDIR/ignore.d.$level" $TMPDIR/ignore
done

# The following cracking.ignore directory will only be used if
# $SUPPORT_CRACKING_IGNORE is set to 1 in the configuration file.
# This is *only* for local admin use.
if [ $SUPPORT_CRACKING_IGNORE -eq 1 ]; then
    cleanrules "$RULEDIR/cracking.ignore.d" $TMPDIR/cracking-ignore
fi
......
cleanrules() {
    dir=$1
    cleaned=$2

    if [ -d $dir ]; then
        if [ ! -d $cleaned ]; then
            mkdir $cleaned \
                || error "Could not make dir $cleaned for cleaned rulefiles."
        fi
        for rulefile in $(run-parts --list $dir); do
            rulefile=$(basename $rulefile)
            if [ -f ${dir}/${rulefile} ]; then
                debug "cleanrules: ${dir}/${rulefile}"
                if [ -r ${dir}/${rulefile} ]; then
                    # pipe to cat on greps to get usable exit status
                    egrep --text -v '^[[:space:]]*$|^#' $dir/$rulefile | cat \
                        >> $cleaned/$rulefile \
                        || error "Couldn't append to $cleaned/$rulefile. Disk Full?"
                else
                    error "Couldn't read $dir/$rulefile"
                fi
            fi
        done
    elif [ -f $dir ]; then
        error "cleanrules: '$dir' is a file, not a directory"
    elif [ -z $dir ]; then
        error "cleanrules: called without argument"
    fi
}

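As you can see, locating the pattern files is done by the cleanrules() function, and the files that actually get applied are found by the command run-parts --list $dir. I added a DEBUG line to see which rulefiles were being applied, and it turned up an error message.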

Running it on its own:

sh# run-parts --list /etc/logcheck/ignore.d.server/
Not a directory: --list
sh# run-parts /etc/logcheck/ignore.d.server/ --list
# EMPTY!

So no files were actually found.

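Judging from a Google search, run-parts is quite platform-dependent. The command finds the files in a directory that have execute permission, and without --list it runs them. That assumes run-parts has this option, of course; the run-parts on RHEL4 does not (in fact it takes only a single PATH argument). The logcheck developers seem to be more at home on Debian, so the command cannot be used as-is here.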

One fix is to replace the run-parts command with:

find $dir -type f -perm +0100

That is all that is needed. For an installation with ulfs, the corresponding profile is:

sh# cat /usr/src/logcheck/.config
pkgname = "logcheck";
version = "1.2.45";
user = "logcheck";
groups = "";
group = "logcheck";
archive = "logcheck_1.2.45.tar.gz";
command = "tar xfz logcheck_1.2.45.tar.gz";
command = "cd logcheck-1.2.45";
command = "sed -i 's/install -d/mkdir -p/g' Makefile";
command = "sed -i 's/run-parts --list $dir/find $dir -type f -perm +0100/g' src/logcheck";
command = "make";
command = "cd ..";
command = "rm -rf logcheck-1.2.45";
time = "20070608 10:49:35 Fri"

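Then, under /etc/logcheck/ignore.d.server, use chmod u+x to add execute permission to the pattern files you need; those files will then be used during filtering! This approach therefore offers more orthogonality and flexibility.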
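To check which pattern files the modified logcheck would pick up, the selection rule can be mimicked in a few lines. This is a hedged Python 3 sketch: the function name list_enabled_rulefiles is invented, and unlike find it does not descend into subdirectories.

```python
import os
import stat


def list_enabled_rulefiles(directory):
    """Return sorted paths of regular files with the owner-execute bit set."""
    enabled = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        mode = entry.stat(follow_symlinks=False).st_mode
        if mode & stat.S_IXUSR:          # the same bit as 0100 in the find command
            enabled.append(entry.path)
    return sorted(enabled)
```

Calling list_enabled_rulefiles("/etc/logcheck/ignore.d.server") before and after a chmod u+x shows a pattern file entering the list.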

Thursday, June 07, 2007

hostname in syslog

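If you centralize logs with syslog or syslog-ng, or by some other method, and then run a single filter program on the central host to filter the logs and produce daily reports, it is essential that the hostname field in each log record be correct and unique. If two hosts use the same hostname, the result is chaos.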

If the hostname field in syslog does not match the output of the hostname command or the HOSTNAME= setting in /etc/sysconfig/network, simply restart syslogd: /etc/init.d/syslog restart
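A quick way to spot such a mismatch is to compare the hostname field of recent log lines with the system hostname. The Python 3 sketch below assumes the traditional "Mmm dd hh:mm:ss hostname tag: message" layout; both function names are hypothetical, and note that socket.gethostname() may return a fully qualified name where syslog writes the short one.

```python
import socket


def syslog_hostname(line):
    """Extract the hostname field from a traditional syslog line."""
    fields = line.split()
    if len(fields) < 4:
        return None          # not a recognizable syslog record
    return fields[3]


def mismatched_lines(lines, expected=None):
    """Return the lines whose hostname field differs from `expected`."""
    expected = expected or socket.gethostname()
    return [line for line in lines
            if syslog_hostname(line) not in (None, expected)]
```

For example, mismatched_lines(open("/var/log/messages"), expected="web01") would list the records that claim to come from some other host (web01 being a placeholder name).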

Wednesday, June 06, 2007

From a Samba I/O problem

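While reconfiguring the network yesterday I ran into a problem. We had been sharing network filesystems over Samba and mounting them locally for backups and similar jobs. Yesterday I moved the machine to a different IP subnet without unmounting first, and after that commands such as df, fuser and lsof would hang. ps showed these processes in state "D" (Down or Deadlock?), that is, uninterruptible sleep (usually I/O). They could not be killed, not even with kill -9.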
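Stuck processes of this kind can also be listed without ps by reading /proc directly. The following Python 3 sketch is Linux-only and the function names are made up; it relies on the third field of /proc/<pid>/stat being the state letter.

```python
import glob


def parse_stat(text):
    """Split one /proc/<pid>/stat record into (pid, command, state)."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split around the first '(' and the last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    command = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, command, state


def uninterruptible_processes():
    """Return (pid, command) for every process currently in state D."""
    found = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, command, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue                 # the process exited while we were looking
        if state == "D":
            found.append((pid, command))
    return found
```

Reading the stat file does not touch the hung mount, so this listing itself should not block the way df or lsof did.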

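After some searching online and some thought, I find this mechanism reasonable. The situation usually reflects an I/O error, most commonly a disk error; if a disk is damaged and hits an unrecoverable error, that error should be exposed rather than letting the process simply be killed.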

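In fact the same is true of NFS. Note that this differs from stopping the smbd process on the server first: if smbd is stopped first, this I/O problem does not occur.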

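Such processes can only be cleared by rebooting the machine, or by changing the IP address back, unmounting, and then changing the IP address again. For network filesystems, consider modifying the network SysV init script to include the corresponding checks and operations.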
