Friday, June 22, 2007

python reload()

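When debugging a module in the interactive interpreter, if you have imported the module and then changed its source, you can call reload() to load it again and pick up the changes. Otherwise you have to quit and restart the interpreter, which is inconvenient because all other session state is lost: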
>>> import tree
>>> ...
>>> tree = reload(tree)
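
The session above is Python 2, where reload() is a builtin. A minimal sketch of the same workflow under Python 3, where the function lives in importlib; the module name tree_demo and its VERSION variable are made up for this illustration:

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale .pyc files in this small demo

# Create a throwaway module named "tree_demo" (hypothetical name) on disk.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "tree_demo.py")
with open(path, "w") as f:
    f.write("VERSION = 1\n")

sys.path.insert(0, workdir)
importlib.invalidate_caches()
import tree_demo

# Edit the source, then reload instead of restarting the interpreter.
with open(path, "w") as f:
    f.write("VERSION = 2  # edited\n")
tree_demo = importlib.reload(tree_demo)
print(tree_demo.VERSION)
```

Objects created before the reload keep referring to the old class and function definitions, so instances usually have to be recreated afterwards.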

python Tree root.[inexistent].branch ?

Suppose
root = Tree(value)
root.trunk.branch = value1
where trunk does not exist yet. Can this Tree Node Container be created automatically?

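Should this be done through root's __getattr__ or through __setattr__? Since trunk does not exist at this point, its own __setattr__ cannot be used at all. As for root, the operation root.trunk.branch = value1 must certainly call __getattr__ on root first! See the following example: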
>>> class test:
...     def __init__(self):
...         self.x = 1
...     def __getattr__(self, attr_name):
...         try:
...             return self.__dict__[attr_name]
...         except KeyError:
...             self.__dict__[attr_name] = 'inexistent'
...             return self.__dict__[attr_name]
...
>>> t = test()
>>> t.x
1
>>> t.y
'inexistent'
>>> t.x.y = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'int' object has no attribute 'y'
>>> t.z.x = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'str' object has no attribute 'x'

>>> print t
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'str' object is not callable
This shows that __repr__ has been affected. So what is the reason?

Let's first look at the following example:
>>> class test:
...     def __init__(self):
...         self.x = 1
...     def __getattr__(self, attr_name):
...         print attr_name
...         if attr_name == 'y':
...             return 2
...         else:
...             raise AttributeError, attr_name
...
>>> t = test()
>>> t.x
1
>>> t.y
y
2
>>> print t.x
1
>>> print t
__str__
__repr__
<__main__.test>
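First, we can see that return self.__dict__[attr_name] in the earlier example is not actually necessary, because Python does this for us, and does it better, since it also searches the inheritance tree. In fact, __getattr__ is called only when an attribute cannot be found anywhere in the inheritance tree.

The output of print t shows that self.__str__ and self.__repr__ are themselves looked up through __getattr__. In the earlier example, __str__ and __repr__ were not overridden; instead, the string 'inexistent' was assigned to them, which of course made them "not callable".

So, to implement the Tree operation above without breaking print, the code is as follows: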
    def __repr__(self):
        return "<Tree instance at %s>" % hex(id(self))
        # for k, v in self.__traverse__(): print '%s = %s;' % (k, v)
    def __str__(self):
        return self.__repr__()
    def __getattr__(self, attr_name):
        setattr(self, attr_name, Tree(None))
        return self.__dict__[attr_name]
    def __setattr__(self, attr_name, value):
        if attr_name in self.__used_names:
            raise TreeExc(_("Attribute name '%s' is reserved") % attr_name)
        try:
            # If self.attribute exists
            existed = self.__dict__[attr_name]
            if isinstance(value, Tree):
                subtree = value
                self.__dict__[attr_name] = subtree
                # Replace the node directly
            else:
                # self.__dict__[attr_name]._Tree__node_value = value
                # This will lead to raise TreeExc at #1, because the setattr operation of
                # "self.__dict__[attr_name].attribute = value" has been affected by self.__setattr__()
                subtree = existed
                subtree.__dict__['_Tree__node_value'] = value
                # Only replace the node value
        except KeyError:
            # if self.attribute does not exist, assign it an EMPTY node
            if isinstance(value, Tree):
                subtree = value
                self.__dict__[attr_name] = subtree
            else:
                self.__dict__[attr_name] = Tree(value)
To get output similar to the built-in print, I used return "<Tree instance at %s>" % hex(id(self)); here id(self) gives the object's memory address.

But one question remains: in the earlier example, the attr_name values __str__ and __repr__ were both printed, and neither of them is "y", so why was no AttributeError raised?
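
The auto-creating behaviour discussed in this post can be sketched compactly. This is only an illustration under Python 3, not the Tree class itself: the _value key and the simplified rules are assumptions, and the reserved-name check is left out. In Python 3, print looks up __str__ and __repr__ on the type, so __getattr__ is not consulted for them; the guard for double-underscore names is still kept so that other protocol lookups never receive an invented child node.

```python
class Tree:
    """Toy node: attribute access creates missing children on demand."""

    def __init__(self, value=None):
        # Write through __dict__ so the __setattr__ override is not triggered.
        self.__dict__['_value'] = value

    def __getattr__(self, name):
        # Reached only after normal lookup has failed.
        if name.startswith('__') and name.endswith('__'):
            raise AttributeError(name)   # never invent special methods
        child = Tree()
        self.__dict__[name] = child
        return child

    def __setattr__(self, name, value):
        if isinstance(value, Tree):
            self.__dict__[name] = value            # replace the node
        elif isinstance(self.__dict__.get(name), Tree):
            self.__dict__[name].__dict__['_value'] = value   # keep the subtree
        else:
            self.__dict__[name] = Tree(value)      # new leaf

    def __call__(self):
        return self.__dict__['_value']

root = Tree('root')
root.trunk.branch = 'value1'     # trunk is created by __getattr__
print(root.trunk.branch())
print(root.trunk())
```

Reading root.trunk is what creates the intermediate node; the assignment to branch then runs __setattr__ on that new node, not on root.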

Thursday, June 21, 2007

python module name contains '.'?

sh$ mv bin.py bin.test.py
sh$ python
Python 2.3.4 (#1, Feb 6 2006, 10:38:46)
[GCC 3.4.5 20051201 (Red Hat 3.4.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bin.test
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named test
>>> import bin
>>> dir(bin)
['__builtins__', '__doc__', '__file__', '__name__', 'dbin', 'dbin8', 'rdbin']
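>>>
The ".test.py" part seems to be ignored entirely! That is because import bin.test makes Python look for a submodule file bin/test.py, which of course does not exist! (import bin itself presumably still worked only because a compiled bin.pyc from the old bin.py was left behind; that is my guess, I have not verified it.)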
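
If a file with a dot in its name really has to be loaded, a minimal sketch under Python 3 is to load it by path with importlib.util rather than with the import statement. The module name bin_test and the file contents below are made up for the demo:

```python
import importlib.util
import os
import tempfile

# Throwaway file whose name contains a dot before the .py suffix.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "bin.test.py")
with open(path, "w") as f:
    f.write("def dbin(n):\n    return bin(n)\n")

# "bin_test" is a name chosen for this sketch; it need not match the file name.
spec = importlib.util.spec_from_file_location("bin_test", path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
print(module.dbin(5))
```

Renaming the file to something like bin_test.py is usually the simpler fix.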

python Tree: some related questions

1. Tree traverse:
@ path string to node/value map
@ path sequence to node/value map
@ list of nodes/values (reference)
@ deep list of nodes?
==> 2

2. Consistency between Tree and a deep dict, and how it shows up (basic principles and design ideas)
Record this in the design document

3. Tree search:
@ include/exclude
@ return string or list sequence map
@ copy or assign?
@ key indexed items

4. Tree node value copy/deepcopy?
==> python mutable/immutable design ideas?

5. add operation:
head.branch + root.br1 ?
head.branch + {['root', 'br1', 'br11'] : value1, ['root', 'br1', 'br12'] : value2, ...}
==> Tree update

6. Tree cyclic link problem

Wednesday, June 20, 2007

python itree = func(head.branch=value)

>>> class tree:
...     def __init__(self, **kwargs):
...         self.ka = kwargs
...
>>> t = tree()
>>> print t.ka
{}
>>> t = tree(a=1)
>>> print t.ka
{'a': 1}
>>> t = tree(a.b=1)
SyntaxError: keyword can't be an expression
Although class Tree has already been defined and can be operated on very conveniently, for example:
head = Tree(value)
head = Tree(value, data=value1, extra=value2)
head.branch = value
head.branch = Tree(value)
head.branch[key] = value
value = head.branch()
value = head.branch[key]()
head.Node1(['branch', 'br1'], value)
head.Node1(['branch', {'br1' : key}, 'br2'], value)
tmap = head.branch('traverse')
other = Tree(value); head.update(other)
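it is still impossible to write func(head.branch=value). In practice there is no need for it either: if what you want is head.branch or its value, pass head.branch or head.branch() as the argument; if you want to change the value of head.branch, change it inside the function. The form above would only cause confusion.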
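
For completeness, a small sketch (Python 3, CPython behaviour) of the one way a dotted key can still reach **kwargs: unpacking a dict. The function name func is just a placeholder, and I would not rely on this across Python implementations since such keys are not valid identifiers:

```python
def func(**kwargs):
    # Simply hand back whatever keyword arguments arrived.
    return kwargs

# func(head.branch=1) is a SyntaxError, but a dict can carry any string key.
options = func(**{"head.branch": 1})
print(options)
```

The key stays an opaque string; nothing resolves it to the node head.branch, so passing the node itself remains the clearer option.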

python **kwargs

>>> class tree:
...     def __init__(self, value, **kwargs):
...         exec "self.%s = kwargs" % value
...
>>> t = tree('x', k='v')
>>> print t
<__main__.tree instance at 0xb7ec464c>
>>> print t.x
{'k': 'v'}
>>> t = tree('x', **{'k' : 'v'})
>>> print t.x
{'k': 'v'}
>>> print **{'a' : 1, 'b' : 2}
  File "<stdin>", line 1
print **{'a' : 1, 'b' : 2}
^
SyntaxError: invalid syntax
See also:
2007/05/python-datetime-object-from.html
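
The exec statement in the session above builds code from a string just to set an attribute whose name is only known at run time. A minimal sketch of the same effect with setattr() (Python 3 syntax; the class is a stand-in):

```python
class Tree:
    def __init__(self, name, **kwargs):
        # setattr() takes the attribute name as a plain string, so no code
        # has to be assembled and executed from user-supplied text.
        setattr(self, name, kwargs)

t = Tree('x', k='v')
print(t.x)
```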

python setattr

>>> class tree:
...     def __init__(self): pass
...
>>> t = tree
>>> t = tree()
>>> t.x = 1
>>> print t.__dict__
{'x': 1}
>>> t.'' = 1
  File "<stdin>", line 1
t.'' = 1
^
SyntaxError: invalid syntax
>>> t. = 1
  File "<stdin>", line 1
t. = 1
^
SyntaxError: invalid syntax
>>> setattr(t, '', 2)
>>> print t.__dict__
{'': 2, 'x': 1}
>>> setattr(t, '?', 2)
>>> print t.__dict__
{'': 2, 'x': 1, '?': 2}
>>> print t.?
  File "<stdin>", line 1
print t.?
^
SyntaxError: invalid syntax
>>> setattr(t, '.y', 3)
>>> print t.__dict__
{'': 2, 'x': 1, '?': 2, '.y': 3}
>>> print t..y
  File "<stdin>", line 1
print t..y
^
SyntaxError: invalid syntax
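As we can see, setattr can create very irregular attribute names; the name is really just stored as a string key in instance.__dict__. In:
2007/06/python-dict-pseudo-private-attributes.html
I have already discussed related __dict__ issues, and this follows the same line of thought.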
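
Names that attribute syntax cannot spell are still reachable, just not with a dot. A short sketch under Python 3 (the class and the stored values are arbitrary):

```python
class Tree:
    pass

t = Tree()
for name in ('', '?', '.y'):
    setattr(t, name, len(name))   # names that attribute syntax cannot spell

print(getattr(t, '.y'))          # read back through getattr()
print(vars(t))                   # or straight from the instance __dict__
```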

python dict update

>>> zip(['a', 'b', 'c'], [1, 2, 3])
[('a', 1), ('b', 2), ('c', 3)]
>>> dict(zip(['a', 'b', 'c'], [1, 2, 3]))
{'a': 1, 'c': 3, 'b': 2}
>>> zip(['a', 'b', 'c'], [1, 2, 3], ['x', 'y', 'z'])
[('a', 1, 'x'), ('b', 2, 'y'), ('c', 3, 'z')]
>>> dict(zip(['a', 'b', 'c'], [1, 2, 3], ['x', 'y', 'z']))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: dictionary update sequence element #0 has length 3; 2 is required
>>> help(dict.update)

>>> d = {'a' : 1, 'b' : 2, 'c': 3}
>>> print d
{'a': 1, 'c': 3, 'b': 2}
>>> d.update(('x', 'y', 'z'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: dictionary update sequence element #0 has length 1; 2 is required
>>> d.update(('x', 'y', 'z'), (4, 5, 6))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: update expected at most 1 arguments, got 2
>>> d.update([('x', 4), ('y', 5), ('z', 6)])
>>> print d
{'a': 1, 'c': 3, 'b': 2, 'y': 5, 'x': 4, 'z': 6}
So here d.update([('x', 4), ('y', 5), ('z', 6)]) has the same effect as d.update({'x' : 4, 'y' : 5, 'z' : 6}).
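
The equivalent forms can be checked side by side. A small sketch in Python 3; the keyword-argument form only works for keys that are valid identifiers:

```python
pairs = [('x', 4), ('y', 5), ('z', 6)]

a = {'a': 1}
a.update(pairs)                   # iterable of 2-item pairs
b = {'a': 1}
b.update(dict(pairs))             # mapping
c = {'a': 1}
c.update(zip('xyz', (4, 5, 6)))   # zip() yields pairs too
d = {'a': 1}
d.update(x=4, y=5, z=6)           # keyword arguments
print(a == b == c == d)
```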

python __doc__ and gettext

#!/usr/bin/env python
# -*- encoding: utf-8 -*-

import re
from gettext import gettext as _

#1
__doc__ = _("""
Author: Roc Zhou
Date: 2007-06-20
Email: chowroc.z@gmail.com

A new data structure: Tree
...""")

class Tree:
    #2
    __doc__ = _("""This class define a builtins like tree type of data structure,
    thus now we can manipulate a tree very conveniently, like this:
        head = Tree(value)
        head = Tree(value, data=value1, extra=value2)
        head.branch = value
    ...""")
    ......
    def __init__(self, value, **kwargs):
        _(""""head = Tree(value)" create a Tree instance as the root node with node 'value'.
        ...""")
    ......
    def Node1(self, pathseq, value):
        #3
        _("""Construct a Tree node from a path sequence, this path sequence
        only represents a single node, for example:
        ...""")
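Here #1 and #2 both use an explicit "__doc__ = " assignment instead of the usual bare docstring. For #1, this is probably because from gettext import gettext as _ has to come first for internationalization support, which means the docstring is no longer at the top of the module, so __doc__ has to be assigned explicitly.

For #2, it is probably because the value returned by gettext cannot be passed to __doc__ through the usual docstring form, since printing it gives None.

But for #3, "__doc__ = " has no effect whether or not it is used; only the usual bare docstring works. I don't know why.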
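
One explanation that fits all three cases: a docstring has to be a bare string literal in first position, and _("...") is a function call, so it is evaluated and thrown away. At module and class level, "__doc__ = ..." works because it is an ordinary assignment in that namespace; inside a function body the same statement only binds a local variable. A minimal sketch under Python 3 (function names are made up):

```python
from gettext import gettext as _

def plain():
    """A bare string literal in first position becomes the docstring."""

def wrapped():
    _("A call expression is evaluated and discarded; it is not a docstring.")

def assigned():
    __doc__ = _("This only binds a local variable named __doc__.")

def patched():
    pass

# Assigning on the function object after the definition does work.
patched.__doc__ = _("Set from outside the function body.")

for func in (plain, wrapped, assigned, patched):
    print(func.__name__, repr(func.__doc__))
```

With this approach the translation is looked up once, when the assignment runs, so the locale has to be configured before the module is imported.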

Monday, June 18, 2007

python 'try/except' in unittest?

At first, I defined the test flow in a test function as follows:

def setUp(self):
    try:
        self.head = Tree('root', data='head.data', extra='head.extra')
        self.failUnlessEqual(self.head._Tree__node_value, 'root')
        self.failUnlessEqual(self.head.data, 'head.data') #(1)
        self.failUnlessEqual(self.head.extra, 'head.extra')
    except:
        self.fail(_("Unexpected exception catched"))
Then I ran the unit test:
sh$ python tree_ut.py
EEEE
======================================================================
ERROR: test__call__ (__main__.TestTree)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tree_ut.py", line 29, in setUp
self.fail(_("Unexpected exception catched"))
File "/usr/lib/python2.3/unittest.py", line 270, in fail
raise self.failureException, msg
AssertionError: Unexpected exception catched

======================================================================
ERROR: test__setattr__ (__main__.TestTree)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tree_ut.py", line 29, in setUp
self.fail(_("Unexpected exception catched"))
File "/usr/lib/python2.3/unittest.py", line 270, in fail
raise self.failureException, msg
AssertionError: Unexpected exception catched

======================================================================
ERROR: test__sgetitem__ (__main__.TestTree)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tree_ut.py", line 29, in setUp
self.fail(_("Unexpected exception catched"))
File "/usr/lib/python2.3/unittest.py", line 270, in fail
raise self.failureException, msg
AssertionError: Unexpected exception catched

======================================================================
ERROR: test_tree (__main__.TestTree)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tree_ut.py", line 29, in setUp
self.fail(_("Unexpected exception catched"))
File "/usr/lib/python2.3/unittest.py", line 270, in fail
raise self.failureException, msg
AssertionError: Unexpected exception catched

----------------------------------------------------------------------
Ran 4 tests in 0.006s

FAILED (errors=4)
Error messages like these are baffling: why is every one of them an Unexpected exception? Using:
def setUp(self):
    try:
        self.head = Tree('root', data='head.data', extra='head.extra')
        self.failUnlessEqual(self.head._Tree__node_value, 'root')
        self.failUnlessEqual(self.head.data, 'head.data') #(1)
        self.failUnlessEqual(self.head.extra, 'head.extra')
    except Exception, ex:
        print 'DEBUG:', ex, ex.__class__
        self.fail(_("Unexpected exception catched"))
to analyze it, I found that the exception raised at #(1) was being caught by the except clause. Admittedly the test case itself is wrong here, but it also shows that this try/except style does not work in unittest, because it hides the real error and leaves you with no idea where to look.
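
A minimal sketch of the alternative, written for Python 3's unittest: let unexpected exceptions propagate so the runner reports them with their own traceback, and use assertRaises only where an exception is the expected outcome. The Node class is a hypothetical stand-in for Tree:

```python
import unittest

class Node:
    """Hypothetical stand-in for the Tree class under test."""
    def __init__(self, value, **kwargs):
        self.value = value
        self.__dict__.update(kwargs)

class TestNode(unittest.TestCase):
    def setUp(self):
        # No try/except: an unexpected exception is reported with its own traceback.
        self.head = Node('root', data='head.data')

    def test_attributes(self):
        self.assertEqual(self.head.value, 'root')
        self.assertEqual(self.head.data, 'head.data')

    def test_expected_error(self):
        # When an exception is the expected outcome, state that explicitly.
        with self.assertRaises(AttributeError):
            self.head.missing

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNode)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Assertions in setUp are also better moved into a test method of their own, so that a failure there does not get reported once for every test.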

python __dict__ && pseudo private attributes

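Earlier, when defining the Tree structure, I mentioned that once __setattr__ has been redefined, assignment can only be done through __dict__. In actual use, I also found some __dict__ issues that need to be sorted out.

First, the output of dir() and the contents of __dict__ are different. For example, after defining an empty class tree, look at dir() and __dict__ respectively:
>>> class tree:
...     def __init__(self): pass
...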
>>> t = tree()
>>> dir(tree)
['__doc__', '__init__', '__module__']
>>> print tree.__dict__
{'__module__': '__main__', '__doc__': None, '__init__': }
>>> print t.__dict__
{}
>>> t.f = 'testing'
>>> print t.__dict__
{'f': 'testing'}
>>> print tree.__dict__
{'__module__': '__main__', '__doc__': None, '__init__': }
You can even assign to a class object like this:
>>> class tree:
...     def __init__(self): pass
...
>>> d = {'a' : 1, 'b' : 2, 'c' : 3}
>>> tree.__dict__.update(d)
>>> print tree.__dict__
{'a': 1, '__module__': '__main__', 'b': 2, 'c': 3, '__doc__': None, '__init__': }
>>> t = tree()
>>> print t.__dict__
{}
>>> t.b
2
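The next issue involves the use of pseudo-private variables.

In class Tree, the node value and the key index use the variables __node_value and __node_items. So the definitions were:
self.__dict__['__node_value'] = value
self.__dict__['__node_items'] = {}
But when they were used, there was no value at all and an error was raised. The following examples illustrate the situation:
>>> class test:
...     def __init__(self):
...         self.__dict__['x'] = 1
...         self.__dict__['__node_value'] = 2
...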
>>> t = test()
>>> print t.x
1
>>> print t._test__node_value
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: test instance has no attribute '_test__node_value'

>>> class test:
...     def __init__(self):
...         self.__dict__['x'] = 1
...         self.__dict__['__value'] = 2
...         self.__dict__['__node_value'] = 3
...
>>> t = test()
>>> print t.x
1
>>> print t.__value
2
>>> print t.__node_value
3
>>> print t.__dict__
{'__node_value': 3, 'x': 1, '__value': 2}

>>> class test:
...     def __init__(self):
...         self.__dict__['x'] = 1
...         self.__dict__['__node_value'] = 2
...         print self.x
...         print self.__node_value
...
>>> t = test()
1
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 7, in __init__
AttributeError: test instance has no attribute '_test__node_value'

>>> class test:
...     def __init__(self):
...         self.__dict__['x'] = 1
...         self.__node_value = 2
...         print self.x
...         print self.__node_value
...
>>> t = test()
1
2
>>> print t.__dict__
{'x': 1, '_test__node_value': 2}
As we can see, when self.__node_value is used for assignment, the key in __dict__ is actually '_test__node_value', not '__node_value'. If the key is '__node_value', the attribute can be accessed as t.__node_value from outside the class, unaffected by the so-called pseudo private attribute rule!

The same holds for variables defined in the class object:
>>> class test:
...     __used_names = ['name1', 'name2']
...     def __init__(self): self.__dict__['x'] = 1
>>> t = test()
>>> print t.__dict__
{'x': 1}
>>> print t.__class__.__dict__
{'__module__': '__main__', '__init__': , '_test__used_names': ['name1', 'name2'], '__doc__': None}
>>> print t.__used_names
Traceback (most recent call last):
File "", line 1, in
print t.__used_names
AttributeError: test instance has no attribute '__used_names'
>>> print t._test__used_names
['name1', 'name2']

python redefined __setattr__

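I am now defining a new Python data structure, Tree. The goal is for it to be very simple to operate on in Python, like a builtin variable; for example, root.branch1.branch11() should return the value of that node.

Now, when assigning to a tree node, e.g.:
root.branch1.branch11 = value
or:
root.branch1.branch12 = Tree(value)
there has to be some restriction on the node's name (that is, the namespace conflict problem discussed earlier must be avoided): names already reserved by a Tree instance cannot be used, such as the method names __repr__, __getitem__, __setitem__, as well as __node_value for the node value and __node_items for the key index. The method called here is Tree.__setattr__, and to implement the reserved names, my first attempt was this: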
class Tree:
    _("""A Tree instance can only contain sub Tree instances as its attributes""")
    # (1)
    __used_names = None
    __used_names = Tree.__dict__.copy()
    __used_names.update(object.__dict__.copy())
    __used_names = __used_names.keys()
    def __init__(self, value=None, **kwargs):
        self.__node_value = value
        # __node_value should always be Non-Tree
        self.__node_items = {}
        for k, v in kwargs.items(): self.__setattr__(k, v)
    ...
    def __setattr__(self, attr_name, value):
        _("""The operation 'root.br1.br2 = value' works on root.br1 rather than root.br1.br2""")
        if attr_name in self.__used_names:
            raise TreeExc(_("Attribute name '%s' is reserved") % attr_name)
        try:
            existed = getattr(self, attr_name)
            # self.attribute exists
            if isinstance(value, Tree):
                subtree = value
                if not subtree._Tree__node_value:
                    subtree._Tree__node_value = existed._Tree__node_value
                setattr(self, attr_name, subtree)
                # replace directly after the tree __node_value has been reserved
            else:
                setattr(self, '%s._Tree__node_value' % attr_name, value)
        except AttributeError:
            # if self.attribute does not exist, assign it an EMPTY node
            if isinstance(value, Tree):
                subtree = value
                setattr(self, attr_name, subtree)
            else:
                setattr(self, attr_name, Tree(value))
Then run:
sh$ python -c "import tree; t = tree.Tree(1, data='head')"
Traceback (most recent call last):
  File "<string>", line 1, in ?
File "tree.py", line 16, in ?
class Tree:
File "tree.py", line 20, in Tree
__used_names = Tree.__dict__.copy()
NameError: name 'Tree' is not defined
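Using self gives the same "not defined" error. This is easy to understand: until all of Tree's attributes have been assigned, Tree itself is of course not yet defined. So this approach does not work.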
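
One way around the NameError, sketched here for Python 3 rather than the Python 2.3 used in this post: attach the reserved-name set to the class right after the class statement has finished, when the name Tree exists. The attribute name _reserved and the use of AttributeError in place of TreeExc are choices made for this sketch:

```python
class Tree:
    def __init__(self, value=None):
        # Direct __dict__ write, so the override below is not involved.
        self.__dict__['_Tree__node_value'] = value

    def __setattr__(self, attr_name, value):
        if attr_name in Tree._reserved:
            raise AttributeError("Attribute name %r is reserved" % attr_name)
        self.__dict__[attr_name] = value

# The name Tree is bound only once the class statement has finished,
# so the reserved set is attached afterwards (assigning on the class object
# does not go through the instance-level __setattr__).
Tree._reserved = frozenset(dir(Tree)) | {'_Tree__node_value'}

t = Tree(1)
t.branch = 2
try:
    t.__repr__ = 'oops'
except AttributeError as exc:
    print(exc)
```

dir(Tree) also includes the names inherited from object, which is what the Tree.__dict__ plus object.__dict__ combination was trying to collect.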

So I redefined it:
class Tree:
    _("""A Tree instance can only contain sub Tree instances as its attributes""")
    # (1)
    # __used_names = None
    # __used_names = Tree.__dict__.copy()
    # __used_names.update(object.__dict__.copy())
    # __used_names = __used_names.keys()
    # ------------------------------------------------------------------------
    def __init__(self, value=None, **kwargs):
        # (2)
        self.__used_names = None
        self.__used_names = Tree.__dict__.copy()
        self.__used_names.update(object.__dict__)
        self.__used_names = self.__used_names.keys()
        self.__node_value = value
        # __node_value should always be Non-Tree
        self.__node_items = {}
        for k, v in kwargs.items(): self.__setattr__(k, v)
    def __getattr__(self, attr_name): pass
    def __setattr__(self, attr_name, value):
        _("""The operation 'root.br1.br2 = value' works on root.br1 rather than root.br1.br2""")
        if attr_name in self.__used_names:
            raise TreeExc(_("Attribute name '%s' is reserved") % attr_name)
        try:
            existed = getattr(self, attr_name)
            # self.attribute exists
            if isinstance(value, Tree):
                subtree = value
                if not subtree._Tree__node_value:
                    subtree._Tree__node_value = existed._Tree__node_value
                setattr(self, attr_name, subtree)
                # replace directly after the tree __node_value has been reserved
            else:
                setattr(self, '%s._Tree__node_value' % attr_name, value)
        except AttributeError:
            # if self.attribute does not exist, assign it an EMPTY node
            if isinstance(value, Tree):
                subtree = value
                setattr(self, attr_name, subtree)
            else:
                setattr(self, attr_name, Tree(value))
Result:
sh$ python -c "import tree; t = tree.Tree(1, data='head')"
Traceback (most recent call last):
  File "<string>", line 1, in ?
File "tree.py", line 30, in __init__
self.__used_names = None
File "tree.py", line 55, in __setattr__
if attr_name in self.__used_names:
TypeError: iterable argument required
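As we can see, __setattr__ was called. This shows that once __setattr__ has been redefined, it affects not only external calls such as root.branch1.branch11 = value, but also internal uses such as self.attribute = value: in both cases the target is treated as a Tree Node. It also means that the next two statements:
self.__node_value = value
# __node_value should always be Non-Tree
self.__node_items = {}
are problematic too. In fact, they send the assignment into endless recursion, and at run time an error like the following may eventually be reported: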
......
File "tree.py", line 67, in __setattr__
setattr(self, '%s._Tree__node_value' % attr_name, value)
File "tree.py", line 67, in __setattr__
setattr(self, '%s._Tree__node_value' % attr_name, value)
File "tree.py", line 67, in __setattr__
setattr(self, '%s._Tree__node_value' % attr_name, value)
File "tree.py", line 67, in __setattr__
setattr(self, '%s._Tree__node_value' % attr_name, value)
File "tree.py", line 67, in __setattr__
setattr(self, '%s._Tree__node_value' % attr_name, value)
File "tree.py", line 67, in __setattr__
setattr(self, '%s._Tree__node_value' % attr_name, value)
File "tree.py", line 67, in __setattr__
setattr(self, '%s._Tree__node_value' % attr_name, value)
File "tree.py", line 67, in __setattr__
setattr(self, '%s._Tree__node_value' % attr_name, value)
File "tree.py", line 48, in __setattr__
_("""The operation 'root.br1.br2 = value' works on root.br1 rather than root.br1.br2""")
File "/usr/lib/python2.3/gettext.py", line 472, in gettext
return dgettext(_current_domain, message)
File "/usr/lib/python2.3/gettext.py", line 454, in dgettext
t = translation(domain, _localedirs.get(domain, None))
File "/usr/lib/python2.3/gettext.py", line 403, in translation
mofiles = find(domain, localedir, languages, all=1)
File "/usr/lib/python2.3/gettext.py", line 366, in find
val = os.environ.get(envar)
File "/usr/lib/python2.3/UserDict.py", line 51, in get
if not self.has_key(key):
RuntimeError: maximum recursion depth exceeded


What about using the builtin setattr() function, then? Change (2) above to (3):
        # (3)
        setattr(self, '__used_names', None)
        setattr(self, '__used_names', Tree.__dict__.copy())
        self.__used_names.update(object.__dict__)
        setattr(self, '__used_names', self.__used_names.keys())
Run:
sh$ python -c "import tree; t = tree.Tree(1, data='head')"
Traceback (most recent call last):
  File "<string>", line 1, in ?
File "tree.py", line 35, in __init__
setattr(self, '__used_names', None)
File "tree.py", line 55, in __setattr__
if attr_name in self.__used_names:
TypeError: iterable argument required
The same error, which shows that setattr(self, attr_name, value) is affected by self.__setattr__() in exactly the same way as self.attribute = value.
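
This can be seen in isolation with a small sketch. It is written for Python 3, where every class is new-style and object.__setattr__ is available as a second way to bypass the override; with the classic classes of Python 2.3 used in this post, writing to __dict__ is the available route. The class name Guarded and its log attribute are made up:

```python
class Guarded:
    def __init__(self):
        # Bypass the override while bootstrapping internal state.
        object.__setattr__(self, 'log', [])

    def __setattr__(self, name, value):
        self.log.append(name)              # record every intercepted assignment
        object.__setattr__(self, name, value)

g = Guarded()
g.a = 1                 # attribute syntax -> Guarded.__setattr__
setattr(g, 'b', 2)      # builtin setattr() -> Guarded.__setattr__ as well
g.__dict__['c'] = 3     # direct dict write -> not intercepted
print(g.log)
```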

Another approach to consider is to add the corresponding initialization statements to each __init__:
class Tree:
    _("""A Tree instance can only contain sub Tree instances as its attributes""")
    # (1)
    # __used_names = None
    # __used_names = Tree.__dict__.copy()
    # __used_names.update(object.__dict__.copy())
    # __used_names = __used_names.keys()
    # ------------------------------------------------------------------------
    # __used_names = object.__dict__.copy().keys() + [
    #     '__used_names', '__node_value', '__node_items',
    #     '__getattr__', '__setitem__', '__getitem__', '__call__',
    #     'tree', '__traverse__' ]
    def __init__(self, value=None, **kwargs):
        # (2)
        # self.__used_names = None
        # self.__used_names = Tree.__dict__.copy()
        # self.__used_names.update(object.__dict__)
        # self.__used_names = self.__used_names.keys()
        # (3)
        # setattr(self, '__used_names', None)
        # setattr(self, '__used_names', Tree.__dict__.copy())
        # self.__used_names.update(object.__dict__)
        # setattr(self, '__used_names', self.__used_names.keys())
        # --------------------------------------------------------------------
        # self.__node_value = value
        self.__dict__['__node_value'] = value
        # __node_value should always be Non-Tree
        # self.__node_items = {}
        self.__dict__['__node_items'] = {}
        for k, v in kwargs.items(): self.__setattr__(k, v)
    def __setattr__(self, attr_name, value):
        _("""The operation 'root.br1.br2 = value' works on root.br1 rather than root.br1.br2""")
        # (4)
        __used_names = Tree.__dict__.copy()
        print __used_names
        __used_names.update(object.__dict__.copy())
        __used_names = __used_names.keys() + ['__node_value', '__node_items']

        if attr_name in __used_names:
        # --------------------------------------------------------------------
        # if attr_name in self.__used_names:
            raise TreeExc(_("Attribute name '%s' is reserved") % attr_name)
        try:
            existed = getattr(self, attr_name)
            # self.attribute exists
            if isinstance(value, Tree):
                subtree = value
                if not subtree._Tree__node_value:
                    subtree._Tree__node_value = existed._Tree__node_value
                setattr(self, attr_name, subtree)
                # replace directly after the tree __node_value has been reserved
            else:
                setattr(self, '%s._Tree__node_value' % attr_name, value)
        except AttributeError:
            # if self.attribute does not exist, assign it an EMPTY node
            if isinstance(value, Tree):
                subtree = value
                # setattr(self, attr_name, subtree)
                self.__dict__[attr_name] = subtree
            else:
                # setattr(self, attr_name, Tree(value))
                self.__dict__[attr_name] = Tree(value)
So, once __setattr__ has been redefined, assignment can only be done through self.__dict__.

The potential problem with this is that the overhead may be fairly high, because the work is repeated every time a Tree instance is created. Going back to the idea of a static variable on the Tree class, the final definition is:
class Tree:
    _("""A Tree instance can only contain sub Tree instances as its attributes""")
    # (1)
    # __used_names = None
    # __used_names = Tree.__dict__.copy()
    # __used_names.update(object.__dict__.copy())
    # __used_names = __used_names.keys()
    # ------------------------------------------------------------------------
    __used_names = object.__dict__.copy().keys() + [
        '__used_names', '__node_value', '__node_items',
        '__getattr__', '__setitem__', '__getitem__', '__call__',
        'tree', '__traverse__' ]
    def __init__(self, value=None, **kwargs):
        # (2)
        # self.__used_names = None
        # self.__used_names = Tree.__dict__.copy()
        # self.__used_names.update(object.__dict__)
        # self.__used_names = self.__used_names.keys()
        # (3)
        # setattr(self, '__used_names', None)
        # setattr(self, '__used_names', Tree.__dict__.copy())
        # self.__used_names.update(object.__dict__)
        # setattr(self, '__used_names', self.__used_names.keys())
        # --------------------------------------------------------------------
        # self.__node_value = value
        self.__dict__['__node_value'] = value
        # __node_value should always be Non-Tree
        # self.__node_items = {}
        self.__dict__['__node_items'] = {}
        for k, v in kwargs.items(): self.__setattr__(k, v)
    def __setattr__(self, attr_name, value):
        _("""The operation 'root.br1.br2 = value' works on root.br1 rather than root.br1.br2""")
        # (4)
        # __used_names = Tree.__dict__.copy()
        # __used_names.update(object.__dict__.copy())
        # __used_names = __used_names.keys() + ['__node_value', '__node_items']
        # if attr_name in __used_names:
        # --------------------------------------------------------------------
        if attr_name in self.__used_names:
            raise TreeExc(_("Attribute name '%s' is reserved") % attr_name)
        try:
            existed = getattr(self, attr_name)
            # self.attribute exists
            if isinstance(value, Tree):
                subtree = value
                if not subtree._Tree__node_value:
                    subtree._Tree__node_value = existed._Tree__node_value
                # setattr(self, attr_name, subtree)
                self.__dict__[attr_name] = subtree
                # replace directly after the tree __node_value has been reserved
            else:
                # setattr(self, '%s._Tree__node_value' % attr_name, value)
                self.__dict__[attr_name]._Tree__node_value = value
                # self.__dict__[attr_name] = Tree(value)
        except AttributeError:
            # if self.attribute does not exist, assign it an EMPTY node
            if isinstance(value, Tree):
                subtree = value
                # setattr(self, attr_name, subtree)
                self.__dict__[attr_name] = subtree
            else:
                # setattr(self, attr_name, Tree(value))
                self.__dict__[attr_name] = Tree(value)

Thursday, June 14, 2007

python reassign __repr__ or normal method?

As we know, in Python every variable name is a reference to an object, so a name can point to any object regardless of type. For example, the name of a function/method can be rebound to an ordinary value. So what effect does this have on builtin methods such as __getattr__? For example:
sh$ expand -t4 test001.py
#!/usr/bin/env python

class tree:
    def __init__(self):
        self.data = 1
    def func(self):
        return self.data
    def __custom__(self):
        return self.data
    def __private(self):
        return self.data

t = tree()
print "t", t
print "t.data", t.data
print "t.__custom__()", t.__custom__()
print "t._tree__private__()", t._tree__private()
# print "t.__repr__()", t.__repr__() #(1)
# print "t._tree__repr__()", t._tree__repr__() #(2)
# print "type(t.__repr__)", type(t.__repr__)
print "t.func()", t.func()
t.__repr__ = 2
t.__str__ = 2
t.func = 2
t.__custom__ = 2
t._tree__private = 2
print "*** After resign ***"
# print "t", t #(3)
# t.__repr__() is used
print "t.data", t.data
# print "t.__repr__()", t.__repr__() #(4)
# print "type(t.__repr__)", type(t.__repr__)
# print "t.func()", t.func() #(5)
# print "t.__custom__()", t.__custom__() #(6)
print "t._tree__private__()", t._tree__private() #(7)
The error output for each of (1)~(7) is as follows:
(1)
t <__main__.tree instance at 0xb7f0418c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t.__repr__()
Traceback (most recent call last):
File "test001.py", line 18, in ?
print "t.__repr__()", t.__repr__() #(1)
AttributeError: tree instance has no attribute '__repr__'

(2)
t <__main__.tree instance at 0xb7faa18c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t._tree__repr__()
Traceback (most recent call last):
File "test001.py", line 19, in ?
print "t._tree__repr__()", t._tree__repr__() #(2)
AttributeError: tree instance has no attribute '_tree__repr__'

(3)
t <__main__.tree instance at 0xb7f6d18c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t.func() 1
*** After resign ***
t Traceback (most recent call last):
File "test001.py", line 28, in ?
print "t", t #(3)
TypeError: 'int' object is not callable

(4)
t <__main__.tree instance at 0xb7f0e18c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t.func() 1
*** After resign ***
t.data 1
t.__repr__()
Traceback (most recent call last):
File "test001.py", line 31, in ?
print "t.__repr__()", t.__repr__() #(4)
TypeError: 'int' object is not callable

(5)
t <__main__.tree instance at 0xb7fb418c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t.func() 1
*** After resign ***
t.data 1
t.func()
Traceback (most recent call last):
File "test001.py", line 33, in ?
print "t.func()", t.func() #(5)
TypeError: 'int' object is not callable

(6)
t <__main__.tree instance at 0xb7f6818c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t.func() 1
*** After resign ***
t.data 1
t.__custom__()
Traceback (most recent call last):
File "test001.py", line 34, in ?
print "t.__custom__()", t.__custom__() #(6)
TypeError: 'int' object is not callable

(7)
t <__main__.tree instance at 0xb7f8618c>
t.data 1
t.__custom__() 1
t._tree__private__() 1
t.func() 1
*** After resign ***
t.data 1
t._tree__private__()
Traceback (most recent call last):
File "test001.py", line 35, in ?
print "t._tree__private__()", t._tree__private() #(7)
TypeError: 'int' object is not callable
From this it appears that the builtin operations behave like ordinary methods: after t.__repr__ and t.__str__ are reassigned above, print t no longer works (note: not print t.data!). The same goes for other builtin types such as list/dict.

What I don't quite understand is why the exception AttributeError: tree instance has no attribute '__repr__' appears above, since __repr__ is not a private variable. And even if it were private, it should still be accessible through _tree__repr__.

So let's define a __repr__ explicitly:
sh$ expand -t4 test002.py
#!/usr/bin/env python

class tree:
    def __init__(self):
        self.data = 1
    def __repr__(self):
        return "test: %d" % self.data

t = tree()
print "t", t
print "t.data", t.data
print "t.__repr__()", t.__repr__() #(1)
print "type(t.__repr__)", type(t.__repr__)
t.__repr__ = "string"
print "*** After reassign ***"
# print "t", t #(3)
# t.__repr__() is used
print "t.data", t.data
# print "t.__repr__()", t.__repr__() #(4)
print "type(t.__repr__)", type(t.__repr__)

(3)
t test: 1
t.data 1
t.__repr__() test: 1
type(t.__repr__) <type 'instancemethod'>
*** After reassign ***
t Traceback (most recent call last):
File "test002.py", line 16, in ?
print "t", t #(3)
TypeError: 'str' object is not callable

(4)
t test: 1
t.data 1
t.__repr__() test: 1
type(t.__repr__) <type 'instancemethod'>
*** After reassign ***
t.data 1
t.__repr__()
Traceback (most recent call last):
File "test002.py", line 19, in ?
print "t.__repr__()", t.__repr__() #(4)
TypeError: 'str' object is not callable
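As we can see, once it is defined explicitly, the t.__repr__ attribute exists!

But why did the earlier case raise a "no attribute" exception, and if __repr__ is not defined, how does print t work at all? To answer this, one needs a clearer understanding of Python's fundamentals. See the notes on "python namespace".

Another rather important example:
>>> class tree:
    def __init__(self): self.__repr__ = 1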
>>> t = tree()
>>> print t
Traceback (most recent call last):
File "", line 1, in ?
TypeError: 'int' object is not callable
The reason for discussing all this is that if I want to define a class Tree, there is a potential for namespace conflicts!
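
All of the sessions above use classic (old-style) Python 2 classes. These do not inherit from object, which is why a class without its own __repr__ reports no such attribute, and print consults the instance when it looks for __repr__. For comparison, a sketch of the same experiment under Python 3, where every class is new-style and implicit special-method lookup goes to the type instead of the instance:

```python
class Tree:
    def __init__(self):
        # In Python 3 this only creates an instance attribute named __repr__.
        self.__repr__ = 1

t = Tree()
text = repr(t)            # implicit lookup uses type(t).__repr__, so this still works
print(text)
print(t.__repr__)         # explicit attribute access sees the instance value: 1
```

The naming conflict does not disappear entirely: an explicit call such as t.__repr__() would still fail here, so reserving such names in Tree remains a reasonable precaution.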

Wednesday, June 13, 2007

logcheck hack for logfiles updated by rsync

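Because of hardware resource limits, the log centralization step of our log analysis cannot be done with remote logging such as syslog-ng; network bandwidth would become the bottleneck, and I don't know how to solve the problem of lost log packets. So the safer approach is to use rsync to synchronize the logs to a central storage host, and then do the log analysis work there, such as logcheck for log filtering.

But there is a problem: when logcheck calls logtail, it records the inode and size of the original log file in an offset_file, whose format is: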
$inode
$last_size
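The next time logcheck runs, it first compares the file size (wc -c) with $last_size from the offset_file. If $current_size < $last_size, logcheck concludes that a rotation has taken place and switches to logfile.1; otherwise it keeps checking logfile. It then calls logtail, which applies the offset_file to the selected log file and uses $last_size to start reading from where it stopped last time...

But when rsync is used to synchronize the logs, the inode is not preserved: after every rsync the log's inode changes, so logcheck cannot find where it stopped last time. With cp -f the inode is preserved, but cp is too expensive, especially for large files like logs. So I made another small hack to logcheck and added an --rsynced option: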
-y    = adjust the inode number if the logfiles are updated by rsync
The final patch file looks like this:
sh$ expand -t4 logcheck-1.2.45-rsynced.patch
diff -Naur logcheck-1.2.45.old/src/logcheck logcheck-1.2.45.new/src/logcheck
--- logcheck-1.2.45.old/src/logcheck 2006-07-06 18:16:42.000000000 +0800
+++ logcheck-1.2.45.new/src/logcheck 2007-06-15 10:50:34.000000000 +0800
@@ -49,7 +49,7 @@
ATTACK=0

# Set the getopts string
-GETOPTS="c:dhH:l:L:m:opr:RsS:tTuvw"
+GETOPTS="c:dhH:l:L:m:opr:RsS:tTuvwy"

# Get the details for the email message
DATE="$(date +'%Y-%m-%d %H:%M')"
@@ -90,6 +90,10 @@
LOCKDIR=/var/lock/logcheck
LOCKFILE="$LOCKDIR/logcheck"

+RSYNCED=0
+# If the logfiles are centralized by rsync, the inode number will not be
+# reserved, so add an option for this condition
+
# Carry out the clean up tasks
cleanup() {

@@ -208,7 +212,9 @@
mkdir $cleaned \
|| error "Could not make dir $cleaned for cleaned rulefiles."
fi
- for rulefile in $(run-parts --list $dir); do
+ debug "find $dir, pwd: `pwd`"
+ for rulefile in $(find $dir -type f -perm +0100); do
+ debug "rulefile $rulefile"
rulefile=$(basename $rulefile)
if [ -f ${dir}/${rulefile} ]; then
debug "cleanrules: ${dir}/${rulefile}"
@@ -406,6 +412,18 @@
fi
}

+rsynced() {
+ logfile=$1
+ offsetfile=$2
+ if [ $RSYNCED -eq 1 ]; then
+ debug "$logfile rsync specified"
+ new_inode=$(ls -i $logfile | awk '{print $1}')
+ old_inode=$(head -n1 $offsetfile)
+ debug "replace $old_inode with $new_inode"
+ sed -i "1s/^.*$/$new_inode/" $offsetfile
+ fi
+}
+
# Get the yet unseen part of one logfile.
logoutput() {
file=$1
@@ -415,11 +433,13 @@
if [ -f "$file" ]; then
offsetfile="$STATEDIR/offset$(echo $file | tr / .)"
if [ -s "$offsetfile" -a -r "$offsetfile" ]; then
+ rsynced $file $offsetfile
if [[ $(wc -c < "$file") -lt $(tail -n 1 "$offsetfile") ]]; then
# assume the log is rotated by savelog(8)
# syslog-ng leaves old files here
if [ -e "$file.0" -a "$file.0" -nt "$file.1.gz" ]; then
debug "Running logtail on rotated: $file.0"
+ rsynced $file.0 $offsetfile
$LOGTAIL -f "$file.0" -o "$offsetfile" $LOGTAIL_OPTS > \
$TMPDIR/logoutput/$(basename "$file") 2>&1 \
|| error "Could not run logtail or save output"
@@ -429,6 +449,7 @@
# should also probably check if file is still fresh
elif [ -e "$file.1" ]; then
debug "Running logtail on rotated: $file.1"
+ rsynced $file.1 $offsetfile
$LOGTAIL -f "$file.1" -o "$offsetfile" $LOGTAIL_OPTS > \
$TMPDIR/logoutput/$(basename "$file") 2>&1 \
|| error "Could not run logtail or save output"
@@ -452,7 +473,7 @@
debug "usage: Printing usage and exiting"
cat<<EOF
usage: logcheck [-c CFG] [-d] [-h] [-H HOST] [-l LOG] [-L CFG] [-m MAIL] [-o]
- [-r DIR] [-s|-p|-w] [-R] [-S DIR] [-t] [-T] [-u]
+ [-r DIR] [-s|-p|-w] [-R] [-S DIR] [-t] [-T] [-u] [-y]
-c CFG = override default configuration file
-d = debug mode
-h = print this usage information and exit
@@ -471,6 +492,7 @@
-u = enable syslog-summary
-v = print version
-w = use the "workstation" runlevel
+ -y = adjust the offset inode number if the logfiles are updated by rsync remotely
EOF
}

@@ -600,6 +622,10 @@
debug "Setting REPORTLEVEL to workstation"
REPORTLEVEL="workstation"
;;
+ y)
+ debug "Setting RSYNCED"
+ RSYNCED=1
+ ;;
\?)
usage
exit 1
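The option is the short -y rather than a long --rsynced (-r and -s are already taken) because logcheck uses the shell builtin getopts instead of GNU getopt, and getopts does not support long options. Switching to getopt would require too many changes, so I settled for the simpler approach.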
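For reference, this is roughly how the getopts builtin handles single-character options. It is a minimal sketch: parse_opts and the reduced option string "dy" are made up for illustration and are much smaller than logcheck's real GETOPTS string.

```shell
#!/bin/sh
# Sketch of short-option parsing with the shell builtin getopts.
# parse_opts and the option string "dy" are illustrative only.

parse_opts() {
    RSYNCED=0
    DEBUG=0
    OPTIND=1                      # restart parsing on every call
    while getopts "dy" opt; do
        case $opt in
            d) DEBUG=1 ;;
            y) RSYNCED=1 ;;
            \?) return 1 ;;       # unknown option
        esac
    done
}
```

Calling parse_opts -d -y sets both DEBUG and RSYNCED to 1, while an option outside the string, such as -x, makes the function return 1.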

Note that the rotated logs are synchronized by rsync as well, so rsynced() has to adjust the inode for them too.

Also, when root invokes logcheck through sudo or su, remember to switch $HOME: with su, use su - logcheck; with sudo, use sudo -u logcheck -H. If $HOME is not switched, the find command (find $dir -type f -perm +0100, the small hack introduced earlier to avoid run-parts --list) fails with:
find: cannot get current directory: Permission denied
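So the wrapper script can be written as:
#!/bin/sh
# Give the logcheck user read access to the rsynced logs, then run
# logcheck as that user.

PATH=$PATH:/usr/sbin
datadir=/data/hosts
echo "DEBUG: $datadir"

# ACL 4 = r-- on files, 5 = r-x on directories
find "$datadir" -type f -print0 | xargs -0 setfacl -m user:logcheck:4
find "$datadir" -type d -print0 | xargs -0 setfacl -m user:logcheck:5
# sudo -u logcheck -H /usr/sbin/logcheck "$@"
su - logcheck -c "/usr/sbin/logcheck $*"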

Monday, June 11, 2007

cifs uid/gid overwrite

By default, after mounting a Samba share with cifs, the file ownership on the mounted filesystem is wrong:
smbclient# ls /mnt/host/ -l
total 24
-rw-r--r-- 1 root root 372 May 25 09:55 adjust
drwxr-xr-x 5 10003 10003 0 Jun 11 11:37 fs_backup
drwxr-xr-x 12 10003 10014 0 Jun 11 11:37 logs
This means the files belong to users they should not belong to. Even mounting with:
smbclient# mount //store/homes -t cifs /mnt/host -o uid=0,gid=0,username=host_p01,password='********'
does not help; the ownership is still wrong.

According to the man page, the Unix Extensions are what gets in the way. To turn them off, run this on the Samba client:
smbclient# echo "0">/proc/fs/cifs/LinuxExtensionsEnabled
Then remount the cifs filesystem. Even without the uid and gid options, everything is now mapped to root:
smbclient# mount //store/homes -t cifs /mnt/host -o username=host_p01,password='********'
smbclient# ls /mnt/host/ -l
total 24
-rwxrwSrwt 1 root root 372 May 25 09:55 adjust
drwxrwxrwx 1 root root 0 Jun 11 11:37 fs_backup
drwxrwxrwx 1 root root 0 Jun 11 11:37 logs
But now the file permissions are wrong! The only fix is to add more mount options:
smbclient# mount //store/homes -t cifs /mnt/host -o file_mode=0644,dir_mode=0755,username=host_p01,password='********'
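Symbolic links still cannot be used, though. You cannot have it both ways, so this will have to do for now. Fortunately nothing in this setup requires symlinks yet; hopefully a later version will solve the problem.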

smbclient# cd /mnt/host/
smbclient# ln logs/ -s test
ln: creating symbolic link `test' to `logs/': Operation not supported
To turn off Unix Extensions on the server side instead, edit /etc/samba/smb.conf:
unix extensions = no

Friday, June 08, 2007

A small hack for logcheck-1.2.45

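logcheck is a good choice for a log filter, but a lot changed between 1.1.1 and 1.2.45; for example, logtail went from a C program to a perl script. The most important point is that logcheck-1.2.45 is more orthogonal in design and ships more patterns for all kinds of services and applications, which gives administrators more freedom of choice. That saves much of the trouble of writing your own patterns, as you had to with 1.1.1, and the pattern matching in 1.2.45 is also more precise.

The main problem is that logcheck-1.2.45 is troublesome to install. Because it is a shell script it is fairly platform dependent, and its dependency checking is poor. logcheck-1.2.45 depends on lockfile-progs, but you will not find that out until you have installed logcheck and run it. lockfile-progs in turn fails to compile because the header lockfile.h is missing, without telling you that the liblockfile-1.06.2 package needs to be installed first.

With all of these installed, copy /var/log to /tmp/log, then: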
sh# chown logcheck.logcheck /tmp/log -R
sh# vi /etc/logcheck/logfiles
/tmp/log/messages
/tmp/log/maillog
/tmp/log/secure
sh# su - logcheck
sh$ /usr/sbin/logcheck

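There is no output, and the mail received shows that nothing was filtered at all, even though /etc/logcheck/ignore.d.server/* does contain the relevant patterns!

So I went through the /usr/sbin/logcheck shell script and found the part that looks for the pattern files: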
......
cleanrules "$RULEDIR/cracking.d" $TMPDIR/cracking
cleanrules "$RULEDIR/violations.d" $TMPDIR/violations
cleanrules "$RULEDIR/violations.ignore.d" $TMPDIR/violations-ignore

# Now clean the ignore rulefiles for the report levels
for level in $REPORTLEVELS; do
cleanrules "$RULEDIR/ignore.d.$level" $TMPDIR/ignore
done

# The following cracking.ignore directory will only be used if
# $SUPPORT_CRACKING_IGNORE is set to 1 in the configuration file.
# This is *only* for local admin use.
if [ $SUPPORT_CRACKING_IGNORE -eq 1 ]; then
cleanrules "$RULEDIR/cracking.ignore.d" $TMPDIR/cracking-ignore
fi
......
cleanrules() {
dir=$1
cleaned=$2

if [ -d $dir ]; then
if [ ! -d $cleaned ]; then
mkdir $cleaned \
|| error "Could not make dir $cleaned for cleaned rulefiles."
fi
for rulefile in $(run-parts --list $dir); do
rulefile=$(basename $rulefile)
if [ -f ${dir}/${rulefile} ]; then
debug "cleanrules: ${dir}/${rulefile}"
if [ -r ${dir}/${rulefile} ]; then
# pipe to cat on greps to get usable exit status
egrep --text -v '^[[:space:]]*$|^#' $dir/$rulefile | cat \
>> $cleaned/$rulefile \
|| error "Couldn't append to $cleaned/$rulefile. Disk Full?"
else
error "Couldn't read $dir/$rulefile"
fi
fi
done
elif [ -f $dir ]; then
error "cleanrules: '$dir' is a file, not a directory"
elif [ -z $dir ]; then
error "cleanrules: called without argument"
fi
}
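As you can see, cleanrules() is what looks for the pattern files, and the list of files to apply comes from the command run-parts --list $dir. I added a DEBUG line to see which rulefiles were applied, and it showed an error message.

Running the command on its own: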
sh# run-parts --list /etc/logcheck/ignore.d.server/
Not a directory: --list
sh# run-parts /etc/logcheck/ignore.d.server/ --list
# EMPTY!
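So in practice no files were found at all.

Judging from a Google search, run-parts is quite platform dependent. The command finds the files in a directory that have execute permission; without --list it runs them. That assumes run-parts has this option, and the run-parts on RHEL4 does not (it actually takes only a PATH argument). The logcheck developers seem to be more at home on Debian, so the command cannot be used as-is here.

One fix is to replace the run-parts command with: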
find $dir -type f -perm +0100
and that is all. For an installation with ulfs, the corresponding profile is:
sh# cat /usr/src/logcheck/.config
pkgname = "logcheck";
version = "1.2.45";
user = "logcheck";
groups = "";
group = "logcheck";
archive = "logcheck_1.2.45.tar.gz";
command = "tar xfz logcheck_1.2.45.tar.gz";
command = "cd logcheck-1.2.45";
command = "sed -i 's/install -d/mkdir -p/g' Makefile";
command = "sed -i 's/run-parts --list $dir/find $dir -type f -perm +0100/g' src/logcheck";
command = "make";
command = "cd ..";
command = "rm -rf logcheck-1.2.45";
time = "20070608 10:49:35 Fri"
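Then, in /etc/logcheck/ignore.d.server, use chmod u+x to add execute permission to the pattern files you need, and those files will be used during filtering! This shows that the approach provides better orthogonality and flexibility.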
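The selection rule can be tried out in isolation. In this sketch list_rulefiles is a hypothetical helper, not part of logcheck, and it uses -perm -0100 instead of -perm +0100: for a single permission bit the two select the same files, and newer GNU find releases no longer accept the +MODE form.

```shell
#!/bin/sh
# Sketch: list the rule files that the find-based replacement would pick up.
# list_rulefiles is a hypothetical helper, not part of logcheck.

list_rulefiles() {
    # -perm -0100 matches regular files whose owner-execute bit is set
    find "$1" -type f -perm -0100 | sort
}
```

Running list_rulefiles /etc/logcheck/ignore.d.server before and after a chmod u+x shows which pattern files are currently enabled.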

Thursday, June 07, 2007

hostname in syslog

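If you centralize logs with syslog or syslog-ng, or by some other method, and then filter them on the central host with one filter program to generate daily reports, it is essential that the hostname field in each log record is correct and unique. If two hosts use the same hostname, the result is a mess.

If the hostname field in syslog does not match the output of the hostname command or HOSTNAME= in /etc/sysconfig/network, restarting syslogd is enough: /etc/init.d/syslog restart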
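A quick check for such a mismatch might look like the sketch below. It assumes the traditional syslog line layout "Mmm dd hh:mm:ss hostname tag: message", where the hostname is the fourth whitespace-separated field; syslog_host and check_syslog_hostname are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: compare the hostname field of the newest log line with an
# expected name. Assumes the layout "Mmm dd hh:mm:ss hostname tag: msg".
# Both function names are made up for this example.

syslog_host() {
    awk '{print $4}'
}

# check_syslog_hostname LOGFILE EXPECTED
# Succeeds when the last line of LOGFILE carries EXPECTED as hostname.
check_syslog_hostname() {
    logged=$(tail -n 1 "$1" | syslog_host)
    [ "$logged" = "$2" ]
}
```

A possible use is check_syslog_hostname /var/log/messages "$(hostname)" || /etc/init.d/syslog restart, keeping in mind that hostname may print a fully qualified name while the log carries the short one.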

Wednesday, June 06, 2007

From a Samba I/O problem

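While adjusting the network yesterday I ran into a problem. I had been using Samba for network file sharing, with the shares mounted locally for backups and similar jobs. Yesterday I changed the IP subnet without unmounting first, and afterwards commands such as df, fuser and lsof hung. ps shows them in state "D" (Down or Deadlock?), that is, uninterruptible sleep (usually I/O). These processes cannot be killed, not even with kill -9.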
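To list the stuck processes, the ps output can be filtered on the state column. This is a small sketch; filter_dstate is a made-up name, and it expects three input columns in the order PID, STAT, COMMAND.

```shell
#!/bin/sh
# Sketch: keep only the processes whose state starts with "D".
# filter_dstate is an illustrative name; input columns: PID STAT COMMAND.

filter_dstate() {
    awk '$2 ~ /^D/ {print $1, $3}'
}
```

On a Linux system with procps, ps -eo pid,stat,comm | filter_dstate prints the PID and command name of each process in uninterruptible sleep.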

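After searching the web and thinking it over, I find this mechanism reasonable. The situation usually reflects an I/O error, most commonly a disk error; if the disk is damaged and the error cannot be recovered, the error should be exposed rather than letting the process be killed.

The same is in fact true of NFS. It is also different from stopping the smbd process on the server first: if smbd is stopped first, this I/O problem does not occur.

These processes can only be cleared by rebooting the machine, or by changing the IP address back, unmounting, and then changing the IP address again. For network filesystems, consider modifying the network SysV init script to include the corresponding checks and operations.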
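One possible form of such a check is sketched below: before the network is reconfigured, list the network filesystems that are still mounted. network_mounts is a hypothetical helper, and the set of filesystem types (cifs, smbfs, nfs, nfs4) is an assumption that may need adjusting for a given system.

```shell
#!/bin/sh
# Sketch: print the mount points of network filesystems.
# network_mounts is a hypothetical helper; the list of types is an assumption.
# /proc/mounts columns: device mountpoint fstype options dump pass

network_mounts() {
    awk '$3 == "cifs" || $3 == "smbfs" || $3 == "nfs" || $3 == "nfs4" {print $2}' \
        "${1:-/proc/mounts}"
}
```

An init script could refuse to change the address, or unmount first, whenever network_mounts prints anything.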