弹剑而歌: Python Note: struct for binary protocol

roczhou

30 Dec 2008

ChangeLog:

30 Dec 2008, roczhou, created document

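In a recent project I used a text-based JSON protocol, and overall its efficiency was acceptable. Stress testing later exposed a problem. The protocol runs over UDP, so large messages can be fragmented, and the server could therefore only distribute tasks by client IP. At first we had no effective way to simulate a large number of clients (we later did it with virtual IPs plus iptables). As a result, most requests piled up on a single task thread (Task Thread), and the server could not be fully loaded.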

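Our first idea was to switch the protocol header to a binary format. The front-end receiving thread could then parse each message quickly on arrival and hand its content to the correct task thread. The client, however, was written in Python, while the server was written in C/C++. Implementing this binary protocol therefore required Python's struct module to do the conversion. An example follows: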

Python side:

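#!/usr/bin/env python
# -*- encoding: utf-8 -*-

__author__ = "roczhou"
__date__ = "11 Dec 2008"
__version__ = "0.3"

import time
import struct

# Header fields, keyed by the one-letter names used in the protocol
headobj = {
    "V": 1,           # version,   B (uint8)
    "C": 0x01,        # command,   B (uint8)
    "S": 10000,       # sequence,  H (uint16)
    "R": 0,           # reserved,  I (uint32)
    # "T": int(time.time()),
    "T": 1111111111,  # timestamp, Q (uint64)
    "c": 1,           # count,     I (uint32)
    "i": 0,           # index,     I (uint32)
    "L": 32,          # length,    H (uint16)
}

# "!" = network byte order (big-endian), standard sizes, no padding
FORMAT = "!2BHIQ2IH"

# The key order must match FORMAT and the C struct field order exactly
values = [headobj[k] for k in ["V", "C", "S", "R", "T", "c", "i", "L"]]
print("SIZE %d" % struct.calcsize(FORMAT))
with open("/tmp/head.bin", "wb") as f:
    f.write(struct.pack(FORMAT, *values))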
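The point of the binary header, as described at the start of this note, is to let the front-end receiving thread route a datagram after parsing only a few fixed bytes. The following is a minimal sketch of that receiving side. It assumes a hypothetical dispatch rule of `sequence % NUM_WORKERS` and one `queue.Queue` per task thread. `HEADER`, `NUM_WORKERS`, `worker_queues` and `dispatch` are illustrative names and are not part of the original project.

```python
import queue
import struct

# Same layout the client packs: "!" = network order, no padding (26 bytes)
HEADER = struct.Struct("!2BHIQ2IH")
NUM_WORKERS = 4  # hypothetical number of task threads

# One inbound queue per task thread (the threads themselves are omitted)
worker_queues = [queue.Queue() for _ in range(NUM_WORKERS)]


def dispatch(datagram):
    """Parse only the header and hand the datagram to one task thread.

    Returns the chosen worker index, or None if the datagram is too short.
    """
    if len(datagram) < HEADER.size:
        return None  # truncated packet: drop it
    (version, command, sequence, reserved,
     timestamp, count, index, length) = HEADER.unpack_from(datagram, 0)
    # Hypothetical rule: spread requests across workers by sequence number
    worker = sequence % NUM_WORKERS
    worker_queues[worker].put((sequence, datagram))
    return worker


if __name__ == "__main__":
    packet = HEADER.pack(1, 0x01, 10000, 0, 1111111111, 1, 0, 32) + b"payload"
    print(dispatch(packet))  # 10000 % 4 == 0, so worker 0
```

`unpack_from` reads the header in place without slicing the payload off first. The real routing key would be whatever field the server design dictates.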

C side:

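#include <stdio.h>
#include <string.h>
#include <inttypes.h>   /* uint8_t ... uint64_t, PRIu32, PRIu64 */
#include <arpa/inet.h>  /* ntohs, ntohl */

/* Same field order and widths as the Python format "!2BHIQ2IH".
 * On common ABIs every field already sits at its natural alignment, so the
 * offsets match the packed Python layout; sizeof(head) is still larger
 * because of trailing padding. */
typedef struct _head {
    uint8_t  version;
    uint8_t  command;
    uint16_t sequence;
    uint32_t reserved;
    uint64_t timestamp;
    uint32_t count;
    uint32_t index;
    uint16_t length;
} head;

/* ntohl only handles 32 bits; decode the 8 big-endian bytes by hand. */
static uint64_t ntoh64(uint64_t v) {
    const unsigned char *p = (const unsigned char *)&v;
    uint64_t r = 0;
    int i;
    for (i = 0; i < 8; i++)
        r = (r << 8) | p[i];
    return r;
}

int main(void) {
    head hd;
    size_t len;
    FILE *fp = fopen("/tmp/head.bin", "rb");
    if (fp == NULL) {
        printf("File /tmp/head.bin does not exist\n");
        return 1;
    }
    memset(&hd, 0, sizeof(hd));
    /* Read straight into the struct instead of casting a char buffer,
     * which may be misaligned for uint64_t. */
    len = fread(&hd, 1, sizeof(hd), fp);
    fclose(fp);

    printf("version: %u\n", (unsigned)hd.version);
    printf("command: %u\n", (unsigned)hd.command);
    printf("sequence: %u\n", (unsigned)ntohs(hd.sequence));
    printf("reserved: %" PRIu32 "\n", ntohl(hd.reserved));
    printf("timestamp: %" PRIu64 "\n", ntoh64(hd.timestamp));
    printf("count: %" PRIu32 "\n", ntohl(hd.count));
    printf("index: %" PRIu32 "\n", ntohl(hd.index));
    printf("length: %u\n", (unsigned)ntohs(hd.length));
    printf("LEN: %zu\n", len);              /* bytes actually read */
    printf("SIZEOF: %zu\n", sizeof(head));  /* includes padding */
    return 0;
}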

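A binary protocol depends on two rules. First, the length of every field (element) must be fixed. Second, the order of the fields must be fixed and identical on both sides.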
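On the Python side, one way to enforce both rules is to define the field order once, next to the format string, and derive packing and unpacking from that single definition. The sketch below assumes the same eight header fields as above. The `Head` namedtuple and the `encode`/`decode` helpers are hypothetical names introduced for illustration.

```python
import struct
from collections import namedtuple

# Field order is defined once and must match the C struct and the format
Head = namedtuple("Head", "version command sequence reserved "
                          "timestamp count index length")
HEAD_STRUCT = struct.Struct("!2BHIQ2IH")  # fixed widths: 1,1,2,4,8,4,4,2


def encode(head):
    # A namedtuple iterates in declaration order, so *head follows the format
    return HEAD_STRUCT.pack(*head)


def decode(data):
    return Head._make(HEAD_STRUCT.unpack(data))


h = Head(version=1, command=0x01, sequence=10000, reserved=0,
         timestamp=1111111111, count=1, index=0, length=32)
raw = encode(h)
print(len(raw))          # 26, whatever the field values are
print(decode(raw) == h)  # True
```

In Python 3, a value that does not fit its field (for example `sequence=70000` for an `H`) makes `pack` raise `struct.error` instead of silently changing the field width.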

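For the usage of Python's struct module, see the official documentation. The most important thing is that each field (element) in the format string has the same byte length as the corresponding field on the C side. With a standard-size prefix such as ! (or <, >, =), the sizes are as follows. The lowercase letters b, h, i and q are the signed counterparts.

B, unsigned char, 1 byte
H, unsigned short, 2 bytes
I, unsigned int, 4 bytes
Q, unsigned long long, 8 bytes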
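The table can be checked directly with `struct.calcsize`. The short sketch below assumes standard sizes via the `!` prefix. The `EXPECTED` dict is only for illustration.

```python
import struct

# Standard sizes with the "!" prefix; lowercase letters are the signed variants
EXPECTED = {"B": 1, "H": 2, "I": 4, "Q": 8}

for code, size in sorted(EXPECTED.items()):
    for c in (code.lower(), code):
        actual = struct.calcsize("!" + c)
        print("%s: %d byte(s)" % (c, actual))
        assert actual == size

# The whole header is simply the sum of its fields: 1+1+2+4+8+4+4+2
print(struct.calcsize("!2BHIQ2IH"))  # 26
```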

Do l/L (long) and i/I (int) have the same byte length? With the ! prefix, yes:

>>> import struct
>>> struct.calcsize("!i")
4
>>> struct.calcsize("!l")
4
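The answer depends on the prefix. With `!`, `<`, `>` or `=`, the standard size of `l` is 4 bytes, the same as `i`. Without a prefix (native mode, `@`), `l` uses the platform's C `sizeof(long)`, which is 8 on 64-bit Linux. The sketch below compares the two modes and uses `ctypes` only to ask the platform for its C sizes.

```python
import ctypes
import struct

for code in ("i", "l", "q"):
    standard = struct.calcsize("!" + code)  # fixed by the struct module
    native = struct.calcsize("@" + code)    # follows the local C compiler
    print("%s: standard=%d native=%d" % (code, standard, native))

# Native "l" is the C long of this platform; "!l" is always 4 bytes
assert struct.calcsize("@l") == ctypes.sizeof(ctypes.c_long)
assert struct.calcsize("!l") == 4
```

For a wire protocol this is one more reason to always use a standard-size prefix, so the layout does not change from one platform to another.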

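struct.calcsize(format) computes the number of bytes the format describes. That packed size is what actually goes over the wire, so the C side must find every field at the same offset. It is not necessarily equal to sizeof(struct) in C/C++, however. With the ! prefix the struct module inserts no padding, whereas the compiler may pad the struct. Here calcsize("!2BHIQ2IH") is 26, while sizeof(head) is typically 32 on x86-64 because of trailing padding. The offsets of all eight fields still coincide, because each field already sits at its natural alignment. The LEN printed at the end of the C code is the number of bytes fread read from the file (the packed size), not sizeof(head).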
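The comparison can be made concrete with a short sketch. It computes each field's offset in the packed `!` layout from `calcsize` of the format prefix. It then compares those offsets with a `ctypes.Structure` that mirrors the C `struct _head`, because ctypes reports the offsets and `sizeof` that the local C compiler would use. `FIELDS` and `CHead` are illustrative names, and the exact `sizeof` depends on the platform ABI.

```python
import ctypes
import struct

# (field name, struct code, ctypes type) in the same order as the C struct
FIELDS = [
    ("version",   "B", ctypes.c_uint8),
    ("command",   "B", ctypes.c_uint8),
    ("sequence",  "H", ctypes.c_uint16),
    ("reserved",  "I", ctypes.c_uint32),
    ("timestamp", "Q", ctypes.c_uint64),
    ("count",     "I", ctypes.c_uint32),
    ("index",     "I", ctypes.c_uint32),
    ("length",    "H", ctypes.c_uint16),
]


class CHead(ctypes.Structure):
    # Native alignment, like the C compiler's layout of struct _head
    _fields_ = [(name, ctype) for name, _, ctype in FIELDS]


packed_fmt = "!" + "".join(code for _, code, _ in FIELDS)
print("packed size (calcsize): %d" % struct.calcsize(packed_fmt))
print("C size (sizeof):        %d" % ctypes.sizeof(CHead))

for i, (name, _, _) in enumerate(FIELDS):
    # Offset of field i in the packed layout = size of everything before it
    packed_off = struct.calcsize("!" + "".join(c for _, c, _ in FIELDS[:i]))
    c_off = getattr(CHead, name).offset
    print("%-9s packed=%2d C=%2d" % (name, packed_off, c_off))
```

On x86-64 Linux this typically reports 26 for the packed size and 32 for `sizeof`, with identical offsets for all eight fields. Suppose a field were out of natural alignment instead, for example a `uint32_t` placed directly after a single `uint8_t`. The offsets would then diverge. The usual remedies are to reorder fields, add explicit reserved fields, or use a compiler-specific packing attribute.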

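Another issue with binary protocols is byte order. Different platforms may use different byte orders: for example, Linux on x86 is little-endian, while Solaris on SPARC is big-endian. Using network byte order (big-endian) generally avoids problems. That is why the Python format string starts with !, meaning network byte order, and why the C side uses ntohs/ntohl to convert 16-bit and 32-bit values from network byte order to host byte order. Without these conversions, Linux and Solaris would produce different results. Note that ntohl only handles 32 bits, so the 64-bit timestamp field needs its own conversion. glibc, for instance, provides be64toh in <endian.h>, but it is not portable.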
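Two of these points are easy to check from Python. First, `!` produces big-endian bytes, while `<` produces the little-endian layout that an x86 host uses natively. Second, calling `ntohl` on a `uint64_t` first converts the value to 32 bits. The sketch below assumes a little-endian host and emulates what `ntohl(hd->timestamp)` would compute there. `emulate_ntohl_on_le_host` is a hypothetical helper written only for this illustration.

```python
import struct

# 10000 == 0x2710
print(struct.pack("!H", 10000).hex())  # 2710: network order, high byte first
print(struct.pack("<H", 10000).hex())  # 1027: little-endian, low byte first


def emulate_ntohl_on_le_host(host_value):
    """What ntohl(x) returns on a little-endian host.

    The argument is first converted to uint32_t (keeping the low 32 bits),
    then its four bytes are swapped.
    """
    low32 = host_value & 0xFFFFFFFF
    return struct.unpack("<I", struct.pack(">I", low32))[0]


timestamp = 1111111111               # 0x423A35C7, fits in 32 bits
wire = struct.pack("!Q", timestamp)  # 8 bytes, big-endian
print(wire.hex())                    # 00000000423a35c7

# A little-endian host loads those 8 bytes into hd->timestamp as this value:
host_value = struct.unpack("<Q", wire)[0]
print(emulate_ntohl_on_le_host(host_value))  # 0, not 1111111111

# Correct decoding treats all 8 bytes as one big-endian integer
print(struct.unpack("!Q", wire)[0])          # 1111111111
```

On a big-endian host such as SPARC, `ntohl` is a no-op and the truncation keeps 0x423A35C7, so code that applies `ntohl` to the 64-bit field prints the right timestamp there (only because this value fits in 32 bits) but 0 on x86. That is exactly the kind of platform-dependent result that network byte order is meant to prevent.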
