Converting Between Full-Width and Half-Width Characters (Python)

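I've recently been learning natural language processing, and the first problem I tackled was Chinese word segmentation of Weibo posts. One snag I ran into was converting between full-width and half-width characters.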

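The idea is actually simple. It is the same trick as the function most of us wrote when learning C++ that converts English letters between upper and lower case by shifting their ASCII codes.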
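For comparison, here is that case-conversion trick as a minimal Python 3 sketch; the function name `to_upper_ascii` is made up for illustration. Lowercase and uppercase ASCII letters sit exactly 32 (0x20) code points apart, so a fixed subtraction turns one into the other, which is the same idea the full-width conversion relies on.

```python
def to_upper_ascii(s):
    """Uppercase ASCII letters by subtracting the fixed offset 0x20 (hypothetical helper)."""
    out = []
    for ch in s:
        code = ord(ch)
        if ord('a') <= code <= ord('z'):  # only shift lowercase ASCII letters
            code -= 0x20                   # e.g. 'a' (0x61) - 0x20 == 'A' (0x41)
        out.append(chr(code))              # everything else passes through unchanged
    return ''.join(out)

print(to_upper_ascii('hello, World'))  # HELLO, WORLD
```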

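Full-width characters occupy Unicode code points 65281 ~ 65374, i.e. hexadecimal 0xFF01 ~ 0xFF5E.
Their half-width counterparts occupy code points 33 ~ 126, i.e. hexadecimal 0x21 ~ 0x7E (the printable ASCII characters).
The space is a special case: the full-width space is 12288 (0x3000) and the half-width space is 32 (0x20).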
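One way to double-check these ranges, assuming Python 3, is the standard `unicodedata` module, which reports each character's official Unicode name and its East Asian Width property (`F` means fullwidth, `Na` means narrow). This sketch prints the endpoints of both ranges plus the two spaces:

```python
import unicodedata

# Endpoints of the half-width and full-width ranges, then the two spaces
for ch in ['\u0021', '\u007e', '\uff01', '\uff5e', '\u0020', '\u3000']:
    print(hex(ord(ch)), unicodedata.name(ch), unicodedata.east_asian_width(ch))

# Every code point in 0xFF01..0xFF5E is a "FULLWIDTH ..." character
print(all(unicodedata.name(chr(c)).startswith('FULLWIDTH')
          for c in range(0xFF01, 0xFF5E + 1)))

# Expected output:
# 0x21 EXCLAMATION MARK Na
# 0x7e TILDE Na
# 0xff01 FULLWIDTH EXCLAMATION MARK F
# 0xff5e FULLWIDTH TILDE F
# 0x20 SPACE Na
# 0x3000 IDEOGRAPHIC SPACE F
# True
```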

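It follows from these ranges that, apart from the space, adding 65248 (0xFEE0) to a half-width character's code point gives the code point of its full-width counterpart, and subtracting 0xFEE0 goes the other way.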
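The offset can be checked with a little arithmetic. A Python 3 sketch (`OFFSET` is just an illustrative name):

```python
# Start of the full-width range minus start of the half-width range
OFFSET = 0xFF01 - 0x21
print(OFFSET, hex(OFFSET))  # 65248 0xfee0

# Both ranges hold 94 characters, so the same offset also lines up the last pair
assert 0xFF5E - 0x7E == OFFSET

print(chr(ord('A') + OFFSET))        # Ａ  (U+FF21, full-width A)
print(chr(ord('\uff21') - OFFSET))   # A
```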

Talk is cheap, show me the code:

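def full2half_width(ustr):
    """Convert full-width characters in ustr to half-width; ustr must be text (str in Python 3)."""
    half = []
    for u in ustr:
        num = ord(u)
        if num == 0x3000:                # full-width (ideographic) space -> ASCII space
            num = 0x20
        elif 0xFF01 <= num <= 0xFF5E:    # full-width ASCII variants -> ASCII
            num -= 0xFEE0
        half.append(chr(num))            # Python 3 chr() replaces Python 2 unichr()
    return ''.join(half)                 # other characters pass through unchanged

def half2full_width(ustr):
    """Convert half-width (ASCII) characters in ustr to full-width; ustr must be text."""
    full = []
    for u in ustr:
        num = ord(u)
        if num == 0x20:                  # ASCII space -> full-width (ideographic) space
            num = 0x3000
        elif 0x21 <= num <= 0x7E:        # printable ASCII -> full-width variants
            num += 0xFEE0
        full.append(chr(num))
    return ''.join(full)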

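Notes

ord() takes a string of length 1 and returns its Unicode code point.
unichr() does the reverse, turning a code point back into a one-character Unicode string. Python 3 has no unichr(); use chr() instead.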
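A quick check of both directions in Python 3, where chr() takes the place of unichr():

```python
print(ord('A'), ord('\uff21'))   # 65 65313  (U+FF21 is FULLWIDTH LATIN CAPITAL LETTER A)
print(chr(65313) == '\uff21')    # True
print(chr(ord('A') + 0xFEE0))    # Ａ

# ord() only accepts a single character
try:
    ord('AB')
except TypeError as exc:
    print('TypeError:', exc)
```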