Using the Python pipe module


pipe is not part of the Python standard library. If you have easy_install, you can install it directly; otherwise you need to download it yourself: http://pypi.python.org/pypi/pipe

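This library is worth introducing because it shows a novel way of using iterators and generators: streams. pipe treats iterable data as a stream. As in a Linux shell, it uses '|' to pass the stream along, and it defines a set of stream-processing functions that accept a stream and then either emit another stream or reduce it to a single result. Let's look at some examples.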

The first one is very simple: summing with add:

[python]
>>> from pipe import *
>>> range(5) | add
10

Summing only the even numbers requires where, which works like the built-in filter and keeps the elements that satisfy a condition:

[python]
>>> range(5) | where(lambda x: x % 2 == 0) | add
6

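Remember the Fibonacci generator we defined? Summing all the even numbers below 10000 in the sequence requires take_while, which works much like itertools.takewhile: it takes elements until the condition no longer holds: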

def fibonacci():
    a = b = 1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b

[python]
>>> fib = fibonacci
>>> fib() | where(lambda x: x % 2 == 0) \
...       | take_while(lambda x: x < 10000) \
...       | add
3382
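For comparison, the same computation can be sketched with the standard library alone. The block below repeats the fibonacci generator so that it runs on its own, and uses the built-in filter and sum together with itertools.takewhile. The helper name even_fib_sum is introduced here purely for illustration and is not part of pipe.

```python
from itertools import takewhile


def fibonacci():
    """Yield the Fibonacci sequence 1, 1, 2, 3, 5, ... indefinitely."""
    a = b = 1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b


def even_fib_sum(limit):
    """Sum the even Fibonacci numbers below `limit` (hypothetical helper)."""
    evens = filter(lambda x: x % 2 == 0, fibonacci())  # plays the role of `where`
    below = takewhile(lambda x: x < limit, evens)      # plays the role of `take_while`
    return sum(below)                                  # plays the role of `add`


print(even_fib_sum(10000))  # 3382
```

The nested calls have to be read from the inside out, whereas the pipe version reads left to right in the order the data flows.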


To apply a function to every element, use select, which works like the built-in map; to get a list, use as_list:

[python]
>>> fib() | select(lambda x: x ** 2) | take_while(lambda x: x < 100) | as_list
[1, 1, 4, 9, 25, 64]


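pipe includes many more stream-processing functions. You can even define your own: write a generator function and apply the Pipe decorator to it. The following defines a stream-processing function that takes elements until the index no longer satisfies the condition: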

[python]
>>> @Pipe
... def take_while_idx(iterable, predicate):
...     for idx, x in enumerate(iterable):
...         if predicate(idx): yield x
...         else: return
...

Use this stream-processing function to get the first 10 numbers of fib:

[python]
>>> fib() | take_while_idx(lambda x: x < 10) | as_list
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]


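I won't cover the remaining functions here; you can read pipe's source file. It is under 600 lines, about 300 of which are documentation containing plenty of examples.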

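pipe's implementation is very simple: the Pipe decorator wraps an ordinary generator function (or a function that returns an iterator) in an instance of a plain class that implements the __ror__ method. Simple as it is, the idea is really interesting.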
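To make the __ror__ mechanism concrete, here is a minimal sketch of such a wrapper class. It is an illustration written for this article under the assumption that extra arguments are bound by calling the wrapped object; it is not the library's actual source, and the as_list and take_while_idx defined here are local stand-ins rather than pipe's own functions.

```python
class Pipe:
    """Minimal stand-in for the decorator: `iterable | pipe_obj`
    calls the wrapped function with `iterable` as its first argument."""

    def __init__(self, function):
        self.function = function

    def __ror__(self, other):
        # Python falls back to the right operand's __ror__ when the
        # left operand cannot handle `|` with a Pipe instance.
        return self.function(other)

    def __call__(self, *args, **kwargs):
        # Calling with extra arguments returns a new Pipe that remembers them.
        return Pipe(lambda iterable: self.function(iterable, *args, **kwargs))


@Pipe
def as_list(iterable):
    return list(iterable)


@Pipe
def take_while_idx(iterable, predicate):
    # Yield elements while the index satisfies the predicate.
    for idx, x in enumerate(iterable):
        if not predicate(idx):
            return
        yield x


print(range(20) | take_while_idx(lambda i: i < 3) | as_list)  # [0, 1, 2]
```

Because range objects and generators do not define `|` themselves, each `|` in the last line ends up in Pipe.__ror__, which is what lets ordinary iterables sit on the left-hand side.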

               

               

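An interview question:

Read a file, count how many times each word occurs in it, and sort the words by count in descending order.

On its own this is a fairly ordinary problem, but it becomes interesting when combined with the Pipe just introduced. This kind of data-flow processing is a good fit for Pipe. After spending a little time on it, I wrote the following code: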


               

#coding=utf-8
from re import split
from pipe import *

with open(r'C:UsersAdministratorDesktop.py') as f:
    print(f.read()
        | Pipe(lambda x: split(r'\W+', x))              # split the text on non-word characters
        | Pipe(lambda x: (i for i in x if i.strip()))   # drop empty strings
        | groupby(lambda x: x)                          # group identical words
        | select(lambda x: (x[0], (x[1] | count)))      # (word, number of occurrences)
        | sort(key=lambda x: x[1], reverse=True)        # most frequent first
        )

Output:

               

              [('request', 91), ('POST', 81), ('and', 38), ('u', 36), ('if', 33), ('in', 32), ('team', 29), ('line', 23), ('objects', 20), ('gcmgroups', 16), ('get', 14), ('import', 14), ('save', 13), ('str', 12), ('0', 11), ('1', 11), ('i', 11), ('False', 10), ('GcwGroups', 9), ('from', 9), ('group_name', 9), ('path', 9), ('team_groups', 9), ('add', 8), ('else', 8), ('extra_context', 8), ('form2', 8), ('return', 8), ('Area', 7), ('baoming', 7), ('cname', 7), ('cname1', 7), ('cname2', 7), ('form1', 7), ('mysql_cur', 7), ('8', 6), ('gender', 6), ('is_del', 6), ('time', 6), ('user', 6), ('20', 5), ('7', 5), ('def', 5), ('depth', 5), ('for', 5), ('gcwteam', 5), ('radio1', 5), ('13', 4), ('16', 4), ('2', 4), ('2013', 4), ('5', 4), ('GB2312', 4), ('GcwMember', 4), ('GcwMemberForm', 4), ('GcwTeam', 4), ('GcwTeamForm', 4), ('HttpResponseRedirect', 4), ('age', 4), ('append', 4), ('area1', 4), ('cad_id', 4), ('csv', 4), ('django', 4), ('email', 4), ('encode', 4), ('fax', 4), ('gr_name', 4), ('lines', 4), ('name', 4), ('ob', 4), ('phone', 4), ('qq', 4), ('response', 4), ('status', 4), ('team_user', 4), ('template_name', 4), ('116', 3), ('12', 3), ('4', 3), ('RequestContext', 3), ('True', 3), ('a', 3), ('areas', 3), ('cname3', 3), ('community', 3), ('create', 3), ('csa', 3), ('diyi', 3), ('filter', 3), ('gcmmember', 3), ('gcw', 3), ('hd_cont', 3), ('id', 3), ('list', 3), ('mysql_db', 3), ('pp', 3), ('radio2', 3), ('radio3', 3), ('radio4', 3), ('radio9', 3), ('render_to_response', 3), ('result', 3), ('shiyun', 3), ('sys', 3), ('t_id', 3), ('textfield10', 3), ('textfield11', 3), ('textfield12', 3), ('textfield13', 3), ('textfield14', 3), ('textfield15', 3), ('textfield16', 3), ('textfield5', 3), ('textfield6', 3), ('textfield7', 3), ('textfield8', 3), ('textfield9', 3), ('title', 3), ('topic', 3), ('writers', 3), ('3', 2), ('50', 2), ('FROM', 2), ('Http404', 2), ('HttpResponse', 2), ('MySQLdb', 2), ('SELECT', 2), ('WHERE', 2), ('all', 2), ('area2', 2), ('area3', 2), 
('baoming_user', 2), ('close', 2), ('commit', 2), ('context_instance', 2), ('cut_pages', 2), ('diqu', 2), ('except', 2), ('execute', 2), ('ftp', 2), ('ftp_status', 2), ('gcw_baoming_list', 2), ('gcw_team', 2), ('get_full_area', 2), ('group_community', 2), ('group_farmer', 2), ('group_org', 2), ('group_other', 2), ('group_pupils', 2), ('group_students', 2), ('group_tertiary', 2), ('group_troops', 2), ('is_valid', 2), ('len', 2), ('login_required', 2), ('models', 2), ('not', 2), ('page', 2), ('pk', 2), ('recommend_type', 2), ('resu', 2), ('root', 2), ('select_sql', 2), ('select_sql_mem', 2), ('set_gcw_ftpd', 2), ('st2', 2), ('todo', 2), ('try', 2), ('url', 2), ('username', 2), ('utf', 2), ('10', 1), ('11', 1), ('168', 1), ('17', 1), ('18', 1), ('192', 1), ('210', 1), ('9', 1), ('Content', 1), ('Disposition', 1), ('E', 1), ('QQ', 1), ('arraysize', 1), ('attachment', 1), ('auth', 1), ('baoshaowei', 1), ('break', 1), ('charset', 1), ('cleaned_data', 1), ('coding', 1), ('connect', 1), ('contrib', 1), ('cursor', 1), ('d', 1), ('datetime', 1), ('db', 1), ('decorators', 1), ('en', 1), ('excel', 1), ('extend', 1), ('fetchall', 1), ('fetchmany', 1), ('filename', 1), ('forms', 1), ('ftpd', 1), ('gcw130', 1), ('gcw_baoming', 1), ('gcw_baoming_csv', 1), ('gcw_shipin_status', 1), ('gcwteam_set', 1), ('get_object_or_404', 1), ('hbl', 1), ('hbl_cassi', 1), ('host', 1), ('html', 1), ('http', 1), ('insert', 1), ('int', 1), ('is_captain', 1), ('m_author', 1), ('m_name', 1), ('mail', 1), ('method', 1), ('mimetype', 1), ('order_by', 1), ('pages', 1), ('passwd', 1), ('print', 1), ('raise', 1), ('range', 1), ('re', 1), ('recommend_name', 1), ('reload', 1), ('setdefaultencoding', 1), ('shortcuts', 1), ('team_age', 1), ('team_area', 1), ('team_area_id', 1), ('team_man_num', 1), ('team_name', 1), ('team_num', 1), ('team_woman_num', 1), ('template', 1), ('text', 1), ('textfield21', 1), ('textfield22', 1), ('textfield23', 1), ('textfield24', 1), ('textfield25', 1), ('textfield26', 1), 
('textfield61', 1), ('textfield71', 1), ('textfield81', 1), ('topic_gcwmember', 1), ('topic_gcwteam', 1), ('userdb', 1), ('users', 1), ('utf8', 1), ('util', 1), ('views', 1), ('while', 1), ('wohnort3', 1), ('works_long', 1), ('works_name', 1), ('works_type', 1), ('writer', 1), ('writerow', 1), ('writerows', 1)]
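For comparison, the same task can be sketched with the standard library only, using re.split and collections.Counter. The helper name word_counts is hypothetical, and it takes the text as a string rather than a file path so that the example does not depend on any particular file.

```python
import re
from collections import Counter


def word_counts(text):
    """Return (word, count) pairs sorted by descending count."""
    # Split on runs of non-word characters and drop empty strings.
    words = [w for w in re.split(r'\W+', text) if w]
    return Counter(words).most_common()


print(word_counts("to be or not to be"))
# [('to', 2), ('be', 2), ('or', 1), ('not', 1)]
```

Counter.most_common keeps words with equal counts in first-seen order, so ties may be ordered differently from the output shown above.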
