Split a generator into chunks without pre-walking it

(This question is related to this one and this one, but those pre-walk the generator, which is exactly what I want to avoid.)


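I want to split a generator into chunks. The requirements are:


Do not pad the chunks: if the number of remaining elements is less than the chunk size, the last chunk must be smaller.

Do not walk the generator beforehand: computing the elements is expensive, and it must only be done by the consuming function, not by the chunker.

This means, of course: do not accumulate in memory (no lists).

I tried the following code: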


def head(iterable, max=10):
    for cnt, el in enumerate(iterable):
        yield el
        if cnt >= max:
            break

def chunks(iterable, size=10):
    i = iter(iterable)
    while True:
        yield head(i, size)

# Sample generator: the real data is much more complex, and expensive to compute
els = xrange(7)

for n, chunk in enumerate(chunks(els, 3)):
    for el in chunk:
        print 'Chunk %3d, value %d' % (n, el)

And this somehow works:


Chunk   0, value 0
Chunk   0, value 1
Chunk   0, value 2
Chunk   1, value 3
Chunk   1, value 4
Chunk   1, value 5
Chunk   2, value 6
^CTraceback (most recent call last):
  File "xxxx.py", line 15, in <module>
    for el in chunk:
  File "xxxx.py", line 2, in head
    for cnt, el in enumerate(iterable):
KeyboardInterrupt

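Buuuut ... it never stops (I have to press ^C) because of the while True. I would like to stop that loop whenever the generator has been exhausted, but I do not know how to detect that case. I tried raising an exception: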


class NoMoreData(Exception):
    pass

def head(iterable, max=10):
    for cnt, el in enumerate(iterable):
        yield el
        if cnt >= max:
            break
    if cnt == 0 : raise NoMoreData()

def chunks(iterable, size=10):
    i = iter(iterable)
    while True:
        try:
            yield head(i, size)
        except NoMoreData:
            break

# Sample generator: the real data is much more complex, and expensive to compute
els = xrange(7)

for n, chunk in enumerate(chunks(els, 2)):
    for el in chunk:
        print 'Chunk %3d, value %d' % (n, el)

But then the exception is only raised in the context of the consumer, which is not what I want (I want to keep the consumer code clean):


Chunk   0, value 0
Chunk   0, value 1
Chunk   0, value 2
Chunk   1, value 3
Chunk   1, value 4
Chunk   1, value 5
Chunk   2, value 6
Traceback (most recent call last):
  File "xxxx.py", line 22, in <module>
    for el in chunk:
  File "xxxx.py", line 9, in head
    if cnt == 0 : raise NoMoreData
__main__.NoMoreData()

How can I detect that the generator is exhausted inside the chunks function, without walking it?
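
Editor's note: the reason the try/except inside chunks never fires can be reproduced in a few lines. Calling a generator function only creates a generator object; its body, including any raise, runs when the consumer first iterates it. The sketch below is written in Python 3 syntax, and the names failing_head, chunks_demo and events are made up for this illustration; they are not part of the question's code.

```python
class NoMoreData(Exception):
    pass

def failing_head():
    # The raise sits inside a generator body, so it only runs on the first next().
    raise NoMoreData()
    yield  # unreachable, but its presence makes this a generator function

events = []

def chunks_demo():
    try:
        # This only builds a generator object; nothing is raised here.
        yield failing_head()
        events.append("chunks resumed without seeing an exception")
    except NoMoreData:
        events.append("caught in chunks")  # never reached

for chunk in chunks_demo():
    try:
        for el in chunk:  # the body of failing_head() starts running here
            pass
    except NoMoreData:
        events.append("raised in consumer")

print(events)
# ['raised in consumer', 'chunks resumed without seeing an exception']
```

So the chunker has to learn about exhaustion some other way than waiting for an exception from a chunk it has already handed out.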


绝地无双
3 Answers

呼啦一阵风

One way would be to peek at the first element, if there is one, and then create and return the actual generator.

def head(iterable, max=10):
    first = next(iterable)      # raise exception when depleted
    def head_inner():
        yield first             # yield the extracted first element
        for cnt, el in enumerate(iterable):
            yield el
            if cnt + 1 >= max:  # cnt + 1 to include first
                break
    return head_inner()

Just use this in your chunks generator and catch the StopIteration exception the same way you did with your custom exception.

Update: here is another version, using itertools.islice to replace most of the head function, and a for loop. This simple for loop in fact does exactly the same thing as the unwieldy while-try-next-except-break construct in the original code, so the result is much more readable.

from itertools import islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:    # stops when iterator is depleted
        def chunk():          # construct generator for next chunk
            yield first       # yield element from for loop
            for more in islice(iterator, size - 1):
                yield more    # yield more elements from the iterator
        yield chunk()         # in outer generator, yield next chunk

Using itertools.chain to replace the inner generator, we can get even shorter code:

from itertools import chain, islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))
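
Editor's note: to see how lazy the chain/islice version is, here is a minimal sketch in Python 3 syntax. The generator expensive_source and the computed list are hypothetical stand-ins for the asker's real, expensive data; they only record when each element is actually produced.

```python
from itertools import chain, islice

def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:  # stops when the iterator is depleted
        yield chain([first], islice(iterator, size - 1))

computed = []  # records the order in which elements are actually produced

def expensive_source(n):
    # Hypothetical stand-in for the real, expensive generator.
    for i in range(n):
        computed.append(i)
        yield i

result = []
for n, chunk in enumerate(chunks(expensive_source(7), 3)):
    seen_before = len(computed)  # elements produced before this chunk is consumed
    values = list(chunk)         # the consumer walks the chunk here
    result.append((n, values, seen_before))

print(result)
# [(0, [0, 1, 2], 1), (1, [3, 4, 5], 4), (2, [6], 7)]
```

Only the first element of each chunk is produced by the chunker itself; the rest are computed while the consumer iterates. One caveat of this design, as far as I can tell: every chunk shares the same underlying iterator, so each chunk should be consumed completely before the next one is requested, otherwise the leftover elements end up in the following chunk.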

30秒到达战场

Since it uses pure C-level built-ins (in CPython), this is the fastest solution I can come up with. No Python bytecode is needed to produce each chunk (unless the underlying generator is implemented in Python), which has a huge performance benefit. It does walk each chunk before returning it, but it does not pre-walk anything beyond the chunk it is about to return:

# Py2 only to get generator based map
from future_builtins import map
from itertools import islice, repeat, starmap, takewhile
# operator.truth is *significantly* faster than bool for the case of
# exactly one positional argument
from operator import truth

def chunker(n, iterable):  # n is size of each chunk; last chunk may be smaller
    return takewhile(truth, map(tuple, starmap(islice, repeat((iter(iterable), n)))))

Since that is a bit dense, here is the spread-out version for illustration:

def chunker(n, iterable):
    iterable = iter(iterable)
    while True:
        x = tuple(islice(iterable, n))
        if not x:
            return
        yield x

Wrapping the call to chunker in enumerate lets you number the chunks if you need to.
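
Editor's note: a sketch of the same one-liner under Python 3, where the built-in map is already lazy and the future_builtins import is therefore not needed. The variable name numbered is made up for this example.

```python
from itertools import islice, repeat, starmap, takewhile
from operator import truth

def chunker(n, iterable):
    # repeat(...) supplies the same (iterator, n) pair forever; starmap turns each
    # pair into islice(iterator, n); map(tuple, ...) materializes one chunk at a
    # time; takewhile stops at the first empty tuple.
    return takewhile(truth, map(tuple, starmap(islice, repeat((iter(iterable), n)))))

numbered = list(enumerate(chunker(3, range(7))))
print(numbered)
# [(0, (0, 1, 2)), (1, (3, 4, 5)), (2, (6,))]
```

Note that each chunk is built as a tuple before it is handed to the consumer, so up to n elements are held in memory at once; whether that is acceptable depends on how strictly the "no lists" requirement in the question is meant.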