Is there a way to pass a dictionary in tf.data.Dataset w/ tf.py_func?

I'm using tf.data.Dataset for data processing, and I want to apply some Python code with tf.py_func.


BTW, I found that I cannot return a dictionary from tf.py_func. Is there a way to do this, or a workaround?


I have code like the following:


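import tensorflow as tf


def map_func(images, labels):
    """mapping python function"""
    # do something that cannot be expressed as a tensor graph
    # (new_key / new_value stand for an extra computed field)
    return {
        'images': images,
        'labels': labels,
        'new_key': new_value}


def tf_py_func(images, labels):
    # This is the part that doesn't work: tf.py_func cannot return a dict
    return tf.py_func(map_func, [images, labels], [tf.uint8, tf.string], name='blah')


# (inside the function that builds the input pipeline)
return dataset.map(tf_py_func)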

===========================================================================


It's been a while, and I had forgotten I asked this. I solved it another way, and the solution was so simple that I felt almost foolish. The key facts are:

  1. tf.py_func cannot return a dictionary.

  2. dataset.map can return a dictionary.

So the answer is: map twice.


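import tensorflow as tf


def map_func(images, labels):
    """mapping python function"""
    # do something that cannot be expressed as a tensor graph
    return processed_images, processed_labels


def tf_py_func(images, labels):
    # tf.py_func returns a plain list of tensors, one per entry in Tout
    return tf.py_func(map_func, [images, labels], [tf.uint8, tf.string], name='blah')


def _to_dict(images, labels):
    # dataset.map itself is allowed to return a dict
    return {'images': images, 'labels': labels}


# (inside the function that builds the input pipeline)
return dataset.map(tf_py_func).map(_to_dict)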
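For readers on TensorFlow 2.x, where tf.py_func is only available as tf.compat.v1.py_func, the same map-twice idea can be sketched with tf.numpy_function. This is a minimal sketch under stated assumptions: the toy data, the pixel inversion and the label increment are hypothetical stand-ins for the "cannot be expressed as a tensor graph" step, and the labels are int64 rather than strings to keep the NumPy side simple.

```python
import numpy as np
import tensorflow as tf

# Assumes TensorFlow 2.x; tf.numpy_function stands in for the TF1-only tf.py_func.


def map_func(images, labels):
    """Plain-Python step: receives and returns NumPy values."""
    # Hypothetical processing: invert pixel values and shift the label.
    processed_images = (255 - images).astype(np.uint8)
    processed_labels = np.int64(labels + 1)
    return processed_images, processed_labels


def tf_numpy_func(images, labels):
    # Like tf.py_func, this yields a list of tensors (one per Tout entry), never a dict.
    processed_images, processed_labels = tf.numpy_function(
        map_func, [images, labels], [tf.uint8, tf.int64])
    return processed_images, processed_labels


def _to_dict(images, labels):
    # The second map runs as graph code, where returning a dict is allowed.
    return {'images': images, 'labels': labels}


# Toy data: two 2x2 uint8 "images" with int64 labels.
images = np.array([[[0, 10], [20, 30]], [[40, 50], [60, 70]]], dtype=np.uint8)
labels = np.array([0, 1], dtype=np.int64)

dataset = tf.data.Dataset.from_tensor_slices((images, labels))
dataset = dataset.map(tf_numpy_func).map(_to_dict)

for example in dataset.as_numpy_iterator():
    print(example['images'].tolist(), example['labels'])
```

The first map only moves tensors in and out of Python; the dict is assembled afterwards by ordinary TensorFlow code, which is why the second map has no trouble returning it.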


交互式爱情

MYYA

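You could convert the dictionary into a string, return that, and then split it back into a dictionary. It might look something like this:

return (images + " " + labels + " " + new_value)

Then, in your other function:

l = map_func(image, label).split(" ")
d['images'] = l[0]
d[...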
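The string-packing idea can be kept inside a single dataset.map call if the splitting is done with tf.strings.split instead of Python's str.split. A minimal sketch, assuming TensorFlow 2.x, that every field is a string (here image paths and text labels), and that the hypothetical separator SEP never occurs inside a value; with the single space used above, any value that itself contains a space would be split incorrectly. pack_fields, to_dict and the derived new_key value are hypothetical names for illustration.

```python
import tensorflow as tf

# Assumption: every field is a string that never contains SEP.
SEP = "|"


def pack_fields(image_path, label):
    # Runs eagerly inside tf.py_function, so .numpy() is available.
    path = image_path.numpy().decode("utf-8")
    text = label.numpy().decode("utf-8")
    new_value = text.upper()  # hypothetical extra field computed in Python
    return SEP.join([path, text, new_value])


def to_dict(image_path, label):
    packed = tf.py_function(pack_fields, [image_path, label], Tout=tf.string)
    packed.set_shape([])  # py_function output has unknown shape; it is a scalar here
    fields = tf.strings.split(packed, sep=SEP)  # 1-D tensor holding the 3 fields
    return {"images": fields[0], "labels": fields[1], "new_key": fields[2]}


dataset = tf.data.Dataset.from_tensor_slices((["a.png", "b.png"], ["cat", "dog"]))
dataset = dataset.map(to_dict)

for example in dataset.as_numpy_iterator():
    print(example)
```

This only works for string data; numeric fields would have to be formatted into text and parsed back (for example with tf.strings.to_number), which is one reason the typed approaches in the other replies are usually simpler.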

小唯快跑啊

I ran into this problem too (I wanted to preprocess text data with non-TF functions but keep everything under the umbrella of TensorFlow's Dataset object). In fact, the double map() workaround isn't needed: just call the Python function through tf.py_function inside the function that processes each example. Here is the complete example code; it was also tested on Colab (the first two lines install the dependencies).

!pip install tensorflow-gpu==2.0.0b1
!pip install tensorflow-datasets==1.0.2

from typing import Dict

import tensorflow as tf
import tensorflow_datasets as tfds

# Get a textual dataset using the 'tensorflow_datasets' library
dataset_builder = tfds.text.IMDBReviews()
dataset_builder.download_and_prepare()

# Do not randomly shuffle examples for demonstration purposes
ds = dataset_builder.as_dataset(shuffle_files=False)
training_ds = ds[tfds.Split.TRAIN]

print(training_ds)
# <_OptionsDataset shapes: {text: (), label: ()}, types: {text: tf.string,
# label: tf.int64}>

# Print the first training example
for example in training_ds.take(1):
    print(example['text'])
    # tf.Tensor(b"As a lifelong fan of Dickens, I have ... realised.",
    # shape=(), dtype=string)

# some global configuration or object which we want to access in the
# processing function
we_want_upper_case = True


def process_string(t: tf.Tensor) -> str:
    # This function must be called via tf.py_function, which means it is
    # always executed eagerly and we can access the .numpy() content
    string_content = t.numpy().decode('utf-8')

    # Now we can do what we want in Python, e.g. upper-case or lower-case
    # depending on the external parameter.
    # Note that 'we_want_upper_case' is a variable defined in the outer scope
    # of the function! We cannot pass non-Tensor objects as parameters here.
    if we_want_upper_case:
        return string_content.upper()
    else:
        return string_content.lower()


def process_example(example: Dict[str, tf.Tensor]) -> Dict[str, tf.Tensor]:
    # I'm using typing (Dict, etc.) just for clarity, it's not necessary
    result = {}

    # First, simply copy all the tensor values
    for key in example:
        result[key] = tf.identity(example[key])

    # Now let's process the 'text' Tensor.
    # Call the 'process_string' function as 'tf.py_function'. Make sure the
    # output type matches the 'Tout' parameter (str and tf.string).
    # The inputs must be in a list: here we pass the string-typed Tensor 'text'.
    result['text'] = tf.py_function(func=process_string,
                                    inp=[example['text']],
                                    Tout=tf.string)

    return result


# We can call the 'map' function, which consumes and produces dictionaries
training_ds = training_ds.map(lambda x: process_example(x))

for example in training_ds.take(1):
    print(example['text'])
    # tf.Tensor(b"AS A LIFELONG FAN OF DICKENS, I HAVE ... REALISED.",
    # shape=(), dtype=string)
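The same single-map pattern also covers the question's image/label case, where the Python code produces more than one output. A minimal sketch, assuming TensorFlow 2.x and toy uint8 images with int64 labels; python_side and the new_key value (here the mean pixel value) are hypothetical. Because tf.py_function cannot report static shapes, the sketch restores them with set_shape, which later steps such as batching or Keras layers often rely on.

```python
import numpy as np
import tensorflow as tf

# Assumes TensorFlow 2.x.


def python_side(image):
    # Eager tensor in, plain NumPy/Python values out.
    img = image.numpy()
    processed = (255 - img).astype(np.uint8)  # hypothetical processing
    new_value = float(img.mean())             # hypothetical extra field
    return processed, new_value


def process_example(image, label):
    # With a list for Tout, tf.py_function returns a list of tensors.
    processed, new_value = tf.py_function(
        python_side, [image], Tout=[tf.uint8, tf.float32])
    # tf.py_function drops static shape information; put it back.
    processed.set_shape(image.shape)
    new_value.set_shape([])
    # The dict is built in the same map call, so no second map() is needed.
    return {"images": processed, "labels": label, "new_key": new_value}


images = np.array([[[0, 10], [20, 30]], [[40, 50], [60, 70]]], dtype=np.uint8)
labels = np.array([0, 1], dtype=np.int64)

dataset = tf.data.Dataset.from_tensor_slices((images, labels)).map(process_example)
print(dataset.element_spec)
```

Only the values that genuinely need Python go through tf.py_function; the label is passed straight into the dict, so it keeps its dtype and shape without any extra work.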