Cannot convert a list of "strings" with tf.data.Dataset.from_tensor_slices()

I have the following data:


partial_x_train_features = [

    [b'south pago pago victor mclaglen jon hall frances farmer olympe bradna gene lockhart douglass dumbrille francis ford ben welden abner biberman pedro cordoba rudy robles bobby stone nellie duran james flavin nina campana alfred e green treasure hunt adventure adventure'],

    [b'easy virtue jessica biel ben barnes kristin scott thomas colin firth kimberley nixon katherine parkinson kris marshall christian brassington charlotte riley jim mcmanus pip torrens jeremy hooton joanna bacon maggie hickey georgie glen stephan elliott young englishman marry glamorous american brings home meet parent arrive like blast future blow entrenched british stuffiness window comedy romance'],

    [b'fragments antonin gregori derangere anouk grinberg aurelien recoing niels arestrup yann collette laure duthilleul david assaraf pascal demolon jean baptiste iera richard sammel vincent crouzet fred epaud pascal elso nicolas giraud michael abiteboul gabriel le bomin psychiatrist probe mind traumatized soldier attempt unlock secret drove gentle deeply disturbed world war veteran edge insanity drama war'],

    [b'milka film taboos milka elokuva tabuista irma huntus leena suomu matti turunen eikka lehtonen esa niemela sirkka metsasaari tauno lehtihalmes ulla tapaninen toivo tuomainen hellin auvinen salmi rauni mollberg small finnish lapland community milka innocent year old girl live mother miss dead father prays god love haymaking employ drama'],

    [b'sleeping car david naughton judie aronson kevin mccarthy jeff conaway dani minnick ernestine mercer john carl buechler gary brockette steve lundquist billy stevenson michael scott bicknell david coburn nicole hansen tiffany million robert ruth douglas curtis jason david naughton move abandon train car resurrect vicious ghost landlady dead husband mister near fatal encounter comedy horror']]


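I know the actors are not arrays of the same size, but searching through several similar questions (namely question1, question2) did not solve my problem.

If you want to reproduce the problem, please see my colab notebook. If I have missed a duplicate question, please mention it in the comments.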


万千封印
2 answers

长风秋雁

You need to convert these strings to vectors and pad them to equal length. I'll show you an example with partial_x_train_actors_array:

    import tensorflow as tf

    partial_x_train_actors_array = [b'victor mclaglen', b'jon hall', b'frances farmer',
                                    b'olympe bradna', b'gene lockhart', b'douglass dumbrille',
                                    b'francis ford', b'ben welden', b'abner biberman',
                                    b'pedro de cordoba', b'rudy robles', b'bobby stone',
                                    b'nellie duran', b'james flavin', b'nina campana']

    # Character-level tokenizer: every character is mapped to an integer id
    tok = tf.keras.preprocessing.text.Tokenizer(char_level=True)
    tok.fit_on_texts(partial_x_train_actors_array)
    seq = tok.texts_to_sequences(partial_x_train_actors_array)

This seq looks like:

    [[20, 10, 11, 16, 7, 4, 5, 12, 11, 6, 1, 17, 6, 2, 3],
     [21, 7, 3, 5, 22, 1, 6, 6],
     [14, 4, 1, 3, 11, 2, 13, 5, 14, 1, 4, 12, 2, 4],
     [7, 6, 18, 12, 19, 2, 5, 8, 4, 1, 9, 3, 1],
     [17, 2, 3, 2, 5, 6, 7, 11, 28, 22, 1, 4, 16],
     [9, 7, 15, 17, 6, 1, 13, 13, 5, 9, 15, 12, 8, 4, 10, 6, 6, 2],
     [14, 4, 1, 3, 11, 10, 13, 5, 14, 7, 4, 9],
     [8, 2, 3, 5, 29, 2, 6, 9, 2, 3],
     [1, 8, 3, 2, 4, 5, 8, 10, 8, 2, 4, 12, 1, 3],
     [19, 2, 9, 4, 7, 5, 9, 2, 5, 11, 7, 4, 9, 7, 8, 1],
     [4, 15, 9, 18, 5, 4, 7, 8, 6, 2, 13],
     [8, 7, 8, 8, 18, 5, 13, 16, 7, 3, 2],
     [3, 2, 6, 6, 10, 2, 5, 9, 15, 4, 1, 3],
     [21, 1, 12, 2, 13, 5, 14, 6, 1, 20, 10, 3],
     [3, 10, 3, 1, 5, 11, 1, 12, 19, 1, 3, 1]]

Then pad the sequences to equal length:

    padded = tf.keras.preprocessing.sequence.pad_sequences(seq)

    array([[ 0,  0,  0, 20, 10, 11, 16,  7,  4,  5, 12, 11,  6,  1, 17,  6,  2,  3],
           [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 21,  7,  3,  5, 22,  1,  6,  6],
           [ 0,  0,  0,  0, 14,  4,  1,  3, 11,  2, 13,  5, 14,  1,  4, 12,  2,  4],
           [ 0,  0,  0,  0,  0,  7,  6, 18, 12, 19,  2,  5,  8,  4,  1,  9,  3,  1],
           [ 0,  0,  0,  0,  0, 17,  2,  3,  2,  5,  6,  7, 11, 28, 22,  1,  4, 16],
           [ 9,  7, 15, 17,  6,  1, 13, 13,  5,  9, 15, 12,  8,  4, 10,  6,  6,  2],
           [ 0,  0,  0,  0,  0,  0, 14,  4,  1,  3, 11, 10, 13,  5, 14,  7,  4,  9],
           [ 0,  0,  0,  0,  0,  0,  0,  0,  8,  2,  3,  5, 29,  2,  6,  9,  2,  3],
           [ 0,  0,  0,  0,  1,  8,  3,  2,  4,  5,  8, 10,  8,  2,  4, 12,  1,  3],
           [ 0,  0, 19,  2,  9,  4,  7,  5,  9,  2,  5, 11,  7,  4,  9,  7,  8,  1],
           [ 0,  0,  0,  0,  0,  0,  0,  4, 15,  9, 18,  5,  4,  7,  8,  6,  2, 13],
           [ 0,  0,  0,  0,  0,  0,  0,  8,  7,  8,  8, 18,  5, 13, 16,  7,  3,  2],
           [ 0,  0,  0,  0,  0,  0,  3,  2,  6,  6, 10,  2,  5,  9, 15,  4,  1,  3],
           [ 0,  0,  0,  0,  0,  0, 21,  1, 12,  2, 13,  5, 14,  6,  1, 20, 10,  3],
           [ 0,  0,  0,  0,  0,  0,  3, 10,  3,  1,  5, 11,  1, 12, 19,  1,  3,  1]])

Finally:

    ds = tf.data.Dataset.from_tensor_slices(padded)

    next(iter(ds))

    <tf.Tensor: shape=(18,), dtype=int32, numpy=
    array([ 0,  0,  0, 20, 10, 11, 16,  7,  4,  5, 12, 11,  6,  1, 17,  6,  2,
            3])>

If for any reason you need all of your inputs (not just partial_x_train_actors_array) to have the same padded shape, you can use the maxlen argument of pad_sequences.
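To make the effect of padding and of maxlen concrete, here is a minimal pure-Python sketch of the default behaviour of pad_sequences (zeros added at the front, and items dropped from the front when a row is longer than maxlen). The helper name pad_pre is hypothetical and is not part of TensorFlow; it only imitates the defaults for illustration.

```python
def pad_pre(sequences, maxlen=None, value=0):
    """Left-pad (and left-truncate) integer sequences to one common length.

    Hypothetical helper imitating the defaults of Keras pad_sequences
    (padding='pre', truncating='pre', value=0); not the library code.
    """
    if maxlen is None:
        # Without maxlen, the longest sequence sets the target length.
        maxlen = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        # Keep only the last `maxlen` items of sequences that are too long.
        s = list(s)[-maxlen:] if maxlen > 0 else []
        # Fill the front with `value` until the row has `maxlen` items.
        padded.append([value] * (maxlen - len(s)) + s)
    return padded


# Two of the character sequences from the example above.
seq = [[21, 7, 3, 5, 22, 1, 6, 6], [8, 2, 3, 5, 29, 2, 6, 9, 2, 3]]

print(pad_pre(seq))             # rows padded to the longest row
print(pad_pre(seq, maxlen=12))  # rows padded to a fixed length of 12
print(pad_pre(seq, maxlen=6))   # longer rows lose their leading items
```

Passing the same maxlen for every feature column is what gives all inputs an identical second dimension, regardless of the longest row in each individual column.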

精慕HU

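One of the data arrays (namely partial_x_train_actors_array) has elements of different lengths along its second dimension, which is why the error complains that the data is not rectangular. You should therefore either make them the same size (for example by padding or truncating), or use a RaggedTensor (doc, guide) to store and process the data:

    partial_x_train_actors_array = tf.ragged.constant(...)

The latter approach is especially useful and efficient when you want to take the data as-is and apply custom or complex processing to it with the tf.data.Dataset API (for example inside the map method).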
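As a rough illustration of why a ragged structure can hold rows of different lengths, the sketch below encodes nested lists as one flat list of values plus row boundaries, which mirrors the values and row_splits properties a RaggedTensor exposes. This is plain Python, not TensorFlow code; the helper names to_row_splits and row are hypothetical, and grouping the actor names per film is an assumption made for the example.

```python
def to_row_splits(rows):
    """Flatten variable-length rows into (values, row_splits).

    row_splits[i]:row_splits[i + 1] is the slice of `values` for row i.
    """
    values, row_splits = [], [0]
    for r in rows:
        values.extend(r)
        row_splits.append(len(values))
    return values, row_splits


def row(values, row_splits, i):
    """Recover row i from the flat encoding."""
    return values[row_splits[i]:row_splits[i + 1]]


# Assumed layout: one list of actor names per film, with unequal lengths.
actors = [[b'victor mclaglen', b'jon hall', b'frances farmer'],
          [b'jessica biel', b'ben barnes']]

values, row_splits = to_row_splits(actors)
print(row_splits)
print(row(values, row_splits, 1))
```

Because no row is forced to a common length, nothing has to be padded or truncated before the data is stored; any padding can be deferred to a later processing step.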