Understanding the number of representations in ELMo

I am trying to use ELMo simply as one component of a larger PyTorch model. The documentation gives the following basic example.

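This is a torch.nn.Module subclass that computes any number of ELMo representations and introduces trainable scalar weights for each one. For example, this code snippet computes two layers of representations (as in the SNLI and SQuAD models in our paper):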

from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json"
weight_file = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5"

# Compute two different representations for each token.
# Each representation is a linear weighted combination of the
# 3 layers in ELMo (i.e., the char CNN and the outputs of the two BiLSTM layers).
elmo = Elmo(options_file, weight_file, 2, dropout=0)

# Use batch_to_ids to convert sentences to character ids.
sentences = [['First', 'sentence', '.'], ['Another', '.']]
character_ids = batch_to_ids(sentences)

embeddings = elmo(character_ids)

# embeddings['elmo_representations'] is a list of two tensors.
# Each element contains one layer of ELMo representations with shape
# (2, 3, 1024).
#   2    - the batch size
#   3    - the sequence length of the batch
#   1024 - the length of each ELMo vector

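My question concerns the "representations". Can they be compared to the output layer of an ordinary word2vec model? You can choose how many representations ELMo returns (each one adds another entry to the output), but what is the difference between these generated representations, and what are they typically used for?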

To give you an idea: for the code above, embeddings['elmo_representations'] returns a list of two items (the two representation layers), but they are identical.

In short, how is a "representation" defined in ELMo?

达令说

1 Answer

肥皂起泡泡

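See Section 3.2 of the original paper:

"ELMo is a task specific combination of the intermediate layer representations in the biLM. For each token, a L-layer biLM computes a set of 2L+1 representations."

Earlier, Section 3.1 says:

"Recent state-of-the-art neural language models compute a context-independent token representation (via token embeddings or a CNN over characters) then pass it through L layers of forward LSTMs. At each position k, each LSTM layer outputs a context-dependent representation. The top layer LSTM output is used to predict the next token with a Softmax layer."

To answer your question: the representations are these LSTM-based context-dependent representations.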
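To make the "task specific combination" concrete, here is a minimal pure-Python sketch of the scalar mix described in Section 3.2 of the paper: each returned representation is gamma times a softmax-weighted sum of the L+1 biLM layers (the token layer plus the concatenated forward and backward output of each LSTM layer). The helper names, the toy 4-dimensional vectors, and the weight values are all hypothetical. The assumption that untrained mixing weights start out equal is mine, not something the example above confirms; it would explain why the two representations in the question come out identical.

```python
import math

def softmax(scores):
    """Normalize raw scalar weights so they are positive and sum to 1."""
    shifted = [s - max(scores) for s in scores]
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def scalar_mix(layers, scalars, gamma=1.0):
    """Combine the L+1 layer vectors of one token into one representation.

    layers  : list of L+1 vectors (lists of floats), one per biLM layer
    scalars : list of L+1 raw mixing weights (trainable in a real model)
    gamma   : overall scale factor (also trainable in a real model)
    """
    weights = softmax(scalars)
    dim = len(layers[0])
    return [
        gamma * sum(w * layer[i] for w, layer in zip(weights, layers))
        for i in range(dim)
    ]

# Toy layer outputs for a single token with L = 2, so 3 layers.
# Real ELMo vectors have 1024 dimensions; 4 are used here for readability.
layers = [
    [1.0, 0.0, 0.0, 2.0],  # layer 0: context-independent char CNN output
    [0.0, 1.0, 0.0, 2.0],  # layer 1: first BiLSTM layer
    [0.0, 0.0, 1.0, 2.0],  # layer 2: second BiLSTM layer
]

# Two representations whose mixing weights are still untrained.
# ASSUMPTION: both start from the same all-zero scalars and gamma = 1.
rep_a = scalar_mix(layers, [0.0, 0.0, 0.0])
rep_b = scalar_mix(layers, [0.0, 0.0, 0.0])
identical_before_training = rep_a == rep_b

# Hypothetical weights after training one mix on a downstream task.
rep_b_trained = scalar_mix(layers, [2.0, 0.0, -1.0], gamma=1.5)
differs_after_training = rep_a != rep_b_trained
```

Under these assumptions the two mixes only diverge once their scalar weights are trained as part of a downstream model, for example when one representation is fed to the input of the task model and the other to its output.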
