Getting full names from NER

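From reading the docs and playing with the API, it looks like CoreNLP will tell me the NER tag of each token, but it won't help me extract the full names from a sentence. For example: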

Input: John Wayne and Mary have coffee
CoreNLP Output: (John,PERSON) (Wayne,PERSON) (and,O) (Mary,PERSON) (have,O) (coffee,O)
Desired Result: list of PERSON ==> [John Wayne, Mary]

Unless I've missed a flag, I believe that to do this I will need to walk the tokens myself and glue together consecutive tokens tagged PERSON.
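To make that manual approach concrete, here is a minimal sketch in plain Java. It assumes the tokens and their NER tags are already available as two parallel lists (for instance, copied out of the CoreNLP output above). The class name PersonSpanMerger and the method mergeConsecutive are hypothetical names made up for illustration; they are not part of CoreNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: joins runs of tokens that share a target NER tag.
public class PersonSpanMerger {

    // Returns one string per maximal run of consecutive tokens tagged with targetTag.
    public static List<String> mergeConsecutive(List<String> tokens, List<String> tags, String targetTag) {
        if (tokens.size() != tags.size()) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> spans = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (targetTag.equals(tags.get(i))) {
                // Still inside a run: append the token, separated by a space.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(tokens.get(i));
            } else if (current.length() > 0) {
                // The run just ended: emit it and start over.
                spans.add(current.toString());
                current.setLength(0);
            }
        }
        // Emit a run that reaches the end of the sentence.
        if (current.length() > 0) {
            spans.add(current.toString());
        }
        return spans;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("John", "Wayne", "and", "Mary", "have", "coffee");
        List<String> tags = Arrays.asList("PERSON", "PERSON", "O", "PERSON", "O", "O");
        System.out.println(mergeConsecutive(tokens, tags, "PERSON")); // prints [John Wayne, Mary]
    }
}
```

One limitation of this sketch: two different people mentioned back to back with no untagged token between them would be merged into a single name.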

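Can someone confirm that this is indeed what I need to do? Mainly I want to know whether CoreNLP has a flag or utility that does this for me. Bonus points if someone has a utility that does this (ideally in Java, since I'm using the Java API) and is willing to share it :)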

Thanks!

梦里花落0921
2 Answers

白板的微信

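You are probably looking for entity mentions rather than NER tags. For example, using the Simple API:

new Sentence("Jimi Hendrix was the greatest").nerTags()
[PERSON, PERSON, O, O, O]

new Sentence("Jimi Hendrix was the greatest").mentions()
[Jimi Hendrix]

The link above also has an example using the traditional, non-Simple API with the older StanfordCoreNLP pipeline.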

qq_笑_17

Here is the full Java API example, which includes a section on entity mentions:

import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.ie.util.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.trees.*;
import java.util.*;

public class BasicPipelineExample {

  public static String text = "Joe Smith was born in California. " +
      "In 2017, he went to Paris, France in the summer. " +
      "His flight left at 3:00pm on July 10th, 2017. " +
      "After eating some escargot for the first time, Joe said, \"That was delicious!\" " +
      "He sent a postcard to his sister Jane Smith. " +
      "After hearing about Joe's trip, Jane decided she might go to France one day.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = new Properties();
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref,kbp,quote");
    // set a property for an annotator, in this case the coref annotator is being set to use the neural algorithm
    props.setProperty("coref.algorithm", "neural");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = new CoreDocument(text);
    // annotate the document
    pipeline.annotate(document);

    // examples

    // 10th token of the document
    CoreLabel token = document.tokens().get(10);
    System.out.println("Example: token");
    System.out.println(token);
    System.out.println();

    // text of the first sentence
    String sentenceText = document.sentences().get(0).text();
    System.out.println("Example: sentence");
    System.out.println(sentenceText);
    System.out.println();

    // second sentence
    CoreSentence sentence = document.sentences().get(1);

    // list of the part-of-speech tags for the second sentence
    List<String> posTags = sentence.posTags();
    System.out.println("Example: pos tags");
    System.out.println(posTags);
    System.out.println();

    // list of the ner tags for the second sentence
    List<String> nerTags = sentence.nerTags();
    System.out.println("Example: ner tags");
    System.out.println(nerTags);
    System.out.println();

    // constituency parse for the second sentence
    Tree constituencyParse = sentence.constituencyParse();
    System.out.println("Example: constituency parse");
    System.out.println(constituencyParse);
    System.out.println();

    // dependency parse for the second sentence
    SemanticGraph dependencyParse = sentence.dependencyParse();
    System.out.println("Example: dependency parse");
    System.out.println(dependencyParse);
    System.out.println();

    // kbp relations found in fifth sentence
    List<RelationTriple> relations =
        document.sentences().get(4).relations();
    System.out.println("Example: relation");
    System.out.println(relations.get(0));
    System.out.println();

    // entity mentions in the second sentence
    List<CoreEntityMention> entityMentions = sentence.entityMentions();
    System.out.println("Example: entity mentions");
    System.out.println(entityMentions);
    System.out.println();

    // coreference between entity mentions
    CoreEntityMention originalEntityMention = document.sentences().get(3).entityMentions().get(1);
    System.out.println("Example: original entity mention");
    System.out.println(originalEntityMention);
    System.out.println("Example: canonical entity mention");
    System.out.println(originalEntityMention.canonicalEntityMention().get());
    System.out.println();

    // get document wide coref info
    Map<Integer, CorefChain> corefChains = document.corefChains();
    System.out.println("Example: coref chains for document");
    System.out.println(corefChains);
    System.out.println();

    // get quotes in document
    List<CoreQuote> quotes = document.quotes();
    CoreQuote quote = quotes.get(0);
    System.out.println("Example: quote");
    System.out.println(quote);
    System.out.println();

    // original speaker of quote
    // note that quote.speaker() returns an Optional
    System.out.println("Example: original speaker of quote");
    System.out.println(quote.speaker().get());
    System.out.println();

    // canonical speaker of quote
    System.out.println("Example: canonical speaker of quote");
    System.out.println(quote.canonicalSpeaker().get());
    System.out.println();
  }
}