How can I split multiple joined words?

I have an array of about 1000 entries; some examples are below:


wickedweather

liquidweather

driveourtrucks

gocompact

slimprojector

I would like to be able to split these into their respective words, for example:


wicked weather

liquid weather

drive our trucks

go compact

slim projector

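I was hoping a regular expression would do the trick. But since there is no boundary to stop on, and no capitalization I could key on either, I suspect some sort of reference to a dictionary might be necessary?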


I suppose it could be done by hand, but why, when it can be done with code! =) Still, this has me stumped. Any ideas?


守着一只汪
3 Answers

aluckdog

The best tool for this is recursion, not regular expressions. The basic idea is to start at the beginning of the string and look for a word, then take the remainder of the string and look for another word, and so on until the end of the string is reached. A recursive solution is natural because backtracking is needed when a given remainder of the string cannot be broken into a set of words. The solution below uses a dictionary to determine what counts as a word and prints each solution as it finds it (some strings can be broken into more than one possible set of words; for example, wickedweather could be parsed as "wicked we at her"). If you want just one set of words, you will need to decide on rules for selecting the best set.

#!/usr/bin/perl
use strict;
use warnings;

my $WORD_FILE = '/usr/share/dict/words';  # Change as needed
my %words;                                # Hash of words in dictionary

# Open dictionary, load words into hash
open(my $dict, '<', $WORD_FILE) or die "Failed to open dictionary: $!\n";
while (<$dict>) {
  chomp;
  $words{lc($_)} = 1;
}
close($dict);

# Read one line at a time from stdin, break into words
while (<>) {
  chomp;
  find_words(lc($_));
}

sub find_words {
  # Print every way $string can be parsed into whole words.
  # Remaining arguments are the words already matched to the left.
  my $string = shift;
  my @words  = @_;
  my $length = length $string;

  foreach my $i ( 1 .. $length ) {
    my $word      = substr $string, 0, $i;
    my $remainder = substr $string, $i, $length - $i;

    # Some dictionaries contain each letter as a word
    next if ($i == 1 && ($word ne "a" && $word ne "i"));

    if (defined($words{$word})) {
      push @words, $word;
      if ($remainder eq "") {
        # The whole string has been consumed: report this parse
        print join(' ', @words), "\n";
        return;
      } else {
        find_words($remainder, @words);
      }
      pop @words;  # Backtrack and try a longer first word
    }
  }
  return;
}
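The answer above prints every possible parse and leaves the choice of a single "best" one open. As a rough sketch of one possible rule, the following prefers the parse with the fewest words. It is written in Python rather than the answer's Perl, and the fewest-words rule, the function name `best_split`, and the tiny `WORDS` set are all assumptions made for illustration, not part of the original answer.

```python
from functools import lru_cache

# Hypothetical mini-dictionary for illustration only; in practice you would
# load a word list such as /usr/share/dict/words into this set.
WORDS = {"wicked", "we", "at", "her", "weather", "liquid", "drive",
         "our", "trucks", "go", "compact", "slim", "projector", "a", "i"}


def best_split(text, words=frozenset(WORDS)):
    """Return the parse of text with the fewest words, or None if none exists."""

    @lru_cache(maxsize=None)
    def solve(start):
        # Base case: nothing left to split.
        if start == len(text):
            return ()
        best = None
        for end in range(start + 1, len(text) + 1):
            if text[start:end] not in words:
                continue
            rest = solve(end)
            if rest is None:
                continue  # dead end: backtrack and try a longer first word
            candidate = (text[start:end],) + rest
            # Assumed selection rule: keep the parse with the fewest words.
            if best is None or len(candidate) < len(best):
                best = candidate
        return best

    result = solve(0)
    return None if result is None else list(result)


if __name__ == "__main__":
    for entry in ["wickedweather", "driveourtrucks", "gocompact"]:
        print(entry, "->", best_split(entry))
```

With this word set, `best_split("wickedweather")` returns `['wicked', 'weather']` rather than the four-word parse, because the shorter parse wins. Caching results by start position means each suffix is solved only once, which helps when many parses share the same tail. Other rules, such as a minimum word length, could be swapped in at the comparison step.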