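I can get a text/plain output from .DOC/.DOCX files. I just want to use PHP to count the words in this output (no numbers or punctuation symbols) and display the result in an HTML page. So I have this: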
<button type="button" id="load" class="btn btn-md btn-info">LOAD FILES</button>
<br>
<div id="result"></div>
<script src="../vendors/jquery/dist/jquery.min.js"></script>
<script src="https://static.filestackapi.com/v3/filestack.js"></script>
<script>
function numWordsR(urlk) {
    // Send the Filestack text-output URL to the PHP script and show its reply
    $.post("result_filestack.php", {
        molk: urlk // e.g. https://process.filestackapi.com/output=format:txt/AXXXXAXeeeeW33A
    }).done(function (resp) {
        $("#result").html(resp);
    });
}
</script>
My result_filestack.php file:
<?php
$url = $_POST['molk'];
$content = file_get_contents($url); // get the text/plain output content
$onlywords = preg_replace('/[[:punct:]\d]+/', '', $content); // remove numbers and punctuation symbols
function get_num_of_words($string) {
    $string = preg_replace('/\s+/', ' ', trim($string)); // collapse whitespace runs into single spaces
    $words = explode(" ", $string);
    return count($words);
}
$numwords = get_num_of_words($onlywords);
echo "<b>TEXT:</b> " . $onlywords . "<br><br>Number of words: " . $numwords;
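I get this result. For example, in this case the result says the text has 585 words, but if I copy and paste the same text into MS Word, Word reports 612 words. So I changed the PHP code to dump the array of words: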
function get_text($string) {
    $string = preg_replace('/\s+/', ' ', trim($string));
    $words = explode(" ", $string);
    return $words;
}
$texto002 = get_text($onlywords);
echo '<pre>' . print_r($texto002, true) . '</pre>'; // print_r(..., true) returns the dump instead of printing it
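To see *why* a token is merged, it can help to print each token with `json_encode()`, which by default escapes non-ASCII characters as `\uXXXX`, so invisible separators (for example a non-breaking space, `\u00a0`) inside a token become visible. This is a debugging sketch only; the `$tokens` array is a hypothetical stand-in for `$texto002`, and `json_encode()` returns `false` if the text is not valid UTF-8.

```php
<?php
// Hypothetical tokens standing in for $texto002 from get_text().
$tokens = ["sentenceSecond", "four\u{00A0}five"];

// json_encode() escapes non-ASCII characters as \uXXXX by default,
// so hidden separators inside a "merged" token show up in the output.
foreach ($tokens as $i => $token) {
    echo $i, ': ', json_encode($token), PHP_EOL;
}
```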
I noticed that the words are split incorrectly: in some places two or three words are merged into one.
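The merged words themselves are not reproduced here, but two mechanisms in the code above can produce merged tokens. The sketch below uses a made-up sample string (an assumption, not the actual document text): deleting punctuation with `''` glues together words separated only by punctuation, and without the `/u` modifier `\s` matches only ASCII whitespace, so the UTF-8 bytes of a non-breaking space (U+00A0) are not treated as a separator.

```php
<?php
// Made-up sample (assumption): punctuation with no space after it, plus a
// non-breaking space (U+00A0) between "four" and "five".
$sample = "First sentence.Second one,third four\u{00A0}five";

// Cause 1: deleting punctuation with '' glues neighbouring words together.
$glued = preg_replace('/[[:punct:]\d]+/', '', $sample);

// Cause 2: without /u, \s matches only ASCII whitespace, so U+00A0
// (bytes C2 A0) does not split "four" and "five".
$before = count(preg_split('/\s+/', trim($glued), -1, PREG_SPLIT_NO_EMPTY));

// Replacing with a space and splitting with /u (which in PHP also makes
// \s Unicode-aware) separates all seven words.
$spaced = preg_replace('/[[:punct:]\d]+/u', ' ', $sample);
$after = count(preg_split('/\s+/u', trim($spaced), -1, PREG_SPLIT_NO_EMPTY));

echo "$before vs $after\n"; // prints "4 vs 7"
```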
How can I fix this?
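One possible approach, sketched under the assumption that the Filestack output is UTF-8: instead of deleting punctuation and then splitting on ASCII spaces, count runs of letters directly with `preg_match_all()`, so digits, punctuation, and any kind of whitespace all act as separators. The helper name `count_words_utf8()` is hypothetical, and keeping an apostrophe or hyphen between letters inside one word ("don't", "well-known") is a design choice of this sketch; the count may still differ from MS Word for other cases.

```php
<?php
// Hypothetical helper: count words as runs of Unicode letters.
// Digits, punctuation, and any whitespace (including U+00A0) separate words;
// an apostrophe (' or U+2019) or hyphen between letters keeps one word together.
function count_words_utf8(string $text): int
{
    // \p{L}+                   one or more letters in any script
    // (?:['\x{2019}-]\p{L}+)*  optional joiner followed by more letters
    $n = preg_match_all('/\p{L}+(?:[\'\x{2019}-]\p{L}+)*/u', $text);
    // preg_match_all() returns false on error, e.g. invalid UTF-8 input.
    return $n === false ? 0 : $n;
}
```

In result_filestack.php this could replace both the `preg_replace()` line and `get_num_of_words()`, e.g. `$numwords = count_words_utf8($content);`.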
BIG阳