How do I fix a word-count error in text using PHP?

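I can get text/plain output from .DOC/.DOCX files. I just want to count the words in that output with PHP (ignoring numbers and punctuation) and show the result in an HTML page. This is what I have: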


<button type="button" id="load" class="btn btn-md btn-info">LOAD FILES</button>
<br>
<div id="result"></div>

<script src="../vendors/jquery/dist/jquery.min.js"></script>
<script src="https://static.filestackapi.com/v3/filestack.js"></script>
<script>
    function numWordsR(urlk){
        $.post("result_filestack.php",{
            molk: urlk // urlk, example: https://process.filestackapi.com/output=format:txt/AXXXXAXeeeeW33A
        }).done(function(resp){
            $("#result").html(resp);
        });
    }
</script>

My result_filestack.php file:


$url = $_POST['molk'];
$content = file_get_contents($url); // get txt/plain output content
$onlywords = preg_replace('/[[:punct:]\d]+/', '', $content); // no numbers nor punctuation symbols

function get_num_of_words($string) {
   $string = preg_replace('/\s+/', ' ', trim($string));
   $words = explode(" ", $string);
   return count($words);
}

$numwords = get_num_of_words($onlywords);
echo "<b>TEXT:</b>: ".$onlywords."<br><br>Number of words: ".$numwords;

I get this result:

http://img3.mukewang.com/61a096b80001f52219200314.jpg

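In this case, for example, the result says the text has 585 words, but if I copy and paste the same text into MS Word, it reports 612 words. I changed the PHP code to dump the array of words instead: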


function get_text($string) {
 $string = preg_replace('/\s+/', ' ', trim($string));
 $words = explode(" ", $string);
 return $words;
}

$texto002 = get_text($onlywords);
echo print_r($texto002);

I noticed where the count goes wrong: in some places two or three words are merged into a single array element:

http://img3.mukewang.com/61a096c90001fee519210562.jpg

How can I fix this?



慕雪6442864
1 answer

BIG阳

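This is probably because the spaces are not regular spaces but special characters. I ran into this a while ago; before exploding on regular spaces, I replaced the entities with spaces:

function get_num_of_words($string) {
   // Turn non-breaking-space entities into regular spaces first...
   $string = str_replace("&nbsp;", " ", $string);
   $string = str_replace("&#160;", " ", $string);
   // ...then collapse runs of whitespace so explode() yields no empty items
   $string = preg_replace('/\s+/', ' ', trim($string));
   $words = explode(" ", $string);
   return count($words);
}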
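The entity replacement above may not be enough if the text contains the raw non-breaking space character (U+00A0) rather than the `&nbsp;` entity, or if punctuation and digits are stripped before the entities are handled (stripping first would delete `&#160;` completely and glue the neighbouring words together). The following is only a sketch of one way to cover both cases, assuming the input is valid UTF-8. The function name `count_words_unicode` is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper, not from the original post. Assumes valid UTF-8 input.
function count_words_unicode(string $text): int
{
    // 1. Decode entities first, so &nbsp; and &#160; become the real U+00A0
    //    character before any punctuation or digits are removed.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // 2. Remove punctuation and digits, as in the question (the /u modifier
    //    makes the pattern treat the string as UTF-8).
    $clean = preg_replace('/[[:punct:]\d]+/u', '', $text);
    if ($clean === null) {
        // preg_replace() returns null on error, e.g. malformed UTF-8.
        throw new InvalidArgumentException('Input could not be processed as UTF-8.');
    }

    // 3. Split on any run of whitespace or Unicode separator characters
    //    (\p{Z} includes U+00A0) and drop empty pieces.
    $words = preg_split('/[\s\p{Z}]+/u', $clean, -1, PREG_SPLIT_NO_EMPTY);

    return $words === false ? 0 : count($words);
}

// Entities, a raw non-breaking space, a double space, a digit and punctuation:
echo count_words_unicode("one&nbsp;two&#160;three  four\u{00A0}five, 6 six."), PHP_EOL; // prints 6
```

Using `preg_split()` with `PREG_SPLIT_NO_EMPTY` also means an empty string counts as 0 words, whereas `explode(" ", "")` returns an array with one empty element. Results may still differ slightly from MS Word, which applies its own rules for what counts as a word.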