I am using PHP 7.3 and computing the similarity between posts.
<?php
$posts = [
    'post_count' => 4,
    'posts' => [
        [
            'ID' => 1,
            'post_content' => "Wrong do point avoid by fruit learn or in death. So passage however besides invited comfort elderly be me. Walls began of child civil am heard hoped my. Satisfied pretended mr on do determine by.",
        ],
        [
            'ID' => 2,
            'post_content' => "Lorem ipsum dolor sit"
        ],
        [
            'ID' => 3,
            'post_content' => "Months on ye at by esteem desire warmth former. Sure that that way gave any fond now. His boy middleton sir nor engrossed affection excellent."
        ],
        [
            'ID' => 4,
            'post_content' => "Lorem ipsum dolor sit"
        ],
    ]
];
print_r($posts);

// Returns [ID, ID, similarity percentage] for every ordered pair of distinct posts.
function getNonSimilarTexts($posts)
{
    $similarityPercentageArr = array();
    $postList = $posts['posts'];
    // Count the posts directly instead of relying on 'post_count'.
    $total = count($postList);
    for ($i = 0; $i < $total; $i++) {
        // $posts->the_post();
        $currentPost = $postList[$i];
        if (!isset($currentPost['ID'])) {
            continue;
        }
        for ($y = 0; $y < $total; $y++) {
            // Skip self-comparison by index; checking $perc != 100 would also
            // drop pairs of distinct posts with identical content (IDs 2 and 4).
            if ($i === $y) {
                continue;
            }
            $comparePost = $postList[$y];
            if (!isset($comparePost['ID'])) {
                continue;
            }
            similar_text(strip_tags($currentPost['post_content']), strip_tags($comparePost['post_content']), $perc);
            $similarityPercentageArr[] = [$currentPost['ID'], $comparePost['ID'], $perc];
        }
    }
    return $similarityPercentageArr;
}

$p = getNonSimilarTexts($posts);
print_r($p);
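As you can see, the output is an array of the form [[ID, ID, similarity_percentage], ...].
I would like to filter this array so that any two posts with a similarity above 20% count as duplicates. From each group of similar posts I want to keep only one post and delete the others. For the sample data above, the result I want is post IDs 1, 2, 3.
Any suggestions on how to filter an array like this?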
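
To make the desired filtering concrete, here is a minimal sketch of one possible approach: walk the IDs in order and keep a post only if it is not more than 20% similar to a post already kept. The function name `keepOnePerSimilarGroup` and its parameters are hypothetical, and the percentages in the sample pair list are made up for illustration; they are not real `similar_text()` output. It assumes the pair list also contains pairs of identical posts (100%), i.e. that self-comparisons are excluded by ID rather than by a `$perc != 100` check.

```php
<?php
// Hypothetical helper: keeps the first post of every group of similar posts.
// $ids       list of post IDs in the order they should be considered
// $pairs     list of [ID, ID, similarity percentage]
// $threshold pairs above this percentage count as "too similar"
function keepOnePerSimilarGroup(array $ids, array $pairs, float $threshold = 20.0): array
{
    // Build a lookup of "too similar" pairs, stored in both directions.
    $tooSimilar = [];
    foreach ($pairs as [$a, $b, $perc]) {
        if ($perc > $threshold) {
            $tooSimilar[$a][$b] = true;
            $tooSimilar[$b][$a] = true;
        }
    }

    $kept = [];
    foreach ($ids as $id) {
        $isDuplicate = false;
        foreach ($kept as $keptId) {
            // Drop this post if it is too similar to one we already kept.
            if (isset($tooSimilar[$id][$keptId])) {
                $isDuplicate = true;
                break;
            }
        }
        if (!$isDuplicate) {
            $kept[] = $id;
        }
    }
    return $kept;
}

// Illustrative percentages only (assumed values, not measured).
$pairs = [
    [1, 2, 8.0], [1, 3, 15.0], [1, 4, 8.0],
    [2, 3, 10.0], [2, 4, 100.0], [3, 4, 10.0],
];
$kept = keepOnePerSimilarGroup([1, 2, 3, 4], $pairs);
print_r($kept);
```

With these assumed percentages, post 4 is dropped because it is too similar to post 2, which was kept earlier. The approach is greedy, so which post of a group survives depends on the order of `$ids`.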