php - Reducing a large block of text based on search input in PHP
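Question
I'm not entirely sure how to phrase this, but how can I reduce a large block of text to just the parts that are relevant to a search?
For example, say I have a paragraph and my search is efficitur eget: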
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec placerat libero id mi facilisis, at sagittis tortor porta. Donec eget sodales ipsum. Donec sagittis lacus mauris, et efficitur quam porttitor eu. Fusce eget consequat purus. Maecenas rutrum arcu viverra est rhoncus, et hendrerit tellus elementum. Aenean ornare dolor tempus ante porta, sit amet convallis lacus rutrum. Maecenas bibendum magna tortor. Vestibulum tortor nunc, dictum vitae nisl quis, pharetra mattis massa. Vestibulum vulputate leo eros, eget maximus ipsum tristique quis. Quisque rutrum vel felis eget feugiat. Etiam interdum nisi ac nibh egestas malesuada. Mauris fringilla nisi id rutrum fermentum. Ut ultrices ipsum rutrum, hendrerit urna non, dapibus ligula. Vivamus rhoncus eros eget eros feugiat volutpat. In ac arcu at purus porta varius. Sed commodo diam a ipsum vestibulum, et sagittis sem consectetur.
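Is there an easy way to reduce the text to a single excerpt that contains both efficitur and eget, without showing the whole paragraph? For example: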
... Donec sagittis lacus mauris, et efficitur quam porttitor eu. Fusce eget consequat purus. Maecenas rutrum ...
My current pseudocode idea is:
// Find strpos of search words
// Make positions unique
// Find words closest together within X characters
// Allow for words on LEFT and RIGHT of keyword
// .. Continue until every keyword has lapsed
// Add "dots" to LEFT or/and RIGHT of the result
// implode
// return
However, if this has already been done, or if PHP has a built-in function for it, I'd rather not reinvent the wheel.
Solution
I wrote my own function, which reduces a large paragraph to a short excerpt around the keywords:
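/**
 * Reduce $content to short excerpts around the given keywords.
 *
 * $keywords   a single keyword or an array of keywords
 * $exact      true: a word must equal the keyword; false: substring match
 * $max_words  content with at most this many words is returned unchanged;
 *             also limits how many words are kept on each side of a keyword
 * $dots       string placed where text was cut, e.g. "..."
 */
function reduce_max_word_contents($content, $keywords, $exact, $max_words, $dots)
{
    // Accept a single keyword as well as an array of keywords.
    if (is_array($keywords) == false) {
        $keywords = (array) $keywords;
        $keywords = array_filter($keywords);
    }

    $format_content = trim($content);
    if (empty($format_content)) {
        // trigger_error("No Content Given");
        return "";
    }
    if (empty($keywords)) {
        // trigger_error("No Keywords Given");
        return $format_content;
    }
    if (!$max_words) {
        // trigger_error("No Max Words Set");
        return $format_content;
    }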

    // Split the content into words; content that is short enough is returned as-is.
    $format_content_word_s = explode(' ', $format_content);
    if (count($format_content_word_s) <= $max_words) {
        return $format_content;
    }

    // Lower-cased copy of the words, used for case-insensitive matching.
    $format_lower_words = array_map('strtolower', $format_content_word_s);
    $format_lower_words = array_map('trim', $format_lower_words);

    // Find the index of the first word that matches each keyword.
    $keyword_indexes = array();
    foreach ($keywords as $key => $keyword) {
        $keyword_lower = strtolower(trim($keyword));
        $keyword_pos = false;
        if ($exact) {
            // Whole-word match; punctuation attached to a word counts as part of it.
            $keyword_pos = array_search($keyword_lower, $format_lower_words);
        } else {
            // Substring match. strpos() !== false is used instead of a truthy
            // strstr() result, which would miss a match ending in the string "0".
            foreach ($format_lower_words as $f_key => $f_word) {
                if (strpos($f_word, $keyword_lower) !== false) {
                    $keyword_pos = $f_key;
                    break;
                }
            }
        }
        if ($keyword_pos === false) {
            continue;
        }
        $keyword_indexes[$key] = $keyword_pos;
    }
    if (empty($keyword_indexes)) {
        return $format_content;
    }

    // For every matched keyword, collect the words to its left and right.
    $keyword_side_s = array();
    foreach (array_keys($keyword_indexes) as $k_index) {
        $k_position = intval($keyword_indexes[$k_index]);

        // Up to $max_words words before the keyword (original word indexes are preserved).
        $left_offset = $k_position > $max_words ? $k_position - $max_words : 0;
        $left_len = $k_position > $max_words ? $max_words : $k_position;
        $array_left = $k_position > 0
            ? array_slice($format_content_word_s, $left_offset, $left_len, true)
            : array();

        // Up to $max_words - 1 words after the keyword.
        $array_right = array_slice($format_content_word_s, $k_position + 1, $max_words - 1, true);

        $keyword_sides = array('left' => $array_left, 'right' => $array_right);

        // Drop a side that already contains another keyword's position.
        // Only the first other keyword is checked, because both sides are
        // decided on the first pass of the outer loop.
        $s_result = array();
        foreach (array_keys($keyword_indexes) as $x_key) {
            if ($k_index == $x_key) {
                continue;
            }
            $x_key_pos = $keyword_indexes[$x_key];
            foreach ($keyword_sides as $kw_s_key => $kw_s_values) {
                if (array_key_exists($kw_s_key, $s_result)) {
                    continue;
                }
                $kw_s_is_valid = !empty($kw_s_values) && !array_key_exists($x_key_pos, $kw_s_values);
                $s_result[$kw_s_key] = $kw_s_is_valid ? $kw_s_values : array();
            }
        }
        if (empty($s_result)) {
            $s_result = $keyword_sides;
        }

        // When the right side is empty and a next keyword exists, keep the
        // words between this keyword and the next one. This assumes that
        // $keywords is a numerically indexed array.
        $create_right_slice = empty($s_result['right']) && isset($keyword_indexes[$k_index + 1]);
        if ($create_right_slice) {
            // The length is the number of words strictly between the two keywords.
            $connect_len = max(0, $keyword_indexes[$k_index + 1] - $k_position - 1);
            $right_word_slice = array_slice($format_content_word_s, $k_position + 1, $connect_len, true);
        } else {
            $right_word_slice = array();
        }
        $s_result['connect'] = $right_word_slice;
        $keyword_side_s[$k_position] = $s_result;
    }
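    if (empty($keyword_side_s)) {
        return $format_content;
    }

    // Remember the first and last section keys to decide where the outer dots go.
    // reset() and end() need a real variable; array_pop(array_keys(...)) raises
    // "Only variables should be passed by reference".
    $keyword_side_s_positions = array_keys($keyword_side_s);
    $keyword_side_s_keys = array();
    $keyword_side_s_keys['start'] = reset($keyword_side_s_positions);
    $keyword_side_s_keys['end'] = end($keyword_side_s_positions);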

    // Build the pieces of each keyword section: left words, keyword, connecting words, right words.
    $keyword_result_s = array();
    foreach (array_keys($keyword_side_s) as $ks_key => $ks_position) {
        $ks_sides = (array) $keyword_side_s[$ks_position];

        // Dots separate a section from the previous and the next section.
        $section_left_dots = !empty($keyword_result_s) ? (string) $dots : "";
        $section_positions = array_keys($keyword_side_s);
        $section_right_dots = isset($section_positions[$ks_key + 1]) ? (string) $dots : "";

        $ks_word = (string) $format_content_word_s[$ks_position];

        $keyword_section = array();
        if (!empty($ks_sides['left'])) {
            $keyword_section[] = $section_left_dots;
            $keyword_section[] = implode(' ', $ks_sides['left']);
        }
        $keyword_section[] = $ks_word;
        if (!empty($ks_sides['connect'])) {
            $keyword_section[] = implode(' ', $ks_sides['connect']);
        }
        if (!empty($ks_sides['right'])) {
            $keyword_section[] = implode(' ', $ks_sides['right']);
            $keyword_section[] = $section_right_dots;
        }

        $keyword_section_s = array_filter(array_map('trim', $keyword_section));
        if (empty($keyword_section_s)) {
            continue;
        }
        $keyword_result_s = array_merge($keyword_result_s, $keyword_section_s);
    }

    // Join the pieces. array_unique() removes duplicate pieces, which also
    // removes the doubled dots where two sections meet.
    $keyword_result_str = array_map('trim', $keyword_result_s);
    $keyword_result_str = array_filter($keyword_result_str);
    $keyword_result_str = array_unique($keyword_result_str);
    $keyword_result_str = implode(' ', $keyword_result_str);
    if (empty($keyword_result_str)) {
        return $format_content;
    }

    // Add leading dots when the first section has words on its left, and
    // trailing dots when the last section has words on its right.
    if (!empty($keyword_side_s[$keyword_side_s_keys['start']]['left'])) {
        $keyword_result_str = $dots . $keyword_result_str;
    }
    if (!empty($keyword_side_s[$keyword_side_s_keys['end']]['right'])) {
        $keyword_result_str = $keyword_result_str . $dots;
    }
    return $keyword_result_str;
}
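
For comparison, the pseudocode from the question (find the keywords, keep some words to the LEFT and RIGHT of each, join the pieces with dots) can also be sketched more compactly. This is only a minimal sketch and not the function above: the name excerpt_around_keywords and the $radius parameter are hypothetical, it assumes case-insensitive substring matching via stripos() (ASCII only), and unlike the function above it keeps a window around every occurrence of a keyword rather than only the first one.

```php
<?php
// Hypothetical helper: keep $radius words on each side of every word that
// contains one of $keywords, and join the resulting windows with $dots.
function excerpt_around_keywords(string $text, array $keywords, int $radius = 3, string $dots = '...'): string
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $keywords = array_filter(array_map('trim', $keywords), 'strlen');
    if (!$words || !$keywords) {
        return $text;
    }

    // Collect a [start, end] word window around every matching word.
    $last_index = count($words) - 1;
    $windows = [];
    foreach ($words as $i => $word) {
        foreach ($keywords as $keyword) {
            if (stripos($word, $keyword) !== false) {
                $windows[] = [max(0, $i - $radius), min($last_index, $i + $radius)];
                break;
            }
        }
    }
    if (!$windows) {
        return $text;
    }

    // Windows are already ordered by start; merge those that overlap or touch.
    $merged = [array_shift($windows)];
    foreach ($windows as [$start, $end]) {
        $last = count($merged) - 1;
        if ($start <= $merged[$last][1] + 1) {
            $merged[$last][1] = max($merged[$last][1], $end);
        } else {
            $merged[] = [$start, $end];
        }
    }

    // Turn each window back into text and join the pieces with dots.
    $parts = [];
    foreach ($merged as [$start, $end]) {
        $parts[] = implode(' ', array_slice($words, $start, $end - $start + 1));
    }
    $result = implode(" $dots ", $parts);

    // Outer dots only where text was actually cut off.
    if ($merged[0][0] > 0) {
        $result = "$dots $result";
    }
    if ($merged[count($merged) - 1][1] < $last_index) {
        $result = "$result $dots";
    }
    return $result;
}

$text = 'The quick brown fox jumps over the lazy dog near the quiet river bank today';
echo excerpt_around_keywords($text, ['fox', 'river'], 1), "\n";
// prints "... brown fox jumps ... quiet river bank ..."
```

Merging the windows before building the text avoids repeating words when two keywords sit close together, which is the case the 'connect' part of the function above handles separately.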