How can I efficiently split a sentence into fixed-length chunks without breaking words?

Problem description

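Input: "This process was continued for several years for the deaf child does not here in a month or even in two or three years the numberless items and expressions using the simplest daily intercourse little hearing child learns from these constant rotation and imitation the conversation he hears in his home simulates is mine and suggest topics and called forth the spontaneous expression of his own thoughts."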

CHUNK_SIZE: 200 (assume each chunk may be up to 200 characters long).

Output:

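["This process was continued for several years for the deaf child does not here in a month or even in two or three years the numberless items and expressions using the simplest daily intercourse little",
"hearing child learns from these constant rotation and imitation the conversation he hears in his home simulates is mine and suggest topics and called forth the spontaneous expression of his own",
"thoughts."]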

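I know one way to do this is to count lengths and check whether I am breaking any word, and so on, but I have been told that this is very inefficient and not advisable, so I am asking for help here.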

Tags: javascript, algorithm

Solution


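One option is to use a regular expression that greedily matches up to 200 characters and then backtracks until the last matched character is followed by a space or by the end of the string: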

const str = "This process was continued for several years for the deaf child does not here in a month or even in two or three years the numberless items and expressions using the simplest daily intercourse little hearing child learns from these constant rotation and imitation the conversation he hears in his home simulates is mine and suggest topics and called forth the spontaneous expression of his own thoughts.";
const chunks = str.match(/.{1,200}(?= |$)/g);
console.log(chunks);
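The backtracking is easier to follow at a smaller scale. The following is a minimal sketch that applies the same pattern with a limit of 10 to a short made-up string; the names `sample` and `parts` and the limit are chosen only for illustration.

```javascript
// Illustrative input, not taken from the question.
const sample = "the quick brown fox jumps";

// Greedily take up to 10 characters, then back off until the next
// character is a space or the end of the string.
const parts = sample.match(/.{1,10}(?= |$)/g);

console.log(parts);
// → ['the quick', ' brown fox', ' jumps']

console.log(parts.map((part) => part.length));
// → [9, 10, 6]
```

The first attempt grabs "the quick " (10 characters), but the next character is "b", so the engine gives back one character and stops at "the quick". Each later match resumes at the separating space, which is why the second and third chunks begin with one.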

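Because each match resumes at the space where the previous one stopped, every chunk after the first begins with that space. If you also want to exclude leading and trailing whitespace, add \S at the start and end of the match, and lower the quantifier by two so that the total length stays within 200: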

const str = "This process was continued for several years for the deaf child does not here in a month or even in two or three years the numberless items and expressions using the simplest daily intercourse little hearing child learns from these constant rotation and imitation the conversation he hears in his home simulates is mine and suggest topics and called forth the spontaneous expression of his own thoughts.";
const chunks = str.match(/\S.{1,198}\S(?= |$)/g);
console.log(chunks);

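To use a variable for the chunk size, build the pattern with new RegExp, and use String.raw so that the backslashes do not need to be doubled: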

const chunkSize = 200;
const str = "This process was continued for several years for the deaf child does not here in a month or even in two or three years the numberless items and expressions using the simplest daily intercourse little hearing child learns from these constant rotation and imitation the conversation he hears in his home simulates is mine and suggest topics and called forth the spontaneous expression of his own thoughts.";
const chunks = str.match(new RegExp(String.raw`\S.{1,${chunkSize - 2}}\S(?= |$)`, 'g'));
console.log(chunks);
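To see why String.raw matters here, the short sketch below prints the pattern text produced with and without it. The names `rawSource` and `cookedSource` are illustrative only.

```javascript
const chunkSize = 200;

// String.raw keeps backslashes intact, so \S reaches the RegExp constructor.
const rawSource = String.raw`\S.{1,${chunkSize - 2}}\S(?= |$)`;
console.log(rawSource);
// → \S.{1,198}\S(?= |$)

// An ordinary template literal treats \S as an escape sequence and
// drops the backslash, which would match a literal "S" instead.
const cookedSource = `\S.{1,${chunkSize - 2}}\S(?= |$)`;
console.log(cookedSource);
// → S.{1,198}S(?= |$)
```

Without String.raw, the same result can be obtained by doubling each backslash in a regular string or template literal.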

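The pattern above needs at least three characters per match, so a chunk consisting of only one or two characters would be skipped. If you need to account for that possibility, do not require two or more characters; make everything after the first character optional: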

const chunkSize = 200;
const str = "This process was continued for several years for the deaf child does not here in a month or even in two or three years the numberless items and expressions using the simplest daily intercourse little hearing child learns from these constant rotation and imitation the conversation he hears in his home simulates is mine and suggest topics and called forth the spontaneous expression of his own thoughts.";
const chunks = str.match(new RegExp(String.raw`\S(?:.{0,${chunkSize - 2}}\S)?(?= |$)`, 'g'));
console.log(chunks);
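For reuse, the final pattern can be wrapped in a function. This is a minimal sketch rather than a definitive implementation: the name `splitIntoChunks`, the lower bound on `chunkSize`, and the choice to return an empty array when nothing matches are all assumptions added here.

```javascript
// Hypothetical helper wrapping the final pattern shown above.
function splitIntoChunks(text, chunkSize) {
  // chunkSize - 2 is used as a quantifier bound, so it must not be negative.
  if (!Number.isInteger(chunkSize) || chunkSize < 2) {
    throw new RangeError("chunkSize must be an integer >= 2");
  }
  const pattern = new RegExp(
    String.raw`\S(?:.{0,${chunkSize - 2}}\S)?(?= |$)`,
    "g"
  );
  // match() returns null when there is no match (for example, an empty string).
  return text.match(pattern) ?? [];
}

console.log(splitIntoChunks("a bc def ghij", 5));
// → ['a bc', 'def', 'ghij']

// Caveat: a word longer than chunkSize cannot fit in any chunk, so the
// regex silently skips its leading characters instead of splitting it.
console.log(splitIntoChunks("abcdefgh ij", 5));
// → ['defgh', 'ij']
```

As the second call shows, input that may contain words longer than the chunk size needs separate handling, since the regex approach drops characters without any error. The pattern also treats only the space character and the end of the string as boundaries, so text containing tabs or newlines would need an adjusted lookahead.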

