What is the correct NYSIIS encoding of BOSCH?

Question

What is the correct encoding of BOSCH in NYSIIS? I am building an indexing system that needs to be robust to small variations in how names are spelled.

Testing the method in R yields "BAS": require(phonics); nysiis('BOSCH')

The Java code at https://rosettacode.org/wiki/NYSIIS#Java yields "BA".

The Apache Commons Codec class org.apache.commons.codec.language.Nysiis yields "B".

Going by the site http://www.dropby.com/NYSIIS.html, "BAS" looks like the most correct result to me.

Below are the rules published at https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/Nysiis.html

Algorithm description:

 1. Transcode first characters of name
   1a. MAC ->   MCC
   1b. KN  ->   NN
   1c. K   ->   C
   1d. PH  ->   FF
   1e. PF  ->   FF
   1f. SCH ->   SSS
 2. Transcode last characters of name
   2a. EE, IE          ->   Y
   2b. DT,RT,RD,NT,ND  ->   D
 3. First character of key = first character of name
 4. Transcode remaining characters by following these rules, incrementing by one character each time
   4a. EV  ->   AF  else A,E,I,O,U -> A
   4b. Q   ->   G
   4c. Z   ->   S
   4d. M   ->   N
   4e. KN  ->   N   else K -> C
   4f. SCH ->   SSS
   4g. PH  ->   FF
   4h. H   ->   If previous or next is nonvowel, previous
   4i. W   ->   If previous is vowel, previous
   4j. Add current to key if current != last key character
 5. If last character is S, remove it
 6. If last characters are AY, replace with Y
 7. If last character is A, remove it
 8. Collapse all strings of repeated characters
 9. Add original first character of name as first character of key
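
To make the per-character rules concrete, here is a minimal Java sketch of steps 3 and 4 only, read literally from the list above. The class and method names are hypothetical, steps 1, 2 and 5 through 9 are deliberately not applied, and treating a missing next character as a nonvowel in rule 4h is an assumption, so the result is an intermediate key rather than a final NYSIIS code.

```java
import java.util.Locale;

// Hypothetical sketch: covers steps 3 and 4 of the listed rules, nothing else.
final class NysiisStep4Sketch {

    private static boolean isVowel(char c) {
        return "AEIOU".indexOf(c) >= 0;
    }

    // True if the working buffer contains s starting exactly at index i.
    private static boolean startsAt(StringBuilder w, int i, String s) {
        return w.indexOf(s, i) == i;
    }

    static String transcodeRemaining(String name) {
        if (name.isEmpty()) {
            return "";
        }
        StringBuilder w = new StringBuilder(name.toUpperCase(Locale.ROOT));
        StringBuilder key = new StringBuilder();
        key.append(w.charAt(0));                          // step 3

        for (int i = 1; i < w.length(); i++) {            // step 4
            char c = w.charAt(i);
            char prev = w.charAt(i - 1);
            // Assumption: past the end of the name counts as a nonvowel.
            char next = i + 1 < w.length() ? w.charAt(i + 1) : ' ';

            if (startsAt(w, i, "EV")) {
                w.replace(i, i + 2, "AF");                // 4a
            } else if (isVowel(c)) {
                w.setCharAt(i, 'A');                      // 4a (else branch)
            } else if (c == 'Q') {
                w.setCharAt(i, 'G');                      // 4b
            } else if (c == 'Z') {
                w.setCharAt(i, 'S');                      // 4c
            } else if (c == 'M') {
                w.setCharAt(i, 'N');                      // 4d
            } else if (startsAt(w, i, "KN")) {
                w.replace(i, i + 2, "N");                 // 4e
            } else if (c == 'K') {
                w.setCharAt(i, 'C');                      // 4e (else branch)
            } else if (startsAt(w, i, "SCH")) {
                w.replace(i, i + 3, "SSS");               // 4f
            } else if (startsAt(w, i, "PH")) {
                w.replace(i, i + 2, "FF");                // 4g
            } else if (c == 'H' && (!isVowel(prev) || !isVowel(next))) {
                w.setCharAt(i, prev);                     // 4h
            } else if (c == 'W' && isVowel(prev)) {
                w.setCharAt(i, prev);                     // 4i
            }

            char current = w.charAt(i);
            if (current != key.charAt(key.length() - 1)) {
                key.append(current);                      // 4j
            }
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(transcodeRemaining("BOSCH")); // prints BAS
    }
}
```

For BOSCH, the O becomes A, SCH becomes SSS, and rule 4j keeps only the first S, so the key at the end of step 4 is BAS.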

Tags: java, r, nlp, matching

Answer


Based solely on that algorithm description, the answer is "BAS".

