python - 如何将此 Python 算法转换为 C#
问题描述
我正在尝试将此 Stack Overflow 答案中的 Python 算法转换为将没有空格的字符串拆分为 C# 的单词。
不幸的是,我对 Python 一无所知,因此翻译非常困难。
我不明白的行是:
wordcost = dict((k, log((i+1)*log(len(words)))) for i,k in enumerate(words)) <= THIS LINE
和
def best_match(i):
candidates = enumerate(reversed(cost[max(0, i-maxword):i]))
return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates) <= THIS LINE
它看起来好像best_match(i)
应该返回一个Tuple<>
. C# 中的等价物是什么?
这是完整的 Python 脚本:
from math import log
# Build a cost dictionary, assuming Zipf's law and cost = -math.log(probability).
words = open("words-by-frequency.txt").read().split()
wordcost = dict((k, log((i+1)*log(len(words)))) for i,k in enumerate(words))
maxword = max(len(x) for x in words)
def infer_spaces(s):
"""Uses dynamic programming to infer the location of spaces in a string
without spaces."""
# Find the best match for the i first characters, assuming cost has
# been built for the i-1 first characters.
# Returns a pair (match_cost, match_length).
def best_match(i):
candidates = enumerate(reversed(cost[max(0, i-maxword):i]))
return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates)
# Build the cost array.
cost = [0]
for i in range(1,len(s)+1):
c,k = best_match(i)
cost.append(c)
# Backtrack to recover the minimal-cost string.
out = []
i = len(s)
while i>0:
c,k = best_match(i)
assert c == cost[i]
out.append(s[i-k:i])
i -= k
return " ".join(reversed(out))
解决方案
我发现这个算法很有趣,所以这是我的翻译:
class WordSplitter {
private readonly Dictionary<string, double> _wordCosts;
private readonly int _maxWordLength;
public WordSplitter(string freqFilePath) {
// words = open("words-by-frequency.txt").read().split()
var words = File.ReadAllLines(freqFilePath);
// wordcost = dict((k, log((i+1)*log(len(words)))) for i,k in enumerate(words))
_wordCosts = words.Select((k, i) => new { Key = k, Value = Math.Log((i + 1) * Math.Log(words.Length)) }).ToDictionary(c => c.Key, c => c.Value);
// maxword = max(len(x) for x in words)
_maxWordLength = words.Select(c => c.Length).Max();
}
public string InferSpaces(string target) {
// cost = [0]
var costs = new List<double>() { 0 };
foreach (var i in Enumerable.Range(1, target.Length)) {
var (c, k) = BestMatch(i);
costs.Add(c);
}
var output = new List<string>();
int len = target.Length;
while (len > 0) {
var (c, k) = BestMatch(len);
Debug.Assert(k > 0);
Debug.Assert(c == costs[len]);
// use Substring if your compiler version doesn't support slicing
// but pay attention that Substring second argument is length, not end index
output.Add(target[(len - k)..len]);
len -= k;
}
output.Reverse();
return String.Join(" ", output);
(double cost, int length) BestMatch(int i) {
var start = Math.Max(0, i - _maxWordLength);
// GetRange second argument is length
var x = costs.GetRange(start, i - start);
x.Reverse();
// now, this part is easier to comprehend if it's expanded a bit
// you can do it in cryptic way too like in python though if you like
(double cost, int length)? result = null;
for (int k = 0; k < x.Count; k++) {
var c = x[k];
var sub = target[(i - k - 1)..i];
var cost = c + (_wordCosts.ContainsKey(sub) ? _wordCosts[sub] : 9e99); // 9e99 is just some big number. 9e999 is outside of double range in C#, so use smaller one
// save minimal cost
if (result == null || result.Value.cost > cost)
result = (cost, k + 1);
}
// return minimal cost
return result.Value;
}
}
}
用法:
var splitter = new WordSplitter(@"C:\tmp\words.txt");
var result = splitter.InferSpaces("thumbgreenappleactiveassignmentweeklymetaphor");
推荐阅读
- python - 在python中填充缺失值的标准方法是什么?
- android - 在 windows 中获取特定 android 项目的 SHA1 的命令
- rust - 调用返回引用的函数时了解非词法生命周期
- ios - 如何在 Swift 中从 Firestore 中读取多个自定义对象
- php - PHP计算数组并返回0而不是空
- python - Spark:加载多个文件,执行相同的操作并合并到一个数据帧中
- swift - SwiftUI 视图阻止了对 MapKit 的触摸,但不阻止其他视图
- c - C 编程:读取文件文本并尝试找出最长单词的问题
- json - Swift JSONDecode 解码数组/字典失败
- list - 列表作为可选构造函数参数在 Dart 中为空