首页 > 解决方案 > 如何将此 Python 算法转换为 C#

问题描述

我正在尝试将此 Stack Overflow 答案中的 Python 算法转换为将没有空格的字符串拆分为 C# 的单词。

不幸的是,我对 Python 一无所知,因此翻译非常困难。

我不明白的行是:

wordcost = dict((k, log((i+1)*log(len(words)))) for i,k in enumerate(words)) <= THIS LINE

def best_match(i):
    candidates = enumerate(reversed(cost[max(0, i-maxword):i]))
    return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates) <= THIS LINE

它看起来好像best_match(i)应该返回一个Tuple<>. C# 中的等价物是什么?

这是完整的 Python 脚本:

from math import log

# Build a cost dictionary, assuming Zipf's law and cost = -math.log(probability).
words = open("words-by-frequency.txt").read().split()
wordcost = dict((k, log((i+1)*log(len(words)))) for i,k in enumerate(words))
maxword = max(len(x) for x in words)

def infer_spaces(s):
    """Uses dynamic programming to infer the location of spaces in a string
    without spaces."""

    # Find the best match for the i first characters, assuming cost has
    # been built for the i-1 first characters.
    # Returns a pair (match_cost, match_length).
    def best_match(i):
        candidates = enumerate(reversed(cost[max(0, i-maxword):i]))
        return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates)

    # Build the cost array.
    cost = [0]
    for i in range(1,len(s)+1):
        c,k = best_match(i)
        cost.append(c)

    # Backtrack to recover the minimal-cost string.
    out = []
    i = len(s)
    while i>0:
        c,k = best_match(i)
        assert c == cost[i]
        out.append(s[i-k:i])
        i -= k

    return " ".join(reversed(out))

标签: pythonc#algorithm

解决方案


我发现这个算法很有趣,所以这是我的翻译:

class WordSplitter {
    private readonly Dictionary<string, double> _wordCosts;
    private readonly int _maxWordLength;
    public WordSplitter(string freqFilePath) {
        // words = open("words-by-frequency.txt").read().split()
        var words = File.ReadAllLines(freqFilePath);
        // wordcost = dict((k, log((i+1)*log(len(words)))) for i,k in enumerate(words))
        _wordCosts = words.Select((k, i) => new { Key = k, Value = Math.Log((i + 1) * Math.Log(words.Length)) }).ToDictionary(c => c.Key, c => c.Value);
        // maxword = max(len(x) for x in words)
        _maxWordLength = words.Select(c => c.Length).Max();
    }

    public string InferSpaces(string target) {
        // cost = [0]
        var costs = new List<double>() { 0 };

        foreach (var i in Enumerable.Range(1, target.Length)) {
            var (c, k) = BestMatch(i);
            costs.Add(c);
        }

        var output = new List<string>();
        int len = target.Length;
        while (len > 0) {
            var (c, k) = BestMatch(len);
            Debug.Assert(k > 0);
            Debug.Assert(c == costs[len]);

            // use Substring if your compiler version doesn't support slicing
            // but pay attention that Substring second argument is length, not end index
            output.Add(target[(len - k)..len]);
            len -= k;
        }

        output.Reverse();
        return String.Join(" ", output);

        (double cost, int length) BestMatch(int i) {
            var start = Math.Max(0, i - _maxWordLength);
            // GetRange second argument is length
            var x = costs.GetRange(start, i - start);
            x.Reverse();
            // now, this part is easier to comprehend if it's expanded a bit
            // you can do it in cryptic way too like in python though if you like
            (double cost, int length)? result = null;
            for (int k = 0; k < x.Count; k++) {
                var c = x[k];
                var sub = target[(i - k - 1)..i];
                var cost = c + (_wordCosts.ContainsKey(sub) ? _wordCosts[sub] : 9e99); // 9e99 is just some big number. 9e999 is outside of double range in C#, so use smaller one
                // save minimal cost
                if (result == null || result.Value.cost > cost)
                    result = (cost, k + 1);
            }
            // return minimal cost
            return result.Value;
        }
    }
}

用法:

var splitter = new WordSplitter(@"C:\tmp\words.txt");
var result = splitter.InferSpaces("thumbgreenappleactiveassignmentweeklymetaphor");

推荐阅读