Matching a tarball to a git repository

Question

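There is a git repository and a tarball without any revision information. The tree in the tarball was taken from the repository at some point in the past and has since changed considerably. The repository has also changed a lot. The commit from which the tarball tree was copied is unknown. The task is to find the commit closest to the tarball, either to examine the changes made in the tarball tree or to port the tarball tree back into the repository.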

I have done this before with a manual binary search, minimizing the output of `diff -ruN gitrepo tartree | wc -c`. Is there a tool that automates the task?
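Strictly speaking, a "binary search" that minimizes a value is a ternary search over an ordered list of commits. It only works if the diff size falls and then rises as you walk back through history (a unimodal cost). The Perl sketch below automates that idea under exactly that assumption. `ternary_min` and its cost callback are hypothetical names for illustration, not part of git or any existing tool.

```perl
use strict;
use warnings;

# Hypothetical helper: return the index in 0 .. $n-1 with the smallest cost,
# ASSUMING the cost is strictly unimodal (strictly decreasing, then strictly
# increasing). Costs are memoized because in real use each evaluation
# (checkout + diff) is slow.
sub ternary_min {
    my ($n, $cost) = @_;
    die "need at least one candidate\n" if $n < 1;
    my %memo;
    my $c = sub { my $i = shift; $memo{$i} //= $cost->($i) };

    my ($lo, $hi) = (0, $n - 1);
    while ($hi - $lo > 2) {
        my $third = int(($hi - $lo) / 3);
        my ($m1, $m2) = ($lo + $third, $hi - $third);
        if ($c->($m1) <= $c->($m2)) {
            $hi = $m2 - 1;    # minimum lies left of $m2
        } else {
            $lo = $m1 + 1;    # minimum lies right of $m1
        }
    }

    # At most three candidates remain; compare them directly.
    my $best = $lo;
    for my $i ($lo + 1 .. $hi) {
        $best = $i if $c->($i) < $c->($best);
    }
    return $best;
}
```

In practice the callback might map index i to the i-th hash printed by `git rev-list --first-parent HEAD`, check that commit out, and return the byte count of `diff -ruN --exclude .git . DIR | wc -c`. Real histories are often noisy, so the unimodality assumption can fail and the search can stop at a local minimum; a linear scan is slower but does not depend on that assumption.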

Tags: git, merge, branch

Answer


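Thanks to fredrikÔrel for the comments. I know the original commit may or may not be found, which is why I said "closest". I wrote a linear brute-force search. It does find a good extremum, and much faster than my earlier manual approach, especially if you make a good guess about which commit to start the search from.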

(Update: shortened the script using `git log --pretty=format`, as suggested by LeGEC.)

#!/usr/bin/perl
use strict;
use warnings;

# Estimate how similar DIR is to every commit in `git log` output and
# print one line per commit: diff size in bytes, commit hash, date.
# Smaller sizes mean closer matches.  `git log` starts from the
# currently checked out commit and goes back in time.
#
# The script is quick and dirty: it checks out every commit in turn to
# take a diff.  After the script stops for whatever reason, the last
# commit seen stays checked out.  You will have to restore the original
# checkout yourself.

sub usage {
    die ("Usage:\n",
         "  cd clean-git-repo\n",
         "  git-match-dir DIR\n");
}

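sub main {
    my $dir = $ARGV[0] // usage();
    # One line per commit: full hash, then author date.
    open (my $fh, "git log --pretty='%H %ad'|") or die "Cannot run git log: $!";
    while (<$fh>) {
        # d2e9457319bff7326d5162b47dd4891c652c2089 Thu Sep 14 09:44:58 2017 +0300
        # Capture the hash and the date, skipping the weekday.
        my ($commit, $date) = /(\w+) \w\w\w (.*)/;
        $commit or die "unexpected output from git log: $_";
        my $out = `git checkout $commit 2>&1`;
        $? == 0 or die "$out\nCheckout error.  Stop";
        # Size in bytes of the recursive diff between this commit's tree
        # and DIR (-w ignores whitespace changes); smaller is closer.
        my $len = 0 + `diff -wruN --exclude .git . "$dir" | wc -c`;
        printf("%10u %s %s\n", $len, $commit, $date);
    }
}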

main();
exit 0;
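Because the first column is a right-aligned byte count, the best candidates can be found by saving the output and sorting it numerically, e.g. `git-match-dir ../tartree > /tmp/scores.txt` followed by `sort -n /tmp/scores.txt | head` (the paths are only examples). Save the file outside the repository: the script diffs the whole working tree `.`, so a file written inside it would show up in the diff itself. If you prefer to post-process in Perl, here is a minimal sketch of a hypothetical `best_matches` helper (its name and the `len`/`commit`/`date` field names are assumptions) that parses lines in the format the script prints and returns the N smallest diffs.

```perl
use strict;
use warnings;

# Hypothetical helper: parse lines printed by the script above
# ("%10u <commit> <date>") and return the $n rows with the smallest
# diff size, best first. Non-matching lines (e.g. errors) are skipped.
sub best_matches {
    my ($lines, $n) = @_;
    my @rows;
    for my $line (@$lines) {
        my ($len, $commit, $date) =
            $line =~ /^\s*(\d+)\s+([0-9a-f]+)\s+(.*?)\s*$/
            or next;
        push @rows, { len => $len, commit => $commit, date => $date };
    }
    my @sorted = sort { $a->{len} <=> $b->{len} } @rows;
    splice(@sorted, $n) if $n < @sorted;    # keep only the first $n rows
    return @sorted;
}
```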

