Automating post-“git move” history rewriting so the log is preserved

Problem description

I'm merging several repositories into one, and various files are being moved around along the way.

Based on some research (SO, SO, How to merge repositories), I ended up with the following sketch:

user=some_user
new_superproj=new_proj # new repository, will include old repositories 
hosting=bitbucket.org # github.com etc.
r1=repo1 # repo 1 to merge
r2=repo2
...
# clone to the new place. These are throw-away (!!!) directories
git clone git@${hosting}:${user}/${r1}.git
git clone git@${hosting}:${user}/${r2}.git
...
mkdir ${new_superproj} && cd ${new_superproj}

# dummy commit so we can merge
git init
dir > deleteme.txt
git add .
git commit -m "Initial dummy commit"
git rm ./deleteme.txt
git commit -m "Clean up initial file"

# repeat for all source repositories
repo=${r1}

pushd .
cd ../${repo}

# In the throw-away repository, move to the subfolder and rewrite log
git filter-branch --index-filter '
    git ls-files -s |
    sed "s,\t\"*,&'"${repo}"'/," |
    GIT_INDEX_FILE="$GIT_INDEX_FILE.new" git update-index --index-info &&
    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
' HEAD
popd

# now bring data in to the new repository
git remote add -f ${repo} ../${repo}
git merge --allow-unrelated-histories  ${repo}/master -m "Merging repo ${repo} in"
# remove remote to throw-away repo
git remote rm ${repo}
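The "repeat for all source repositories" step could be wrapped in a function and called once per repository. This is a minimal sketch, not the original script: merge_into_subdir and the SUBDIR environment variable are hypothetical names, and it assumes GNU sed (for "\t"), git 2.9 or later (for --allow-unrelated-histories), that each source is a local throw-away clone, and that the current directory is the new superproject with at least one commit. Passing the subdirectory through the environment avoids splicing it into the quoted filter.

```bash
# Hypothetical helper: rewrite a throw-away clone so all of its files live
# under <repo-name>/, then merge it into the superproject in the current dir.
# Assumes GNU sed and git >= 2.9.
merge_into_subdir() {
    local repo_dir=$1 name branch
    name=$(basename "$repo_dir")
    branch=$(git -C "$repo_dir" symbolic-ref --short HEAD)

    # Prefix every index path with "$SUBDIR/" in every commit of the clone.
    # The filter is single-quoted: $SUBDIR and $GIT_INDEX_FILE are expanded
    # later, when filter-branch evaluates it once per commit.
    (
        cd "$repo_dir" &&
        SUBDIR="$name" FILTER_BRANCH_SQUELCH_WARNING=1 \
            git filter-branch -f --index-filter '
                git ls-files -s |
                sed "s,\t\"*,&${SUBDIR}/," |
                GIT_INDEX_FILE="$GIT_INDEX_FILE.new" git update-index --index-info &&
                mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
            ' HEAD
    ) || return 1

    # Bring the rewritten history in, then drop the temporary remote.
    git remote add -f "$name" "$repo_dir" &&
    git merge --allow-unrelated-histories --no-edit \
        -m "Merging repo ${name} in" "$name/$branch" &&
    git remote rm "$name"
}
```

Usage, from inside the superproject: `for r in ../repo1 ../repo2; do merge_into_subdir "$r"; done`.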

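So far so good, except when we want to move files while preserving the log. Git handles moves/renames poorly, and the log-rewriting snippet does not adapt well to them: it rewrites everything the same way, recursively, for a whole directory.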

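The idea is that when files are moved, we know there are no other changes in the repository besides renames and moves. So how can I rewrite the following snippet so that it works per file? It is taken from the official git filter-branch documentation: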

git filter-branch --index-filter \
    'git ls-files -s | sed "s-\t\"*-&newsubdir/-" |
        GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
            git update-index --index-info &&
     mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD

I have a hard time understanding what comes after the "sed" and how it applies to git filter-branch.
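To see what the pipeline after "sed" does, the same commands can be run outside filter-branch in any small repository. "git ls-files -s" prints one line per index entry as "<mode> <object> <stage><TAB><path>"; the sed expression inserts "newsubdir/" right after the tab (and after the opening quote git adds around paths with unusual characters); "git update-index --index-info" reads lines in that same format and writes them into a fresh index file, which then replaces the real one. Inside filter-branch this runs once per commit, so every commit's tree gets the new paths. The helper names below are hypothetical, and GNU sed and mktemp are assumed.

```bash
# Hypothetical helper: print what the documented index filter would feed to
# "git update-index --index-info" for the current index, without changing it.
# $1 is the subdirectory prefix (default: newsubdir).
preview_index_rewrite() {
    local prefix=${1:-newsubdir}
    # Each line looks like "100644 <blob-id> 0<TAB>path/to/file"; sed appends
    # "<prefix>/" right after the TAB (and after an optional opening quote).
    git ls-files -s | sed "s-\t\"*-&${prefix}/-"
}

# Hypothetical helper: mirror the GIT_INDEX_FILE=... update-index step by
# building the rewritten index in a scratch file, then list its paths.
preview_rewritten_paths() {
    local prefix=${1:-newsubdir} scratch
    scratch=$(mktemp -u)   # only a name; update-index creates the file itself
    preview_index_rewrite "$prefix" |
        GIT_INDEX_FILE="$scratch" git update-index --index-info &&
        GIT_INDEX_FILE="$scratch" git ls-files
    rm -f "$scratch"
}
```

The real index is untouched by both helpers, so they are safe to try before running filter-branch.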

I want to run a script (bash, python, etc.) that does the following:

for each file in the repository that was moved/renamed
    ...
    # in the loop, a moved/renamed file is found
    old_file="..." # e.g. a/b/c/old_name.txt
    new_file="..." # e.g. a/b/f/g/new_name.txt; at this point we know old_file and new_file are the same file
    update_log_paths(old_file, new_file) # <--- this part is needed
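One way to fill in update_log_paths is to keep the ls-files / update-index pipeline from the documentation but change only the entry whose path equals old_file, instead of prefixing every path. This is a minimal sketch under stated assumptions: paths are plain (no tabs, quotes, backslashes or non-ASCII characters that git would quote), rewriting every commit reachable from HEAD is acceptable, and OLD_PATH / NEW_PATH are environment variable names chosen here for illustration. awk is used instead of sed so the path is compared literally rather than as a regular expression.

```bash
# Hypothetical implementation of update_log_paths OLD NEW: in every commit
# reachable from HEAD, rename the index entry OLD to NEW (other entries kept).
update_log_paths() {
    local filter
    # Quoted heredoc: nothing is expanded here; filter-branch expands
    # $OLD_PATH, $NEW_PATH and $GIT_INDEX_FILE when it evaluates the filter.
    filter=$(cat <<'EOF'
git ls-files -s |
    awk -F '\t' -v OFS='\t' -v old="$OLD_PATH" -v new="$NEW_PATH" \
        '$2 == old { $2 = new } { print }' |
    GIT_INDEX_FILE="$GIT_INDEX_FILE.new" git update-index --index-info &&
    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
EOF
)
    # -f lets the function run repeatedly (it overwrites refs/original/).
    OLD_PATH=$1 NEW_PATH=$2 FILTER_BRANCH_SQUELCH_WARNING=1 \
        git filter-branch -f --index-filter "$filter" HEAD
}
```

Usage: `update_log_paths a/b/c/old_name.txt a/b/f/g/new_name.txt`. Calling filter-branch once per moved file is slow on long histories; the awk program could instead take a whole list of moves and do them in a single pass.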

Any ideas?

Tags: bash, git

Solution


As it turns out, following the hint from "Move file-by-file in git", it is as simple as (pseudocode):

move_files
cd repo_root
git add . # stage everything so the changes are detected as moves rather than adds/deletes
repo_moves=collect_moves_data()
git reset HEAD && git checkout . && git clean -df . # undo all moves
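collect_moves_data() is left abstract in the pseudocode. A minimal sketch, assuming the moves have already been staged with "git add" as above: "git diff --cached -M --name-status --diff-filter=R" lists only staged renames, one per line as "R<score><TAB>old<TAB>new", so keeping the last two fields yields "old<TAB>new" pairs that a loop can split with read (which also covers deduct_old_new_name). Both function names are hypothetical, and paths that git would quote are not handled.

```bash
# Hypothetical stand-in for collect_moves_data(): print one "old<TAB>new"
# line for every rename staged in the index (relative to HEAD).
collect_moves_data() {
    git diff --cached -M --name-status --diff-filter=R | cut -f2,3
}

# Hypothetical consumer: split each "old<TAB>new" line into two variables.
print_moves() {
    local old_file new_file
    while IFS=$'\t' read -r old_file new_file; do
        printf '%s -> %s\n' "$old_file" "$new_file"
    done
}
```

Usage: `repo_moves=$(collect_moves_data)` before the reset, then `printf '%s\n' "$repo_moves" | print_moves` (or any loop with the same read) afterwards.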

The biggest misconception I found in many related SO questions is that "git log --follow" (or other "stronger" options) solves this. It does not:

git log --follow <file>

does not show the log from before the move, even though the file was committed unchanged.

for each_move in repo_moves
    old_file, new_file=deduct_old_new_name(each_move)

    new_dir=$(dirname "${new_file}")   # "." when new_file has no directory part
    # append one snippet per move, using real newlines instead of "\n" + echo -e
    filter="$filter
      if [ -e \"${old_file}\" ]; then
          mkdir -p \"${new_dir}\"
          mv \"${old_file}\" \"${new_file}\"
      fi"

# mv and [ -e ] act on checked-out files, so this must be a tree filter;
# an index filter runs without checking out the tree
git filter-branch -f --tree-filter "$filter" HEAD

If you need to go back:

git pull # with merge
git reset --hard <hash> # the hash of your origin/master (origin/HEAD), which will be HEAD~2, but I'd check it manually and copy/paste the hash
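Besides resetting to a hash copied by hand, git filter-branch itself keeps the pre-rewrite tip of every rewritten branch under refs/original/. With -f, that backup is overwritten on each run, so it only holds the state from before the most recent rewrite. A minimal sketch of using it, assuming the current branch is the one that was rewritten; the function name is hypothetical.

```bash
# Hypothetical helper: undo the most recent filter-branch run on the current
# branch by resetting to the backup ref filter-branch left behind.
undo_last_filter_branch() {
    local branch backup
    branch=$(git symbolic-ref --short HEAD)
    backup="refs/original/refs/heads/$branch"
    # Do nothing if there is no backup for this branch.
    git rev-parse --verify -q "$backup" >/dev/null || {
        echo "no backup ref $backup" >&2
        return 1
    }
    git reset --hard "$backup"
}
```

Once the rewritten history looks right, the backup can be deleted with `git update-ref -d refs/original/refs/heads/<branch>`.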
