Calculating churn rate based on a 21-day interval

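Problem description

I am using a pandas DataFrame built from the git log, together with the git show command for specific commits, to calculate churn so that I can see exactly where changes happen at the line-of-code (LOC) level. However, I cannot work out how to calculate churn based on a number of days, that is, to count churn only when an engineer rewrites or deletes their own code that is less than 3 weeks old.

This is what I do for each commit in the DataFrame:

Git log DataFrame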

        sha timestamp   date    author  message body    age insertion   deletion    filepath    churn   merges
1   1   cae635054   Sat Jun 26 14:51:23 2021 -0400  2021-06-26 18:51:23+00:00   Andrew Clark    `act`: Resolve to return value of scope function (#21759)   When migrating some internal tests I found it annoying that I couldn't  -24 days +12:21:32.839997                   
2   21  cae635054   Sat Jun 26 14:51:23 2021 -0400  2021-06-26 18:51:23+00:00   Andrew Clark    `act`: Resolve to return value of scope function (#21759)   When migrating some internal tests I found it annoying that I couldn't  -24 days +12:21:32.839997   31.0    0.0 packages/react-reconciler/src/__tests__/ReactIsomorphicAct-test.js  31.0    
3   22  cae635054   Sat Jun 26 14:51:23 2021 -0400  2021-06-26 18:51:23+00:00   Andrew Clark    `act`: Resolve to return value of scope function (#21759)   When migrating some internal tests I found it annoying that I couldn't  -24 days +12:21:32.839997   1.0 1.0 packages/react-test-renderer/src/ReactTestRenderer.js   0.0 
4   23  cae635054   Sat Jun 26 14:51:23 2021 -0400  2021-06-26 18:51:23+00:00   Andrew Clark    `act`: Resolve to return value of scope function (#21759)   When migrating some internal tests I found it annoying that I couldn't  -24 days +12:21:32.839997   24.0    14.0    packages/react/src/ReactAct.js  10.0    
5   25  e2453e200   Fri Jun 25 15:39:46 2021 -0400  2021-06-25 19:39:46+00:00   Andrew Clark    act: Add test for bypassing queueMicrotask (#21743) Test for fix added in #21740    -25 days +13:09:55.839997   50.0    0.0 packages/react-reconciler/src/__tests__/ReactIsomorphicAct-test.js  50.0    
6   27  73ffce1b6   Thu Jun 24 22:42:44 2021 -0400  2021-06-25 02:42:44+00:00   Brian Vaughn    DevTools: Update tests to fix warnings/errors (#21748)  Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18)  -26 days +20:12:53.839997   4.0 5.0 packages/react-devtools-shared/src/__tests__/FastRefreshDevToolsIntegration-test.js -1.0    
7   28  73ffce1b6   Thu Jun 24 22:42:44 2021 -0400  2021-06-25 02:42:44+00:00   Brian Vaughn    DevTools: Update tests to fix warnings/errors (#21748)  Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18)  -26 days +20:12:53.839997   4.0 4.0 packages/react-devtools-shared/src/__tests__/componentStacks-test.js    0.0 
8   29  73ffce1b6   Thu Jun 24 22:42:44 2021 -0400  2021-06-25 02:42:44+00:00   Brian Vaughn    DevTools: Update tests to fix warnings/errors (#21748)  Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18)  -26 days +20:12:53.839997   12.0    12.0    packages/react-devtools-shared/src/__tests__/console-test.js    0.0 
9   30  73ffce1b6   Thu Jun 24 22:42:44 2021 -0400  2021-06-25 02:42:44+00:00   Brian Vaughn    DevTools: Update tests to fix warnings/errors (#21748)  Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18)  -26 days +20:12:53.839997   7.0 6.0 packages/react-devtools-shared/src/__tests__/editing-test.js    1.0 
10  31  73ffce1b6   Thu Jun 24 22:42:44 2021 -0400  2021-06-25 02:42:44+00:00   Brian Vaughn    DevTools: Update tests to fix warnings/errors (#21748)  Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18)  -26 days +20:12:53.839997   47.0    42.0    packages/react-devtools-shared/src/__tests__/inspectedElement-test.js   5.0 
11  32  73ffce1b6   Thu Jun 24 22:42:44 2021 -0400  2021-06-25 02:42:44+00:00   Brian Vaughn    DevTools: Update tests to fix warnings/errors (#21748)  Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18)  -26 days +20:12:53.839997   7.0 6.0 packages/react-devtools-shared/src/__tests__/ownersListContext-test.js  1.0 
12  33  73ffce1b6   Thu Jun 24 22:42:44 2021 -0400  2021-06-25 02:42:44+00:00   Brian Vaughn    DevTools: Update tests to fix warnings/errors (#21748)  Some new ones had slipped in (e.g. deprecated ReactDOM.render message from 18)  -26 days +20:12:53.839997   22.0    21.0    packages/react-devtools-shared/src/__tests__/profilerContext-test.js    1.0 

Churn calculation

commits = df["sha"].unique().tolist()
for commit in commits:
    contribution, churn = await self.calculate_churn(commit)

async def calculate_churn(self, stream):
        PREVIOUS_BASE_DIR = os.path.abspath("")
        try:
            GIT_DIR = os.path.join(PREVIOUS_BASE_DIR, "app/git/react.git")
            os.chdir(GIT_DIR)
        except FileNotFoundError as e:
            raise ValueError(e)
        cmd = f"git show --format= --unified=0 --no-prefix {stream}"
        cmds = [f"{cmd}"]
        results = get_proc_out(cmds)
        [files, contribution, churn] = get_loc(results)
        # need to circle back to previous path
        os.chdir(PREVIOUS_BASE_DIR)
        return contribution, churn


def is_new_file(result, file):
    # search for destination file (+++ ) and update file variable
    if result.startswith("+++"):
        return result[result.rfind(" ") + 1 :]
    else:
        return file


def is_loc_change(result, loc_changes):
    # search for loc changes (@@ ) and update loc_changes variable
    # @@ -1,5 +1,4 @@
    # @@ -l,s +l,s @@
    if result.startswith("@@"):
        # loc_change = result[2+1: ] -> -1,5 +1,4 @@
        loc_change = result[result.find(" ") + 1 :]
        # loc_change = loc_change[:9] -> -1,5 +1,4
        loc_change = loc_change[: loc_change.find(" @@")]
        return loc_change
    else:
        return loc_changes


def get_loc_change(loc_changes):
    # removals
    # -1,5 +1,4 = -1,5
    left = loc_changes[: loc_changes.find(" ")]
    left_dec = 0
    # 2
    if left.find(",") > 0:
        # 2
        comma = left.find(",")
        # 5
        left_dec = int(left[comma + 1 :])
        # 1
        left = int(left[1:comma])
    else:
        left = int(left[1:])
        left_dec = 1

    # additions
    # +1,4
    right = loc_changes[loc_changes.find(" ") + 1 :]
    right_dec = 0
    if right.find(",") > 0:
        comma = right.find(",")
        right_dec = int(right[comma + 1 :])
        right = int(right[1:comma])
    else:
        right = int(right[1:])
        right_dec = 1

    if left == right:
        return {left: (right_dec - left_dec)}
    else:
        return {left: left_dec, right: right_dec}


def get_loc(results):
    files = {}
    contribution = 0
    churn = 0
    file = ""
    loc_changes = ""
    for result in results:
        new_file = is_new_file(result, file)
        if file != new_file:
            file = new_file
            if file not in files:
                files[file] = {}
        else:
            new_loc_changes = is_loc_change(
                result, loc_changes
            )  # returns either empty or "-6 +6" or "-13,0 +14,2" format
            if loc_changes != new_loc_changes:
                loc_changes = new_loc_changes
                locc = get_loc_change(loc_changes)  # {2: 0} or {8: 0, 9: 1}
                for loc in locc:
                    # files[file] = {2: 0, 8: 0, 9: 1}
                    #  print("loc", loc, files[file], locc[loc])
                    if loc in files[file]:
                        # change of lines triggered
                        files[file][loc] += locc[loc]
                        churn += abs(locc[loc])
                    else:
                        files[file][loc] = locc[loc]
                        contribution += abs(locc[loc])
            else:
                continue
    return [files, contribution, churn]
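
For reference, the hunk header that is_loc_change and get_loc_change slice by hand (@@ -l,s +l,s @@, where an omitted ,s means a count of 1) can also be read with a regular expression. This is only a minimal sketch of that parsing step, not a replacement for the functions above; the name parse_hunk_header is hypothetical.

```python
import re

# Unified-diff hunk header: "@@ -start[,count] +start[,count] @@ optional context".
# When ",count" is omitted, the count is 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count), or None if not a hunk header."""
    match = HUNK_RE.match(line)
    if match is None:
        return None
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header("@@ -1,5 +1,4 @@"))
print(parse_hunk_header("@@ -6 +6 @@"))
print(parse_hunk_header("+++ packages/react/src/ReactAct.js"))
```

Lines that are not hunk headers, such as the +++ file line, return None, so the caller can skip them.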

How can I use the same code, but only count churn when the code being changed is less than 3 weeks old?

Tags: python, python-3.x, pandas, git, dataframe

Solution


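The only practical way to do this is to iterate over the DataFrame row by row. Because pandas is bad at that, needing to do so almost always means you have the wrong data structure. If you are not doing numerical analysis, and it looks like you are not, just keep a simple list of dicts. Pandas has its strong points, but it is not a general-purpose database.

Here is the rough code you need, although I am glossing over the details: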
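
If the data already lives in a DataFrame, one way to get a list of dicts is DataFrame.to_dict("records"). The sketch below assumes only four of the columns from the question's table, with two rows copied from it; a real frame would carry the remaining columns as well.

```python
import pandas as pd

# Two rows copied from the question's table, reduced to four columns.
df = pd.DataFrame(
    {
        "sha": ["cae635054", "e2453e200"],
        "author": ["Andrew Clark", "Andrew Clark"],
        "filepath": [
            "packages/react-reconciler/src/__tests__/ReactIsomorphicAct-test.js",
            "packages/react-reconciler/src/__tests__/ReactIsomorphicAct-test.js",
        ],
        "date": pd.to_datetime(
            ["2021-06-26 18:51:23+00:00", "2021-06-25 19:39:46+00:00"]
        ),
    }
)

# One plain dict per row; the keys are the column names.
records = df.to_dict("records")

for record in records:
    print(record["sha"], record["filepath"], record["date"])
```

Each record is then an ordinary dict, so the row-by-row bookkeeping needs no pandas indexing.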

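from datetime import timedelta

# Go through the df row by row. Assumes rows are ordered newest first
# (as git log emits them) and that row['date'] holds datetime values.
lastdate = {}
for index, row in df.iterrows():
    if row['filepath'] in lastdate:
        # lastdate holds the date of the next-newer change to this file.
        if lastdate[row['filepath']] - row['date'] < timedelta(days=21):
            print("Last change to", row['filepath'], "was within three weeks")
    lastdate[row['filepath']] = row['date']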
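
The question asks specifically about engineers changing their own recent code, while the loop above tracks only the file path. A possible extension, sketched here under assumptions, keys the lookup on (filepath, author) and works on a list of dicts. The function name recent_self_changes is hypothetical, the three sample rows are copied from the question's table, and the check is a file-level approximation: it does not verify that the same lines were touched.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=21)


def recent_self_changes(records, window=WINDOW):
    """Yield records that touch a file the same author changed less than `window` earlier."""
    last_change = {}  # (filepath, author) -> date of that author's previous change
    for record in sorted(records, key=lambda r: r["date"]):  # oldest first
        key = (record["filepath"], record["author"])
        previous = last_change.get(key)
        if previous is not None and record["date"] - previous < window:
            yield record
        last_change[key] = record["date"]


# Sample rows copied from the question's table (subset of columns).
records = [
    {
        "sha": "cae635054",
        "author": "Andrew Clark",
        "filepath": "packages/react-reconciler/src/__tests__/ReactIsomorphicAct-test.js",
        "date": datetime(2021, 6, 26, 18, 51, 23, tzinfo=timezone.utc),
    },
    {
        "sha": "e2453e200",
        "author": "Andrew Clark",
        "filepath": "packages/react-reconciler/src/__tests__/ReactIsomorphicAct-test.js",
        "date": datetime(2021, 6, 25, 19, 39, 46, tzinfo=timezone.utc),
    },
    {
        "sha": "73ffce1b6",
        "author": "Brian Vaughn",
        "filepath": "packages/react-devtools-shared/src/__tests__/console-test.js",
        "date": datetime(2021, 6, 25, 2, 42, 44, tzinfo=timezone.utc),
    },
]

flagged = list(recent_self_changes(records))
for record in flagged:
    print(record["sha"], record["author"], record["filepath"])
```

Only the commits returned here would then need to be passed to calculate_churn; deciding whether the changed lines themselves are less than three weeks old would require line-level history, for example from git blame, which this sketch does not attempt.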
