Adding a new column to an existing Koalas DataFrame results in NaN

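Problem description

I am trying to add a new column to my existing Koalas DataFrame, but as soon as the column is added, all of its values become NaN. I am not sure what is going on here. Can someone give me some pointers?

Here is the code: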

import databricks.koalas as ks
import numpy as np

kdf = ks.DataFrame(
    {'a': [1, 2, 3, 4, 5, 6],
     'b': [100, 200, 300, 400, 500, 600],
     'c': ["one", "two", "three", "four", "five", "six"]},
    index=[10, 20, 30, 40, 50, 60])

ks.set_option('compute.ops_on_diff_frames', True)
ks_series = ks.Series((np.arange(len(kdf.to_numpy()))))
kdf["values"] = ks_series

ks.reset_option('compute.ops_on_diff_frames')

Tags: python, pandas, apache-spark, pyspark, spark-koalas

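Solution

Koalas aligns the new column with the DataFrame by index label, not by position. A Series created without an explicit index gets the default labels 0 to 5, none of which match the DataFrame's labels 10 to 60, so every row ends up as NaN. Give the Series the same index as the DataFrame when you add the column: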

import databricks.koalas as ks
import numpy as np

kdf = ks.DataFrame(
    {'a': [1, 2, 3, 4, 5, 6],
     'b': [100, 200, 300, 400, 500, 600],
     'c': ["one", "two", "three", "four", "five", "six"]},
    index=[10, 20, 30, 40, 50, 60])

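# Allow operations that combine two different Koalas objects
ks.set_option('compute.ops_on_diff_frames', True)
# Reuse the DataFrame's index labels so the Series lines up row by row
ks_series = ks.Series(np.arange(len(kdf.to_numpy())), index=kdf.index.tolist())
kdf["values"] = ks_series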

kdf
    a    b      c  values
10  1  100    one       0
20  2  200    two       1
30  3  300  three       2
40  4  400   four       3
50  5  500   five       4
60  6  600    six       5
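
To see why the labels matter, here is a minimal sketch of label-based alignment in plain Python. It is only a conceptual model and not how Koalas is implemented: the helper `assign_by_label` is a hypothetical name made up for this illustration, and dictionaries stand in for a Series.

```python
import math

def assign_by_label(frame_index, series):
    """Hypothetical helper: look up each row label of the frame in the
    series (a dict of label -> value); unmatched labels become NaN."""
    return [series.get(label, math.nan) for label in frame_index]

frame_index = [10, 20, 30, 40, 50, 60]
values = list(range(len(frame_index)))

# A series built without an explicit index gets the default labels 0..5.
default_labelled = dict(enumerate(values))
misaligned = assign_by_label(frame_index, default_labelled)

# Reusing the frame's own labels makes every lookup succeed.
frame_labelled = dict(zip(frame_index, values))
aligned = assign_by_label(frame_index, frame_labelled)

print(misaligned)  # six NaN values, one per row
print(aligned)     # [0, 1, 2, 3, 4, 5]
```

The first call mirrors the code in the question, where none of the labels 0 to 5 appear in the DataFrame's index. The second mirrors the fix, where the Series carries the labels 10 to 60.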
