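dataframe - How do I add/edit values in an IndexedTable?
Question
How do I add or edit values in an IndexedTable? From the documentation I understand that IndexedTable objects are themselves immutable, but the underlying data is not. So I "understand" why something like the following doesn't work, but I don't know how to get a new IndexedTable containing the new data: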
myTable = ndsparse((
region = ["US","US","US","US","EU","EU","EU","EU"],
product = ["apple","apple","banana","banana","apple","apple","banana","banana"],
year = [2011,2010,2011,2010,2011,2010,2011,2010]
),(
production = [3.3,3.2,2.3,2.1,2.7,2.8,1.5,1.3],
consumption = [4.3,7.4,2.5,9.8,3.2,4.3,6.5,3.0]
))
myTable["EU","banana",2011] = (2.5, 7.5) # ERROR: type Tuple has no field region
myTable["EU","banana",2012] = (2.5, 7.5) # ERROR: type Tuple has no field region
myTable["EU","banana",2011] = (production = 2.5, consumption = 7.5) # ERROR: type Tuple has no field region
Answer
It looks like the functionality you want is provided by JuliaDB's Base.merge.
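merge(a::NDSparse, b::NDSparse; agg)
Merge the rows of a with the rows of b. To keep keys unique, the value from b takes priority. If a function agg is provided, it is used to aggregate the values from a and b that share the same key.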
Example:
a = table((x = 1:5, y = rand(5)); pkey = :x)
b = table((x = 6:10, y = rand(5)); pkey = :x)
merge(a, b)
a = ndsparse([1,3,5], [1,2,3])
b = ndsparse([2,3,4], [4,5,6])
merge(a, b)
merge(a, b; agg = (x,y) -> x)
A working example based on your question:
# tested on Julia 1.0.4
julia> using JuliaDB
julia> myTable = ndsparse((
region = ["US","US","US","US","EU","EU","EU","EU"],
product = ["apple","apple","banana","banana","apple","apple","banana","banana"],
year = [2011,2010,2011,2010,2011,2010,2011,2010]
),(
production = [3.3,3.2,2.3,2.1,2.7,2.8,1.5,1.3],
consumption = [4.3,7.4,2.5,9.8,3.2,4.3,6.5,3.0]
))
3-d NDSparse with 8 values (2 field named tuples):
region product year │ production consumption
───────────────────────┼────────────────────────
"EU" "apple" 2010 │ 2.8 4.3
"EU" "apple" 2011 │ 2.7 3.2
"EU" "banana" 2010 │ 1.3 3.0
"EU" "banana" 2011 │ 1.5 6.5 # Note the old value
"US" "apple" 2010 │ 3.2 7.4
"US" "apple" 2011 │ 3.3 4.3
"US" "banana" 2010 │ 2.1 9.8
"US" "banana" 2011 │ 2.3 2.5
julia> updated_myTable = ndsparse((
region = ["EU"],
product = ["banana"],
year = [2011]
),(
production = [2.5], # new values here
consumption = [7.5]
))
3-d NDSparse with 1 values (2 field named tuples):
region product year │ production consumption
───────────────────────┼────────────────────────
"EU" "banana" 2011 │ 2.5 7.5
julia> newTable = merge(updated_myTable, myTable, agg = (x,y) -> x)
3-d NDSparse with 8 values (2 field named tuples):
region product year │ production consumption
───────────────────────┼────────────────────────
"EU" "apple" 2010 │ 2.8 4.3
"EU" "apple" 2011 │ 2.7 3.2
"EU" "banana" 2010 │ 1.3 3.0
"EU" "banana" 2011 │ 2.5 7.5 # Note the updated values here!
"US" "apple" 2010 │ 3.2 7.4
"US" "apple" 2011 │ 3.3 4.3
"US" "banana" 2010 │ 2.1 9.8
"US" "banana" 2011 │ 2.3 2.5
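Note how, when the same key appears in both tables, the agg function (x,y) -> x keeps the value from the first argument (here updated_myTable).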
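The question also asks about adding a value under a key that does not exist yet (year 2012). The same merge pattern should cover that case, since a key present in only one table has nothing to collide with. Below is a minimal sketch, assuming the merge call behaves as in the session above; `upsert` is a hypothetical helper name, not part of JuliaDB.

```julia
using JuliaDB

# Hypothetical helper (not a JuliaDB function): return a new NDSparse in which
# `key` maps to `vals`, whether or not `key` already exists in `t`.
function upsert(t, key::NamedTuple, vals::NamedTuple)
    # Wrap each scalar in a one-element vector to build a one-row table
    patch = ndsparse(map(v -> [v], key), map(v -> [v], vals))
    # Same call pattern as the merge above: on a key collision, keep the value from `patch`
    return merge(patch, t; agg = (x, y) -> x)
end

# A small table with the same column layout as in the question
t = ndsparse((
    region  = ["EU", "EU"],
    product = ["banana", "banana"],
    year    = [2010, 2011]
),(
    production  = [1.3, 1.5],
    consumption = [3.0, 6.5]
))

# Existing key: the stored values are replaced in the returned table
edited = upsert(t, (region = "EU", product = "banana", year = 2011),
                   (production = 2.5, consumption = 7.5))

# New key: a row is added to the returned table
added = upsert(t, (region = "EU", product = "banana", year = 2012),
                  (production = 2.5, consumption = 7.5))
```

Neither call modifies `t`. Each returns a new table, so rebind the name (for example `t = upsert(t, ...)`) if you want to keep working with the result.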
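An alternative is to look up the position of the key in the index and then edit the data element directly. Unlike merge, this changes myTable in place instead of producing a new table.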
julia> i = findfirst(isequal((region = "EU", product = "banana", year = 2011)), myTable.index)
4
julia> myTable.data[i]
(production = 1.5, consumption = 6.5)
julia> myTable.data[i] = (production = 2.5, consumption = 7.5)
(production = 2.5, consumption = 7.5)
julia> myTable
3-d NDSparse with 8 values (2 field named tuples):
region product year │ production consumption
───────────────────────┼────────────────────────
"EU" "apple" 2010 │ 2.8 4.3
"EU" "apple" 2011 │ 2.7 3.2
"EU" "banana" 2010 │ 1.3 3.0
"EU" "banana" 2011 │ 2.5 7.5 # Note the updated values here!
"US" "apple" 2010 │ 3.2 7.4
"US" "apple" 2011 │ 3.3 4.3
"US" "banana" 2010 │ 2.1 9.8
"US" "banana" 2011 │ 2.3 2.5
Hope that helps.