Dividing table columns by another column in Python/R

Problem description

I have a table in a text file that looks like this:

V4     V1  V2   V3   V5  V6  V7  V8  V9  V10  V11  V12  COMMON   SUBS
GENE1  1   37   40   .   -   7   9   1   15   14   8    567.4    145
GENE2  5   90   93   .   -   12  39  0   15   35   0    400.0    58.5
GENE3  6   278  281  .   +   22  0   12  10   30   18   100.344  0.009
GENE4  2   812  815  .   -   4   0   0   0    0    2    38.33    4.698

To generate this table in R:

m <- data.frame("V4" = c("GENE1","GENE2","GENE3","GENE4"), "V1" = c(1,5,6,2), "V2" = c(37,90,278,812), "V3" = c(40,93,281,815), "V5"=c(".",".",".","."), "V6"=c("-","-","+","-"), "V7"=c(7,12,22,4), "V8"=c(9,39,0,0), "V9"=c(1,0,12,0), "V10" = c(15,15,10,0), "V11" = c(14,35,30,0), "V12"= c(8,0,18,2), "COMMON"=c(567.4,400,100.344,38.33), "SUBS"=c(145,58.5,0.009,4.698))

When I read it in R:

m = read.delim("RawNumbers.txt", header=TRUE)
head(m)
      V4  V1   V2   V3 V5 V6  V7  V8  V9  V10  V11  V12   COMMON     SUBS
0  GENE1   1   37   40  .  -   7   9   1   15   14    8  567.400  145.000
1  GENE2   5   90   93  .  -  12  39   0   15   35    0  400.000   58.500
2  GENE3   6  278  281  .  +  22   0  12   10   30   18  100.344    0.009
3  GENE4   2  812  815  .  -   4   0   0    0    0    2   38.330    4.698

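I need to divide each value in columns V7 to V9 by the value in the COMMON column of the same row, and each value in columns V10 to V12 by the value in the SUBS column. I am doing this division in Python as follows: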

import numpy as np
import pandas as pd
import codecs
doc = codecs.open('RawNumbers.txt')
df = pd.read_csv(doc, sep='\t')
col_division = [ 'V7',  'V8',  'V9']
df[col_division] = df[col_division] / df['COMMON']
col_division_two = [ 'V10',  'V11',  'V12']
df[col_division_two] = df[col_division_two] / df['SUBS']

But at this step I get the following error:

raise ValueError("Columns must be same length as key")

How can I fix this error, or how can I do this division in R instead of Python?

Tags: python, r, pandas, numpy, multiple-columns

Solution


In R, this should work:

m[,c("V7","V8","V9")] <- m[,c("V7","V8","V9")] / m$COMMON
m[,c("V10","V11","V12")] <- m[,c("V10","V11","V12")] / m$SUBS
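On the pandas side, the error appears to come from how a DataFrame is divided by a Series: with the `/` operator, pandas matches the Series index (the row labels 0 to 3) against the DataFrame's column names rather than against its rows, so the result has more columns than the three being assigned. A minimal sketch of one way around this, assuming the file is tab-separated with a header row, is to use `DataFrame.div` with `axis=0`. The in-memory `text` below is only a stand-in for RawNumbers.txt, and the names `common_cols` and `subs_cols` are chosen for illustration:

```python
import io
import pandas as pd

# Same rows as the table in the question, joined with tabs to mimic RawNumbers.txt.
rows = [
    "V4 V1 V2 V3 V5 V6 V7 V8 V9 V10 V11 V12 COMMON SUBS",
    "GENE1 1 37 40 . - 7 9 1 15 14 8 567.4 145",
    "GENE2 5 90 93 . - 12 39 0 15 35 0 400.0 58.5",
    "GENE3 6 278 281 . + 22 0 12 10 30 18 100.344 0.009",
    "GENE4 2 812 815 . - 4 0 0 0 0 2 38.33 4.698",
]
text = "\n".join("\t".join(r.split()) for r in rows)
df = pd.read_csv(io.StringIO(text), sep="\t")

common_cols = ["V7", "V8", "V9"]
subs_cols = ["V10", "V11", "V12"]

# axis=0 matches the divisor to the rows by index label,
# so every row is divided by its own COMMON (or SUBS) value.
df[common_cols] = df[common_cols].div(df["COMMON"], axis=0)
df[subs_cols] = df[subs_cols].div(df["SUBS"], axis=0)

print(df[common_cols + subs_cols])
```

With a real file, `io.StringIO(text)` would be replaced by the file path, for example `pd.read_csv("RawNumbers.txt", sep="\t")`.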

If you know the positions of the columns in the data frame, the same thing can be written in R as:

m[,7:9] <- m[,7:9] / m[,13]
m[,10:12] <- m[,10:12] / m[,14]
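The position-based R version can be mirrored in pandas as well. The sketch below rebuilds the table from the question directly as a DataFrame and selects columns by position; the names `first_block` and `second_block` are illustrative only. Note that pandas positions start at 0 and slices exclude the end point, so R's `7:9` corresponds to `6:9` here and R's column 13 corresponds to index 12.

```python
import pandas as pd

# The table from the question, rebuilt in memory.
df = pd.DataFrame({
    "V4": ["GENE1", "GENE2", "GENE3", "GENE4"],
    "V1": [1, 5, 6, 2],
    "V2": [37, 90, 278, 812],
    "V3": [40, 93, 281, 815],
    "V5": ["."] * 4,
    "V6": ["-", "-", "+", "-"],
    "V7": [7, 12, 22, 4],
    "V8": [9, 39, 0, 0],
    "V9": [1, 0, 12, 0],
    "V10": [15, 15, 10, 0],
    "V11": [14, 35, 30, 0],
    "V12": [8, 0, 18, 2],
    "COMMON": [567.4, 400.0, 100.344, 38.33],
    "SUBS": [145, 58.5, 0.009, 4.698],
})

# Look up the column names by position, then assign by name.
first_block = list(df.columns[6:9])    # V7, V8, V9
second_block = list(df.columns[9:12])  # V10, V11, V12

# iloc[:, 12] is COMMON and iloc[:, 13] is SUBS; axis=0 divides row by row.
df[first_block] = df[first_block].div(df.iloc[:, 12], axis=0)
df[second_block] = df[second_block].div(df.iloc[:, 13], axis=0)

print(df[first_block + second_block])
```

Selecting by name is usually less fragile than selecting by position, since positions shift whenever a column is added or removed.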
