For loop to iterate through the columns of a CSV

Problem description

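I'm very new to Python and to programming in general (this is my first programming language, and I started about a month ago).

I have a CSV file with data laid out as follows (the CSV file data is at the bottom). There are 31 columns of data. The first column (Wavelength) must be read in as the independent variable (x), and for the first iteration the second column (i.e. the first column labelled "Observation") must be read in as the dependent variable (y). I am then trying to fit a Gaussian + line model to the data and extract the mean of the Gaussian (mu), which should be stored in an array for further analysis. This process should be repeated for each set of observations, while the x values read in must stay the same (i.e. taken from the Wavelength column).

Here is my current code for reading in the data: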

import numpy as np #importing necessary packages
import matplotlib.pyplot as plt
import pandas as pd
import scipy as sp
from scipy.optimize import curve_fit
e=np.exp
spectral_data=np.loadtxt(r'C:/Users/Sidharth/Documents/Computing Labs/Project 1/Halpha_spectral_data.csv', delimiter=',', skiprows=2) #importing data file
print(spectral_data)
x=spectral_data[:,0] #selecting column 0 to be x-axis data
y=spectral_data[:,1] #selecting column 1 to be y-axis data

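So I need to automate this process, so that I don't have to manually change y=spectral_data[:,1] to y=spectral_data[:,2], and so on up to y=spectral_data[:,30], for each iteration.

My code for producing the Gaussian fit is as follows: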

plt.scatter(x,y) #produce scatter plot
plt.title('Observation 1')
plt.ylabel('Intensity (arbitrary units)')
plt.xlabel('Wavelength (m)')
plt.plot(x,y,'*')
plt.plot(x,c+m*x,'-') #plots the fit

print('The slope and intercept of the regression is,', m,c)
m_best=m
c_best=c
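def fit_gauss(x,a,mu,sig,m,c):
    gaus = a*np.exp(-(x-mu)**2/(2*sig**2)) #Gaussian peak (np.exp, since sp.exp is deprecated)
    line = m*x+c #straight-line background
    return gaus + line

initial_guess=[160,7.1*10**-7,0.2*10**-7,m_best,c_best] #order: a, mu, sig, m, c
po,po_cov=curve_fit(fit_gauss,x,y,initial_guess) #curve_fit was imported from scipy.optimize above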

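The Gaussian seems to fit well (as shown in the plot), so the mean of this Gaussian (i.e. the x-coordinate of its peak) is the value I need to extract. The mean is given in the console (denoted by mu):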

The slope and intercept of the regression is, -731442221.6844947 616.0099144830941
The signal parameters are
 Gaussian amplitude = 19.7 +/- 0.8
 mu = 7.1e-07 +/- 2.1e-10
 Gaussian width (sigma) = -0.0 +/- 0.0
and the background estimate is
 m = 132654859.04 +/- 6439349.49
 c = 40 +/- 5

So my questions are, how can I iterate the process of reading in data from the csv so that I don't have to manually change the column y takes data from, and then how do I store the value of mu from each iteration of the read-in so that I can do further analysis/calculations with that mean later?

My thoughts are I should use a for-loop but I'm not sure how to do it.

The orange line shown in the plot is the result of some code I tried earlier. I think it's irrelevant, which is why it isn't in the main part of the question, but in case it is needed, this is all it is:

x=spectral_data[:,0] #selecting column 0 to be x-axis data
y=spectral_data[:,1] #selecting column 1 to be y-axis data
plt.scatter(x,y) #produce scatter plot
plt.title('Observation 1')
plt.ylabel('Intensity (arbitrary units)')
plt.xlabel('Wavelength (m)')
plt.plot(x,y,'*')
plt.plot(x,c+m*x,'-') #plots the fit

[Image: CSV file data]
[Image: Plot showing Gaussian + line fit]

Tags: python, numpy, for-loop, matplotlib, scipy

Solution


When you run into a problem like this, try to break it into three parts: what has to stay unchanged (in your example, the x data and the analysis code), what has to change (the y data, or more specifically the index that tells the rest of the code which column holds the y data), and how to keep the values you want to use later.
Once that is clear, we need to write the loop and decide how to store the values. An easy way to do the latter is a list: create an empty list before the loop, and at the end of each iteration append the value to it.

mu_list = [] # will store our mu values in this list
for i in range(1, 31): # i takes the values 1, 2, ..., 30 (31 itself is excluded)
    x = spectral_data[:, 0]
    y = spectral_data[:, i]
    # Your analysis and plot code here #
    mu = po[1] # po follows the argument order of fit_gauss (a, mu, sig, m, c), so index 1 is mu
    mu_list.append(mu) # store mu at the end of our growing mu_list

You will then have a list of 30 mu values in mu_list.
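
Since the goal is to do further calculations with the means, it may help to convert the list to a NumPy array once the loop has finished. A minimal sketch, assuming three made-up mu values in place of the real fit results (the numbers and the names mean_mu, spread and shift are illustrative and not from the question):

```python
import numpy as np

# Hypothetical fitted means standing in for the values collected by the loop.
mu_list = [7.10e-07, 7.12e-07, 7.08e-07]

mu_array = np.array(mu_list)      # a list of floats becomes a 1-D array
mean_mu = mu_array.mean()         # average peak position
spread = mu_array.std(ddof=1)     # sample standard deviation of the peak positions
shift = mu_array - mu_array[0]    # shift of each peak relative to the first observation
```

Appending to a list inside the loop and converting once at the end is usually simpler than growing a NumPy array one element at a time.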

Notice that not everything has to happen inside the loop. x is the same regardless of i, so it only needs to be selected once, before the loop. The analysis code is also the same every time apart from its input (the y data), so we can wrap it in a function, which is good practice and makes longer code much more readable. So we write x = spectral_data[:, 0] before the loop and define a function that analyzes the data and returns mu:

def analyze(x, y):
    # Your analysis and plot code here #
    mu = po[1]
    return mu

x = spectral_data[:, 0]
mu_list = [] # will store our mu's in this list
for i in range(1, 31):
    y = spectral_data[:, i]
    mu_list.append(analyze(x,y)) # Will calculate mu using our function, and store it at the end of our growing mu_list
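
To show the whole pattern running end to end, here is a self-contained sketch. It is not the asker's exact analysis: the data are synthetic (three noisy columns in arbitrary units instead of the real CSV), the starting values inside analyze are simple estimates chosen for this example, and the loop uses spectral_data.shape[1] instead of a hard-coded 31 so that it adapts to the number of columns.

```python
import numpy as np
from scipy.optimize import curve_fit


def fit_gauss(x, a, mu, sig, m, c):
    """Gaussian peak on top of a straight-line background."""
    gauss = a * np.exp(-(x - mu) ** 2 / (2 * sig ** 2))
    return gauss + m * x + c


def analyze(x, y):
    """Fit one observation column and return the fitted Gaussian mean."""
    # A straight-line fit gives starting values for the background.
    m0, c0 = np.polyfit(x, y, 1)
    # Assumed starting values for the peak, estimated from the data itself.
    a0 = y.max() - np.median(y)
    mu0 = x[np.argmax(y)]
    sig0 = (x.max() - x.min()) / 10
    po, po_cov = curve_fit(fit_gauss, x, y, p0=[a0, mu0, sig0, m0, c0])
    return po[1]  # parameter order is a, mu, sig, m, c


# Synthetic stand-in for spectral_data: column 0 is x, the rest are observations.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
true_mus = [4.0, 5.0, 6.0]
columns = [
    fit_gauss(x, 20.0, mu, 0.5, 1.0, 40.0) + rng.normal(0.0, 0.5, x.size)
    for mu in true_mus
]
spectral_data = np.column_stack([x] + columns)

x = spectral_data[:, 0]  # selected once, outside the loop
mu_list = []
for i in range(1, spectral_data.shape[1]):  # every column after the first
    mu_list.append(analyze(x, spectral_data[:, i]))

mu_array = np.array(mu_list)
print(mu_array)
```

With the real file, spectral_data would come from np.loadtxt as in the question. Wavelengths of order 7e-07 m are on a very different scale from the synthetic x used here, so the hand-picked initial guess from the question may be a better starting point than the automatic estimates above.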
