python - Previously working python script now stops midway
Problem description
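I've been writing a python script to run a Probit analysis to find the lower limit of detection (LLoD) for assays run in our lab. Last week I had a script that worked fine, but it was messy and lacked any kind of input checking to make sure the user's input was valid.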
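When the script runs, the user is prompted with a few questions: the column headers of the relevant data in the .csv file containing the data to be analyzed, and whether to use the data in those columns as-is or in log_10 form. The script then:
- performs the necessary data cleaning and calculations
- prints the relevant data table, along with the equation and R^2 value of the linear regression
- displays a plot of the data, along with the linear regression and the calculated LLoD
- prints "The lower limit of detection at 95% CI is [whatever]".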
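Now, when I run the script, the program stops after printing the data table and showing the graph (it doesn't print the regression equation and R^2 value, or anything after that). Also, python doesn't return to the standard >>> prompt, and I have to quit and reopen Python. Does anyone know what's going on? The full code is at the bottom of the post, and sample data can be found here. Note: this is the exact data I've been using, and it worked last week.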
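(PS Any other tips for cleaning up the code would also be appreciated, as long as the functionality stays the same. I come from a C background, and adapting to Python is still a work in progress...)
Code: (FWIW I'm running 3.8)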
import os
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
from scipy.stats import norm
from numpy.polynomial import Polynomial
from tkinter import filedialog
from tkinter import *
# Initialize tkinter
root = Tk()
root.withdraw()
# Prompt user for data file and column headers, ask whether to use log(qty)
print("In the directory prompt, select the .csv file containing data for analysis")
path = filedialog.askopenfilename()
#data = pd.read_csv(path, usecols=[conc, detect])
data = pd.read_csv(path)
while True:
    conc = input("Enter the column header for concentration/number of copies: ")
    if conc in data.columns:
        break
    else:
        print('Invalid input. Column \''+str(conc)+'\' does not exist. Try again')
        continue
while True:
    detect = input("Enter the column header for target detection: ")
    if detect in data.columns:
        break
    else:
        print('Invalid input. Column \''+str(detect)+'\' does not exist. Try again')
        continue
while True:
    logans = input("Analyze using log10(concentration/number of copies)? (y/n): ")
    if logans == 'y':
        break
    elif logans == 'n':
        break
    else:
        print('Invalid input. Please enter either y or n.')
        continue
# Read the columns of data specified by the user and rename them for consistency
data = data.rename(columns={conc:"qty", detect:"result"})
# Create list of unique values for RNA quantity, initialize vectors of same length
# to store probabilities and probit scores for each
qtys = data['qty'].unique()
log_qtys = [0] * len(qtys)
prop = [0] * len(qtys)
probit = [0] * len(qtys)
# Function to get the hitrate/probability of detection for a given quantity
# Note: any values in df.result that cannot be parsed as a number will be converted to NaN
def hitrate(qty, df):
    t_s = df[df.qty == qty].result
    t_s = t_s.apply(pd.to_numeric, args=('coerce',)).isna()
    return (len(t_s)-t_s.sum())/len(t_s)
# Iterate over quantities to calculate log10(quantity), the corresponding probability
# of detection, and its associated probit score
for idx, val in enumerate(qtys):
    log_qtys[idx] = math.log10(val)
    prop[idx] = hitrate(val, data)
    probit[idx] = 5 + norm.ppf(prop[idx])
# Create a dataframe (with headers) composed of the quantities and their associated
# probabilities and probit scores, then drop rows with probability of 0 or 1
hitTable = pd.DataFrame(np.vstack([qtys,log_qtys,prop,probit]).T, columns=['qty','log_qty','probability','probit'])
hitTable['probit'] = hitTable['probit'].replace([np.inf,-np.inf],np.nan)
hitTable.dropna(inplace=True)
def regPlot(x, y, log):
    # Update parameters, set y95 to probit score corresponding to 95% CI
    params = {'mathtext.default': 'regular'}
    plt.rcParams.update(params)
    y95 = 6.6448536269514722
    # Define lambda function for a line, run regression, and find the coefficient of determination
    regFun = lambda m, x, b : (m*x) + b
    regression = scipy.stats.linregress(x,y)
    r_2 = regression.rvalue*regression.rvalue
    # Solve y=mx+b for x at 95% CI
    log_llod = (y95 - regression.intercept) / regression.slope
    xmax = log_llod * 1.2
    # Start plotting all the things!
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.set_ylabel('Probit score\n$(\\sigma + 5)$')
    if log == 'y':
        ax.set_xlabel('$log_{10}$(input quantity)')
    elif log == 'n':
        ax.set_xlabel('input quantity')
    else:
        raise ValueError('Error when calling regPlot(x,y,log) - User input invalid.')
    x_r = [0, xmax]
    y_r = [regression.intercept, regFun(regression.slope,x_r[1],regression.intercept)]
    ax.plot(x_r, y_r, '--k') # linear regression
    ax.plot(log_llod, y95, color='red', marker='o', markersize=8) # LLOD point
    ax.plot([0,xmax], [y95,y95], color='red', linestyle=':') # horiz. red line
    ax.plot([log_llod,log_llod], [regFun(regression.slope,x_r[0],regression.intercept),7.1], color='red', linestyle=':') # vert. red line
    ax.plot(x, y, 'bx') # actual (qty, probit) data points
    ax.grid() # grid
    plt.show()
    print('\n Linear regression using least-squares method yields:\n')
    print('\t\ty = '+str("%.3f"%regression.slope)+'x + '+str("%.3f"%regression.intercept)+'\n')
    print('\twith a corresponding R-squared value of', str("%.5f"%r_2)+"\n")
    return regression.slope, regression.intercept, r_2, regression.stderr, regression.intercept_stderr, log_llod
print('\n', hitTable, '\n')
if logans == 'y':
    m, b, r_2, stderr, int_stderr, log_llod = regPlot(hitTable.log_qty, hitTable.probit, logans)
    llod_95 = 10**log_llod
    if r_2 < 0.9:
        print('WARNING: low r-squared value for linear regression. Try re-analyzing without using log10.')
elif logans == 'n':
    m, b, r_2, stderr, int_stderr, log_llod = regPlot(hitTable.qty, hitTable.probit, logans)
    llod_95 = log_llod
    if r_2 < 0.9:
        print('WARNING: low r-squared value for linear regression. Try re-analyzing using log10.')
else:
    raise ValueError('Error when attempting to evaluate llod_95 - User input invalid.')
print("\nThe lower limit of detection (LLoD) at 95% CI is " + str("%.4f"%llod_95) + ".\n")
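On the PS about cleanup: the hard-coded y95 = 6.6448536269514722 in regPlot() is the probit score for 95% detection, i.e. 5 + norm.ppf(0.95), and the per-quantity hit rates can be computed with a single groupby instead of the hitrate() loop. Below is a minimal sketch under those assumptions. hit_table() and llod_95() are hypothetical helper names, the sketch assumes the columns have already been renamed to 'qty' and 'result', and the example data is made up, not the sample data from the post. It does no plotting, so nothing in it can block.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress, norm

# Probit score for 95% detection: 5 + inverse standard-normal CDF at 0.95.
# This reproduces the hard-coded y95 = 6.6448536269514722 in regPlot().
Y95 = 5 + norm.ppf(0.95)


def hit_table(df):
    """Build a (qty, probability, log_qty, probit) table from raw results.

    Hypothetical helper. Assumes df has the renamed columns 'qty' and 'result';
    like hitrate(), any result that cannot be parsed as a number counts as a
    non-detection.
    """
    detected = pd.to_numeric(df["result"], errors="coerce").notna()
    prob = detected.groupby(df["qty"]).mean()  # fraction detected per quantity
    table = pd.DataFrame({
        "qty": prob.index.to_numpy(dtype=float),
        "probability": prob.to_numpy(),
    })
    table["log_qty"] = np.log10(table["qty"])
    # norm.ppf(0) is -inf and norm.ppf(1) is +inf; drop those rows as the script does
    table["probit"] = 5 + norm.ppf(table["probability"])
    return table.replace([np.inf, -np.inf], np.nan).dropna()


def llod_95(x, y):
    """Fit y = m*x + b by least squares and solve for x at the 95% probit score."""
    reg = linregress(x, y)
    return (Y95 - reg.intercept) / reg.slope, reg.rvalue ** 2


# Made-up example: 10 replicates per quantity, "ND" marks a non-detection
rows = []
for q, hits in zip([10, 20, 50, 100], [3, 6, 8, 9]):
    rows += [(q, 30.0)] * hits + [(q, "ND")] * (10 - hits)
df = pd.DataFrame(rows, columns=["qty", "result"])

table = hit_table(df)
log_llod, r2 = llod_95(table["log_qty"], table["probit"])
print(table)
print("LLoD at 95%:", 10 ** log_llod, "R^2:", round(r2, 4))
```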
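Solution
This appears to be because plt.show() blocks when it is called: it displays the figure in a window and waits for you to close that window before execution continues. In regPlot(), the print() calls for the regression equation and R^2 value come after plt.show(), so nothing after it runs until the plot window is closed.
This is matplotlib's default behavior, but you can make it non-blocking:
https://stackoverflow.com/a/33050617/15332448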
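A minimal sketch of two workarounds, using a hypothetical, trimmed-down reg_plot() rather than the full regPlot() from the question: the results are printed before any call to plt.show(), and the figure is then shown with plt.show(block=False) followed by plt.pause(), so the script keeps running. The data points are made up for illustration.

```python
import matplotlib.pyplot as plt
from scipy.stats import linregress


def reg_plot(x, y):
    """Fit y = m*x + b, print the results, and return the figure WITHOUT showing it.

    Hypothetical restructuring of the question's regPlot(): the prints happen
    before any call to plt.show(), so the plot window cannot hold them up.
    """
    reg = linregress(x, y)
    r_2 = reg.rvalue ** 2
    print("\n Linear regression using least-squares method yields:\n")
    print(f"\t\ty = {reg.slope:.3f}x + {reg.intercept:.3f}\n")
    print(f"\twith a corresponding R-squared value of {r_2:.5f}\n")

    fig, ax = plt.subplots()
    ax.plot(x, y, "bx")  # data points
    ax.plot(x, [reg.slope * xi + reg.intercept for xi in x], "--k")  # fitted line
    ax.grid()
    return fig, reg.slope, reg.intercept, r_2


# Made-up example data
fig, m, b, r_2 = reg_plot([1.0, 1.3, 1.7, 2.0], [4.48, 5.25, 5.84, 6.28])

# Show without blocking, give the GUI event loop a moment to draw the window,
# then carry on and close the figure when done.
plt.show(block=False)
plt.pause(0.5)
plt.close(fig)
```

In a plain (non-interactive) script, a non-blocking window disappears as soon as the script exits. If you want the plot to stay up, the simpler option is to keep a single blocking plt.show() as the very last statement, after all the print() calls.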