Previously working Python script now stops midway

Problem description

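I have been writing a Python script that performs Probit analysis to find the lower limit of detection (LLoD) of assays run in our lab. Last week I had a script that worked fine, but it was messy and lacked any kind of input checking to make sure the user's input was valid.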

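When the script runs, the user is prompted with a few questions (the column headers of the relevant data in the .csv file to be analyzed, and whether to use the data in those columns as is or in log_10 form). The script then:
- performs the necessary data cleaning and calculations
- prints a table of the relevant data, along with the equation and R^2 value of the linear regression
- displays a plot of the data together with the linear regression and the calculated LLoD
- prints "The lower limit of detection at 95% CI is [whatever]".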

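Now, when I run the script, the program stops after printing the data table and displaying the figure (it does not print the regression equation or the R^2 value, and prints nothing after that either). In addition, Python does not return to the standard >>> prompt, so I have to quit and reopen Python. Does anyone know what is going on? The full code is at the bottom of the post, and sample data can be found here. Note: this is the exact data I have been using, and it worked last week.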

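(PS: any other tips for cleaning up the code would also be appreciated, as long as the functionality stays the same. I come from a C background, and adapting to Python is still a work in progress...)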

Code: (FWIW, I am running 3.8)

import os
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy
from scipy.stats import norm
from numpy.polynomial import Polynomial
from tkinter import filedialog
from tkinter import *

# Initialize tkinter
root = Tk()
root.withdraw()

# Prompt user for data file and column headers, ask whether to use log(qty)
print("In the directory prompt, select the .csv file containing data for analysis")
path = filedialog.askopenfilename()
#data = pd.read_csv(path, usecols=[conc, detect])
data = pd.read_csv(path)


while True:
    conc = input("Enter the column header for concentration/number of copies: ")
    if conc in data.columns:
        break
    else:
        print('Invalid input. Column \''+str(conc)+'\' does not exist. Try again')
        continue

while True:
    detect = input("Enter the column header for target detection: ")
    if detect in data.columns:
        break
    else:
        print('Invalid input. Column \''+str(detect)+'\' does not exist. Try again')
        continue

while True:
    logans = input("Analyze using log10(concentration/number of copies)? (y/n): ")
    if logans == 'y':
        break
    elif logans == 'n':
        break
    else:
        print('Invalid input. Please enter either y or n.')
        continue

# Read the columns of data specified by the user and rename them for consistency
data = data.rename(columns={conc:"qty", detect:"result"})

# Create list of unique values for RNA quantity, initialize vectors of same length
# to store probabilities and probit scores for each
qtys = data['qty'].unique()
log_qtys = [0] * len(qtys)
prop = [0] * len(qtys)
probit = [0] * len(qtys)

# Function to get the hitrate/probability of detection for a given quantity
# Note: any values in df.result that cannot be parsed as a number will be converted to NaN
def hitrate(qty, df):
    t_s = df[df.qty == qty].result
    t_s = t_s.apply(pd.to_numeric, args=('coerce',)).isna()
    return (len(t_s)-t_s.sum())/len(t_s)

# Iterate over quantities to calculate log10(quantity), the corresponding probability
# of detection, and its associated probit score
for idx, val in enumerate(qtys):
    log_qtys[idx] = math.log10(val)
    prop[idx] = hitrate(val, data)
    probit[idx] = 5 + norm.ppf(prop[idx])

# Create a dataframe (with headers) composed of the quantities and their associated
# probabilities and probit scores, then drop rows with probability of 0 or 1
hitTable = pd.DataFrame(np.vstack([qtys,log_qtys,prop,probit]).T, columns=['qty','log_qty','probability','probit'])
hitTable.probit.replace([np.inf,-np.inf],np.nan, inplace=True)
hitTable.dropna(inplace=True)

def regPlot(x, y, log):
    # Update parameters, set y95 to probit score corresponding to 95% CI
    params = {'mathtext.default': 'regular'}
    plt.rcParams.update(params)
    y95 = 6.6448536269514722

    # Define lambda function for a line, run regression, and find the coefficient of determination
    regFun = lambda m, x, b : (m*x) + b
    regression = scipy.stats.linregress(x,y)
    r_2 = regression.rvalue*regression.rvalue

    # Solve y=mx+b for x at 95% CI
    log_llod = (y95 - regression.intercept) / regression.slope
    xmax = log_llod * 1.2

    # Start plotting all the things!
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.set_ylabel('Probit score\n$(\\sigma + 5)$')

    if log == 'y':
        ax.set_xlabel('$log_{10}$(input quantity)')

    elif log == 'n':
        ax.set_xlabel('input quantity')

    else:
        raise ValueError('Error when calling regPlot(x,y,log) - User input invalid.')

    x_r = [0, xmax]
    y_r = [regression.intercept, regFun(regression.slope,x_r[1],regression.intercept)]
    ax.plot(x_r, y_r, '--k') # linear regression
    ax.plot(log_llod, y95, color='red', marker='o', markersize=8) # LLOD point
    ax.plot([0,xmax], [y95,y95], color='red', linestyle=':') # horiz. red line
    ax.plot([log_llod,log_llod], [regFun(regression.slope,x_r[0],regression.intercept),7.1], color='red', linestyle=':') # vert. red line
    ax.plot(x, y, 'bx') # actual (qty, probit) data points
    ax.grid() # grid
    plt.show()
    print('\n Linear regression using least-squares method yields:\n')
    print('\t\ty = '+str("%.3f"%regression.slope)+'x + '+str("%.3f"%regression.intercept)+'\n')
    print('\twith a corresponding R-squared value of', str("%.5f"%r_2)+"\n")

    return regression.slope, regression.intercept, r_2, regression.stderr, regression.intercept_stderr, log_llod

print('\n', hitTable, '\n')

if logans == 'y':
    m, b, r_2, stderr, int_stderr, log_llod = regPlot(hitTable.log_qty, hitTable.probit, logans)
    llod_95 = 10**log_llod
    if r_2 < 0.9:
        print('WARNING: low r-squared value for linear regression. Try re-analyzing without using log10.')

elif logans == 'n':
    m, b, r_2, stderr, int_stderr, log_llod = regPlot(hitTable.qty, hitTable.probit, logans)
    llod_95 = log_llod
    if r_2 < 0.9:
        print('WARNING: low r-squared value for linear regression. Try re-analyzing using log10.')

else:
    raise ValueError('Error when attempting to evaluate llod_95 - User input invalid.')

print("\nThe lower limit of detection (LLoD) at 95% CI is " + str("%.4f"%llod_95) + ".\n")

Tags: python, python-3.x, pandas, numpy

Solution


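This appears to happen because the call to plt.show() blocks. It displays the figure in a window and waits for you to close that window before it continues executing the rest of the code, which is why the print statements after it in regPlot never seem to run.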

This is matplotlib's default behavior, but you can make the call non-blocking: https://stackoverflow.com/a/33050617/15332448
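
As a minimal sketch of the non-blocking approach, assuming a plotting helper much simpler than the asker's regPlot: the function name show_regression and the sample numbers below are hypothetical and only for illustration. The relevant part is passing block=False to plt.show(), followed by a short plt.pause() so the window has a chance to draw.

```python
import matplotlib.pyplot as plt


def show_regression(x, y, slope, intercept):
    """Plot the data and a fitted line without blocking the script."""
    fig, ax = plt.subplots()
    ax.plot(x, y, "bx")  # measured (quantity, probit) points
    x_line = [min(x), max(x)]
    ax.plot(x_line, [slope * v + intercept for v in x_line], "--k")  # fitted line
    ax.grid()
    plt.show(block=False)  # returns immediately instead of waiting for the window to close
    plt.pause(0.001)       # give the GUI event loop a moment to draw the window
    return fig


# Hypothetical values, for illustration only (not the asker's data).
fig = show_regression([1.0, 2.0, 3.0], [4.2, 5.1, 6.0], slope=0.9, intercept=3.3)
print("y = 0.900x + 3.300")  # printed while the figure is still open

# At the very end of the script, a normal blocking call keeps the window
# open until the user closes it:
# plt.show()
```

A simpler alternative that needs no non-blocking call would be to move the three print statements in regPlot above plt.show(), so the regression summary is printed before the window opens.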

