PySpark - how to generate random numbers within a certain range of a column value?

Question

Initially I wanted to generate random integers between two numbers (10 and 80):

from random import randint
df.fillna(randint(10, 80), 'score').show()

What would be the correct way to generate random decimals within a certain range of a column's current value? For example, random decimals within +/- 15% of a 'score' column whose value is 25.0?

I've looked into the documentation, but the only examples show how to generate random numbers with a seed. I'm not sure that is suitable in this case.

Tags: python, python-3.x, dataframe, random, pyspark

Answer


I'm not sure if I'm reading this right, but are you looking for random floats between 21.25 and 28.75? If so:

import random

score = 25.0
lower_bound = score - (score * 0.15)  # 21.25
upper_bound = score + (score * 0.15)  # 28.75
answer = random.uniform(lower_bound, upper_bound)  # random float between the two bounds

random.uniform is the key function here: it returns a random float between the two bounds you pass it.
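The snippet above handles a single Python value, while the question asks about a DataFrame column. Here is a minimal sketch of how the same idea might carry over, assuming a DataFrame with a numeric 'score' column. The helper names (jitter_bounds, jitter_value, with_jittered_column) and the new_column parameter are hypothetical, chosen only for illustration. The Spark part relies on pyspark.sql.functions.rand, which produces a uniform value in [0, 1) for each row.

```python
import random


def jitter_bounds(value, pct=0.15):
    """Return the (low, high) bounds that lie pct below and above value."""
    return value * (1 - pct), value * (1 + pct)


def jitter_value(value, pct=0.15, rng=random):
    """Plain-Python version: one uniform draw within +/- pct of value."""
    low, high = jitter_bounds(value, pct)
    return rng.uniform(low, high)


def with_jittered_column(df, column, new_column, pct=0.15, seed=None):
    """PySpark version (hypothetical helper): a fresh draw per row,
    scaled by that row's own value in `column`."""
    # Imported here so the plain-Python helpers above work without Spark.
    from pyspark.sql import functions as F

    # F.rand() yields a uniform value in [0, 1) per row, so this factor
    # falls in [1 - pct, 1 + pct).
    factor = F.lit(1 - pct) + F.rand(seed) * F.lit(2 * pct)
    return df.withColumn(new_column, F.col(column) * factor)
```

Usage would look something like with_jittered_column(df, 'score', 'score_jittered', seed=42).show(). The seed is optional and only makes the draws reproducible. This also differs from df.fillna(randint(10, 80), 'score') in the question: there, randint runs once on the driver, so every null is filled with the same number rather than a different random value per row.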

