python - Pyspark - how to generate random numbers within a certain range of a column value?
Problem description
Initially I wanted to generate random integers between two numbers (10 and 80):
from random import randint
# Note: randint() runs once on the driver, so every null in 'score' gets the same value.
df.fillna(randint(10, 80), 'score').show()
What would be the correct way to generate random decimals within a certain range of a column's current value? For example, random decimals within +/- 15% of a 'score' column with a value of 25.0?
I've looked into the documentation, but it only has examples showing how to generate random numbers with a seed. I'm not sure that is suitable in this case.
Solution
I'm not sure if I'm reading this right, but you're looking to find a range of random floats between 21.25 and 28.75? If so:
import random

score = 25.0
lower_bound = score - (score * 0.15)  # 21.25
upper_bound = score + (score * 0.15)  # 28.75
answer = random.uniform(lower_bound, upper_bound)
random.uniform is the key function here.