Creating a simple CSV from synthetic data - Python

Problem description

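I am learning Python and machine learning, and I am trying to create a very simple CSV from synthetic data. Can anyone help me adjust this so that it works in PyCharm? I am trying to fill each column with a random value drawn from that column's set of choices. Many thanks.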


import random
import pandas as pd


marriage_status = {'single', 'married', 'divorced', 'widowed', 'complicated'}
children = {'yes', 'no'}
employment = {'employed', 'self_employed', 'unemployed', 'student'}
income_abroad = {'yes', 'no'}
gender = {'M', 'F'}
response = {'refund', 'payment'}

columns = ['marriage_status', 'children', 'employment',
           'income_abroad', 'age', 'gender', 'income', 'expenses', 'response']

df = pd.DataFrame(columns=columns)

for i in range(1000):
    marriage_status = random.choice(list(marriage_status))
    children = random.choice(list(children))
    employment = random.choice(list(employment))
    income_abroad = random.choice(list(income_abroad))
    gender = random.choice(list(gender))
    response = random.choice(list(response))
    age = random.randint(18, 70)
    income = random.randint(0, 100000)
    expenses = random.randint(0, 10000)
    df = [marriage_status, children, employment, income_abroad, age, gender, income, expenses, response]

df[6].to_csv('taxfix_data.csv')
index = False

Tags: python, csv

Solution


If you want to use pandas, the simplest approach is to build the DataFrame from a dictionary of columns, like this:

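import random
import pandas as pd

n_rows = 1000  # number of synthetic rows to generate

# Every column must have the same length, so draw n_rows random values
# (with replacement) from each list of choices.
df = pd.DataFrame({
    "marriage_status": random.choices(['single', 'married', 'divorced', 'widowed', 'complicated'], k=n_rows),
    "children": random.choices(['yes', 'no'], k=n_rows),
    "employment": random.choices(['employed', 'self_employed', 'unemployed', 'student'], k=n_rows),
    "gender": random.choices(['M', 'F'], k=n_rows),
    "response": random.choices(['refund', 'payment'], k=n_rows),
    "income_abroad": random.choices(['yes', 'no'], k=n_rows),
})

# index=False must be passed to to_csv, not assigned on a separate line.
df.to_csv('taxfix_data.csv', index=False)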
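
The loop from the question can also be made to work with a few changes. The following is a minimal sketch of one possible fix, not the only one; the `*_options` names and the `make_rows` helper are illustrative and do not appear in the original post. It addresses four problems in the question's code: the loop variables reuse the names of the choice sets, so after the first iteration `marriage_status` is a string rather than a set; `df` is replaced by a plain list on every iteration; `df[6].to_csv(...)` is therefore called on a single integer; and `index = False` is a separate assignment instead of an argument to `to_csv`.

```python
import random

import pandas as pd

# Choice lists get their own names (hypothetical *_options suffix) so that
# nothing inside the loop overwrites them.
marriage_status_options = ['single', 'married', 'divorced', 'widowed', 'complicated']
children_options = ['yes', 'no']
employment_options = ['employed', 'self_employed', 'unemployed', 'student']
income_abroad_options = ['yes', 'no']
gender_options = ['M', 'F']
response_options = ['refund', 'payment']

columns = ['marriage_status', 'children', 'employment',
           'income_abroad', 'age', 'gender', 'income', 'expenses', 'response']


def make_rows(n_rows):
    """Return n_rows lists, each holding one random value per column."""
    rows = []
    for _ in range(n_rows):
        rows.append([
            random.choice(marriage_status_options),
            random.choice(children_options),
            random.choice(employment_options),
            random.choice(income_abroad_options),
            random.randint(18, 70),       # age
            random.choice(gender_options),
            random.randint(0, 100000),    # income
            random.randint(0, 10000),     # expenses
            random.choice(response_options),
        ])
    return rows


# Build the DataFrame once from all rows, then write it without the index.
df = pd.DataFrame(make_rows(1000), columns=columns)
df.to_csv('taxfix_data.csv', index=False)
```

Collecting the rows in a list and constructing the DataFrame once at the end is usually simpler than growing a DataFrame row by row, and the row order matches the `columns` list from the question.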



There is also a very useful pandas cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

