How to load large CSV into Python, select specific columns and save as new CSV?

Problem description

I have a CSV file with about 8 million rows, about 3 GB in size. I have a list of specific columns I want to save into a new CSV. I have been trying to use pandas with Python, but I just cannot get it right.

This is the code I have been using:

import pandas as pd
df = pd.read_csv('MyFile.csv' , usecols = ['AAA','BBB','CCC',])

After the last command, the terminal shows three dots, like this: "...". Then I try to enter this command:

df.to_csv('NewFile.csv', index=False)

But I receive the following error:

file "<stdin>", line 2
  df.to_csv('NewFile.csv', index=False)
   ^
SyntaxError: invalid syntax

Any help would be greatly appreciated. Thank you.

EDIT: This is the entire terminal output:

Python 3.7.6 (default, Jan  8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> df=pd.read_csv('MyFile.csv' , usecols = ['AAA','BBB','CCC',]
... pd.df.to_csv('NewFile.csv', index=False)?
  File "<stdin>", line 2
    pd.df.to_csv('NewFile.csv', index=False)?
     ^
SyntaxError: invalid syntax
>>>

Tags: python, pandas, csv

Solution


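You are getting a syntax error because you never closed the parenthesis of the read_csv call on this line in your terminal. The "..." prompt means Python is still waiting for the rest of that statement, so the next line you type is parsed as part of it: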

>>> df=pd.read_csv('MyFile.csv' , usecols = ['AAA','BBB','CCC',]
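With the closing parenthesis added, the two calls from the question should run as separate statements. Below is a minimal sketch that wraps them in a helper. The name select_columns and its chunksize parameter are hypothetical, introduced here only for illustration. The chunked branch is an optional extra, not part of the fix itself. It assumes that a 3 GB file may not fit comfortably in memory and uses the chunksize option of pandas.read_csv to process it piece by piece.

```python
import pandas as pd


def select_columns(src, dst, columns, chunksize=None):
    """Copy only `columns` from the CSV at `src` into a new CSV at `dst`.

    Hypothetical helper for illustration; not part of pandas.
    """
    if chunksize is None:
        # Note the closing parenthesis: it ends the statement, so the
        # interpreter returns to the >>> prompt instead of showing "...".
        df = pd.read_csv(src, usecols=columns)
        df.to_csv(dst, index=False)
        return

    # Optional: read the file in pieces so it is never fully in memory.
    reader = pd.read_csv(src, usecols=columns, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        first = i == 0
        # Write the header once, then append the remaining chunks.
        chunk.to_csv(dst, mode="w" if first else "a", header=first, index=False)
```

For the files in the question, the call would be select_columns('MyFile.csv', 'NewFile.csv', ['AAA', 'BBB', 'CCC']), optionally with something like chunksize=500000 (an arbitrary value, to be tuned to the available memory).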
