Creating a method in Python

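Problem description

I have a file that I parse inside a class. I also want to return the top n years in which the most movies were made. I will use the lines attribute to get the data.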

import re

import collections

 

class movie_analyzer:

    def __init__(self,s):

            self.lines=open(s, encoding="latin-1").read().split('\n')

            self.lines=[x.split('::') for x in self.lines]
       

    def freq_by_year(self):

        movies_years = [x[3] for x in self.lines]

        c = collections.Counter(movies_years)      

        for movies_years, freq in c.most_common(3):

            print(movies_years, ':', freq)



movie=movie_analyzer("modified.dat")

movie.freq_by_year()

It gives this error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-627-51913258f9e4> in <module>
----> 1 movie.freq_by_year()
 
<ipython-input-624-8dc663c0b252> in freq_by_year(self)
      9     def freq_by_year(self):
     10 
---> 11         movies_years = [x[3] for x in self.lines]
     12 
     13         c = collections.Counter(movies_years)
 
<ipython-input-624-8dc663c0b252> in <listcomp>(.0)
      9     def freq_by_year(self):
     10 
---> 11         movies_years = [x[3] for x in self.lines]
     12 
     13         c = collections.Counter(movies_years)
 
IndexError: list index out of range    

Also, movie.lines looks like this:

[['1', 'Toy Story', "Animation|Children's|Comedy", '1995'],
 ['2', 'Jumanji', "Adventure|Children's|Fantasy", '1995'],
 ['3', 'Grumpier Old Men', 'Comedy|Romance', '1995'],
 ['4', 'Waiting to Exhale', 'Comedy|Drama', '1995'],
 ['5', 'Father of the Bride Part II', 'Comedy', '1995'],
 ['6', 'Heat', 'Action|Crime|Thriller', '1995'],
 ['7', 'Sabrina', 'Comedy|Romance', '1995'],
 ['8', 'Tom and Huck', "Adventure|Children's", '1995'],
 ['9', 'Sudden Death', 'Action', '1995'],
 ['10', 'GoldenEye', 'Action|Adventure|Thriller', '1995']]

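The .dat file looks like this:

movies = ["1::Toy Story::Animation|Children's|Comedy::1995\n",
"2::Jumanji::Adventure|Children's|Fantasy::1995\n",
'3::Grumpier Old Men::Comedy|Romance::1995\n',
'4::Waiting to Exhale::Comedy|Drama::1995\n',
'5::Father of the Bride Part II::Comedy::1995\n']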

Tags: python, oop, methods

Solution


Given your code and the .dat file, I found two potential problems in the __init__ function:

def __init__(self, s):
    self.lines = open(s, encoding="latin-1").read().split('\n')
    self.lines = [x.split('::') for x in self.lines]
    # Keep only rows that split into exactly four fields
    self.lines = [l for l in self.lines if len(l) == 4]  # <--(1)
    for line in self.lines:  # <--(2)
        # Raw string so that \D reaches the regex engine unchanged
        line[3] = re.sub(r'\D', '', line[3])

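(1) One extra line is being parsed that contains only the empty string "". If the file ends with a newline, split('\n') yields a trailing "", and "".split('::') gives [''], which has no index 3; that is what raises the IndexError. To be safe, you can drop any row that does not have exactly the four elements you expect.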

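(2) Some years are parsed incorrectly because non-digit characters are attached to them, such as " or \n. You can clean the year column with a regular expression that removes every non-digit character.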
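
Both issues can be reproduced without the file. The following is a minimal sketch, assuming the raw text ends with a newline and that one year carries a stray carriage return. The sample string `raw` and the `\r` are illustrative assumptions, not taken from your data.

```python
import re

# Hypothetical sample: trailing newline, and a stray '\r' on the second year.
raw = (
    "1::Toy Story::Animation|Children's|Comedy::1995\n"
    "2::Jumanji::Adventure|Children's|Fantasy::1995\r\n"
)

rows = [line.split('::') for line in raw.split('\n')]
# The trailing newline produces a final empty string, which splits to [''].
print(rows[-1])  # [''], so rows[-1][3] would raise IndexError

# (1) keep only rows with exactly four fields
rows = [r for r in rows if len(r) == 4]

# (2) remove every non-digit character from the year field
for r in rows:
    r[3] = re.sub(r'\D', '', r[3])

years = [r[3] for r in rows]
print(years)  # ['1995', '1995']
```

Without step (2), the two years would be counted separately as '1995' and '1995\r'.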

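Since the question asks for the top n years rather than a fixed three, the method could also take n as a parameter and return the result instead of printing it. This is only a sketch: the class name `MovieAnalyzer`, the choice to pass lines in directly instead of a filename, and the third sample row are hypothetical and exist only to keep the example self-contained.

```python
import collections
import re


class MovieAnalyzer:
    """Hypothetical variant of movie_analyzer that takes lines directly."""

    def __init__(self, lines):
        rows = [line.split('::') for line in lines]
        # Keep only well-formed rows, then clean the year field.
        self.lines = [r for r in rows if len(r) == 4]
        for r in self.lines:
            r[3] = re.sub(r'\D', '', r[3])

    def freq_by_year(self, n=3):
        """Return the n most common years as (year, count) pairs."""
        counts = collections.Counter(r[3] for r in self.lines)
        return counts.most_common(n)


sample = [
    "1::Toy Story::Animation|Children's|Comedy::1995",
    "2::Jumanji::Adventure|Children's|Fantasy::1995",
    "3::Hypothetical Film::Drama::1996",  # made-up row for illustration
    "",                                   # empty line, dropped by the filter
]
top = MovieAnalyzer(sample).freq_by_year(1)
print(top)  # [('1995', 2)]
```

Returning the list from Counter.most_common leaves it to the caller to print or process the result.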
