Python function exercises

qq1141100952com 2018-03-27 17:32

1: Download the lyrics of an English song, or an English article

love story-taylor swift
we were both young when i first saw you
i close my eyes and the flashback starts
i'm standing there on a balcony in summer air
see the lights, see the party, the ball gowns
see you make your way through the crowd
and say hello, little did i know
that you were romeo, you were throwing pebbles
and my daddy said stay away from juliet
and i was crying on the staircase, begging you please don't go
and i said
romeo take me somewhere we can be alone
i'll be waiting, all there's left to do is run
you'll be the prince and i'll be this princess
it's a love story
baby, just say yes
so i sneak out to the garden to see you
we keep quiet 'cause we're dead if they knew
so close your eyes, escape this town for a little while
oh, oh, oh
'cause you were romeo, i was a scarlet letter
and my daddy said stay away from juliet
but you were everything to me, i was begging you please don't go
and i said
romeo take me somewhere we can be alone
i'll be waiting, all there's left to do is run
you'll be the prince and i'll be the princess
it's a love story
baby, just say yes
romeo save me try to tell me how it feels
this might be stupid boy, but its so real
don't be afraid now we'll get out of this mess
it's a love story
baby, just say yes
i got tired of waiting wondering if you were ever coming around
my faith in you is better
when i met you on the outskirts of town
and i said
romeo save me ive been feeling so alone
ill keep waiting for you but you never come
is this in my head, i don't know what to think
he fell to the ground and pulled out a ring
and said
marry me juliet you'll never have to be alone
i love you and that's all i really know
i talked to your dad you'll pick out a white dress
it's a love story
baby, just say yes
oh, oh, oh
we were both young when i first saw you

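2: Replace all separators such as , . ? ! ' : with spaces

# str is assumed to hold the lyrics from step 1 as a single string
sep = ''';,.?!'''
for i in sep:
    str = str.replace(i, ' ')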
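
As an alternative to calling replace once per separator, the same substitution can be done in a single pass with a translation table. This is only a minimal sketch: the helper name `replace_separators` and the sample sentence are assumptions made for illustration and are not part of the exercise.

```python
def replace_separators(text, sep=";,.?!"):
    """Return text with every character in sep replaced by a space."""
    # str.maketrans pairs each separator character with one space
    table = str.maketrans(sep, " " * len(sep))
    return text.translate(table)


# Hypothetical sample sentence
sample = "And say hello, little did I know!"
print(replace_separators(sample))
```

Apostrophes are left alone here, so contractions such as "don't" stay in one piece; add `'` to `sep` if the exercise requires replacing them as well.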

3: Convert all uppercase letters to lowercase
str = str.lower()

4: Build the word list
str_list = str.split()

5: Count how often each word occurs
str_list = str.split()
print(str_list)

str_dict = {}
for i in str_list:
    str_dict[i] = str_dict.get(i, 0) + 1
# Print each word with its count; unwanted words are removed in step 7
for w in str_dict:
    print(w, str_dict[w])
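
The counting loop above can also be written with `collections.Counter` from the standard library. A minimal sketch, assuming a short made-up word list (`words` is a hypothetical stand-in for `str_list`):

```python
from collections import Counter

# Hypothetical stand-in for str_list from step 4
words = "it's a love story baby just say yes it's a love story".split()

# Counter builds the same word -> count mapping as the dict.get loop
word_counts = Counter(words)

for word, count in word_counts.items():
    print(word, count)
```

A `Counter` returns 0 for a word it has never seen, so lookups do not need the `get(i, 0)` default.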
6: Sort
strList = list(str_dict.items())
strList.sort(key=lambda x: x[1], reverse=True)
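7: Exclude grammatical words: pronouns, articles, conjunctions
exclude = {'the', 'top', 'is', 'while', 'when', 'why'}
for i in exclude:
    str_dict.pop(i, None)  # pop avoids a KeyError when a word never occurs
# Sort again so that the list from step 6 no longer contains the excluded words
strList = sorted(str_dict.items(), key=lambda x: x[1], reverse=True)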

8: Print the 20 most frequent words
for item in strList[:20]:  # slicing also works when there are fewer than 20 words
    print(item)
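
Steps 6 to 8 (sort, exclude, take the top 20) can be folded into one small function. The sketch below is one possible arrangement, not the exercise's required solution; `top_words` and `sample_words` are hypothetical names introduced for illustration.

```python
from collections import Counter


def top_words(words, exclude=(), n=20):
    """Return up to n (word, count) pairs, most frequent first, skipping excluded words."""
    excluded = set(exclude)
    counts = Counter(w for w in words if w not in excluded)
    # most_common sorts by count in descending order and keeps at most n entries
    return counts.most_common(n)


# Hypothetical sample; in the exercise this would be str_list and the exclude set
sample_words = "the prince and the princess and the story".split()
for word, count in top_words(sample_words, exclude={"the"}, n=2):
    print(word, count)
```

Filtering before counting means the excluded words never reach the sorted list, so the order of the exclusion and sorting steps no longer matters.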

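9: Save the text to be analyzed as a UTF-8 encoded file, and obtain the content for the word-frequency analysis by reading that file.
file = open('shuihuzhuan.txt', 'r', encoding='utf-8')
myarticle = file.read()
file.close()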
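
Reading the file inside a `with` block is a common way to make sure it gets closed. The sketch below assumes nothing about the real text file: it writes a small temporary demo file first, and `read_text`, `demo.txt` and the two sample lines are hypothetical.

```python
import os
import tempfile


def read_text(path):
    """Read a UTF-8 encoded text file and return its content as one string."""
    # The with statement closes the file even if read() raises an exception
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Demo with a temporary file so the sketch runs without an existing text file
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("It's a love story\nBaby, just say yes\n")
    demo_text = read_text(demo_path)

print(demo_text.lower().split())
```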

II. Chinese word-frequency statistics: download a long Chinese article.

The code is as follows:

import jieba
file = open("hh.txt", "r", encoding='utf-8')
mynotes = file.read()
file.close()

sep = ''':。,?!;∶ ...“”'''
for i in sep:
    mynotes = mynotes.replace(i, ' ')

mynotes_list = list(jieba.cut(mynotes))

exclude =[' ','\n','你','我','他','和','但','了','的','来','是','去','在','上','高']

mynotes_dict={}
for w in mynotes_list:
    mynotes_dict[w] = mynotes_dict.get(w,0)+1

# Remove the excluded words
for w in exclude:
    mynotes_dict.pop(w, None)  # pop avoids a KeyError if w never occurs
for w in mynotes_dict:
    print(w, mynotes_dict[w])

# Sort by frequency, highest first
dictList = list(mynotes_dict.items())
dictList.sort(key=lambda x: x[1], reverse=True)
print(dictList)

# Print the 20 most frequent words
for i in range(20):
    print(dictList[i])

# Write the 20 most frequent words to a text file
outfile = open("mytop20.txt", "a")
for i in range(20):
    outfile.write(dictList[i][0] + " " + str(dictList[i][1]) + "\n")
outfile.close()
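
The exclude, sort and write steps above can be sketched as a single function that works on an already-segmented token list. Everything named here is an assumption for illustration: `write_top_words`, the sample tokens and the output file name are hypothetical, and the tokens are written by hand rather than produced by `jieba.cut`, so the sketch runs without the library or the article file.

```python
import os
import tempfile
from collections import Counter


def write_top_words(tokens, exclude, path, n=20):
    """Count tokens, skip excluded ones, and write the n most frequent to path."""
    excluded = set(exclude)
    counts = Counter(t for t in tokens if t not in excluded)
    top = counts.most_common(n)
    # Mode "w" overwrites, so repeated runs do not append duplicate lines
    with open(path, "w", encoding="utf-8") as out:
        for word, count in top:
            out.write(f"{word} {count}\n")
    return top


# Hypothetical, hand-written tokens; in the article they come from jieba.cut
tokens = ["学习", "的", "编程", "学习", " ", "的", "语言", "编程", "学习"]
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "top.txt")
    result = write_top_words(tokens, exclude=[" ", "的"], path=out_path, n=2)
    with open(out_path, encoding="utf-8") as f:
        written = f.read()

print(result)
```

Opening the output in "w" mode is a deliberate difference from the "a" mode used above: appending adds another 20 lines on every run.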

  

 

 
