Chinese Word Frequency Counting in Python with the jieba Library: Romance of the Three Kingdoms as an Example

Word frequency count

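# CalThreeKingdoms.py
import jieba

# Read the whole novel; the with-block closes the file afterwards.
with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()

words = jieba.lcut(txt)    # jieba.lcut segments the text and returns a list of words
counts = {}                # dictionary mapping each word to its number of occurrences
for word in words:
    if len(word) == 1:     # skip single-character tokens (punctuation, particles)
        continue
    counts[word] = counts.get(word, 0) + 1

items = list(counts.items())                    # convert to a list of (word, count) pairs
items.sort(key=lambda x: x[1], reverse=True)    # most frequent first
for word, count in items[:15]:                  # slicing also copes with fewer than 15 words
    print("{0:<10}{1:>5}".format(word, count))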

RESTART: C:/Users/QinJX/AppData/Local/Programs/Python/Python37-32/python编程学习/CalThreeKingdoms.py
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\QinJX\AppData\Local\Temp\jieba.cache
Loading model cost 1.109 seconds.
Prefix dict has been built succesfully.
曹操          953
孔明          836
将军          772
却说          656
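
The top of the list mixes character names (曹操, 孔明) with generic words such as 将军 ("general") and 却说 (a narrative opener). One possible refinement, sketched here as an assumption rather than as part of the original script, is to count with `collections.Counter` and skip a hand-picked set of words. The function name `top_words`, its `excluded` parameter, and the sample token list are all hypothetical; in the real script the tokens would come from `jieba.lcut(txt)`.

```python
from collections import Counter


def top_words(tokens, n=15, excluded=frozenset()):
    """Return the n most frequent multi-character tokens as (word, count) pairs."""
    counts = Counter(
        tok for tok in tokens
        if len(tok) > 1 and tok not in excluded   # same length filter as the script
    )
    return counts.most_common(n)                  # sorted by count, highest first


# Hand-written tokens standing in for the result of jieba.lcut(txt).
sample_tokens = ["曹操", "曰", "孔明", "曹操", "却说", "将军", "曹操", "孔明", "，"]

# Hypothetical exclusion list; which words to drop is a judgment call.
for word, count in top_words(sample_tokens, n=3, excluded={"却说", "将军"}):
    print("{0:<10}{1:>5}".format(word, count))
```

`Counter.most_common` replaces the manual dictionary, list conversion, and sort, and it simply returns fewer pairs when the text contains fewer than `n` distinct words.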
