Is there a good way to optimize Python performance for big-data processing?

Shared by Byrx.net on 2019-03-23 06:03:47. Comments (415)

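I have a list of 1500 topics and a separate file containing 100 million Weibo posts. For each of the 1500 topics, I want to count how many posts contain it. My code is below, but it takes a very long time to run. Is there any way to optimize it?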

    # `topics` is the list of 1500 topic strings, defined elsewhere.
    with open("largefile") as f:
        for line in f:
            try:
                tweet_time = line.split(',', 3)[2].split()[0]  # post publication time
                tweet = line.split(',', 3)[-1]  # post text
                for topic in topics:
                    topic_items = topic.split()  # a topic may consist of several words
                    isContain = True
                    for item in topic_items:
                        if item not in tweet:
                            isContain = False
                            break
                    if isContain:
                        pass  # this post contains the topic
            except IndexError:
                continue  # skip malformed lines
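One possible direction, as a minimal sketch rather than a definitive answer: the loop above re-splits every topic for every post and splits each line twice. The version below splits each topic once up front, splits each line once, and caches substring checks within a post so a word shared by several topics is tested only once. The function name `count_topic_matches` and the use of `collections.Counter` for the tallies are assumptions made for illustration; the original code only marks the match with `pass`.

```python
from collections import Counter


def count_topic_matches(lines, topics):
    """Count, for each topic, how many posts contain all of its words.

    Hypothetical helper: `lines` is any iterable of raw lines (for example
    an open file object) and `topics` is the list of topic strings.
    """
    # Split each topic into its words once, not once per post.
    topic_terms = [(topic, tuple(topic.split())) for topic in topics]
    counts = Counter()
    for line in lines:
        # Split each line once; the post text is the last of up to 4 fields.
        fields = line.split(",", 3)
        if len(fields) < 3:
            continue  # skip malformed lines, as the original try/except did
        tweet = fields[-1]
        # Cache substring checks for this post, keyed by word.
        seen = {}
        for topic, terms in topic_terms:
            for term in terms:
                hit = seen.get(term)
                if hit is None:
                    hit = seen[term] = term in tweet
                if not hit:
                    break
            else:
                # No word was missing, so this post contains the topic.
                counts[topic] += 1
    return counts
```

It could be called as `with open("largefile") as f: counts = count_topic_matches(f, topics)`. This still does on the order of posts × topics work in the worst case, so for 100 million posts it may also be worth splitting the file into chunks and processing them in parallel; that part is not shown here.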
