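How to Create a Waterfall Chart in Python
The Complete Script
The basic chart works, but I wanted to add some labels and make a few small formatting changes. Here is my final script: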
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.ticker import FuncFormatter

    # Use Python 2.7+ syntax to format currency
    def money(x, pos):
        'The two args are the value and tick position'
        return "${:,.0f}".format(x)

    formatter = FuncFormatter(money)

    # Data to plot. Do not include a total; it will be calculated
    index = ['sales', 'returns', 'credit fees', 'rebates', 'late charges', 'shipping']
    data = {'amount': [350000, -30000, -7500, -25000, 95000, -7000]}

    # Store data and create a blank series to use for the waterfall
    trans = pd.DataFrame(data=data, index=index)
    blank = trans.amount.cumsum().shift(1).fillna(0)

    # Get the net total number for the final element in the waterfall
    total = trans.sum().amount
    trans.loc["net"] = total
    blank.loc["net"] = total

    # The steps graphically show the levels and are also used for label placement
    step = blank.reset_index(drop=True).repeat(3).shift(-1)
    step.iloc[1::3] = np.nan

    # When plotting the last element, we want to show the full bar,
    # so set the blank to 0
    blank.loc["net"] = 0

    # Plot and label
    my_plot = trans.plot(kind='bar', stacked=True, bottom=blank, legend=None,
                         figsize=(10, 5), title="2014 Sales Waterfall")
    my_plot.plot(step.index, step.values, 'k')
    my_plot.set_xlabel("Transaction Types")

    # Format the axis for dollars
    my_plot.yaxis.set_major_formatter(formatter)

    # Get the y-axis position for the labels
    y_height = trans.amount.cumsum().shift(1).fillna(0)

    # Get an offset so labels don't sit right on top of the bar
    # (scalar maximum of the column; avoids shadowing the built-in max)
    max_amount = trans.amount.max()
    neg_offset = max_amount / 25
    pos_offset = max_amount / 50
    plot_offset = int(max_amount / 15)

    # Start label loop
    loop = 0
    for label, row in trans.iterrows():
        # For the last item in the list, we don't want to double count
        if label == "net":
            y = y_height.iloc[loop]
        else:
            y = y_height.iloc[loop] + row['amount']
        # Determine if we want a neg or pos offset
        if row['amount'] > 0:
            y += pos_offset
        else:
            y -= neg_offset
        my_plot.annotate("{:,.0f}".format(row['amount']), (loop, y), ha="center")
        loop += 1

    # Scale up the y axis so there is room for the labels
    my_plot.set_ylim(0, blank.max() + int(plot_offset))

    # Rotate the labels
    my_plot.set_xticklabels(trans.index, rotation=0)
    my_plot.get_figure().savefig("waterfall.png", dpi=200, bbox_inches='tight')
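The part of the script that is easiest to misread is the `blank` series, which decides where each bar starts. As a rough illustration, the same arithmetic can be written with only the standard library. The names `running`, `bottoms`, and `net` are introduced here for the sketch and do not appear in the script.

```python
from itertools import accumulate

# Same figures as the script above
amounts = [350000, -30000, -7500, -25000, 95000, -7000]

# Running total after each transaction (what cumsum() returns)
running = list(accumulate(amounts))

# Each bar starts where the previous one ended; the first starts at 0.
# This mirrors cumsum().shift(1).fillna(0) in the script.
bottoms = [0] + running[:-1]

# The final "net" bar is drawn from 0 up to the overall total
net = running[-1]

for amount, bottom in zip(amounts, bottoms):
    print(f"{amount:>8,} starts at {bottom:>8,} and ends at {bottom + amount:>8,}")
print(f"net = {net:,}")
```

A negative amount simply produces a bar that hangs down from its starting level, which is why the script needs no special case for returns or rebates.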
Running the script generates a nice-looking chart and saves it as waterfall.png.
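The thin black connector lines between bars come from the `step` series. The following sketch rebuilds it from the starting levels of the bars to show which points end up being plotted. The hard-coded levels are the ones the script computes for its data, and `points` is a name used only in this sketch.

```python
import numpy as np
import pandas as pd

# Bar starting levels from the script, with the net total as the last entry
blank = pd.Series([0, 350000, 320000, 312500, 287500, 382500, 375500],
                  dtype="float64")

# Repeat each level three times, then shift up one position so each group
# of three reads: this level, this level, next level
step = blank.repeat(3).shift(-1)

# Blank out the middle point of each group; matplotlib breaks a line at NaN
step.iloc[1::3] = np.nan

# Pair each y value with its x position (the repeated integer index)
points = list(zip(step.index, step.values))
print(points[:6])
```

The third point of one group and the first point of the next share the same height, so each visible segment is a horizontal line running from one bar to its neighbor at the running total.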
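Final Thoughts
If you were not familiar with waterfall charts before, I hope this example shows you just how useful they can be. I imagine some people will feel that needing this much code for a single chart is a bit much. In some respects, I agree. If you are only making one waterfall chart and will never touch it again, stick with the Excel approach.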
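However, what if the waterfall chart turns out to be really useful and you need to replicate it for 100 customers? What would you do next? Doing this in Excel would be a challenge, whereas using the script in this article to create 100 different charts would be fairly easy. Once again, the real value of this process is that it gives you an easily repeatable procedure when you need to scale the solution.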
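One way to make the script repeatable is to wrap the plotting steps in a function and call it once per customer. The sketch below is only an illustration under some assumptions: `waterfall_chart` and the `customers` dictionary are hypothetical names, the labels and connector lines are left out for brevity, and the single entry reuses the figures from the script. A real run would load one entry per customer from wherever the data lives.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd


def waterfall_chart(amounts, title):
    """Draw a basic waterfall chart from a dict of {label: amount}.

    Hypothetical helper, not part of the original script.
    """
    trans = pd.DataFrame({"amount": pd.Series(amounts, dtype="float64")})
    # Each bar starts where the previous running total ended
    blank = trans["amount"].cumsum().shift(1).fillna(0)
    # Append the net total as a final bar that starts from zero
    trans.loc["net"] = trans["amount"].sum()
    blank.loc["net"] = 0
    ax = trans.plot(kind="bar", stacked=True, bottom=blank.values,
                    legend=None, figsize=(10, 5), title=title)
    ax.set_xticklabels(trans.index, rotation=0)
    return ax


# Figures taken from the script above; more customers would be added here
customers = {
    "customer_a": {"sales": 350000, "returns": -30000, "credit fees": -7500,
                   "rebates": -25000, "late charges": 95000, "shipping": -7000},
}

output_dir = Path(tempfile.mkdtemp())  # throwaway folder for this sketch
saved = []
for name, amounts in customers.items():
    ax = waterfall_chart(amounts, title=f"2014 Sales Waterfall: {name}")
    target = output_dir / f"waterfall_{name}.png"
    ax.get_figure().savefig(target, dpi=200, bbox_inches="tight")
    plt.close(ax.get_figure())  # free memory when looping over many charts
    saved.append(target)
```

Closing each figure after saving matters once the loop covers many customers, since matplotlib otherwise keeps every figure in memory.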
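I am really enjoying learning more about pandas, matplotlib, and IPython. I am glad if this approach helps you, and I hope others can learn something from it and apply this lesson to their day-to-day work.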