pandas is a great library that, among other things, allows you to easily interact with tabular data. We’ll leverage this to compute the weighted average price for a number of stocks.
Weighted Average Price
If we had sold the following list of items,
Quantity | Price |
---|---|
100 | 5 |
200 | 4 |
700 | 2 |
the weighted average price would be defined as (100*5 + 200*4 + 700*2)/1000
. In other words - given 1000 items, this is the unit price which would have been paid by a customer on average.
In this example, that’s 2.7
.
Sourcing the data
Imagine we were given the table below (forget about the nomenclature for the moment):
RIC,Side,Qty,Prc
EBAY.QQ,B,100,67.4
EBAY.QQ,B,250,68.2
EBAY.QQ,B,200,66.3
VOD.LN,B,1000,100.4
VOD.LN,B,100,100.3
And our requirement is to compute the weighted average price for each (RIC,Side)
tuple. That is, we want to answer the following question: “On average, how much did we spend to buy stock X?”.
Let’s load the data into a DataFrame first:
s = 'RIC,Side,Qty,Prc\n' \
'EBAY.QQ,B,100,67.4\n' \
'EBAY.QQ,B,250,68.2\n' \
'EBAY.QQ,B,200,66.3\n' \
'VOD.LN,B,1000,100.4\n' \
'VOD.LN,B,100,100.3'
#for python2, change this to `from StringIO import StringIO`
from io import StringIO
import pandas
df = pandas.read_csv(StringIO(s))
Yielding:
RIC Side Qty Prc
0 EBAY.QQ B 100 67.4
1 EBAY.QQ B 250 68.2
2 EBAY.QQ B 200 66.3
3 VOD.LN B 1000 100.4
4 VOD.LN B 100 100.3
GroupBy
We can get a weighted price by multiplying Qty with Prc. However for this to be a weighted average price, we need to divide by the total quantity for each category. In essence, we want to use the numbers generated by:
>>> gb = df.groupby(['RIC','Side'])
>>> gb['Qty'].sum()
RIC Side
EBAY.QQ B 550
VOD.LN B 1100
Name: Qty, dtype: int64
But we can’t use this as such - the data from the groupby
and the original dataframe are of two different dimensions (or shapes). This is where pandas’ transform
method comes in handy. Let’s take a look at the output:
>>> gb['Qty'].transform(sum)
0 550
1 550
2 550
3 1100
4 1100
dtype: int64
Now this is something we can work with. We get the aggregated data at the row level of the original dataframe. For clarity, let’s add this in:
>>> df['TotalQty'] = gb['Qty'].transform(sum)
>>> df
RIC Side Qty Prc TotalQty
0 EBAY.QQ B 100 67.4 550
1 EBAY.QQ B 250 68.2 550
2 EBAY.QQ B 200 66.3 550
3 VOD.LN B 1000 100.4 1100
4 VOD.LN B 100 100.3 1100
We can then compute the weighted average price by Qty*Prc/TotalQty
:
>>> df['WeightedAvgPrc'] = df['Qty'] * df['Prc'] / df['TotalQty']
>>> df
RIC Side Qty Prc TotalQty WeightedAvgPrc
0 EBAY.QQ B 100 67.4 550 12.254545
1 EBAY.QQ B 250 68.2 550 31.000000
2 EBAY.QQ B 200 66.3 550 24.109091
3 VOD.LN B 1000 100.4 1100 91.272727
4 VOD.LN B 100 100.3 1100 9.118182
And finish up by aggregating on RIC,Side
:
>>> df.groupby(['RIC','Side']).aggregate(sum)
Qty Prc TotalQty WeightedAvgPrc
RIC Side
EBAY.QQ B 550 201.9 1650 67.363636
VOD.LN B 1100 200.7 2200 100.390909
But wait - summing Prc
and TotalQty
doesn’t make sense. Let’s only grab the columns we need:
>>> df.groupby(['RIC','Side']).aggregate(sum)[['Qty','WeightedAvgPrc']]
Qty WeightedAvgPrc
RIC Side
EBAY.QQ B 550 67.363636
VOD.LN B 1100 100.390909
Small caveat - due to the groupby
, the index is now (rightfully) a tuple of RIC
and Side
. If that’s unwanted, you can use reset_index
.