pandas is a great library that, among other things, allows you to easily interact with tabular data. We’ll leverage this to compute the weighted average price for a number of stocks.

Weighted Average Price

If we had sold the following list of items,

Quantity Price
100 5
200 4
700 2

the weighted average price would be defined as (100*5 + 200*4 + 700*2)/1000. In other words, given the 1000 items sold in total, this is the average unit price a customer would have paid.

In this example, that’s 2.7.
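
As a quick sanity check, here’s that arithmetic spelled out in plain Python (using the quantities and prices from the table above):

quantities = [100, 200, 700]
prices = [5, 4, 2]

# Total amount paid divided by total quantity sold.
total_paid = sum(q * p for q, p in zip(quantities, prices))  # 2700
total_qty = sum(quantities)                                  # 1000
print(total_paid / float(total_qty))                         # 2.7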

Sourcing the data

Imagine we were given the table below (forget about the nomenclature for the moment):

RIC,Side,Qty,Prc
EBAY.QQ,B,100,67.4
EBAY.QQ,B,250,68.2
EBAY.QQ,B,200,66.3
VOD.LN,B,1000,100.4
VOD.LN,B,100,100.3

Our requirement is to compute the weighted average price for each (RIC, Side) tuple. That is, we want to answer the following question: “On average, how much did we pay per share when buying stock X?”.

Let’s load the data into a DataFrame first:

s = 'RIC,Side,Qty,Prc\n' \
    'EBAY.QQ,B,100,67.4\n' \
    'EBAY.QQ,B,250,68.2\n' \
    'EBAY.QQ,B,200,66.3\n' \
    'VOD.LN,B,1000,100.4\n' \
    'VOD.LN,B,100,100.3'

#for python2, change this to `from StringIO import StringIO`
from io import StringIO

import pandas
df = pandas.read_csv(StringIO(s))

Yielding:

       RIC Side   Qty    Prc
0  EBAY.QQ    B   100   67.4
1  EBAY.QQ    B   250   68.2
2  EBAY.QQ    B   200   66.3
3   VOD.LN    B  1000  100.4
4   VOD.LN    B   100  100.3

GroupBy

We can get a weighted price by multiplying Qty by Prc. However, for this to become a weighted average price, we need to divide by the total quantity of each group. In essence, we want to use the numbers generated by:

>>> gb = df.groupby(['RIC','Side'])
>>> gb['Qty'].sum()
RIC      Side
EBAY.QQ  B        550
VOD.LN   B       1100
Name: Qty, dtype: int64

But we can’t use this as such - the groupby result has one row per (RIC, Side) group, while the original dataframe has one row per trade, so the two are of different shapes. This is where pandas’ transform method comes in handy: it returns the aggregated values aligned back to the original index. Let’s take a look at the output:

>>> gb['Qty'].transform(sum)
0     550
1     550
2     550
3    1100
4    1100
dtype: int64

Now this is something we can work with: the aggregated data comes back at the row level of the original dataframe. For clarity, let’s add it in as a new column:

>>> df['TotalQty'] = gb['Qty'].transform(sum)
>>> df
       RIC Side   Qty    Prc  TotalQty
0  EBAY.QQ    B   100   67.4       550
1  EBAY.QQ    B   250   68.2       550
2  EBAY.QQ    B   200   66.3       550
3   VOD.LN    B  1000  100.4      1100
4   VOD.LN    B   100  100.3      1100
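
As a side note, transform also accepts the aggregation by name, which is the spelling you’ll more commonly see in recent pandas versions; the following is equivalent to the call above:

>>> df['TotalQty'] = gb['Qty'].transform('sum')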

We can then compute each row’s contribution to the weighted average price as Qty*Prc/TotalQty:

>>> df['WeightedAvgPrc'] = df['Qty'] * df['Prc'] / df['TotalQty']
>>> df
       RIC Side   Qty    Prc  TotalQty  WeightedAvgPrc
0  EBAY.QQ    B   100   67.4       550       12.254545
1  EBAY.QQ    B   250   68.2       550       31.000000
2  EBAY.QQ    B   200   66.3       550       24.109091
3   VOD.LN    B  1000  100.4      1100       91.272727
4   VOD.LN    B   100  100.3      1100        9.118182

Summing these contributions within each group gives the weighted average price, so we finish up by aggregating on (RIC, Side):

>>> df.groupby(['RIC','Side']).aggregate(sum)
               Qty    Prc  TotalQty  WeightedAvgPrc
RIC     Side
EBAY.QQ B      550  201.9      1650       67.363636
VOD.LN  B     1100  200.7      2200      100.390909

But wait - summing Prc and TotalQty doesn’t make sense. Let’s only grab the columns we need:

>>> df.groupby(['RIC','Side']).aggregate(sum)[['Qty','WeightedAvgPrc']]
               Qty  WeightedAvgPrc
RIC     Side
EBAY.QQ B      550       67.363636
VOD.LN  B     1100      100.390909
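
As a sanity check, the EBAY.QQ figure matches a direct application of the definition from the first section:

>>> round((100*67.4 + 250*68.2 + 200*66.3) / 550, 6)
67.363636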

Small caveat - due to the groupby, the index is now (rightfully) a tuple of RIC and Side. If that’s unwanted, you can use reset_index.
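
For example (a minimal sketch - `result` is just an illustrative name for the table above):

>>> result = df.groupby(['RIC','Side']).aggregate(sum)[['Qty','WeightedAvgPrc']]
>>> result.reset_index()
       RIC Side   Qty  WeightedAvgPrc
0  EBAY.QQ    B   550       67.363636
1   VOD.LN    B  1100      100.390909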