Last year I discovered how easy it can be to trade with the Revolut app. I was playing for a couple of months, in the beginning, I did few transactions but then I was curious if I can make a profit from a higher number of transactions, a kind of day trading. Short story, I realized that day trading is not for me. Now, in 2021, I had to calculate my realized capital gains from 2020 trading to report for tax.

If you have few trading transactions then is really easy to compute the realized capital gains, you just need to subtract from the selling price the buying price, for example:

you buy 10 MSFT shares for 1800$ (180$ for each share)

you are selling all of them for 2200 (220$ each)

to compute the realized capital gain: 2200$ - 1800$ = 400$

But if you bought n1 shares then you sold k1 (k1 < n1) then you bought another n2, and then you sold k2 (k2 < n2 + n1 - k1) shares and so on it becomes a bit hard to take track of all of them and to compute the realized capital gains. Here is an example:

you buy 2 MSFT shares for 360$(180$ each)

you buy 5 MSFT shares for 1000$( 200$ each), at this point, you have 7 MSFT (~194.286$ each, you have to use the weighted average)

you are selling 4 for 796$ (199$ each), you still have 3 more shares (~194.286$ each), you have a capital gain of (199$ - 194.286$) * 4 = 18.856 $

you are buying 10 shares for 2200$ (220$ for each), you have 13 shares (~214.07$ each)

you are selling 5 for 1150$(230$ each), you have 8 shares (~214.07$ each), and a total realized capital gain of (230$ - 214.07$) * 5 + 18.856$ ~= 98.51$

After this example imagine that you do not have only one stock but 20-30 and for each stock, you did form 2 to 20 transactions. It will take a lot of time if you want to compute the realized capital gains by hand or even in Excel, only to get the data by hand and to double-check yourself that you did it correctly it may take hours if not an entire day. This was my scenario, so I decided to use a Jupyter notebook and Pandas in order to automate Revolut trading tax calculation.

This picture perfectly expresses my feelings after I calculated my Revolut trading tax with Pandas in less than 5 minutes (obviously, after I did spend a few hours writing and testing my code). If you want to become a Kung Fu Panda tax warrior you must follow this post until the end.

**From where do you take your data (the transaction history)?**

**In order to get your **transaction** history, you have to download your trading account statements. Link **__here__** if you do not know where to look.**

**Once you downloaded all your **statements** you have to convert them from pdf to **CSV**/excel, for simplicity you can do it online if you do not care **too** much about your data or you can do it using **** tabula-py** :

`!pip install tabula-py`

Here is an example of how to convert one statement pdf to CSV:

```
import tabula
df = r"C:\Users\dell\Downloads\August.pdf"
output = r"C:\Users\dell\Downloads\converted_august.csv"
tabula.convert_into(df, output, output_format="csv", stream=True, pages='all')
```

**Getting the right **table

The account statements have more than one type of table, the ones that we want are those that have the "Activity Type" column, the values in these columns are "BUY" and "SELL" (for most of the rows).

To get the tables in padas we will use the "pandas.read_clipboard" method, first you have to select the entire area of the right table and copy:

```
import pandas as pd
# assuming that you started to trade in July
df_july = pd.read_clipboard()
df_july
```

The used data looks like real data but is fake data.

## Data cleaning

### **Keep only the activity type BUY or SELL**

```
activities = {'BUY', 'SELL'}
df_july = df_july[df_july['Activity Type'].isin(activities)].reset_index(drop=True)
df_july
```

**Use only symbols**

```
symbols = {'AMD', 'NCLH', 'OPK', "SM"}
df_july['Symbol / Description'] = [next(s for s in symbols if s in e) for e in df_july['Symbol / Description']]
df_july
```

### **Change column Amount data type to float**

`df_july.dtypes`

```
Trade Date object
Settle Date object
Currency object
Activity Type object
Symbol / Description object
Quantity float64
Price float64
Amount object
dtype: object
```

```
import re
df_july.Amount = [float(re.sub('[,()]', '', n)) for n in df_july.Amount]
df_july
```

**Remove the Settle Date column**

```
df_july.drop('Settle Date', axis=1, inplace=True) # where 1 is the axis number (0 for rows and 1 for columns)
df_july
```

**Change the Trade Date column data type to datetime**

```
df_july['Trade Date']= pd.to_datetime(df_july['Trade Date'])
df_july
```

### **Rename column Symbol / Description into Symbol**

```
df_july.rename(columns={'Symbol / Description': 'Symbol'}, inplace=True)
df_july
```

## Monthly data stats

For each row from each table, we want to compute the average price, keep track of the current "Total amount" at each step, and store in a list the profit or the loss together with the trading day after each sale. The profit or the loss is in $ so to compute the trading tax we have to convert in the local currency, the exchange rate has to be the average exchange rate from the day you had the profit or the loss. This may depend on your country but this rule is applying in my country Romania.

```
from collections import defaultdict
monthly_data = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
```

Do not get "scare" about this variable, this is how data stored by this variable will look:

```
defaultdict(<function __main__.<lambda>()>,
'August': defaultdict(<function __main__.<lambda>.<locals>.<lambda>()>,
{'AMD': defaultdict(int,
{'Avg Price': 68.87360322580645,
'Total Quantity': 31.0}),
'NCLH': defaultdict(int,
{'Avg Price': 13.92942857142857,
'Total Quantity': 70.0}),
'OPK': defaultdict(int,
{'Avg Price': 4.9216037735849065,
'Total Quantity': 530.0}),
'SM': defaultdict(int,
{'Avg Price': 2.89,
'Total Quantity': 15.0}),
'FSLY': defaultdict(int,
{'Avg Price': 0,
'Total Quantity': 0.0,
'Profit': [(Timestamp('2020-08-07 00:00:00'),
360.22)]}),
'NKLA': defaultdict(int,
{'Avg Price': 0,
'Total Quantity': 0.0,
'Profit': [(Timestamp('2020-08-07 00:00:00'),
-4.23)]})})
```

To compute the weighted average at each step (when we make a buy action) we will need a function:

```
def compute_avg_price_total_amount_quantity(symbol, quantity, symbol_data, current_amount):
total_amount = symbol_data['Avg Price'] * symbol_data['Total Quantity'] + current_amount
total_quantity = symbol_data['Total Quantity'] + quantity
symbol_data['Avg Price'] = total_amount / total_quantity
symbol_data['Total Quantity'] = total_quantity
```

To compute the profit/loss at each sell action we will need another function:

```
def compute_profit_update(symbol, quantity, symbol_data, data_time, current_amount):
assert abs(quantity) <= symbol_data['Total Quantity'] , f"{symbol}, {data_time}, quantity = {abs(quantity)}, Total Quantity = {symbol_data['Total Quantity']}"
profit = round(current_amount + symbol_data['Avg Price'] * quantity,5)
symbol_data['Total Quantity'] = symbol_data['Total Quantity'] + quantity
if symbol_data['Total Quantity'] == 0:
symbol_data['Avg Price'] = 0
if isinstance(symbol_data['Profit'], int):
symbol_data['Profit'] = [(data_time, profit)]
else:
symbol_data['Profit'].append((data_time, profit))
```

Now we are ready to compute the monthly stats for July:

```
july_data = monthly_data['july']
for index, row in df_july.iterrows():
symbol = row['Symbol']
symbol_data = july_data[symbol]
if row['Activity Type'] == 'BUY':
compute_avg_price_total_amount_quantity(symbol, row['Quantity'], symbol_data, row["Amount"])
elif row['Activity Type'] == 'BUY':
compute_profit_update(symbol, row['Quantity'], symbol_data, row['Trade Date'], row["Amount"])
symbol, quantity, symbol_data, data_time, current_amount
else:
raise Exception(f'unknown activity type {row["Activity Type"]}')
```

`monthly_data`

```
defaultdict(<function __main__.<lambda>()>,
{'july': defaultdict(<function __main__.<lambda>.<locals>.<lambda>()>,
{'AMD': defaultdict(int,
{'Avg Price': 61.8372238095238,
'Total Quantity': 21.0}),
'NCLH': defaultdict(int,
{'Avg Price': 13.92942857142857,
'Total Quantity': 70.0}),
'OPK': defaultdict(int,
{'Avg Price': 5.32,
'Total Quantity': 20.0}),
'SM': defaultdict(int,
{'Avg Price': 2.89,
'Total Quantity': 15.0})})})
```

As you can see above there is not any profit/loss because no sale was done.

### Reduce the steps to compute monthly data stats

For the next month, we can include all the cleaning data steps into one function:

```
def clean_data(df, symbols):
# keep only wanted activities
activities = {'BUY', 'SELL'}
df = df[df['Activity Type'].isin(activities)].reset_index(drop=True)
# remove bad columns
df.drop([c for c in df.columns if 'Unnamed' in c], axis=1, inplace=True)
# use only symbold
# symbols = {'AMD', 'NCLH', 'OPK', "SM"}
df['Symbol / Description'] = [next(s for s in symbols if s in e) for e in df['Symbol / Description']]
# change Amount data type to float
if df.Amount.dtypes != float:
df.Amount = [float(re.sub('[,()]', '', n)) if isinstance(n, str) else n for n in df.Amount]
# remove the Settle Date columns
df.drop('Settle Date', axis=1, inplace=True)
# change Trade Date column data type into datetime
df['Trade Date'] = pd.to_datetime(df['Trade Date'])
# rename column 'Symbol / Description'
df.rename(columns={'Symbol / Description': 'Symbol'}, inplace=True)
return df
```

August data

`df_august = pd.read_clipboard()`

Clean the data

```
df_august_clean = clean_data(df_august_2,
{'FSLY', 'NKLA', 'IVR',
'NET', 'APA', 'AMD', 'OPK',
'NIO', 'INTC', 'PBF', 'OPK',
'CLNY', 'TXMD', 'GRPN'})
```

`df_august_clean`

**Update monthly data**

```
from copy import deepcopy
monthly_data['August'] = deepcopy(monthly_data['july'])
august_data = monthly_data['August']
for index, row in df_august_2_clean.iterrows():
symbol = row['Symbol']
symbol_data = august_data[symbol]
if row['Activity Type'] == 'BUY':
compute_avg_price_total_amount_quantity(symbol, row['Quantity'], symbol_data, row["Amount"])
elif row['Activity Type'] == 'SELL':
compute_profit_update(symbol, row['Quantity'], symbol_data, row['Trade Date'], row["Amount"])
else:
raise Exception(f'unknown activity type {row["Activity Type"]}')
```

` monthly_data `

```
defaultdict(<function __main__.<lambda>()>,
{'july': defaultdict(<function __main__.<lambda>.<locals>.<lambda>()>,
{'AMD': defaultdict(int,
{'Avg Price': 61.8372238095238,
'Total Quantity': 21.0}),
'NCLH': defaultdict(int,
{'Avg Price': 13.92942857142857,
'Total Quantity': 70.0}),
'OPK': defaultdict(int,
{'Avg Price': 5.32,
'Total Quantity': 20.0}),
'SM': defaultdict(int,
{'Avg Price': 2.89,
'Total Quantity': 15.0})}),
'August': defaultdict(<function __main__.<lambda>.<locals>.<lambda>()>,
{'AMD': defaultdict(int,
{'Avg Price': 68.87360322580645,
'Total Quantity': 31.0}),
'NCLH': defaultdict(int,
{'Avg Price': 13.92942857142857,
'Total Quantity': 70.0}),
'OPK': defaultdict(int,
{'Avg Price': 4.9216037735849065,
'Total Quantity': 530.0}),
'SM': defaultdict(int,
{'Avg Price': 2.89,
'Total Quantity': 15.0}),
'FSLY': defaultdict(int,
{'Avg Price': 0,
'Total Quantity': 0.0,
'Profit': [(Timestamp('2020-08-07 00:00:00'),
360.22)]}),
'NKLA': defaultdict(int,
{'Avg Price': 0,
'Total Quantity': 0.0,
'Profit': [(Timestamp('2020-08-07 00:00:00'),
-4.23)]}),
'IVR': defaultdict(int,
{'Avg Price': 0,
'Total Quantity': 0.0,
'Profit': [(Timestamp('2020-08-07 00:00:00'),
12.8)]}),
'NET': defaultdict(int,
{'Avg Price': 0,
'Total Quantity': 0.0,
'Profit': [(Timestamp('2020-08-07 00:00:00'),
-65.01)]}),
'APA': defaultdict(int,
{'Avg Price': 15.3945,
'Total Quantity': 100.0,
'Profit': [(Timestamp('2020-08-07 00:00:00'),
0.11)]}),
'CLNY': defaultdict(int,
{'Avg Price': 2.453333333333333,
'Total Quantity': 75.0}),
'NIO': defaultdict(int,
{'Avg Price': 13.31,
'Total Quantity': 100.0}),
'INTC': defaultdict(int,
{'Avg Price': 48.5579,
'Total Quantity': 100.0}),
'PBF': defaultdict(int,
{'Avg Price': 9.15,
'Total Quantity': 10.0,
'Profit': [(Timestamp('2020-08-19 00:00:00'),
21.61)]}),
'TXMD': defaultdict(int,
{'Avg Price': 1.67,
'Total Quantity': 10.0}),
'GRPN': defaultdict(int,
{'Avg Price': 29.810000000000002,
'Total Quantity': 10.0})})})
```

Now you can compute as many months that you need. To not get you bored with too much data I will stop bringing more months.

I hope you are not this panda at this moment.

## Getting the statistical USD-RON exchange rate per day

We can get the statistical USD-RON exchange rate per day from the central bank. If you are a Romanian you can take the daily exchange rates from __here__.

Now we need to convert each profit/loss to RON according to the USD-RON exchange rate at the trading date.

After you download the daily statistical series you can use the same "pandas.read_clipboard" method to use the data with pandas.

```
usd_ron = pd.read_clipboard()
usd_ron
```

Get data to datetime format

```
usd_ron['Data'] = pd.to_datetime(usd_ron['Data'], format="%d.%m.%Y")
usd_ron.reset_index(inplace=True, drop=True)
```

To easily manipulate the usd_ron data we will use a dictionary :

```
data_leu = {}
for index, row in usd_ron.iterrows():
data_leu[row['Data']] = row['USD']
```

## Compute the realized capital gains

The realized capital gains for the last year represent the difference between total profit and total loss.

To compute the total profit and total loss you just need the data from the last month (from "monthly_data")

```
all_tranzactions = []
for s, d in august_data.items():
if 'Profit' in d:
all_tranzactions.extend([round(e[1] * data_leu[e[0]]) for e in d['Profit']])
```

```
trade_total_winn = sum(e for e in all_tranzactions if e > 0)
trade_total_lose = sum(e for e in all_tranzactions if e < 0)
```

`trade_total_lose, trade_total_winn, trade_total_winn + trade_total_lose`

`(-283, 1615, 1332)`

So in this example, our realized capital gains are 1332 RON. If you are a Romanian you will need to pay 10% for tax = 133.2 RON.

## Commentaires