Parsing Bank Statements in Python

money

A few weeks ago, I mentioned my new way of tracking my budget and also figured that there’s one gaping hole: importing bank statements. This is part 1, where I create a simple parser for bank statements provided by my bank in CSV format.

The exported statements follow a very easy format that, I presume, mirrors the more common OFX format, used by Microsoft Money and others. The CSV file has its data ordered in 8 columns and even includes a (Dutch for me) header, so there’s no possible confusion. The format looks as follows (I removed any identifying items of course):

"JAAR + REFERTE";"UITVOERINGSDATUM";"VALUTADATUM";"BEDRAG";"MUNT V/D REKENING";"TEGENPARTIJ VAN DE VERRICHTING";"DETAILS";"REKENINGNUMMER"
"2010-0245";"12/07/2010";"12/07/2010";"-9,99";"EUR";"XXX-XXXXXXXX-XX";"REMOVED TO PROTECT MY INNOCENCE";"XXX-XXXXXXXX-XX"
"2010-0244";"12/07/2010";"12/07/2010";"-100,00";"EUR";"DIRECT DEBIT";"REMOVED TO PROTECT MY INNOCENCE";"XXX-XXXXXXXX-XX"
"2010-0243";"12/07/2010";"10/07/2010";"+100,00";"EUR";"XXX-XXXXXXXX-XX";"REMOVED TO PROTECT MY INNOCENCE";"XXX-XXXXXXXX-XX"
"2010-0242";"12/07/2010";"11/07/2010";"-5,00";"EUR";"CASH WITHDRAWEL";"REMOVED TO PROTECT MY INNOCENCE";"XXX-XXXXXXXX-XX"
"2010-0241";"12/07/2010";"12/07/2010";"-1000,00";"EUR";"OTHER";"REMOVED TO PROTECT MY INNOCENCE";"XXX-XXXXXXXX-XX"

Depending on your bank, the header will most likely be in another language and could be slightly different.

The parser looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
#!/usr/bin/env python3
# www.streamhead.com
 
import collections
import sys
 
ID, EXECUTION_DATE, VALUE_DATE, AMOUNT, CURRENCY, COUNTERPARTY, DETAILS, ACCOUNT_NUMBER = range(8)
SEPERATOR = ';'
 
Transaction = collections.namedtuple("Transaction",
                "id executionDate valueDate amount currency counterparty details accountNumber")
 
def process_line(line):
    fields = []
    field = ""
    quote = None
    for c in line:
        if c in "\"'":
            if quote is None:
                quote = c
            elif quote == c:
                quote = None
            else:
                field += c
            continue
        if quote is None and c == SEPERATOR:
            fields.append(field)
            field = ""
        else:
            field += c
    if field:
        fields.append(field)
    return Transaction(fields[ID], fields[EXECUTION_DATE],
                       fields[VALUE_DATE], fields[AMOUNT],
                       fields[CURRENCY], fields[COUNTERPARTY],
                       fields[DETAILS], fields[ACCOUNT_NUMBER])
 
def print_transactions(transactions):
    for transaction in transactions:
        print("{0.id} on {0.executionDate}: {0.currency} {0.amount} for {0.counterparty}".format(transaction))
 
def main():
    if len(sys.argv) == 1:
        sys.argv.insert(1, "data\\TEST.CSV")
    filename = sys.argv[1]
    print("loading file {}".format(filename))
    transactions = []
    for line in list(open(filename))[1:]:
        line = line.rstrip()
        if line:
            transactions.append(process_line(line))
    transactions.sort(key=lambda t: t.id)
    print_transactions(transactions)
 
main()

By default it will take a testfile I have in a “data” subdirectory, but it can also get the filename from the command line. And that’s really all there is to parsing in Python (I suppose it can be made even simpler, by using the csv module, but this is a Python exercise for me).

Next up: actually doing something with the data.

(image credit)

This entry was posted in New Media and the World. Bookmark the permalink. Both comments and trackbacks are currently closed.

2 Trackbacks

  1. By The Perfect Programming Language on September 1, 2010 at 6:29 pm

    [...] radio guide « Parsing Bank Statements in Python Spring, JDO and transactions on Google App Engine [...]

  2. By Review: Programming in Python 3 on December 14, 2010 at 4:04 pm

    [...] months later, I’m ashamed to admit, I haven’t really grown out of the “toy project” category yet. There are many reasons, but most importantly, it does take some real effort to [...]

  • Feedback or questions? Contact me right away.

    Comments have been disabled on my posts. Not because I don't want to hear from you, but because they were adding very little to the conversation (most of them were spam anyway). I do listen to you and try to keep as much posts as possible up-to-date and error free. So if you have a question, if something isn't working the way you hoped or you have general feedback, please use the contact form below. I guarantee an answer to every honest question or remark.
  • Get in touch
    1. (required)
    2. (valid email required)
     

    cforms contact form by delicious:days