ledgerify - format of rules files
Rules files store transformations which ledgerify applies to convert
transactions to the Ledger Format (see below). They are Python files which
mix options, which instruct ledgerify about properties of the input and output
formats, with actual rules, which transform transactions. Global options can be
changed on a per-run basis by passing -c option=value to ledgerify.
After transformations, transactions must have at least a
date and one posting. This allows ledgerify to perform a
normalisation of transactions (see the NORMALISATION section).
To see what a complete rules file looks like, see the
EXAMPLES section at the bottom of this page.
The output produced by ledgerify is in the Ledger Format, which is made of
transactions:
2022-11-12 Transaction title
    expenses:food    -10.20 EUR
    assets:cash       10.20 EUR
A transaction is made of the following parts:
•date: 2022-11-12
•description: Transaction title
•postings: entries, each of which consists of an account, an amount and a commodity
•account: expenses:food and assets:cash
•amount: -10.20 and 10.20
•commodity: EUR
To change configuration options, ledgerify exposes a config variable, or
its shorthand c. Options are changed by modifying its fields. For
example, the code below sets the location of the input file for this particular
ruleset and the date format used in it:
c.source = "~/bank.csv"
c.dateformat = "%Y/%m/%d"
The following sections list all available configuration options,
together with example values which can be assigned to them.
config.source = "~/bank.csv"
Path to the file from which ledgerify will read
transactions. This can be one of the following:
•absolute file path (optionally starting with a
tilde "~" for the home directory of the current user)
•relative file path, which ledgerify will search
for in the following locations:
•$XDG_DOWNLOAD_DIR, or ~/Downloads
if XDG_DOWNLOAD_DIR isn't set
•the directory in which the main rules file is located.
The main rules file is the one specified in the ledgerify invocation.
•"-": ledgerify will read a list of
transactions from the standard input
Default: not set
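The lookup order for config.source can be sketched as follows. resolve_source is a hypothetical helper, not part of ledgerify; it assumes that the first existing candidate wins and that a relative path found in neither location is an error.

```python
import os
from pathlib import Path


def resolve_source(source, main_rules_file):
    # Hypothetical helper modelling the lookup order described above.
    if source == "-":
        return "-"  # read transactions from standard input
    path = Path(source).expanduser()  # "~" -> home directory of the current user
    if path.is_absolute():
        return path
    downloads = os.environ.get("XDG_DOWNLOAD_DIR") or os.path.expanduser("~/Downloads")
    rules_dir = Path(main_rules_file).resolve().parent
    for directory in (Path(downloads), rules_dir):
        candidate = directory / path
        if candidate.exists():
            return candidate  # assumption: the first existing candidate wins
    raise FileNotFoundError(source)
```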
config.filetype = "csv"
File type of
config.source in case it cannot be
deduced from its extension, for example when transactions are read from
standard input.
Default: not set
config.decimalmark = "."
Symbol which separates the integer part from the
fractional part of posting amounts in the input file.
Default: "."
config.dateformat = "%Y-%m-%d"
Format of dates in the input file. This accepts the
format codes required by the 1989 C standard. See
strptime(3) and
date(1) for the list of accepted codes.
config.dateformat doesn't affect dates in config.default or dates
seen by the rules and
conditions; they should always follow the ISO 8601 format
(YYYY-MM-DD).
Default: any ISO 8601 datetime format
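Python's datetime.strptime accepts the same C89 format codes, so it can illustrate how a config.dateformat value is interpreted. This is only an illustration; ledgerify's own parsing code isn't shown here.

```python
from datetime import datetime

dateformat = "%Y/%m/%d"  # e.g. c.dateformat = "%Y/%m/%d"
parsed = datetime.strptime("2022/11/10", dateformat)
# Only the date component is kept; it is then written in ISO 8601 form.
print(parsed.date().isoformat())  # 2022-11-10
```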
config.timezone = "UTC"
Ledgerify uses this field to declare the time zone of input
datetimes which don't contain this information (that is, when
config.dateformat doesn't use the
%Z format code). Ledgerify will
use this information when converting input datetimes to system-local dates.
Having the time zone helps prevent off-by-one dates in the output
transactions.
config.timezone only accepts valid IANA TZ identifiers,
like "UTC", "CET", "Etc/GMT+8" or
"Europe/Budapest". For a list of identifiers see:
https://en.wikipedia.org/wiki/List_of_tz_database_time_zones, or you
can use the following Python snippet to list time zones available on your
system:
>>> import zoneinfo
>>> zoneinfo.available_timezones()
Default: not set
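A minimal sketch of the off-by-one problem, assuming ledgerify attaches config.timezone to datetimes without time zone information and then converts them to the local zone. to_local_date is a hypothetical helper, and fixed-offset timezone objects stand in for IANA zones so the snippet runs without a time zone database; in a rules file you'd name the zone instead, e.g. "UTC".

```python
from datetime import datetime, timedelta, timezone


def to_local_date(naive, source_tz, local_tz):
    # Hypothetical helper: interpret a naive datetime in source_tz
    # (the role of config.timezone), then convert it to local_tz.
    aware = naive.replace(tzinfo=source_tz)
    return aware.astimezone(local_tz).date()


# 23:30 UTC is already the next day in a UTC+1 zone, such as CET in winter.
utc = timezone.utc
utc_plus_1 = timezone(timedelta(hours=1))
print(to_local_date(datetime(2022, 11, 10, 23, 30), utc, utc_plus_1))  # 2022-11-11
```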
config.extractlatest = r".*"
Regular expression which extracts a part of the transaction
description from incoming transactions and from processed transactions stored
in the latest file (see:
--latest in
ledgerify(1)). When checking
whether a transaction should be excluded, ledgerify will compare only the
extracted parts instead of full transaction descriptions.
Instead of using a regular expression, it's possible to set
config.extractlatest to a transformation function, which
accepts the full description and returns the extracted part.
Setting config.extractlatest to None disables
matching of transaction descriptions entirely.
The purpose of this option is to give some flexibility when
dealing with financial institutions which change descriptions of
transactions, for example after booking them.
Default: ".*" (whole description)
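The effect of config.extractlatest can be modelled as below. extract is a hypothetical helper; it assumes the regular expression is applied with re.search and that the whole match is kept, which may differ from ledgerify's internals. The pattern is the one used in the EXAMPLES section.

```python
import re


def extract(description, extractlatest=r".*"):
    # Hypothetical model of config.extractlatest, not ledgerify's code.
    if callable(extractlatest):
        return extractlatest(description)  # transformation function
    match = re.search(extractlatest, description)
    return match.group(0) if match else description


pattern = r".+?(?=\s+NOT BOOKED|$)"
# Both forms of the description reduce to the same extracted part.
print(extract("Shop Foobar  NOT BOOKED", pattern))  # Shop Foobar
print(extract("Shop Foobar", pattern))              # Shop Foobar
```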
By setting config.default it's possible to fill in the values missing from
the parsed input file. It's also possible to instruct ledgerify to
automatically create postings.
config.default.date = "2022-11-10"
Default date assigned to the parsed transactions when
input data doesn't have one. These dates are unaffected by
config.dateformat and should always follow the ISO 8601 format
(YYYY-MM-DD).
Default: current date
config.default.description = "Description"
Default description assigned to the parsed transactions
when input data doesn't have one.
Default: not set
config.default.account = "assets:checking"
Default account assigned to the newly created postings.
Assigning a name to
config.default.account doesn't create a new posting
with that account name. If each transaction should have a posting with a
specific account name, create it by assigning the name to that
specific posting's account:
config.default.account[1] = "income:salary"
config.default.account[2] = "assets:checking"
The same technique works for other parts of a posting
(amount and commodity).
Default: not set
config.default.amount = "10.00" | 10
Default amount assigned to the newly created postings.
See the
config.default.account on how to always create a posting with a
specific amount.
When you're dealing with fractional amounts, it's advised to
set the amount as a string ("10.00", with quotes) instead of a
floating point number (10.0, without quotes). IEEE 754 floating point
numbers cannot represent most decimal fractions exactly, so they are not suitable
for use with currencies. Internally ledgerify converts all amounts to the
decimal type, which is suitable for dealing with money, but converting from
floating point numbers may lead to rounding errors.
Default: not set
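The difference is easy to see with Python's decimal module directly; this snippet doesn't use ledgerify at all.

```python
from decimal import Decimal

# A float literal carries its binary rounding error into Decimal...
print(Decimal(10.2) == Decimal("10.2"))  # False
# ...while a string is converted exactly.
print(Decimal("10.20"))                  # 10.20

print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```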
config.default.commodity = "USD"
Default commodity assigned to the newly created postings.
See the
config.default.account on how to always create a posting with a
specific commodity.
Default: not set
config.default.symbolside = "left" |
"right" | "leftjoin" | "rightjoin"
Tells ledgerify on which side of the amount it should put
the commodity symbol, and whether to put a space between the two or not.
•left: $ 10.00, EUR 10.00, PLN -10.00
•right: 10.00 $, 10.00 EUR, -10.00 PLN
•leftjoin: $10.00, EUR10.00, PLN-10.00
•rightjoin: 10.00$, 10.00EUR, -10.00PLN
Default: right
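The four layouts can be reproduced with a small function. format_amount is hypothetical and not part of ledgerify's API; it assumes the amount keeps the precision of its Decimal value.

```python
from decimal import Decimal


def format_amount(amount, commodity, symbolside="right"):
    # Hypothetical helper mirroring the config.default.symbolside values above.
    number = str(amount)  # Decimal keeps its precision, e.g. "10.00"
    layouts = {
        "left": f"{commodity} {number}",
        "right": f"{number} {commodity}",
        "leftjoin": f"{commodity}{number}",
        "rightjoin": f"{number}{commodity}",
    }
    return layouts[symbolside]


print(format_amount(Decimal("-10.00"), "PLN", "leftjoin"))  # PLN-10.00
```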
config.csv options instruct ledgerify how to parse and interpret CSV
files. They are used to create the initial transactions, which are passed to
the rules for further processing.
config.csv.fields = ["tags",
"account[2]"]
List of field names which are mapped, in order, to the columns of the CSV
file. They can be any strings, which will be exposed in transactions passed to
the
rules and
conditions. However, ledgerify uses only specific
names when it produces the output (see:
Ledger Format).
To assign a part of a posting to a specific posting, append
the number of that posting in brackets, for example "account[1]",
"commodity[2]" etc.
For example, the following CSV file:
# date, amount, account, description
2022-11-10, 10 EUR, checking, Groceries
can be interpreted by the following config.csv.fields:
config.csv.fields = ["date", "amount[1]", "account[1]", "description"]
As a part of input preprocessing, ledgerify will also
automatically detect the presence of commodity in the amount
field and will separate it into commodity[1] (see: INPUT
PREPROCESSING section).
Default: not set
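The mapping of columns to field names can be sketched with Python's csv module. map_rows is a hypothetical helper, not ledgerify's implementation; skipinitialspace=True is an assumption made so that the spaces after commas in the example above are dropped.

```python
import csv
import io


def map_rows(text, fields):
    # Pair each column with the name at the same position in config.csv.fields.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(fields, row)) for row in reader]


fields = ["date", "amount[1]", "account[1]", "description"]
print(map_rows("2022-11-10, 10 EUR, checking, Groceries\n", fields))
# [{'date': '2022-11-10', 'amount[1]': '10 EUR', 'account[1]': 'checking', 'description': 'Groceries'}]
```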
config.csv.skip = 0
Many CSV files obtained from banks contain a lot of metadata
before the actual transaction data begins.
The config.csv.skip option tells
ledgerify how many lines to skip at the start of the file, counting
empty lines.
Default: 0
config.csv.separator = ","
A one-character string used to separate CSV columns.
Default: , (comma)
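config.csv.skip and config.csv.separator can be modelled together as below. read_rows is a hypothetical helper; it assumes skip counts physical lines (empty lines included) before CSV parsing starts.

```python
import csv
import io
from itertools import islice


def read_rows(text, skip=0, separator=","):
    # Drop the first `skip` lines, including empty ones, then split columns.
    lines = islice(io.StringIO(text), skip, None)
    return list(csv.reader(lines, delimiter=separator))


export = "Account statement\n\nPeriod: 2022-11\n2022/11/10;Shop Foobar;-10,20\n"
print(read_rows(export, skip=3, separator=";"))
# [['2022/11/10', 'Shop Foobar', '-10,20']]
```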
A rule is a Python function registered either with the @rule decorator or with
the config.add_rule(fn) function. Rules are responsible for transforming
the loaded transactions.
Rules accept a single argument: transaction (tr),
which they modify in-place.
@rule
def r(tr):
    tr.description = "groceries"
    tr.account[1] = "assets:cash"
Transactions are passed to rules in chronological order. For each
transaction, rules are executed in the order in which they are registered.
You can control the execution of subsequent rules by calling the
skip(), done() and end() methods of the transaction
passed to the rule:
•tr.skip() removes the current transaction
from the output;
•tr.done() stops the execution of any
further rules for the current transaction;
•tr.end() stops the execution of any
further rules or transactions.
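The control flow above can be pictured with a toy model. Transaction and run_rules are hypothetical and not ledgerify's implementation; in particular, keeping the current transaction in the output after end() is an assumption.

```python
class Transaction:
    # Hypothetical stand-in for the transaction object passed to rules.
    def __init__(self, description):
        self.description = description
        self.skipped = self.finished = self.ended = False

    def skip(self):
        self.skipped = True

    def done(self):
        self.finished = True

    def end(self):
        self.ended = True


def run_rules(transactions, rules):
    output = []
    for tr in transactions:      # transactions in chronological order
        for rule in rules:       # rules in registration order
            rule(tr)
            if tr.skipped or tr.finished or tr.ended:
                break            # no further rules for this transaction
        if not tr.skipped:
            output.append(tr)    # assumption: end() keeps the current one
        if tr.ended:
            break                # no further transactions either
    return output
```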
There are two ways to apply a rule only to a subset of transactions (for example,
to transactions whose description matches certain criteria).
The first approach is to use an ordinary "if" statement inside
the rule:
@rule
def r(tr):
    if "Groceries" in tr.description:
        tr.account[1] = "expenses:groceries"
The second approach is to create a separate condition function
and assign it to the rule. This approach is more concise and is well suited
when you wish to reuse a condition in many rules.
Conditions accept a single argument, a transaction, and
return whether the linked rule should run or not (boolean values True or
False).
def is_groceries(tr):
    return "Groceries" in tr.description
@rule(condition=is_groceries)
def r(tr):
    tr.account[1] = "expenses:groceries"
Matching text in parts of transaction is so common that ledgerify
provides a built-in condition for this: match (see the Built-in
Conditions section).
Conditions can be composed together with logical operators &
("and"), | ("or") and ~ ("not" -
negation). Only conditions decorated with @condition decorator can be
composed together.
@condition
def is_income(tr):
    return tr.amount[1] > 0
@condition
def from_my_company(tr):
    return "My Company" in tr.description
@rule(condition=is_income & from_my_company)
def r(tr):
    tr.account[1] = "income:salary"
@rule(condition=is_income & ~from_my_company)
def r(tr):
    tr.account[1] = "income:other"
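One way such composition can work is operator overloading on a wrapper object, which also explains why plain functions can't be combined: they don't define &, | or ~. The Condition class below is a hypothetical sketch, not ledgerify's @condition.

```python
class Condition:
    # Hypothetical wrapper making conditions composable with &, | and ~.
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, tr):
        return bool(self.fn(tr))

    def __and__(self, other):
        return Condition(lambda tr: self(tr) and other(tr))

    def __or__(self, other):
        return Condition(lambda tr: self(tr) or other(tr))

    def __invert__(self):
        return Condition(lambda tr: not self(tr))


def condition(fn):
    return Condition(fn)
```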
match(pattern, field="description",
i=False)
Matches a field of the transaction against a
pattern. The pattern is a string which may contain Unix shell-style
wildcards:
•*: match everything
•?: match any single character
•[seq]: match any character in
seq
•[!seq]: match any character not in seq
match compares the whole string, which means that, for
example, the pattern "foo" doesn't match a field with the
value "foobar", but the pattern "foo*" does.
When i is set to True, match is case-insensitive. It is
possible to match parts of postings by passing the part's name together with
the posting number as field, for example field="account[1]".
@rule(condition=match("*shop*", field="account[1]", i=True))
def shop(tr):
    ...
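These wildcard semantics match those of Python's fnmatch module, so match can be approximated as below. wildcard_match is hypothetical, and whether ledgerify uses fnmatch internally is an assumption. fnmatchcase is used because fnmatch.fnmatch may ignore case depending on the operating system.

```python
from fnmatch import fnmatchcase


def wildcard_match(pattern, value, i=False):
    # Hypothetical approximation of match(); the whole value must match.
    if i:
        pattern, value = pattern.casefold(), value.casefold()
    return fnmatchcase(value, pattern)


print(wildcard_match("foo", "foobar"))   # False
print(wildcard_match("foo*", "foobar"))  # True
```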
Ledgerify provides the config.load(path) function to load another
rules file. Paths passed to load should be either absolute
or relative to the rules file in which loading occurs.
When ledgerify loads transactions from the input file, it performs the following
actions to make sure that transactions passed to the rules are well-formed:
•parses the date field as a datetime according
to config.dateformat and extracts its date
component;
•extracts a commodity bundled in the
amount field.
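Extracting a bundled commodity can be sketched with a regular expression. split_amount and its pattern are hypothetical, not ledgerify's code; the sketch honours a config.decimalmark-like argument but doesn't handle thousands separators.

```python
import re
from decimal import Decimal

# Optional symbol before the number, the number itself, optional symbol after.
_AMOUNT = re.compile(
    r"^\s*(?P<pre>[^\d\s.,+-]*)\s*(?P<num>[+-]?\d+(?:[.,]\d+)?)\s*(?P<post>[^\d\s.,+-]*)\s*$"
)


def split_amount(text, decimalmark="."):
    m = _AMOUNT.match(text)
    if m is None:
        raise ValueError(f"cannot parse amount: {text!r}")
    number = Decimal(m.group("num").replace(decimalmark, "."))
    commodity = m.group("pre") or m.group("post") or None
    return number, commodity


print(split_amount("-10,20 PLN", decimalmark=","))  # (Decimal('-10.20'), 'PLN')
```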
After all transactions are processed, ledgerify performs the following
additional "normalizations" of the results:
•when a transaction has only one posting, it creates
a second posting with config.default.account to balance it (excluding
virtual unbalanced postings);
•assigns config.default.amount and
config.default.commodity when all postings in a transaction lack this
information;
•automatically balances postings:
•for each unbalanced posting, it searches for another
empty posting (a posting which doesn't have an amount set) with the same
commodity and sets its amount so that all amounts in that commodity balance;
only one such empty posting may exist for each commodity;
•when no empty posting which shares the commodity is
found, it searches for an empty posting without a commodity, which is recognized
as a "balancing posting";
•otherwise ledgerify reports an error that the
transaction is unbalanced;
•sets amount to 0 for empty non-balancing postings
(postings which don't have an amount but have a commodity).
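The balancing steps can be sketched as below. Posting and balance are hypothetical and cover only the balancing part, not default accounts, amounts or virtual postings; reusing one balancing posting for only a single commodity is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Posting:
    account: str
    amount: Optional[Decimal] = None
    commodity: Optional[str] = None


def balance(postings):
    # Sum the amounts which are already set, per commodity.
    totals = {}
    for p in postings:
        if p.amount is not None:
            totals[p.commodity] = totals.get(p.commodity, Decimal(0)) + p.amount

    for commodity, total in totals.items():
        if total == 0:
            continue
        same = [p for p in postings if p.amount is None and p.commodity == commodity]
        if len(same) > 1:
            raise ValueError(f"more than one empty posting for {commodity}")
        if same:
            target = same[0]
        else:
            # Fall back to a "balancing posting": empty and without commodity.
            free = [p for p in postings if p.amount is None and p.commodity is None]
            if not free:
                raise ValueError("transaction is unbalanced")
            target = free[0]
            target.commodity = commodity
        target.amount = -total

    # Remaining empty postings which have a commodity get amount 0.
    for p in postings:
        if p.amount is None and p.commodity is not None:
            p.amount = Decimal(0)
    return postings
```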
Normalized entries are the final output of ledgerify.
This example splits rules into two files. The first one, bank.py,
contains bank-specific information. In its last line it loads
common.py, which contains the actual rules. This separates the actual logic
from the input file format, which makes it easy to switch between inputs
from different banks.
# bank.py
c.source = "bank.csv"
c.dateformat = "%Y/%m/%d"
c.decimalmark = ","
c.csv.skip = 19
c.csv.fields = ["date", "description", "accname", "category", "amount[1]"]
c.csv.separator = ";"
# For the purpose of latest matching, use only the part of the description
# before the "NOT BOOKED" phrase.
c.extractlatest = r".+?(?=\s+NOT BOOKED|$)"
c.load("common.py")
# common.py
import re
c.default.account[1] = "expenses:unknown"
c.default.account[2] = "assets:checking"
c.default.commodity = "EUR"
@condition
def is_income(tr):
    # income becomes negative in "accounts" rule below
    return tr.amount[1] < 0
@rule(condition=match("*NOLEDGER*"))
def noledger(tr):
    tr.skip()
@rule
def accounts(tr):
    if tr.amount[1] > 0:
        tr.account[1] = "income:unknown"
        tr.account[2] = "assets:checking"
    tr.amount[1] *= -1
@rule
def description(tr):
    d = tr.description.strip().replace(", ", " | ", 1)
    tr.description = re.sub(r"\s+", " ", d)
@rule(condition=is_income & match("*My Employer*"))
def salary(tr):
    tr.description = "My Employer | Salary"
    tr.account[1] = "income:salary"
    tr.done()
@rule(condition=match("*Shop Foobar*"))
def groceries(tr):
    tr.description = "Groceries"
    tr.account[1] = "expenses:groceries"
    tr.done()
ledgerify(1) ledger(1) date(1)
Michał Góral <dev@goral.net.pl>
Source code: https://git.goral.net.pl/ledgerify.git