ledgerify(5) File Formats Manual ledgerify(5)

ledgerify - format of rules files

Rules files store the transformations which ledgerify applies to convert transactions to the Ledger Format (see below). They are Python files which mix options, which tell ledgerify about properties of the input and output formats, with the actual rules, which transform transactions. Global options can be changed on a per-run basis by passing -c option=value to ledgerify.

After transformations, transactions must have at least a date and one posting. This allows ledgerify to perform a normalisation of transactions (see the NORMALISATION section).

To see what a complete rules file looks like, see the EXAMPLES section at the bottom of this page.

The output produced by ledgerify is in the Ledger Format, which is made of transactions:

2022-11-12 Transaction title
	expenses:food    -10.20 EUR
	assets:cash       10.20 EUR

A transaction is made of the following parts:

date: 2022-11-12
description: Transaction title
postings: entries, each of which consists of an account, an amount and a commodity
    account: expenses:food and assets:cash
    amount: -10.20 and 10.20
    commodity: EUR

To change configuration options, ledgerify exposes a config variable, or its shorthand c. Options are changed by modifying its fields. For example, the code below sets the location of the input file for this particular ruleset and the date format used in it:

c.source = "~/bank.csv"
c.dateformat = "%Y/%m/%d"

The following sections list all available configuration options, together with example values which can be assigned to them.

config.source = "~/bank.csv"
Path to the file from which ledgerify will read transactions. This can be one of the following:

•absolute file path (optionally starting with a tilde "~" for the home directory of the current user)
•relative file path, which ledgerify will search for in the following locations:
    •$XDG_DOWNLOAD_DIR, or ~/Downloads if XDG_DOWNLOAD_DIR isn't set
    •directory in which the main rules file is located. The main rules file is the one specified in the ledgerify invocation.
•"-": ledgerify will read a list of transactions from the standard input

Default: not set
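The lookup described above can be sketched as follows. This is only an illustration, not ledgerify's code: the helper name candidate_paths is hypothetical, the search order is assumed to follow the order of the list, and XDG_DOWNLOAD_DIR is assumed to be read from the environment.

```python
import os
from pathlib import Path


def candidate_paths(source, rules_dir, env=None):
    """Return the paths that would be tried for a given config.source value.

    Hypothetical helper; the order of the returned paths is an assumption.
    """
    env = os.environ if env is None else env
    if source == "-":
        return []  # standard input: there is no file to look up
    path = Path(source).expanduser()  # expands a leading "~"
    if path.is_absolute():
        return [path]
    downloads = env.get("XDG_DOWNLOAD_DIR") or str(Path.home() / "Downloads")
    return [Path(downloads) / path, Path(rules_dir) / path]


print(candidate_paths("bank.csv", "/rules", env={"XDG_DOWNLOAD_DIR": "/dl"}))
```

The first existing path from the returned list would be the one to read.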

config.filetype = "csv"

File type of config.source in case it cannot be deduced from its extension, for example when transactions are read from standard input.

Default: not set

config.decimalmark = "."

Symbol which separates the integer part from the fractional part of posting amounts in the input file.

Default: "."

config.dateformat = "%Y-%m-%d"

Format of dates in the input file. This accepts the format codes that the 1989 C standard requires. See strptime(3) and date(1) for the list of accepted codes.

config.dateformat doesn't affect transaction dates set elsewhere (either config.default.date or the dates passed to the rules and conditions); these should always follow the ISO 8601 format (YYYY-MM-DD).

Default: any ISO 8601 datetime format
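To check a format string before putting it into a rules file, it can be tried out with Python's own datetime.strptime, which understands the same C89 format codes. The helper name parse_input_date is hypothetical and only illustrates the conversion from an input date to the ISO 8601 form.

```python
from datetime import datetime


def parse_input_date(raw, dateformat):
    """Parse a raw input date and return it in ISO 8601 (YYYY-MM-DD) form."""
    return datetime.strptime(raw, dateformat).date().isoformat()


print(parse_input_date("2022/11/10", "%Y/%m/%d"))              # 2022-11-10
print(parse_input_date("10.11.2022 14:05", "%d.%m.%Y %H:%M"))  # 2022-11-10
```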

config.timezone = "UTC"

Ledgerify uses this field to declare a native time zone for datetimes which don't contain this information (that is, when config.dateformat doesn't use %Z format code). Ledgerify will use this information when converting input datetimes to system-local dates. Having the time zone helps prevent off-by-one dates in the output transactions.

config.timezone only accepts valid IANA TZ identifiers, like "UTC", "CET", "Etc/GMT+8" or "Europe/Budapest". For a list of identifiers see: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones, or you can use the following Python snippet to list time zones available on your system:

>>> import zoneinfo
>>> zoneinfo.available_timezones()

Default: not set
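The off-by-one problem can be reproduced with plain Python. The sketch below is not ledgerify's implementation; to stay self-contained it uses fixed UTC offsets from the datetime module instead of IANA identifiers, and the helper name to_local_date is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_local_date(raw, dateformat, source_tz, local_tz):
    """Interpret a naive input datetime in source_tz, return the date in local_tz."""
    naive = datetime.strptime(raw, dateformat)
    aware = naive.replace(tzinfo=source_tz)  # declare the native time zone
    return aware.astimezone(local_tz).date()


utc = timezone.utc
utc_plus_1 = timezone(timedelta(hours=1))  # fixed stand-in for a UTC+1 zone

# A transaction booked late in the evening UTC falls on the next day at UTC+1.
print(to_local_date("2022-11-10 23:30", "%Y-%m-%d %H:%M", utc, utc_plus_1))
```

Without a declared time zone the same input would be taken as local time and would stay on 2022-11-10.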

config.extractlatest = r".*"

Regular expression which extracts a part of the transaction description from incoming transactions and from processed transactions stored in the latest file (see: --latest in ledgerify(1)). When checking whether a transaction should be excluded, ledgerify will compare only the extracted parts instead of full transaction descriptions.

Instead of using a regular expression, it's possible to set config.extractlatest to a transformation function, which accepts the full description and returns the extracted part.

Setting config.extractlatest to None disables matching of transaction descriptions entirely.

The purpose of this option is to give some flexibility when dealing with financial institutions which change descriptions of transactions, for example after booking the transaction.

Default: ".*" (whole description)
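The effect of the extraction can be sketched in plain Python. The helper extract below is hypothetical; in particular, the use of re.search and the fallback to the full description when the pattern doesn't match are assumptions, not documented behaviour.

```python
import re


def extract(description, extractlatest):
    """Return the part of a description that would be compared."""
    if extractlatest is None:
        return None  # description matching disabled
    if callable(extractlatest):
        return extractlatest(description)
    m = re.search(extractlatest, description)
    return m.group(0) if m else description  # fallback is an assumption


# Compare only the text before a trailing "NOT BOOKED" marker.
pattern = r".+?(?=\s+NOT BOOKED|$)"
print(extract("Shop Foobar 123 NOT BOOKED", pattern))  # Shop Foobar 123
print(extract("Shop Foobar 123", pattern))             # Shop Foobar 123
```

Both descriptions reduce to the same text, so the booked and the not yet booked variants of a transaction would be treated as the same one.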

By setting config.default it's possible to fill in the values missing from the parsed input file. It's also possible to instruct ledgerify to automatically create the postings.

config.default.date = "2022-11-10"

Default date assigned to the parsed transactions when input data doesn't have one. These dates are unaffected by the config.dateformat and should always follow ISO 8601 format (YYYY-MM-DD).

Default: current date

config.default.description = "Description"

Default description assigned to the parsed transactions when input data doesn't have one.

Default: not set

config.default.account = "assets:checking"

Default account assigned to the newly created postings. Assigning a name to config.default.account doesn't create a new posting with an account name. If each transaction should have a posting with a specific account name, it should be created by assigning the name to the specific posting's account:

config.default.account[1] = "income:salary"
config.default.account[2] = "assets:checking"

The same technique works for other parts of a posting (amount and commodity).

Default: not set

config.default.amount = "10.00" | 10

Default amount assigned to the newly created postings. See the config.default.account on how to always create a posting with a specific amount.

When you're dealing with fractional amounts, it's advised to set the amount as a string ("10.00" - with quotes) instead of a floating point number (10.0 - without quotes). IEEE 754 floating point numbers cannot accurately represent most decimal fractions and are not suitable for use with currencies. Internally ledgerify converts all amounts to the decimal type, which is suitable for dealing with money, but converting from floating point numbers may lead to rounding errors.

Default: not set
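The difference between the two ways of writing an amount can be seen with Python's decimal module alone. This only demonstrates the rounding issue; it doesn't involve ledgerify.

```python
from decimal import Decimal

# Amounts given as strings are represented exactly.
from_strings = Decimal("0.10") + Decimal("0.20")

# Amounts given as floats carry the binary representation error with them.
from_floats = Decimal(0.1) + Decimal(0.2)

print(from_strings == Decimal("0.30"))  # True
print(from_floats == Decimal("0.30"))   # False
```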

config.default.commodity = "USD"

Default commodity assigned to the newly created postings. See the config.default.account on how to always create a posting with a specific commodity.

Default: not set

config.default.symbolside = "left" | "right" | "leftjoin" | "rightjoin"

Tells ledgerify on which side of the amount it should put the commodity symbol, and whether to put a space between the two or not.

•left: $ 10.00, EUR 10.00, PLN -10.00
•right: 10.00 $, 10.00 EUR, -10.00 PLN
•leftjoin: $10.00, EUR10.00, PLN-10.00
•rightjoin: 10.00$, 10.00EUR, -10.00PLN

Default: right
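The four placements can be summarised by a small formatting function. It is a sketch of the rule listed above; the name format_amount is hypothetical and not part of ledgerify.

```python
from decimal import Decimal

# One template per symbolside value: {c} is the commodity, {a} the amount.
TEMPLATES = {
    "left": "{c} {a}",
    "right": "{a} {c}",
    "leftjoin": "{c}{a}",
    "rightjoin": "{a}{c}",
}


def format_amount(amount, commodity, symbolside="right"):
    """Place the commodity symbol next to the amount."""
    return TEMPLATES[symbolside].format(a=amount, c=commodity)


print(format_amount(Decimal("10.00"), "$", "left"))         # $ 10.00
print(format_amount(Decimal("-10.00"), "PLN", "leftjoin"))  # PLN-10.00
```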

config.csv options instruct ledgerify how to parse and interpret CSV files. They are used to create the initial transactions, which are passed to the rules for further processing.

config.csv.fields = ["tags", "account[2]"]

List of fields which are mapped to each column in CSV file. They can be any strings which will be exposed in transactions passed to the rules and conditions. However, ledgerify uses only specific names when it produces the output (see: Ledger Format).

To assign certain parts of postings to the specific posting, you specify the number of that posting, for example "account[1]", "commodity[2]" etc.

For example, the following CSV file:

# date,     amount, account, description
2022-11-10, 10 EUR, checking, Groceries

can be interpreted by the following config.csv.fields:

config.csv.fields = ["date", "amount[1]", "account[1]", "description"]

As a part of input preprocessing, ledgerify will also automatically detect the presence of commodity in the amount field and will separate it into commodity[1] (see: INPUT PREPROCESSING section).

Default: not set

config.csv.skip = 0

Many CSVs obtained from banks have a lot of metadata before the actual transaction data begins. The config.csv.skip option tells ledgerify to skip a number of lines from the start of the file, including empty lines.

Default: 0

config.csv.separator = ","

A one-character string used to separate CSV columns.

Default: , (comma)
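How the three config.csv options work together can be sketched with Python's csv module. The function read_rows is hypothetical and only shows the idea: skip leading lines, split on the separator, and name the columns after config.csv.fields. Stripping the spaces which follow a separator is an assumption made for the sample input.

```python
import csv


def read_rows(text, fields, skip=0, separator=","):
    """Map each CSV row to a dict keyed by the configured field names."""
    lines = text.splitlines()[skip:]  # skipped lines include empty ones
    reader = csv.reader(lines, delimiter=separator, skipinitialspace=True)
    return [dict(zip(fields, row)) for row in reader if row]


sample = (
    "# date,     amount, account, description\n"
    "2022-11-10, 10 EUR, checking, Groceries\n"
)
fields = ["date", "amount[1]", "account[1]", "description"]
rows = read_rows(sample, fields, skip=1)
print(rows[0]["amount[1]"])  # 10 EUR
```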

A rule is a Python function, registered with either the @rule decorator or the config.add_rule(fn) function. Rules are responsible for performing the transformations of loaded transactions.

Rules accept a single argument: transaction (tr), which they modify in-place.

@rule
def r(tr):
    tr.description = "groceries"
    tr.account[1] = "assets:cash"

Transactions are passed to rules in chronological order. For each transaction, rules are executed in the order in which they are registered.

You can control the execution of subsequent rules by calling the skip(), done() and end() functions on the transaction passed to the rule:

tr.skip() removes the current transaction from the output;
tr.done() stops the execution of any further rules for the current transaction;
tr.end() stops the execution of any further rules or transactions.
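The control flow of these three calls can be pictured with a toy rule runner. This is a simplified model and not ledgerify's implementation: the class SketchTransaction and the function run are hypothetical, and two details are assumptions: that skip() also stops further rules for the skipped transaction, and that the transaction on which end() is called is still emitted.

```python
class SketchTransaction:
    """Toy stand-in for the transaction object passed to rules."""

    def __init__(self, description):
        self.description = description
        self.skipped = self.finished = self.ended = False

    def skip(self):
        self.skipped = True

    def done(self):
        self.finished = True

    def end(self):
        self.ended = True


def run(rules, transactions):
    """Apply rules in registration order and honour skip/done/end."""
    output = []
    for tr in transactions:
        for rule in rules:
            rule(tr)
            if tr.skipped or tr.finished or tr.ended:
                break  # no further rules for this transaction
        if not tr.skipped:
            output.append(tr)
        if tr.ended:
            break  # no further transactions at all
    return output


def drop_noledger(tr):
    if "NOLEDGER" in tr.description:
        tr.skip()


def groceries(tr):
    if tr.description == "shop":
        tr.description = "Groceries"
        tr.done()


def fallback(tr):
    tr.description = "unknown: " + tr.description


trs = [SketchTransaction(d) for d in ("shop", "NOLEDGER fee", "rent")]
result = [tr.description for tr in run([drop_noledger, groceries, fallback], trs)]
print(result)  # ['Groceries', 'unknown: rent']
```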

There are two ways to apply a rule only to a subset of transactions (for example to transactions whose description matches certain criteria).

The first approach is to use an ordinary "if" statement inside the rule:

@rule
def r(tr):
    if "Groceries" in tr.description:
        tr.account[1] = "expenses:groceries"

The second approach is to create a separate condition function and assign it to the rule. This approach is more concise and is well suited when you wish to reuse a condition in many rules.

Conditions accept a single argument, a transaction, and they return whether the linked rule should run or not (boolean values True or False).

def is_groceries(tr):
    return "Groceries" in tr.description
@rule(condition=is_groceries)
def r(tr):
    tr.account[1] = "expenses:groceries"

Matching text in parts of transaction is so common that ledgerify provides a built-in condition for this: match (see the Built-in Conditions section).

Conditions can be composed together with logical operators & ("and"), | ("or") and ~ ("not" - negation). Only conditions decorated with @condition decorator can be composed together.

@condition
def is_income(tr):
    return tr.amount[1] > 0
@condition
def from_my_company(tr):
    return "My Company" in tr.description
@rule(condition=is_income & from_my_company)
def r(tr):
    tr.account[1] = "income:salary"
@rule(condition=is_income & ~from_my_company)
def r(tr):
    tr.account[1] = "income:other"
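Why only decorated conditions can be composed becomes clearer with a sketch of how such a decorator could work: it wraps the function in an object which defines the &, | and ~ operators, which plain functions lack. The Condition class and the condition decorator below are illustrative stand-ins, not ledgerify's code, and the transactions are simplified to objects with amount and description attributes.

```python
from types import SimpleNamespace


class Condition:
    """Wraps a predicate so that it can be combined with &, | and ~."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, tr):
        return bool(self.fn(tr))

    def __and__(self, other):
        return Condition(lambda tr: self(tr) and other(tr))

    def __or__(self, other):
        return Condition(lambda tr: self(tr) or other(tr))

    def __invert__(self):
        return Condition(lambda tr: not self(tr))


def condition(fn):
    """Stand-in for the @condition decorator."""
    return Condition(fn)


@condition
def is_income(tr):
    return tr.amount > 0


@condition
def from_my_company(tr):
    return "My Company" in tr.description


salary = is_income & from_my_company
other_income = is_income & ~from_my_company

tr = SimpleNamespace(amount=100, description="My Company payroll")
print(salary(tr), other_income(tr))  # True False
```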

match(pattern, field="description", i=False)
Matches the field of a transaction against a pattern. The pattern is a string which may contain Unix shell-style wildcards:

•*: match everything
•?: match any single character
•[seq]: match any character in seq
•[!seq]: match any character not in seq

match compares the whole string, which means that, for example, the pattern "foo" doesn't match a field with the value "foobar", but the pattern "foo*" will match it.

When i is set to True, match is case-insensitive. It is possible to match parts of postings by passing posting number to the field.

@rule(condition=match("*shop*", field="account[1]", i=True))
def shop(tr):
    ...
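The wildcards listed above are the same ones which Python's fnmatch module understands, so patterns can be tried out in isolation. The helper wildcard_match is hypothetical and assumes that match behaves like fnmatch; it operates on a plain string instead of a transaction field.

```python
from fnmatch import fnmatchcase


def wildcard_match(value, pattern, i=False):
    """Whole-string shell-style match, optionally case-insensitive."""
    if i:
        value, pattern = value.lower(), pattern.lower()
    return fnmatchcase(value, pattern)


print(wildcard_match("foobar", "foo"))                     # False
print(wildcard_match("foobar", "foo*"))                    # True
print(wildcard_match("Corner SHOP 12", "*shop*", i=True))  # True
```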

Ledgerify provides the config.load(path) function to load another rules file. Paths passed to the load function should be either absolute or relative to the rules file in which the loading occurs.

When ledgerify loads transactions from the input file, it performs the following actions to make sure that transactions passed to the rules are well-formed:

•parses date field as date-times according to the config.dateformat formatting and extracts the date component;
•extracts a commodity bundled in the amount.
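Extracting a commodity bundled in the amount could look roughly like this. The sketch is an assumption about the general approach, not ledgerify's parser: the helper split_amount is hypothetical, it handles only a single commodity symbol on either side of the number, and it doesn't deal with thousands separators.

```python
import re
from decimal import Decimal

# Optional commodity, then a signed number, then an optional commodity.
AMOUNT_RE = re.compile(r"\s*([^\d\s+-]*)\s*([+-]?[\d.,]+)\s*([^\d\s]*)\s*")


def split_amount(raw, decimalmark="."):
    """Split a raw amount like "10 EUR" into (Decimal amount, commodity)."""
    m = AMOUNT_RE.fullmatch(raw)
    if m is None:
        raise ValueError(f"cannot parse amount: {raw!r}")
    left, number, right = m.groups()
    # Normalise the configured decimal mark to the one Decimal expects.
    number = number.replace(decimalmark, ".")
    return Decimal(number), (left or right or None)


print(split_amount("10 EUR"))                       # (Decimal('10'), 'EUR')
print(split_amount("-10,20 PLN", decimalmark=","))  # (Decimal('-10.20'), 'PLN')
```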

After all transactions are processed, ledgerify performs the following additional "normalizations" of the results:

•when a transaction has only one posting, it creates a second posting with config.default.account to balance it (excluding virtual unbalanced postings);
•assigns config.default.amount and config.default.commodity when all postings in a transaction lack this information;
•automatically balances postings:
    •for each unbalanced posting, it searches for any other empty posting (a posting which doesn't have an amount set) with the same commodity and modifies its amount to take into account all unbalanced amounts; only one such empty posting for each commodity may exist;
    •when no empty posting which shares the commodity is found, it searches for an empty posting without a commodity, which is recognized as a "balancing posting";
    •if after the above operations there are exactly 2 postings with different commodities and different signs (one positive and one negative amount), ledgerify recognizes them as a "commodity exchange" and leaves them intact;
    •otherwise ledgerify reports an error that the transaction is unbalanced;
•sets the amount to 0 for empty non-balancing postings (postings which don't have an amount but have a commodity).

Normalized entries are the final output of ledgerify.
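The automatic balancing step can be approximated by the sketch below. It is a simplified model of the description above, not ledgerify's algorithm: postings are plain dicts, the function balance is hypothetical, and a commodity-less balancing posting is assumed to absorb only one commodity.

```python
from decimal import Decimal


def balance(postings):
    """Fill empty postings so that every commodity sums to zero."""
    totals = {}
    for p in postings:
        if p["amount"] is not None:
            c = p["commodity"]
            totals[c] = totals.get(c, Decimal(0)) + p["amount"]

    unbalanced = {c: t for c, t in totals.items() if t != 0}
    for commodity, total in list(unbalanced.items()):
        # Prefer an empty posting which shares the commodity...
        empty = [p for p in postings
                 if p["amount"] is None and p["commodity"] == commodity]
        if not empty:
            # ...otherwise fall back to a commodity-less balancing posting.
            empty = [p for p in postings
                     if p["amount"] is None and p["commodity"] is None]
        if empty:
            empty[0]["amount"] = -total
            empty[0]["commodity"] = commodity
            del unbalanced[commodity]

    if not unbalanced:
        return postings
    if (len(postings) == 2
            and all(p["amount"] is not None for p in postings)
            and postings[0]["commodity"] != postings[1]["commodity"]
            and postings[0]["amount"] * postings[1]["amount"] < 0):
        return postings  # commodity exchange, left intact
    raise ValueError("transaction is unbalanced")


tr = [
    {"account": "expenses:food", "amount": Decimal("10.20"), "commodity": "EUR"},
    {"account": "assets:cash", "amount": None, "commodity": None},
]
balance(tr)
print(tr[1]["amount"], tr[1]["commodity"])  # -10.20 EUR
```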

This example splits the rules into two files. The first one is the bank.py rules file, which holds bank-specific information. In its last line it loads the common.py rules file, which holds the actual rules. This way it is easy to separate the actual logic from the file formatting rules, which makes it easy to switch between inputs from different banks.

# bank.py
c.source = "bank.csv"
c.dateformat = "%Y/%m/%d"
c.decimalmark = ","
c.csv.skip = 19
c.csv.fields = ["date", "description", "accname", "category", "amount[1]"]
c.csv.separator = ";"
# For the purpose of latest matching use only a part of description before "NOT
# BOOKED" phrase appears.
c.extractlatest = r".+?(?=\s+NOT BOOKED|$)"
c.load("common.py")

# common.py
import re

c.default.account[1] = "expenses:unknown"
c.default.account[2] = "assets:checking"
c.default.commodity = "EUR"

@condition
def is_income(tr):
    # income becomes negative in "accounts" rule below
    return tr.amount[1] < 0

@rule(condition=match("*NOLEDGER*"))
def noledger(tr):
    tr.skip()

@rule
def accounts(tr):
    if tr.amount[1] > 0:
        tr.account[1] = "income:unknown"
        tr.account[2] = "assets:checking"
    tr.amount[1] *= -1

@rule
def description(tr):
    d = tr.description.strip().replace(", ", " | ", 1)
    # collapse runs of whitespace into a single space
    tr.description = re.sub(r"\s+", " ", d)

@rule(condition=is_income & match("*My Employer*"))
def salary(tr):
    tr.description = "My Employer | Salary"
    tr.account[1] = "income:salary"
    tr.done()

@rule(condition=match("*Shop Foobar*"))
def groceries(tr):
    tr.description = "Groceries"
    tr.account[1] = "expenses:groceries"
    tr.done()

ledgerify(1), ledger(1), date(1)

Michał Góral <dev@goral.net.pl>

Source code: https://git.goral.net.pl/ledgerify.git

2024-10-03