ledgerify(5) | File Formats Manual | ledgerify(5) |
ledgerify - format of rules files
Rules files store transformations which ledgerify applies to convert transactions to the Ledger Format (see below). They are Python files that mix options, which tell ledgerify about the properties of the input and output formats, with the actual rules, which transform transactions. Global options can be changed on a per-run basis by passing -c option=value to ledgerify.
After transformations, transactions must have at least a date and one posting. This allows ledgerify to perform a normalisation of transactions (see the NORMALISATION section).
To see what a complete rules file looks like, see the EXAMPLES section at the bottom of this page.
The output produced by ledgerify is in the Ledger Format, which consists of transactions:
2022-11-12 Transaction title
    expenses:food  -10.20 EUR
    assets:cash     10.20 EUR
A transaction is made of the following parts:
To change configuration options, ledgerify exposes a config variable and its shorthand, c. Options are changed by modifying its fields. For example, the code below sets the location of the input file for this particular ruleset and the date format used in it:
c.source = "~/bank.csv"
c.dateformat = "%Y/%m/%d"
The following sections list all available configuration options, together with example values which can be assigned to them.
config.source = "~/bank.csv"
Default: not set
config.filetype = "csv"
Default: not set
config.decimalmark = "."
Default: "."
config.dateformat = "%Y-%m-%d"
config.dateformat doesn't affect the dates of transactions (either the one in config.default or the ones passed to rules and conditions); these should always follow the ISO 8601 format (YYYY-MM-DD).
Default: any ISO 8601 datetime format
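The % codes in config.dateformat look like the directives of Python's datetime.strptime. Assuming that is how ledgerify reads them (an assumption, not confirmed on this page), the sketch below shows how an input date in a bank-specific format ends up as an ISO 8601 date. The helper name to_iso is hypothetical.

```python
from datetime import datetime


def to_iso(date_text, dateformat):
    """Parse date_text using strptime directives and return YYYY-MM-DD."""
    return datetime.strptime(date_text, dateformat).date().isoformat()


# A date as it might appear in the input file when c.dateformat = "%Y/%m/%d".
iso_date = to_iso("2022/11/10", "%Y/%m/%d")
print(iso_date)
```

A date that does not fit the given format makes strptime raise ValueError, which is a quick way to test a format string before putting it into a rules file.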
config.timezone = "UTC"
config.timezone accepts only valid IANA time zone identifiers, such as "UTC", "CET", "Etc/GMT+8" or "Europe/Budapest". For a list of identifiers, see https://en.wikipedia.org/wiki/List_of_tz_database_time_zones, or use the following Python snippet to list the time zones available on your system:
>>> import zoneinfo
>>> zoneinfo.available_timezones()
Default: not set
config.extractlatest = r".*"
Instead of a regular expression, config.extractlatest can be set to a transformation function, which accepts the full description and returns the extracted part.
Setting config.extractlatest to None disables matching of transaction descriptions entirely.
The purpose of this option is to give some flexibility when dealing with financial institutions which change the descriptions of transactions, for example after booking the transaction.
Default: ".*" (whole description)
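A transformation function for config.extractlatest could look like the sketch below. It reuses the regular expression from the EXAMPLES section and applies it with re.match, which is an assumption about how the pattern is meant to be used. The function name extract_latest is hypothetical.

```python
import re

# Pattern taken from the EXAMPLES section of this page.
NOT_BOOKED = re.compile(r".+?(?=\s+NOT BOOKED|$)")


def extract_latest(description):
    """Return the part of the description before a "NOT BOOKED" suffix."""
    found = NOT_BOOKED.match(description)
    # Fall back to the full description when nothing matches (e.g. empty text).
    return found.group(0) if found else description


# In a rules file the function would be assigned like this:
#   c.extractlatest = extract_latest
print(extract_latest("Shop Foobar NOT BOOKED"))
print(extract_latest("Shop Foobar"))
```

Both calls return the same text, so a transaction would be recognised as the same one before and after the bank books it.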
By setting config.default it's possible to fill in values missing from the parsed input file. It's also possible to instruct ledgerify to create postings automatically.
config.default.date = "2022-11-10"
Default: current date
config.default.description = "Description"
Default: not set
config.default.account = "assets:checking"
config.default.account[1] = "income:salary"
config.default.account[2] = "assets:checking"
The same technique works for other parts of a posting (amount and commodity).
Default: not set
config.default.amount = "10.00" | 10
When dealing with fractional amounts, it's advised to set the amount as a string ("10.00", with quotes) instead of a floating point number (10.0, without quotes). IEEE 754 floating point numbers cannot accurately represent most decimal fractions and are not suitable for use with currencies. Internally, ledgerify converts all amounts to the decimal type, which is suitable for dealing with money, but converting from floating point numbers may lead to rounding errors.
Default: not set
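The difference between string and floating point amounts can be reproduced with Python's standard decimal module. This only illustrates the rounding problem described above; it does not show ledgerify's own conversion code.

```python
from decimal import Decimal

# A string keeps exactly the digits that were written.
from_string = Decimal("10.10")

# A float is converted from its nearest binary approximation, so the
# resulting decimal carries the representation error with it.
from_float = Decimal(10.10)

print(from_string)
print(from_float)

# Sums behave as expected only with exact decimals.
float_sum_is_exact = (0.1 + 0.2) == 0.3
decimal_sum_is_exact = Decimal("0.10") + Decimal("0.20") == Decimal("0.30")
print(float_sum_is_exact, decimal_sum_is_exact)
```

Integer amounts such as 10 are safe either way, because integers convert to decimals exactly.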
config.default.commodity = "USD"
Default: not set
config.default.symbolside = "left" | "right" | "leftjoin" | "rightjoin"
Default: right
config.csv options instruct ledgerify how to parse and interpret CSV files. They are used to create the initial transactions, which are passed to the rules for further processing.
config.csv.fields = ["tags", "account[2]"]
To assign a part of a posting to a specific posting, append the number of that posting, for example "account[1]", "commodity[2]" etc.
For example, the following CSV file:
# date, amount, account, description
2022-11-10, 10 EUR, checking, Groceries
can be interpreted by the following config.csv.fields:
config.csv.fields = ["date", "amount[1]", "account[1]", "description"]
As part of input preprocessing, ledgerify will also automatically detect the presence of a commodity in the amount field and separate it into commodity[1] (see the INPUT PREPROCESSING section).
Default: not set
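The mapping of CSV columns to fields can be pictured with Python's standard csv module. The sketch below is not ledgerify's parser: the function parse_rows, its skip and separator parameters (named after config.csv.skip and config.csv.separator), and the way the commodity is split off are all assumptions made for illustration. The first line of the sample file is dropped by hand with skip=1.

```python
import csv
import io

FIELDS = ["date", "amount[1]", "account[1]", "description"]

CSV_TEXT = (
    "# date, amount, account, description\n"
    "2022-11-10, 10 EUR, checking, Groceries\n"
)


def parse_rows(text, fields, skip=0, separator=","):
    """Map each CSV row onto the given field names (illustrative only)."""
    reader = csv.reader(
        io.StringIO(text), delimiter=separator, skipinitialspace=True
    )
    records = []
    for row in list(reader)[skip:]:
        record = dict(zip(fields, row))
        # Split "10 EUR" into an amount and a commodity for posting 1.
        if "amount[1]" in record:
            amount, _, commodity = record["amount[1]"].partition(" ")
            record["amount[1]"] = amount
            if commodity:
                record["commodity[1]"] = commodity
        records.append(record)
    return records


records = parse_rows(CSV_TEXT, FIELDS, skip=1)
print(records)
```

Each column is paired with the field name at the same position, which is why the order of names in config.csv.fields has to follow the order of columns in the file.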
config.csv.skip = 0
Default: 0
config.csv.separator = ","
Default: , (comma)
A rule is a Python function registered with either the @rule decorator or the config.add_rule(fn) function. Rules are responsible for performing the transformations of loaded transactions.
Rules accept a single argument: transaction (tr), which they modify in-place.
@rule
def r(tr):
    tr.description = "groceries"
    tr.account[1] = "assets:cash"
Transactions are passed to rules in chronological order. For each transaction, rules are executed in the order in which they are registered.
You can control the execution of subsequent rules by calling the skip(), done() and end() functions on the transaction passed to the rule:
There are two ways to apply a rule only to a subset of transactions (for example to transactions whose description matches certain criteria).
The first approach is to use an ordinary "if" statement inside the rule:
@rule
def r(tr):
    if "Groceries" in tr.description:
        tr.account[1] = "expenses:groceries"
The second approach is to create a separate condition function and assign it to the rule. This approach is more concise and is well suited when you wish to reuse a condition in many rules.
Conditions accept a single argument, a transaction, and return whether the linked rule should run or not (boolean values True or False).
def is_groceries(tr):
    return "Groceries" in tr.description

@rule(condition=is_groceries)
def r(tr):
    tr.account[1] = "expenses:groceries"
Matching text in parts of a transaction is so common that ledgerify provides a built-in condition for this: match (see the Built-in Conditions section).
Conditions can be composed with the logical operators & ("and"), | ("or") and ~ ("not", negation). Only conditions decorated with the @condition decorator can be composed.
@condition
def is_income(tr):
    return tr.amount[1] > 0

@condition
def from_my_company(tr):
    return "My Company" in tr.description

@rule(condition=is_income & from_my_company)
def r(tr):
    tr.account[1] = "income:salary"

@rule(condition=is_income & ~from_my_company)
def r(tr):
    tr.account[1] = "income:other"
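Plain Python functions do not support the &, | and ~ operators, which is the likely reason why only decorated conditions can be composed. A decorator can wrap the function in an object that defines these operators. The sketch below shows one way to do that; the Condition class and this version of the condition decorator are hypothetical and are not taken from ledgerify's source.

```python
from types import SimpleNamespace


class Condition:
    """Wrap a predicate so that it can be combined with &, | and ~."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, tr):
        return bool(self.fn(tr))

    def __and__(self, other):
        return Condition(lambda tr: self(tr) and other(tr))

    def __or__(self, other):
        return Condition(lambda tr: self(tr) or other(tr))

    def __invert__(self):
        return Condition(lambda tr: not self(tr))


def condition(fn):
    """Simplified stand-in for the @condition decorator."""
    return Condition(fn)


@condition
def is_income(tr):
    return tr.amount[1] > 0


@condition
def from_my_company(tr):
    return "My Company" in tr.description


salary = is_income & from_my_company
other_income = is_income & ~from_my_company

tr = SimpleNamespace(amount={1: 100}, description="Transfer from My Company")
print(salary(tr), other_income(tr))
```

Each operator returns a new Condition, so composed conditions can themselves be combined further or reused across rules.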
match(pattern, field="description", i=False)
match compares the whole string, which means that, for example, the pattern "foo" doesn't match a field with the value "foobar", but the pattern "foo*" does.
When i is set to True, match is case-insensitive. It is possible to match parts of postings by including the posting number in the field name.
@rule(condition=match("*shop*", field="account[1]", i=True))
def rule(tr):
    ...
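The whole-string behaviour of match resembles shell-style globbing. Assuming that similarity holds (this page does not say how match is implemented), Python's fnmatch.fnmatchcase can be used to try patterns out before putting them into a rule. The helper glob_match is hypothetical, and fnmatch gives special meaning to "?" and "[...]" as well, which match may or may not do.

```python
from fnmatch import fnmatchcase


def glob_match(pattern, value, i=False):
    """Whole-string glob comparison, optionally case-insensitive."""
    if i:
        pattern, value = pattern.lower(), value.lower()
    return fnmatchcase(value, pattern)


print(glob_match("foo", "foobar"))                    # whole string must match
print(glob_match("foo*", "foobar"))                   # wildcard covers the rest
print(glob_match("*shop*", "Expenses:Shop", i=True))  # case-insensitive
```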
ledgerify provides the config.load(path) function to load another rules file. Paths passed to the load function should be either absolute or relative to the rules file in which loading occurs.
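The path rule for config.load can be sketched with pathlib. The function resolve_rules_path and the example paths are hypothetical; the sketch only illustrates "absolute, or relative to the loading rules file" and uses POSIX-style paths so that it behaves the same on every platform.

```python
from pathlib import PurePosixPath


def resolve_rules_path(path, loading_file):
    """Resolve path the way a relative load is described above."""
    candidate = PurePosixPath(path)
    if candidate.is_absolute():
        return candidate
    # Relative paths are anchored at the directory of the loading file,
    # not at the current working directory.
    return PurePosixPath(loading_file).parent / candidate


print(resolve_rules_path("common.py", "/home/user/rules/bank.py"))
print(resolve_rules_path("/etc/rules/common.py", "/home/user/rules/bank.py"))
```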
When ledgerify loads transactions from the input file, it performs the following actions to make sure that transactions passed to the rules are well-formed:
After all transactions are processed, ledgerify performs the following additional "normalizations" of the results:
Normalized entries are the final output of ledgerify.
This example splits the rules into two files. The first one, bank.py, is a rules file with bank-specific information. Its last line loads common.py, a rules file with the actual rules. This keeps the logic separate from the file format settings, which makes it easy to switch between inputs from different banks.
c.source = "bank.csv"
c.dateformat = "%Y/%m/%d"
c.decimalmark = ","
c.csv.skip = 19
c.csv.fields = ["date", "description", "accname", "category", "amount[1]"]
c.csv.separator = ";"

# For the purpose of latest matching use only a part of description before "NOT
# BOOKED" phrase appears.
c.extractlatest = r".+?(?=\s+NOT BOOKED|$)"

c.load("common.py")
import re

c.default.account[1] = "expenses:unknown"
c.default.account[2] = "assets:checking"
c.default.commodity = "EUR"

@condition
def is_income(tr):
    # income becomes negative in "accounts" rule below
    return tr.amount[1] < 0

@rule(condition=match("*NOLEDGER*"))
def noledger(tr):
    tr.skip()

@rule
def accounts(tr):
    if tr.amount[1] > 0:
        tr.account[1] = "income:unknown"
        tr.account[2] = "assets:checking"
        tr.amount[1] *= -1

@rule
def description(tr):
    d = tr.description.strip().replace(", ", " | ", 1)
    # collapse runs of whitespace into a single space
    tr.description = re.sub(r"\s+", " ", d)

@rule(condition=(is_income & match("*My Employer*")))
def salary(tr):
    tr.description = "My Employer | Salary"
    tr.account[1] = "income:salary"
    tr.done()

@rule(condition=match("*Shop Foobar*"))
def groceries(tr):
    tr.description = "Groceries"
    tr.account[1] = "expenses:groceries"
    tr.done()
ledgerify(1), ledger(1), date(1)
Michał Góral <dev@goral.net.pl>
Source code: https://git.goral.net.pl/ledgerify.git
2024-12-21 |