ICU Message Format

doti18n implements ICU Message Format (ICUMF) with its own parser and formatter. Pluralization follows the CLDR Plural Rules, which are provided by Babel.

Syntax#

ICUMF uses a specific syntax for defining messages, including pluralization, selection, and variable interpolation.

  • Variable Interpolation: {variable}
  • Hash: In sub-numeric formatters (plural, selectordinal), # represents the count value.
  • Pluralization: {variable, plural, one {singular form} other {plural form}}
  • Selectordinal: {variable, selectordinal, one {1st form} two {2nd form} few {3rd form} other {default form}}
  • Select: {variable, select, option1 {text1} option2 {text2} other {default text}}
  • Formatters: {variable, formatter, style} (e.g., for dates).
  • Escaping: Use single quotes ' to escape the next character. To include a literal single quote, use two single quotes ''.
  • Nesting: ICUMF supports nested constructs for complex messages.
  • Whitespace: Ignored outside the message text; whitespace inside a message body (between braces {}) is preserved.

Basic Example#

locales/en.yaml:

cat: "I have {count, plural, one {# cat} other {# cats}}"

locales/en.json:

{
    "cat": "I have {count, plural, one {# cat} other {# cats}}"
}

locales/en.xml:

<locale>
    <cat>I have {count, plural, one {# cat} other {# cats}}</cat>
</locale>

locales/en.toml:

cat = "I have {count, plural, one {# cat} other {# cats}}"

Usage:

from doti18n import LocaleData

i18n = LocaleData("locales")

print(i18n["en"].cat(count=1))   # Output: I have 1 cat
print(i18n["en"].cat(count=3))   # Output: I have 3 cats
print(i18n["en"].cat(count=11))  # Output: I have 11 cats
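
To make the selection step concrete, here is a minimal sketch of how a plural argument could be resolved for English. It is not doti18n's implementation: english_plural_category and render_plural are hypothetical helpers, and the rule is hard-coded for English integers instead of being read from Babel's CLDR data.

```python
def english_plural_category(count: int) -> str:
    """Return the CLDR cardinal category for an English integer count."""
    # English integers fall into two cardinal categories: "one" (exactly 1) and "other".
    return "one" if count == 1 else "other"


def render_plural(count: int, forms: dict[str, str]) -> str:
    """Pick the form for `count` and replace `#` with the count value."""
    # Fall back to "other" when the selected category has no form of its own.
    form = forms.get(english_plural_category(count), forms["other"])
    return form.replace("#", str(count))


# Hypothetical stand-in for the parsed branches of the `cat` message.
forms = {"one": "# cat", "other": "# cats"}
for n in (1, 3, 11):
    print("I have " + render_plural(n, forms))
```

Other languages have more categories (for example few and many), which is why the real rules come from CLDR data and not from a hand-written condition like this one.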

Manage ICUMF#

By default, doti18n automatically enables ICUMF parsing with standard settings. However, you can customize its behavior, adjust performance settings, or disable it entirely.

To do this, you need to manually configure the Loader and inject it into LocaleData.

Disabling ICUMF#

If you don't use ICUMF features and want to avoid the parsing overhead, or if your strings contain characters that conflict with ICUMF syntax (like {} used for other purposes), you can disable it.

from doti18n import LocaleData
from doti18n.loaders import Loader

# 1. Create a loader with ICUMF disabled
loader = Loader(icumf=False)

# 2. Inject the loader into LocaleData
i18n = LocaleData("locales", loader=loader)

Forcing ICUMF#

If ICUMF doesn't parse your strings automatically (e.g., due to conflicts with other formatting styles), you can force-enable it by prefixing the string with icu: in your localization files. The icu: prefix tells doti18n to always treat the string as ICUMF; doti18n removes the prefix before parsing.

locales/en.yaml:

cat: "icu:I have {count, plural, one {# cat} other {# cats}}"

locales/en.json:

{
    "cat": "icu:I have {count, plural, one {# cat} other {# cats}}"
}

locales/en.xml:

<locale>
    <cat>icu:I have {count, plural, one {# cat} other {# cats}}</cat>
</locale>

locales/en.toml:

cat = "icu:I have {count, plural, one {# cat} other {# cats}}"

Explicitly Disabling ICUMF#

If you want to ensure that a string is never parsed as ICUMF (even if it contains ICU-like syntax), use the !icu: prefix. doti18n will remove the prefix and treat the rest of the string as plain text.

locales/en.yaml:

plain: "!icu:This is {not} an ICU string"

locales/en.json:

{
    "plain": "!icu:This is {not} an ICU string"
}

locales/en.xml:

<locale>
    <plain>!icu:This is {not} an ICU string</plain>
</locale>

locales/en.toml:

plain = "!icu:This is {not} an ICU string"

Advanced Configuration#

To adjust the cache size or parser behavior, you need to create an ICUMF instance manually and pass it to the loader.

from doti18n import LocaleData
from doti18n.loaders import Loader
from doti18n.icumf import ICUMF

# 1. Configure ICUMF
icumf = ICUMF(
    cache_size=2048,      # Increase cache for rendered strings
    strict=True,          # Enforce strict validation

    # Parser options (passed via kwargs)
    depth_limit=20,       # Limit nesting depth
    allow_tags=False,     # Disable HTML-like tag parsing
    require_other=False   # Don't require 'other' in plural/selectordinal (not recommended, it may break doti18n logic)
)

# 2. Create Loader with custom ICUMF
loader = Loader(icumf=icumf)

# 3. Initialize LocaleData
i18n = LocaleData("locales", loader=loader)

Parser Parameters#

These parameters are passed as keyword arguments (**kwargs) to the ICUMF constructor and control the internal Parser.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| cache_size | int | 1024 | The maximum number of rendered strings to keep in memory (LRU Cache). |
| depth_limit | int | 50 | Maximum recursion depth for nested messages. Prevents stack overflow on malformed strings. |
| allow_tags | bool | True | Enables parsing of HTML/XML-like tags (e.g., <b>Bold</b>). |
| strict_tags | bool | True | If True, ensures that closing tags match opening tags (e.g., <b>...</i> raises an error). |
| tag_prefix | str | None | If set, only tags starting with this prefix are parsed. |
| require_other | bool | True | If True, requires an other option in plural, select, and selectordinal formats. |
| allow_format_spaces | bool | True | Allows whitespace inside format arguments (e.g., { count, plural, ... }). |
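
To get a feel for what depth_limit guards against, the sketch below measures how deeply braces are nested in a message. It is only an approximation under stated assumptions: max_brace_depth is a hypothetical helper, it ignores quote escaping, and the source does not say how doti18n's parser actually counts depth.

```python
def max_brace_depth(message: str) -> int:
    """Return the deepest level of `{...}` nesting found in `message`."""
    depth = 0
    deepest = 0
    for ch in message:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest


simple = "I have {count, plural, one {# cat} other {# cats}}"
nested = (
    "You have {item, select, "
    "book {{count, plural, one {# book} other {# books}}} "
    "other {{count} items}} in your backpack."
)

print(max_brace_depth(simple))
print(max_brace_depth(nested))
```

Typical hand-written messages stay in the single digits by this measure, so a limit mostly matters for malformed or generated strings.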

Variable Interpolation#

If you want to include additional variables in your messages, add them to the message string and pass them as keyword arguments.

locales/en.yaml:

greeting: "Hello, {name}! You have {count, plural, one {# new message} other {# new messages}}."

locales/en.json:

{
    "greeting": "Hello, {name}! You have {count, plural, one {# new message} other {# new messages}}."
}

locales/en.xml:

<locale>
    <greeting>Hello, {name}! You have {count, plural, one {# new message} other {# new messages}}.</greeting>
</locale>

locales/en.toml:

greeting = "Hello, {name}! You have {count, plural, one {# new message} other {# new messages}}."

Usage:

from doti18n import LocaleData

i18n = LocaleData("locales")

print(i18n["en"].greeting(name="Alice", count=1))   # Output: Hello, Alice! You have 1 new message.
print(i18n["en"].greeting(name="Bob", count=5))     # Output: Hello, Bob! You have 5 new messages.

Note

If your ICUMF string contains only variable interpolation (without pluralization or formatters), it won't be processed as ICUMF. Instead, it will use standard Python formatting (str.format()).
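
Since such strings are handled by str.format(), Python's own brace rules presumably apply to them. The snippet below uses only the standard library to show that behavior; the template text is made up for illustration and no doti18n call is involved.

```python
# A template with plain placeholders only: no plural, select, or formatter.
template = "Hello, {name}! Your role is {role}."
greeting = template.format(name="Alice", role="admin")
print(greeting)

# In str.format(), a literal brace is written by doubling it.
literal = "Use {{name}} as a placeholder, {name}."
literal_result = literal.format(name="Alice")
print(literal_result)
```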

Pluralization and Selectordinal#

ICUMF supports pluralization using the plural and selectordinal formats. You can define different message forms based on the numeric value of a variable.

locales/en.yaml:

item_count: "You have {count, plural, one {# item} other {# items}} in your cart."
rank: "You are ranked {position, selectordinal, one {#st} two {#nd} few {#rd} other {#th}} in the competition."

locales/en.json:

{
    "item_count": "You have {count, plural, one {# item} other {# items}} in your cart.",
    "rank": "You are ranked {position, selectordinal, one {#st} two {#nd} few {#rd} other {#th}} in the competition."
}

locales/en.xml:

<locale>
    <item_count>You have {count, plural, one {# item} other {# items}} in your cart.</item_count>
    <rank>You are ranked {position, selectordinal, one {#st} two {#nd} few {#rd} other {#th}} in the competition.</rank>
</locale>

locales/en.toml:

item_count = "You have {count, plural, one {# item} other {# items}} in your cart."
rank = "You are ranked {position, selectordinal, one {#st} two {#nd} few {#rd} other {#th}} in the competition."

Usage:

from doti18n import LocaleData

i18n = LocaleData("locales")

print(i18n["en"].item_count(count=1))    # Output: You have 1 item in your cart.
print(i18n["en"].item_count(count=4))    # Output: You have 4 items in your cart.

print(i18n["en"].rank(position=1))       # Output: You are ranked 1st in the competition.
print(i18n["en"].rank(position=2))       # Output: You are ranked 2nd in the competition.
print(i18n["en"].rank(position=3))       # Output: You are ranked 3rd in the competition.
print(i18n["en"].rank(position=4))       # Output: You are ranked 4th in the competition.
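
The ordinal categories used by rank are less obvious than the cardinal ones, because they depend on the last digits of the number. The sketch below hard-codes the English CLDR ordinal rule for integers; english_ordinal_category is a hypothetical helper for illustration, not part of doti18n, which takes these rules from Babel.

```python
def english_ordinal_category(n: int) -> str:
    """Return the CLDR ordinal category for an English integer."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"    # 1st, 21st, 101st
    if n % 10 == 2 and n % 100 != 12:
        return "two"    # 2nd, 22nd
    if n % 10 == 3 and n % 100 != 13:
        return "few"    # 3rd, 23rd
    return "other"      # 4th, 11th, 12th, 13th


# The same suffixes as in the `rank` message above.
suffixes = {"one": "st", "two": "nd", "few": "rd", "other": "th"}
for position in (1, 2, 3, 4, 11, 21, 112):
    print(f"{position}{suffixes[english_ordinal_category(position)]}")
```

This is why the rank message needs all four branches: with only one and other, positions such as 2 and 3 would render as 2th and 3th.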

Select#

The select format allows you to define different message forms based on exact string matches (similar to a switch-case statement).

locales/en.yaml:

user_status: "{status, select, active {Welcome back!} inactive {Please activate your account.} banned {Your account is banned.} other {Hello, guest!}}"
gender_greeting: "{gender, select, male {Hello, sir!} female {Hello, miss!} other {Hello!}}"

locales/en.json:

{
    "user_status": "{status, select, active {Welcome back!} inactive {Please activate your account.} banned {Your account is banned.} other {Hello, guest!}}",
    "gender_greeting": "{gender, select, male {Hello, sir!} female {Hello, miss!} other {Hello!}}"
}

locales/en.xml:

<locale>
    <user_status>{status, select, active {Welcome back!} inactive {Please activate your account.} banned {Your account is banned.} other {Hello, guest!}}</user_status>
    <gender_greeting>{gender, select, male {Hello, sir!} female {Hello, miss!} other {Hello!}}</gender_greeting>
</locale>

locales/en.toml:

user_status = "{status, select, active {Welcome back!} inactive {Please activate your account.} banned {Your account is banned.} other {Hello, guest!}}"
gender_greeting = "{gender, select, male {Hello, sir!} female {Hello, miss!} other {Hello!}}"

Usage:

from doti18n import LocaleData

i18n = LocaleData("locales")

print(i18n["en"].user_status(status="active"))    # Output: Welcome back!
print(i18n["en"].user_status(status="inactive"))  # Output: Please activate your account.
print(i18n["en"].user_status(status="banned"))    # Output: Your account is banned.
print(i18n["en"].user_status(status="other"))     # Output: Hello, guest!

print(i18n["en"].gender_greeting(gender="male"))    # Output: Hello, sir!
print(i18n["en"].gender_greeting(gender="female"))  # Output: Hello, miss!
print(i18n["en"].gender_greeting(gender="other"))   # Output: Hello!
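
Conceptually, select behaves like a dictionary lookup with other as the fallback for any value that has no branch of its own. The sketch below illustrates that idea with a hypothetical select helper; it is not how doti18n implements the format.

```python
def select(value: str, options: dict[str, str]) -> str:
    """Return the option matching `value` exactly, else the `other` option."""
    return options.get(value, options["other"])


# Hypothetical stand-in for the parsed branches of `user_status`.
user_status = {
    "active": "Welcome back!",
    "inactive": "Please activate your account.",
    "banned": "Your account is banned.",
    "other": "Hello, guest!",
}

print(select("active", user_status))
print(select("suspended", user_status))  # no such branch, so `other` is used
```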

Formatters#

Out of the box, doti18n supports the date formatter. You can also implement custom formatters by extending the BaseFormatter class. See the Custom Formatters section for details.

locales/en.yaml:

appointment: "Your appointment is on {date, date, short}."
now: "Current date and time: {now, date, long}."
custom: "Custom formatted date: {date, date, %A, %d %B %Y year, %H:%M:%S (%Z)}."

locales/en.json:

{
    "appointment": "Your appointment is on {date, date, short}.",
    "now": "Current date and time: {now, date, long}.",
    "custom": "Custom formatted date: {date, date, %A, %d %B %Y year, %H:%M:%S (%Z)}."
}

locales/en.xml:

<locale>
    <appointment>Your appointment is on {date, date, short}.</appointment>
    <now>Current date and time: {now, date, long}.</now>
    <custom>Custom formatted date: {date, date, %A, %d %B %Y year, %H:%M:%S (%Z)}.</custom>
</locale>

locales/en.toml:

appointment = "Your appointment is on {date, date, short}."
now = "Current date and time: {now, date, long}."
custom = "Custom formatted date: {date, date, %A, %d %B %Y year, %H:%M:%S (%Z)}."

Usage:

from doti18n import LocaleData
from datetime import datetime
from zoneinfo import ZoneInfo

i18n = LocaleData("locales")
now = datetime.now(tz=ZoneInfo("UTC"))

print(i18n["en"].appointment(date=now))  # Output: Your appointment is on 29.01.2026.
print(i18n["en"].now(now=now))           # Output: Current date and time: 29.01.2026 22:30:19.
print(i18n["en"].custom(date=now))       # Output: Custom formatted date: Thursday, 29 January 2026 year, 22:30:19 (UTC).
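
The custom style in the custom key looks like a strftime pattern. Assuming the date formatter passes such a style to datetime.strftime() (an assumption; the text above does not state this), the result can be previewed with the standard library alone. A fixed timestamp is used so the output is reproducible; weekday and month names follow the process locale, which is English unless the program changes it.

```python
from datetime import datetime, timezone

# Fixed, timezone-aware timestamp so the result does not depend on the clock.
moment = datetime(2026, 1, 29, 22, 30, 19, tzinfo=timezone.utc)

# The same pattern as in the `custom` key above.
pattern = "%A, %d %B %Y year, %H:%M:%S (%Z)"
formatted = moment.strftime(pattern)
print(formatted)
```

Note that %Z only produces a zone name for timezone-aware datetimes; for a naive datetime it expands to an empty string.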

Escaping#

To include literal characters that are reserved for ICUMF formatting (like { or }), use single quotes ' to escape the sequence. To include a single quote itself, use two single quotes ''.

locales/en.yaml:

escaped: "This is a literal brace: '{' and this is a single quote: ''."

locales/en.json:

{
    "escaped": "This is a literal brace: '{' and this is a single quote: ''."
}

locales/en.xml:

<locale>
    <escaped>This is a literal brace: '{' and this is a single quote: ''.</escaped>
</locale>

locales/en.toml:

escaped = "This is a literal brace: '{' and this is a single quote: ''."

Usage:

from doti18n import LocaleData

i18n = LocaleData("locales")

print(i18n["en"].escaped())  # Output: This is a literal brace: { and this is a single quote: '.
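
The quoting rule can be sketched as a small pass over the string. This is a simplified illustration, not doti18n's parser: unquote is a hypothetical helper that only removes the quoting apostrophes and collapses doubled ones, whereas a real parser also has to stop treating the quoted characters as syntax.

```python
def unquote(text: str) -> str:
    """Resolve ICU-style apostrophe quoting in `text`."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "'":
            if text[i + 1:i + 2] == "'":
                # Two apostrophes in a row stand for one literal apostrophe.
                out.append("'")
                i += 2
            else:
                # A lone apostrophe opens or closes a quoted run; drop it.
                i += 1
            continue
        out.append(text[i])
        i += 1
    return "".join(out)


raw = "This is a literal brace: '{' and this is a single quote: ''."
print(unquote(raw))
```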

Nesting#

ICUMF supports nesting constructs to build complex logic. You can place pluralization, select, and other formats inside each other.

locales/en.yaml:

backpack: |
    You have {item, select,
        book {{count, plural, one {# book} other {# books}}} 
        pen {{count, plural, one {# pen} other {# pens}}} 
        other {{count} items}
    } in your backpack.

locales/en.json:

{
    "backpack": "You have {item, select, book {{count, plural, one {# book} other {# books}}} pen {{count, plural, one {# pen} other {# pens}}} other {{count} items}} in your backpack."
}

locales/en.xml:

<locale>
    <backpack>You have {item, select, 
                  book {{count, plural, one {# book} other {# books}}} 
                  pen {{count, plural, one {# pen} other {# pens}}} 
                  other {{count} items}
                  } in your backpack.
    </backpack>
</locale>

locales/en.toml:

backpack = """
You have {item, select,
    book {{count, plural, one {# book} other {# books}}}
    pen {{count, plural, one {# pen} other {# pens}}}
    other {{count} items}
} in your backpack."""

Important: Variable Context

Notice the other option in the example above. Inside other, we use {count} instead of #.

The hash symbol (#) represents the count only within sub-numeric formatters (like plural or selectordinal). Since select is not numeric, you must use the standard {count} interpolation variable there.

Usage:

from doti18n import LocaleData

i18n = LocaleData("locales")

print(i18n["en"].backpack(item="book", count=1))    # Output: You have 1 book in your backpack.
print(i18n["en"].backpack(item="book", count=3))    # Output: You have 3 books in your backpack.
print(i18n["en"].backpack(item="pen", count=1))     # Output: You have 1 pen in your backpack.
print(i18n["en"].backpack(item="pen", count=5))     # Output: You have 5 pens in your backpack.

# This uses the 'other' case.
print(i18n["en"].backpack(item="sword", count=10))  # Output: You have 10 items in your backpack.

Whitespace#

ICUMF ignores whitespace characters (spaces, tabs, newlines) outside the message text. You can format your ICUMF strings for better readability without affecting the logic.

However, be careful when using multi-line strings in YAML or XML, as indentation inside the message string itself (e.g., inside {...}) might be preserved depending on how the file is parsed.

locales/en.yaml:

cart: "You have {count, plural, one {  # item  } other {  # items  }} in your cart."

locales/en.json:

{
    "cart": "You have {count, plural, one {  # item  } other {  # items  }} in your cart."
}

locales/en.xml:

<locale>
    <cart>You have {count, plural, one {  # item  } other {  # items  }} in your cart.</cart>
</locale>

locales/en.toml:

cart = "You have {count, plural, one {  # item  } other {  # items  }} in your cart."

Usage:

from doti18n import LocaleData

i18n = LocaleData("locales")

print(i18n["en"].cart(count=1))   # Output: You have   1 item   in your cart.
print(i18n["en"].cart(count=4))   # Output: You have   4 items   in your cart.

As seen above, the extra spaces explicitly placed inside the plural forms ({ # item }) are preserved in the output.

Custom Formatters#

Requirements#

To create a custom formatter, define a class that inherits from doti18n.icumf.formatters.BaseFormatter and meets these criteria:

  1. name: A string representing the formatter's name (e.g., "crypto").
  2. is_subnumeric: Boolean. True if the formatter logic depends on a numeric count (like plural).
  3. is_submessage: Boolean. True if the formatter contains nested messages (like select).
  4. __init__: Must accept a strict: bool argument.
  5. __call__: Must implement the formatting logic and accept/return parameters as defined in the parent class.

Example Implementation#

Tip

If you are unsure how to implement a sub-numeric or sub-message formatter, refer to the source code of built-in formatters in the doti18n.icumf.formatters module.

locales/en.yaml:

crypto: "You have {value, crypto, usdt} in your wallet."

locales/en.json:

{
    "crypto": "You have {value, crypto, usdt} in your wallet."
}

locales/en.xml:

<locale>
    <crypto>You have {value, crypto, usdt} in your wallet.</crypto>
</locale>

locales/en.toml:

crypto = "You have {value, crypto, usdt} in your wallet."

Usage:

from doti18n import LocaleData
from doti18n.icumf.formatters import BaseFormatter
from doti18n.icumf.nodes import Node, FormatNode, TextNode
from typing import Sequence, Optional
# import some_external_crypto_library as clib

class CryptoFormatter(BaseFormatter):
    name = "crypto"
    is_subnumeric = False
    is_submessage = False

    def __init__(self, strict: bool = False):
        self._strict = strict

    def __call__(self, t: "LocaleTranslator", node: Node, **kwargs) -> Sequence[Optional[Node]]:
        if not isinstance(node, FormatNode):
            raise TypeError("CryptoFormatter can only process FormatNode instances.")

        value = kwargs.get(node.name)
        if value is None:
            if self._strict:
                raise ValueError(f"Missing value for '{node.name}' in CryptoFormatter.")
            else:
                return []  # Return empty for graceful degradation

        # Assume clib.format converts coins to the style (e.g., USDT)
        # formatted_value = clib.format(value, node.style)
        formatted_value = "84467,51" # Mock result

        return [TextNode(f"{formatted_value} {node.style.upper()}")]

# Register the formatter (by defining or importing it) BEFORE initializing LocaleData (or ICUMF)
i18n = LocaleData("locales")

print(i18n["en"].crypto(value="123 BTC"))  # Output: You have 84467,51 USDT in your wallet.

Execution Order

You must define or import your custom formatter before creating the LocaleData or Loader instance.

Why? The ICUMF manager registers all available formatters at initialization. If your custom formatter is defined later, it won't be registered.
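
The sketch below shows why definition order matters with this kind of registration. It assumes a registry that snapshots the subclasses of a base class when it is created; whether doti18n discovers formatters exactly this way is not confirmed by this page, and the BaseFormatter and Manager classes here are hypothetical stand-ins, not doti18n's own.

```python
class BaseFormatter:
    """Stand-in base class; subclasses identify themselves by `name`."""
    name = ""


class Manager:
    """Stand-in manager that collects formatters once, at initialization."""

    def __init__(self):
        # Snapshot of the subclasses that exist at this moment.
        self.formatters = {cls.name: cls for cls in BaseFormatter.__subclasses__()}


class DateFormatter(BaseFormatter):
    name = "date"


early = Manager()  # created before CryptoFormatter exists


class CryptoFormatter(BaseFormatter):
    name = "crypto"


late = Manager()  # created after CryptoFormatter was defined

print(sorted(early.formatters))
print(sorted(late.formatters))
```

The early manager never learns about the crypto formatter, which mirrors the situation described above when LocaleData is created too soon.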

Tags & HTML Support#

doti18n's parser supports XML/HTML-like tags out of the box. By default, they are rendered "as is" (useful for web apps), but you can intercept and transform them, for example to convert HTML tags into Markdown for Telegram bots or console output. You can also implement your own tag formatter (see Custom Formatters) to handle custom tags or render them in another format.

Basic Usage#

Tags are parsed as structured nodes, not just text. This ensures that opening and closing tags match (unless strict_tags=False).

locales/en.yaml:

welcome: "Welcome, <b>{name}</b>! Click <link>here</link>."

locales/en.json:

{
    "welcome": "Welcome, <b>{name}</b>! Click <link>here</link>."
}

locales/en.xml:

<locale>
    <welcome>Welcome, <b>{name}</b>! Click <link>here</link>.</welcome>
</locale>

locales/en.toml:

welcome = "Welcome, <b>{name}</b>! Click <link>here</link>."

Usage (Default HTML behavior):

from doti18n import LocaleData

i18n = LocaleData("locales")
print(i18n["en"].welcome(name="User"))
# Output: Welcome, <b>User</b>! Click <link>here</link>.

Note

The ICUMF parser does not support self-closing tags (like <br/>) or tags with attributes (like <a href="...">). Such tags are returned as-is, and a warning is logged if strict_tags=True.

By default, doti18n converts <link> tags into HTML <a> tags with href attributes. You can customize this behavior by implementing a custom tag formatter (see below).

If you need to include unsupported tags, consider escaping them with single quotes ('<br/'>, '<a href="..."'>) or using placeholders instead (e.g., {line_break}) and passing the values as arguments.

Custom Tag Processing (HTML to Markdown)#

To transform tags, implement a custom formatter class (or use a built-in one) and inject it into the ICUMF configuration via the tag_formatter argument.

The formatter receives a TagNode which contains children (the content inside the tag).

Example: Converting <b> to ** and <i> to __

locales/en.yaml:

msg: "Hello <b>{name}</b>, this is <i>italic</i>."

locales/en.json:

{
    "msg": "Hello <b>{name}</b>, this is <i>italic</i>."
}

locales/en.xml:

<locale>
    <msg>Hello <b>{name}</b>, this is <i>italic</i>.</msg>
</locale>

locales/en.toml:

msg = "Hello <b>{name}</b>, this is <i>italic</i>."

Usage:

from doti18n import LocaleData
from doti18n.loaders import Loader
from doti18n.icumf import ICUMF
from doti18n.icumf.formatters import MarkdownFormatter

# Initialize ICUMF with the built-in Markdown tag formatter
icumf = ICUMF(tag_formatter=MarkdownFormatter)
loader = Loader(icumf=icumf)
i18n = LocaleData("locales", loader=loader)

print(i18n["en"].msg(name="Alice"))  # Output: Hello **Alice**, this is __italic__.

Nested Tags

Since the formatter returns the original node.children, doti18n continues to process the content inside the tag. This means nesting (e.g., <b><i>Text</i></b>) works automatically with your custom formatter.
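
The recursion behind nested tags can be sketched with a tiny node tree. The Text and Tag classes and the render function below are hypothetical stand-ins written for this illustration, not doti18n's TagNode or formatter API; they only show how wrapping the rendered children handles any nesting depth.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Text:
    value: str


@dataclass
class Tag:
    name: str
    children: list["Node"] = field(default_factory=list)


Node = Union[Text, Tag]

# Tag name -> Markdown marker, matching the <b> and <i> example above.
MARKDOWN = {"b": "**", "i": "__"}


def render(node: Node) -> str:
    """Render a node tree to Markdown, recursing into tag children."""
    if isinstance(node, Text):
        return node.value
    inner = "".join(render(child) for child in node.children)
    marker = MARKDOWN.get(node.name)
    if marker is None:
        # Unknown tags are emitted unchanged around their rendered content.
        return f"<{node.name}>{inner}</{node.name}>"
    return f"{marker}{inner}{marker}"


# Equivalent of <b><i>Text</i></b>
tree = Tag("b", [Tag("i", [Text("Text")])])
print(render(tree))
```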

Using Different Formatters#

You can pass a specific formatter directly when calling a translation key. This is particularly useful when you need to use the same translation string for different output formats (e.g., HTML for web and Markdown for Telegram bots).

from doti18n import LocaleData
from doti18n.icumf.formatters import HTMLFormatter, MarkdownFormatter

i18n = LocaleData("locales")
html = HTMLFormatter(strict=True)
md = MarkdownFormatter(strict=True)

key = i18n["en"].msg  # Get the callable for the 'msg' key
print(key(name="Alice", formatter=html))  # Output: Hello <b>Alice</b>, this is <i>italic</i>.
print(key(name="Alice", formatter=md))    # Output: Hello **Alice**, this is __italic__.