Anoma Network Architecture

Author
Affiliation

TG X Thoth

Heliax AG

Abstract

We present an overview of the Anoma network architecture and its underlying mathematical and communication models, which provide the communication infrastructure for our distributed systems protocols.

The architecture consists of a sovereign domain system that enables pluralistic interoperability, a distributed name system using petnames, a distributed publish/subscribe protocol that offers Byzantine Causal Broadcast, a modular transport and routing system, and a trust and reputation mechanism for service commitments.

Domains provide distributed immutable and mutable data storage and dissemination services, where each domain induces a peer-to-peer overlay with its own authentication, membership, topology, message dissemination and storage protocols.

The communication model is inspired by the actor model, where engine processes communicate via message passing in a distributed system. Message send allows expressing routing & transport selection preferences and constraints. Message delivery semantics are either unreliable or reliable causal delivery. Nodes maintain trust and reputation metrics, as well as measurements about other nodes, which influence node selection in routing algorithms. Compositional cryptographic identities induce unicast, multicast (pub/sub), and anycast communication channels.

Keywords

distributed systems, peer-to-peer networks

1 Introduction

The Anoma network architecture employs a distributed system design with the following main characteristics.

Cryptographic identities enable authenticated and encrypted communication, message addressing, as well as ownership and access control for shared network resources, such as domains, publish/subscribe topics, and zones in the name system.

Sovereign domains enable pluralistic interoperability in the network, where each domain induces a peer-to-peer overlay with its own protocols and access control rules for joining the overlay and for sending requests to it.

Layered network protocols offer various services inside domains, such as a domain membership and topology protocol, a topic-based publish/subscribe protocol, a data storage protocol for immutable data, and a name system using secure and memorable names.

Versioned types for protocols and data at rest allow future protocol and algorithm upgrades, as well as pluralism in protocol choice for sovereign domains.

In this document we outline the overall design, requirements, message types, and interfaces for the network architecture, while precise protocol details remain future work.

For type definitions we use Juvix [Xyza], a statically typed functional language.

1.1 Design overview

1.1.1 Identities

Users, nodes, publish/subscribe topics, and domains each have a cryptographic identity. Cryptographic identities may be compositional, where multiple identities participate in a threshold cryptosystem.

1.1.2 Domains

The network consists of sovereign domains that provide services to their members, and may choose to serve external requests from non-members via a common inter-domain protocol, which is a request-response protocol that enables interoperability between domains.

Each domain has a peer-to-peer overlay associated with it, and defines its own membership rules and protocols used in the overlay.

Grassroots domains are partition-tolerant domains that may consist of multiple intermittently connected network partitions. Partitions of grassroots domains may synchronize opportunistically; however, in order to enable asynchronous communication between mobile nodes and eventual synchrony between network partitions within a bounded time, a core network overlay with stable nodes may offer encrypted store-and-forward messaging and synchronization services to mobile nodes on edge networks.

The minimum requirement for a domain is for the owner identity to define the domain configuration in a signed Domain Advertisement that allows join requests and external requests to be sent to designated domain members. The Domain Advertisement may be shared directly with interested nodes, or published in the Decentralized Name System that allows updates and lookups.

Existing peer-to-peer overlays may also participate in the domain system by publishing a Domain Advertisement that specifies nodes that implement the inter-domain protocol enabling interoperability with other domains.

1.1.3 Anoma protocols & domains

Protocols implemented by Anoma run in domains, which delineate administrative and security boundaries in the system.

Anoma domains implement a common set of protocols that provide fundamental services in the domain, such as membership, message dissemination, and data storage.

For example, an instance of the Anoma Resource Machine [KhGo24; Shef24] may run in its own domain, together with the underlying consensus protocol instance [HeKaSh24; KaSh24], while another domain may be used for intent submission for Intent Machines [HaRe24].

A typical information flow across various protocols is the following: users submit their intents (partial transaction candidates) that are disseminated to solvers, who match them and submit them as transaction candidates to a mempool, where they are subsequently ordered by consensus, and finally the ordered transactions are executed by the execution system.

As this example illustrates, different protocols perform different tasks, have different responsibilities, and different nodes participate in them.

The network consists of multiple domains with distinct responsibilities and protocols. Compared to a single, open network, multiple domains with restricted membership reduce the attack surface on domain members that provide crucial services, by allowing domains to expose publicly only a limited set of designated members that serve external requests.

1.1.4 Network protocols

The system employs a layered protocol design where each protocol layer offers services for other protocols and applications, as illustrated by Figure 1.

Figure 1: Protocol layers

1.1.4.1 Transport

Transport protocols are responsible for establishing and maintaining point-to-point connections between pairs of nodes over various physical and overlay network protocols.

Transport protocols have different security, reliability, and ordering properties. E.g. QUIC with TLS offers authenticated and encrypted connections with reliable and ordered message delivery guarantees.

1.1.4.2 Unicast routing

The unicast routing protocol is responsible for routing messages over the network between two engines on different nodes based on the destination address in the message.

Source and destination addresses in messages consist of a node identifier and an engine name.

The routing process involves address resolution using locally available node advertisements published in the name system, and transport protocol selection based on local constraints and preferences.

Unicast messaging can be extended to the asynchronous setting by using a pub/sub topic where relays & brokers perform store-and-forward message dissemination.

1.1.4.3 Distributed Publish/Subscribe

The topic-based publish/subscribe protocol offers Byzantine Causal Broadcast of signed and encrypted messages from authorized publishers to all subscribers of a topic. Subscribers may also use content-based filters to further refine their subscription.

Published messages form a Blocklace [AlSh24] data structure: a Directed Acyclic Graph (DAG) where edges indicate a happened-before causality relation between messages of publishers, each of whom publishes a linear history of messages, i.e. equivocations are not allowed.

The protocol offers both a broker-based and a brokerless mode.

  - Brokerless: direct subscribers participate in the dissemination protocol, forwarding messages to other subscribers.
  - Broker-based: indirect subscribers subscribe at a broker without participating in the dissemination protocol. The broker is a direct subscriber that stores and forwards messages to its indirect subscribers.

1.1.4.4 Mergeable Replicated Data Types

Mergeable Replicated Data Types (MRDT) [KPSJ19; SKNS22] enable asynchronous multi-party collaboration on shared, mutable data structures.

Replicated Data Types (RDTs) are convergent, i.e. concurrent operations are deterministically reconciled. In the MRDT model a three-way merge function is used for this purpose.

The MRDT store is a key-value store where each value is an MRDT, and transactions submitted by writers contain a set of operations on MRDTs.

The store is permissioned at the operation level, where access control rules may specify which writer may perform what operations on which keys.

1.1.4.5 Distributed Immutable Storage

The Distributed Immutable Storage protocol is responsible for storage and retrieval of encrypted, content-addressed immutable data.

Data is chunked to a uniform size, each chunk is encrypted using convergent encryption with a convergence secret, and finally organized in a Merkle tree.

A capability-based security model is used: a verify capability (the Merkle root hash) allows traversal of the whole Merkle tree and fetching each chunk, while a read capability additionally contains the decryption key, which allows decrypting each chunk and reassembling the decrypted content.

1.1.4.6 Decentralized Name System

The Decentralized Name System is a petname system [Stie05] with secure and memorable names that assigns a zone to each identity in the system. Each zone contains a set of signed mutable records that are stored in the network and updated via pub/sub.

In addition to assigning human-readable names to records, the name system is also used for address resolution and binding identities and addresses together from different layers.

1.1.4.7 Domain Membership & Topology

The Domain Membership protocol is a peer-to-peer overlay membership protocol with a full view of members.

Based on the currently online members and their subscriptions, the Topology protocol creates and maintains Topic-Connected Overlays (TCO) for pub/sub topics.

1.1.5 Service commitments

Nodes may commit to provide services to other nodes. A service commitment is a cryptographic commitment by the service provider about the services provided. E.g. a storage commitment indicates that a given node stores a given piece of data for a certain period of time.

Fulfilment of service commitments may either be proven by a path in a DAG of causally dependent messages, or observed by other nodes.

The reputation score of a node is a subjective value calculated locally, based on information shared by other nodes about how reliably the node fulfils its service commitments. Performance measurements may similarly be shared with other nodes.

In both cases the trustworthiness of the information varies based on the source node advertising it. For this reason it is weighted by a locally assigned trust score of the node that shared it.

Service providers may advertise their services in the form of intents, which get matched with service requests in the intent network.

2 System model

2.1 Identities

Cryptographic identities are used for various entities in the system, such as users, nodes, publish/subscribe topics, and domains.

Each identity has an associated communication channel, and a zone in the Distributed Name System. The identity serves as the address for the communication channel, identifies the DNS zone, and signs its records.

Cryptographic identities may be compositional, where multiple identities participate in a threshold cryptosystem. This may be used e.g. for shared ownership of a domain or pub/sub topic.

2.2 Communication model

The system uses message passing between engines [HePrHa25]. The engine model is inspired by the actor model, where processes communicate via message passing in a distributed system.

An engine is a process with local state that can send messages to and receive messages from other local and remote engines, react to timer events, and spawn other engines.

A node is a set of colocated engines reachable over the network. Nodes in the same overlay network directly connect to each other using their advertised addresses.

A node consists of a set of running engine instances, has a cryptographic identity and a number of transport addresses. Nodes communicate with each other over authenticated and encrypted transport channels.

A user interacts with the system and other users via their local node that runs on each user device.

2.3 Communication patterns

Cryptographic identities induce unicast (node-to-node messages), multicast (publish/subscribe topics), and anycast (domain requests) communication channels.

2.3.1 Unicast

Unicast communication is used for direct messages between two engines, which may be reliable and ordered, or unreliable and unordered, depending on the chosen transport protocol.

2.3.2 Multicast

Multicast communication is used in pub/sub topics, where a message is sent by one or more authorized publishers and eventually delivered to all subscribers. Pub/sub topics offer Byzantine Causal Delivery guarantees with partial, causal ordering of messages.

In addition to topic subscriptions, subscribers may further refine their subscription with per-topic content-based filters.

When brokers are used in a pub/sub topic, they enable asynchronous communication for intermittently connected end-user nodes. Multicast messages may be encrypted to preserve end-to-end security of messages in a topic where broker nodes provide store-and-forward services.

2.3.3 Anycast

Anycast communication is used for messages sent to any known member of a domain that is designated to serve external requests.

2.4 Messages

2.4.1 Message addressing & routing

Destination routing is used to deliver messages sent by engines.

The source and destination addresses in unicast messages consist of a pair of a node identifier and an engine name.

Multicast messages are addressed using the topic identity as destination address and publisher identity as the source address.

Anycast messages are addressed by the domain identity.

2.4.2 Message types & versioning

Messages are sent between engines. The message body is statically typed and defined by the communicating engines. Message types may change over time and are versioned for this reason. In order for two engines to be able to communicate, both the sender and receiver engine must support the same protocol and thus message versions.

2.4.3 Message sending

Messages contain cryptographic source and destination identities, and are sent to other nodes over authenticated and encrypted network connections.

Message send allows expressing routing & transport selection preferences and constraints, such as reliability, ordering, and security requirements for the transport channel.

2.4.4 Message delivery

Message delivery over the network happens over authenticated and encrypted communication channels.

Unicast message delivery guarantees between two nodes vary depending on the transport protocol used, e.g. ordered, reliable delivery when using QUIC and TLS.

Multicast communication in pub/sub topics offers Byzantine Causal Delivery guarantees, i.e. reliable delivery of causally ordered messages in a Byzantine setting.

In the underlying Blocklace data structure causally dependent messages form a DAG, which enables full and partial synchronization, as well as lost message detection and recovery.

2.5 Network architecture

2.5.1 Core & edge networks

The system consists of a core network of stable, always-on core nodes that communicate directly with each other and synchronize data continuously, as well as several edge networks of mobile edge nodes that may move between different edge networks, and synchronize data opportunistically with other nodes that belong to the same domain.

2.5.2 Domain overlays

The network is organized as several distinct peer-to-peer overlays we call domains. Inside a domain pub/sub topics form a Topic-Connected Overlay (TCO), where each topic overlay contains only subscribers of the given topic.

2.5.3 Pub/sub overlays

Each topic overlay contains a multi-hop dissemination topology, where subscribers forward messages to each other. In case of topics with few subscribers or when low latency is crucial, direct delivery may be used by publishers in a star topology.

2.6 Data model

The system offers multiple data models for different purposes.

Immutable data with content hash references allows storing and retrieving encrypted data blobs used by applications.

Mutable data is modified via execution of transactions that may read and write keys of a distributed key-value store.

Two distinct execution models are offered: the resource model relies on totally ordered transactions and uses the Resource Machine [KhGo24] to create and consume resources, while the replicated data type model relies on causally ordered transactions and uses Mergeable Replicated Data Types (MRDT) [SKNS22], which employ three-way merge functions to reach eventual consistency of replicas.

3 Network Architecture

3.1 Engines

Protocols in the system are implemented by engines.

An engine is an addressable process with local state that runs on a node, and has an engine name that is unique to the node it runs on. It may send and receive messages to and from other engines over communication channels provided by the system. It may also react to other local events, such as timers.

The engine identity (Engine ID) is a pair of the engine name and the node identity that is used for message addressing and routing.
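
As a sketch, the engine identity could be expressed in the same style as the other type definitions in this document; the constructor and field names here are illustrative rather than normative.

type EngineID := mkEngineID {
  node : NodeID;      -- identity of the node the engine runs on
  name : EngineName;  -- engine name, unique within that node
};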

3.2 Nodes

A node is a set of colocated engines with a cryptographic node identity reachable via one or more transport protocols, each with its own transport address. Engines of a node are part of the local trust domain that allows them to access certain local engines that remote engines do not have access to, such as local storage.

Each node advertises its current transport addresses via a versioned and signed node advertisement, which is the binding between the routing and transport layer, i.e. maps a node identity to one or more transport addresses with an associated priority. The node advertisement (Figure 2) is published in the node’s zone in the name system in a signed record set.

Nodes perform incoming and outgoing message routing for their own engines, based on locally available advertisements. Unicast routing is supported natively over various transport protocols, while multicast and anycast routing is performed by an engine instance responsible for a specific topic or domain, respectively.

type NodeAdvert := mkNodeAdvert {
  addrs : Set (Pair TransportAddress Priority);
};
Figure 2: Node Advertisement

3.3 Users

A user interacts with the system and other users via one or more nodes. Typically each user runs a node on each of their devices; however, there is no one-to-one relationship between users and nodes: a user may use multiple nodes, and a node may serve multiple users. This allows flexibility in adapting the system to different requirements and environments.

When using multiple nodes, a user may employ a pub/sub topic with all their nodes participating in it, in order to synchronize configuration, domain and topic membership, and application data.

3.3.1 User inbox & advertisement

Users may advertise an inbox for receiving messages asynchronously.

The interface to the inbox is a pub/sub broker that accepts submissions to a pub/sub topic with the recipient user’s nodes participating in it.

The user advertisement (Figure 3) is published in the name system in the user’s zone. It contains inbox nodes and prekeys.

The inbox is intended for irregular communication, and may have additional authentication requirements for submission, such as a pre-shared key to avoid spam. For regular communication, a pub/sub topic in which all the communicating users' nodes participate is used instead.

Prekeys are cryptographic key material necessary for asynchronous key agreement, used by protocols such as X3DH [MaPe16] and PQXDH [KrSc23]. Publishing prekeys in a decentralized name system allows these protocols to be used in a decentralized setting.

type UserAdvert := mkUserAdvert {
  inbox_nodes : Set NodeID;
  inbox_prekeys : Set PreKey;
};
Figure 3: User Advertisement

3.4 Messages

A message is sent between engines over a communication channel supported by the system, i.e. either unicast, multicast, or anycast.

3.4.1 Message routing & addressing

Destination routing is used to deliver a message to its destination, where routing decisions are made solely based on the destination address. Each message has a source and destination address, which varies based on the type of communication channel. The unicast source and destination address is a pair of node identity and engine name. The multicast destination address is the topic identity, while the sender is a publisher identity. For anycast the destination address is the domain identity, while the sender is a unicast source address. The response to an anycast message happens over unicast.
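
The addressing scheme described above could be sketched as a sum type with one variant per communication channel; the names below are illustrative assumptions, not part of the specification.

type Address :=
| Unicast EngineID    -- pair of node identity and engine name (Section 3.1)
| Multicast TopicID   -- destination is the topic identity; the source is a publisher identity
| Anycast DomainID    -- destination is the domain identity; the response is sent over unicast
;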

3.4.2 Message versioning & types

The message payload is versioned and typed. Communicating engines must support the given message type and version, otherwise the message is dropped. Nodes perform protocol version negotiation upon establishing a transport connection, and thus learn which protocol and message versions the remote node supports.

3.4.3 Message serialization

Serialization (marshalling) and deserialization (unmarshalling) must be performed before sending a message to and after receiving a message from the network.

The serialization protocol must have a message schema that supports versioning in order to allow protocol negotiation and upgrades, and must support an efficient encoding of binary data on multiple platforms. Suitable choices include BARE [Xyzb] and Protobuf [Xyzc].
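
For illustration, a message envelope carrying the information needed for routing and version negotiation might look as follows; this is a hypothetical sketch rather than the actual wire format, and ByteString stands in for an opaque serialized payload.

type Envelope := mkEnvelope {
  src : Address;         -- source address
  dst : Address;         -- destination address
  version : Nat;         -- payload version, agreed during protocol negotiation
  payload : ByteString;  -- serialized, typed message body
};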

3.5 Domains

Domains organize the network into smaller, sovereign overlay networks that themselves determine the protocols they use internally and the access control mechanisms that apply to membership. They may also participate in the inter-domain protocol, which allows non-members to send external requests that query information from the domain, subject to access control rules.

3.5.1 Domain identity & advertisement

The domain identity is a cryptographic identity controlled by the domain’s owner, which may also be a compositional identity controlled by multiple parties in a threshold cryptosystem.

The domain advertisement (Figure 4) is published in the name system in the domain’s zone, and serves as the authoritative definition of the domain configuration. It contains the set of nodes responsible for serving join requests, the set of nodes responsible for serving external requests, and the versioned protocols used for these two types of requests.

type DomainAdvert := mkDomainAdvert {
  join_nodes : Set NodeID;
  ext_nodes : Set NodeID;
  join_proto : JoinProto;
  ext_proto : ExtProto;
};
Figure 4: Domain Advertisement

3.5.2 Domain protocols

Domains use a set of layered protocols, as illustrated by Figure 1. Pub/sub is the underlying dissemination protocol for other protocols.

The Domain Membership protocol maintains the set of domain members, the Domain Topology protocol constructs and maintains a connected domain overlay based on members’ pub/sub topic subscriptions in the membership protocol, while each Pub/Sub Topic offers message dissemination from publishers to subscribers.

3.6 Domain Membership

A node may join a domain by sending a join request to a designated member of the domain with permission to add members. After membership is granted, the new member establishes connections to a number of existing members of the domain indicated in the join response, thereby joining the domain's overlay network.

Membership models include open access where anyone can join, and restricted access that requires some form of authentication, such as a shared secret, a membership certificate issued by an authorized member that serves as an invitation, or a zero knowledge proof of membership.

A user may join a domain with multiple nodes they control when the authentication mechanism allows this, e.g. shared secret. When membership certificates are used, membership may be granted to a user identity instead of a node identity directly, where users may issue certificates for their nodes, up to a limited number of nodes per user.
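
These membership models could be encoded as a sum type along the following lines; the constructor names and parameter types are illustrative assumptions.

type JoinAuth :=
| Open                     -- open access: anyone can join
| SharedSecret Digest      -- proof of knowledge of a shared secret
| MemberCert Certificate   -- invitation certificate issued by an authorized member
| MembershipProof ZKProof  -- zero knowledge proof of membership
;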

The membership protocol must be scalable and must tolerate Byzantine faults. Overlay membership protocols either use a full view, or a partial view. Partial view protocols typically use a gossip protocol to exchange partial membership views, however, they can be easily attacked by a malicious node that introduces bias to the gossiped view, potentially completely isolating target nodes from the network. As shown by Fireflies [JRVJ15], a full view protocol avoids these problems, and it is feasible to implement in a scalable manner without costly and incomplete mitigation techniques such as proof of work [BGKK09] and probabilistic sampling [AnBuSe13]. The membership protocol uses a full view for this reason.

Each domain has a name store, which is an MRDT store that contains zone label to record set mappings. The MRDT store is backed by a pub/sub topic with the Byzantine Causal Broadcast property that allows members to publish updates to their own zone. The pub/sub topic ID of the name store equals the domain ID, and all members of the domain subscribe to this topic. Updates to the name store are used as the membership protocol.

The domain owner publishes a record under the empty label of the domain's zone that contains the set of node identities that are members of the domain (Figure 5). Each member node publishes two labels in its own zone with signed record sets. The empty label contains a record set with the node advertisement, which is updated every time it changes. Another label, corresponding to the domain ID, contains a record set with the node's membership record (Figure 6), which specifies its pub/sub topic subscriptions. The version number and timestamp of this record set are updated periodically, serving as a last-seen timestamp that makes it possible to determine the recently online members of the domain.

type DomainMembers := mkDomainMembers {
  nodes : Set NodeID;
};
Figure 5: Domain Members in the domain’s zone
type DomainMember := mkDomainMember {
  subs : Set TopicID;
};
Figure 6: Domain Member in the node’s zone

3.7 Domain Topology

The Domain Topology protocol constructs a Topic-Connected Overlay (TCO) [CMTV07], where each pub/sub topic induces a connected sub-overlay of subscribers of the topic, i.e. only subscribers of the topic participate in the dissemination of messages published in the topic. To increase robustness of the overlay, domains employ a k-Topic-Connected Overlay (kTCO) [ChViJa21], where each topic sub-overlay is k-connected.

The topology protocol uses topic subscriptions from the membership protocol, where each node indicates its topic subscriptions. Since the membership protocol employs a full view, the topology protocol has access to all subscriptions for constructing the overlay.

3.8 Publish/Subscribe Topics

Pub/sub topics offer reliable message dissemination with causal order message delivery from a set of authorized publishers to all subscribers.

3.8.1 Topic identity & advertisement

Each pub/sub topic has a topic identity, which, like domain identities, may be compositional for shared ownership.

The topic advertisement (Figure 7) is signed by the topic identity and specifies: the set of publisher identities that are allowed to publish messages in the topic; the set of relay nodes that can be used to join the topic overlay; the set of broker nodes that can be used to publish and subscribe to the topic without having to join the topic overlay; and an optional set of tags that may be used for automatic subscriptions to new topics of interest.

type TopicAdvert := mkTopicAdvert {
  publishers : Set PublisherID;
  relays : Set NodeID;
  brokers : Set NodeID;
  tags : Set String;
};
Figure 7: Topic Advertisement

3.8.2 Pub/sub messages

Pub/sub messages (Figure 8) are published and signed by a publisher, and contain the publisher identity, a per-publisher message sequence number, the set of causal dependencies, the message content, a set of tags that can be used for content-based filtering, and a signature by the publisher.

The message content may be a message payload for an engine, an acknowledgement of a previous message, an updated topic advertisement, or an announcement of a new epoch.

type TopicMsg := mkTopicMsg {
  publisher : PublisherID;
  seq : Nat;
  deps : Set TopicMsgID;
  tags : Set String;
  content : TopicMsgContent;
  sig : Commitment;
};
Figure 8: Topic Message
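
The message content variants listed above (payload, acknowledgement, advertisement update, epoch announcement) suggest a content type along these lines; this is a sketch with illustrative constructor names, where ByteString stands in for an opaque engine payload and EpochID for an assumed epoch identifier.

type TopicMsgContent :=
| Payload ByteString        -- message payload for an engine
| Ack TopicMsgID            -- acknowledgement of a previous message
| AdvertUpdate TopicAdvert  -- updated topic advertisement
| NewEpoch EpochID          -- announcement of a new epoch
;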

3.8.3 Message DAG

Messages published in the topic form a Directed Acyclic Graph (DAG) due to the set of causal dependencies present in each message, which establishes a partial, causal order of messages in the topic.

The set of causal dependencies in a topic message consists of the leaf nodes of the DAG as seen by the publisher, i.e. the latest messages of publishers in the topic that are not yet referenced by other messages. This set may also include the publisher's own previous message if it is not already referenced by other publishers.

This data structure is named Blocklace in [AlSh24], where it is shown that equivocations can be prevented by excluding publishers who attempt to send different, independent messages to different subscribers, since such equivocations are eventually discovered as they become part of the DAG. The Blocklace is a Conflict-free Replicated Data Type (CRDT) [SPBZ11]: a partially ordered log (PO-Log) data type with a single append operation.

The message DAG enables detecting missed messages, which happens when a subscriber comes across an unknown causal dependency that it has not yet received from the network. To recover missed messages, and after (re)joining the topic, a DAG synchronization protocol is used between pairs of subscribers.

3.8.4 Consistency model

Pub/sub topics use the Byzantine Eventual Consistency (BEC) [KlHo20] model, backed by a Byzantine Causal Broadcast (BCB) protocol.

These properties hinge on the underlying causal message DAG, which enables missed message detection even when Byzantine nodes withhold certain messages. The combination of push and pull dissemination phases ensures eventual consistency: in the push phase the protocol strives to deliver messages to as many subscribers as it can while trying to minimize duplicates, while in the pull phase missed messages are actively synchronized from other subscribers.

3.8.5 Dissemination

The publish/subscribe protocol uses the sub-overlays constructed by the topology protocol for dissemination.

When publishing a message, the push phase of dissemination involves the publisher's node pushing the message to other nodes on the potentially multi-hop dissemination path, which forward it further to their subscribers.

Message dissemination in the topic happens in multiple phases:

  1. Dissemination among publishers.
  2. Dissemination from publishers to direct subscribers.
  3. Dissemination from brokers to indirect subscribers.

Dissemination paths may be single-hop with a star topology, which is used when the number of subscribers is small or low latency is crucial, or multi-hop with a DAG topology, which is more efficient for large numbers of subscribers.

Before forwarding a message to the next hop, each node ensures that the message is valid by verifying whether the publisher is authorized to publish the given message according to the topic advertisement of the current epoch.

An epoch is a period during which topic members store all messages published in the topic to be able to serve synchronization requests. During an epoch the topic advertisement does not change to ensure consistency across subscribers, since it contains the set of valid publishers that must be the same for all subscribers, even if they missed some messages.

When a subscriber encounters a missed message dependency it hasn’t seen yet, it explicitly requests the missed messages using the synchronization protocol in the pull phase of dissemination.

3.8.6 Synchronization

Synchronizing the message DAG between two nodes involves the requestor sending a set of missing message hashes, to which the responder may respond by forwarding those messages, if available. This can be made more efficient by using a Bloom filter of message hashes instead, as shown by [KlHo20].
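
A minimal sketch of the synchronization request, covering both the explicit hash set and the Bloom filter optimization of [KlHo20]; the constructor names are illustrative.

type SyncRequest :=
| WantMsgs (Set TopicMsgID)  -- explicit set of missing message hashes
| HaveFilter BloomFilter     -- Bloom filter over hashes of known messages; the responder forwards messages not matched by the filter
;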

3.8.7 Brokers

Brokers are direct subscribers that participate in the dissemination protocol in the topic overlay, and allow indirect subscribers to publish and subscribe through them without having to join the overlay.

Using brokers endows mobile nodes on edge networks with asynchronous communication capabilities; for such nodes, participating in the topic overlay in the core network is not practical due to their limited hardware resources, battery life, and intermittent connectivity. Subscribers may also specify additional content-based filters based on the publisher identity and the set of tags in the message, to further reduce the number of messages received. Message tag semantics are entirely up to the application using the topic; e.g. a tag could be an opaque reference to a forum topic that allows selective subscription to that topic. Further filtering based on the application-specific message content is only possible on the end subscriber's node, which is able to decrypt the message content and understand its type.

Brokers also serve as a point of submission to a topic for publishers that do not participate in the topic overlay. An example is intent submission & dissemination, where many users who do not participate in the topic overlay submit intents to a topic that disseminates them to a few solvers who are subscribers of the topic.

3.9 Distributed Immutable Storage

The Distributed Immutable Storage protocol allows storing immutable data blobs in the network. It offers encrypted, chunked, deduplicated data storage with immutable blob and chunk references.

Before storing, blobs are first chunked, then each chunk is separately encrypted, and finally the chunks are organized in a Merkle tree. The chunk identifier (Chunk ID) is the cryptographic hash of the chunk contents. Data chunks are leaves of the tree, while the root chunk is the entry point to the stored data blob, from which the whole tree can be traversed. The chunk ID of the root node also serves as the blob identifier (Blob ID). Non-leaf nodes contain pointers to tree nodes on the next level, as a set of chunk IDs and corresponding content decryption keys. Since chunk size is limited by a maximum size, which is a parameter of the chunking algorithm, multiple levels of non-leaf nodes may be necessary when the set of data chunk pointers does not fit in a single chunk.
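
The tree structure described above could be sketched as follows, with leaf chunks carrying encrypted data and non-leaf chunks carrying pointers to the next level; the names are illustrative assumptions.

type ChunkPtr := mkChunkPtr {
  id : ChunkID;         -- content hash of the referenced chunk
  key : DecryptionKey;  -- key for decrypting the referenced chunk
};

type Chunk :=
| DataChunk ByteString      -- leaf: encrypted data
| NodeChunk (Set ChunkPtr)  -- non-leaf: pointers to tree nodes on the next level
;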

The blob ID is a verification capability that allows requesting and verifying all chunks of a blob by requesting the root chunk and from there traversing the whole tree.

The blob ID in combination with its decryption key serves as a read capability that allows decryption of chunk contents: the data in leaf nodes, and the decryption keys for the next level of chunks in non-leaf nodes.
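
Accordingly, the two capabilities could be sketched as follows (field names are illustrative).

type VerifyCap := mkVerifyCap {
  blob : BlobID;        -- Merkle root hash: fetch and verify all chunks
};

type ReadCap := mkReadCap {
  blob : BlobID;
  key : DecryptionKey;  -- additionally decrypts the root chunk, and transitively the whole blob
};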

Chunk encryption is done using convergent encryption [WiWa08], where the encryption key for a chunk is derived from the cryptographic hash digest of the chunk’s plaintext content and a convergence secret. The convergence secret prevents confirmation attacks when the plaintext is known, and sets the scope for deduplication. The scope may be global for public data, or set per domain, topic, or even per blob.
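
The deduplication scope set by the convergence secret could be captured by a type like the following sketch (constructor names assumed).

type ConvergenceScope :=
| Global              -- public data, deduplicated globally
| PerDomain DomainID  -- deduplication within a domain
| PerTopic TopicID    -- deduplication within a topic
| PerBlob             -- a fresh secret per blob, no deduplication
;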

3.9.1 Storage Commitments

A Storage Commitment allows a node to commit to storing the referenced blob for a certain amount of time. It contains: the blob ID that is stored; the node that stores the blob; an expiry condition until which the blob is stored, such as an absolute time or an epoch number in a given pub/sub topic; an authentication mechanism for requesting the blob, such as membership of a given domain; and a cryptographic signature by the node.

type StorageCommitment := mkStorageCommitment {
  id : ChunkID;
  node : NodeAdvert;
  expiry : StorageCommitmentExpiry;
  auth : StorageAuth;
  version : Nat;
  created : AbsTime;
  sig : Commitment;
};
Figure 9: Storage Commitment

Storage commitments are typically sent to pub/sub topics, and may also be stored in the name system, which allows assigning mutable named references to a commitment, and thus to the blob it references.

Each node stores the storage commitments it knows about, along with the source each one arrived from, e.g. the pub/sub topic ID or the zone and label in the name system, which may be used to look for other commitments after a commitment expires.

Periodic verification of stored data ensures data availability. The blob ID is a verification capability that allows verifying that all chunks of a blob are available; however, a more efficient probabilistic data availability sampling protocol is necessary to verify large amounts of data in practice. Erasure coding, such as Reed-Solomon codes, is another way to improve fault tolerance [WiWa08; Xyzd].

3.9.2 Lookups & requests

Each node stores a cache of known storage commitments and blobs locally. Storage commitments are either sent via unicast or multicast messages between engines, or stored in name system records that are cached after a lookup.

Locally known storage commitments are used for blob lookups. During lookup, a blob request is sent to each node with a known storage commitment.

When a node receives a blob request, it may answer either with the content of the blob, if available locally and the node wishes to provide this service, or with storage commitments for the requested blob, if known.
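
A possible shape for the response to a blob request, reflecting the two cases above (a sketch; names are illustrative).

type BlobResponse :=
| BlobContent ByteString                    -- the requested blob, served from local storage
| KnownCommitments (Set StorageCommitment)  -- known storage commitments for the requested blob
;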

The authentication requirement specified in the storage commitment may restrict to whom a blob may be served upon request. Such a requirement could be, e.g., membership of a given domain, which can prevent former members of a domain from requesting data blobs despite having a valid storage commitment for them.

When the request is successful, the returned blob is cached locally. In case the request fails, the requestor should try to update the storage commitment for the blob, e.g. by sending requests to nodes that might know about updated commitments, or by performing a name system lookup to update the record where the commitment is stored.

3.10 Decentralized Name System

The Decentralized Name System allows any identity in the system – i.e. user, node, topic and domain identities – to publish & update a zone with a map of labels to records. Records are cryptographically authenticated via a signature by the zone identity.

The name system is a petname system [Stie05] with secure and memorable, but not globally unique names. Secure zone delegation enhances petnames with transitivity, and removes the need for a trusted root authority [WaScGr14], as name resolution is relative to the user’s own zone.

3.10.1 Zones & records

Each label in the zone points to a record set of typed records (Figure 10) signed by the zone identity.

The empty label in the zone refers to the top-level records of the zone, which contain the nickname of the zone that becomes the default label for delegation, the set of topic IDs where the zone is published, and the advertisement that corresponds to the type of identity (i.e. user, node, topic, or domain advertisement).

Zone delegation happens via a zone record that assigns a label to the identity of the delegated zone, which then becomes part of transitive name resolution.

Finally, a blob record contains a storage commitment that points to a data blob and the node that stores it.

The type system is extensible; further record types may be defined by engines and applications.

type RecordSet := mkRecordSet {
  records : Set Record;
  version : Nat;
  created : AbsTime;
  sig : Commitment;
};

type Record :=
| Pub TopicID
| Nick String
| Zone ZoneID
| Blob StorageCommitment
| UserAdvert UserAdvert
| NodeAdvert NodeAdvert
| TopicAdvert TopicAdvert
| DomainAdvert DomainAdvert
;
Figure 10: Record set & record

3.10.2 Publishing & replication

Records in the zone are published in one or more MRDT key-value stores. Each MRDT store is backed by a pub/sub topic that is listed in the empty label of the zone.

In the key-value store each key is composed of the zone ID and a label, while each value contains a record set. This key format allows publishing label to record set mappings in multiple stores, and one store may contain records from multiple zones. Before accepting a label update, the store verifies whether the record set signature is valid, i.e. whether it corresponds to the zone ID that is part of the key.
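
For illustration, the composite key of the name store might be sketched as follows (field names assumed).

type NameKey := mkNameKey {
  zone : ZoneID;  -- identity of the zone whose records are published
  label : Label;  -- label within the zone; the empty label holds the top-level records
};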

Labels in a zone may be published in different stores, based on information flow control requirements. In case of a user zone, public and private labels are distinguished, where private labels remain in a store that is privately replicated among the user’s nodes, while public records are replicated more widely in one or more domains.

An MRDT transaction is used to update parts of a record set; to be valid, such an update must increment the version number of the record set and update its signature at the same time. Partial updates allow bandwidth-efficient updates to records, as only the changed fields are transmitted over the network.

A domain may have a designated name store that contains records relevant to its operation, which its members may use for name resolution.

3.10.3 Name resolution

Each node maintains a local name store in the same key-value format in which records are published, as described above. It stores all locally known zone label to record set mappings and is used for name resolution.

Name resolution is transitive and relative to a locally designated top-level root zone. Users resolve names starting from their own zone, which is associated with their user identity. Similarly, a node's engines start name resolution from the zone of the node identity.

E.g. Alice may refer to her own inbox as simply inbox. After adding a delegation record for Bob, she can refer to Bob's inbox as inbox.bob, and after Bob adds a delegation record for Carol, Alice can reach Carol's inbox as inbox.carol.bob.

In case the name system needs to co-exist with other name systems, a pseudo-top-level-domain (pseudo-TLD) may be used as a suffix to the names, i.e. inbox becomes inbox.tld in the above example.

Querying a single name in a zone is possible by sending a request to a pub/sub broker that serves the name store hosting the zone. Recursive queries return results together with any referenced record sets, reducing the number of queries necessary. E.g. when querying the record set of a topic, the TopicAdvert contains NodeIDs that reference other zones, and thus the response also contains record sets for the empty label of the referenced zones, which in this case contain the NodeAdverts of the relay and broker nodes. Each returned record set is stored in the local name store, to be used for name resolution.

For frequently used zones, subscribing to a name store, with key prefix filters for relevant zones, allows keeping the zone’s records synchronized.

3.11 Mergeable Replicated Data Types

Mergeable Replicated Data Types (MRDT) [KPSJ19; SKNS22] allow collaboration on arbitrary data types by employing a three-way merge function to deterministically reconcile concurrent updates. In contrast to Conflict-free Replicated Data Types (CRDT) [SPBZ11], MRDTs are more expressive: they do not require operations to be commutative, and they allow composition of data types, which makes MRDTs suitable for distributed version control of arbitrary data types. MRDTs allow purely functional data types to be promoted to RDTs by specifying a three-way merge function that describes conflict resolution policies.

3.11.1 MRDT store

The MRDT store is a permissioned key-value store where each value is a Certified MRDT [SKNS22]. Writers submit transactions in the store where each transaction contains one or more operations on MRDTs. The store is permissioned: writers may be restricted to perform only certain operations on certain keys, which is specified in the store configuration by the store owner.
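
A sketch of a store transaction, under the assumption that operations are opaque to the store and interpreted by the MRDT at the given key; the names are illustrative.

type MRDTOp := mkMRDTOp {
  key : StoreKey;  -- key of the MRDT the operation applies to
  op : Operation;  -- data-type-specific operation
};

type MRDTTx := mkMRDTTx {
  writer : PublisherID;  -- the submitting writer, corresponding to a topic publisher
  ops : List MRDTOp;     -- one or more operations on MRDTs
};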

3.11.2 MRDT middleware

The MRDT store is backed by a pub/sub topic that disseminates and stores the transactions in the store. Publishers of the topic correspond to writers of the store. Each message contains a transaction, and thus the message DAG of the topic corresponds to the transaction DAG in the store. This way transactions inherit properties of the pub/sub layer, i.e. causal delivery and Byzantine Causal Broadcast.

Epochs in the pub/sub topic trigger a synchronization event in the topic, which otherwise provides only eventual synchrony of submitted messages. This enables garbage collection of transactions and allows changing the store configuration.

Announcing a new epoch corresponds to taking a snapshot of the MRDT store. The epoch announcement contains the hash of the MRDT store contents with all its key-value pairs at the given cut in the transaction DAG. All writers who have delivered transactions up to that point sign it off by sending an acknowledgement. Once a quorum is reached, the new epoch begins with the specified store configuration.
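
The epoch announcement described above might be sketched as follows (hypothetical field names).

type EpochAnnouncement := mkEpochAnnouncement {
  snapshot : Digest;     -- hash of the MRDT store contents at the given cut in the transaction DAG
  config : StoreConfig;  -- store configuration for the new epoch
};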

3.11.3 MRDT execution

Transactions are executed by all writers and readers of the store, all of which are either direct or indirect subscribers of the pub/sub topic and deliver transactions in causal order.

When a transaction is delivered, an executor first performs validation according to the permissions of the store at the current epoch: it checks whether the writer is authorized to perform the given operation on the given key, and then whether the operation itself is valid.

After validation passes, the operation is applied to the MRDT associated with the given key. In case of concurrent updates to the same RDT by different writers, the three-way merge function of the data type is executed, which takes as arguments the two conflicting states and their lowest common ancestor (LCA) in the transaction DAG, i.e. the point at which the branches started to diverge.
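
As a sketch, a data type could be bundled with its three-way merge function as follows; the encoding is illustrative.

-- merge takes the lowest common ancestor state and the two conflicting states.
type Mergeable (A : Type) := mkMergeable {
  merge : A -> A -> A -> A;  -- merge lca left right
};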

3.11.4 MRDT brokers

MRDT brokers are nodes that allow subscriptions to store updates to subscribers who do not perform MRDT execution themselves. Brokers offer synchronization of store snapshots, as well as push notifications of changed values. Since only snapshots are signed by a quorum, subscribers may subscribe at multiple brokers to ascertain correct execution before the next snapshot.

4 Concluding remarks

We have presented the design of the Anoma network architecture and its foundational protocols, which constitute the lower layers of the protocol stack and serve as building blocks for further distributed systems protocols and applications.

The system provides cryptographic identities, sovereign domains with pluralistic interoperability, pub/sub topics for reliable causal message dissemination, MRDT stores for asynchronous collaboration on shared data types, and a decentralized name system.

The detailed specification and evaluation of the protocols discussed remains future work.

References

[AlSh24]
[AnBuSe13]
Anceaume, Emmanuelle ; Busnel, Yann ; Sericola, Bruno: Uniform node sampling service robust against collusions of malicious nodes. In: 2013 43rd annual IEEE/IFIP international conference on dependable systems and networks (DSN) : IEEE, 2013, pp. 1–12
[BGKK09]
Bortnikov, Edward ; Gurevich, Maxim ; Keidar, Idit ; Kliot, Gabriel ; Shraer, Alexander: Brahms: Byzantine resilient random membership sampling. In: Computer Networks vol. 53, Elsevier (2009), Nr. 13, pp. 2340–2359
[ChViJa21]
Chen, Chen ; Vitenberg, Roman ; Jacobsen, Hans-Arno: Building fault-tolerant overlays with low node degrees for topic-based publish/subscribe. In: IEEE Transactions on Dependable and Secure Computing vol. 19, IEEE (2021), Nr. 5, pp. 3011–3023
[CMTV07]
Chockler, Gregory ; Melamed, Roie ; Tock, Yoav ; Vitenberg, Roman: Constructing scalable overlays for pub-sub with many topics. In: Proceedings of the twenty-sixth annual ACM symposium on principles of distributed computing, 2007, pp. 109–118
[HaRe24]
Hart, Anthony ; Reusche, D: Intent Machines. In: Anoma Research Topics, Zenodo (2024)
[HeKaSh24]
Heindel, Tobias ; Karbyshev, Aleksandr ; Sheff, Isaac: Heterogeneous Narwhal and Paxos. In: Anoma Research Topics, Zenodo (2024)
[HePrHa25]
Heindel, Tobias ; Prieto-Cubides, Jonathan ; Hart, Anthony: Dynamic Effective Timed Communication Systems. In: Anoma Research Topics, Zenodo (2025)
[JRVJ15]
Johansen, Håvard D ; Renesse, Robbert Van ; Vigfusson, Ymir ; Johansen, Dag: Fireflies: A secure and scalable membership and gossip service. In: ACM Transactions on Computer Systems (TOCS) vol. 33, ACM New York, NY, USA (2015), Nr. 2, pp. 1–32
[KaSh24]
Karbyshev, Aleksandr ; Sheff, Isaac: Heterogeneous Paxos 2.0: the Specs. In: Anoma Research Topics, Zenodo (2024)
[KhGo24]
Khalniyazova, Yulia ; Goes, Christopher: Anoma Resource Machine Specification. In: Anoma Research Topics, Zenodo (2024)
[KlHo20]
Kleppmann, Martin ; Howard, Heidi: Byzantine eventual consistency and the fundamental limits of peer-to-peer databases. In: CoRR vol. abs/2012.00472 (2020)
[KPSJ19]
Kaki, Gowtham ; Priya, Swarn ; Sivaramakrishnan, KC ; Jagannathan, Suresh: Mergeable replicated data types. In: Proceedings of the ACM on Programming Languages vol. 3, ACM New York, NY, USA (2019), Nr. OOPSLA, pp. 1–29
[KrSc23]
Kret, Ehren ; Schmidt, Rolfe: The PQXDH key agreement protocol (2023)
[MaPe16]
Marlinspike, Moxie ; Perrin, Trevor: The X3DH key agreement protocol (2016)
[Shef24]
Sheff, Isaac: Anoma State Architecture. In: Anoma Research Topics, Zenodo (2024)
[SKNS22]
Soundarapandian, Vimala ; Kamath, Adharsh ; Nagar, Kartik ; Sivaramakrishnan, KC: Certified mergeable replicated data types. In: Proceedings of the 43rd ACM SIGPLAN international conference on programming language design and implementation (PLDI 2022) : ACM, 2022
[SPBZ11]
Shapiro, Marc ; Preguiça, Nuno ; Baquero, Carlos ; Zawirski, Marek: Conflict-free replicated data types. In: 13th international conference on stabilization, safety, and security of distributed systems, SSS 2011 : Springer LNCS volume 6976, 2011, pp. 386–400
[Stie05]
Stiegler, Marc: Petname systems. In: HP Laboratories, Mobile and Media Systems Laboratory, Palo Alto, Tech. Rep. HPL-2005-148, Citeseer (2005)
[WaScGr14]
Wachs, Matthias ; Schanzenbach, Martin ; Grothoff, Christian: A censorship-resistant, privacy-enhancing and fully decentralized name system. In: Cryptology and network security: 13th international conference, CANS 2014, heraklion, crete, greece, october 22-24, 2014. Proceedings 13 : Springer, 2014, pp. 127–142
[WiWa08]
Wilcox-O’Hearn, Zooko ; Warner, Brian: Tahoe: The least-authority filesystem. In: Proceedings of the 4th ACM international workshop on storage security and survivability, 2008, pp. 21–26
[Xyza]
Juvix: A statically typed functional programming language. URL: https://juvix.org
[Xyzb]
BARE: Binary Application Record Encoding. URL: https://baremessages.org
[Xyzc]
Protocol Buffers. URL: https://protobuf.dev
[Xyzd]