Anoma Network Architecture

Author
Affiliation

TG X Thoth

Heliax AG

Abstract

We present an overview of the Anoma network architecture and its underlying mathematical and communication models, which provide the communication infrastructure for our distributed systems protocols.

The architecture consists of a sovereign domain system that enables pluralistic interoperability, a distributed name system using petnames, a distributed publish/subscribe protocol that offers Byzantine Causal Broadcast, a modular transport and routing system, and a trust and reputation mechanism for service commitments.

Domains provide distributed immutable and mutable data storage and dissemination services, where each domain induces a peer-to-peer overlay with its own authentication, membership, topology, message dissemination and storage protocols.

The communication model is inspired by the actor model, where engine processes communicate via message passing in a distributed system. Message send allows expressing routing & transport selection preferences and constraints. Message delivery semantics are either unreliable or reliable causal delivery. Nodes maintain trust and reputation metrics, as well as measurements about other nodes, which influence node selection in routing algorithms. Compositional cryptographic identities induce unicast, multicast (pub/sub), and anycast communication channels.

Keywords

distributed systems, peer-to-peer networks

1 Introduction

The Anoma network architecture employs a distributed system design with the following main characteristics.

Cryptographic identities enable authenticated and encrypted communication, message addressing, as well as ownership and access control for shared network resources, such as domains, publish/subscribe topics, and zones in the name system.

Sovereign domains enable pluralistic interoperability in the network, where each domain induces a peer-to-peer overlay with its own protocols and access control rules for joining the overlay and for sending requests to it.

Layered network protocols offer various services inside domains, such as a domain membership and topology protocol, a topic-based publish/subscribe protocol, a data storage protocol for immutable data, and a name system using secure and memorable names.

Versioned types for protocols and data at rest allow future protocol and algorithm upgrades, as well as pluralism in protocol choice for sovereign domains.

In this document we outline the overall design, requirements, message types, and interfaces for the network architecture, while precise protocol details remain future work.

For type definitions we use Juvix [Xyza], a statically typed functional language.

1.1 Design overview

1.1.1 Identities

Users, nodes, publish/subscribe topics, and domains each have a cryptographic identity. Cryptographic identities may be compositional, where multiple identities participate in a threshold cryptosystem.

1.1.2 Domains

The network consists of sovereign domains that provide services to their members, and may choose to serve external requests from non-members via a common inter-domain protocol, which is a request-response protocol that enables interoperability between domains.

Each domain has a peer-to-peer overlay associated with it, and defines its own membership rules and protocols used in the overlay.

Grassroots domains are partition-tolerant domains that may consist of multiple intermittently connected network partitions. Partitions of grassroots domains may synchronize opportunistically; however, in order to enable asynchronous communication between mobile nodes and eventual synchrony between network partitions within a bounded time, a core network overlay with stable nodes may offer encrypted store-and-forward messaging and synchronization services to mobile nodes on edge networks.

The minimum requirement for a domain is for the owner identity to define the domain configuration in a signed Domain Advertisement that allows join requests and external requests to be sent to designated domain members. The Domain Advertisement may be shared directly with interested nodes, or published in the Decentralized Name System that allows updates and lookups.

Existing peer-to-peer overlays may also participate in the domain system by publishing a Domain Advertisement that specifies nodes that implement the inter-domain protocol enabling interoperability with other domains.

1.1.3 Anoma protocols & domains

Protocols implemented by Anoma run in domains, which delineate administrative and security boundaries in the system.

Anoma domains implement a common set of protocols that provide fundamental services in the domain, such as membership, message dissemination, and data storage.

For example, an instance of the Anoma Resource Machine [KhGo24; Shef24] may run in its own domain, together with the underlying consensus protocol instance [HeKaSh24; KaSh24], while another domain may be used for intent submission for Intent Machines [HaRe24].

A typical information flow across various protocols is the following: users submit their intents (partial transaction candidates) that are disseminated to solvers, who match them and submit them as transaction candidates to a mempool, where they are subsequently ordered by consensus, and finally the ordered transactions are executed by the execution system.

As this example illustrates, different protocols perform different tasks, have different responsibilities, and different nodes participate in them.

The network consists of multiple domains with distinct responsibilities and protocols. Compared to a single, open network, multiple domains with restricted membership reduce the attack surface on domain members that provide crucial services, by allowing domains to expose publicly only a limited set of designated members that serve external requests.

1.1.4 Network protocols

The system employs a layered protocol design where each protocol layer offers services for other protocols and applications, as illustrated by Figure 1.

Figure 1: Protocol layers

1.1.4.1 Transport

Transport protocols are responsible for establishing and maintaining point-to-point connections between pairs of nodes over various physical and overlay network protocols.

Transport protocols have different security, reliability, and ordering properties. E.g. QUIC with TLS offers authenticated and encrypted connections with reliable and ordered message delivery guarantees.

1.1.4.2 Unicast routing

The unicast routing protocol is responsible for routing messages over the network between two engines on different nodes based on the destination address in the message.

Source and destination addresses in messages consist of a node identifier and an engine name.

The routing process involves address resolution using locally available node advertisements published in the name system, and transport protocol selection based on local constraints and preferences.

Unicast messaging can be extended to the asynchronous setting by using a pub/sub topic where relays & brokers perform store-and-forward message dissemination.

1.1.4.3 Distributed Publish/Subscribe

The topic-based publish/subscribe protocol offers Byzantine Causal Broadcast of signed and encrypted messages from authorized publishers to all subscribers of a topic. Subscribers may also use content-based filters to further refine their subscription.

Published messages form a Blocklace [AlSh24] data structure: a Directed Acyclic Graph (DAG) where edges indicate a happened-before causality relation between messages of publishers, each of whom publishes a linear history of messages, i.e. equivocations are not allowed.

The protocol offers both a broker-based and a brokerless mode.

  - Brokerless: direct subscribers participate in the dissemination protocol, forwarding messages to other subscribers.
  - Broker-based: indirect subscribers subscribe at a broker without participating in the dissemination protocol. The broker is a direct subscriber that stores and forwards messages to its indirect subscribers.

1.1.4.4 Mergeable Replicated Data Types

Mergeable Replicated Data Types (MRDT) [KPSJ19; SKNS22] enable asynchronous multi-party collaboration on shared, mutable data structures.

Replicated Data Types (RDTs) are convergent, i.e. concurrent operations are deterministically reconciled. In the MRDT model a three-way merge function is used for this purpose.

The MRDT store is a key-value store where each value is an MRDT, and transactions submitted by writers contain a set of operations on MRDTs.

The store is permissioned at the operation level, where access control rules may specify which writer may perform what operations on which keys.

1.1.4.5 Distributed Immutable Storage

The Distributed Immutable Storage protocol is responsible for storage and retrieval of encrypted, content-addressed immutable data.

Data is chunked to a uniform size, each chunk is encrypted using convergent encryption with a convergence secret, and finally organized in a Merkle tree.

A capability-based security model is used: a verify capability (the Merkle root hash) allows traversal of the whole Merkle tree and fetching each chunk, while a read capability additionally contains the decryption key, which allows decrypting each chunk and reassembling the decrypted content.

1.1.4.6 Decentralized Name System

The Decentralized Name System is a petname system [Stie05] with secure and memorable names that assigns a zone to each identity in the system. Each zone contains a set of signed mutable records that are stored in the network and updated via pub/sub.

In addition to assigning human-readable names to records, the name system is also used for address resolution and binding identities and addresses together from different layers.

1.1.4.7 Domain Membership & Topology

The Domain Membership protocol is a peer-to-peer overlay membership protocol with a full view of members.

Based on the currently online members and their subscriptions, the Topology protocol creates and maintains Topic-Connected Overlays (TCO) for pub/sub topics.

1.1.5 Service commitments

Nodes may commit to provide services to other nodes. A service commitment is a cryptographic commitment by the service provider about the services provided. E.g. a storage commitment indicates that a given node stores a given piece of data for a certain period of time.

Fulfilment of service commitments may either be proven by a path in a DAG of causally dependent messages, or observed by other nodes.

The reputation score of a node is a subjective value calculated locally, based on information shared by other nodes about how reliably the node fulfils its service commitments. Performance measurements may similarly be shared with other nodes.

In both cases the trustworthiness of the information varies based on the source node advertising it. For this reason it is weighted by a locally assigned trust score of the node that shared it.

Service providers may advertise their services in the form of intents, which get matched with service requests in the intent network.

2 System model

2.1 Identities

Cryptographic identities are used for various entities in the system, such as users, nodes, publish/subscribe topics, and domains.

Each identity has an associated communication channel, and a zone in the Distributed Name System. The identity serves as the address for the communication channel, identifies the DNS zone, and signs its records.

Cryptographic identities may be compositional, where multiple identities participate in a threshold cryptosystem. This may be used e.g. for shared ownership of a domain or pub/sub topic.

2.2 Communication model

The system uses message passing between engines [HePrHa25]. The engine model is inspired by the actor model, where processes communicate via message passing in a distributed system.

An engine is a process with local state that can send messages to and receive messages from other local and remote engines, react to timer events, and spawn other engines.

A node is a set of colocated engines reachable over the network. Nodes in the same overlay network directly connect to each other using their advertised addresses.

A node consists of a set of running engine instances, has a cryptographic identity and a number of transport addresses. Nodes communicate with each other over authenticated and encrypted transport channels.

A user interacts with the system and other users via their local node that runs on each user device.

2.3 Communication patterns

Cryptographic identities induce unicast (node-to-node messages), multicast (publish/subscribe topics), and anycast (domain requests) communication channels.

2.3.1 Unicast

Unicast communication is used for direct messages between two engines, which may be reliable and ordered, or unreliable and unordered, depending on the chosen transport protocol.

2.3.2 Multicast

Multicast communication is used in pub/sub topics, where a message is sent by one or more authorized publishers and eventually delivered to all subscribers. Pub/sub topics offer Byzantine Causal Delivery guarantees with partial, causal ordering of messages.

In addition to topic subscriptions, subscribers may further refine their subscription with per-topic content-based filters.

When brokers are used in a pub/sub topic, they enable asynchronous communication for intermittently connected end-user nodes. Multicast messages may be encrypted to preserve end-to-end security of messages in a topic where broker nodes provide store-and-forward services.

2.3.3 Anycast

Anycast communication is used for messages sent to any known member of a domain that is designated to serve external requests.

2.4 Messages

2.4.1 Message addressing & routing

Destination routing is used to deliver messages sent by engines.

The source and destination addresses in unicast messages consist of a pair of a node identifier and an engine name.

Multicast messages are addressed using the topic identity as destination address and publisher identity as the source address.

Anycast messages are addressed by the domain identity.

2.4.2 Message types & versioning

Messages are sent between engines. The message body is statically typed and defined by the communicating engines. Message types may change over time and are versioned for this reason. In order for two engines to be able to communicate, both the sender and receiver engine must support the same protocol and thus message versions.

2.4.3 Message sending

Messages contain cryptographic source and destination identities, and are sent to other nodes over authenticated and encrypted network connections.

Message send allows expressing routing & transport selection preferences and constraints, such as reliability, ordering, and security requirements for the transport channel.

2.4.4 Message delivery

Message delivery over the network happens over authenticated and encrypted communication channels.

Unicast message delivery guarantees between two nodes vary depending on the transport protocol used, e.g. ordered, reliable delivery when using QUIC and TLS.

Multicast communication in pub/sub topics offers Byzantine Causal Delivery guarantees, i.e. reliable delivery of causally ordered messages in a Byzantine setting.

In the underlying Blocklace data structure causally dependent messages form a DAG, which enables full and partial synchronization, as well as lost message detection and recovery.

2.5 Network architecture

2.5.1 Core & edge networks

The system consists of a core network of stable, always-on core nodes that communicate directly with each other and synchronize data continuously, as well as several edge networks of mobile edge nodes that may move between different edge networks, and synchronize data opportunistically with other nodes that belong to the same domain.

2.5.2 Domain overlays

The network is organized as several distinct peer-to-peer overlays we call domains. Inside a domain pub/sub topics form a Topic-Connected Overlay (TCO), where each topic overlay contains only subscribers of the given topic.

2.5.3 Pub/sub overlays

Each topic overlay contains a multi-hop dissemination topology, where subscribers forward messages to each other. In case of topics with few subscribers or when low latency is crucial, direct delivery may be used by publishers in a star topology.

2.6 Data model

The system offers multiple data models for different purposes.

Immutable data with content hash references allows storing and retrieving encrypted data blobs used by applications.

Mutable data is modified via execution of transactions that may read and write keys of a distributed key-value store.

Two distinct execution models are offered: the resource model relies on totally ordered transactions and uses the Resource Machine [KhGo24] to create and consume resources, while the replicated data type model relies on causally ordered transactions and uses Mergeable Replicated Data Types (MRDT) [SKNS22], which employ three-way merge functions to reach eventual consistency of replicas.

3 Network Architecture

3.1 Engines

Protocols in the system are implemented by engines.

An engine is an addressable process with local state that runs on a node, and has an engine name that is unique to the node it runs on. It may send and receive messages to and from other engines over communication channels provided by the system. It may also react to other local events, such as timers.

The engine identity (Engine ID) is a pair of the engine name and the node identity that is used for message addressing and routing.
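
As a sketch, the engine identity could be expressed in the same style as the other type definitions in this document; the constructor and field names here are illustrative rather than normative.

type EngineID := mkEngineID {
  node : NodeID;      -- identity of the node the engine runs on
  name : EngineName;  -- engine name, unique within that node
};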

3.2 Nodes

A node is a set of colocated engines with a cryptographic node identity reachable via one or more transport protocols, each with its own transport address. Engines of a node are part of the local trust domain that allows them to access certain local engines that remote engines do not have access to, such as local storage.

Each node advertises its current transport addresses via a versioned and signed node advertisement, which is the binding between the routing and transport layer, i.e. maps a node identity to one or more transport addresses with an associated priority. The node advertisement (Figure 2) is published in the node’s zone in the name system in a signed record set.

Nodes perform incoming and outgoing message routing for their own engines, based on locally available advertisements. Unicast routing is supported natively over various transport protocols, while multicast and anycast routing is performed by an engine instance responsible for a specific topic or domain, respectively.

type NodeAdvert := mkNodeAdvert {
  addrs : Set (Pair TransportAddress Priority);
};
Figure 2: Node Advertisement

3.3 Users

A user interacts with the system and other users via one or more nodes. Typically each user runs a node on each of their devices; however, there is no one-to-one relationship between users and nodes: a user may use multiple nodes, and a node may serve multiple users. This allows flexibility in adapting the system to different requirements and environments.

When using multiple nodes, a user may employ a pub/sub topic with all their nodes participating in it, in order to synchronize configuration, domain and topic membership, and application data.

3.3.1 User inbox & advertisement

Users may advertise an inbox for receiving messages asynchronously.

The interface to the inbox is a pub/sub broker that accepts submissions to a pub/sub topic with the recipient user’s nodes participating in it.

The user advertisement (Figure 3) is published in the name system in the user’s zone. It contains inbox nodes and prekeys.

The inbox is intended for irregular communication, and may have additional authentication requirements for submission, such as a pre-shared key to avoid spam. For regular communication, a pub/sub topic in which all the communicating users' nodes participate is used instead.

Prekeys are cryptographic key material necessary for asynchronous key agreement, used by protocols such as X3DH [MaPe16] and PQXDH [KrSc23]. Publishing prekeys in a decentralized name system allows these protocols to be used in a decentralized setting.

type UserAdvert := mkUserAdvert {
  inbox_nodes : Set NodeID;
  inbox_prekeys : Set PreKey;
};
Figure 3: User Advertisement

3.4 Messages

A message is sent between engines over a communication channel supported by the system, i.e. either unicast, multicast, or anycast.

3.4.1 Message routing & addressing

Destination routing is used to deliver a message to its destination, where routing decisions are made solely based on the destination address. Each message has a source and destination address, which varies based on the type of communication channel. The unicast source and destination address is a pair of node identity and engine name. The multicast destination address is the topic identity, while the sender is a publisher identity. For anycast the destination address is the domain identity, while the sender is a unicast source address. The response to an anycast message happens over unicast.
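
The addressing scheme described above could be sketched as a sum type with one variant per communication channel; the names below are illustrative assumptions, not part of the specification.

type Address :=
| Unicast EngineID    -- pair of node identity and engine name (Section 3.1)
| Multicast TopicID   -- destination is the topic identity; the source is a publisher identity
| Anycast DomainID    -- destination is the domain identity; the response is sent over unicast
;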

3.4.2 Message versioning & types

The message payload is versioned and typed. Communicating engines must support the given message type and version, otherwise the message is dropped. Nodes perform protocol version negotiation upon establishing a transport connection, and thus learn which protocol and message versions the remote node supports.

3.4.3 Message serialization

Serialization (marshalling) and deserialization (unmarshalling) must be performed before sending a message to and after receiving a message from the network.

The serialization protocol must have a message schema that supports versioning in order to allow protocol negotiation and upgrades, and must support an efficient encoding of binary data on multiple platforms. Suitable choices include BARE [Xyzb] and Protobuf [Xyzc].
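
For illustration, a message envelope carrying the information needed for routing and version negotiation might look as follows; this is a hypothetical sketch rather than the actual wire format, and ByteString stands in for an opaque serialized payload.

type Envelope := mkEnvelope {
  src : Address;         -- source address
  dst : Address;         -- destination address
  version : Nat;         -- payload version, agreed during protocol negotiation
  payload : ByteString;  -- serialized, typed message body
};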

3.5 Domains

Domains organize the network into smaller, sovereign overlay networks that themselves determine the protocols they use internally and the access control mechanisms that apply to membership. They may also participate in the inter-domain protocol, which allows non-members to send external requests that query information from the domain, subject to access control rules.

3.5.1 Domain identity & advertisement

The domain identity is a cryptographic identity controlled by the domain’s owner, which may also be a compositional identity controlled by multiple parties in a threshold cryptosystem.

The domain advertisement (Figure 4) is published in the name system in the domain’s zone, and serves as the authoritative definition of the domain configuration. It contains the set of nodes responsible for serving join requests, the set of nodes responsible for serving external requests, and the versioned protocols used for these two types of requests.

type DomainAdvert := mkDomainAdvert {
  join_nodes : Set NodeID;
  ext_nodes : Set NodeID;
  join_proto : JoinProto;
  ext_proto : ExtProto;
};
Figure 4: Domain Advertisement

3.5.2 Domain protocols

Domains use a set of layered protocols, as illustrated by Figure 1. Pub/sub is the underlying dissemination protocol for other protocols.

The Domain Membership protocol maintains the set of domain members, the Domain Topology protocol constructs and maintains a connected domain overlay based on members’ pub/sub topic subscriptions in the membership protocol, while each Pub/Sub Topic offers message dissemination from publishers to subscribers.

3.6 Domain Membership

A node may join a domain by sending a join request to a designated member of the domain with permission to add members. After membership is granted, the new member establishes connections to a number of existing members of the domain indicated in the join response, thereby joining the domain's overlay network.

Membership models include open access where anyone can join, and restricted access that requires some form of authentication, such as a shared secret, a membership certificate issued by an authorized member that serves as an invitation, or a zero knowledge proof of membership.

A user may join a domain with multiple nodes they control when the authentication mechanism allows this, e.g. shared secret. When membership certificates are used, membership may be granted to a user identity instead of a node identity directly, where users may issue certificates for their nodes, up to a limited number of nodes per user.
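
These membership models could be encoded as a sum type along the following lines; the constructor names and parameter types are illustrative assumptions.

type JoinAuth :=
| Open                     -- open access: anyone can join
| SharedSecret Digest      -- proof of knowledge of a shared secret
| MemberCert Certificate   -- invitation certificate issued by an authorized member
| MembershipProof ZKProof  -- zero knowledge proof of membership
;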

The membership protocol must be scalable and must tolerate Byzantine faults. Overlay membership protocols either use a full view, or a partial view. Partial view protocols typically use a gossip protocol to exchange partial membership views, however, they can be easily attacked by a malicious node that introduces bias to the gossiped view, potentially completely isolating target nodes from the network. As shown by Fireflies [JRVJ15], a full view protocol avoids these problems, and it is feasible to implement in a scalable manner without costly and incomplete mitigation techniques such as proof of work [BGKK09] and probabilistic sampling [AnBuSe13]. The membership protocol uses a full view for this reason.

Each domain has a name store, which is an MRDT store that contains zone label to record set mappings. The MRDT store is backed by a pub/sub topic with the Byzantine Causal Broadcast property that allows members to publish updates to their own zone. The pub/sub topic ID of the name store equals the domain ID, and all members of the domain subscribe to this topic. Updates to the name store are used as the membership protocol.

The domain owner publishes a record under the empty label of the domain's zone that contains the set of node identities that are members of the domain (Figure 5). Each member node publishes two labels in its own zone with signed record sets. The empty label contains a record set with the node advertisement, which is updated every time it changes. Another label, corresponding to the domain ID, contains a record set with the node's membership record (Figure 6), which specifies its pub/sub topic subscriptions. The version number and timestamp of this record set are updated periodically, serving as a last-seen timestamp that makes it possible to determine the recently online members of the domain.

type DomainMembers := mkDomainMembers {
  nodes : Set NodeID;
};
Figure 5: Domain Members in the domain’s zone
type DomainMember := mkDomainMember {
  subs : Set TopicID;
};
Figure 6: Domain Member in the node’s zone

3.7 Domain Topology

The Domain Topology protocol constructs a Topic-Connected Overlay (TCO) [CMTV07], where each pub/sub topic induces a connected sub-overlay of subscribers of the topic, i.e. only subscribers of the topic participate in the dissemination of messages published in the topic. To increase robustness of the overlay, domains employ a k-Topic-Connected Overlay (kTCO) [ChViJa21], where each topic sub-overlay is k-connected.

The topology protocol uses topic subscriptions from the membership protocol, where each node indicates its topic subscriptions. Since the membership protocol employs a full view, the topology protocol has access to all subscriptions for constructing the overlay.

3.8 Publish/Subscribe Topics

Pub/sub topics offer reliable message dissemination with causal order message delivery from a set of authorized publishers to all subscribers.

3.8.1 Topic identity & advertisement

Each pub/sub topic has a topic identity, which, like domain identities, may be compositional for shared ownership.

The topic advertisement (Figure 7) is signed by the topic identity and specifies: the set of publisher identities that are allowed to publish messages in the topic; the set of relay nodes that can be used to join the topic overlay; the set of broker nodes that can be used to publish and subscribe to the topic without having to join the topic overlay; and an optional set of tags that may be used for automatic subscriptions to new topics of interest.

type TopicAdvert := mkTopicAdvert {
  publishers : Set PublisherID;
  relays : Set NodeID;
  brokers : Set NodeID;
  tags : Set String;
};
Figure 7: Topic Advertisement

3.8.2 Pub/sub messages

Pub/sub messages (Figure 8) are published and signed by a publisher, and contain the publisher identity, a per-publisher message sequence number, the set of causal dependencies, the message content, a set of tags that can be used for content-based filtering, and a signature by the publisher.

The message content may be a message payload for an engine, an acknowledgement of a previous message, an updated topic advertisement, or an announcement of a new epoch.

type TopicMsg := mkTopicMsg {
  publisher : PublisherID;
  seq : Nat;
  deps : Set TopicMsgID;
  tags : Set String;
  content : TopicMsgContent;
  sig : Commitment;
};
Figure 8: Topic Message
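
The message content variants listed above (payload, acknowledgement, advertisement update, epoch announcement) suggest a content type along these lines; this is a sketch with illustrative constructor names, where ByteString stands in for an opaque engine payload and EpochID for an assumed epoch identifier.

type TopicMsgContent :=
| Payload ByteString        -- message payload for an engine
| Ack TopicMsgID            -- acknowledgement of a previous message
| AdvertUpdate TopicAdvert  -- updated topic advertisement
| NewEpoch EpochID          -- announcement of a new epoch
;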

3.8.3 Message DAG

Messages published in the topic form a Directed Acyclic Graph (DAG) due to the set of causal dependencies present in each message, which establishes a partial, causal order of messages in the topic.

The set of causal dependencies in a topic message consists of the leaf nodes of the DAG as seen by the publisher, i.e. the latest messages of publishers in the topic that are not yet referenced by other messages. This set may also include the publisher's own previous message if it is not already referenced by other publishers.

This data structure is named Blocklace in [AlSh24], where it is shown that equivocations can be prevented by excluding publishers who attempt to send different, independent messages to different subscribers, since such equivocations are eventually discovered as they become part of the DAG. The Blocklace is a Conflict-free Replicated Data Type (CRDT) [SPBZ11]: a partially ordered log (PO-Log) data type with a single append operation.

The message DAG enables detecting missed messages, which happens when a subscriber comes across an unknown causal dependency that it has not yet received from the network. To recover missed messages, and after (re)joining the topic, a DAG synchronization protocol is used between pairs of subscribers.

3.8.4 Consistency model

Pub/sub topics use the Byzantine Eventual Consistency (BEC) [KlHo20] model, backed by a Byzantine Causal Broadcast (BCB) protocol.

These properties hinge on the underlying causal message DAG, which enables missed message detection even when Byzantine nodes withhold certain messages. The combination of push and pull dissemination phases ensures eventual consistency: in the push phase the protocol strives to deliver messages to as many subscribers as it can while trying to minimize duplicates, while in the pull phase missed messages are actively synchronized from other subscribers.

3.8.5 Dissemination

The publish/subscribe protocol uses the sub-overlays constructed by the topology protocol for dissemination.

When publishing a message, the push phase of dissemination involves the publisher's node pushing the message to other nodes on the potentially multi-hop dissemination path, which forward it further to their subscribers.

Message dissemination in the topic happens in multiple phases:

  1. Dissemination among publishers.
  2. Dissemination from publishers to direct subscribers.
  3. Dissemination from brokers to indirect subscribers.

Dissemination paths may be single-hop with a star topology, which is used when the number of subscribers is small or low latency is crucial, or multi-hop with a DAG topology, which is more efficient for large numbers of subscribers.

Before forwarding a message to the next hop, each node ensures that the message is valid by verifying whether the publisher is authorized to publish the given message according to the topic advertisement of the current epoch.

An epoch is a period during which topic members store all messages published in the topic to be able to serve synchronization requests. During an epoch the topic advertisement does not change to ensure consistency across subscribers, since it contains the set of valid publishers that must be the same for all subscribers, even if they missed some messages.

When a subscriber encounters a missed message dependency it hasn’t seen yet, it explicitly requests the missed messages using the synchronization protocol in the pull phase of dissemination.

3.8.6 Synchronization

Synchronizing the message DAG between two nodes involves the requestor sending a set of missing message hashes, to which the responder may respond by forwarding those messages, if available. This can be made more efficient by using a Bloom filter of message hashes instead, as shown by [KlHo20].
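
A minimal sketch of the synchronization request, covering both the explicit hash set and the Bloom filter optimization of [KlHo20]; the constructor names are illustrative.

type SyncRequest :=
| WantMsgs (Set TopicMsgID)  -- explicit set of missing message hashes
| HaveFilter BloomFilter     -- Bloom filter over hashes of known messages; the responder forwards messages not matched by the filter
;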

3.8.7 Brokers

Brokers are direct subscribers that participate in the dissemination protocol in the topic overlay, and allow indirect subscribers to publish and subscribe through them without having to join the overlay.

Using brokers endows mobile nodes on edge networks with asynchronous communication capabilities; for such nodes, participating in the topic overlay in the core network is not practical due to their limited hardware resources, battery life, and intermittent connectivity. Subscribers may also specify additional content-based filters based on the publisher identity and the set of tags in the message, to further reduce the number of messages received. Message tag semantics are entirely up to the application using the topic; e.g. a tag could be an opaque reference to a forum topic that allows selective subscription to that topic. Further filtering based on the application-specific message content is only possible on the end subscriber's node, which is able to decrypt the message content and understand its type.

Brokers also serve as a point of submission to a topic for publishers that do not participate in the topic overlay. An example is intent submission & dissemination, where many users who do not participate in the topic overlay submit intents to a topic that disseminates them to a few solvers who are subscribers of the topic.

3.9 Distributed Immutable Storage

The Distributed Immutable Storage protocol allows storing immutable data blobs in the network. It offers encrypted, chunked, deduplicated data storage with immutable blob and chunk references.

Before storing, blobs are first chunked, then each chunk is separately encrypted, and finally the chunks are organized in a Merkle tree. The chunk identifier (Chunk ID) is the cryptographic hash of the chunk contents. Data chunks are leaves of the tree, while the root chunk is the entry point to the stored data blob, from which the whole tree can be traversed. The chunk ID of the root node also serves as the blob identifier (Blob ID). Non-leaf nodes contain pointers to tree nodes on the next level, as a set of chunk IDs and corresponding content decryption keys. Since chunk size is limited by a maximum size, which is a parameter of the chunking algorithm, multiple levels of non-leaf nodes may be necessary when the set of data chunk pointers does not fit in a single chunk.
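
The tree structure described above could be sketched as follows, with leaf chunks carrying encrypted data and non-leaf chunks carrying pointers to the next level; the names are illustrative assumptions.

type ChunkPtr := mkChunkPtr {
  id : ChunkID;         -- content hash of the referenced chunk
  key : DecryptionKey;  -- key for decrypting the referenced chunk
};

type Chunk :=
| DataChunk ByteString      -- leaf: encrypted data
| NodeChunk (Set ChunkPtr)  -- non-leaf: pointers to tree nodes on the next level
;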

The blob ID is a verification capability that allows requesting and verifying all chunks of a blob by requesting the root chunk and from there traversing the whole tree.

The blob ID in combination with its decryption key serves as a read capability that allows decryption of chunk contents: the data in leaf nodes, and the decryption keys for the next level of chunks in non-leaf nodes.
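
Accordingly, the two capabilities could be sketched as follows (field names are illustrative).

type VerifyCap := mkVerifyCap {
  blob : BlobID;        -- Merkle root hash: fetch and verify all chunks
};

type ReadCap := mkReadCap {
  blob : BlobID;
  key : DecryptionKey;  -- additionally decrypts the root chunk, and transitively the whole blob
};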

Chunk encryption is done using convergent encryption [WiWa08], where the encryption key for a chunk is derived from the cryptographic hash digest of the chunk’s plaintext content and a convergence secret. The convergence secret prevents confirmation attacks when the plaintext is known, and sets the scope for deduplication. The scope may be global for public data, or set per domain, topic, or even per blob.
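
The deduplication scope set by the convergence secret could be captured by a type like the following sketch (constructor names assumed).

type ConvergenceScope :=
| Global              -- public data, deduplicated globally
| PerDomain DomainID  -- deduplication within a domain
| PerTopic TopicID    -- deduplication within a topic
| PerBlob             -- a fresh secret per blob, no deduplication
;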

3.9.1 Storage Commitments

A Storage Commitment allows a node to commit to storing the referenced blob for a certain amount of time. It contains: the blob ID that is stored; the node that stores the blob; an expiry condition until which the blob is stored, such as an absolute time or an epoch number in a given pub/sub topic; an authentication mechanism for requesting the blob, such as membership of a given domain; and a cryptographic signature by the node.

type StorageCommitment := mkStorageCommitment {
  id : ChunkID;
  node : NodeAdvert;
  expiry : StorageCommitmentExpiry;
  auth : StorageAuth;
  version : Nat;
  created : AbsTime;
  sig : Commitment;
};
Figure 9: Storage Commitment

Storage commitments are typically sent to pub/sub topics, and may also be stored in the name system, which allows assigning mutable named references to a commitment, and thus to the blob it references.

Each node stores the storage commitments it knows about, along with the source each one arrived from, e.g. the pub/sub topic ID or the zone and label in the name system, which may be used to look for other commitments after a commitment expires.

Periodic verification of stored data ensures data availability. The blob ID is a verification capability that allows verifying that all chunks of a blob are available; however, a more efficient probabilistic data availability sampling protocol is necessary to verify large amounts of data in practice. Erasure coding, such as Reed-Solomon codes, is another way to improve fault tolerance [WiWa08; Xyzd].

3.9.2 Lookups & requests

Each node stores a cache of known storage commitments and blobs locally. Storage commitments are either sent via unicast or multicast messages between engines, or stored in name system records that are cached after a lookup.

Locally known storage commitments are used for blob lookups. During lookup, a blob request is sent to each node with a known storage commitment.

When a node receives a blob request, it may answer either with the content of the blob, if available locally and the node wishes to provide this service, or with storage commitments for the requested blob, if known.
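
A possible shape for the response to a blob request, reflecting the two cases above (a sketch; names are illustrative).

type BlobResponse :=
| BlobContent ByteString                    -- the requested blob, served from local storage
| KnownCommitments (Set StorageCommitment)  -- known storage commitments for the requested blob
;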

The authentication requirement specified in the storage commitment may restrict to whom a blob may be served upon request. Such a requirement could be, e.g., membership of a given domain, which can prevent former members of a domain from requesting data blobs despite having a valid storage commitment for them.

When the request is successful, the returned blob is cached locally. In case the request fails, the requestor should try to update the storage commitment for the blob, e.g. by sending requests to nodes that might know about updated commitments, or by performing a name system lookup to update the record where the commitment is stored.

3.10 Decentralized Name System

The Decentralized Name System allows any identity in the system – i.e. user, node, topic and domain identities – to publish & update a zone with a map of labels to records. Records are cryptographically authenticated via a signature by the zone identity.

The name system is a petname system [Stie05] with secure and memorable, but not globally unique names. Secure zone delegation enhances petnames with transitivity, and removes the need for a trusted root authority [WaScGr14], as name resolution is relative to the user’s own zone.

3.10.1 Zones & records

Each label in the zone points to a record set of typed records (Figure 10) signed by the zone identity.

The empty label in the zone refers to the top-level records of the zone, which contain the nickname of the zone that becomes the default label for delegation, the set of topic IDs where the zone is published, and the advertisement that corresponds to the type of identity (i.e. user, node, topic, or domain advertisement).

Zone delegation happens via a zone record that assigns a label to the identity of the delegated zone, which then becomes part of transitive name resolution.

Finally, a blob record contains a storage commitment that points to a data blob and the node that stores it.

The type system is extensible; further record types may be defined by engines and applications.

type RecordSet := mkRecordSet {
  records : Set Record;
  version : Nat;
  created : AbsTime;
  sig : Commitment;
};

type Record :=
| Pub TopicID
| Nick String
| Zone ZoneID
| Blob StorageCommitment
| UserAdvert UserAdvert
| NodeAdvert NodeAdvert
| TopicAdvert TopicAdvert
| DomainAdvert DomainAdvert
;
Figure 10: Record set & record

3.10.2 Publishing & replication

Records in the zone are published in one or more MRDT key-value stores. Each MRDT store is backed by a pub/sub topic that is listed in the empty label of the zone.

In the key-value store each key is composed of the zone ID and a label, while each value contains a record set. This key format allows publishing label to record set mappings in multiple stores, and one store may contain records from multiple zones. Before accepting a label update, the store verifies whether the record set signature is valid, i.e. whether it corresponds to the zone ID that is part of the key.
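
For illustration, the composite key of the name store might be sketched as follows (field names assumed).

type NameKey := mkNameKey {
  zone : ZoneID;  -- identity of the zone whose records are published
  label : Label;  -- label within the zone; the empty label holds the top-level records
};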

Labels in a zone may be published in different stores, based on information flow control requirements. In case of a user zone, public and private labels are distinguished, where private labels remain in a store that is privately replicated among the user’s nodes, while public records are replicated more widely in one or more domains.

An MRDT transaction is used to update parts of a record set; to be valid, such an update must increment the version number of the record set and update its signature at the same time. Partial updates allow bandwidth-efficient updates to records, as only the changed fields are transmitted over the network.

A domain may have a designated name store that contains records relevant to its operation, which its members may use for name resolution.

3.10.3 Name resolution

Each node maintains a local name store in the same key-value format in which records are published, as described above. It stores all locally known zone label to record set mappings and is used for name resolution.

Name resolution is transitive and relative to a locally designated top-level root zone. Users resolve names starting from their own zone, which is associated with their user identity. Similarly, a node's engines start name resolution from the zone of the node identity.

E.g. Alice may refer to her own inbox as simply inbox. After adding a delegation record for Bob, she can refer to Bob's inbox as inbox.bob, and after Bob adds a delegation record for Carol, Alice can reach Carol's inbox as inbox.carol.bob.

In case the name system needs to co-exist with other name systems, a pseudo-top-level-domain (pseudo-TLD) may be used as a suffix to the names, i.e. inbox becomes inbox.tld in the above example.

Querying a single name in a zone is possible by sending a request to a pub/sub broker that serves the name store hosting the zone. Recursive queries return results together with any referenced record sets, reducing the number of queries necessary. E.g. when querying the record set of a topic, the TopicAdvert contains NodeIDs that reference other zones, and thus the response also contains record sets for the empty label of the referenced zones, which in this case contain the NodeAdverts of the relay and broker nodes. Each returned record set is stored in the local name store, to be used for name resolution.

For frequently used zones, subscribing to a name store, with key prefix filters for relevant zones, allows keeping the zone’s records synchronized.

3.11 Mergeable Replicated Data Types

Mergeable Replicated Data Types (MRDT) [KPSJ19; SKNS22] allow collaboration on arbitrary data types by employing a three-way merge function to deterministically reconcile concurrent updates. In contrast to Conflict-free Replicated Data Types (CRDT) [SPBZ11], MRDTs are more expressive: they do not require operations to be commutative, and they allow composition of data types, which makes MRDTs suitable for distributed version control of arbitrary data types. MRDTs allow purely functional data types to be promoted to RDTs by specifying a three-way merge function that describes conflict resolution policies.

3.11.1 MRDT store

The MRDT store is a permissioned key-value store where each value is a Certified MRDT [SKNS22]. Writers submit transactions in the store where each transaction contains one or more operations on MRDTs. The store is permissioned: writers may be restricted to perform only certain operations on certain keys, which is specified in the store configuration by the store owner.
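
A sketch of a store transaction, under the assumption that operations are opaque to the store and interpreted by the MRDT at the given key; the names are illustrative.

type MRDTOp := mkMRDTOp {
  key : StoreKey;  -- key of the MRDT the operation applies to
  op : Operation;  -- data-type-specific operation
};

type MRDTTx := mkMRDTTx {
  writer : PublisherID;  -- the submitting writer, corresponding to a topic publisher
  ops : List MRDTOp;     -- one or more operations on MRDTs
};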

3.11.2 MRDT middleware

The MRDT store is backed by a pub/sub topic that disseminates and stores the transactions in the store. Publishers of the topic correspond to writers of the store. Each message contains a transaction, and thus the message DAG of the topic corresponds to the transaction DAG in the store. This way transactions inherit properties of the pub/sub layer, i.e. causal delivery and Byzantine Causal Broadcast.

Epochs in the pub/sub topic trigger a synchronization event in the topic, which otherwise provides only eventual synchrony of submitted messages. This enables garbage collection of transactions and allows changing the store configuration.

Announcing a new epoch corresponds to taking a snapshot of the MRDT store. The epoch announcement contains the hash of the MRDT store contents with all its key-value pairs at the given cut in the transaction DAG. All writers who have delivered transactions up to that point sign it off by sending an acknowledgement. Once a quorum is reached, the new epoch begins with the specified store configuration.
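
The epoch announcement described above might be sketched as follows (hypothetical field names).

type EpochAnnouncement := mkEpochAnnouncement {
  snapshot : Digest;     -- hash of the MRDT store contents at the given cut in the transaction DAG
  config : StoreConfig;  -- store configuration for the new epoch
};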

3.11.3 MRDT execution

Transactions are executed by all writers and readers of the store, all of which are either direct or indirect subscribers of the pub/sub topic and deliver transactions in causal order.

When a transaction is delivered, an executor first performs validation according to the permissions of the store at the current epoch: it checks whether the writer is authorized to perform the given operation on the given key, and then whether the operation itself is valid.

After validation passes, the operation is applied to the MRDT associated with the given key. In case of concurrent updates to the same RDT by different writers, the three-way merge function of the data type is executed, which takes as arguments the two conflicting states and their lowest common ancestor (LCA) in the transaction DAG, i.e. the point at which the branches started to diverge.
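
As a sketch, a data type could be bundled with its three-way merge function as follows; the encoding is illustrative.

-- merge takes the lowest common ancestor state and the two conflicting states.
type Mergeable (A : Type) := mkMergeable {
  merge : A -> A -> A -> A;  -- merge lca left right
};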

3.11.4 MRDT brokers

MRDT brokers are nodes that allow subscriptions to store updates to subscribers who do not perform MRDT execution themselves. Brokers offer synchronization of store snapshots, as well as push notifications of changed values. Since only snapshots are signed by a quorum, subscribers may subscribe at multiple brokers to ascertain correct execution before the next snapshot.

4 Concluding remarks

We have presented the design of the Anoma network architecture and its foundational protocols, which constitute the lower layers of the protocol stack and serve as building blocks for further distributed systems protocols and applications.

The system provides cryptographic identities, sovereign domains with pluralistic interoperability, pub/sub topics for reliable causal message dissemination, MRDT stores for asynchronous collaboration on shared data types, and a decentralized name system.

The detailed specification and evaluation of the protocols discussed remains future work.

References

[AlSh24]
[AnBuSe13]
Anceaume, Emmanuelle ; Busnel, Yann ; Sericola, Bruno: Uniform node sampling service robust against collusions of malicious nodes. In: 2013 43rd annual IEEE/IFIP international conference on dependable systems and networks (DSN) : IEEE, 2013, pp. 1–12
[BGKK09]
Bortnikov, Edward ; Gurevich, Maxim ; Keidar, Idit ; Kliot, Gabriel ; Shraer, Alexander: Brahms: Byzantine resilient random membership sampling. In: Computer Networks vol. 53, Elsevier (2009), Nr. 13, pp. 2340–2359
[ChViJa21]
Chen, Chen ; Vitenberg, Roman ; Jacobsen, Hans-Arno: Building fault-tolerant overlays with low node degrees for topic-based publish/subscribe. In: IEEE Transactions on Dependable and Secure Computing vol. 19, IEEE (2021), Nr. 5, pp. 3011–3023
[CMTV07]
Chockler, Gregory ; Melamed, Roie ; Tock, Yoav ; Vitenberg, Roman: Constructing scalable overlays for pub-sub with many topics. In: Proceedings of the twenty-sixth annual ACM symposium on principles of distributed computing, 2007, pp. 109–118
[HaRe24]
Hart, Anthony ; Reusche, D: Intent Machines. In: Anoma Research Topics, Zenodo (2024)
[HeKaSh24]
Heindel, Tobias ; Karbyshev, Aleksandr ; Sheff, Isaac: Heterogeneous Narwhal and Paxos. In: Anoma Research Topics, Zenodo (2024)
[HePrHa25]
Heindel, Tobias ; Prieto-Cubides, Jonathan ; Hart, Anthony: Dynamic Effective Timed Communication Systems. In: Anoma Research Topics, Zenodo (2025)
[JRVJ15]
Johansen, Håvard D ; Renesse, Robbert Van ; Vigfusson, Ymir ; Johansen, Dag: Fireflies: A secure and scalable membership and gossip service. In: ACM Transactions on Computer Systems (TOCS) vol. 33, ACM New York, NY, USA (2015), Nr. 2, pp. 1–32
[KaSh24]
Karbyshev, Aleksandr ; Sheff, Isaac: Heterogeneous Paxos 2.0: the Specs. In: Anoma Research Topics, Zenodo (2024)
[KhGo24]
Khalniyazova, Yulia ; Goes, Christopher: Anoma Resource Machine Specification. In: Anoma Research Topics, Zenodo (2024)
[KlHo20]
Kleppmann, Martin ; Howard, Heidi: Byzantine eventual consistency and the fundamental limits of peer-to-peer databases. In: CoRR vol. abs/2012.00472 (2020)
[KPSJ19]
Kaki, Gowtham ; Priya, Swarn ; Sivaramakrishnan, KC ; Jagannathan, Suresh: Mergeable replicated data types. In: Proceedings of the ACM on Programming Languages vol. 3, ACM New York, NY, USA (2019), Nr. OOPSLA, pp. 1–29
[KrSc23]
Kret, Ehren ; Schmidt, Rolfe: The PQXDH key agreement protocol (2023)
[MaPe16]
Marlinspike, Moxie ; Perrin, Trevor: The X3DH key agreement protocol (2016)
[Shef24]
Sheff, Isaac: Anoma State Architecture. In: Anoma Research Topics, Zenodo (2024)
[SKNS22]
Soundarapandian, Vimala ; Kamath, Adharsh ; Nagar, Kartik ; Sivaramakrishnan, KC: Certified mergeable replicated data types. In: Proceedings of the 43rd ACM SIGPLAN international conference on programming language design and implementation (PLDI 2022) : ACM, 2022
[SPBZ11]
Shapiro, Marc ; Preguiça, Nuno ; Baquero, Carlos ; Zawirski, Marek: Conflict-free replicated data types. In: 13th international conference on stabilization, safety, and security of distributed systems, SSS 2011 : Springer LNCS volume 6976, 2011, pp. 386–400
[Stie05]
Stiegler, Marc: Petname systems. In: HP Laboratories, Mobile and Media Systems Laboratory, Palo Alto, Tech. Rep. HPL-2005-148, Citeseer (2005)
[WaScGr14]
Wachs, Matthias ; Schanzenbach, Martin ; Grothoff, Christian: A censorship-resistant, privacy-enhancing and fully decentralized name system. In: Cryptology and network security: 13th international conference, CANS 2014, heraklion, crete, greece, october 22-24, 2014. Proceedings 13 : Springer, 2014, pp. 127–142
[WiWa08]
Wilcox-O’Hearn, Zooko ; Warner, Brian: Tahoe: The least-authority filesystem. In: Proceedings of the 4th ACM international workshop on storage security and survivability, 2008, pp. 21–26
[Xyza]
Juvix: A statically typed functional programming language. URL: https://juvix.org
[Xyzb]
BARE: Binary Application Record Encoding. URL: https://baremessages.org
[Xyzc]
Protocol Buffers. URL: https://protobuf.dev
[Xyzd]