Architecture

Learn about the VLINGO XOOM architecture for building DOMA- and DDD-based microservices.

The VLINGO XOOM platform is built fully on a highly concurrent, reactive, message-driven foundation. The bedrock of the platform is the actor-based XOOM Actors toolkit, made extremely simple to work with and rapid to deliver on by a service-based Compressed Architecture and the XOOM Turbo container and Designer acceleration tools. This foundation implements the Actor Model of computation. All other platform components are actor-based. Thus, it's appropriate to first discuss the architecture of the actor-based, message-driven runtime.

Message-Driven Runtime

The Actor Model provides an outstanding, non-leaky abstraction of concurrency. Given that your hardware supports multiple, or even many, cores, you can expect tremendous levels of parallelism among many actors that carry out application-level service requests.

The major architectural abstractions of the XOOM Actors Actor Model toolkit are:

  • World: This is the container abstraction within which actors live and operate. A World can have a number of Stage instances in which the life cycles of a subset of live actors are managed. When the World is started, actors can be created. When the World terminates, all Stage and Actor instances are terminated/stopped.

  • Stage: Every World has at least one Stage, known as the default. The Stage is specifically where actors are managed and within which they play or execute. There may be multiple Stage instances, each with several or many actors (even millions) under its management. Each Stage has an internal Directory within which live actors are held, and through which they can be found.

  • Actor: Each actor is an object, but one that reacts to incoming asynchronous messages and sends outgoing messages asynchronously. Each actor is assigned a Mailbox, and a Mailbox delivers messages through a Dispatcher. When the actor receives a message, it performs some business behavior internally, and then completes that specific message-handling context. All actors are type safe in that their behaviors are defined by Java method signatures rather than arbitrary object types. Every actor may implement any (practical) number of behavioral interfaces, known as actor protocols.

  • Supervisor: A recent account of cascading failure describes tens of thousands of nodes lost during a Kafka failure that caused a Kubernetes cluster to self-destruct, taking out an entire infrastructure. Using supervision as bulkheads can save your system from catastrophic failure. Actors are supervised in order to deal with failure. When an actor experiences an exception while handling a message, the exception is caught by the message dispatcher and is relayed to the actor’s supervisor as a message. When received by the supervisor, the exceptional message is interpreted to determine the appropriate step to correct the actor’s problem. The corrective action can be one of the following: resume, restart, stop, or escalate. In the case of supervision escalation, the exceptional message is relayed to this supervisor’s supervisor, for it to take some action. There are four kinds of supervision: direct parent, default public root, override of public root, and registered common supervisors for specific actor protocols (behavioral interfaces). A supervision sketch follows this list.

  • Scheduler: Each Stage has a scheduler that can be used to schedule future tasks, which may be repeated on time intervals. An actor determines the interval that is needed, and on each occasion that the interval is reached, the actor receives an interval signal message indicating that it is time to perform some task. The receiving actor can then execute some necessary behavior in response to the interval signal. The actor that creates the scheduled task need not be the target actor of the interval signal message. A scheduling sketch also follows this list.

  • Logger: Logging capabilities are provided to every actor, and different loggers may be assigned to different actors. Log output occurs asynchronously because loggers are run as actors.

  • Plugins: The VLINGO XOOM platform in general, and XOOM Actors specifically, support a plugin architecture. New kinds of plugins can be created at any time to extend the features of XOOM Actors and any other platform component. In particular, mailboxes and dispatchers, supervision, and logging can be extended via plugins.

  • Testkit: There is a very simple testkit that accompanies XOOM Actors, making it quite easy to test individual and collaborating actors for adherence to protocol and for correctness.
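The supervision described above can be sketched as follows. This is a hedged sketch, assuming the Supervisor, Supervised, and SupervisionStrategy protocols of XOOM Actors; the restart policy and the intensity/period values shown are illustrative, not recommendations.

import io.vlingo.xoom.actors.Actor;
import io.vlingo.xoom.actors.Supervised;
import io.vlingo.xoom.actors.SupervisionStrategy;
import io.vlingo.xoom.actors.Supervisor;

// A sketch of a supervisor that restarts a failed actor.
public class RestartingSupervisorActor extends Actor implements Supervisor {
  private final SupervisionStrategy strategy =
      new SupervisionStrategy() {
        @Override public int intensity() { return 5; }        // allow up to 5 failures...
        @Override public long period() { return 1000; }       // ...per 1,000 ms
        @Override public Scope scope() { return Scope.One; }  // apply to the failing actor only
      };

  @Override
  public void inform(final Throwable throwable, final Supervised supervised) {
    // The dispatcher caught the exception and relayed it here as a message.
    logger().error("Supervised actor failed: " + throwable.getMessage(), throwable);
    supervised.restartWithin(strategy.period(), strategy.intensity(), strategy.scope());
  }

  @Override
  public SupervisionStrategy supervisionStrategy() {
    return strategy;
  }
}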
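Interval scheduling can be sketched in a similar way, assuming the Scheduled protocol from xoom-common. Here the actor schedules itself as the signal target, though another actor could just as well be the target.

import io.vlingo.xoom.actors.Actor;
import io.vlingo.xoom.common.Cancellable;
import io.vlingo.xoom.common.Scheduled;

// A sketch of an actor that receives a recurring interval signal.
public class ReminderActor extends Actor implements Scheduled<Object> {
  private final Cancellable cancellable;

  @SuppressWarnings("unchecked")
  public ReminderActor() {
    // Signal after 1,000 ms, then every 5,000 ms thereafter.
    this.cancellable = scheduler().schedule(selfAs(Scheduled.class), null, 1_000L, 5_000L);
  }

  @Override
  public void intervalSignal(final Scheduled<Object> scheduled, final Object data) {
    logger().info("Interval reached: performing the recurring task");
  }

  @Override
  public void stop() {
    cancellable.cancel(); // stop receiving interval signals
    super.stop();
  }
}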

The following is a XOOM Actors architecture diagram.

The components seen in this diagram can be traced back to the above names and descriptions. In the default Stage, notice that there are three special actors: one bright yellow with a #, one bright yellow with a *, and one red with an X. Respectively, these are:

# The private root actor, which is the parent of the public root actor and the dead letters actor

* The public root actor, which is the default parent and supervisor if no others are specified

X The dead letters actor, which receives actor messages that could not be delivered

Actor Proxy

An important architectural design feature is the actor proxy. This supports type-safe, asynchronous message sending to an actor per a protocol (interface) that it implements.

Every actor must support at least one protocol. Here the Proposal protocol is used. It is unimportant exactly what behavior the Proposal supports, but you can imagine some sort of description that serves to propose something, such as a job or an expensive product purchase. Further, let's say that the actor that implements the Proposal protocol is named ProposalActor.
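A minimal sketch of the two types follows, where describeFor(...) is a hypothetical behavior standing in for whatever a real Proposal would declare:

import io.vlingo.xoom.actors.Actor;

public interface Proposal {
  void describeFor(final String client);
}

public class ProposalActor extends Actor implements Proposal {
  @Override
  public void describeFor(final String client) {
    // Perform the business behavior for this message-handling context.
    logger().info("Proposing for client: " + client);
  }
}

With those types in place, the client creates the actor through the World: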

World world = World.startWithDefaults("Proposals");
...
Proposal proposal = world.actorFor(Proposal.class, ProposalActor.class);

When a client component creates a Proposal as an actor through the World or a Stage, there are two parts that are created. One part is an instantiation of the ProposalActor itself. The second part created is a proxy instance, which also implements the Proposal protocol. The answer given to the client from the actor creation is the Proposal backed by the proxy implementation. In other words, in the above code, the proxy is the Proposal proposal instance returned from World#actorFor(). Internally, the Proposal proxy knows how to send messages asynchronously to the ProposalActor instance. The ProposalActor exists in memory, and any component that has its Proposal proxy instance may send asynchronous messages to it.

The following are facts about how the proxy and actor instances interact.

  • A client invokes methods on the Proposal proxy, never directly on the ProposalActor

  • The proxy reifies the method invocation into a message. To do so, the proxy uses the parameters (if any) and creates a java.util.function.Consumer that holds the intention to invoke the actual method, with any parameters, on the ProposalActor instance (see the sketch following this list)

  • The Consumer is wrapped in an io.vlingo.xoom.actors.Message, which, when sent within the local JVM, is implemented by LocalMessage of the same package

  • The Message containing the Consumer is queued to the actor's Mailbox, which causes the scheduling of the message for delivery

  • Once the new Message is queued in the actor's Mailbox, the proxy invocation returns to the client

  • When a thread becomes available, the Message is polled from the Mailbox and dispatched to the ProposalActor

  • The above points describe how asynchronous type-safe messages are sent and delivered
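The following is a simplified sketch of the kind of proxy this implies for Proposal. It is not the literal generated source; real generated proxies also handle dead letters, answers via Completes, and preallocated mailboxes, and the SerializableConsumer shown is XOOM's serializable specialization of java.util.function.Consumer.

import io.vlingo.xoom.actors.Actor;
import io.vlingo.xoom.actors.LocalMessage;
import io.vlingo.xoom.actors.Mailbox;
import io.vlingo.xoom.common.SerializableConsumer;

// Simplified sketch of a generated proxy; not the literal generated source.
public class Proposal__Proxy implements Proposal {
  private static final String representation = "describeFor(String)";

  private final Actor actor;      // the ProposalActor instance
  private final Mailbox mailbox;  // the actor's assigned Mailbox

  public Proposal__Proxy(final Actor actor, final Mailbox mailbox) {
    this.actor = actor;
    this.mailbox = mailbox;
  }

  @Override
  public void describeFor(final String client) {
    // Reify the invocation: capture the parameter in a Consumer that will
    // later invoke the actual method on the ProposalActor instance.
    final SerializableConsumer<Proposal> consumer = (target) -> target.describeFor(client);

    // Wrap the Consumer in a Message and enqueue it in the actor's Mailbox.
    // This call returns to the client as soon as the message is queued.
    mailbox.send(new LocalMessage<>(actor, Proposal.class, consumer, representation));
  }
}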

Specifically for the Proposal protocol, the following describes how proxy classes are created.

  • When a client creates a new Proposal using actorFor(protocol, actorType[, args...]), internally XOOM Actors queries for a class named Proposal__Proxy

  • If it does not yet exist, XOOM Actors dynamically generates the Proposal__Proxy.java, compiles it, and loads it into its private class loader

  • The Stage can now create instances of the Proposal__Proxy

  • When actorFor(protocol, actorType[, args...]) is used, XOOM Actors actually returns a new instance of Proposal__Proxy which implements Proposal
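Assuming the default naming above, a client can observe that it holds the generated proxy rather than the actor instance itself:

final Proposal proposal = world.actorFor(Proposal.class, ProposalActor.class);
System.out.println(proposal.getClass().getSimpleName()); // Proposal__Proxy
System.out.println(proposal instanceof ProposalActor);   // false: the client never holds the actor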

Next, the XOOM Cluster is discussed, and following that there are comments on the overall platform architecture.

Scale and Resilience with the Multi-Node Cluster Architecture

The XOOM Cluster is a key component that sits on top of XOOM Actors to support the development of scalable and fault-tolerant tools and applications. A number of additional tools building out the VLINGO XOOM platform are built on top of the XOOM Cluster. The XOOM Lattice grid is built on the XOOM Cluster, and your applications and microservices will run on XOOM Lattice. Thus, all cluster properties also apply to the grid.

Generally a cluster will be composed of an odd number of multiple nodes (not just one, but, for example, 3, 5, 21, or 127). The reason for choosing an odd number of nodes is to make it possible to determine whether there is a quorum of nodes (totalNodes / 2 + 1) that can form a healthy cluster. Choosing an even number of nodes works, but the extra node neither improves the quorum determination when one node is lost, nor strengthens the cluster when all nodes are available.
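The quorum formula, expressed as a short sketch, also shows why an even node count does not help:

public final class Quorum {
  // quorum = totalNodes / 2 + 1, using integer division
  static int quorum(final int totalNodes) {
    return totalNodes / 2 + 1;
  }

  public static void main(final String[] args) {
    System.out.println(quorum(3)); // 2: tolerates the loss of one node
    System.out.println(quorum(4)); // 3: still tolerates only one lost node
    System.out.println(quorum(5)); // 3: tolerates the loss of two nodes
  }
}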

When a quorum of nodes is available and communicating with one another, a leader is elected. The leader is responsible for making certain decisions on behalf of the cluster, and also announces newly joined nodes and those that have left the cluster.

The cluster consensus protocol is the one known as the Bully Algorithm. Although the name is unpleasant, the protocol is simple but powerful. This consensus protocol chooses a leader by determining the node with the "greatest id value." The identity may be numeric or alphanumeric. If numeric, the "greatest node id" is determined with numeric comparisons. If alphanumeric, the "greatest node id" is determined lexicographically. Any node that senses that the previous leader has been lost (left the cluster for any reason) messages the other known "nodes with greater ids," asking for an election to take place. Any "nodes with greater ids" tell the "lesser id node" that it will not be the leader. Finally, when there are no nodes of "greater id" than a given node, known because it receives no "you aren't the leader" messages, that "greatest id node" declares itself leader. In essence, the "node of the greatest id" bullies its way into the leadership position, asserting that it is currently greatest. Any latent message received from a "node of even greater id" than the current declared leader will not take leadership away from it, because it is a healthy leader node.
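The heart of the election is a simple greatest-id comparison, sketched here. This is illustrative only; the real protocol reaches this outcome through election and answer messages exchanged between nodes.

import java.util.Comparator;
import java.util.List;

public final class BullyElectionSketch {
  // Numeric ids compare numerically; alphanumeric ids would compare lexicographically.
  static int electLeader(final List<Integer> liveNodeIds) {
    return liveNodeIds.stream().max(Comparator.naturalOrder()).orElseThrow();
  }

  public static void main(final String[] args) {
    // Nodes 1 and 2 call for an election; node 3 receives no
    // "you aren't the leader" reply and declares itself leader.
    System.out.println(electLeader(List.of(1, 2, 3))); // 3
  }
}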

It's possible that our consensus protocol will be augmented in the future with the Gossip Protocol to support very large clusters. We also have the option to swap out the Bully Algorithm for another, or to provide multiple consensus strategies.

The XOOM Cluster works in full-duplex mode, with two kinds of cluster node communication channels. There are the operational channels, which the cluster nodes use to maintain the health of the cluster. There are also application channels, which the services/applications use to pass messages between nodes. Using two different channel types allows for tuning each type in different ways. For example, application channels may be assigned to a faster network than the operational channels. It also opens the possibility for either channel type to use TCP or UDP independently of the other channel type.

Besides scalable fault tolerance, the XOOM Cluster also provides cluster-wide, synchronizing attributes of name-value pairs. This enables the cluster to share live and mutating operational state among all nodes. Likewise, application-level services can also make use of cluster-wide, synchronizing attributes to share application/service values across nodes.

The following diagram shows a three-node XOOM Cluster.

The cluster depicted here is composed of three nodes. If one node is lost, the cluster will still maintain a quorum. However, if two nodes are lost and only one node remains in a running, healthy state, the quorum is lost and the cluster as a whole is considered unhealthy. In that case the one remaining node goes into an idle state and awaits the return of one or more of the other nodes to a healthy operational state, which will again constitute a quorum and enable a healthy running cluster. A leader is elected using the cluster consensus protocol.

Inter-Node Messaging

Each cluster member maintains communication with the others by means of an operational channel. Over this channel is sent cluster node health information and notifications of nodes joining and leaving the cluster. When a leader is elected, whether when the cluster first starts or when a leader node leaves the cluster, the election and final declaration of leadership are announced over the operational channel. At appropriate times the leader sends cluster directory messages to all nodes so that each node can be aware of all other available nodes.

There is a second kind of channel on each cluster node, the application channel. This channel is used strictly for application-level messages, those between actors that reside on different nodes. There are benefits to separating operational and application messages. Potentially the traffic of each channel could be separated by different networks and use different protocols. For example, the operational channels could be on a slower network because operational messages are fewer and relatively infrequent. The application channel, on the other hand, may need to transport many millions of messages over short timeframes, and thus may require a faster network than the operational one. Further, the protocol used for application channel messages could be different; one having the capacity for much higher throughput.

The actor-to-actor cross-cluster messaging is provided by XOOM Lattice components. The components include those for Reactive Domain-Oriented Microservices Architecture (DOMA) and Domain-Driven Design (DDD) projects, featuring highly concurrent models. The tools include compute grid, actor/object caching, object spaces, cross-node cluster messaging, object aggregates, state aggregates, event sourced aggregates, CQRS with projections, messaging exchanges, and long-running processes (aka Sagas).

Cluster Configuration

The following file-based cluster configuration defines the nodes of a three-node cluster, each with a unique id, name, and host, along with its op (operations) port and app (application) port.

node.accounts1.id = 1
node.accounts1.name = accounts1
node.accounts1.host = accounts-svr1
node.accounts1.op.port = 37371
node.accounts1.app.port = 37372

node.accounts2.id = 2
node.accounts2.name = accounts2
node.accounts2.host = accounts-svr2
node.accounts2.op.port = 37371
node.accounts2.app.port = 37372

# highest id, default leader
node.accounts3.id = 3
node.accounts3.name = accounts3
node.accounts3.host = accounts-svr3
node.accounts3.op.port = 37371
node.accounts3.app.port = 37372

When one node connects with another node, it opens a connection on both its operational and application channels. These channels facilitate only unidirectional communication. In other words, when a node receives a message on its incoming operational channel, it does not send an outgoing response to the originating node on that same channel. Instead, the responding node uses the operational channel of the client node that originated the message. The directory messages circulated by the cluster leader contain the operational and application address information for each node in the cluster.

Cluster-Wide Attributes

The XOOM Cluster component also provides cluster-wide, synchronizing attributes. This enables the cluster to share live and mutating operational state among all nodes. These attributes may be operational in nature, or application attributes. Either way, the creation, modification, and removal of an attribute is handled over the operational channel (see previous section). Each attribute has a name, a value, and a type that may be any supported simple datatype, such as String, Boolean, Integer, Double, etc.

Any time that a new attribute is created, its value is modified, or it is deleted, the operation is made known to all live cluster nodes, allowing them to synchronize. The attribute operational protocol includes configurable retries, and any node that newly joins the cluster is informed of all current live attributes and is then included in ongoing activity notifications.
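A hedged sketch of attribute usage follows, assuming the AttributesProtocol and Attribute types of the xoom-cluster attributes API; the set and attribute names are illustrative:

import io.vlingo.xoom.cluster.model.attribute.Attribute;
import io.vlingo.xoom.cluster.model.attribute.AttributesProtocol;

public final class AttributeUsageSketch {
  // The client is assumed to be handed to the cluster application at startup.
  public static void demo(final AttributesProtocol client) {
    client.add("discounts", "welcome-rate", 0.10);     // create: replicated to all live nodes
    client.replace("discounts", "welcome-rate", 0.15); // modify: replicated to all live nodes
    final Attribute<Double> rate = client.attribute("discounts", "welcome-rate");
    System.out.println("Current rate: " + rate.value); // assumes a public value field
    client.remove("discounts", "welcome-rate");        // remove: replicated to all live nodes
  }
}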

Platform Architecture

The overall platform architecture includes any number of nodes, from 1 to N. If there is more than one node, it is recommended to use the XOOM Cluster (above). Each node, whether one or many as illustrated in the above cluster topology, may use an array of VLINGO XOOM platform components.

In addition to those pictured here, additional components include the XOOM Directory for service registration and discovery, and the XOOM Schemata schema registry.

Each of these components is covered throughout this documentation, most of which may be perused in the Quick Reference or the sidebar Table of Contents.
