
Persistence Overview

Why Use NStream Persistence?

NStream persistence is an essential feature for modern, stateful applications. It provides several key advantages:

  1. State Preservation and Recovery: Preserves application state so that it can be recovered after a restart, minimizing disruption during unplanned outages or maintenance.

  2. Extended Historical State Storage: Stores a longer history of state than in-memory storage alone, which is limited in size and volatile. Ideal for applications needing long-term access to historical data.

  3. High Performance and Scalability: Integrates with RocksDB to offer high performance and scalability, essential for mission-critical applications that demand fast recovery and efficient handling of large data volumes.

  4. Increased Cache Limits: Significantly increases cache limits beyond typical web agent constraints, allowing storage capacities up to 1 TB without compromising latency.

  5. Reliability and Rollback Capability: Ensures a high degree of reliability with easy rollback processes during outages, aiming for 99.99% uptime.

Setting up NStream Persistence

Configuring NStream persistence involves a few simple steps. The configuration parameters are placed in additional slots within your existing configuration file. These parameters are flexible and can be tailored to meet specific application needs.

  1. Persistence Support: Ensure the nstream.persist.kernel module is on your module path.
  2. Store Implementation: Add at least one of the store implementation modules to your module path.
  3. Key-Value Store Adapter: If you are using a key-value store (e.g. RocksDB), you will also need to add the nstream.persist.kv module.
  4. Kernel Declaration: Add this declaration to your server.recon file:
    @kernel(class: "nstream.persist.kernel.PersistenceKernel")
  5. Fabric Configuration: Specify the store in the fabric tag. This example uses the Cassandra store implementation; the supported configuration options depend on the specific store used. The implName parameter is only required if more than one store implementation is present on the module path.
    storeName: @store {
         implName: "Cassandra"
         parameters: {
             defaultReplication: 2
         }
    }

Lane Level Configuration

State Restoration on Restart

On restart, the state is restored from the persistence data store. Data in lanes marked with @SwimLane is reloaded, while data in lanes marked with @SwimTransient is not persisted and not restored.
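The restore behavior described above can be illustrated with a small, self-contained toy model. This is only a sketch and does not use the NStream or Swim API: the `RestoreSketch` class, the `STORE` map, and the `Agent` fields are all hypothetical stand-ins. One field plays the role of a lane marked with @SwimLane and the other plays the role of a lane marked with @SwimTransient.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: this is NOT the NStream or Swim API.
public class RestoreSketch {

    // Hypothetical stand-in for a persistence store that outlives the server process.
    static final Map<String, Integer> STORE = new HashMap<>();

    // Hypothetical stand-in for an agent with one persisted and one transient lane.
    static class Agent {
        int persisted;       // plays the role of a lane marked @SwimLane
        int transientValue;  // plays the role of a lane marked @SwimTransient

        // On start, only the persisted value is reloaded from the store.
        Agent() {
            this.persisted = STORE.getOrDefault("persisted", 0);
            this.transientValue = 0;
        }

        void update(int value) {
            this.persisted = value;
            STORE.put("persisted", value); // written through to the store
            this.transientValue = value;   // kept in memory only
        }
    }

    public static void main(String[] args) {
        Agent first = new Agent();
        first.update(42);

        // Creating a new agent against the same store simulates a server restart.
        Agent restarted = new Agent();
        System.out.println("persisted=" + restarted.persisted
            + " transient=" + restarted.transientValue);
        // prints "persisted=42 transient=0"
    }
}
```

After the simulated restart, the persisted value comes back from the store while the transient value returns to its initial state.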

Example Applications

Example applications for each supported store kind can be found at: GitHub - NStream Persistence Examples.

Each example application runs a server with the same agent definition, which periodically updates its own state without user input. The state is written to the specified store implementation and will be restored when the application is restarted.

Choosing a Persistence Implementation

Choosing the most appropriate persistence implementation for your use case is a trade-off between simplicity and robustness. Generally, the simplest way to add persistence to your application is to use RocksDB. This will usually be the most performant option (depending on your storage). However, if the data is stored in a local filesystem, it will be lost if the server hosting it fails. This can be mitigated, to an extent, by enabling remote snapshots for the database. Doing so requires a separate object store to be available, and it can still result in the loss of data received between the most recent snapshot and the failure of the server.
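To make the snapshot loss window concrete, here is a minimal sketch in plain Java. It is illustrative only and does not call RocksDB or NStream: the `SnapshotLossSketch` class, the `lostUpdates` method, and the integer timestamps are hypothetical. It assumes that every update received at or before the last snapshot time is captured by that snapshot.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: models which updates fall between the last snapshot and a failure.
public class SnapshotLossSketch {

    // Returns the timestamps of updates that would be lost: those received after
    // the last remote snapshot and at or before the moment the server failed.
    static List<Long> lostUpdates(List<Long> updateTimes, long lastSnapshotTime, long failureTime) {
        return updateTimes.stream()
            .filter(t -> t > lastSnapshotTime && t <= failureTime)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> updates = List.of(10L, 20L, 30L, 40L, 50L);
        // Last snapshot taken at t=30, server fails at t=55.
        System.out.println(lostUpdates(updates, 30L, 55L)); // prints [40, 50]
    }
}
```

Under these assumptions, taking snapshots more often shortens the window and so reduces the number of updates at risk, at the cost of more traffic to the object store.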

Alternatively, you can use a persistence implementation backed by a fully distributed data store (the currently supported options are Ignite and Cassandra). The primary advantage of these options is that the data is stored entirely separately from your application and is, in general, replicated. This means that, if your application fails, it can be restarted with the same state that it had at the point of the failure. If you already have a deployment of either of these services, using it will likely be the best choice (provided the level of performance is adequate for your needs).

If you do not already have an existing deployment and are choosing between the two, Ignite will usually have the lowest administration overhead. Cassandra requires that its server processes be separate from the NStream server, whereas Ignite can be configured to run within the same process. Conversely, if you need or want your state to be stored entirely independently, Cassandra would be the more appropriate option.

Nstream is licensed under the Redis Source Available License 2.0 (RSALv2).