Sonar

This book introduces Sonar, a framework for distributed media archives built upon the Dat stack.

The stack: Foundations of Sonar

Sonar is made up of several components:

The hyperstack

The hyperstack is the technical foundation of the Dat project and the primary implementation of its protocol. It is a collection of modules written in JavaScript that run on the Node.js runtime.

The primary data structure of the hyperstack is Hypercore, a cryptographically secure append-only log plus an efficient replication protocol. Each hypercore is identified by a unique public key and internally uses Merkle trees to efficiently verify that all entries in the hypercore are signed by the single matching secret key. The hypercore protocol is a binary protocol that syncs two hypercores, both sparsely and live, over any binary stream (usually a network socket).
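To make the idea of a signed append-only log more concrete, here is a minimal sketch that uses only Node.js's built-in crypto module. It is not Hypercore's actual on-disk or wire format: the names ToyLog, merkleRoot, rootOf and verifyLog are made up for this illustration, and the tree layout (SHA-256, odd nodes promoted unchanged) is an assumption chosen for brevity.

```javascript
const crypto = require('crypto')

// Hash helper: concatenates all parts into one SHA-256 digest.
const sha256 = (...parts) => {
  const h = crypto.createHash('sha256')
  for (const p of parts) h.update(p)
  return h.digest()
}

// Merkle root over leaf hashes. An odd node at the end of a level is
// promoted unchanged (a simplification, not Hypercore's tree layout).
function merkleRoot (leaves) {
  if (leaves.length === 0) return sha256(Buffer.alloc(0))
  let level = leaves
  while (level.length > 1) {
    const next = []
    for (let i = 0; i < level.length; i += 2) {
      next.push(i + 1 < level.length
        ? sha256(Buffer.from([1]), level[i], level[i + 1]) // parent node
        : level[i])
    }
    level = next
  }
  return level[0]
}

// Root over all entries; leaves are prefixed with 0, parents with 1.
const rootOf = entries =>
  merkleRoot(entries.map(e => sha256(Buffer.from([0]), e)))

// Hypothetical toy log: one key pair, one writer, append only.
class ToyLog {
  constructor () {
    const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519')
    this.publicKey = publicKey
    this.privateKey = privateKey
    this.entries = []
    this.signature = null
  }

  append (data) {
    this.entries.push(Buffer.from(data))
    // Re-sign the Merkle root after every append.
    this.signature = crypto.sign(null, rootOf(this.entries), this.privateKey)
    return this.entries.length - 1
  }
}

// A reader only needs the public key to check what it received.
function verifyLog (publicKey, entries, signature) {
  return crypto.verify(null, rootOf(entries), publicKey, signature)
}

const log = new ToyLog()
log.append('first entry')
log.append('second entry')
const ok = verifyLog(log.publicKey, log.entries, log.signature)
const tampered = [Buffer.from('forged entry'), log.entries[1]]
const bad = verifyLog(log.publicKey, tampered, log.signature)
console.log(ok, bad) // prints: true false
```

Signing only the root keeps the signature small no matter how long the log grows, which is the property that makes verifying sparse downloads cheap in the real data structure.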

The primary networking scheme of the hyperstack is hyperswarm, a distributed networking stack for connecting peers. It allows processes and machines to find each other based on mutual interest in a topic. Peers are found both in the local network and through a distributed hash table of peers. To establish connections across a variety of network configurations, hyperswarm uses NAT holepunching.
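Topic-based discovery can be pictured with a small in-memory stand-in. The sketch below does no networking at all and does not use the hyperswarm module: ToyDiscovery, topicFor and the peer address strings are hypothetical, and a single Map plays the role that the local network and the distributed hash table play in practice. Deriving the topic by hashing a human-readable name with SHA-256 is an assumption for this example.

```javascript
const crypto = require('crypto')

// Derive a fixed-size topic from a name (assumed convention for this sketch).
const topicFor = name => crypto.createHash('sha256').update(name).digest()

// Hypothetical stand-in for the discovery layer: topic -> set of peers.
class ToyDiscovery {
  constructor () {
    this.table = new Map()
  }

  // A peer declares interest in a topic.
  announce (topic, peer) {
    const id = topic.toString('hex')
    if (!this.table.has(id)) this.table.set(id, new Set())
    this.table.get(id).add(peer)
  }

  // Find everyone else interested in the same topic.
  lookup (topic, self) {
    const peers = this.table.get(topic.toString('hex')) || new Set()
    return [...peers].filter(p => p !== self)
  }
}

const discovery = new ToyDiscovery()
const archive = topicFor('sonar-archive')
discovery.announce(archive, 'alice:3000')
discovery.announce(archive, 'bob:3001')
discovery.announce(topicFor('something-else'), 'carol:3002')

const found = discovery.lookup(archive, 'alice:3000')
console.log(found)
```

Here alice finds bob but not carol, because only alice and bob announced the same topic.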

Furthermore, the hyperstack includes several data structures built on top of Hypercore, namely:

  • Hypertrie, a distributed key-value store. It lets you find any key in the keyspace with a small number of lookups, even when all you have to start with is the latest entry of an append-only log.
  • Hyperdrive, a distributed file system built upon Hypercore and Hypertrie. It maps the primitives of a POSIX file system onto a hypertrie (for directory and file lookup plus metadata) and an additional hypercore for the actual file contents. On many systems hyperdrives can be mounted as a user-space filesystem that appears like a regular folder on your local device (through hyperdrive-fuse and the hyperdrive-daemon).
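How a key-value store can live inside an append-only log is easier to see in code. The following is a toy sketch of the general idea, not hypertrie's real encoding: ToyTrie and pathOf are hypothetical names, keys are turned into a path of 64 hex digits with SHA-256, and every appended node carries links to the latest earlier nodes that branch away from its own path. A lookup starts at the latest entry and follows at most one link per differing digit.

```javascript
const crypto = require('crypto')

// Hash each key into a path of 64 hex digits (an assumption of this sketch).
const pathOf = key => crypto.createHash('sha256').update(key).digest('hex')

class ToyTrie {
  constructor () {
    this.log = [] // the append-only log of nodes
  }

  put (key, value) {
    const path = pathOf(key)
    const node = { key, value, path, trie: [] }
    // head: latest existing node sharing the first i digits with the new path
    let head = this.log.length ? this.log[this.log.length - 1] : null
    for (let i = 0; head && i < path.length; i++) {
      // Inherit the branches that head knows about at digit i.
      const links = Object.assign({}, head.trie[i])
      const diverges = head.path[i] !== path[i]
      if (diverges) links[head.path[i]] = head.seq
      // The branch for our own digit is where we continue walking.
      const next = links[path[i]]
      delete links[path[i]]
      if (Object.keys(links).length) node.trie[i] = links
      if (diverges) head = next === undefined ? null : this.log[next]
    }
    node.seq = this.log.length
    this.log.push(node)
    return node.seq
  }

  get (key) {
    const path = pathOf(key)
    let node = this.log.length ? this.log[this.log.length - 1] : null
    let i = 0
    while (node) {
      // Skip the digits this node has in common with the target path.
      while (i < path.length && node.path[i] === path[i]) i++
      if (i === path.length) return node.value
      // Follow the link for the first differing digit, if there is one.
      const next = node.trie[i] && node.trie[i][path[i]]
      node = next === undefined ? null : this.log[next]
    }
    return undefined
  }
}

const db = new ToyTrie()
for (let n = 0; n < 100; n++) db.put('key-' + n, n)
db.put('key-7', 'updated')
console.log(db.get('key-7'), db.get('key-42'), db.get('nope'))
```

Because old nodes are never modified, an update is just another append; a reader holding only the latest node can still reach every live key.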

All of these data structures are single-writer by default, which means that only one secret key is allowed to publish updates to the data, and that key is expected to reside on a single device only. Otherwise forks would arise, which the stack is not designed to make use of and which are currently considered data corruption.

Hypertrie and Hyperdrive have a concept of mounts, where specific paths in a tree may point to another tree. Because these data structures remain efficient even when only sparsely synced, this opens the door to the idea of a huge grid of interlinked data structures.

The hyperstack is in the process of going through a major version upgrade, often dubbed Dat2. It includes Hypercore 8, hypercore protocol 7, hyperdrive 10 and the migration from discovery-swarm to hyperswarm. Sonar only uses this new major version of the stack (which is incompatible with earlier versions), and other Dat projects like Beaker Browser are in the process of upgrading.

The Kappa architecture

Another strategy to deal with a set of single-writer append-only logs is the Kappa architecture. Its basic concept is that each user has a local database into which data from several single-writer logs is aggregated. Our JavaScript implementation is [kappa-core] plus an emerging set of modules for different types of views (roughly comparable to database tables in that they usually aggregate append-only data, e.g. into key-value stores with support for conflict resolution primitives). This ecosystem is developed by different projects that are based around sets of hypercores, e.g. Cabal, Mapeo and peermaps. Ways are currently being developed to deal nicely with these sets of hypercores, to ask peers for interesting parts, and to replicate those efficiently.
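As a rough illustration of such a view, the sketch below folds two single-writer logs into one key-value map. It does not use kappa-core or its API: the logs are plain arrays, the function name materialize and the fields key, value and clock are made up for this example, and the conflict rule (highest clock wins, ties broken by writer name) is only one of many possible choices.

```javascript
// Each log is a single-writer, append-only list of operations.
const alice = [
  { key: 'title', value: 'Radio archive', clock: 1 },
  { key: 'title', value: 'Community radio archive', clock: 3 }
]
const bob = [
  { key: 'title', value: 'Radio archiv', clock: 2 },
  { key: 'topic', value: 'history', clock: 1 }
]

// Hypothetical view: aggregate all logs into a local key-value store.
function materialize (logs) {
  const view = new Map()
  for (const [writer, log] of Object.entries(logs)) {
    for (const op of log) {
      const current = view.get(op.key)
      // Conflict resolution: higher clock wins, writer name breaks ties.
      const wins = !current ||
        op.clock > current.clock ||
        (op.clock === current.clock && writer > current.writer)
      if (wins) view.set(op.key, { value: op.value, clock: op.clock, writer })
    }
  }
  return view
}

const view = materialize({ alice, bob })
console.log(view.get('title').value, '/', view.get('topic').value)
```

Since the rule only compares clocks and writer names, the result does not depend on the order in which the logs arrive, so every peer that has seen the same entries ends up with the same view. A real view would usually process new entries incrementally rather than rebuilding the map each time.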

Tantivy

Tantivy is a full-text search engine written in Rust. In Sonar, we integrate Tantivy with our peer-to-peer database through sonar-tantivy. It uses Tantivy as a library and runs as a standalone binary that is started by a Node.js module.

Sonar

Now what is Sonar? It's our interpretation of the Dat stack, aimed at making it easy to archive, share and discover digital media files. It's part of a larger (and early-stage) project towards a toolset for community media and the preservation of emancipatory parts of history.

Sonar is a set of modules and a user interface to manage records in a database built on top of hypercores and kappa-core. Sonar is also the integration of that database with Tantivy, a fast full-text search engine written in Rust. Finally, Sonar is a sort of runtime for services and interfaces that interact with these records-stored-in-hypercores.

What follows is a description of the different parts of Sonar, their interactions and status.

  • hyper-content-db: v1 uses hyperdrives to store records in JSON files. v2 will use hypercores as append-only logs of records. A record has a unique ID, a schema, and a value. The value is usually any JSON object; the schema describes all records of the same type. The ID is a unique string that is usually assigned by the database when creating new records. The database includes a few basic views to find records by type, ID and modification date, and to track relationships between records. It will also include ways to deal with schema modification and migration.

    rename to sonar-db?

    dependencies: hypercore, (hyperdrive? should be optional), kappa-core, leveldb

  • [sonar-tantivy] is a binary for the Tantivy search engine that includes a few modifications and additions around the management of schema and indexes. It is written in Rust.

  • [sonar-dat]: Connect a database with tantivy, plus a way to manage many of these databases. Currently based on hyper-content-db v1. Also connects the databases/hypercores to hyperswarm and simple-local-swarm.

  • [sonar-server]: HTTP API to sonar-db and hyperdrives (/ sonar-fs?). Currently this is what "runs all the things".

  • [sonar-ui]: A React single-page application that talks to sonar-server through sonar-client.

  • [sonar-runtime]: A process runner that connects different clients through simple-rpc-protocol. Also will manage access and scopes?

  • sonar-server:
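The record model described for hyper-content-db above (an ID, a schema and a value, plus views by ID and type) could look roughly like the following in-memory sketch. This is not the hyper-content-db API: ToyRecordStore and its methods are hypothetical, the log is a plain array instead of a hypercore, and the sketch assumes that a record keeps the same schema across updates.

```javascript
const crypto = require('crypto')

// Hypothetical in-memory stand-in for a record database.
class ToyRecordStore {
  constructor () {
    this.log = []             // append-only log of record versions
    this.byId = new Map()     // view: id -> latest version of the record
    this.bySchema = new Map() // view: schema name -> set of record ids
  }

  // Append a record version; the store assigns an ID if none is given.
  put ({ id, schema, value }) {
    const record = {
      id: id || crypto.randomBytes(8).toString('hex'),
      schema,
      value,
      seq: this.log.length
    }
    this.log.push(record)
    this.byId.set(record.id, record)
    if (!this.bySchema.has(schema)) this.bySchema.set(schema, new Set())
    this.bySchema.get(schema).add(record.id)
    return record.id
  }

  get (id) {
    return this.byId.get(id)
  }

  // All current records of one type.
  list (schema) {
    return [...(this.bySchema.get(schema) || [])].map(id => this.byId.get(id))
  }
}

const store = new ToyRecordStore()
const docId = store.put({ schema: 'doc', value: { title: 'Hello', body: 'First draft' } })
store.put({ id: docId, schema: 'doc', value: { title: 'Hello', body: 'Second draft' } })
console.log(store.get(docId).value.body, store.list('doc').length, store.log.length)
```

Updates never overwrite earlier log entries; the views simply point at the most recent version, so the full history stays available in the log.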

Technical roadmap

Data model

Clients, services and infrastructure

The bigger picture

Contribute

Notes

This section contains various notes and snippets.

Naming

others we like

  • peermaps
  • cobox
  • cabal
  • mapeo

current namings

  • arso: the group of people that we are
  • sonar: the software we're developing

variations

sonar and arso are both taken on npm and github. current npm name is e.g. @arso-project/sonar-server which is not nice.

  • sonarlabs
  • sonare
  • sonarbib
  • sonarbox
  • sonar
  • sonarco
  • sonars
  • sonarpeers
  • sonar

What is a very short description of Sonar?

  • sonar - distributed media archive sonardma
  • sonarcms

words we like

  • p2p
  • peer
  • co llective op rp
  • local-first
  • collective/community/crowd-first

CLI examples

Create a schema

echo '{"properties": { "title": { "type": "string", "title":"Title" },"body":{"type":"string","title":"Body"}, "topic": { "title":"Topic","type": "array", "items": {"type":"string"}}}}' | node cli -i hello  db put-schema doc