<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.3.3">Jekyll</generator><link href="https://rob.tech/feed.xml" rel="self" type="application/atom+xml" /><link href="https://rob.tech/" rel="alternate" type="text/html" hreflang="en" /><updated>2025-02-27T17:33:08+00:00</updated><id>https://rob.tech/feed.xml</id><title type="html">rob.tech</title><subtitle></subtitle><author><name>Robert Habermeier</name></author><entry><title type="html">Releasing NOMT v1.0-preview</title><link href="https://rob.tech/blog/nomt-1.0-preview/" rel="alternate" type="text/html" title="Releasing NOMT v1.0-preview" /><published>2025-02-24T00:00:00+00:00</published><updated>2025-02-24T00:00:00+00:00</updated><id>https://rob.tech/blog/nomt-1.0-preview</id><content type="html" xml:base="https://rob.tech/blog/nomt-1.0-preview/"><![CDATA[<p>Today, I’m happy to announce the preview release of NOMT 1.0. For the last year, we at <a href="https://thrum.dev">Thrum</a> have been building <a href="https://github.com/thrumdev/nomt">NOMT</a> as a custom blockchain state database solution combining the throughput of Solana with the interoperability of a merkle tree.</p>

<p>NOMT is resilient to crashes and drive failures of various kinds through the use of standard database design techniques: write-ahead logging and shadow paging. We have built a custom simulation and testing tool, codenamed “torture”, which has been used to ensure reliability in an enormous combination of workloads in the presence of random crashes.</p>

<p>Our performance target for NOMT was to achieve 25,000 transfers per second with a database of 1 billion items on low-cost (&lt; $1200) consumer hardware, and I am pleased to announce we have cleared this benchmark. For smaller databases with fewer than 128 million entries, NOMT can achieve over 50,000 transfers per second.</p>

<p>NOMT is written in Rust and has a thread-safe API, useful for parallel VMs. It also includes optional in-memory overlays for handling unfinalized blocks.</p>

<p>We expect NOMT to scale well with the rapid advancement of NVMe SSDs as a technology. SSDs are getting faster at an incredible rate, with new technologies such as PCIe 5.0 (a higher-throughput communication bus) and fifth-generation controllers such as the Phison E28 on the horizon.</p>

<p>The intended use-case for NOMT is as a foundational building block for high-throughput blockchain nodes: for use in SDKs and custom node implementations. It is unopinionated on data formats and supports values up to gigabytes in size. The first intended user is the <a href="https://sovereign.xyz">Sovereign SDK</a> and this work has been supported with a grant by Sovereign Labs.</p>

<h2 id="benchmarks-and-methodology">Benchmarks and Methodology</h2>

<p>We performed benchmarks on a machine with the following specifications:</p>
<ul>
  <li>AMD Ryzen 7950X 16-core CPU ($500 USD)</li>
  <li>64GB DDR5 RAM ($150 USD)</li>
  <li>Corsair MP600 Pro LPX (4TB) ($300 USD) OR Samsung 990 Pro (2TB) ($167 USD)</li>
</ul>

<p>The total hardware cost for this setup comes out to less than $1200, including the motherboard and power supply.</p>

<p>Our benchmark scenario emulates a simple value transfer from a random existing account to a random target account. We tested 3 scenarios: transferring to already-existing accounts, transferring to 50% new accounts, and transferring to 100% new accounts. Each transfer performs two random reads, one for each account, and then updates both. These scenarios are implemented within our custom benchmarking tool <code class="language-plaintext highlighter-rouge">benchtop</code>.</p>

<p>In each benchmark, we generated a merkle witness, gathering all the merkle nodes necessary to prove the reads and writes.</p>

<p>In two key ways, this is a “worst-case” benchmark: most blockchain usage follows a power-law 
distribution, where something like 80% of the state reads are on 20% of the state. Fully random
accesses are the absolute worst-case for a blockchain, but are needed to serve the long tail of 
global usage rather than the fat tail of power usage. Furthermore, generating merkle proofs 
including all read data adds additional overhead which could be skipped on a normal validating 
full node.</p>
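<p>To put a rough number on that intuition, the hot-set share under a power-law access pattern can be computed directly. The sketch below is illustrative only (it is not part of <code class="language-plaintext highlighter-rouge">benchtop</code>) and assumes a Zipf distribution with exponent 1:</p>

```rust
// Illustrative check of the "80/20" intuition: under a Zipf(s = 1) access
// distribution over n keys, key k is accessed with probability (1/k) / H(n),
// so the share of accesses hitting the hottest `top_frac` of keys is a ratio
// of harmonic sums.
fn zipf_top_share(n: usize, top_frac: f64) -> f64 {
    let harmonic = |m: usize| (1..=m).map(|k| 1.0 / k as f64).sum::<f64>();
    let top = (n as f64 * top_frac) as usize;
    harmonic(top) / harmonic(n)
}

fn main() {
    // With a million keys, roughly 89% of accesses land on the top 20% of
    // keys, so a uniformly random workload is substantially harder to serve.
    println!("hot-set share: {:.2}", zipf_top_share(1_000_000, 0.2));
}
```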

<p>We ran each test for 5 minutes to appropriately measure the steady-state performance of the drive.</p>

<p>Random access is the worst-case scenario for blockchains, so throughput under power-law distributions is expected to be higher.</p>

<p><strong>Random Transfers: 0% fresh (updates only)</strong></p>

<table>
  <tbody>
    <tr>
      <td> </td>
      <td>67M items</td>
      <td>128M items</td>
      <td>1B items</td>
    </tr>
    <tr>
      <td>Corsair MP600 Pro LPX (4TB)</td>
      <td>55.7k TPS</td>
      <td>49.8k TPS</td>
      <td>27.1k TPS</td>
    </tr>
    <tr>
      <td>Samsung 990 Pro (2TB)</td>
      <td>37.8k TPS</td>
      <td>33.6k TPS</td>
      <td>22.7k TPS</td>
    </tr>
  </tbody>
</table>

<p><strong>Random Transfers: 50% fresh</strong></p>

<table>
  <tbody>
    <tr>
      <td> </td>
      <td>67M items</td>
      <td>128M items</td>
      <td>1B items</td>
    </tr>
    <tr>
      <td>Corsair MP600 Pro LPX (4TB)</td>
      <td>55.3k TPS</td>
      <td>48.1k TPS</td>
      <td>21.6k TPS</td>
    </tr>
    <tr>
      <td>Samsung 990 Pro (2TB)</td>
      <td>35.1k TPS</td>
      <td>31.2k TPS</td>
      <td>20.9k TPS</td>
    </tr>
  </tbody>
</table>

<p><strong>Random Transfers: 100% fresh</strong></p>

<table>
  <tbody>
    <tr>
      <td> </td>
      <td>67M items</td>
      <td>128M items</td>
      <td>1B items</td>
    </tr>
    <tr>
      <td>Corsair MP600 Pro LPX (4TB)</td>
      <td>53.0k TPS</td>
      <td>42.1k TPS</td>
      <td>20.1k TPS</td>
    </tr>
    <tr>
      <td>Samsung 990 Pro (2TB)</td>
      <td>33.9k TPS</td>
      <td>29.0k TPS</td>
      <td>18.9k TPS</td>
    </tr>
  </tbody>
</table>]]></content><author><name>Robert Habermeier</name></author><summary type="html"><![CDATA[A rapid merkle trie state database: Solana-like throughput on hardware less than $1200]]></summary></entry><entry><title type="html">Proposing the Plaza: A Batteries-Included, Scalable Polkadot System Chain</title><link href="https://rob.tech/blog/plaza/" rel="alternate" type="text/html" title="Proposing the Plaza: A Batteries-Included, Scalable Polkadot System Chain" /><published>2024-06-17T00:00:00+00:00</published><updated>2024-06-17T00:00:00+00:00</updated><id>https://rob.tech/blog/plaza</id><content type="html" xml:base="https://rob.tech/blog/plaza/"><![CDATA[<p><img src="/assets/images/plaza.png" alt="" /></p>

<p>Today, I’d like to present a plan for Polkadot to consolidate functionality into a single 
highly-scalable system chain as a hub for users, developers, liquidity, and apps. 
This system chain would be an evolution of the current AssetHub chain, codenamed “Plaza”, 
which already has wallet, bridging, and tooling integrations that can be preserved to build momentum.</p>

<p>I am proposing this as an individual participant in Polkadot governance and this post contains
only personal opinions.</p>

<p>Concretely, the proposal is to consolidate all the following features into a single system parachain,
to be evolved from AssetHub and brought up to the maximum possible scale:</p>
<ol>
  <li>Asset Issuance [Currently on AssetHub]</li>
  <li>Smart Contracts (Rust and EVM via RISC-V/PolkaVM) [Proposed in <a href="https://polkadot.polkassembly.io/referenda/885">Referendum 885</a>]</li>
  <li>Staking [Currently on the Relay Chain]</li>
  <li>Bridging Pallets [Currently on the BridgeHub parachain]</li>
  <li>Near-zero fees (until scaling limits are reached)</li>
</ol>

<p>I refer to this chain with the codename “Plaza” from here on out.</p>

<p>The current trend in Polkadot is to split these functionalities across several different chains,
and for wallets, applications, and users to coordinate their activities across them. This
approach has come with real costs, without a driving scaling pressure to merit this level
of fragmentation.</p>

<p>Essentially, we should be strategic and concentrate hard on usability now that we’ve
laid the groundwork for scalability. There are two ways to do this. Option 1 is the status quo:
we spend a lot of money and time to build the best asynchronous composability framework possible and
hope this is easy enough to compete with synchronous systems once it is done. Option 2 is to focus 
scaling resources and energy on building a synchronous system which we know will have scaling 
limits and let organic scaling pressures lead the way once those limits are saturated. This post is
focused on “Option 2”.</p>

<p>Polkadot will soon have <a href="https://wiki.polkadot.network/docs/learn-elastic-scaling">Elastic Scaling</a>, where a single chain will be able to use
a large number of cores to process transactions - logically sequential, synchronously composable,
and validated in parallel across many cores. I believe we should use this to our advantage to create 
a “hub” to focus UX, integration, and developer efforts on, and we should lay the groundwork for 
this today. We can scale this chain to thousands of TPS today and much further over time. The
scaling limits of a single chain are going to be saturated, very conservatively, with tens of 
millions of daily users.</p>

<p>Smart contract support is absolutely crucial. Assets, staking, bridging, and apps benefit from
generalizable programmability. This is lacking in the current Polkadot landscape, with smart
contracts often in different chains altogether from the assets or systems they aim to interact with. 
We can support RISC-V smart contracts via <a href="https://forum.polkadot.network/t/announcing-polkavm-a-new-risc-v-based-vm-for-smart-contracts-and-possibly-more/3811">PolkaVM</a>, and with it, gain support for new 
languages like ink! as well as interpreted EVM.</p>

<p>Here are the specific difficulties the Plaza plan can address. 
For users, interacting with multiple chains is more complex,
requiring them to juggle assets, accounts, and state across several different chains. Wallet and
frontend developers have taken on large amounts of work to make this easier, but the experience is
still less than perfect. For developers, the time and money costs of building a chain, coordinating a collator 
set, handling block explorers, indexing, bridging, and exchange integrations add a large amount of
overhead against building the products they wish to bring to the world.</p>

<p>These costs are worthwhile and even necessary once the scaling limits of a single chain have been 
reached, but are a poor trade-off until that point. While many projects do benefit from having their
own chain, the long tail of developers and users are both better served by smart 
contract platforms. Polkadot has enough cores to support chain builders and smart contract builders
together.</p>

<p>We should work together as an ecosystem to bring a batteries-included single chain up to its point 
of bursting and only then relieve the pressure by spinning out apps, users, and system functionality.
Polkadot has the cores to support all of this to the level of scale the world needs. 
The city needs to grow outwards from the center, and that center should be the Plaza.</p>

<p>Although not included in the list of core functionality, there is a case for integrating Polkadot’s 
identity and governance functionality into the Plaza over time. 
Tight integration between smart contracts, assets, identity, and governance enables powerful 
scripting and automation primitives that enhance these systems and broaden their usage.</p>

<p>With engineering advances like <a href="../introducing-nomt">NOMT</a>, Optimistic Concurrency Control, and ZK Merkle Proving,
we can build this “plaza” chain to support hundreds of millions of transactions per day over time.
I am not talking about using supercomputer sequencers, but just normal machines with good
software engineering. Accessibility for full nodes to join the network should remain a priority 
and we don’t need to compromise on this.</p>

<p>Another element of the Plaza is the potential for value accrual to DOT through transaction ordering
priority fees. I <a href="https://x.com/rphmeier/status/1797339044893917397">have written on twitter</a>
recently that I don’t believe trying to <em>sell all the blockspace</em> is the best strategy, particularly 
because <a href="../coprocessor-competition">the price of cryptoeconomic compute is bounded-above by the cost of ZK</a>
and because Polkadot has been so successful in scaling raw compute. However, not all blockspace
is created equal. Oftentimes, having the first transaction in a block is valuable in itself, even 
when blocks are nowhere near full. <strong>This is important: even when blockspace is abundant, the first 
transaction in a batch is a scarce resource that people will pay for.</strong> To take advantage of this,
we need synchronous composability and programmability.</p>

<p>One example of this phenomenon is in market making:
when the price of an asset moves between blocks, the first transaction often has access to a “free”
arbitrage. With high concentrations of liquidity, this arbitrage can be quite large and
being the first user to make that trade in each block is a good worth paying for. 
This race to be first is always the case in financial markets, and presents a viable opportunity 
for token value accrual, for example, by burning priority fees in part. The story I present here is
one where the median fee is near zero but the mean fee is enough to cover the cost of blockspace.</p>
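<p>A toy constant-product pool makes this concrete. The numbers below are hypothetical and the model is the textbook x·y = k curve, not any particular DEX: when the external price gaps between blocks, the first trade to rebalance the pool pockets the difference, and every later trade in the block finds the gap already closed.</p>

```rust
// Toy constant-product (x * y = k) pool with hypothetical numbers -- not any
// particular DEX. When the external price gaps between blocks, the first
// trade that rebalances the pool captures the difference.

/// Profit, measured in Y at the external price, from arbitraging the pool
/// until its marginal price y/x matches `external_price` (Y per X).
fn arbitrage_profit(pool_x: f64, pool_y: f64, external_price: f64) -> f64 {
    let k = pool_x * pool_y;
    let x_new = (k / external_price).sqrt(); // post-trade reserves where y/x = p
    let y_new = (k * external_price).sqrt();
    (pool_x - x_new) * external_price + (pool_y - y_new)
}

fn main() {
    // Pool sits at 100 Y per X; the external market gaps 10% to 110.
    let profit = arbitrage_profit(1_000.0, 100_000.0, 110.0);
    println!("first-mover profit: {profit:.1} Y"); // ~238 Y
    // With no price gap, there is nothing left for the second mover.
    assert!(arbitrage_profit(1_000.0, 100_000.0, 100.0).abs() < 1e-9);
}
```

The first mover captures the whole gap; the second mover, transacting against the already-rebalanced pool, captures nothing — which is why ordering, not raw blockspace, is the scarce resource here.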

<p>Many of the things I’ve discussed are already being implemented. 
For example, there is already a proposal underway
to reduce existential deposits on AssetHub <a href="https://polkadot.subsquare.io/referenda/857">here</a>, and
there are discussions about further reductions in fees.</p>

<p>A proposal for PolkaVM/Risc-V contract execution on AssetHub is being voted 
on <a href="https://polkadot.polkassembly.io/referenda/885">here</a> and is ready to be developed.</p>

<p>This proposal is about putting a larger story behind these so-far uncoordinated actions, adding
additional changes, and coordinating the ecosystem behind this direction.</p>

<p>Whether this plan is eventually put in motion will depend on the results of a 
Polkadot Wish-For-Change Referendum to approve the Plaza plan, which includes roughly 
the following goals:</p>
<ul>
  <li>To change the name of AssetHub to something that reflects a broader purpose as an ecosystem hub.</li>
  <li>To focus on scaling “Plaza” as far as possible using Polkadot cores and without compromising on
full node capabilities.</li>
  <li>To make Plaza a key focus of marketing, DevRel, and developer education programs.</li>
  <li>To add contract/scripting capabilities to “Plaza”.</li>
  <li>To reduce the fees on “Plaza” to the minimum possible sustainable quantity.</li>
  <li>To introduce a priority-fee mechanism to “Plaza” as a route to value capture.</li>
</ul>

<p>Implementing some of these goals will require follow-up referendums. Some of them will require 
fellowship RFCs and technical planning. Some will just require small pull requests.</p>

<p>This post is a precursor to a formal proposal to Polkadot Governance, 
and is an invitation for discussion, debate, and collaboration, as well as a call to action.</p>

<p>My vision here is that this will be the New York City, Dubai, London, or Shenzhen of the Polkadot
continent, the first megacity of many and a precursor to greater expansion. We can implement this
plan eagerly, knowing that when the Plaza is saturated Polkadot (or JAM) has the raw validated 
compute power needed to handle that expansion. Here’s to doing things that don’t scale, but scale
far enough to get us to the next era.</p>]]></content><author><name>Robert Habermeier</name></author><summary type="html"><![CDATA[A strategy for Polkadot to consolidate until scaling limits are reached.]]></summary></entry><entry><title type="html">Introducing NOMT</title><link href="https://rob.tech/blog/introducing-nomt/" rel="alternate" type="text/html" title="Introducing NOMT" /><published>2024-05-19T00:00:00+00:00</published><updated>2024-05-19T00:00:00+00:00</updated><id>https://rob.tech/blog/introducing-nomt</id><content type="html" xml:base="https://rob.tech/blog/introducing-nomt/"><![CDATA[<p>Today, <a href="https://thrum.dev">we (Thrum)</a> are introducing our first Proof-of-Concept of NOMT, the Nearly-Optimal Merkle Trie
Database. NOMT is inspired by this <a href="https://sovereign.mirror.xyz/jfx_cJ_15saejG9ZuQWjnGnG-NfahbazQH98i1J3NN8">blog post by Preston Evans</a> of
Sovereign Labs, and we are targeting Sovereign-SDK as the first user. Sovereign Labs has funded
the implementation work with a grant.</p>

<p>NOMT is a permissively-licensed, single-state, merklized key-value database targeted at modern SSDs, 
optimized for fast read access, fast witnessing, and fast updating. NOMT supports efficient merkle 
multi-proofs of reads and updates. It is intended to be embedded into blockchain node software and 
is unopinionated on data format. Our vision for NOMT is for it to become the primary state database 
driver for performant blockchain nodes everywhere.</p>

<p>Against a database of 2^27 (~134M) accounts and a single-threaded execution engine, NOMT can read, 
modify, commit, prove, and write an update of 50,000 accounts in 1151 milliseconds, with a peak SSD 
read of 2.42GB/s and a peak SSD write of 822MB/s. The actual trie commit, prove, and write operation 
takes only 431ms, so the main bottleneck is reading state during execution, not merklization. See the last section for more information 
on our benchmarking methodology, as well as comparisons to other databases.</p>

<p>Code is available at <a href="https://github.com/thrumdev/nomt">https://github.com/thrumdev/nomt</a> and is written in Rust.</p>

<p>This is only our first proof-of-concept, after beginning the project 2 months ago. We intend to
optimize this much further over the next 6 months.</p>

<h2 id="why-optimize-this">Why Optimize This?</h2>

<p>We optimize state access and merklization because they are currently the number-one bottleneck in
scaling blockchains.</p>

<p>Block execution and committing are disk I/O bound: reading and writing information to
the disk. Whether it’s for a node proposing a block, validating it for consensus, or simply
receiving and executing it, transactions may read from unpredictable accounts or contract data 
and this will need to be fetched.</p>

<p>Reading from RAM takes a few hundred nanoseconds. Reading from an SSD can take a few hundred 
microseconds. Frequently-accessed data can be cached, but not all the data.</p>
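<p>The gap is stark enough that even a very good cache cannot hide it. A back-of-the-envelope calculation, with illustrative figures of 200 nanoseconds for RAM and 200 microseconds for an SSD read, shows that even a 99% cache hit rate leaves the average access an order of magnitude slower than RAM:</p>

```rust
// Expected access latency for a cache in front of an SSD-backed state store.
// The 200ns / 200,000ns figures are illustrative, not measurements.
fn mean_access_ns(hit_rate: f64, ram_ns: f64, ssd_ns: f64) -> f64 {
    hit_rate * ram_ns + (1.0 - hit_rate) * ssd_ns
}

fn main() {
    for hit_rate in [0.90, 0.99, 0.999] {
        let mean = mean_access_ns(hit_rate, 200.0, 200_000.0);
        // Even at 99% hits, the 1% of misses dominate the mean.
        println!("{:>5.1}% hits -> {:>8.0} ns mean access", hit_rate * 100.0, mean);
    }
}
```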

<p>Disk I/O overheads dwarf the cost of actually executing transactions.
These transactions need state data: account balances, smart contract state, smart contract code, or
their analogues in non-smart-contract based systems. While the most frequently used bits of state
are usually cached in a node’s RAM, large state databases with billions of entries are simply too 
large to fit into RAM. Execution pauses while waiting on this state data.</p>

<p>Disk I/O is compounded by the need for <strong>state merklization</strong>: organizing the blockchain’s state
into a <strong>merkle tree</strong>, which allows light clients (including bridges and zero-knowledge circuits)
to efficiently inspect the value of any particular piece of state based only on a tiny root
commitment or verify that a series of updates to one state leads to a new root commitment. Not all 
blockchain systems use a merkle trie to store state. Bitcoin and Solana are two
notable examples of systems which don’t. In our opinion, merklizing the state of large smart
contract systems is critical for enabling global, permissionless access to blockchain systems.</p>
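<p>For the unfamiliar, here is a minimal sketch of the light-client side of this: recompute the root from a leaf and its sibling path, then compare against the trusted commitment. The standard library’s non-cryptographic hasher stands in for a real cryptographic hash (such as blake3 or keccak) purely for illustration:</p>

```rust
// A sketch of light-client merkle verification: recompute the root from a
// leaf and its sibling path, then compare against the trusted commitment.
// std's DefaultHasher stands in for a cryptographic hash, illustration only.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn h(data: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// `path_bits[i]` is true when the running node is the *right* child at that
/// level (deepest level first).
fn verify(leaf: u64, siblings: &[u64], path_bits: &[bool], root: u64) -> bool {
    let mut acc = leaf;
    for (sib, is_right) in siblings.iter().zip(path_bits) {
        acc = if *is_right { h(&[*sib, acc]) } else { h(&[acc, *sib]) };
    }
    acc == root
}

fn main() {
    // Four-leaf tree: root = h(h(l0, l1), h(l2, l3)).
    let l: Vec<u64> = (1u64..=4).map(|i| h(&[i])).collect();
    let (n01, n23) = (h(&[l[0], l[1]]), h(&[l[2], l[3]]));
    let root = h(&[n01, n23]);
    // Prove l2: left child at the bottom level, right subtree at the top.
    assert!(verify(l[2], &[l[3], n01], &[false, true], root));
    assert!(!verify(l[0], &[l[3], n01], &[false, true], root));
    println!("proof verified");
}
```

A multi-proof generalizes this by sharing sibling nodes across many leaves, which is what makes batched witnessing cheap.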

<p>NOMT in particular is solving for disk I/O and computation as it relates to state merklization,
while being friendly to the critical path of block authorship.</p>

<h2 id="the-solutions">The Solutions</h2>

<p>NOMT aims to prove that blockchains don’t need to sacrifice state verifiability for scaling.</p>

<p>NOMT has been built according to the following design goals:</p>

<ol>
  <li><strong>Read from flat key-value storage during block execution</strong>. Most modern blockchain architectures
 are bottlenecked on the speed of block execution, and in particular, block building. Traversing
 a merkle trie data structure during this stage will lead to unnecessary latency and reduce
 scaling potential. Keeping a flat store of all key-value pairs makes block execution fast.</li>
  <li><strong>Pre-fetch merkle trie data aggressively</strong>. Modern SSDs are particularly good at parallel 
 fetches. We optimize for having as many parallel, in-flight requests for merkle trie data as 
 possible at any given time, even going as far as pre-fetching trie data based on hints from
 the user’s reads and writes before block execution has concluded.</li>
  <li><strong>Update the trie and generate witnesses while waiting on the SSD</strong>. Reading from an SSD necessarily 
 incurs some  latency. We keep the CPU busy whenever we are waiting on data, so updates are
 “free” and wedged between SSD read waits.</li>
  <li><strong>Optimize merkle trie storage for SSD page size</strong>. This is the main innovation from Preston’s
 post. SSDs keep data in “pages” of 4KB, and reading a single byte from a page requires loading
 the whole page. It’s better to pack your data into whole pages and minimize the number of pages
 which need to be read from the SSD. See Preston’s post for some more information on this.</li>
</ol>
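<p>To illustrate the fourth point, here is one plausible layout, a sketch in the spirit of Preston’s post rather than NOMT’s actual on-disk format: a 4096-byte page holds a depth-6 binary subtrie of 126 32-byte nodes (2 + 4 + 8 + 16 + 32 + 64 = 126, and 126 × 32 = 4032 bytes), so a single page fetch serves six levels of trie traversal:</p>

```rust
// Sketch: pack a depth-6 binary subtrie into one 4 KiB SSD page, indexed by
// implicit (heap-style) numbering. Level d (1..=6) starts at offset 2^d - 2.
const NODE_SIZE: usize = 32;
const NODES_PER_PAGE: usize = 126; // 2 + 4 + 8 + 16 + 32 + 64

#[allow(dead_code)]
struct Page {
    nodes: [[u8; NODE_SIZE]; NODES_PER_PAGE], // 4032 bytes; 64 spare for metadata
}

/// Slot of the node reached by `path` (key bits, most significant first)
/// within the page's subtrie.
fn slot(path: &[bool]) -> usize {
    assert!((1..=6).contains(&path.len()));
    let value = path.iter().fold(0, |acc, &b| (acc << 1) | b as usize);
    (1 << path.len()) - 2 + value
}

fn main() {
    // Walking 6 key bits touches 6 nodes -- all resident in the same page,
    // so one SSD read replaces six dependent reads.
    let bits = [true, false, true, true, false, true];
    for d in 1..=6 {
        println!("depth {d}: slot {}", slot(&bits[..d]));
    }
}
```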

<p>NOMT is currently built on top of RocksDB as the storage backend for trie data. RocksDB is an embedded
key-value store from Facebook, which is widely used in blockchain node implementations. In our next 
milestones, we plan to implement a customized database backend to further improve the performance of 
NOMT.</p>

<p>The theoretical maximum performance of NOMT will be achieved when we can max out I/O Operations Per 
Second (IOPS) on a consumer SSD. Despite our initially strong benchmarks, there is still quite a 
lot of room for improvement.</p>

<h2 id="benchmarking">Benchmarking</h2>

<p>We performed a cursory benchmark of NOMT, sov-db (from the Sovereign SDK), and sp-trie (from Substrate).</p>

<p>The benchmarks were performed on a machine with these specs:</p>
<ul>
  <li>AMD Ryzen 7950X 16-Core Processor</li>
  <li>64GB RAM</li>
  <li>Corsair MP600 Pro LPX 4TB SSD (1M IOPS)</li>
  <li>Linux 6.1.0-17</li>
  <li>Ext4 Filesystem</li>
</ul>

<p>Our backends:</p>
<ul>
  <li>NOMT: <code class="language-plaintext highlighter-rouge">f97d3c418</code> (recent master branch)</li>
  <li>sp-trie: 32.0.0</li>
  <li>sov-db: <code class="language-plaintext highlighter-rouge">2cc0656df</code> (<a href="https://github.com/sovereign-labs/sovereign-sdk">https://github.com/sovereign-labs/sovereign-sdk</a>)</li>
</ul>

<p>We created one database of 2^27 accounts for each of our test targets. Each account is a 256-bit key
with a 64-bit “balance” value.</p>

<p>The benchmark performed 1,000,000 “transfer” operations. In each operation two deterministically 
pseudo-random accounts are selected, and the balance of one account is read and incremented while 
the other is read and decremented. In 25% of operations, the incremented account is fresh.</p>
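<p>In code terms, each operation looks roughly like the following self-contained sketch over a plain HashMap standing in for the database. The real harness drives NOMT, sp-trie, and sov-db, and the xorshift PRNG here is an arbitrary stand-in, chosen only so that every run is deterministic, mirroring the fixed-seed methodology:</p>

```rust
// A sketch of the "transfer" operation over a HashMap standing in for the
// database; the real benchmark drives NOMT, sp-trie, and sov-db instead.
use std::collections::HashMap;

/// Minimal deterministic PRNG (xorshift64), an arbitrary stand-in.
struct Rng(u64);
impl Rng {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

fn run_transfers(db: &mut HashMap<u64, u64>, n_accounts: u64, ops: u64, fresh_pct: u64) {
    let mut rng = Rng(0xdead_beef);
    let mut next_fresh = n_accounts;
    for i in 0..ops {
        // Read and decrement a pseudo-random existing account...
        let from = rng.next() % n_accounts;
        *db.get_mut(&from).expect("seeded") -= 1;
        // ...and read and increment the target, fresh in `fresh_pct`% of ops.
        let to = if i % 100 < fresh_pct {
            next_fresh += 1;
            next_fresh - 1
        } else {
            rng.next() % n_accounts
        };
        *db.entry(to).or_insert(0) += 1;
    }
}

fn main() {
    // 1,000 accounts seeded with a balance of 1,000 each; 10,000 ops, 25% fresh.
    let mut db: HashMap<u64, u64> = (0..1_000).map(|k| (k, 1_000)).collect();
    run_transfers(&mut db, 1_000, 10_000, 25);
    // Each op moves exactly one unit, so the total balance is conserved.
    assert_eq!(db.values().sum::<u64>(), 1_000_000);
    println!("accounts after run: {}", db.len()); // 1,000 seeded + 2,500 fresh
}
```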

<p>In two key ways, this is a “worst-case” benchmark: most blockchain usage follows a power-law 
distribution, where something like 80% of the state reads are on 20% of the state. Fully random
accesses are the absolute worst-case for a blockchain, but are needed to serve the long tail of 
global usage rather than the fat tail of power usage. Furthermore, generating merkle proofs 
including all read data adds additional overhead which could be skipped on a normal validating 
full node.</p>

<p>All benchmarks used the same random seed, and operations were split into 40 batches each of 25,000
operations - a “workload”. The OS page cache was cleared before each run. We recorded peak I/O usage 
with iotop at a sampling frequency of 10Hz to collect our I/O stats data.</p>

<p>The backends each committed the changes to the database and generated a merkle proof necessary to
prove all changes.</p>

<p>We can observe that in all benchmark runs, the amount of read throughput falls dramatically after
a short peak early on, and then continues to slowly fall. This is because the Linux page cache is
beginning to cache most of the commonly-read pages.</p>

<p>Without further ado, here are our results:</p>

<h3 id="nomt-64-reader-threads">NOMT (64 reader threads)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>22:17:28
metrics
  page requests         37753709
  page disk fetch mean  232 us
nomt
  mean workload: 1151 ms
  mean read: 13159 ns
  mean commit_and_prove: 431 ms
22:18:18
</code></pre></div></div>

<p>Note: NOMT has been instrumented with additional metrics, so its output includes extra data.</p>

<figure>
    <img src="/assets/images/introducing_nomt/io_nomt_64.png" />
</figure>

<h3 id="sp-trie-substrate">sp-trie (Substrate)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>22:39:28
sp-trie
  mean workload: 12530 ms
  mean read: 212 us
  mean commit_and_prove: 874 ms
22:47:59
</code></pre></div></div>

<figure>
    <img src="/assets/images/introducing_nomt/io_sp_trie.png" />
</figure>

<h3 id="sov-db-sovereign-sdk">sov-db (Sovereign SDK)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>22:48:28
sov-db
  mean workload: 17075 ms
  mean read: 17754 ns
  mean commit_and_prove: 16144 ms
22:59:52
</code></pre></div></div>

<p>sov-db is based off of the Jellyfish Merkle Trie implementation from Diem/Aptos and inherits quite a 
lot of their code, with some modifications by Penumbra and Sovereign.</p>

<figure>
    <img src="/assets/images/introducing_nomt/io_sov_db.png" />
</figure>

<h2 id="interpreting-the-results">Interpreting the Results</h2>

<p>NOMT is the clear winner, but still has quite a lot of headroom. Note that the 64 reader threads
are mostly idle and are a result of using synchronous disk I/O APIs - using more threads enables us 
to saturate the SSD’s I/O queue, but the same effect could be achieved with fewer threads and 
an async I/O backend like <code class="language-plaintext highlighter-rouge">io_uring</code> or <code class="language-plaintext highlighter-rouge">io_submit</code>.</p>

<p>sp-trie has an extremely low random read speed, because it traverses the merkle trie on each read.
This is exactly why we have chosen to keep a flat store for key-value pairs, so as not to block
execution on trie traversals.</p>

<p>sov-db has a flat store like NOMT, but has much more data due to being an archival database, and 
keeps the flat values in the same RocksDB “column” as the trie nodes. This leads to slower reads.</p>

<p>NOMT’s peak read of 2415MiB/s is equal to the value given by a <code class="language-plaintext highlighter-rouge">fio</code> run on the same machine and corresponds to 618k IOPS:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; fio --name=fiotest --numjobs=16 --filename=test3 --size=35Gb --rw=randread --bs=4Kb
 --direct=1 --ioengine=io_uring --iodepth=16 --startdelay=5 --group_reporting --runtime=30

fio-3.33
Jobs: 16 (f=16): [r(16)][100.0%][r=2428MiB/s][r=621k IOPS][eta 00m:00s]
fiotest: (groupid=0, jobs=16): err= 0: pid=3382952: Sun May 19 14:35:18 2024
  read: IOPS=618k, BW=2415MiB/s (2533MB/s)(70.8GiB/30002msec)
    slat (nsec): min=751, max=201767, avg=1267.80, stdev=378.84
    clat (usec): min=14, max=9398, avg=826.52, stdev=456.98
     lat (usec): min=15, max=9399, avg=827.79, stdev=456.99
</code></pre></div></div>

<p>Notably, all databases we benchmarked have a large read spike at the beginning. This is likely
an artifact of RocksDB.</p>

<p>NOMT’s typical disk read throughput is around 300MiB/s, corresponding to only ~75k IOPS. This is
still far short of the 1M IOPS advertised by the manufacturer. Our goal is to utilize
all of these IOPS with a custom disk backend.</p>
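<p>As a sanity check, the conversion between sequential throughput figures and IOPS at the 4 KiB page size is straightforward:</p>

```rust
// Convert MiB/s of 4 KiB random reads into I/O operations per second.
fn iops(mib_per_s: u64) -> u64 {
    mib_per_s * 1024 * 1024 / 4096
}

fn main() {
    println!("peak    2415 MiB/s -> {} IOPS", iops(2415)); // ~618k, matching fio
    println!("typical  300 MiB/s -> {} IOPS", iops(300));  // the ~75k steady-state figure
}
```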

<p>In any case, we are reaching a point where state merklization is not the main bottleneck, and state 
read performance is - meaning that nodes using NOMT could reach throughput levels in the same
neighborhood as Solana without compromising on state merklization or greatly 
increasing hardware requirements.</p>

<p>Optimizing state read (as opposed to merkle trie read) performance is somewhat beyond the scope of 
NOMT. Using multiple threads rather than a single one to parallelize state reads would likely yield 
better performance. In a blockchain setting, serializable execution and optimistic concurrency 
control may be used to achieve this.</p>

<p>Watch this space as we max out SSD throughput with NOMT.</p>]]></content><author><name>Robert Habermeier</name></author><summary type="html"><![CDATA[The Nearly-Optimal Merkle Trie]]></summary></entry><entry><title type="html">Coprocessor Market Structure: Cryptoeconomic vs ZK</title><link href="https://rob.tech/blog/coprocessor-competition/" rel="alternate" type="text/html" title="Coprocessor Market Structure: Cryptoeconomic vs ZK" /><published>2024-03-04T00:00:00+00:00</published><updated>2024-03-04T00:00:00+00:00</updated><id>https://rob.tech/blog/coprocessor-competition</id><content type="html" xml:base="https://rob.tech/blog/coprocessor-competition/"><![CDATA[<figure>
    <img src="/assets/images/proofs_vs_incentives/price-elasticity.png" />
    <figcaption>Cryptoeconomic coprocessors cost less but are more price elastic than cryptographic ones.</figcaption>
</figure>

<p>Last night I had the opportunity to speak at <a href="https://www.nebra.one/">NEBRA</a>’s “Proof Singularity” event as part of a friendly “heated debate” on the relative merits of programmable cryptography and programmable incentive systems - in the context of scaling blockchains. We spoke, in effect if not in name, about coprocessors, and what the resounding effects of ZK on validated computation would be. The panel was recorded, and while I recommend watching the video once it’s live, I’ll also lay out my predictions of the market structure of coprocessors in more detail here.</p>

<p>Our discussion is largely about <strong>coprocessors</strong>, which are a category of systems that can perform computations off-chain while proving or otherwise guaranteeing their results on-chain. A ZK coprocessor is one which uses Zero-Knowledge techniques to produce succinctly verifiable cryptographic proofs of computations. A cryptoeconomic coprocessor is one which uses value-at-stake and an interactive protocol to provide strong guarantees of a computation’s correctness. For a deeper primer on cryptoeconomic coprocessors, I recommend a look into <a href="https://blog.polytope.technology/cryptoeconomic-coprocessors">this article from Polytope Labs</a>. The challenge I have given myself here is this: <strong>I will make extremely strong assumptions about the possible future of cryptographic coprocessors, and evaluate the future market structure for coprocessors on that basis</strong>.</p>

<p>I’ll insert a brief detour to address the point that ZK coprocessors and cryptoeconomic ones are not identical in capabilities. This is true. Despite that, they are substitutable for many use-cases. ZK coprocessors are able to handle large quantities of data without imposing additional overhead on the blockchain, which implies that cryptoeconomic coprocessors will be better for tasks that are relatively computation-heavy rather than data-heavy. ZK, notably, can also provide user privacy. On the other hand, cryptoeconomic coprocessors have access to hash functions, cryptography, data structures, memory layouts, and hardware advantages that ZK ones don’t. They’re not exactly the same, and so they are only partially-substitutable.</p>

<p>Let’s cut to the chase and turn to the key question of the article: <strong>what will the market structure for coprocessors look like once ZK coprocessors are in full swing?</strong> Based on an analysis of economic principles, I draw three conclusions. The first is that <strong>cryptoeconomic coprocessors will be able to sustainably undercut ZK coprocessors on price</strong>. The second is that <strong>there is an effective price ceiling for cryptoeconomic coprocessors</strong>. The third is that <strong>cryptoeconomic coprocessors can operate profitably in this environment</strong>.</p>

<p>I focus my analysis on two different properties of coprocessors: their cost and their price-elasticity. Cryptoeconomic coprocessors are lower cost, but their price is also much more sensitive to demand than ZK’s. The reason for this is that they rely on a set of specific validator nodes, a non-substitutable good. The lower cost allows cryptoeconomic coprocessors to undercut, and the price sensitivity implies that their ability to do so is bounded. The open question is how much demand it takes to saturate cryptoeconomic coprocessors.</p>

<p>Proof Aggregation is a technology so powerful that it seems to border on magic. To briefly recap, it enables a practically unlimited number of zero-knowledge proofs to be aggregated into a single small proof that can be verified extremely cheaply. When used in a blockchain context, this means that enormous amounts of computation can be proven and aggregated off-chain and a very small “receipt” of all of these computations can be posted to the blockchain to verify, while placing scarcely any load on the blockchain nodes themselves. Nifty.</p>

<p>But there are limits to what you can accomplish with proving alone. It’s worth taking a step back and anchoring this discussion in the main purposes of consensus systems. Consensus systems exist primarily for the ongoing ordering of events. The job of a consensus algorithm is to choose from many possible valid orderings of events. The most famous version of this property is the Double-Spend Problem. While it would be valid for me to send my coins to either Alice or Bob, I cannot send them to both. The consensus protocol must choose one of these competing futures if I try to double-spend. I’ll also note here that the utility of aggregation is somewhat undermined by the desire of block-builders to reorder transactions for profit and capital efficiency. As a consequence of this need for ordering, a pure ZK proponent might posit a future where a consensus protocol is used for nothing other than ordering aggregated cryptographic proofs.</p>

<p>The economics point to a more likely outcome being a balance between the two categories of coprocessors. The primary reason for that is the cost differential of the two. ZK Proving is expensive, and proving hardware costs money even if the burden on the chain is amortized to near-zero. We will focus on these hardware costs and charitably assume aggregation and verification are zero-cost. <a href="https://twitter.com/_weidai/status/1732436027388871100">Wei Dai (1KX Research) estimated a 6-order-of-magnitude difference in hardware cost</a> to prove something versus executing it. I’ve arrived at a similar factor when comparing the <a href="https://twitter.com/eduadiez/status/1623723409115938820">stated cost-per-gas of Polygon ZkEVM</a> with the raw hardware costs of validating computations on Polkadot. These costs may come down substantially, but even the most optimistic estimates assume at least one order of magnitude difference. All this to say: for the foreseeable future, there are <em>serious cost differentials</em>, and ZK’s costs are cryptoeconomic coprocessors’ potential profit margins.</p>

<p>I’ll write a bit about Polkadot here to establish where this cost differential lies, though Polkadot is not the focal point of this article. Polkadot provides a set of cryptoeconomic coprocessors we call cores. They can execute arbitrary WebAssembly, and soon RISC-V, code. Each core requires between 30-40 validators to execute each computation, a number which scales sub-linearly with the total number of validators. The hardware cost of processing a computation on Polkadot can be slightly overestimated as the cost of renting 30-40 decent single-CPU VPSs for 6 seconds.</p>

<figure>
    <img src="/assets/images/proofs_vs_incentives/cost-differential.png" />
    <figcaption>The cost difference between cryptoeconomic and cryptographic approaches.</figcaption>
</figure>

<p>Beyond those cost differentials, we must also look at price-elasticity: how demand affects the price of each option. ZK coprocessors have the edge here, because proving can be done on any GPU in the world - a veritable ocean of hardware. In contrast, a cryptoeconomic coprocessor relies on the hardware of specific validator nodes, and all demand for the coprocessor is proxy demand for those specific machines. Consensus systems relying on non-substitutable validators to validate blocks is a huge pain point: it makes those validators a potential bottleneck for the system, with fee spikes ensuing during periods of high demand. ZK coprocessors, on the other hand, can simply add more GPUs with near-zero price impact. This causes the ZK price/demand curve to look much like a flat line, while the cryptoeconomic price/demand curve slopes increasingly upward as demand grows.</p>

<figure>
    <img src="/assets/images/proofs_vs_incentives/price-equilibrium.png" />
    <figcaption>Cryptography sets a price ceiling for cryptoeconomic coprocessors: profits and market share are bounded. But by how much?</figcaption>
</figure>
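<p>To make the shape of these curves concrete, here is a toy numerical sketch in Rust. All constants and function names are illustrative assumptions of mine, not measurements: the ZK curve is modeled as flat, the cryptoeconomic curve as convex, and the equilibrium is simply the demand level at which they cross.</p>

```rust
// Toy price/demand curves; every constant here is an illustrative assumption.

// ZK proving draws on a near-unlimited pool of GPUs, so its unit price is
// modeled as a flat line regardless of demand.
fn zk_price(_demand: f64) -> f64 {
    100.0
}

// A cryptoeconomic coprocessor runs on a fixed set of validator machines, so
// its price is modeled as rising convexly as demand saturates that hardware.
fn cryptoeconomic_price(demand: f64) -> f64 {
    1.0 + 0.01 * demand * demand
}

// The soft price ceiling: cryptoeconomic coprocessors can undercut ZK only
// until their curve crosses the flat ZK line.
fn equilibrium_demand() -> f64 {
    let mut demand = 0.0;
    while cryptoeconomic_price(demand) < zk_price(demand) {
        demand += 0.1;
    }
    demand
}
```

<p>Below the crossover, the gap between the two curves is the cryptoeconomic coprocessor’s potential margin; above it, demand spills over to ZK.</p>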

<p>The effect that I expect to see is that <strong>ZK-proof aggregation establishes a (soft) price ceiling for cryptoeconomically validated computation</strong>. All else equal, why would someone want to pay <em>more</em> to validate their computation on a cryptoeconomic coprocessor than a cryptographic one? There are some reasons, like state locality and liquidity, that I’ll put aside for now. Despite this price ceiling, the potential profit margins for cryptoeconomically validated computation are very large. This implies three relevant questions for the future of cryptoeconomic coprocessors:</p>
<ol>
  <li>How well can cryptoeconomic coprocessors scale with more validator nodes? In the diagram, this corresponds to the slope of the green cryptoeconomic curve.</li>
  <li>How low can cryptoeconomic coprocessors keep their hardware costs? In the diagram, this is the y-intercept of the cryptoeconomic curve.</li>
  <li>How applicable is <a href="https://en.wikipedia.org/wiki/Jevons_paradox">Jevons’ paradox</a> to validated computation? This is the point on the x-axis where real demand will lie.</li>
</ol>

<p>The answer to the first question will tell us how sensitive to demand the price of cryptoeconomic validated computation is, and how much demand such systems can absorb. The second tells us how efficient ZK proving has to get before the cost differential disappears. And the third tells us how likely it is for abundant cheap computational power to be consumed by demand. This third is very important - one argument I anticipate to this post is that ZK coprocessors will simply be “cheap enough” and therefore users won’t seek alternatives. But if Jevons’ paradox is applicable, the volumes of validated computation consumed by users will be large enough that these margins do matter. We can only speculate. Personally, I see Solana as a point in favor of Jevons’ paradox as applied to validated computation.</p>

<figure>
    <img src="/assets/images/proofs_vs_incentives/profit-margin.png" />
    <figcaption>An illustration of the profit margins afforded to cryptoeconomic coprocessors. Note that there is currently a 6 order of magnitude difference in cost. Optimistically they may reach 1 order of magnitude at a minimum, but strong margins will still persist.</figcaption>
</figure>

<p>Beyond just undercutting ZK coprocessors, cryptoeconomic coprocessors can be profitable while doing so. We’ve established that cryptoeconomic coprocessors will only be able to undercut ZK up to some equilibrium point, where the two are approximately the same price. Assuming ZK compute is not sold below cost (although in practice it may be temporarily subsidized at large expense to capital), the cost differential between the two approaches is the potential profit margin for cryptoeconomic coprocessors. Even with the most charitable assumption that ZK will eventually get to within one order of magnitude of the cost of standard compute, strong profit margins will be sticky for cryptoeconomic coprocessors relative to ZK ones.</p>

<p>Here are a few disclaimers to round things out. First, I hold an emotional and financial stake in Polkadot, which uses cryptoeconomic coprocessor technology. Second, I don’t intend for this post to be read as somehow being “against” cryptographic approaches. They are extremely powerful and further the mission of providing private, scalable computation for every individual on the planet - a mission I am very much aligned with. All that said, I believe my mental model here is robust and makes the case for the stickiness of cryptoeconomic coprocessors quite clear, and I have written it with my best effort at impartiality.</p>

<p>Thanks to Pepyakin and Matti (ZeePrime) for review and proofreading.</p>]]></content><author><name>Robert Habermeier</name></author><summary type="html"><![CDATA[Cost differentials create opportunity]]></summary></entry><entry><title type="html">Nearly Optimal State Merklization (in Polkadot-SDK)</title><link href="https://rob.tech/blog/nearly-optimal-polkadot-merklization/" rel="alternate" type="text/html" title="Nearly Optimal State Merklization (in Polkadot-SDK)" /><published>2024-02-11T00:00:00+00:00</published><updated>2024-02-11T00:00:00+00:00</updated><id>https://rob.tech/blog/nearly-optimal-polkadot-merklization</id><content type="html" xml:base="https://rob.tech/blog/nearly-optimal-polkadot-merklization/"><![CDATA[<p>Recently, my friend and coworker <a href="https://pep.wtf">Sergei (Pepyakin)</a> sent me an article from Preston Evans on the subject of a more optimal Merkle Trie format designed to be efficient on SSDs. The original article is <a href="https://www.prestonevans.me/nearly-optimal-state-merklization/">here</a> and I highly recommend it as background reading to this post.</p>

<div class="flex justify-center">
  <blockquote class="twitter-tweet"><p lang="en" dir="ltr">Building state commitments is the biggest bottleneck for most blockchains today. Here at Sovereign, we’ve been working on a new design that should speed up state merklization by a factor of 10 or more.<br /><br />Why is this such a big deal?<br /><br />Creating a state commitment allows for… <a href="https://t.co/Ef1UGRs5YK">pic.twitter.com/Ef1UGRs5YK</a></p>&mdash; Sovereign (@sovereign_labs) <a href="https://twitter.com/sovereign_labs/status/1744768837982011472?ref_src=twsrc%5Etfw">January 9, 2024</a></blockquote>
  <script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> 
</div>

<p>The optimizations presented in the original post sparked a two-day conversation with Pep in which we discussed how this might be made to work with the <a href="https://github.com/paritytech/polkadot-sdk">Polkadot-SDK</a> as well. Polkadot-SDK, while it also uses a Merkle Trie to store state, was designed on a differing set of assumptions, and so the original approach would need to be adapted in order to be integrated. This post might be seen as a summary of our conversation, covering some history, some of the original optimizations, the differences in assumptions, and tweaks that may be made in order to maintain full backwards compatibility with the Polkadot-SDK. Some familiarity with <a href="https://en.wikipedia.org/wiki/Merkle_tree">Merkle Tries</a>, especially as they are used as blockchain state databases, will help in comprehending this article, but all are welcome to come along for the ride.</p>

<p>Preston’s proposed system, in a nutshell, is a new binary Merkle Trie format and database schema that is extremely low-overhead and amenable to SSDs with most (if not all) disk accesses being predictable with no other information beyond the key being queried. We’ll revisit more specifics later, though I highly recommend reading the original blog post for a high-fidelity explanation.</p>

<hr />

<p>But first, let’s cover <em>why</em> optimization of the Merkle Trie is so important. The open secret about scaling serial blockchain execution is that most of the time isn’t actually spent on executing transactions: it’s on reading and writing state data to and from disk, such as accounts, balances, and smart contract state. Merkle Tries in blockchains are not strictly necessary, but they provide a means of easily proving some of the state of the ledger to a light client. They are a key part of permissionless blockchain systems which seek to include light clients, not just full nodes, as full participants.</p>

<p>Traversing a Merkle Trie to load some particular piece of state is a log(n) operation in the number of items within the trie. Updates to the Merkle Trie are logarithmic as well. The implication is that a larger state, even if the majority of this state is dormant, makes reading and writing more expensive. This is the real reason why Ethereum hasn’t “just increased the gas limit” and why Polkadot has required deposits for all stored state: state growth is a huge problem, adds bloat to the Merkle Trie, and slows everything else down as a result. What makes Preston’s article so important is that it shows a way for us to maintain the wonderful advantages of merklized state while drastically reducing the overheads associated with previous implementations.</p>

<p>When it comes to the Polkadot-SDK, I see this design being far more useful to Parachains than the Polkadot Relay Chain itself. The Relay Chain has relatively little state, having offloaded most of its work onto System Parachains. For parachains, the benefits will come for two reasons. Reason one is that it’s more data-efficient, utilizing less of what we refer to in Polkadot-land as the Proof-of-Validity. Reason two is that (at the time of this writing), work on <a href="https://github.com/paritytech/polkadot-sdk/issues/1829">Elastic Scaling</a> is underway, and it will in theory bound the throughput of a parachain at the rate a single node is capable of processing transactions. I foresee a future for some parachains where the Merkle Trie will be a bottleneck.</p>

<hr />

<p>In 2016, the first significant project I worked on in the blockchain space was optimizing the Parity-Ethereum node’s implementation of Ethereum’s Merkle-Patricia Trie. At that time, blockchain node technology was a lot less sophisticated. We were in a friendly rivalry with the geth team, and one way we wanted to get a leg up on geth’s performance was by implementing batch writes, where we’d only compute all of the changes in the Merkle-Patricia trie once at the end of the block (actually, the end of each transaction - at that time, transaction receipts all carried an intermediate state root). The status quo, by contrast, was to apply updates to trie nodes individually as state changes occurred during transaction execution. This may be hard to believe for node developers of today, but hey, it was 2016, and it worked - and it gave our Ethereum nodes a significant boost in performance.</p>

<p>We’ve come a long way since 2016, but many of the inefficiencies of the 16-radix Merkle-Patricia trie used in Ethereum and the Polkadot-SDK still persist. They have minor differences in node encoding and formats, but function much the same. The radix of 16 was chosen because it reduced one of the biggest problems with traversing Merkle Tries: random accesses. Child nodes are referenced by their hash, and these hashes are randomly distributed. If your traversal algorithm is naive and you load each child node as you learn its hash, you end up breaking one of the first laws of computer program optimization, which is to maintain data locality. What’s even worse is breaking that law in disk access patterns. 16-radix Merkle-Patricia Tries alleviate this issue but definitely do not solve it. State Merkle Tries in Ethereum, Polkadot-SDK, and countless other protocols still work this way today, all with optimizations.</p>

<p>One of the other issues with the 16-radix Merkle-Patricia trie is that it’s very space inefficient. All else equal, the 16-trie is more efficient than its binary counterpart in the number of disk accesses that need to be made. Proving the access of a key involves sending the hashes of the up to 15 other children at every visited branch. Binary trie proofs only involve one sibling, so all the extras are additional overhead. In a world of light and stateless blockchain nodes where Merkle proofs need to be submitted over the network, sending all these extra sibling hashes is highly wasteful.</p>

<p>There have been advancements, such as the <a href="https://developers.diem.com/papers/jellyfish-merkle-tree/2021-01-14.pdf?ref=127.0.0.1">Jellyfish Merkle Trie</a> pioneered at Diem. To summarize briefly - Jellyfish is pretty clever, and allows the same trie to be represented either in a binary format (over the network), or in a 16-radix format (on disk). It has other optimizations which aim to replace common sequence patterns of nodes with very efficient representations to minimize the required steps in traversal, update, and proving. They also remark on an approach to storing trie nodes on disk which avoids much of the write amplification (read: overhead) of a RocksDB-based implementation. Jellyfish is definitely an improvement, but it also makes a key trade-off: the assumption that the keys used in the state trie have a fixed length.</p>

<p>We’ll address the subject of fixed-length keys in more detail later, but it’s the root of the differences between Preston’s assumptions and the ones we’ll be working with in this post. This difference in some sense is the real subject of this article. The Polkadot-SDK has taken an alternative path down the “tech tree” stemming from the Merkle-Patricia Trie of Ethereum. While Ethereum only ever uses fixed-length keys in its state trie, the underlying data structure actually supports arbitrary-length keys by virtue of the branch node carrying an optional value. Polkadot-SDK takes full advantage of this property and may actually be the only system to do so.</p>

<hr />

<p>It’s now time to visit the optimizations and differing assumptions, and modifications that might be made to apply these same optimizations (in spirit) to the Polkadot-SDK.</p>

<p>Preston’s post makes a few implicit assumptions which I would like to make explicit here.</p>

<ol>
  <li><strong>Keys have a fixed length</strong>. As mentioned above, this is a big one.</li>
  <li><strong>Stored keys are close to uniformly distributed across the key space</strong>. This is also important, as it implies that the state trie has a roughly uniform depth at all points and therefore guesses about how long a traversal will be are likely to be accurate across the whole state trie.</li>
  <li><strong>Only one version of the trie needs to be stored on disk</strong>. I view this assumption as an implication of Proof-of-Stake, where fast (if not single-slot) finality has become the norm. In particular, this assumption makes sense for Sovereign - they target Celestia as one of their main platforms, and Celestia has single-slot finality.</li>
</ol>

<p>Polkadot-SDK has made different decisions - in some cases, slightly different, in others, wildly different. Respectively:</p>

<ol>
  <li><strong>Keys do not have a fixed length</strong>. The storage API exposed to the business logic of a Polkadot-SDK chain is a simple mapping from arbitrary byte-strings to arbitrary byte-strings. Keys are not meant to be hashes, though they are allowed to be.</li>
  <li><strong>Keys often have long shared prefixes</strong>. Chains built with the Polkadot-SDK are comprised of <strong>modules</strong>. All storage from any particular module has a shared prefix - and all storage from any particular map within a particular module also has a shared prefix. The set of these shared prefixes is relatively small. More on why in a moment.</li>
  <li><strong>Only one version of the trie needs to be stored on disk, but finality is not instant</strong>. This is a small difference and it’s fairly trivially addressable with in-memory overlays for any unfinalized blocks, but does need to be handled.</li>
</ol>

<p>Keys not having a fixed length and having long shared prefixes is a huge difference! There is a good reason for this: not having keys be uniformly-distributed enables the tree to be <em>semantically iterable</em>. In Polkadot-SDK, you can iterate the entire storage for a module, or for a particular mapping within a module. This is a key part of what enables Polkadot-SDK chains to support trustless upgrades: you can deploy migrations that automatically iterate and migrate (or delete) entire swathes of storage without needing anyone to freeze the chain, generate a list of keys to alter or delete, and so on. State systems where the keys for related storage entries are dispersed throughout a forest of other, unrelated keys cannot provide any automated migration mechanisms. This is a very intentional product decision, but it makes Preston’s proposal incompatible in a couple of ways.</p>

<p><img src="/assets/images/merklization_diagrams/uniform_vs_shared_1.svg" alt="" /></p>

<p>Here we visualize the difference between a trie containing uniformly distributed keys versus a trie with long shared prefixes and variable-length keys.</p>

<hr />

<p>Without going into the details of the original approach yet (and I’d encourage having that article open as a reference), I’ll lay out two of the properties that are crucial to making it fast. The first is that nodes have <em>extremely compact</em> and <em>consistent</em> representations on disk. The second is that all of the information needed to update the trie can be loaded by simply fetching the data needed to query the changed keys.</p>

<p>Each node has a representation which occupies only 32 bytes: it’s a hash, with the first bit taken as a “domain separator” to indicate whether it’s a branch or a leaf. Nodes are stored in fixed-size groups that cover predictable parts of the key space as a means to optimize for SSDs. Nodes are stored in pages of 126 nodes, with 32 bytes for each node and 32 bytes for the page’s unique identifier, for a total size of 4064 bytes. This is just 32 bytes shy of 4096 bytes - many, though not all, SSDs work on 4096-byte pages, so this maps very well onto the physical layout of SSDs. Since the pages needed are predictable from the key itself, all of these pages can be pre-fetched from an SSD in parallel and then traversed. No hopping around.</p>

<p><img src="/assets/images/merklization_diagrams/page_1.svg" alt="" /></p>

<p>This diagram shows a scaled-down version of the page structure from the proposal.</p>
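<p>The layout can be sketched as a Rust struct (field and constant names are mine, not the proposal’s). Note that 126 = 2 + 4 + 8 + 16 + 32 + 64, so a page has room for exactly six full levels of a binary sub-trie.</p>

```rust
// Sketch of the fixed-size page layout described above; names are illustrative.
const NODE_SIZE: usize = 32;
const NODES_PER_PAGE: usize = 126; // 2 + 4 + 8 + 16 + 32 + 64: six binary-trie levels

#[repr(C)]
struct Page {
    page_id: [u8; 32],                        // the page's unique identifier
    nodes: [[u8; NODE_SIZE]; NODES_PER_PAGE], // 126 compact 32-byte nodes
}
```

<p>A quick size check: 32 + 126 × 32 = 4064 bytes, fitting within a 4096-byte SSD page with 32 bytes to spare.</p>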

<p>When keys are fixed-length, value-carrying nodes can never have children - they are always leaves. Therefore, to query a value stored under a key, you must load all the nodes leading up to that key. Due to the page structure, this also implies loading that node’s siblings, as well as all the sibling nodes along the path. Having the path and all the sibling nodes to a key, or set of keys, is all the information that is needed to update a binary Merkle trie.</p>
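<p>The claim in the last sentence can be illustrated with a minimal sketch. The hash function below is a toy stand-in (not a real cryptographic hash), and the function names are mine: given a changed leaf’s new hash, its position bits, and the sibling hash recorded at each level, the new root falls out by re-hashing upward.</p>

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for the trie's cryptographic hash function; illustration only.
fn combine(left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    let bytes = h.finish().to_le_bytes();
    let mut out = [0u8; 32];
    out[..8].copy_from_slice(&bytes);
    out
}

// Recompute the root after a leaf changes, given the leaf's new hash, its
// position bits from the root (false = left, true = right), and the sibling
// hash at each level, top-down. Walking bottom-up, each step hashes the
// running node together with the recorded sibling on the correct side.
fn recompute_root(mut node: [u8; 32], path: &[bool], siblings: &[[u8; 32]]) -> [u8; 32] {
    assert_eq!(path.len(), siblings.len());
    for (bit, sibling) in path.iter().rev().zip(siblings.iter().rev()) {
        node = if *bit {
            combine(sibling, &node) // we are the right child
        } else {
            combine(&node, sibling) // we are the left child
        };
    }
    node
}
```

<p>Extending this from one key to a set of changed keys only requires the union of their paths and siblings, which is exactly what querying those keys loads.</p>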

<p>One problem that arises in binary Merkle tries due to long shared prefixes is long traversals, assuming that your only two kinds of nodes are leaf nodes and branch nodes. Many Polkadot-SDK keys start with 256-bit shared prefixes - with these usage patterns, you’d traverse through 256 layers of the binary trie before even getting to a differentiated part. Luckily, Ethereum and Polkadot-SDK have already worked around this issue with an approach known as <strong>extension nodes</strong>. These nodes encode a long shared run of bits which all descendants of the node contain as part of their path. These work slightly differently in Ethereum and Polkadot-SDK, but achieve the same effect. Systems like Jellyfish have gotten rid of them entirely, because if your keys are uniformly distributed the odds of having long shared prefixes in a crowded trie are pretty small. But in the Polkadot-SDK model extension nodes still make sense.</p>

<p>With these properties to uphold, assumptions to relax, and usage patterns to support, we can finally arrive at a sketch of the solution. First, we will turn our variable-length keys into fixed-length keys with a <strong>uniform and logically large size</strong> with an <strong>efficient padding mechanism</strong>. Second, we will <strong>introduce extension nodes without substantially increasing disk accesses</strong>.</p>

<hr />

<h3 id="padding-bounded-length-keys-to-fixed-length-lookup-paths">Padding Bounded-Length Keys to Fixed-Length Lookup Paths</h3>

<p>Turning variable-length keys into fixed-length lookup paths in the general case is impossible. However, if we can assume that all keys we use are less than some fixed length, then this problem is tractable.</p>

<p>Storage keys in Polkadot-SDK are often longer than 256 bits. They are definitely less than 2^32 bits long, and in all likelihood always less than 2^12 bits long. Let’s assume some generic upper bound 2^N and further assume that all keys used have length at most 2^N - N bits.</p>

<p>Our goal is to create an efficient padding scheme that pads bit-strings that have length at most 2^N - N into unique bit-strings that are exactly length 2^N. We want to do this while preserving the initial key and only appending.</p>

<p>We can do this with the following algorithm:</p>
<ol>
  <li>Take the original key and append <code class="language-plaintext highlighter-rouge">0</code>s to it until its length is divisible by N</li>
  <li>Append the length of the original key represented as an N-bit number</li>
  <li>Append <code class="language-plaintext highlighter-rouge">0</code>s to it until its length equals 2^N</li>
</ol>

<p><img src="/assets/images/merklization_diagrams/padding_1.svg" alt="" /></p>

<p>The diagram shows the case where N=4. In practice a larger N should be used.</p>
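<p>The algorithm above can be sketched in Rust, representing bits as booleans and using N = 4 to match the diagram (a real implementation would pick a larger N and a packed bit representation):</p>

```rust
// Sketch of the padding scheme, with bits as booleans and N = 4 as in the diagram.
const N: u32 = 4;

fn pad(key: &[bool]) -> Vec<bool> {
    // Keys must fit in 2^N - N bits so the length marker always has room.
    assert!(key.len() <= (1usize << N) - N as usize);
    let mut out = key.to_vec();
    // Step 1: append 0s until the length is divisible by N.
    while out.len() % N as usize != 0 {
        out.push(false);
    }
    // Step 2: append the original key's length as an N-bit big-endian number.
    for i in (0..N).rev() {
        out.push((key.len() >> i) & 1 == 1);
    }
    // Step 3: append 0s until the length equals 2^N.
    while out.len() < (1usize << N) {
        out.push(false);
    }
    out
}
```

<p>Because the original key is preserved as a prefix and only zeros and a length marker are appended, no two distinct inputs can collide.</p>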

<p>This mapping gives us 2 desirable properties:</p>
<ol>
  <li>No ambiguity. No two inputs give the same output.</li>
  <li>The input key is kept as a prefix of the generated one. This preserves the shared prefixes in the original input keys that allow for iteration under shared prefixes.</li>
</ol>

<p>However, it does not preserve a strict lexicographic ordering. It is almost perfect, but the lexicographic order when one key is a prefix of another is not preserved. If there are two keys, A and B, where A is a prefix of B, it is possible that pad(B) &lt; pad(A). For example, with N=4, <code class="language-plaintext highlighter-rouge">111</code> would be padded to <code class="language-plaintext highlighter-rouge">1110_0011_0000_0000</code> and this will be sorted after the padded version of <code class="language-plaintext highlighter-rouge">11100000</code>, which is <code class="language-plaintext highlighter-rouge">1110_0000_1000_0000</code>.</p>

<p>If we were to put the length at the very end of the padding instead of directly after the initial key, we’d have preserved a full lexicographic ordering. But it would also lead to pairs of keys like <code class="language-plaintext highlighter-rouge">111</code> and <code class="language-plaintext highlighter-rouge">1110</code> being converted to lookup paths with extremely long shared prefixes - implying longer traversals. In practice this might be acceptable, as Polkadot-SDK doesn’t produce these kinds of storage-key pairs. But it’d have a very bad worst-case scenario, even with extension nodes.</p>

<p>While we are then dealing with a trie which in theory has 2^N layers, we will in practice never have keys or traversals anywhere near that long and will encounter leaf nodes much earlier in our traversal.</p>

<p>The structure of a leaf node in the original proposal and this modified version is exactly the same, with one semantic difference. Our data structure maps bounded-length keys to fixed-length lookup paths. Since the key length is not fixed, the encoding of our structure has a variable length. The last 32 bytes are the hash of the stored value, and the preceding bytes are the key itself.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">Leaf</span> <span class="p">{</span> <span class="n">key</span><span class="p">:</span> <span class="p">[</span><span class="nb">u8</span><span class="p">],</span> <span class="n">value_hash</span><span class="p">:</span> <span class="p">[</span><span class="nb">u8</span><span class="p">;</span> <span class="mi">32</span><span class="p">]</span> <span class="p">}</span>
</code></pre></div></div>
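<p>As a sketch of that encoding (helper names are hypothetical): the key’s length is recovered implicitly, since everything before the final 32 bytes is the key.</p>

```rust
// Sketch of the variable-length leaf encoding described above: the key bytes,
// followed by the 32-byte hash of the stored value. Helper names are mine.
fn encode_leaf(key: &[u8], value_hash: &[u8; 32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(key.len() + 32);
    out.extend_from_slice(key);
    out.extend_from_slice(value_hash);
    out
}

// Split an encoded leaf back into (key, value_hash): the last 32 bytes are
// always the value hash, whatever the key's length.
fn decode_leaf(encoded: &[u8]) -> (&[u8], &[u8]) {
    assert!(encoded.len() >= 32);
    encoded.split_at(encoded.len() - 32)
}
```
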

<p>There would be no point in actually constructing the extremely long padded key in memory or hashing it, as our padding mechanism introduces no new entropy and simply maps keys from smaller spaces onto a larger one. They are only lookup paths, not logical keys.</p>

<p>Note that while it is theoretically possible to invert the mapping and go from one of our padded strings to its shorter representation, this is computationally intensive. So there is one other downside to this approach: if you give someone a path to a leaf node, but don’t provide the leaf node or original key, it’s hard to know which key it proves membership of in the state trie. I don’t believe this is a major issue, as it’s more typical to prove to someone which value is stored under a key rather than to prove that a value is stored for a given key. If that’s needed, you can just provide the original key along with the nodes along the longer padded lookup path.</p>

<hr />

<h3 id="introducing-extension-nodes">Introducing Extension Nodes</h3>

<p>To handle the case of long shared prefixes in storage keys, we will introduce extension nodes which encode long partial lookup paths.</p>

<p>The first challenge to solve in introducing extension nodes is to add a third kind of node, beyond branches and leaves. This requires a change in how we represent the nodes, so we can distinguish extensions from branches and leaves. Most trie implementations encode the type of node with a discriminant preceding the encoded value of the node, by using code that looks like this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">enum</span> <span class="n">Node</span> <span class="p">{</span>
    <span class="nd">#[index</span> <span class="nd">=</span> <span class="s">"0"</span><span class="nd">]</span>
    <span class="nf">Branch</span><span class="p">(</span><span class="n">left_child</span><span class="p">,</span> <span class="n">right_child</span><span class="p">),</span>
    <span class="nd">#[index</span> <span class="nd">=</span> <span class="s">"1"</span><span class="nd">]</span>
    <span class="nf">Leaf</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">),</span>
    <span class="c1">//...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Preston’s proposal implements this differently: the type of node is encoded by its hash rather than the value. One bit is taken from the beginning of the hash - if it’s a <code class="language-plaintext highlighter-rouge">0</code>, the node referenced by the hash is a branch. If it’s a <code class="language-plaintext highlighter-rouge">1</code>, the node referenced by the hash is a leaf. This is referred to as a “domain separation”.</p>

<p>Since the hash function we’d use in the trie is cryptographic, taking 1 bit from a 256-bit hash for domain separation doesn’t meaningfully impact security. To domain-separate out 3 (or 4) kinds of nodes, we’d need to take 2 bits from the hash function output. This also does not impact security meaningfully.</p>

<p>So to add a third kind of node, we extend the domain separation to 2 bits with the following scheme:</p>
<ol>
  <li>If the hash starts with <code class="language-plaintext highlighter-rouge">00</code> the node is a branch.</li>
  <li>If the hash starts with <code class="language-plaintext highlighter-rouge">01</code> the node is a leaf.</li>
  <li>If the hash starts with <code class="language-plaintext highlighter-rouge">10</code> the node is an extension.</li>
  <li>The hash cannot legally start with <code class="language-plaintext highlighter-rouge">11</code>.</li>
</ol>

<p>The one exception here is that if the hash is zeroed-out completely, it denotes an empty sub-trie.</p>
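<p>In code, this classification might look something like the sketch below. This is just an illustration of the scheme as described - the names and types are mine, not NOMT’s actual API.</p>

```rust
/// Node kinds, domain-separated by the top two bits of a 256-bit hash.
#[derive(Debug, PartialEq)]
enum NodeKind {
    Empty,     // the all-zero hash
    Branch,    // hash starts with 00
    Leaf,      // hash starts with 01
    Extension, // hash starts with 10
}

/// Classify a node by its hash. A hash starting with 11 is illegal.
fn classify(hash: &[u8; 32]) -> Result<NodeKind, ()> {
    if hash.iter().all(|&b| b == 0) {
        // the one exception: a fully zeroed hash denotes an empty sub-trie
        return Ok(NodeKind::Empty);
    }
    match hash[0] >> 6 {
        0b00 => Ok(NodeKind::Branch),
        0b01 => Ok(NodeKind::Leaf),
        0b10 => Ok(NodeKind::Extension),
        _ => Err(()), // 11 prefix: not a valid node hash
    }
}
```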

<p>But what is an extension node? Its logical structure will be this:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">Extension</span> <span class="p">{</span>
    <span class="n">partial_path</span><span class="p">:</span> <span class="p">[</span><span class="nb">u8</span><span class="p">;</span> <span class="mi">64</span><span class="p">],</span>
    <span class="n">child_1</span><span class="p">:</span> <span class="p">[</span><span class="nb">u8</span><span class="p">;</span> <span class="mi">32</span><span class="p">],</span>
    <span class="n">child_2</span><span class="p">:</span> <span class="p">[</span><span class="nb">u8</span><span class="p">;</span> <span class="mi">32</span><span class="p">],</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The partial path encodes some part of the lookup path which is shared by all of its descendant nodes. It is laid out like this, bitwise:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>000000101|01101000000...
</code></pre></div></div>
<p>The first 9 bits encode a length from 0 to 511. The next 503 bits encode the partial key, right-padded with 0s. Partial key lengths of 0, 1, or any value above 503 are disallowed for obvious reasons: 0 is impossible, 1 should just be a branch, and we don’t have space for anything more than 503 bits.</p>
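<p>To make the layout concrete, here is a sketch of encoding and decoding it in Rust. The function names and the MSB-first bit-indexing convention are my own illustration, not actual NOMT code.</p>

```rust
/// Pack a partial lookup path into the 64-byte (512-bit) layout described
/// above: a 9-bit length followed by up to 503 key bits, right-padded
/// with zeros. Bit positions are counted MSB-first within each byte.
fn encode_partial_path(bits: &[bool]) -> [u8; 64] {
    let len = bits.len();
    assert!((2..=503).contains(&len), "lengths 0, 1, and >503 are disallowed");
    let mut out = [0u8; 64];
    // write the 9-bit length into bit positions 0..9
    for i in 0..9 {
        if (len >> (8 - i)) & 1 == 1 {
            out[i / 8] |= 1 << (7 - (i % 8));
        }
    }
    // write the key bits starting at bit position 9
    for (i, &bit) in bits.iter().enumerate() {
        let pos = 9 + i;
        if bit {
            out[pos / 8] |= 1 << (7 - (pos % 8));
        }
    }
    out
}

/// Recover the partial key bits from the packed layout.
fn decode_partial_path(buf: &[u8; 64]) -> Vec<bool> {
    let get = |pos: usize| (buf[pos / 8] >> (7 - (pos % 8))) & 1 == 1;
    let len = (0..9).fold(0usize, |acc, i| (acc << 1) | get(i) as usize);
    (0..len).map(|i| get(9 + i)).collect()
}
```

<p>Encoding the example from the diagram above (length 5, key bits <code class="language-plaintext highlighter-rouge">01101</code>) yields a first byte of <code class="language-plaintext highlighter-rouge">00000010</code> and a second byte of <code class="language-plaintext highlighter-rouge">10110100</code>.</p>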

<p>Extensions are logically branches and have two children. It would make no sense for an extension to be followed by a single leaf: such a structure would be more efficiently represented as a single leaf node. Therefore extensions are implicitly branches. As a side note, this is actually a major difference between the Merkle Patricia Trie in Ethereum and the one used in Polkadot-SDK: Ethereum’s extension node format allows an extension to be followed by a leaf, though this never happens in practice, while in Polkadot-SDK it is illegal.</p>

<p>The next problem to solve is how to store the data of an extension node in our page structure. The child hashes of normal branches are stored directly “below” the branch node itself - either in the next layer of the current page or in the first layer of the next page, if the branch node is at the bottom of its page. We can adopt a similar strategy: the child hashes of the extension node will be stored in the page whose position is implied by the path to the child nodes. As a result, we may have empty pages between the extension node and its children. Importantly, the storage of descendants of the extension node will never be affected by modifications to, or the removal of, the extension node itself.</p>

<p>We still need to store that 64-byte partial path somewhere. It turns out we can store it directly under the extension node. Because extensions with a 1-bit partial path are illegal, and the children of the extension are stored in the page implied by their partial key, the slots directly beneath the extension will always be empty. We can use these two 32-byte slots to store the 64-byte partial key. When the extension node itself is at the bottom layer of its page, the partial key will be stored on the next page. Fancier schemes could avoid this, such as finding empty locations within the current page to squirrel away the partial key - spaces under leaves and empty-subtrie markers can be overwritten. But that adds a lot of complexity for little real gain: when the difference is a single extra page, what matters is whether we can pre-fetch the required pages from the SSD, more so than how many pages we load. Modern SSDs are good at parallel reads.</p>

<p><img src="/assets/images/merklization_diagrams/extension_1.svg" alt="" /></p>

<p>This diagram shows how the 64-byte partial path is stored in the two 32-byte slots directly “under” the extension node, even if those slots fall on a separate page, and that the children of this node are stored in the same place they would be if there were no extension nodes.</p>

<p>The last issue we need to deal with is related: because extensions can introduce gaps in the sequence of pages along a lookup path, we can no longer straightforwardly infer which pages must be loaded. An extension node located in a page at depth 2, which encodes a partial path of length 60, would land the child nodes of the extension squarely in a page at depth 12. We can no longer just pre-fetch the first N pages as computed from the key’s lookup path and expect to find a terminal there. This is a general issue which would be a showstopper if not for the practical workload that the Polkadot-SDK imposes on the trie: there are relatively few shared prefixes, so we can just cache the “page paths” for all of them.</p>

<p>A Polkadot-SDK runtime might have 50 pallets (modules), each with 10 different storage maps or values, for a total of ~550 common shared prefixes. When there are only a few hundred or even a few thousand shared storage prefixes, it’s trivial to keep an in-memory cache which tells us which pages we need to load to traverse to the end of any common prefix. Caching becomes intractable somewhere in the millions of shared prefixes, but that is well beyond any practical use of the Polkadot-SDK. For the Polkadot-SDK workload, our SSD page-loads will be perfectly predictable just from the key. For a smart contract workload, it would be better to incorporate a “child trie” approach, where each smart contract has its own state trie.</p>
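<p>Such a cache could be as simple as a map from shared prefix to the precomputed list of pages along it, all of which can be fetched from the SSD in one parallel batch. A rough sketch, with illustrative types that aren’t NOMT’s:</p>

```rust
use std::collections::HashMap;

/// Illustrative page identifier; real page IDs would be structured.
type PageId = u64;

/// In-memory cache from shared key prefix to the pages that must be
/// loaded to traverse to the end of that prefix.
struct PrefixCache {
    page_paths: HashMap<Vec<u8>, Vec<PageId>>,
}

impl PrefixCache {
    fn new() -> Self {
        PrefixCache { page_paths: HashMap::new() }
    }

    /// Record the pages traversed for a shared prefix.
    fn insert(&mut self, prefix: &[u8], pages: Vec<PageId>) {
        self.page_paths.insert(prefix.to_vec(), pages);
    }

    /// Given a full key, find the longest cached prefix and return the
    /// pages to pre-fetch; the remaining pages follow sequentially.
    fn pages_to_prefetch(&self, key: &[u8]) -> Option<&[PageId]> {
        (1..=key.len())
            .rev()
            .find_map(|n| self.page_paths.get(&key[..n]).map(|v| v.as_slice()))
    }
}
```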

<hr />

<p>To summarize: by introducing a padding scheme, extension nodes, and caching, we can create a super SSD-friendly State Trie that is compatible with the assumptions of the Polkadot-SDK. This still inherits the assumption that we only need to store one revision of the Trie on disk and so isn’t suitable for archive nodes. Migrating a running Polkadot-SDK chain to use this database would require a total storage migration to the new format.</p>

<p>Thanks to Sergei (Pepyakin) and Preston Evans for discussion and clarifications leading up to this article.</p>]]></content><author><name>Robert Habermeier</name></author><summary type="html"><![CDATA[Building upon ideas for a novel Merkle Trie database]]></summary></entry><entry><title type="html">Hybrid Chains: Every Chain Should Have Smart Contracts</title><link href="https://rob.tech/blog/hybrid-chains/" rel="alternate" type="text/html" title="Hybrid Chains: Every Chain Should Have Smart Contracts" /><published>2023-03-27T00:00:00+00:00</published><updated>2023-03-27T00:00:00+00:00</updated><id>https://rob.tech/blog/hybrid-chains</id><content type="html" xml:base="https://rob.tech/blog/hybrid-chains/"><![CDATA[<p><img src="/assets/images/hybrid-chains.png" alt="" /></p>

<p>Application-specific blockchains, or app-chains, are highly uncontroversial today. The multi-chain ecosystem will evolve such that each chain specializes in one or two specific tasks, which are either user-facing or intermediate products, and applications will be composed of interactions between these chains. It’s the natural conclusion of the multi-chain vision. Right?</p>

<p>To answer this question, we have to dive into the differences between synchronous and asynchronous composability. I define composability here as the ability for two or more programs executed in one or more consensus environments to interface and interact with each other. By programs, I mean things like smart contracts or Substrate pallets or other specialized logic. Those programs are synchronously composed when the results of their interaction can be computed on a single computer without waiting on others. Programs are asynchronously composed when the results of their interaction require input from other computers.</p>

<p>Here’s a rule of thumb for making this a bit more concrete: operations executed within an atomic transaction in a single block are synchronously composed, even if the transaction touches many different components or smart contracts. Operations which touch many blockchains or state machines, even if those chains are under the same shared security umbrella, are asynchronous. These operations require logic to be triggered on other chains before the end result can be seen, and so require waiting on machines which relay and process messages on remote chains.</p>

<p>The properties of synchronous and asynchronous composability are what led us to the app-chain thesis to begin with. The benefit of synchronous composability is tight and fast integration, while the drawback is the scalability limits of a single blockchain. The asynchronous approach lets us split work across many chains and scale further, but introduces expensive round-trips and message passing as a result. We realized we were going to need many chains - whether they were sidechains, parachains, or zones, and that we could further optimize each chain for specialized use-cases, and we could then combine these chains to achieve a more efficient ecosystem.</p>

<p>The flaw in the app-chain thesis is that it neglects the value of synchronous composability and leans entirely into asynchronous composability. We’re entering new territory, but let’s bring in an analogy as a guide. The economy of a single blockchain is to a country’s economy what the multi-chain economy is to international trade. As we have seen in the development of the global economy, countries have specialized in particular fields or areas of economic development. However, even countries which import cheap goods from manufacturing specialists retain some manufacturing capability locally. And vice-versa: no country has specialized entirely in one field, to the extent of having no other economic activity at all. The same approach is likely to be the most efficient for the multi-chain economy as well: chains will specialize, but not entirely. Chains will benefit most by retaining some generalized capabilities in order to reap the benefits of synchronous composability.</p>

<p>Practically, synchronous composability is <em>instantaneous</em>. The country analogy breaks down here, because it’s akin to each country having a teleportation service which only works within its own borders. This property implies that no matter how good cross-chain communication gets, the benefits of synchronous composability will never be entirely eroded. Improvements in cross-chain communication will reduce the amount of generalization chains need to retain, but chains will always need to stay slightly generalized to provide the most efficiency. These <strong>hybrid-chains</strong> will allow us to reap the benefits of both. Chain economies will specialize while retaining the generality and opportunity needed for rapid growth of products and communities.</p>

<p>What the hybrid-chain approach is likely to look like in practice is application-specific logic deployed alongside smart contracts, with a limited amount of blockspace allocated to smart contracts. The beauty of embedding smart contracts alongside specialized functionality is that the specialized functionality of a chain can be exposed directly and synchronously to the smart contracts deployed there, giving the smart contracts there an edge or differentiator to smart contracts deployed elsewhere. Given that contracts would only be able to consume some of the resources per-block, the absolute costs of gas would be higher relative to chains that focus entirely on smart contracts, all else equal. But the contracts deployed there would be those that can create the most value from synchronously interacting with the specialized functionality exposed by the chain. Every chain, even app-chains, should have smart contracts.</p>

<h2 id="runtime-composition-and-innovation-arms">Runtime Composition and Innovation Arms</h2>

<p>In Polkadot-land, we refer to the ‘business logic’ of the chain as its Runtime. Chains acquire <a href="https://www.rob.tech/blog/polkadot-blockspace-over-blockchains/">blockspace</a> and the runtime determines how to allocate this blockspace - a raw material derived from decentralized trust. Runtimes written in Substrate are composed of pallets, which each provide functionality such as balance transfers, smart contract logic, or specialized functionality for particular use-cases.</p>

<p>The approach to building a hybrid-chain in Substrate is simple. The first step is to build a runtime composed of pallets, some particular to a use-case and some generalized pallets for smart contracts. The next step is simply to limit the amount of blockspace that all smart contract calls in a block are allowed to use. Substrate’s Weight mechanism makes this totally configurable on the runtime side: the runtime just rejects contract transactions over the limit until the next block. Block authors must respect this limit for their block to be valid.</p>
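<p>To illustrate the idea without pulling in real Substrate types, here is a stripped-down sketch of such a per-block limit. <code class="language-plaintext highlighter-rouge">Weight</code> here is just a <code class="language-plaintext highlighter-rouge">u64</code> and the names are mine - this is not FRAME code:</p>

```rust
/// Illustrative stand-in for Substrate's weight type.
type Weight = u64;

/// Tracks weight consumed by smart contract calls in the current block
/// and rejects calls past the configured cap, deferring them to a
/// later block.
struct ContractWeightMeter {
    limit_per_block: Weight,
    used_this_block: Weight,
}

impl ContractWeightMeter {
    fn new(limit_per_block: Weight) -> Self {
        Self { limit_per_block, used_this_block: 0 }
    }

    /// Called at the start of each block to reset the allowance.
    fn on_new_block(&mut self) {
        self.used_this_block = 0;
    }

    /// Try to admit a contract call of the given weight; returns false
    /// if the per-block contract allowance would be exceeded.
    fn try_consume(&mut self, weight: Weight) -> bool {
        match self.used_this_block.checked_add(weight) {
            Some(total) if total <= self.limit_per_block => {
                self.used_this_block = total;
                true
            }
            _ => false,
        }
    }
}
```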

<p>Contract sandboxes can also serve as an innovation and growth engine for an app-chain, particularly those which provide intermediate products meant to be used by user-facing applications. Smart contracts deployed alongside the specialized functionality create a simple on-ramp for developers to utilize it, without having to interface with multiple chains and cross-chain messaging for a first iteration. The iteration speed of developing in smart contracts can be much faster, and these contracts can spin out into a free-standing blockchain once they have the community and user-base to do so. As examples, consider derivatives smart contracts or DAO treasury management living on a chain alongside a DEX pallet, attestation service contracts living next to an identity pallet, or chains deploying “embassy” smart contracts on another chain to manage its affairs there in a synchronous way.</p>

<p>Each app-chain can build out its own innovation hub and onboard developers directly onto the application, lighting a path for them to grow and scale. Substrate pallets are a powerful tool, but their power is enhanced by using them as building-blocks for user-deployed code running in the same synchronous environment. Asynchronous composability lets us scale and expand, and synchronous composability lets us innovate, experiment, and grow.</p>

<p><em>Thanks to Björn Wagner and Hernando Castano for discussion and feedback leading to this post</em></p>]]></content><author><name>Robert Habermeier</name></author><summary type="html"><![CDATA[An exploration of Hybrid Chains, a new approach to blockchain construction that bring the advantages of both app-chains and smart-contract chains]]></summary></entry><entry><title type="html">Polkadot: Blockspace over Blockchains</title><link href="https://rob.tech/blog/polkadot-blockspace-over-blockchains/" rel="alternate" type="text/html" title="Polkadot: Blockspace over Blockchains" /><published>2022-10-14T00:00:00+00:00</published><updated>2022-10-14T00:00:00+00:00</updated><id>https://rob.tech/blog/polkadot-blockspace-over-blockchains</id><content type="html" xml:base="https://rob.tech/blog/polkadot-blockspace-over-blockchains/"><![CDATA[<p>The blockchain and crypto technology landscape has evolved quite a lot over the last 5 years. We occupy a different world from when we first set out to build Polkadot. Though much has diverged from our original vision, many of our original theses have become canon. For example, our early bet on interoperability and cross-chain composability has progressed from theory to practice, and from speculation to fact – a multichain future is table stakes now.</p>

<p>Additionally, Parachains (as originally described, these were essentially optimistic rollups), Shared Security, and Data Availability as laid out in <a href="https://polkadot.network/PolkaDotPaper.pdf">the vision paper from 2016</a> have sent ripples through the world of ideas since, and have been a source of architectural inspiration for projects both new and old. We have progressed from a universe of a few chains to one with an abundance of chains. But our goal was never to maximize the number of blockchains for the sake of it, but rather to maximize the amount of work a decentralized network can do – or in other words, solve for <strong>scalability</strong>. The number of blockchains is related to scalability but not identical, and the differences between the two will be clarified here.</p>

<p>When we began Polkadot, we set out to create a system maximizing transaction throughput without compromising security guarantees and censorship-resistance. This aim has not changed, but progress at the application layer now allows us to lend more color and nuance to this vision. Application and protocol developers alike face new challenges in a multi-chain world. They must balance the requirements of secure execution, censorship-resistance, usability, costs, and composability. The emerging concept of <strong>blockspace</strong> serves as an abstraction and primitive which encompasses these requirements and goals.</p>

<p>In this piece, I’ll dive deeper into the definition and qualities of blockspace and how to evaluate different blockspace offerings in the market. Furthermore, I make a case for why we are shifting our perspective from blockchains to blockspace and why Polkadot is architecturally well-suited as the strongest generalized blockspace producer.</p>

<h2 id="what-is-blockspace"><strong>What is Blockspace?</strong></h2>

<blockquote>
  <p><em>“Blockspace is the best product to be selling in the 2020s.”</em></p>

  <p>Chris Dixon, a16z, on the Bankless podcast</p>
</blockquote>

<p>Blockspace is the capacity of a blockchain to finalize and commit operations. It’s a term that’s risen to prominence lately. It requires some unpacking. In some sense, it’s the primary product of the decentralized consensus systems running today. It’s an abstraction for reasoning about what blockchains actually produce: whether it is allocated to balance transfers, smart contracts, or computation is a concern for the application layer. At a high level, blockspace is a key ingredient for unstoppable applications. Unstoppable applications rely on decentralized systems for payment, consensus, or settlement. As such, the application layer is a prime consumer of blockspace as a good. As with any business, both applications and their developers should be concerned with both the quality and availability of goods in their supply chain.</p>

<p>Blockspace is an ephemeral good. When you intend to commit an operation to a chain you need blockspace in-the-moment: not yesterday’s, not tomorrow’s. Blockspace is either utilized or it is not. When a chain runs below capacity, consensus resources are wasted on producing unutilized blockspace.</p>

<p>Ethereum was the first major innovator in blockspace offerings. By introducing a virtual machine into the protocol and metering available blockspace via ‘gas’, it allowed the blockspace within a single block to be quantitatively parceled out to programs on the basis of the amount of computation performed and storage used. Since then, many projects have embarked on a journey to expand the types of blockspace. This lens provides insight into the key differentiators between Polkadot, Ethereum, Avalanche, Cosmos, Solana, and newer projects like EigenLayer or AltLayer.</p>

<p>The blockchain scaling trilemma tells us that out of security, latency, and throughput you can only pick two under heavy load. In Polkadot, our approach at the base layer has always been to maximize both security and throughput when we are forced to make a choice. While the trilemma is helpful in evaluating the theoretical utility of a base-layer protocol, the notion of blockspace allows us to reason better about how that throughput and security are allocated to the application layer.</p>

<p>Blockspace is not a commodity but rather a class of commodities. Blockspace produced by different systems will vary in quality, availability, and flexibility. The quality of blockspace can be judged by the security guarantees that the blockchain provides - the more secure, the higher the quality. Without a supply of blockspace, applications run into congestion or downtime, leading users to experience high fees, long wait times, or front-running. Without high-quality blockspace, applications are hacked and drained: low-quality blockspace is vulnerable to 51% attacks and toxic shock. Both types of occurrences will be familiar to readers who have spent time observing the blockchain ecosystem. These characteristics of blockspace are the key factors application developers must consider when choosing where to deploy.</p>

<h2 id="characteristics-of-blockspace"><strong>Characteristics of Blockspace</strong></h2>

<p>Let’s dive deeper into the 3 main characteristics of blockspace as a good: Quality, Availability, and Flexibility.</p>

<p><strong>Quality</strong> – As with any good, quality is a major factor for consumers of blockspace to consider. High-quality goods fulfill their purpose, and the purpose of blockspace is to be converted into a permanent record of state-machine execution. Within this framing, quality is equivalent to security, in crypto-economic parlance. I will use the two descriptions interchangeably going forward. Insecure or low-quality blockspace is vulnerable to 51% attacks and consensus faults. Under the hood, security is determined by two factors: the consensus protocol which is used to secure it, and the amount of real economic security (i.e. mining power or stake) utilized in the production and commitment of blockspace.</p>

<p><strong>Availability</strong> – The availability of blockspace is determined by supply and demand. The supply of blockspace is driven by the throughput and liveness of the system producing it: blockchains that stall, halt, or require manual intervention and operation will have an intermittent supply of blockspace. Blockchains which don’t maximize throughput will cap out their supply at lower scales. Blockchains which run on insecure consensus mechanisms will deliver blockspace without strong guarantees of permanence.</p>

<p><strong>Flexibility</strong> – Flexibility is the ability of the blockspace to be used in different types of operations. Bitcoin and Ethereum blockspace is somewhat flexible, in that blockspace can be allocated to user-submitted transactions. However, Bitcoin and Ethereum have a completely transactive blockspace mechanism which can act only on user-submitted transactions. It cannot be used on proactive operations that are performed without user input. Most blockchains have not advanced beyond this reactive model. Even most rollup protocols are primarily focused on user-driven balance transfers and smart contract invocations. The transaction formats, account models, and scripting languages supported by most blockchains are limited.</p>

<p>Highly flexible blockspace focuses entirely on execution, storage, and data consumption and leaves it up to the consumer of blockspace how to allocate those base resources to reactive and proactive operations. Blockspace consumers should be able to prioritize first-class application logic relative to user-submitted transactions so they can make meaningful progress even in the absence or overabundance of user-submitted transactions. This is not to say that transactive models are bad. In fact, it’s the opposite: transactive execution models can be used with good effect in interoperation with autonomous execution models. The underlying product behind both of these is blockspace, and blockspace can only support both models when it is maximally flexible. Flexible blockspace is a prerequisite for deep blockspace markets.</p>

<p>To add more nuance, we should acknowledge the fact that modern blockchain applications are based on interoperability between state machines utilizing blockspace. Mixing low-quality blockspace with high-quality spoils entire applications and exposes users to catastrophic tail risks. If we were building a restaurant, we wouldn’t serve our customers a meal prepared mostly of high-quality ingredients mixed with a small amount of garbage. Likewise, application developers shouldn’t serve their customers and users applications composed of mostly high-security blockspace and partially low-security blockspace. The low-quality ingredient ruins the rest of the dish. In the interoperable world, applications seeking to minimize risk to their users should use only high-security blockspace to deliver an end product.</p>

<p>Modern applications need parallel blockspace with predictable and consistently high quality. Furthermore, a class of blockspace is best suited to interoperable applications when all the blockspace in the class provides homogeneous security guarantees. In essence, interoperability is asynchronous state machine composability. A reliable network of asynchronously composed state machines unlocks super-additive value, and the classes of blockspace best suited to this thesis are those which provide standard and strong guarantees of security: creating value without incurring additional risk. These classes of blockspace are said to provide shared security for all blockspace within them.</p>

<p>Scaling solutions answer the blockspace supply problem. Sharding and rollups, for example, use crypto-economics to scale by introducing proof or dispute protocols where in the default case not every validator needs to check every state transition. Scaling solutions can be coupled with shared security architectures to address both the need for supply and quality.</p>

<p>Some ecosystems are now recognizing the need for shared security architectures - but they use voluntary opt-in by validators to determine how much security different blockspace products under the shared security umbrella receive. That is a poor architecture, because it enshrines particular validators as a rent-seeking special-interest group which supplicants must appeal to in order to get their project started or adequately secured. Barriers to entry between supply and demand have the potential to reduce both the availability and quality of blockspace.</p>

<p>Application and protocol developers should pay particular attention to these three characteristics, and structure their applications around blockspace more so than around blockchains or smart contracts. Decentralized applications and protocols can operate at a lower cost to their users or token holders by acquiring blockspace on demand instead of running a chain 24/7. It’s quite common for early-stage blockchains to leak a large amount of tokens to validators producing blocks with minimal underlying usage. This is a side effect of inefficient blockspace allocation, which primarily benefits validators at the expense of application developers and token holders. Cloud computing outcompeted dedicated server space because it allocated physical resources in a more granular and adaptive manner. Similarly, blockspace-centric architectures for Web3 base layers will outcompete blockchain-centric architectures.</p>

<h2 id="polkadot-a-blockspace-centric-architecture"><strong>Polkadot: A Blockspace-Centric Architecture</strong></h2>

<p>Polkadot’s consensus system is, at its heart, an efficient and flexible blockspace generator. Like modern CPUs, the Polkadot network is a multi-threaded machine. This system is based around a single primitive: the <strong>Execution Core</strong>. Each core can execute one block from a state machine at a time. The network makes use of its resources in the form of validators and bonded stake to expose the maximum number of cores at any time. Due to the efficiency gains of Polkadot’s architecture, Polkadot’s validators are able to transform the real resources they consume into more blockspace than simply by running more standalone blockchains with the same staked value. Shared security on its own is not enough to build an effective blockspace producer. Shared security guarantees homogeneous <strong>quality</strong> of blockspace. It must be coupled with a scaling mechanism to guarantee <strong>supply</strong>.</p>

<p>As a blockspace producer, Polkadot does its best by opening up its services to the maximum number of users possible. Because Polkadot uses WebAssembly and a virtual machine architecture, <strong>Polkadot blockchains don’t need to convince validators to run their software</strong>. Like smart contracts, the only requirement is to post code on-chain and acquire blockspace.</p>

<p>Polkadot validators have no choice in which blockchain they are required to work on at any given moment - the only thing that matters to them is which core they’re assigned to at any time, and the corresponding blockchain scheduled on that core. Polkadot validators are pure service providers. They are not opinionated. They work on what the market tells them to work on. Purchasers of blockspace in Polkadot have a guarantee that validators will hold up their end of the bargain without any human intervention, and validators which do not do so will be missing out on rewards.</p>

<p>It’s fair to consider Polkadot a rollup protocol. However, unlike rollup protocols based on smart-contract systems, the rollups are enshrined in the base layer logic via Execution Cores. When rollups are built on top of a smart-contract layer, the system, to some extent, devolves into ‘every rollup for itself’, as they compete for gas, validators, inclusion, and scheduling. The transactive gas-based blockspace at the base layer is not best suited to allocating blockspace to rollups. In order to provide consistent guarantees about scheduling, security, and supply, we are building a specific system which is modular where modularity counts: at the application layer.</p>

<p>The architectural distinction between the Execution Cores and the actual blockchains or state machines which run upon them is of crucial importance. We see little value in maximizing the number of blockchains; this is only a proxy for what matters most: maximizing secure blockspace. Execution Cores are the engine of blockspace production and the scheduling rights to those cores open a design space for <strong>blockspace allocation products</strong>.</p>

<h2 id="mechanisms-for-allocating-blockspace"><strong>Mechanisms for Allocating Blockspace</strong></h2>

<p>Efficient blockspace allocation is critical. Usage patterns of blockchains are not consistent. Blockchains experience periods of heavy load and congestion as well as periods of under-utilization and emptiness. On the one hand, applications should be able to adapt to periods of heavy load. On the other hand, applications should not pay for blockspace they are not using. The product design space here is underexplored, but Polkadot’s architecture is uniquely amenable to improving the market’s offerings.</p>

<p>One parallel for thinking about the design of blockspace allocation products is the cloud computing market. Cloud computing business models often have two key features: reserved instances and spot instances. Reserved instances are cheaper but guaranteed for a prolonged period of time. Spot instances are more costly, available on-demand, and ephemeral. Applications with predictable load will save money by purchasing reserved instances but can scale to meet demand without service outages by utilizing spot instances. However, reserved instances also represent a commitment - the application operator is wasting money if real demand for the application falls below the reserved supply of cloud compute resources.</p>

<p>Let’s make this concrete. Long-term slots are the only current mechanism for allocating Polkadot’s Execution Cores. These are akin to reserved instances, allocated either by governance or by slot auctions: the winners earn a dedicated Execution Core for a predetermined time of 6, 12, 18, or 24 months. Parathreads, which we<a href="https://polkadot.network/blog/parathreads-parathreads-pay-as-you-go-parachains/"> first introduced as a concept in 2019</a>, are pay-as-you-go blockchains. Parathreads are like spot instances. Our current thinking is for this spot price to be set using an optimal controller: in simple terms, the price will go up when the cores exposed by Polkadot for parathreads are saturated and the price will go down when there are empty cores. This is just one further example of how blockspace can be allocated.</p>
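<p>To make the controller idea concrete, here is a minimal sketch of how such a spot-price mechanism might work. This is purely illustrative and not drawn from the Polkadot codebase; the struct, field names, and constants are all hypothetical, and a production design would differ substantially.</p>

```rust
/// Illustrative spot-price controller for parathread cores (hypothetical,
/// not Polkadot code). The price rises when utilization of the exposed
/// cores exceeds a target and falls when cores sit idle.
struct SpotPriceController {
    price: f64,       // current price per core, in illustrative units
    target_util: f64, // desired fraction of parathread cores occupied
    sensitivity: f64, // how aggressively price reacts to imbalance
    floor: f64,       // minimum price
}

impl SpotPriceController {
    /// Update the price given observed core utilization in [0, 1].
    fn on_block(&mut self, utilization: f64) {
        // Positive error => cores are saturated => raise the price;
        // negative error => cores are empty => lower it.
        let error = utilization - self.target_util;
        self.price = (self.price * (1.0 + self.sensitivity * error)).max(self.floor);
    }
}

fn main() {
    let mut ctl = SpotPriceController {
        price: 1.0,
        target_util: 0.8,
        sensitivity: 0.05,
        floor: 0.01,
    };
    // Sustained saturation pushes the price up...
    for _ in 0..10 {
        ctl.on_block(1.0);
    }
    assert!(ctl.price > 1.0);
    // ...and sustained emptiness brings it back down toward the floor.
    for _ in 0..200 {
        ctl.on_block(0.0);
    }
    assert!(ctl.price < 1.0);
    println!("final price: {:.4}", ctl.price);
}
```

<p>The multiplicative update gives the controller the basic property described above: price responds continuously to demand, so buyers of spot blockspace pay more during congestion and less when cores are empty.</p>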

<p>We can take this concept of allocating execution cores further. Polkadot’s architecture is such that a single chain can have multiple cores allocated to it simultaneously - imagine a blockchain that instead of advancing by 1 block at a time, advances by 2 or 3. This is possible, and is due to particularities of Polkadot’s design that allow for validation of sequential state transitions in parallel. In practical terms, the number of cores a chain can efficiently occupy is limited only by the number of cores it can acquire at a time and the rate at which the chain can produce blocks. We expect that as this market matures there will be a wave of innovation in block generation for Polkadot chains to maximize the utility of this feature. Even chains with a simple sequential block authoring method, such as the one currently available, should be able to make good use of 2 or 3 cores.</p>
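<p>The limiting factors above can be captured in a back-of-the-envelope model: the number of cores a chain can usefully occupy is the lesser of what it can acquire and what its block production can fill. This is an illustrative sketch, not Polkadot code, and the function and numbers are hypothetical.</p>

```rust
/// Hypothetical model: one core validates one parachain block per
/// relay-chain slot, so a chain producing N blocks per slot can fill
/// at most N cores, and it cannot use more cores than it has acquired.
fn usable_cores(acquired_cores: u32, blocks_per_relay_slot: u32) -> u32 {
    acquired_cores.min(blocks_per_relay_slot)
}

fn main() {
    // A chain authoring 3 blocks per slot gets no benefit from a 4th core:
    assert_eq!(usable_cores(4, 3), 3);
    // ...and faster authoring is wasted without cores to back it:
    assert_eq!(usable_cores(2, 3), 2);
    println!("ok");
}
```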

<p>As an example of how multiple cores might be used by a single chain, we could introduce another type of blockspace allocation product: short-term auctions. These auctions would fall somewhere between pure spot allocation and long-term reserved allocation. It is certainly possible to design an auction mechanism for allocating slots for short durations - such as one hour, one day, or one month. This could be used for something I’ve been terming “Parachain Boost” - the ability for blockchains to expand their throughput during periods of heavy load, like a highway that gets wider during rush hour.</p>

<p>Furthermore, by changing our perspective from blockchain-centric to blockspace-centric it becomes clear that there is no reason a blockchain or state machine should run forever. Ephemeral blockchains are an interesting use-case that I believe is highly under-explored - longer-running processes should be able to offload their computations or protocols to short-lived chains, just as programs running on a PC can offload work to background threads.</p>

<p>One final avenue we can pursue for execution core allocation is the ability to transfer claims on execution cores. This will create a secondary market for Polkadot blockspace: chains will be able to trade extra capacity with each other and act as re-sellers for blockspace. Chains experiencing lower or higher demand than anticipated will be able to adapt accordingly or perhaps even speculate on future demand of blockspace.</p>

<h2 id="reframing-the-meaning-of-blockchain"><strong>Reframing the Meaning of Blockchain</strong></h2>

<p>In my opinion, the blockchain ecosystem has been thinking too small about the multi-chain world. Blockchains that start and run indefinitely with a steady pulse are an evidently inefficient mechanism. The multi-chain of tomorrow consists of blockchains that scale and shrink on demand. It contains ephemeral chains spawned by on-chain factories, spun up by contracts and imbued with automated purpose - to complete their work a few hours later and disappear. Our goal in Web3 is not to maximize the number of blockchains. Maximizing the number of blockchains is something I would state explicitly as a non-goal, as it primarily benefits validator cartels seeking to extract value. Our goal in Web3 is to maximize the amount of blockspace that exists and ensure it is allocated to the state machines which need it most at any time: a constant generation and allocation of global consensus resources to those who need it the most. An enterprise without waste. In other words: the most effective blockspace producer in the world.</p>

<p><em>Thanks to Fabian Gompf, Pranay Mohan, Björn Wagner, and Gavin Wood for review, edits, and discussion.</em></p>]]></content><author><name>Robert Habermeier</name></author><summary type="html"><![CDATA[A deep dive into the new concept of blockspace, how to evaluate different blockspace offerings, and how Polkadot navigates the efficient frontier of blockspace production]]></summary></entry><entry><title type="html">Rob for Council</title><link href="https://rob.tech/blog/rob-for-council/" rel="alternate" type="text/html" title="Rob for Council" /><published>2020-11-21T00:00:00+00:00</published><updated>2020-11-21T00:00:00+00:00</updated><id>https://rob.tech/blog/rob-for-council</id><content type="html" xml:base="https://rob.tech/blog/rob-for-council/"><![CDATA[<p><strong>I’m running for the Polkadot council. My address is 13Gdmw7xZQVbVoojUCwnW2usEikF2a71y7aocbgZcptUtiX9.</strong></p>

<p>My address has the verified identity “ROB” on the Polkadot mainnet.</p>

<p><img src="/assets/images/rob-council.png" alt="" /></p>

<p>I’ve had my candidacy up for some time, but haven’t focused strongly on securing a seat. This post represents my commitment to joining the Polkadot council, and my request for your support.</p>

<p>First, a bit of context on why I want to join. I’ve been in the Polkadot ecosystem quite literally from day 1, as a co-founder of the network. I made the first commit to the codebase and built most of the consensus and staking logic for Polkadot. For the past year, I’ve been focusing exclusively on designing and building parachains, which many consider to be Polkadot’s core feature. Before that, I wrote most of the BABE and GRANDPA consensus implementations that power the Polkadot and Kusama networks stably with hundreds of globally distributed validators.</p>

<p>Beyond the core technology, Polkadot has incredible potential to enact change, power research, and accelerate the growth of the blockchain space. My core values in this regard are privacy, liberty, and community. I don’t believe that freedom implies a degradation to the Hobbesian jungle. And I believe that the place where blockchain technology can take us will allow people to abstract over most of the complexity of everyday life and focus on the things that matter. The governance and treasury systems of Polkadot enable us to identify and fund key initiatives to increase the value provided by the network. Long term I’m interested in reducing the participation of the council in favor of automated mechanisms and incentives that accomplish the desired goals.</p>

<p>My long-term outlook is driven by a vision of a transparent society better able to utilize its human resources and better able to value contributions of many kinds. In the near-term, I plan to focus on user adoption, developer adoption, privacy technology, and new economic primitives.</p>

<p>I’m a crypto native who’s been with Polkadot since day 1. I’ve been here since before crypto was mainstream and I’m here for the long-term. I’m comfortable navigating the murky depths of the future and drawing a map as I go. I’m a developer with a commitment to quality, thoroughness, and practicality. I’m a generalist comfortable in business and academic circles. I’m a citizen of the West, and I can see the damage done by malfunctioning institutions, entrenched special interests, and civic disengagement. The crypto space, and Polkadot in particular, is capable of doing better.</p>

<p><strong>Vote Rob for Polkadot Council</strong></p>]]></content><author><name>Robert Habermeier</name></author><summary type="html"><![CDATA[I’m running for the Polkadot council. My address is 13Gdmw7xZQVbVoojUCwnW2usEikF2a71y7aocbgZcptUtiX9.]]></summary></entry><entry><title type="html">Blockchains, Rules, and Reciprocity</title><link href="https://rob.tech/blog/blockchains-rules-reciprocity/" rel="alternate" type="text/html" title="Blockchains, Rules, and Reciprocity" /><published>2020-09-14T00:00:00+00:00</published><updated>2020-09-14T00:00:00+00:00</updated><id>https://rob.tech/blog/blockchains-rules-reciprocity</id><content type="html" xml:base="https://rob.tech/blog/blockchains-rules-reciprocity/"><![CDATA[<p>The base level of human society functions due to the principle of reciprocity: acting benevolently towards others without any direct expectation of a reward with the understanding that the favor will be returned by someone, someday. Alexis de Tocqueville, within his 1835 study of American Democracy, referred to this as “self-interest, properly understood”. When reciprocity is applied broadly, everyone benefits.</p>

<p>For various reasons, the social fabric of the western world has worn thin over the last century. Warm reciprocity has been replaced by cool trust mediated ever more strongly by lawyers and governments. The costs of engaging lawyers and lobbyists, however, are so high as to be primarily available to large corporations, financial institutions, and other special interest groups.</p>

<p>The key and initial appeal of blockchains lies in democratizing and distributing access to rules and binding them to economic value. Anyone can minimize the required level of trust for any particular transaction.</p>

<p>Regardless, it’s not possible to cover every possible interaction or event with a system of rules. This is one of the primary weaknesses of civil, statutory law systems, which attempt to do exactly that.</p>

<p>We should not forget that blockchains are embedded in a broader social context, and involve a broad social contract of running nodes and respecting the rules. Beyond the rules of the chain, there is a collective of participants in the network. The rules of the chain themselves are always secondary to the network they govern.</p>

<p>For any sufficiently large social organization to function, it must acknowledge the paradox of rules vs. discretion and employ a mechanism for representative human decision making. Discretion, when wisely exercised, can counteract societal forces that place disintegrative pressure on the organization. This means having a human constitution and human governance, with norms and checks and balances and representation of interests.</p>

<p>Computers and automation aren’t an escape hatch. They lead only to a labyrinth. There is an unfortunate appeal of automation, and in particular economic automation, that leads us to believe we can remove the human element entirely. This is a very dangerous line of thinking. They say the devil is in the details. A fractal carpet of rules will never cover all the surface area. It will leave us only with darker, deeper pitfalls. Technology exists as a tool to augment and extend our human capabilities, not to replace them.</p>

<p>Much more important for societal progress than the rules of a system are the mindset and values that the participants embody. Historically, the most successful societies have been those which had a high capability for social coordination. Social coordination arises naturally from a shared faith and kinship: a collective understanding that we are united in one way or another.</p>

<p>Lastly, I’d like to leave you with the basic thought that one incredibly important, unique, and powerful ability of humans is to punch through paradoxes and encounter reality. Explicit procedures for human governance are a pre-requisite for building an organization capable of expressing faith in humanity and building a collective of people who are empowered to care for each other.</p>]]></content><author><name>Robert Habermeier</name></author><summary type="html"><![CDATA[An examination of blockchains in the context of human society, rule-based systems, social paradoxes, and law]]></summary></entry></feed>