Part 1: Overview of Digital Asset Data and Infrastructure
2023-03-06 07:47:33 UTC
Dozens of layer-1 and layer-2 platforms have expanded the breadth of the blockchain landscape. Digital asset activity spans a wide range of disciplines, including payments, decentralized finance (DeFi), and NFT issuance and trading. Platforms enabling these use cases spring forth from a number of development environments, each with different methods for deploying applications and storing their data.
This Cambrian explosion of on-chain activity adds significant complexity to digital asset data and infrastructure. It also presents opportunities. Digital asset data firms are creating a comprehensive taxonomy of blockchain data fields in real time as composable and increasingly complex digital protocols continue to emerge. The demystification of the vast sea of digital asset data is unlocking value for the investment community, regulatory bodies, and financial institutions alike.
Likewise, infrastructure-as-a-service firms enable market participants to overcome the high technical and financial barriers associated with operating blockchain networks. Although different blockchains have different infrastructure requirements, they all require around-the-clock operation as digital assets do not have an 'off switch'. Powering these blockchain platforms and the applications built on top of them requires technical expertise and enterprise-grade risk management processes. Digital asset infrastructure providers that offload these tasks from builders and users have emerged as critical – though sometimes overlooked – service providers.
The following section of this report provides an overview of the market segments that compose the digital asset data and infrastructure landscape. It also highlights key developments and considerations for firms that conduct business within their respective verticals.
Digital Asset Data
The Role of Data Providers in the Digital Asset Data Lifecycle
Data providers play a crucial role in the value chain of digital asset data by acting as a bridge between (i) the raw data generated by digital asset trading venues and other on-chain protocols, and (ii) data consumers that employ digital asset data for a range of use cases.
At the highest level, data providers start by harvesting raw data from off-chain and on-chain sources spanning centralized digital asset exchanges (CEXes), DeFi protocols, layer-1 and layer-2 blockchain networks, and NFT marketplaces. For example, obtaining data from CEXes is relatively straightforward via established interfaces. In contrast, extracting data from multiple decentralized exchanges (DEXes) is resource-intensive and requires data firms to run (or obtain access to) blockchain node infrastructure.
This data is then cleaned, refined, and standardized, tailoring it to end-users' requirements. For example, since automated market maker (AMM) DEXes bear no resemblance to traditional order book matching exchanges, data providers play a key role in normalizing DEX data such that it can be analyzed in a similar format as CEX data.
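To make the normalization step concrete, the following is a minimal sketch of how a single AMM swap might be mapped onto a CEX-style trade record. The `SwapEvent` and `Trade` field names are hypothetical and chosen for illustration only; real providers work with protocol-specific event logs and must also handle token decimals, fees, and multi-hop routes, all of which are ignored here.

```python
from dataclasses import dataclass


@dataclass
class SwapEvent:
    """Hypothetical, simplified AMM swap (amounts already decimal-adjusted)."""
    pool: str
    timestamp: int      # Unix seconds
    base_in: float      # base token the trader paid into the pool
    base_out: float     # base token the trader received from the pool
    quote_in: float     # quote token the trader paid into the pool
    quote_out: float    # quote token the trader received from the pool


@dataclass
class Trade:
    """CEX-style trade record (illustrative schema, not a provider's actual one)."""
    venue: str
    timestamp: int
    side: str           # "buy" or "sell" from the trader's perspective
    price: float        # quote units per one base unit
    quantity: float     # base units traded


def normalize_swap(swap: SwapEvent) -> Trade:
    # Net base flow to the trader: positive means the trader bought base.
    base_delta = swap.base_out - swap.base_in
    # Net quote flow from the trader to the pool.
    quote_delta = swap.quote_in - swap.quote_out
    if base_delta == 0:
        raise ValueError("swap moved no base token")
    side = "buy" if base_delta > 0 else "sell"
    price = abs(quote_delta / base_delta)
    return Trade(swap.pool, swap.timestamp, side, price, abs(base_delta))


# A trader pays 3,000 units of the quote token and receives 2 units of the base token.
trade = normalize_swap(
    SwapEvent(pool="ETH/USDC pool", timestamp=1670000000,
              base_in=0.0, base_out=2.0, quote_in=3000.0, quote_out=0.0)
)
```

Once swaps are expressed in this form, they can be aggregated alongside CEX trades using the same downstream logic.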
Finally, the data is delivered to end-consumers via a variety of methods suitable for specific data formats and customers. End-users ranging from financial institutions to research organizations consume the data for a variety of use cases. For example, financial institutions require high-quality data feeds to make investment decisions and manage their portfolios. Their sophisticated models require real-time and historical tick-level trade data as well as order book data as inputs. This allows them to analyze metrics such as liquidity across CEXes and DEXes, and dynamics between different futures and spot markets. Further use cases for digital asset data end users are explored throughout the following parts of this report.
What are Key Methods/Formats for Digital Asset Data Delivery?
Different users and their use cases may require different means to access data. For example, while sophisticated investors may require highly granular historical and real-time data (e.g., order book data) via application programming interfaces (APIs), forensic analysts may prefer dashboard solutions that visualize, for example, the provenance of a user's funds. The most relevant delivery methods for digital asset data consist of:
- Cloud-Based Delivery: Storing historical order book data can require petabytes of storage space, thus resulting in onerous infrastructure requirements for end users. This necessitates data delivery (into cloud buckets set up by customers) via cloud solutions such as Amazon S3 (Simple Storage Service), Azure Cloud, or Alibaba Cloud.
- Comma Separated Values (CSV): CSV is a format that enables data storage in structured tables for use in applications such as Microsoft Excel. It is typically used for historical data and is updated daily. CSV files are easily accessed for historical analysis, but managing several files for the same data (i.e., different data vintages) is inefficient. The CSV format is also not suitable for real-time and high-frequency trade data.
- Dashboards: Dashboards are popular tools for visually analyzing data. One considerable drawback of dashboard-based data delivery (when the dashboards are not user-generated) is that it adds friction when users want to manipulate data for analyses. Institutions that build proprietary models and frequently backtest their investment strategies are unlikely to opt for dashboards as the only form of data delivery.
- FIX Protocol: The Financial Information eXchange (FIX) Protocol is a vendor-neutral open message standard for trade-related data delivery. The FIX Protocol standardizes the communication of capital markets data such as execution reporting, order submissions/changes, and trade allocation.
- Proprietary data delivery methods: In combination with the standard data delivery methods listed here, several digital asset data providers have developed proprietary methods that aim to provide clients with higher levels of customizability while minimizing infrastructure costs. Kaiko Stream and Delta Sharing are two such examples that digital asset data companies Kaiko and Nasdaq (in partnership with Amberdata) provide, respectively. With these solutions, clients can filter data (i.e., choose which data points they are interested in) at emission, instead of receiving data and then filtering it, allowing them to reduce infrastructure requirements.
- Representational state transfer (REST) APIs: APIs are definitions and protocols that allow computer programs to communicate with each other. REST APIs conform to a specific design architecture. Most data providers offer data through APIs that are easily integrated into third-party applications. Data that frequently updates, such as market data, on-chain data, DeFi data, derivatives data, and analytics, is available on a real-time and historical basis through REST APIs. However, clients using REST APIs need to send requests to the server repeatedly in order to keep receiving (real-time) data updates.
- WebSockets: WebSockets are bi-directional, meaning that a server can push data without clients making requests, making them ideal for real-time data delivery. Both clients and servers can send and receive messages via WebSockets as long as the connection is open. Maintaining the connection, however, can be resource-intensive and difficult to scale.
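The difference between request-driven (REST-style) and push-driven (WebSocket-style) delivery can be illustrated with a toy, in-memory model. This is only a sketch of the two interaction patterns: `ToyFeed` and its methods are hypothetical names, no network connection is involved, and real REST or WebSocket clients differ considerably in detail.

```python
from typing import Callable, List, Optional


class ToyFeed:
    """In-memory stand-in for a price server (illustrative only)."""

    def __init__(self) -> None:
        self._latest: Optional[float] = None
        self._subscribers: List[Callable[[float], None]] = []
        self.requests_served = 0

    def get_latest(self) -> Optional[float]:
        """Pull model (REST-like): every call is one client request."""
        self.requests_served += 1
        return self._latest

    def subscribe(self, callback: Callable[[float], None]) -> None:
        """Push model (WebSocket-like): register once, then receive updates."""
        self._subscribers.append(callback)

    def publish(self, price: float) -> None:
        """A new price arrives; push it to every open subscription."""
        self._latest = price
        for callback in self._subscribers:
            callback(price)


feed = ToyFeed()

pushed: List[float] = []
feed.subscribe(pushed.append)           # one subscription, no further requests

polled: List[Optional[float]] = []
for price in [100.0, 100.5, 101.5]:
    feed.publish(price)
    polled.append(feed.get_latest())    # the poller must ask to see the update
    polled.append(feed.get_latest())    # a second poll returns nothing new
```

In this toy run the subscriber receives three updates without issuing any requests, while the polling client issues six requests to observe the same three prices.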
In addition to these delivery mechanisms, data providers also use blockchain oracles as a means to deliver data (typically to applications on-chain). Different oracle solutions are explored in Appendix A of this report. Given the immense quantity of data emanating from the heterogeneous digital asset ecosystem, it is sensible to categorize data into separate classes.
What Classes of Data do Companies Provide?
This research report segments the digital asset data landscape into three distinct, though not mutually exclusive, categories:
(i) market data: relates to data that is primarily used by market participants to make investment decisions. Market data pertains to all data emanating from digital asset trading venues including spot and derivatives markets. This data includes, for example, trade data and order book data.
(ii) on-chain data: relates to data that captures the activity occurring on blockchain platforms and the applications that reside on top of them. Every confirmed blockchain transaction and every pending transaction in mempools falls within the realm of on-chain data. For example, data dashboards use on-chain data, such as the total number of active addresses on a layer-1 network or the total value locked (TVL) in a protocol, to gauge a blockchain's activity.
(iii) forensics and market surveillance data: relates to data used to identify and eradicate illicit activity, including use cases such as money laundering and market manipulation.
The following subsections explore each category in more detail.
"Clean and reliable crypto prices are essential to institutional investors and enterprises holding or trading digital assets, as they require independent and accurate tools to assess their custody. This is only the beginning, as [enterprise-grade price and reference] rates will pave the way to develop multi-asset indices tailored to crypto assets." - Ambre Soubiran, CEO of Kaiko (Kaiko Blog, October 2022)
Unlike traditional finance (TradFi) markets, where a designated handful of venues (e.g., NYSE, NASDAQ, CBOE, and CME in the United States) generate the vast majority of market data, dozens of centralized and decentralized exchanges generate digital asset market data. Sourcing and standardizing raw digital asset market data is complex and resource-intensive. Hence, digital asset data providers that handle the entire data value chain from procurement to refinement to delivery have emerged as mission-critical for financial institutions.
How Large is the Centralized Digital Asset Exchange Market?
The vast majority of digital asset trading still occurs on CEXes (e.g., Binance, Coinbase). Trading volumes on these CEXes have grown significantly in the past few years – on a quarterly basis, volumes have increased from ~$100 billion in Q1-19 to ~$1.6 trillion in Q4-22. While these volumes still pale in comparison to traditional spot equity exchanges such as the NYSE, digital asset exchange volumes have increased considerably across a fragmented landscape of ~30 venues.
The market data emanating from these exchanges can be further segmented into four major classes:
● Aggregated data is collected over different time intervals across venues. Examples of aggregated data include open, high, low, and close prices and volumes for a given interval (OHLCV), reference rates, and volume-weighted average prices (VWAP).
● Trading data consists of records of executed trades, including fields such as quantity, price, and timestamp. Trading data is critical for market analysis, profit and loss analysis, audit, and tax compliance.
● Order book data is the most granular form of data that can be used to analyze liquidity at different exchange venues. Smart order routers scan order book data before determining the optimal venue and path for executing a trade. Typically, order book data gives insight into the following variables:
○ Market depth represents the total value of bids and asks on either side of the current market price.
○ Bid-ask spread is the difference between the highest bid and lowest ask price.
○ Slippage is the difference between a trade's expected price and the actual price at which the trade is executed.
The storage of order book data is very resource-intensive. Thus, exchanges do not typically retain complete records of all order book data. However, for some financial institutions, such as quantitative trading funds and institutional liquidity providers, historical order book data is a critical input for investment decisions.
● Derivatives data consists of futures, options, and swap data. Examples of derivatives data include open interest (OI), volume, liquidations, funding rates, expiry-related data, and options-specific data such as put-call ratios, realized versus implied volatility, and options greeks.
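The aggregated and order book metrics described above can be computed from raw trades and book levels with a few lines of code. The sketch below uses made-up sample data and simplified tuple layouts; it assumes that the "expected price" for slippage is the best ask at the time of a market buy, which is one common convention rather than a universal definition.

```python
from typing import Dict, List, Tuple

Trade = Tuple[int, float, float]   # (timestamp, price, quantity)
Level = Tuple[float, float]        # (price, quantity) resting in the order book


def ohlcv(trades: List[Trade]) -> Dict[str, float]:
    """Open, high, low, close, and volume for all trades in one interval."""
    ordered = sorted(trades, key=lambda t: t[0])
    prices = [price for _, price, _ in ordered]
    return {
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
        "volume": sum(qty for _, _, qty in ordered),
    }


def vwap(trades: List[Trade]) -> float:
    """Volume-weighted average price: total notional divided by total volume."""
    notional = sum(price * qty for _, price, qty in trades)
    return notional / sum(qty for _, _, qty in trades)


def bid_ask_spread(bids: List[Level], asks: List[Level]) -> float:
    """Lowest ask minus highest bid."""
    return min(p for p, _ in asks) - max(p for p, _ in bids)


def market_depth(levels: List[Level]) -> float:
    """Total value (price * quantity) resting on one side of the book."""
    return sum(price * qty for price, qty in levels)


def buy_slippage(asks: List[Level], quantity: float) -> float:
    """Average fill price of a market buy minus the best ask (assumed expected price)."""
    remaining, cost = quantity, 0.0
    for price, size in sorted(asks):          # walk the book from the best ask up
        fill = min(remaining, size)
        cost += fill * price
        remaining -= fill
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("not enough liquidity to fill the order")
    return cost / quantity - min(p for p, _ in asks)


# Hypothetical sample data for one interval on one venue.
trades = [(1, 100.0, 2.0), (2, 102.0, 1.0), (3, 101.0, 1.0)]
bids = [(99.0, 5.0), (98.0, 10.0)]
asks = [(101.0, 1.0), (103.0, 4.0)]

candle = ohlcv(trades)
average_price = vwap(trades)
spread = bid_ask_spread(bids, asks)
bid_depth = market_depth(bids)
slippage = buy_slippage(asks, 2.0)
```

With this sample data, a market buy of 2 units exhausts the best ask and fills the remainder one level higher, which is the mechanism behind slippage on thin order books.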
"Accounting for every address, every wallet, is massively complex, and a financial institution would need to spend millions of dollars and invest years of time just to learn how to do this properly." – Shawn Douglass, CEO and co-founder at Amberdata (Hedgeweek Interview, August 2022)
The transparent and public nature of blockchain data is what differentiates the digital asset industry from its TradFi counterpart. With on-chain data, fundamental network metrics (e.g., account balances, DEX trading activity, fund flows, token supply, transaction fees, and yields) can all be monitored in real time and used as an input for decision making. In contrast to CEX data, which is stored off-chain in centralized databases, on-chain data is persistently generated and publicly available.
How is On-chain Data Used?
On-chain data has a plethora of applications including, but not limited to, the following use cases:
● Analyzing fundamental usage of different layer-1 and layer-2 networks. Metrics such as active addresses, on-chain value settled, transaction fees, and TVL provide a "look under the hood" of different networks.
● Analyzing the profitability of participating in DeFi protocols. Metrics such as nominal staking rewards, trading fees on DEXes, liquidity on DEXes, and collateralization ratios of lending protocols all provide insight into the opportunities and risks associated with participating in DeFi.
● Assessing the financial standing of digital asset intermediaries. While a formal framework for proof-of-reserves is still being developed, on-chain data can provide insight into the digital asset holdings and, by extension, the financial stability of centralized digital asset companies.
● Monitoring the fund flows of major digital asset market participants. On-chain data can be used to track the holding patterns of digital asset investment firms, miners, and the general digital asset investment community.
Why is Gathering and Streamlining On-chain Data so Complex?
Processing raw blockchain data into a format that institutions and investors are familiar with is a daunting task. Different blockchains have different data structures (e.g., Ethereum's account-based data structure vs. Bitcoin's unspent transaction output (UTXO) data structure), different execution engines (e.g., Ethereum's Ethereum Virtual Machine (EVM) vs. Solana's Sealevel runtime), and even different classifications of what constitutes a transaction. Gathering on-chain data and delivering it on a block-by-block or aggregated basis requires deep technical expertise, especially when this data is collected across heterogeneous blockchains.
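The difference between the account-based and UTXO data structures can be sketched in a few lines. The example below is a deliberately simplified illustration: it ignores fees, signatures, and scripts, uses integer units, and all names (`account_transfer`, `UTXO`, `utxo_transfer`) are hypothetical rather than taken from any client implementation. It shows why a "balance" is a stored value in one model and a derived sum in the other.

```python
from dataclasses import dataclass
from typing import Dict, List


# Account model (Ethereum-style): state maps each address to a balance.
def account_transfer(state: Dict[str, int], sender: str, recipient: str, amount: int) -> None:
    if state.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    state[sender] -= amount
    state[recipient] = state.get(recipient, 0) + amount


# UTXO model (Bitcoin-style): state is a set of unspent outputs.
@dataclass(frozen=True)
class UTXO:
    tx_id: str
    index: int
    owner: str
    amount: int


def utxo_balance(utxos: List[UTXO], owner: str) -> int:
    """A balance is not stored; it is the sum of the owner's unspent outputs."""
    return sum(u.amount for u in utxos if u.owner == owner)


def utxo_transfer(utxos: List[UTXO], tx_id: str, sender: str, recipient: str, amount: int) -> List[UTXO]:
    """Spend whole outputs of the sender, then create payment and change outputs."""
    selected, total = [], 0
    for u in utxos:
        if u.owner == sender and total < amount:
            selected.append(u)
            total += u.amount
    if total < amount:
        raise ValueError("insufficient funds")
    remaining = [u for u in utxos if u not in selected]
    remaining.append(UTXO(tx_id, 0, recipient, amount))
    if total > amount:
        remaining.append(UTXO(tx_id, 1, sender, total - amount))  # change output
    return remaining


accounts = {"alice": 5, "bob": 1}
account_transfer(accounts, "alice", "bob", 4)

utxos = [UTXO("tx0", 0, "alice", 3), UTXO("tx1", 0, "alice", 2), UTXO("tx2", 0, "bob", 1)]
utxos = utxo_transfer(utxos, "tx3", "alice", "bob", 4)
```

Both models end with the same balances, but a data provider has to reconstruct them differently: by reading state in the account model and by summing unspent outputs in the UTXO model.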
In order to understand the on-chain data-related offerings of various data providers, three major categories of on-chain data are worth considering:
● Network data comprises raw metrics related to layer-1 and layer-2 networks, including statistics such as active addresses, mempool data, transaction counts, transacted value, and calculations of the circulating supply of native tokens.
● DeFi data comprises metrics and analytics related to applications deployed on top of different layer-1 and layer-2 networks. Such data includes, among others, liquidity pool metrics, DEX trading volume, and DeFi lending statistics.
● NFT data comprises metrics such as NFT sales, mints, and secondary market trading activity.
On-chain network data is critical for understanding the fundamental health and functioning of a blockchain. It is useful for analyses that involve network economics, usage, supply, and miner or validator metrics. For more information regarding on-chain data, please refer to The Block Research's Digital Asset Data and Infrastructure: 2021 report.
"DeFi is a radically transparent system that allows you to have visibility and build telemetry into mechanisms that incentivize the behavior of market participants." - Shawn Douglass, CEO of Amberdata, (CoinDesk Webinar - Quantifying Opportunities and Risks in Liquidity Protocols, October 2022)
As of December 2022, $54 billion worth of value was locked across ~15 layer-1 and layer-2 blockchains.
DeFi protocols built on top of these layer-1 and layer-2 networks span a wide range of categories including spot and derivatives DEXes (e.g., Balancer, dYdX, Uniswap), DeFi lending platforms (e.g., Aave, Compound, Maker), and yield aggregators (e.g., Yearn Finance).
While the aggregate amount of financial activity in DeFi remains low relative to TradFi, DEXes have shown signs of sustained product-market fit and captured the attention of institutional trading firms. As displayed in the chart below, DEXes currently process ~12% of the trading volume that is processed by CEXes.
Accordingly, as will be discussed in Part 2 of this report, the raw amount of DEX data and the value which can be extracted from it has increased considerably over the past ~24 months.
While DeFi protocols are ushering in a new peer-to-peer financial paradigm, NFTs are redefining digital property rights for individuals and institutions alike. An NFT is a blockchain-based identifier (asset) that is verifiably unique and cannot be copied or substituted. NFTs hold great promise, for example in the realm of tokenizing real-world assets, but come with their own set of unique challenges. For example, given their non-fungible nature, NFTs are far more illiquid (akin to the art market) when compared to fungible tokens.
Over the past ~24 months NFTs have been employed to tokenize assets across a range of disciplines spanning art, collectibles, music, and gaming. While artists and content creators have been among the first to experiment with NFTs, institutions are well underway with integrating NFTs into their business models.
Several corporations have already begun doing so directly. For example, in December 2021 Nike acquired NFT startup RTFKT. As displayed in the chart below, the company has already generated ~$186 million of cumulative NFT revenue across primary market sales and secondary market royalties.
In conjunction with the secular growth of NFTs, the quantity and complexity of data surrounding them have increased meaningfully over the past ~24 months. The chart below provides one approximation of the growth of Ethereum's NFT ecosystem. It shows the number of ERC-721 and ERC-1155 contracts deployed on Ethereum, which increased ~500% year-over-year in 2022.
Notably, several other layer-1 networks have also emerged as popular venues for NFT commerce. For example, Solana saw $1.8 billion worth of NFTs trade on its platform in 2022. Unlike fungible tokens that typically follow standardized formats (e.g., ERC-20 for Ethereum-based tokens), NFT contracts lack standardization and can have several distinct properties. Therefore, generating standardized and high-fidelity NFT data across different collections and different blockchains has proven challenging and resource-intensive.
Nonetheless, several firms focused primarily on NFT data and analytics, such as DappRadar, CryptoSlam, NonFungible.com, and icy.tools, are pushing the pace of NFT data provision. These providers are discussed in Part 2 of this report.
Forensics and Market Surveillance Data
The open and permissionless nature of blockchain technology provides individuals with an alternative to centralized financial infrastructure. However, this permissionless nature also poses new risks and challenges for institutions required to comply with financial regulations which aim to identify and eradicate illicit behavior.
Accordingly, firms whose offerings facilitate transaction monitoring and market surveillance (and, by extension, help companies fulfill regulatory compliance requirements) have emerged as important players in the digital asset data landscape. This subsection provides insights into digital asset intelligence companies' use of blockchain data to uncover/track illicit activity and carry out market surveillance.
Digital Asset Data and Forensic Analysis
While the illicit use of digital assets is often sensationalized, it is nevertheless clear that financial institutions, law enforcement agencies, and regulatory bodies all require tools and data analyses to monitor the digital asset space. Public entities such as Europol, the FBI, and the IRS require capabilities to track and identify illicit actors.
Financial crime in digital assets is on the rise. According to data from blockchain analysis firm Chainalysis, the value of digital assets received by illicit addresses increased from $7.8 billion in 2020 to $14.0 billion in 2021. It is worth noting that despite this rise, the share of illicit transactions in the total transaction volume is decreasing, as displayed in the figure below.
The illicit use of digital assets can be manifold, ranging from being the payment rail of choice on dark net markets to terrorist financing. Based on new types of data available on blockchains, intelligence companies provide tailored insights that help navigate this relatively uncharted environment. The table below outlines key areas of illicit use of digital assets and how crypto intelligence companies help address these by using new tools and artificial intelligence, as well as the distinct advantages of (mostly pseudonymous) on-chain data.
High-Quality Granular Crypto Data Allow for New Approaches to Tackle Crime
In principle, on-chain data are: (i) constantly available in real time, (ii) highly granular, up to the individual transaction level, (iii) 'global' in nature as they contain the interactions of all network participants and are therefore not siloed, and (iv) available forever with the blockchain serving as a "universal source of truth".
Despite unique challenges stemming from chain-hopping techniques to disguise activity, chain outages, privacy tools, and pseudonymity, these aforementioned characteristics allow for the creation of market monitoring solutions that can be valuable for entities responsible for identifying and rooting out illicit activity.
It is important to note that most digital asset activity does not take place peer-to-peer or on decentralized protocols. The largest part of this activity is still carried out on CEXes. Therefore, to get a more complete picture of digital asset activity, the data from digital asset exchanges also must be taken into account. Since CEXes increasingly seek to be regulatory compliant, they require their customers to provide Know Your Customer (KYC) information to meet Anti-Money Laundering (AML) standards. Crypto intelligence companies also help them to screen customer transactions in real time and automatically flag high-risk customers or transactions. For example, a customer's funds may be frozen for further investigation if they are found to be associated with a blacklisted address.
Even if illicit actors are skilled at covering their tracks over time, a single mistake, such as funding wallets from an address that had previously submitted KYC documentation, can be enough for investigators to identify them and attribute, with high probability, all of their criminal activity. Accordingly, law enforcement agencies have had a number of successes in identifying illicit activity that would not have been possible without these novel data, techniques, and tools. From 2016 to the present, United States government spending with forensics-focused firms Chainalysis, CipherTrace, and Elliptic has increased from a negligible amount to over $50 million, cumulatively.
Digital Asset Data Open New Possibilities for Market Surveillance Tools
In addition to new tools that facilitate the identification, tracking, and eradication of criminal activity, other areas of the emerging cyber economy may benefit from new possibilities of automatic on-chain data analysis and market surveillance. One of these is regulation. Regulatory compliance could become much more efficient when using tools such as 'embedded supervision' (ES), which is defined as "a framework that lets compliance with regulatory goals be automatically monitored by reading the market's ledger, thus reducing the need for firms to collect, verify and deliver data." In October 2022, the European Commission put out a tender that closed on December 1, 2022, to study how to automatically monitor the Ethereum network through ES and collect regulatory data in real time.
Blockchain intelligence companies are well-positioned to develop such tools based on their know-how and existing tech stack for market surveillance. Novel approaches that make use of the rich data and infrastructure in the nascent crypto ecosystem also have the potential to significantly lower the cost of compliance, which can be exceedingly high. For example, the total projected cost of financial crime compliance in the United States and Canada for 2022 is a staggering $56.7 billion, up 13.6% from 2021.
Lastly, while digital asset intelligence and forensic companies are helpful in using high-quality on-chain data to efficiently implement micro-prudential regulation, they may also provide supervisors with tools to monitor systemic risks of the crypto ecosystem as a whole ('macro-prudential' surveillance). For example, interconnections and exposures between protocols could be monitored in real time with on-chain data. This allows supervisors to simulate how well the digital asset ecosystem is able to absorb financial shocks.
The following section provides an overview of the areas of digital asset infrastructure that are explored in this report.
(i) Staking-as-a-service firms provide off-the-shelf services that enable users and organizations to directly participate in securing different proof-of-stake (PoS) blockchain networks through validation.
(ii) Node-as-a-service firms provide access and maintenance to shared and dedicated nodes across layer-1 and layer-2 networks.
(iii) Digital Wallet providers provide software and hardware products that enable end users to interact with blockchains and submit transactions for approval.
Before diving into the specifics of each of the above categories, the following figure shows the simple web3 infrastructure stack from a bird's eye view.
Layer-1 blockchains sit at the base of the web3 infrastructure stack. They perform critical network functions such as achieving consensus on the validity of transactions and ensuring that blockchain data remains available. Broadly speaking, staking requirements and node architecture (i.e., infrastructure requirements) are largely a function of the technical design of underlying layer-1 and layer-2 networks. For example, whether a blockchain network is monolithic or modular impacts the computational resources required to stake (i.e., participate in validation) and operate node infrastructure. For more information regarding the network architecture and validator node requirements, please refer to The Block Research's Comparing Layer-1 Platforms: 2022 Edition report.
The term "staking" refers to locking up tokens in a smart contract to participate in blockchain networks via different methods such as block production, governance, and validation. So long as staked tokens originate from a sufficiently distributed base of token holders, the cost of attacking a PoS network (which can be viewed as a proxy for overall network security) increases with the amount of capital staked by a network's validator set. In return for locking up their capital, stakers are rewarded with native protocol tokens.
There are four widely recognized, though not mutually exclusive, avenues through which interested parties can participate in staking:
(i) Individual staking whereby users manage the staking process from end-to-end. This entails sourcing and maintaining computer hardware, running blockchain software, and depositing native tokens in smart contracts to meet network requirements.
(ii) Pure play staking firms are focused on providing individual and institutional users with staking-related services at their own facility or at co-located facilities. For example, staking firms such as Chorus One and Figment run node infrastructure on users' behalf.
(iii) CEXes which operate their own staking hardware or outsource staking to a pure-play staking firm. Although most CEXes stake assets on users' behalf, firms such as Coinbase allow individuals to access dedicated validator nodes.
(iv) Liquid staking protocols are community-owned and operated networks (e.g., Lido and Rocket Pool) that are governed by decentralized autonomous organizations (DAOs) and facilitate the issuance of liquid staking tokens.
With the exception of the first avenue of staking (individual staking), all other avenues typically involve outsourcing infrastructure operation to providers such as Amazon and Google Cloud. The decision to outsource infrastructure operation should be carefully considered, as it can make staking firms vulnerable to the practices of their related infrastructure providers. For example, Hetzner, a Germany-based cloud service provider, blocked all Solana network activity on its servers in November 2022, thus causing ~1,000 Solana validators to go offline.
Major Developments in the Staking Landscape
Since 2021, there have been two major developments in the staking market. Firstly, in September 2022, Ethereum moved from proof-of-work (PoW) to PoS consensus, making all ETH (collectively worth ~$145 billion as of 12/31/2022) eligible for staking.
As displayed in the chart below, the total market cap of PoS assets (which serves as a proxy for the total addressable market for staking-as-a-service providers) has increased by a factor of ten from ~$22 billion in December 2020 to $223 billion by December 2022.
Secondly, liquid staking derivatives created a new avenue for staking by allowing users to stake their assets, yet still "re-use" them in other DeFi activities. With liquid staking derivatives, DAOs and staking-as-a-service providers source and stake PoS assets (e.g., ETH) on behalf of their users and issue synthetic tokens (e.g., Lido's stETH) against these staked assets. Accordingly, users can deploy these synthetic tokens for yield-generating strategies (e.g., trading, lending, providing liquidity on DEXes, etc.) while still receiving staking rewards.
What Role do Staking-as-a-service Providers Play?
In the context of staking ETH on the Ethereum blockchain, staking-as-a-service providers address three major issues faced by users. Firstly, they eliminate the need for users to procure and operate computer hardware around the clock. Secondly, they eliminate staking minimums, by aggregating the ETH balances of many users into batches. In the absence of this pooling, an individual user would need to own or obtain 32 ETH (which translates to an upfront investment of ~$38,400 as of 12/31/2022) to participate in staking. Thirdly, as previously mentioned, liquid staking service providers allow users to generate staking rewards yet retain the ability to re-deploy staked assets into other investment strategies.
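The pooling mechanism described above can be sketched as simple arithmetic. The example below is illustrative only: the deposit amounts and user names are hypothetical, and actual staking providers and liquid staking protocols handle queuing, fees, and reward accounting in ways not modeled here. The 32 ETH validator size comes from the text; the ~$38,400 figure cited above implies a price of roughly $1,200 per ETH (38,400 / 32).

```python
from typing import Dict, Tuple

VALIDATOR_SIZE_ETH = 32  # minimum stake per Ethereum validator, as stated in the text


def pool_deposits(deposits: Dict[str, float]) -> Tuple[int, float, Dict[str, float]]:
    """Aggregate user deposits into 32 ETH batches.

    Returns the number of validators that can be funded, the ETH left over
    (not yet enough for another validator), and each user's share of the pool.
    """
    total = sum(deposits.values())
    validators = int(total // VALIDATOR_SIZE_ETH)
    leftover = total - validators * VALIDATOR_SIZE_ETH
    shares = {user: amount / total for user, amount in deposits.items()} if total else {}
    return validators, leftover, shares


# Hypothetical users, none of whom holds 32 ETH... except one who holds more.
deposits = {"user_a": 10.0, "user_b": 20.0, "user_c": 40.0}
validators, leftover, shares = pool_deposits(deposits)
```

In this example the three deposits total 70 ETH, which funds two validators and leaves 6 ETH waiting for further deposits; no individual user below the 32 ETH threshold could have staked alone.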
What Risks Come with Staking?
Benefits offered by staking service providers are not entirely without risks. For example, there is no guarantee that synthetic staking tokens (e.g., stETH) will trade at the same value as their underlying staked assets (e.g., ETH) on the secondary market. In the face of adverse market conditions and forced liquidations, staking derivatives have traded at steep discounts. For example, during the Three Arrows Capital (3AC) contagion in June 2022, a liquidity crunch pushed stETH to a discount of as much as 7% relative to ETH.
Furthermore, outsourcing staking operations to a third party can also result in custody risks should users no longer have signing power over their transactions. In the wake of digital asset intermediary bankruptcies (e.g., BlockFi, Celsius, FTX, Genesis), users should be well aware that there are significant risks associated with entrusting their assets to centralized counterparties.
Finally, staking-as-a-service provider downtime or adverse behavior could potentially result in a user's stake getting slashed. Individuals that employ staking-as-a-service firms entrust these firms to perform all of their respective duties in the networks they are active in. Should these firms fail to meet network staking requirements, their customers could suffer financial repercussions.
While infrastructure requirements can vary considerably across different layer-1 and layer-2 networks, all of these platforms rely on distributed networks of nodes. These nodes perform different functions such as i) processing and attesting to the validity of transactions, ii) storing transaction data, and iii) acting as interfaces to submit transactions and access their related data. For example, remote procedure call (RPC) nodes are essential for decentralized application (dApp) development since they allow for dApps to communicate with the blockchain.
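To make the RPC interaction concrete, the sketch below builds the JSON-RPC 2.0 request a dApp would send to an Ethereum node for the standard `eth_blockNumber` method and parses a reply. No network call is made: the response string is a canned stand-in for what a node would return, and the helper function names are illustrative.

```python
import json


def build_rpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body as a JSON string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params or [],
        "id": request_id,
    })


def parse_block_number(response_body):
    """Extract the block height from an eth_blockNumber response (a hex quantity)."""
    response = json.loads(response_body)
    if "error" in response:
        raise RuntimeError(response["error"])
    return int(response["result"], 16)


request_body = build_rpc_request("eth_blockNumber")
# Canned response standing in for what a node would return over HTTP.
sample_response = '{"jsonrpc": "2.0", "id": 1, "result": "0x10"}'
block_number = parse_block_number(sample_response)
print(block_number)  # 0x10 in hex is 16
```

In practice the request body would be sent via HTTP POST to a self-hosted node or a node-as-a-service endpoint.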
At the highest level, nodes can be subdivided into full nodes, which maintain all transaction records and have voting rights, and light nodes, which store and provide the necessary data to accommodate users' daily activities. Full nodes can be further broken down into several subcategories such as pruned full nodes and archival full nodes. By employing node-as-a-service firms, users ranging from dApp developers to institutional investors to data analytics firms can connect to a blockchain without the need to set up, run and maintain the necessary underlying node infrastructure, which can be very resource intensive.
Self-hosted Nodes vs. Node-as-a-Service
Every dApp needs to connect to a blockchain to send and receive data. It can connect to a blockchain in two ways:
Via self-hosted nodes – In this case, the dApp developer manages the end-to-end infrastructure of a full node capable of pushing transactions into a blockchain. Self-hosted nodes reduce latency and give developers more control over their infrastructure but require individual operators to possess significant technical expertise. Although self-hosting can theoretically be the most reliable method to ensure an uninterrupted connection to a blockchain, maintaining a full node that always functions optimally can be costly and time-consuming.
Via node-as-a-service providers – Developers can outsource node operations to third-party services to focus on application development. Node service providers can further be divided into i) centrally hosted node networks and ii) peer-to-peer node networks. Centrally hosted node networks typically rely on third-party cloud hosting companies such as Amazon and Google, which provide fully managed services to host nodes. Peer-to-peer (p2p) node networks can be used by developers who neither want to run their own node nor rely on a centrally-hosted node. Node providers such as Ankr and Pocket Network (discussed in Part 3 of this report) incentivize individuals or organizations to run full nodes for multiple blockchains.
Important Attributes of Node-as-a-Service Providers
Node-as-a-service providers compete across the following dimensions to meet the needs of their customers:
- Blockchain and node support – Node-as-a-service providers usually specialize in a set of blockchains. Users must select providers that offer services on the required blockchains. Different users may require different types of nodes depending on their performance needs. Shared nodes are used by multiple customers simultaneously, whereas dedicated nodes are used exclusively by a single user. Scaling an application with a single shared node can prove difficult depending on bandwidth constraints. Users with very high performance and stability requirements may also use node clusters, which usually have at least two dedicated nodes that help with failover protection and load balancing (both described in more detail below).
- Customer Support – Node providers must have support teams with enough personnel distributed across different time zones to ensure that queries can be answered expediently. Most node-as-a-service companies have dedicated channels on applications such as Discord, Slack, and Telegram to handle such queries.
- Data Accuracy – Since blockchains operate in a decentralized fashion, there is a risk that data can be inaccurate. For example, users may not see transactions in their wallets. Such problems can arise because of a node's inability to pull accurate data due to, for example, traditional load balancing that ineffectively routes traffic. Benchmarking tools like Alchemy's Data Accuracy Benchmark can help compare the data accuracy of different node service providers.
- Decentralization – Decentralization of nodes can be important for an application to maintain the redundancy needed to target 100% uptime and, as argued by purists, censorship resistance. Node decentralization can be further broken down into decentralization vectors such as blockchain client diversity, cloud provider diversity, and geographical diversity.
- Failover Protection – Failover protection is an operational mode that automatically switches a network to a redundant server in the event of a system failure. It is used to prevent excessive loss of data and downtime from system failures in blockchain and 'web2' servers. Failover protection is critical for blockchains because they must be constantly accessible, and for node operators because it helps them avoid penalties for violating consensus rules around a node's temporary downtime. For example, Ethereum validators may be slashed if they sign conflicting messages for the same slot, which can happen if a node goes offline and comes back online while a backup instance is already signing with the same keys.
- Response Time – Response time is the time it takes for a node to respond to a request. It can be a critical consideration for users that run time-sensitive operations, such as traders, when choosing a provider.
- Scalability – Scalability is one of the most critical qualities of a node provider. Node scalability can be affected by two factors, i) load balancing and ii) autoscaling. dApps without load balancers rely entirely on a single node's performance. Load balancing is a scaling mechanism that directs client requests to nodes with the lowest workload in order to prevent a single node from being overwhelmed. Autoscaling refers to an automated mechanism that increases or decreases the resources allocated to a network based on its usage patterns. This allows blockchains to provide consistent performance for users in periods of high demand while reducing costs when they experience low traffic.
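The load balancing, failover, and autoscaling mechanisms described in the list above can be sketched together. This is a toy model under stated assumptions, not how any particular provider routes traffic: the endpoint names, the `transport` callable, and the per-node capacity figures are all hypothetical.

```python
import math


class NodePool:
    """Toy router over a set of node endpoints (names are hypothetical)."""

    def __init__(self, endpoints):
        self.in_flight = {endpoint: 0 for endpoint in endpoints}
        self.healthy = set(endpoints)

    def pick(self):
        """Load balancing: choose the healthy endpoint with the fewest in-flight requests."""
        candidates = [e for e in self.in_flight if e in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy nodes available")
        return min(candidates, key=lambda e: self.in_flight[e])

    def send(self, request, transport):
        """Failover: on a connection error, mark the node unhealthy and retry on another."""
        while True:
            endpoint = self.pick()
            self.in_flight[endpoint] += 1
            try:
                return endpoint, transport(endpoint, request)
            except ConnectionError:
                self.healthy.discard(endpoint)
            finally:
                self.in_flight[endpoint] -= 1


def desired_node_count(requests_per_second, capacity_per_node, min_nodes=1, max_nodes=10):
    """Autoscaling: size the pool to current demand, within fixed bounds."""
    needed = math.ceil(requests_per_second / capacity_per_node)
    return max(min_nodes, min(max_nodes, needed))


def fake_transport(endpoint, request):
    """Stand-in for a network call; simulates an outage on node-a."""
    if endpoint == "node-a":
        raise ConnectionError("node-a is down")
    return f"{endpoint} handled {request}"


pool = NodePool(["node-a", "node-b"])
served_by, reply = pool.send("eth_blockNumber", fake_transport)
print(served_by)                     # node-b, after failing over from node-a
print(desired_node_count(450, 100))  # 5 nodes for 450 req/s at 100 req/s each
```

A production setup would also re-check unhealthy nodes periodically and, for validators, guard against two instances signing with the same keys.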
Wallets are user-facing applications that allow users to send transactions to blockchain nodes. Wallet types span non-custodial wallets, such as hardware and browser wallets, and custodial wallets, such as exchange wallets. At their core, wallets are a set of keys: a private key and a public key. The private key proves ownership of the digital assets associated with the public key, while the public key maps to a public address that receives transactions. Non-custodial wallets require users to manage their keys, while custodial wallets offload private key management and its responsibility to entities like custodians and CEXes. This report focuses on 'off-the-shelf' wallet solutions and their providers. Readers interested in 'Wallet-as-a-service', which provides institutional customers with tailored solutions, are encouraged to refer to last year's data and infrastructure report.
Given the hundreds of billions of dollars worth of client assets on their platforms, CEXes such as Binance and Coinbase account for a large share of balances held in custodial wallets. Additionally, institutional custody firms such as BitGo, Fireblocks, and Copper have emerged as popular service providers that have won the trust of many institutions. While it is difficult to pinpoint the total dollar value of all digital assets under custody, the value of assets secured by institutional custodians (e.g., BitGo, Fireblocks, Copper) is estimated to be over $200 billion. Interested readers can dive deeper into custody solutions in The Block's Institutional Custody for Digital Assets Primer.
Non-custodial Wallet Providers
Non-custodial wallet providers allow users to take custody of their private keys. Self-custody removes the counterparty risk involved in entrusting funds with a third party, which has once again proved to be critical in the wake of the collapse of FTX. But it also presents its own set of risks, such as private keys getting lost or stolen and hardware wallets getting damaged.
Wallets can be divided into two broad categories – hot wallets and cold wallets. While both hot and cold wallet solutions are available in custodial and non-custodial implementations, this report discusses them in the context of non-custodial implementations.
Hot wallets such as MetaMask, Trust Wallet, and Exodus allow users to take custody of their private keys through digital interfaces. These hot wallets are connected to the internet 24/7, which allows for quick access to funds for on-chain traders and exchanges processing customer transactions and withdrawals. This accessibility, however, comes at the cost of increased exposure to malicious actors. Hot wallets are vulnerable to attack techniques, including phishing attacks, clipboard malware, keyloggers, and man-in-the-middle attacks.
Cold wallets, though they can be as primitive as private keys written on a physical sheet of paper, typically employ small hardware devices to store users' private keys offline. These cold wallets are generally more secure than hot wallets because they are only connected to the internet when they are in use. Even when connected to the internet, cold wallets typically sign transactions within their respective devices and broadcast only the resulting signature via the internet, meaning that the wallet's private keys are never exposed to the internet. Although cold wallets can still be subject to man-in-the-middle attacks, on-device signing makes them extremely difficult to compromise. This security advantage, however, translates to a disadvantage in terms of the speed of transaction execution.
Extra layers of security add friction to executing transactions, which makes cold wallets less suitable for active trading strategies. For example, the slower execution flow typical of cold wallets may result in failed transactions on automated market makers (AMMs) if the price moves outside the predetermined range before the transaction is picked up.
In addition to the distinctions between custodial and non-custodial wallets and between hot and cold wallets, the Ethereum Virtual Machine (EVM) supports both externally-owned account (EOA)-based wallets and smart contract-based wallets.
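The failed-transaction scenario above can be illustrated with a simplified constant-product pool. This sketch assumes an x * y = k pricing rule with no fees, a 0.5% slippage tolerance, and hypothetical ETH/USDC reserves; none of these parameters come from the report, and real AMMs differ in detail.

```python
def swap_output(reserve_in, reserve_out, amount_in):
    """Output of a constant-product (x * y = k) swap, ignoring fees."""
    return reserve_out * amount_in / (reserve_in + amount_in)


def execute_swap(reserve_in, reserve_out, amount_in, min_amount_out):
    """Fill the swap only if the output still meets the user's minimum."""
    amount_out = swap_output(reserve_in, reserve_out, amount_in)
    if amount_out < min_amount_out:
        raise RuntimeError("swap reverted: output below min_amount_out")
    return amount_out


# Quote taken when the user starts signing: 1 ETH into a 1,000 ETH / 1,200,000 USDC pool.
quoted = swap_output(1_000.0, 1_200_000.0, 1.0)
min_out = quoted * (1 - 0.005)  # assumed 0.5% slippage tolerance

# While the user confirms on a hardware device, another trader sells 10 ETH into the pool.
moved_in = 1_010.0
moved_out = 1_000.0 * 1_200_000.0 / moved_in  # reserves after that trade (k unchanged)

try:
    execute_swap(moved_in, moved_out, 1.0, min_out)
    outcome = "filled"
except RuntimeError:
    outcome = "reverted"
print(outcome)
```

The longer the gap between quoting and broadcasting, the more likely the pool has moved past the tolerance the user signed for.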
EOA-Based Wallets vs. Smart Contract Wallets
Ethereum Virtual Machine (EVM) compatible chains have two types of accounts: EOAs and smart contract accounts. EOAs are controlled by whoever holds the corresponding private keys, whereas smart contract accounts are controlled by code. Smart contracts open up a new design space for wallets and can provide a more secure user experience than EOA-based wallets. Smart contract wallets allow native functionalities like social recovery, multisig security, multi-factor authentication, whitelisting contracts and addresses, bundled transactions, daily transaction limits, and emergency account freezing.
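Because a smart contract account is controlled by code, several of these features reduce to checks that run before a transfer is authorized. The sketch below models that logic off-chain in Python for illustration only. The class, owner names, and recipient label are hypothetical, signature verification is replaced by a simple set of approver names, and daily-limit rollover is omitted.

```python
class SmartWalletSketch:
    """Off-chain model of checks a smart contract wallet might enforce."""

    def __init__(self, owners, threshold, daily_limit):
        self.owners = set(owners)
        if not 1 <= threshold <= len(self.owners):
            raise ValueError("threshold must be between 1 and the number of owners")
        self.threshold = threshold
        self.daily_limit = daily_limit
        self.spent_today = 0
        self.frozen = False
        self.allowlist = set()

    def authorize(self, recipient, amount, approvals):
        """Return True and record the spend only if every policy check passes."""
        if self.frozen:                       # emergency account freeze
            return False
        if recipient not in self.allowlist:   # whitelisted addresses only
            return False
        if len(self.owners & set(approvals)) < self.threshold:  # multisig
            return False
        if self.spent_today + amount > self.daily_limit:        # daily limit
            return False
        self.spent_today += amount
        return True


wallet = SmartWalletSketch({"alice", "bob", "carol"}, threshold=2, daily_limit=10)
wallet.allowlist.add("dex-router")  # hypothetical recipient label

first = wallet.authorize("dex-router", 6, {"alice", "bob"})    # passes all checks
second = wallet.authorize("dex-router", 6, {"alice", "bob"})   # would exceed daily limit
solo = wallet.authorize("dex-router", 1, {"alice"})            # only one approval
wallet.frozen = True
after_freeze = wallet.authorize("dex-router", 1, {"alice", "bob"})
print(first, second, solo, after_freeze)
```

An EOA has no equivalent hook: whoever holds the private key can move funds without any of these checks.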