[DataShare Insight] The Hidden Cost of Maintaining Onchain Data Infrastructure

[DataShare Insight] The Hidden Cost of Maintaining Onchain Data Infrastructure

Why exchanges are separating data ownership from infrastructure ownership?

TL;DR

  • Blockchain data infrastructure is becoming a utility, not a differentiator. Exchanges compete on liquidity, compliance, and product. Not on who runs the best ETL pipeline.
  • The real maintenance burden is not a single salary. It is platform engineering, observability, incident response, schema management, and continuous chain adaptation. It compounds over time.
  • Data ownership and infrastructure ownership are not the same thing. You can retain full control of your data without operating the systems that prepare it.
  • The exchanges that moved on asked a different question: not "how do we build this" but "should we still be the ones running it?"
  • Nodit DataShare delivers verified, structured onchain data directly into your warehouse — your storage, your access controls, your governance.

Accessing blockchain data has become easier than ever. Maintaining the infrastructure required to operationalize that data has not. As the number of supported chains grows and regulatory expectations expand, centralized exchanges are discovering that the real challenge is not data access — it is infrastructure ownership.

Not every infrastructure needs to be owned

Most exchanges do not run their own cloud infrastructure. Most do not maintain their own database engines. Most do not build their own observability platforms from scratch.

Not because these systems are unimportant — because they are infrastructure. The value comes from using them, not from operating them.

Organizations rarely build their own databases, message queues, or monitoring systems today. Companies like Snowflake, Confluent, MongoDB Atlas, Datadog, and Elastic Cloud exist because operating these layers efficiently requires dedicated expertise, and also because the engineering capacity spent on infrastructure operations is engineering capacity not spent on product differentiation.

Blockchain data infrastructure is becoming a utility, not a differentiator. Exchanges compete on liquidity, compliance posture, execution quality, and customer experience. Not on who maintains the best blockchain ETL pipeline.

The question is no longer whether exchanges need blockchain data infrastructure. The question is whether operating it internally remains the right decision.

Most exchanges are solving the wrong problem

At first, the objective seems straightforward: collect blockchain data, store it, and make it available for internal teams.

Then the use case grows.

Every new token listing requires data that someone has to prepare from scratch. Every AML investigation starts with data that needs to be reconstructed. Every reconciliation issue eventually traces back to a data problem.

What started as a pipeline becomes a platform. A platform becomes an operational responsibility. And that responsibility compounds — new chains require new parsers, schema changes break existing transformations, historical backfills consume weeks of engineering effort.

“Why not just keep using APIs?”

APIs solve request-time access. Warehouses solve organization-wide access. They address fundamentally different problems — and the moment an exchange needs multiple teams querying the same onchain data, joining it with internal systems, and running it on extended time windows, the API alone is no longer sufficient.

The challenge is no longer data access. It is the ongoing maintenance burden of the infrastructure behind it.

The real issue: data ownership vs infrastructure ownership

For exchanges, onchain data is no longer difficult to access. The challenge is what happens after: who builds the pipeline, who maintains it, and who absorbs the cost when something breaks.

There is an important distinction worth making:

  1. Data ownership

The right to control, query, and govern your data — is something every exchange should retain.

  1. Infrastructure ownership

The operational responsibility of running the systems that collect, validate, and deliver that data — is a separate question entirely.

Most exchanges conflate the two. The assumption is that to own your data, you must own the infrastructure behind it. That assumption is worth examining.

What these job descriptions from exchanges actually tell us

Recent hiring activity from Binance and Coinbase illustrates how much operational investment leading exchanges dedicate to maintaining large-scale blockchain data platforms.

Binance is recruiting Senior Data Warehouse Engineers covering warehouse architecture, real-time pipeline monitoring, multi-chain data quality management, and onchain troubleshooting — including Solana historical backfill. Coinbase is hiring Senior Software Engineers for its Data Platform team, covering cloud warehouse and data lake operations, pipeline development across Kafka, Spark, and Airflow, data quality tracking, lineage, and internal SDK development.

These are not roles focused exclusively on blockchain data. They own warehouse architecture, batch pipelines, internal BI, streaming, governance, and platform operations broadly. Blockchain data ingestion is one component of a much larger operational footprint.

Many of the responsibilities in these roles revolve around preparing blockchain data for internal consumption — collecting it, validating it, transforming it, monitoring it, and making it available to downstream teams. This is precisely the infrastructure layer that warehouse-native blockchain data delivery aims to eliminate.

The real cost is not a single salary. It is the cumulative burden across the full platform:

The build is not the problem. The maintenance is.

Most exchanges that built their own onchain pipelines describe the same experience: the initial build was manageable. What followed was not.

Chain upgrades break parsers. Solana's transaction structure evolves without warning — failed transactions, compute unit fees, and shifting instruction formats produce edge cases that standard parsers cannot reliably handle. Ethereum reorgs happen. TRC-20 transfers surface unexpected edge cases. None of this ends at deployment — it resurfaces in every sprint, competes for engineering bandwidth, and grows alongside the business.

The longer the infrastructure exists, the more stakeholders depend on it. The more stakeholders depend on it, the more expensive the maintenance burden becomes.

Why now: The Industry Is Reaching an Inflection Point

Five years ago, most exchanges supported only a handful of networks. Today they manage dozens of chains, Layer 2s, fiat-backed stablecoins, staking assets, and protocol-specific data models — each with its own ingestion requirements, schema conventions, and edge cases.

At the same time, regulatory expectations around AML, Travel Rule compliance, and operational transparency continue to expand. Compliance teams need faster access to more complete data. Finance teams need reconciliation that runs on current chain data. Risk teams need listing due diligence that is auditable and repeatable.

Infrastructure complexity is growing faster than engineering capacity. That gap is where the maintenance burden becomes a strategic question, not just a technical one.

Why specialization matters

Every exchange will eventually need a strategy for delivering trusted blockchain data into its warehouse. There are multiple ways to do that — building internal pipelines, using public datasets from sources like BigQuery or Dune, working with providers like Allium or Goldsky, or combining approaches.

The question is not which approach is theoretically correct. It is which approach best fits the operational reality: the chains you support, the teams that depend on the data, the governance requirements you operate under, and the engineering capacity available to maintain it.

An increasing number of exchanges are choosing to externalize the infrastructure layer — not because they cannot build it, but because maintaining it competes with work that actually differentiates their product.

Blockchain data infrastructure is not Nodit's side business. It is the product. Nodit operates its own nodes, maintains indexing and validation layers, handles chain-specific edge cases including Solana's instruction-level complexity, and manages schema evolution as a core engineering function. While exchanges focus on trading, compliance, custody, and growth, that entire infrastructure layer is what Nodit focuses on exclusively.

The concern every technical leader has

Most CTOs and data leads ask the same question:

"If we stop owning the pipeline, what do we lose?" 

It is a fair concern, and one worth answering directly.

Do we lose control of our data?

No. Data is delivered to your own storage environment — your S3 bucket, your GCS bucket, your Cloudflare R2. Your keys, your access controls, your retention policies.

Can we still access raw data?

Yes. Bronze tier delivers raw, decoded blockchain data. You can build on top of it the same way you would with any internal dataset.

Can we build custom models on top?

Yes. DataShare delivers to Snowflake, BigQuery, Databricks, and Redshift. Your analytics stack, your data models, your governance.

What if we need custom datasets?

Nodit supports custom dataset requirements beyond the standard tier structure. That conversation starts with your specific chain coverage and schema needs.

What if Nodit's infrastructure goes down?

Data already delivered to your warehouse remains yours. You are not operating in a shared environment — the data lands in your storage, and your downstream systems continue to function.

The goal is not to outsource data ownership. The goal is to separate data ownership from infrastructure maintenance burden.

From API to warehouse — how the need expands

Most exchanges are already using onchain data. FDS and compliance teams query transactions through Web3 Data APIs — single calls, real-time lookups, point-in-time checks. That works well for what it is designed to do.

The limitation appears when the use case grows. When Finance needs six months of settlement history joined against internal ledger records, a single API call is not the right tool. When the Listing team wants to analyze holder concentration across a new token's full transaction history, or when Product wants to correlate onchain deposit patterns with internal campaign data — the need shifts from querying individual records to operating on structured datasets at scale.

This is where exchanges begin looking beyond APIs. They need warehouse-native blockchain datasets that are already structured, validated, and continuously maintained. The operating model changes from querying blockchain data to treating blockchain data as enterprise data — the same way finance teams treat ERP data, or compliance teams treat KYC records.

This is the model behind Nodit DataShare — not a replacement for the Web3 Data API, but the natural next step when onchain data needs to live inside the warehouse, be joined with internal systems, and be available to multiple teams without going through engineering every time.

How exchanges use DataShare in practice

Separating data ownership from infrastructure maintenance burden is not a theoretical concept. Inside exchanges, it shows up in workflows where blockchain data needs to be combined with internal systems, queried at scale, and shared across multiple teams.

Settlement and reconciliation — Finance / Ops

Deposit and withdrawal records, fee breakdowns, failed and reprocessed transactions, and chain-specific settlement data connect directly with internal ledger systems. Long-term history is queryable without pulling individual API calls. Reconciliation runs automatically as scheduled queries, and the finance team stops waiting on engineering for routine reporting.

Compliance and AML

Onchain transaction data — including erc20_transferstrc20_transfersxrp_balance_changes, and decoded Solana instructions — is delivered directly into the warehouse and queryable across extended time windows. Compliance teams can run monitoring, investigation, and wallet exposure analysis on structured, validated datasets without managing a separate ingestion pipeline. Audit trails are structured for regulatory review at the point of delivery, not assembled after the fact. For exchanges already using the Web3 Data API for real-time FDS checks, DataShare extends that capability into long-term, warehouse-native compliance workflows.

Token listing due diligence — Listing / Risk

Before a token goes live, holder distribution, transfer history, contract activity, liquidity depth, address concentration, and anomalous transaction patterns are queryable directly from the warehouse. Listing decisions that previously relied on manual extraction become standardized, repeatable, and auditable.

Wallet and asset tracking — Treasury / Wallet Ops

Exchange-held wallets — hot, cold, and chain-specific — are trackable across balance snapshots, asset flows, and point-in-time positions. Treasury teams get a continuous view of asset exposure across chains without manually aggregating data from separate sources. Historical snapshots are available from day one.

Product analytics — Product / Growth

Onchain deposit and withdrawal patterns, chain-level usage, token inflows and outflows, and post-campaign onchain behavior are analyzable directly from the warehouse. Because the data lands in the exchange's own storage environment, internal user identification data can be joined securely without exposing it to external systems — the analysis stays within existing governance controls.

What sets Nodit DataShare apart — a verified data layer that covers Solana

Nodit runs its own nodes. Data provenance starts at the source, not at a third-party RPC endpoint. Before anything reaches the warehouse, it goes through column-level validity checks, table-level consistency checks, and cross-table relationship validation. What arrives is already verified against the source chain.

The tier structure:

  • Bronze — Raw, decoded blockchain data
  • Silver — Structured, enriched datasets ready for direct analysis
  • Gold — Protocol-specific tables: token analytics, DEX activity, balance changes, cross-chain fiat-backed stablecoin flows

Data goes directly to the exchange's own storage — AWS S3, Google Cloud Storage, or Cloudflare R2 — then loads into Snowflake, BigQuery, Databricks, or Redshift with no additional transformation. SOC 2 Type 2 aligned, with audit-ready data lineage and access controls. Delivery within 3 minutes of block confirmation.

Solana is where the complexity of infrastructure maintenance becomes most visible. Failed transactions, compute unit fees, and shifting instruction formats produce edge cases that break standard parsers with regularity. Most providers treat Solana as a limitation. Nodit DataShare includes Solana datasets in beta, processed through the same validation pipeline as every other supported chain.

For exchanges with Solana exposure across listings, settlements, and AML workflows, this is an operational requirement, not a feature addition.

Read more:

Solana DataShare Beta Launch: Free exports available During Beta
TL;DR * Datashare Beta version allows data teams to evaluate blockchain datasets directly within their own storage environment before committing to enterprise deployment. * Each account receives 3 free exports, with each export covering one dataset on one chain. * The Solana datasets currently include Solana Token Transfers (Nodit Exclusive), Solana Supply

One question worth sitting with

Reliable blockchain data has become a foundational capability for every centralized exchange. The question is no longer whether to invest in it, but how that capability should be operated over the long term.

For many exchanges, maintaining blockchain data infrastructure has gradually evolved from an engineering project into a permanent operational function. As that responsibility grows, so does the opportunity cost. Every sprint spent maintaining ingestion pipelines, adapting to protocol changes, or validating historical datasets is a sprint not spent improving products, strengthening compliance workflows, or enhancing the customer experience.

This is why more engineering leaders are beginning to separate data ownership from infrastructure ownership. They continue to own their data, governance, and analytics, while rethinking whether operating the underlying infrastructure remains a strategic advantage.

Exchanges should own their data. They should not have to own the infrastructure required to prepare it.

This article is the first in the DataShare Insight series, exploring how exchanges, payment providers, custodians, and asset managers are modernizing blockchain data infrastructure across compliance, settlement, treasury, and business intelligence.

Subscribe to the DataShare Insight series to receive the next piece before it is published.

If your team is currently maintaining blockchain data pipelines, let's compare operating models. We will show how major exchanges are separating strategic data ownership from infrastructure ownership, and where DataShare fits into that model. 

Last but not least! We will also be at Solana Breakpoint 2026 in November to share more about the Datashare, stay tuned for more details!


🔎About Nodit

Nodit is an enterprise-grade Web3 platform that provides reliable node and consistent data infrastructure to support the scaling of decentralized applications in a multi chain environment. The core technology of Nodit is a robust data pipeline that performs the crawling, indexing, storing, and processing of blockchain data, along with a dependable node operation service. Through its new Validator as a Service (VaaS) offering, Nodit delivers secure, transparent, and compliant validator operations that ensure stability, performance visibility, and regulatory assurance.

By utilizing processed blockchain data, developers and enterprises can achieve seamless on chain and off chain integration, advanced analytics, comprehensive visualization, and artificial intelligence modeling to build outstanding Web3 products.

Homepage l X (Twitter) l Linkedin

Read more