The Evolution of Blockchain Data Indexing: From Nodes to AI-Enabled Full-Chain Data Services

1. Introduction

Since the first batch of decentralized applications (dApps) launched in 2017, the blockchain ecosystem has flourished, with dApps of every kind appearing on different public chains. Yet when we discuss these applications, how often do we ask where the data they rely on actually comes from?

In 2024, artificial intelligence and Web3 have become hot topics. In the field of AI, data is like the source of life for the growth and evolution of intelligent systems. Just as plants need sunlight and moisture to thrive, AI systems require massive amounts of data to continuously "learn" and "think". Without data support, even the most sophisticated AI algorithms struggle to exhibit their intended intelligence and effectiveness.

This article traces the evolution of blockchain data indexing from the perspective of data accessibility. It compares traditional data indexing protocols with emerging blockchain data service protocols in terms of data services and product architecture, with a particular focus on the innovations introduced by new protocols that integrate AI.

2. Complexity and Simplicity of Data Indexing: From Blockchain Node to Full-Chain Database

2.1 Data Source: Blockchain Node

At its core, a blockchain is a decentralized distributed ledger. Nodes are the foundation of the network, responsible for recording, storing, and propagating all on-chain transaction data. Each full node keeps a complete copy of the chain's data, which is what preserves the network's decentralization. For ordinary users, however, building and maintaining a node is not easy: it requires specialist skills and carries high hardware and bandwidth costs. The query capability of an ordinary node is also limited, so it rarely returns data in the formats developers need. In theory anyone can run a node, but in practice most users rely on third-party services.

To solve this problem, RPC (remote procedure call) node providers have emerged. These providers bear the cost and operational burden of running nodes and expose data to users through RPC endpoints, so users can access blockchain data without building their own nodes. Public RPC endpoints are free, but they are rate limited, which can degrade the user experience of dApps. Private RPC endpoints perform better, yet they remain inefficient for complex queries and offer limited scalability and cross-network compatibility. Nevertheless, the standardized API interfaces of node providers have lowered the barrier to accessing on-chain data and laid the foundation for subsequent data parsing and applications.
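As a rough illustration of what an RPC endpoint exposes, the sketch below builds a standard Ethereum JSON-RPC request for `eth_getBlockByNumber` and decodes the hex-encoded fields of a response. The endpoint URL is a placeholder and the sample response is hand-written for illustration; no network call is made.

```python
import json

# Placeholder endpoint (assumption); a real provider URL would go here.
RPC_URL = "https://rpc.example.invalid"


def build_request(method, params, request_id=1):
    """Return the JSON-RPC 2.0 body for a single call."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def hex_to_int(value):
    """Ethereum JSON-RPC encodes quantities as 0x-prefixed hex strings."""
    return int(value, 16)


# Ask for block 68943 without full transaction objects.
payload = build_request("eth_getBlockByNumber", [hex(68943), False])
body = json.dumps(payload)  # this string would be POSTed to RPC_URL

# Hand-written sample response, trimmed to two fields for illustration.
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"number": "0x10d4f", "timestamp": "0x55c9ea07"},
}
block_number = hex_to_int(sample_response["result"]["number"])
timestamp = hex_to_int(sample_response["result"]["timestamp"])
```

Even this simple call returns quantities as hex strings, which hints at why a separate parsing step is needed before the data is usable.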

2.2 Data Parsing: From Raw Data to Usable Data

The data obtained from blockchain nodes is usually raw data that has been serialized and encoded, with hashes and signatures attached. This protects the integrity and security of the blockchain, but it also makes the data harder to parse. For ordinary users or developers, handling this raw data directly requires significant technical knowledge and computational resources.

In this context, the data parsing process becomes particularly important. By converting complex raw data into a format that is easier to understand and manipulate, users can make more intuitive use of this data. The quality of data parsing directly affects the efficiency and effectiveness of blockchain data applications, making it a key link in the entire data indexing process.
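To make the parsing step concrete, here is a minimal sketch that decodes a raw ERC-20 `Transfer` event log into a readable record. The topic hash is the standard one for `Transfer(address,address,uint256)`; the sample log, the addresses in it, and the assumed 18 decimals are made up for illustration.

```python
# keccak256("Transfer(address,address,uint256)"), the standard ERC-20 Transfer topic.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic):
    """An indexed address is left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:]


def decode_transfer(log, decimals=18):
    """Turn a raw log dict into a readable record, or None for other events."""
    if log["topics"][0] != TRANSFER_TOPIC:
        return None
    raw_amount = int(log["data"], 16)
    return {
        "from": topic_to_address(log["topics"][1]),
        "to": topic_to_address(log["topics"][2]),
        "amount": raw_amount / 10 ** decimals,
    }


# Hand-written sample log; the addresses are invented for illustration.
raw_log = {
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "11" * 20,
        "0x" + "00" * 12 + "22" * 20,
    ],
    "data": "0x" + format(5 * 10 ** 18, "064x"),
}
transfer = decode_transfer(raw_log)
```

A production parser would read the event layout from the contract's ABI rather than hard-coding it, but the principle is the same: fixed-width hex fields are mapped back to typed, named values.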

2.3 The Evolution of Data Indexers

As the volume of blockchain data grows, so does the demand for data indexers. Indexers organize on-chain data and load it into databases for querying. They work by indexing blockchain data and making it available on demand through a structured query language (for example, via a GraphQL API). By providing a unified query interface, indexers let developers retrieve the information they need quickly and accurately using a standardized query language, which greatly simplifies the process.
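The core idea can be sketched with Python's built-in `sqlite3` module: decoded events are loaded into a table, a database index is built on the column that queries filter by, and applications then ask questions in a query language instead of scanning blocks. The table layout and sample events are assumptions made for this illustration, not the schema of any particular indexer.

```python
import sqlite3

# Hand-written decoded events standing in for parsed on-chain data.
events = [
    {"block": 100, "token": "AAA", "sender": "0xa1", "receiver": "0xb2", "amount": 5.0},
    {"block": 101, "token": "AAA", "sender": "0xb2", "receiver": "0xc3", "amount": 2.5},
    {"block": 101, "token": "BBB", "sender": "0xa1", "receiver": "0xc3", "amount": 7.0},
]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE transfers "
    "(block INTEGER, token TEXT, sender TEXT, receiver TEXT, amount REAL)"
)
# The index on receiver is what makes per-address lookups cheap.
db.execute("CREATE INDEX idx_receiver ON transfers (receiver)")
db.executemany(
    "INSERT INTO transfers VALUES (:block, :token, :sender, :receiver, :amount)",
    events,
)


def received_by(address):
    """Total amount received per token for one address."""
    rows = db.execute(
        "SELECT token, SUM(amount) FROM transfers "
        "WHERE receiver = ? GROUP BY token ORDER BY token",
        (address,),
    )
    return dict(rows.fetchall())


totals = received_by("0xc3")
```

Answering the same question through a plain RPC endpoint would mean fetching and decoding every relevant block on each request, which is the inefficiency indexers are meant to remove.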

Different types of indexers use various methods to optimize data retrieval:

  1. Full Node Indexer: Extracts data directly from full Blockchain nodes to ensure data completeness and accuracy, but requires a large amount of storage and processing power.

  2. Lightweight Indexer: Relies on full nodes to retrieve specific data on demand, reducing storage requirements but potentially increasing query time.

  3. Specialized Indexer: Optimized for specific types of data or specific Blockchains, such as NFT data or DeFi transactions.

  4. Aggregated Indexer: Extracts data from multiple Blockchains and sources (including off-chain information), providing a unified query interface, which is particularly useful for multi-chain dApps.
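As one possible reading of the aggregated indexer described above, the sketch below fans a single query out to several per-chain sources and merges the rows behind one interface. The source functions, chain names, and row fields are hypothetical stand-ins; a real implementation would query nodes or chain-specific indexes.

```python
# Hypothetical per-chain sources returning hand-written rows.
def ethereum_source(address):
    return [{"tx": "0xe1", "value": 1.0}] if address == "0xabc" else []


def polygon_source(address):
    if address == "0xabc":
        return [{"tx": "0xp1", "value": 3.0}, {"tx": "0xp2", "value": 0.5}]
    return []


class AggregatedIndexer:
    """Fans one query out to every registered chain and merges the results."""

    def __init__(self):
        self.sources = {}

    def register(self, chain, source):
        self.sources[chain] = source

    def transfers(self, address):
        merged = []
        for chain, source in self.sources.items():
            for row in source(address):
                # Tag each row with its chain so callers see one uniform shape.
                merged.append({"chain": chain, **row})
        return merged


indexer = AggregatedIndexer()
indexer.register("ethereum", ethereum_source)
indexer.register("polygon", polygon_source)
rows = indexer.transfers("0xabc")
```

The caller issues one query and receives rows tagged by chain, instead of integrating a separate API for each network.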

Currently, the storage requirements for Ethereum archive nodes vary from 3TB to 13.5TB under different clients, and continue to increase with the growth of the Blockchain. In the face of such a massive amount of data, mainstream indexing protocols not only support multi-chain indexing but also customize data parsing frameworks for the data needs of different applications.

Compared to traditional RPC endpoints, indexers greatly enhance the efficiency of data indexing and querying. They can efficiently index massive amounts of data, support high-speed complex queries, and easily filter and analyze data. Some indexers also support aggregating data sources from multiple blockchains, avoiding the issue of multi-chain dApps needing to deploy multiple APIs. By operating in a distributed manner, indexers provide stronger security and performance, reducing the risk of disruptions that may arise from centralized RPC providers.

Through a predefined query language, an indexer lets users obtain the information they want directly, without having to handle the complex underlying data. This mechanism significantly improves the efficiency and reliability of data retrieval and is an important innovation in blockchain data access.

![Reading, indexing to analysis, a brief overview of the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-694cb5f2be61475195e2e559567dee89.webp)

2.4 Full-Chain Database: Moving Toward Stream-First

Querying data through indexer nodes usually means that the API becomes the only channel for working with on-chain data. Once a project enters its expansion stage, however, it often needs more flexible data sources than standardized APIs can provide. As application requirements grow more complex, first-generation data indexers and their standardized index formats become insufficient for increasingly diverse query needs, such as search, cross-chain access, or off-chain data mapping.

In modern data pipeline architecture, the "stream-first" approach has become a solution to the limitations of traditional batch processing, enabling real-time data ingestion, processing, and analysis. This paradigm shift allows organizations to respond immediately to incoming data, thereby deriving insights and making decisions almost in real-time. Similarly, the development of blockchain data service providers is also moving towards building blockchain data streams. Traditional indexing service providers have launched products that acquire real-time blockchain data in a streaming manner, such as stream-based real-time data lakes.

These services aim to meet the demand for real-time parsing of blockchain transactions and to provide more comprehensive query capabilities. Just as the "stream-first" architecture changed how data is processed and consumed in traditional data pipelines by reducing latency and improving responsiveness, these blockchain data stream providers hope to support more applications and assist on-chain data analysis through more advanced and mature data sources.
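The difference from batch processing can be sketched in a few lines: rather than waiting for a full batch of blocks and then recomputing, a stream-first consumer updates its state as each block arrives, so the aggregate is always current. The block feed here is a hand-written generator standing in for a live stream, and the field names are assumptions.

```python
from collections import defaultdict


def block_stream():
    """Stand-in for a live feed; yields hand-written blocks one at a time."""
    yield {"number": 1, "transfers": [("0xa1", 4.0), ("0xb2", 1.0)]}
    yield {"number": 2, "transfers": [("0xa1", 2.0)]}
    yield {"number": 3, "transfers": []}


class RunningVolume:
    """Keeps per-address volume up to date as each block arrives."""

    def __init__(self):
        self.volume = defaultdict(float)
        self.last_block = None

    def on_block(self, block):
        for address, amount in block["transfers"]:
            self.volume[address] += amount
        self.last_block = block["number"]


state = RunningVolume()
snapshots = []
for block in block_stream():
    state.on_block(block)
    # The aggregate is queryable after every block, not only at the end of a batch.
    snapshots.append((state.last_block, dict(state.volume)))
```

Each snapshot reflects the chain up to the latest block, which is the property that real-time data lakes and streaming indexers aim to offer at scale.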

Redefining the challenges of on-chain data through the lens of modern data pipelines allows us to view the management, storage, and delivery of on-chain data from a new perspective. When we begin to see indexing tools like Subgraph and Ethereum ETL as data flows within the data pipeline rather than final outputs, we can envision a possible world where high-performance datasets can be tailored for any business use case.

![Read, Index to Analyze, Brief Overview of the Web3 Data Indexing Track](https://img-cdn.gateio.im/webp-social/moments-587ce87f6dbedee4acec7d939fed6980.webp)

3. AI + Database? In-depth Comparison of the Three Major Data Indexing Protocols

3.1 The Graph

The Graph network provides multi-chain data indexing and querying through a decentralized network of nodes, making it easier for developers to index blockchain data and build decentralized applications. Its main product models are the data query execution market and the data indexing cache market, both of which ultimately serve users' data query needs.

Subgraphs are the fundamental data structures in The Graph network, defining how to extract and transform data from the Blockchain into a queryable format. Anyone can create a subgraph, and multiple applications can reuse these subgraphs, enhancing data reusability and efficiency.
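From the application side, querying a subgraph comes down to sending a GraphQL document over HTTP. The sketch below builds such a request body. The `transfers` entity and its fields are hypothetical, since every subgraph defines its own schema; the `first`, `orderBy`, `orderDirection`, and `where` arguments follow the query conventions The Graph documents for subgraph APIs.

```python
import json

# Hypothetical entity and field names; a real subgraph defines its own schema.
QUERY = """
query RecentTransfers($first: Int!, $minBlock: Int!) {
  transfers(first: $first, orderBy: blockNumber, orderDirection: desc,
            where: { blockNumber_gte: $minBlock }) {
    id
    from
    to
    value
  }
}
"""


def build_graphql_body(query, variables):
    """GraphQL over HTTP sends the query text and its variables as one JSON object."""
    return json.dumps({"query": query, "variables": variables})


body = build_graphql_body(QUERY, {"first": 5, "minBlock": 19000000})
decoded = json.loads(body)  # round-trip check that the body is valid JSON
```

The resulting string would be POSTed to the subgraph's query URL, and any application that knows the schema can reuse the same subgraph with a different query.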

The Graph network consists of four key roles: indexers, curators, delegators, and developers, working together to provide data support for web3 applications. Among them, indexers are responsible for indexing and query processing, delegators stake GRT tokens to support the operation of indexing nodes, curators are responsible for signaling which subgraphs should be prioritized for indexing, and developers are the main users who create and submit subgraphs to the network.

The Graph has now shifted to a fully decentralized subgraph hosting service, with economic incentives circulating among the different participants to keep the system running. Indexers earn revenue through query fees and token rewards, while delegators and curators also receive a portion of the rewards.

The products of The Graph are also rapidly developing in the wave of AI. Tools such as AutoAgora, Allocation Optimizer, and AgentC developed by Semiotic Labs enhance the performance of the ecosystem in various aspects, such as dynamic pricing, resource optimization allocation, and natural language queries. The application of these tools allows The Graph to further enhance the intelligence and user-friendliness of the system by integrating AI.

![Read, index to analyze, a brief overview of the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-cf9a002b9b094fbbe3be7f611001b5c1.webp)

3.2 Chainbase

Chainbase is a full-chain data network that integrates all Blockchain data into one platform, making it easier for developers to build and maintain applications. Its features include:

  • Real-time Data Lake: Provides a dedicated real-time data lake for Blockchain data streams, allowing data to be accessed instantly.

  • Dual-chain architecture: Built on Eigenlayer AVS for the execution layer, forming a parallel dual-chain architecture with the CometBFT consensus algorithm, enhancing the programmability and composability of cross-chain data.

  • Innovative Data Format Standard: Introduced the "manuscripts" data format standard, optimizing the structuring and utilization of data in the cryptocurrency industry.

  • Crypto World Model: Drawing on AI model technology, Chainbase has built a model that can understand and predict blockchain transactions and interact with them. The base version of this model, Theia, has been released for public use.

Chainbase's AI model Theia is the key feature that distinguishes it from other data service protocols. Theia is based on NVIDIA's DoRA model and combines on-chain and off-chain data with temporal and spatial activity to learn and analyze crypto patterns. It responds through causal reasoning, mining the latent value and regularities of on-chain data to provide users with more intelligent data services.

![Read, index to analyze, a brief overview of the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-b343cab5112c1a3d52f4e72122ae0df2.webp)

3.3 Space and Time

Space and Time (SxT) is committed to building a verifiable computing layer that extends zero-knowledge proofs to a decentralized data warehouse, providing trusted data processing for smart contracts, large language models, and enterprises.

In data indexing and verification, Space and Time has introduced Proof of SQL, a zero-knowledge proof technology that makes SQL queries executed on a decentralized data warehouse tamper-proof and verifiable. For each query, Proof of SQL generates a cryptographic proof that attests to the integrity and accuracy of the result. This approach avoids the resource cost of multiple nodes repeatedly indexing the same data under traditional consensus mechanisms, which improves the overall performance of the system.

SxT works closely with Microsoft's AI Co-Innovation Lab to accelerate the development of generative AI tools that let users work with blockchain data in natural language. In Space and Time Studio, users enter a natural language query, and the AI converts it into SQL, executes it, and presents the results.

![Reading, indexing to analysis, a brief overview of the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-97443cbd177ac4ffd1665da670ffbf12.webp)

4. Conclusion and Outlook

Blockchain data indexing has evolved step by step: from nodes as the original data source, through data parsing and indexers, to today's AI-enabled full-chain data services. Each stage has improved the efficiency and accuracy of data access, and the latest stage adds a more intelligent user experience on top.

Looking ahead, as technologies such as AI and zero-knowledge proofs continue to mature, blockchain data services will become more intelligent and more secure. There is good reason to believe they will remain essential infrastructure, providing strong support for the industry's progress and innovation.
