How Goldsky Democratizes Streaming Data for Web3 Developers

October 2, 2023

2 Views

SaveSavedRemoved 0

How Goldsky Democratizes Streaming Data for Web3 Developers

Goldsky is a real-time data platform that caters to the Web3 ecosystem, specifically developers of decentralized applications (dApps). The platform offers blockchain indexing, subgraphs, and data-streaming pipelines to help developers extract specific data from different blockchains. This process is often complex and time-consuming, involving remote procedure call (RPC) providers and APIs. Goldsky simplifies this by ingesting data directly from blockchains and providing embedded tools for developers to build real-time data pipelines.

One of Goldsky’s notable use cases is powering a popular betting website that tracks real-time betting trends. By ingesting data from the blockchain layer, Goldsky ensures transparency and fairness in the bets and auctions it tracks. The platform decodes and transforms the blockchain data in real-time, syncing it directly to the customer’s database. This data is then used for the website’s API layer to support features like betting order books and statuses.

Goldsky’s streaming-first architecture is designed to empower developers with access to real-time streaming data. Traditionally, data platform technology that supports reporting and dashboards was limited to internal analytics teams. However, there has been a shift in the industry, recognizing that the same technology can be used to power customer-facing features in web applications. Many app developers may not be familiar with the underlying technology stacks for data pipelines, such as Apache Kafka and Apache Flink. Goldsky addresses this by providing simple API-driven tools that bring real-time data into applications, democratizing developer access to streaming data.

Redpanda, a Kafka-compatible platform, is a key component of Goldsky’s architecture. It serves as the primary streaming-data solution, while Apache Flink is used for stream processing. Redpanda’s managed service, Redpanda Cloud, ensures minimal overhead. The platform ingests data from blockchains through direct indexing and Subgraphs, processing it into a format that is directly usable by customers.

Goldsky’s data pipeline involves Apache Flink SQL for transformation, allowing customers to write simple SQL transformations without needing deep knowledge of the underlying technology. Redpanda serves as the primary database and data lake, thanks to its S3-compatible tiered storage. This approach simplifies data retention and eliminates the need for additional resources to maintain archival systems.

Goldsky supports various sinks, including PostgreSQL, S3, and Elasticsearch, allowing customers to choose where they want to store their data. PostgreSQL is a popular choice due to its ability to support both transactional and analytical use cases, ease of integration with existing applications, and scalability for large data sets.

Redpanda, with its simplified architecture and cost efficiency, emerged as a foundational component of Goldsky’s data infrastructure. Its durability, Jepsen-tested and Raft-native architecture, and cost-efficient tiered storage make it an ideal source of truth for Goldsky. The platform is also optimistic about Redpanda’s roadmap, which includes innovations like Data Transforms powered by WebAssembly and Apache Iceberg integration.

Goldsky’s use of data streaming and stream processing has proved to be crucial in solving complex problems related to blockchain data, such as blockchain reorgs. The platform is able to enrich on-chain data with off-chain data, calculate reliable Top-N aggregations, and combine data from multiple blockchains.

In conclusion, Goldsky’s real-time data platform offers developers in the Web3 ecosystem the tools they need to extract specific data from blockchains. By simplifying the process and providing access to streaming data, Goldsky empowers developers to build powerful decentralized applications. With Redpanda at its core, Goldsky ensures the reliability, durability, and cost-efficiency required for handling high throughput and scaling at a large scale.