MySQL HeatWave Lakehouse can load 400TB of data from object storage 8X faster than Redshift and 2.7X faster than Snowflake

MySQL HeatWave Lakehouse scales to 512 nodes and can process hundreds of terabytes of data in object store in multiple file formats, including Aurora and Redshift backups

Oracle recently announced MySQL HeatWave Lakehouse, enabling customers to process and query hundreds of terabytes of data in object store in a variety of file formats, such as CSV and Parquet, as well as Aurora and Redshift backups. MySQL HeatWave Lakehouse is the newest addition to the MySQL HeatWave portfolio, the only cloud service that combines transaction processing, analytics, machine learning, and machine learning-based automation within a single MySQL database.

Powered by the massively parallel scale-out MySQL HeatWave architecture, MySQL HeatWave Lakehouse delivers significantly better performance than competitive cloud database services for running queries and loading data, as demonstrated by industry standard benchmarks. In addition, in a single query, customers can query transactional data in the MySQL database and combine it with data in the object store using standard MySQL syntax. Oracle also announced new MySQL Autopilot capabilities that improve performance and make MySQL HeatWave Lakehouse easy to use. MySQL HeatWave Lakehouse is now available in beta for customers to try and is slated for general availability in the first half of calendar year 2023.

Customers migrating from AWS, Google, and on-premises have been using MySQL HeatWave for a broad set of use cases including marketing analytics, particularly real-time analysis of advertising campaign performance and customer data analytics to build effective campaigns. Customers migrating from AWS include leaders in the automotive, telecommunications, retail, high-tech, and healthcare industries.

“MySQL HeatWave is the result of years of research and advanced development, which we are turning into breakthrough innovations to address a bigger set of challenges for all MySQL customers. In fact, MySQL HeatWave Lakehouse is our third major MySQL HeatWave announcement this year,” said Edward Screven, chief corporate architect, Oracle. “There is a huge growth in data stored outside of databases, and with MySQL HeatWave Lakehouse, customers can leverage all the benefits of HeatWave on data residing in object store. MySQL HeatWave now provides one integrated service on multiple clouds for transaction processing, analytics across data warehouses and data lakes, and machine learning without ETL. This combination helps deliver massive improvements in performance, automation, and cost—further distancing MySQL HeatWave from other cloud database services.”

“We are excited to continue our collaboration with Oracle, evolving it into supporting their new MySQL HeatWave Lakehouse offering, which is optimized to run on AMD EPYC-powered Oracle cloud instances and leverage the latest innovations in our processors,” said Mark Papermaster, chief technology officer and executive vice president at AMD. “The collective work of the AMD and Oracle engineering teams has helped create an impressive MySQL solution that can support great scalability and performance for transaction processing, analytics, machine learning, and machine learning-based automation within a single MySQL database.”

Oracle is also publishing new lakehouse benchmarks and introducing several innovative capabilities for MySQL HeatWave Lakehouse and MySQL Autopilot.

Benchmarks

MySQL HeatWave Lakehouse is faster than Snowflake and Amazon Redshift in both query performance and data loading:

  • Query performance: As demonstrated by a fully transparent, publicly available 400 TB TPC-H* benchmark, the query performance of MySQL HeatWave Lakehouse is:
    • 17X faster than Snowflake
    • 6X faster than Amazon Redshift
  • Load performance: Loading data from object store into MySQL HeatWave Lakehouse is also significantly faster. For a 400 TB TPC-H* workload, the load performance of MySQL HeatWave Lakehouse is:
    • 8X faster than Amazon Redshift
    • 2.7X faster than Snowflake

All of these fully transparent benchmark scripts are available on GitHub for customers to replicate.
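
The quoted speedup factors also imply a rough ranking between the two competing services. The short Python sketch below works through that arithmetic. It assumes that each "NX faster" figure means the competitor's elapsed time divided by the MySQL HeatWave Lakehouse elapsed time on the same workload, which the announcement does not state explicitly; the function name `implied_ratio` is illustrative only.

```python
# Speedup factors quoted in the announcement
# (MySQL HeatWave Lakehouse relative to each service, 400 TB TPC-H* workload).
QUERY_SPEEDUP = {"Snowflake": 17.0, "Amazon Redshift": 6.0}
LOAD_SPEEDUP = {"Snowflake": 2.7, "Amazon Redshift": 8.0}


def implied_ratio(speedups, slower, faster):
    """Return how many times faster `faster` is than `slower`.

    Assumption: each quoted factor is (competitor time) / (HeatWave time),
    so dividing two factors cancels the HeatWave time.
    """
    return speedups[slower] / speedups[faster]


if __name__ == "__main__":
    query = implied_ratio(QUERY_SPEEDUP, "Snowflake", "Amazon Redshift")
    load = implied_ratio(LOAD_SPEEDUP, "Amazon Redshift", "Snowflake")
    print(f"Queries: Redshift is about {query:.1f}x faster than Snowflake (implied)")
    print(f"Loading: Snowflake is about {load:.1f}x faster than Redshift (implied)")
```

Under that assumption, the figures suggest Redshift is a little under 3X faster than Snowflake on queries, while Snowflake is roughly 3X faster than Redshift on loading. These are inferences from the published ratios, not separately reported results.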

“MySQL HeatWave Lakehouse sets the competition on fire by blazing the trail to the previously uncharted territory of 400TB cloud database benchmarks at breakneck speeds,” said Ron Westfall, senior analyst and research director, Futurum Research. “MySQL HeatWave Lakehouse is a quantum leap for HeatWave in terms of processing capacity and computing power: from 32TB and 64 nodes to 400TB and 512 nodes with performance and price performance that handily beat Amazon Redshift and Snowflake. Meanwhile, the cloud database competitors have yet to respond to the in-database convergence and the multi-cloud presence of MySQL HeatWave. How will they cope with the 400TB MySQL HeatWave Lakehouse?”

Innovative new capabilities for MySQL HeatWave Lakehouse

  • Larger data size, standard MySQL syntax: Customers can query up to 400TB of data with MySQL HeatWave Lakehouse, and the HeatWave cluster scales to 512 nodes. Customers use standard MySQL syntax for querying the data.
  • Identical performance and compression: MySQL HeatWave offers the same query performance for data stored inside MySQL database or on object store—as demonstrated by both 10TB and 30TB TPC-H benchmarks. Furthermore, the amount of compression achieved and the amount of data which can be processed per node is the same in both instances.
  • Support for multiple file formats: With MySQL HeatWave Lakehouse, customers can load and process data stored in a variety of file formats, such as CSV and Parquet, as well as Aurora and Redshift backups from AWS. This enables customers to leverage the benefits of MySQL HeatWave even when their data is not stored inside a MySQL database. The query performance is the same regardless of the file format in which the data is stored.
  • Ability to query data in MySQL and combine it with data in object store: With MySQL HeatWave Lakehouse, customers can query their OLTP data stored inside MySQL database and combine it with data stored in the object store. Any change made to the OLTP data is updated in real time and reflected in the query result.

New MySQL Autopilot capabilities for MySQL HeatWave Lakehouse

MySQL Autopilot provides machine learning-based automation for MySQL HeatWave. Existing MySQL Autopilot capabilities such as auto provisioning and auto query plan improvement have been enhanced for MySQL HeatWave Lakehouse, which further reduces database administration overhead and improves performance. In addition, a number of new MySQL Autopilot capabilities are now available for MySQL HeatWave Lakehouse.

  • Auto schema inference: Autopilot automatically infers the mapping of the file data to datatypes in the database. As a result, customers don’t need to manually specify the mapping for each new file to be queried by MySQL HeatWave Lakehouse—thereby saving time and effort.
  • Adaptive data sampling: Autopilot intelligently samples portions of files in object storage, collecting accurate statistics with minimal data access. MySQL HeatWave uses these statistics to generate and improve query plans, to determine the optimal schema mapping, and for other purposes.
  • Auto load: Autopilot analyzes the data to predict the load time into MySQL HeatWave, determines the mapping of the datatypes, and automatically generates the loading scripts. Users don’t have to manually specify the mapping of files to database schemas and tables.
  • Adaptive data flow: MySQL HeatWave Lakehouse dynamically adapts to the performance of the underlying object store. As a result, MySQL HeatWave can get the maximum available performance from the underlying cloud infrastructure, which improves overall performance, price performance, and availability.
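
To make the idea of schema inference from sampled data more concrete, here is a minimal Python sketch. It is not how MySQL Autopilot works internally, which the announcement does not describe. It is a toy illustration that assumes a well-formed CSV file with a header row, draws a random sample of rows, and picks the narrowest of three column types. The function names `infer_type` and `infer_schema` and the type labels are hypothetical.

```python
import csv
import io
import random


def infer_type(values):
    """Pick the narrowest type that fits every non-empty sampled value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR"
    # Try integer first, then floating point; fall back to text.
    for type_name, parser in (("BIGINT", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                parser(v)
            return type_name
        except ValueError:
            continue
    return "VARCHAR"


def infer_schema(csv_text, sample_size=100, seed=0):
    """Infer a column-name -> type mapping from a random sample of rows.

    Assumes the first row is a header and every row has the same width.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    rng = random.Random(seed)
    sample = data if len(data) <= sample_size else rng.sample(data, sample_size)
    return {
        name: infer_type([row[i] for row in sample])
        for i, name in enumerate(header)
    }


if __name__ == "__main__":
    text = "id,price,name\n1,9.5,apple\n2,3,pear\n"
    print(infer_schema(text))
```

Sampling keeps the cost of inference low on large files, at the price of possibly missing a rare value that would force a wider type. A production system would need to handle that case, for example by widening the type when a load fails.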

Additional enhancements to MySQL HeatWave

Oracle announced a number of other enhancements to MySQL HeatWave, ranging from machine learning to the VS Code plug-in. The in-database machine learning capabilities of MySQL HeatWave have been further enriched to include support for forecasting models. New machine learning explanation techniques, optimized for MySQL HeatWave, have been added. Data scientists can now influence various stages of the automated HeatWave ML training pipeline, including the choice of algorithm, feature selection, scoring metric, and the explanation technique. HeatWave ML has also been enhanced to allow customers to import machine learning models into HeatWave.

A new multi-engine Hypergraph query optimizer further improves the performance of complex queries and eliminates the need to specify the join order. Zone map support has been added, which accelerates a broader set of queries with MySQL HeatWave. And the VS Code plug-in for MySQL has been enhanced to support MySQL HeatWave capabilities.

Ready for the Distributed Cloud

MySQL HeatWave is available in multiple clouds including OCI, AWS, and now Microsoft Azure. It’s available on-premises as part of OCI Dedicated Region for organizations that prefer not to move their database workloads to the public cloud. Customers can also replicate data from their on-premises MySQL OLTP applications to MySQL HeatWave to obtain near real-time analytics. MySQL HeatWave is always on the latest version of the MySQL database.

Additional Resources

  • Learn more about MySQL HeatWave
  • Watch the MySQL HeatWave explainer video

* Benchmark queries are derived from the TPC benchmarks, but results are not comparable to published TPC benchmarks results since these do not comply with the TPC specifications.
