April 29, 2024, by Ali LeClerc

VeloxCon 2024, the premier developer conference dedicated to the Velox open source project, brought together industry leaders, engineers, and enthusiasts to explore the latest advancements and collaborative efforts shaping the future of data management. Hosted by IBM® in partnership with Meta, VeloxCon showcased the latest innovations in Velox, including the project roadmap, Prestissimo (Presto-on-Velox), Gluten (Spark-on-Velox), hardware acceleration, and much more.

An overview of Velox

Velox is a unified execution engine, built and open-sourced by Meta, that aims to accelerate data management systems and streamline their development. One of the biggest benefits of Velox is that it consolidates and unifies the execution layer of data management systems, so you don’t need to keep rewriting the engine for each one. Today Velox is in various stages of integration with several data systems, including Presto (Prestissimo), Spark (Gluten), PyTorch (TorchArrow), and Apache Arrow. You can read more about why Velox was built in Meta’s engineering blog.

Velox at IBM

Presto is the engine for watsonx.data, IBM’s open data lakehouse platform. Over the last year, we’ve been working hard at IBM on advancing Prestissimo, the Velox-based version of Presto. Presto’s Java workers are being replaced by a C++ process based on Velox. We now have several committers on the Prestissimo project and continue to partner closely with Meta as we build Presto 2.0.

Some of the key benefits of Prestissimo include:

  • Huge performance boost: query processing can be done with much smaller clusters
  • No performance cliffs: there are no Java processes, no JVM, and no garbage collection pauses, and memory arbitration improves efficiency
  • Easier to build and operate at scale: Velox gives you reusable and extensible primitives across data engines (like Spark)

This year, we plan to do even more with Prestissimo including:

  • The Iceberg reader
  • Production readiness (metrics collection with Prometheus)
  • New Velox system implementation
  • TPC-DS benchmark runs

VeloxCon 2024

We worked closely with Meta to organize VeloxCon 2024, and it was a fantastic community event. We heard speakers from Meta, IBM, Pinterest, Intel, Microsoft, and others share what they’re working on and their vision for Velox over two dynamic days.

Day 1 highlights

The conference kicked off with sessions from Meta, including Amit Purohit reaffirming Meta’s commitment to open source and community collaboration. Pedro Pedreira, alongside Manos Karpathiotakis and Deblina Gupta, delved into the concept of composability in data management, showcasing Velox’s versatility and its alignment with Arrow.

Amit Dutta of Meta explored Prestissimo’s batch efficiency at Meta, shedding light on the advancements made in optimizing data processing workflows. Remus Lazar, VP, Data & AI Software at IBM, presented Velox’s journey within IBM and his vision for its future. Aditi Pandit of IBM followed with insights into Prestissimo’s integration at IBM, highlighting feature enhancements and future plans.

The afternoon sessions were equally insightful. Jimmy Lu of Meta unveiled the latest optimizations and features in Velox, while Binwei Yang of Intel discussed the integration of Velox with the Apache Gluten project, emphasizing its global impact. Engineers from Pinterest and Microsoft shared their experiences of unlocking data query performance by using Velox and Gluten, showcasing tangible performance gains.

The day concluded with two sessions from Meta: Xiaoxuan Meng on Velox’s memory management, and Wei He with a glimpse into the new simple aggregation function interface.

Day 2 highlights

The second day began with a keynote from Orri Erling, co-creator of Velox. He shared insights into Velox Wave and Accelerators, showcasing the project’s potential for hardware acceleration. Krishna Maheshwari from NeuroBlade highlighted their collaboration with the Velox community, introducing NeuroBlade’s SPU (SQL Processing Unit) and its transformative impact on Velox’s computational speed and efficiency.

Sergei Lewis from Rivos explored the potential of offloading work to accelerators to enhance Velox’s pipeline performance. William Malpica and Amin Aramoon from Voltron Data introduced Theseus, a composable, scalable, distributed data analytics engine, using Velox as a CPU backend.

Yoav Helfman from Meta unveiled Nimble, a cutting-edge columnar file format designed to enhance data storage and retrieval. Pedro Pedreira and Sridhar Anumandla from Meta elaborated on Velox’s new technical governance model, emphasizing its importance in guiding the project’s sustainable development.

The day also featured sessions on Velox’s I/O optimizations by Deepak Majeti from IBM, strategies for safeguarding against Out-Of-Memory (OOM) kills by Vikram Joshi from ComputeAI, and a hands-on demo on debugging Velox applications, also by Deepak Majeti.

What’s next with Velox

VeloxCon 2024 was a testament to the vibrant ecosystem surrounding the Velox project, showcasing groundbreaking innovations and fostering collaboration among industry leaders and developers alike. The conference provided attendees with valuable insights, practical knowledge, and networking opportunities, solidifying Velox’s position as a leading open source project in the data management ecosystem.

If you’re interested in learning more and joining the Velox community, here are some resources to get started:

Stay tuned for more updates and developments from the Velox community, as we continue to push the boundaries of data management and accelerate innovation together.
