
Wednesday, March 18, 5:30 p.m. PT
20 Years of CUDA: Honoring the Architects of the Accelerated Age
What began in 2006 as a bold bet on parallel computing has grown into the foundational heartbeat of modern science and AI.
At GTC, NVIDIA is marking 20 years of CUDA, representing the work of more than 6 million developers innovating across every layer of the computing stack. Today, it serves as a generational bridge between the pioneers who wrote the first kernels and the next wave of developers deploying trillion-parameter AI models.
Led by NVIDIA CUDA Architect Stephen Jones, a panel at GTC on Wednesday featured a group of researchers and engineers from Jump Trading, Meta Superintelligence Labs and NVIDIA. They highlighted the decades of innovation behind CUDA, how it helps developers solve some of the world’s most complex problems, and how systems like the NVIDIA DGX Spark desktop AI supercomputer will enable the next generation of CUDA developers.
The group shared memories of the early days of CUDA, when “nobody wanted GPUs,” said Paulius Micikevicius, a software engineer at Meta Superintelligence Labs. “We had to go and beg them to consider using GPUs.”
During that time, Wen-Mei Hwu, senior distinguished research scientist and senior research director at NVIDIA, then a professor at the University of Illinois Urbana-Champaign, decided to build a 200-GPU system in two months with a group of grad students.
“A few weeks later, 200 GPU boards arrived, and power supply and everything, but there’s no chassis. So we ended up building wood frames for each of these boards … and we ran the Green500 [benchmark] and we got No. 3,” Hwu said. “That was the moment I realized that the energy efficiency of GPUs has incredible potential.”
As the scale of accelerated computing has shifted to rack-scale systems and AI factories, the panelists see desktop AI systems like DGX Spark as a new way forward for prototyping and early development.
“As long as you have that capability to do that initial exploration and something that fits on your desk or your lap, that’s the critical thing,” said Kate Clark, distinguished devtech engineer at NVIDIA. “I don’t see that going anywhere anytime soon. We’ll always have CUDA everywhere.”
Monday, March 16, 1:30 p.m. PT
NVIDIA cuDF and cuVS Adopted by World’s Leading Data Platforms, Fueling Modern Enterprise Data Processing
Enterprises are generating hundreds of zettabytes each year, and organizations are racing to turn that information into insights. NVIDIA cuDF and cuVS, accelerated data libraries built on NVIDIA CUDA‑X, are being adopted by data platforms across industries to deliver up to 5x faster performance while reducing costs for structured and unstructured data processing.
Integrated with the world’s most widely used open source data engines, which developers download over 200 million times a month, these libraries are used across enterprise data platforms, databases and data lakes. This helps organizations accelerate innovation, develop more accurate models and process more data while managing costs.
For structured data, NVIDIA cuDF accelerates open source data processing engines such as Apache Spark, Presto, DuckDB, Polars and Velox, delivering up to 5x faster processing compared with CPU-only deployments.
For unstructured data, which represents 80% of today’s enterprise data and is growing rapidly, NVIDIA cuVS accelerates leading engines including FAISS, Amazon OpenSearch Service and Milvus. This helps agents and applications extract context, information and recommendations from vast stores of text, images and video in a fraction of the time.
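To make concrete what these vector engines compute, here is a minimal sketch of exact nearest-neighbor search by cosine similarity. It is plain CPU Python, not the cuVS, FAISS, OpenSearch or Milvus API. The file names and three-dimensional embeddings are hypothetical and chosen only for illustration. Production engines typically build approximate indexes rather than scanning every vector, and that indexing and search work is what GPU acceleration targets.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, corpus, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in corpus.items()]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings; real ones have hundreds or thousands of dimensions
corpus = {
    "invoice.pdf": [0.9, 0.1, 0.0],
    "cat_photo.jpg": [0.0, 0.2, 0.9],
    "receipt.png": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

results = top_k(query, corpus, k=2)
print([doc_id for doc_id, _ in results])
```

The brute-force scan costs one similarity computation per stored vector, which is why large collections rely on indexes and parallel hardware.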
Powering Enterprise Data Processing Platforms
Google Cloud integrates NVIDIA cuDF to accelerate Apache Spark within Dataproc, and cuDF can be easily used within Google Kubernetes Engine (GKE) to reduce processing times for massive ETL jobs from hours to seconds while lowering compute costs.
At Snap, which serves more than 946 million active users, NVIDIA cuDF on GKE cut daily data processing costs by 76%. This enables 10 petabytes of data to be analyzed within a three-hour window, saving millions of dollars.
“Our collaboration with NVIDIA and Google Cloud helps us innovate faster for more than a billion Snapchatters worldwide,” said Saral Jain, chief information officer of Snap. “By lowering data processing costs and scaling experiments across petabytes of data, we’re delivering AI-powered experiences more quickly and efficiently.”
IBM watsonx.data is a hybrid, open data platform that includes open source analytics engines such as Apache Spark and Presto for structured data, and a vector engine based on OpenSearch. In early experiments with Nestlé’s Order-to-Cash mart, watsonx.data workloads accelerated with NVIDIA cuDF ran five times faster, with 83% cost savings.
“For a company that serves billions, data underpins decision making across our global operations,” said Chris Wright, chief information and digital officer of Nestlé. “Working with IBM and NVIDIA, a targeted proof of concept has demonstrated the ability to refresh global operations data in a few minutes and at reduced cost. Our focus now is on turning this capability into tangible business impact, further improving decision speed in areas such as manufacturing and warehousing, and scaling these capabilities across our enterprise.”
The Dell AI Data Platform with NVIDIA includes accelerated data engines that enable enterprises to quickly and securely activate their Dell AI Factory with AI-ready data. It features an Apache Spark-based processing engine accelerated with NVIDIA cuDF, delivering up to 3x faster performance, and an enterprise-grade vector database accelerated with NVIDIA cuVS, delivering up to 12x higher throughput for vector indexing compared with CPUs.
“Purpose-built for agentic AI, the Dell AI Data Platform with NVIDIA uses accelerated data processing engines to make multimodal data AI-ready in hours instead of days,” said Michael Dell, chairman and CEO of Dell Technologies.
Oracle announced that Oracle Private AI Services Container can greatly accelerate vector index creation in Oracle AI Database using NVIDIA cuVS, helping organizations speed up AI-enabled decisions with the latest information.
“Enterprise AI is moving from experimentation to production,” said Clay Magouyrk, CEO of Oracle. “Oracle AI Database with NVIDIA technology delivers AI-ready data within minutes, enabling applications that were previously impossible.”
NVIDIA cuDF and cuVS are supported by leading enterprise data platforms including EDB Postgres AI, NetApp, Snowflake, Starburst and VAST Data, setting the foundation for the AI‑powered future of data processing.
Computer-Aided Engineering
Monday, March 16, 1:30 p.m. PT
NVIDIA Launches cuEST for Accelerated Quantum Chemistry in Semiconductor Design
NVIDIA this week launched NVIDIA cuEST, a new NVIDIA CUDA-X library that shifts electronic-structure calculations onto GPUs. Applied Materials, Samsung, Synopsys and TSMC are among the initial adopters.
A leading-edge chip now contains over 50 billion transistors. Engineering them requires answering fundamental physics questions at the atomic scale: how electrons bond, how they migrate and how they interact across films just a few atoms thick.
“As semiconductor scaling reaches the physical limits of materials, the industry requires a massive increase in computing performance to simulate the quantum mechanics of next-generation chip designs,” said Tim Costa, general manager for industrial and computational engineering at NVIDIA. “With NVIDIA cuEST, industry leaders can move past the quantum bottleneck and take high-fidelity chemical modeling directly into production to accelerate semiconductor innovation.”
Industry Impact
- Applied Materials: Applied Materials uses cuEST-accelerated density functional theory (DFT) to model complex structures, predict material properties and study reaction pathways.
- Samsung: Samsung integrated cuEST into its internal pipeline, already accelerated on GPUs, to deliver a further end-to-end speedup of up to 5x for key quantum-chemistry workloads.
- Synopsys: Powered by cuEST and QuantumATK, Synopsys expanded its functionality to include Gaussian-basis DFT, accelerating simulations up to 30x for semiconductor workflows.
- TSMC: TSMC uses cuEST’s accelerated quantum chemistry to advance processes for next-generation silicon design.
From the Lab to the Fab
The most common method for atomistic modeling is density functional theory. DFT offers a strong balance between accuracy and scalability; however, its computational cost has limited its widespread use in industry, keeping most applications confined to research. With cuEST, NVIDIA makes high‑accuracy quantum chemistry feasible at an industrial scale and in real production workflows.
Historically, the industry has relied on CPU clusters to run these simulations, evaluating candidate materials, including gate dielectrics and interconnect metals, one batch at a time over hours or days.
cuEST provides optimized routines so GPUs can accelerate the core matrices of a Gaussian-basis DFT calculation, including overlap, kinetic energy, nuclear attraction, Coulomb and exchange-correlation. It also supports functional approximations ranging from the standard generalized gradient approximation to hybrid functionals, allowing engineers to balance computational cost with accuracy.
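For readers unfamiliar with these matrices, the sketch below builds two of them, overlap and kinetic energy, for a toy basis of normalized s-type Gaussian primitives using the standard closed-form integrals in atomic units. It is a plain CPU Python illustration and does not use or represent the cuEST API. The basis (one primitive per hydrogen atom in H2, exponent 0.5, bond length 1.4 bohr) and the function names are hypothetical choices for the example.

```python
import math

def _pair_terms(a, A, b, B):
    """Reduced exponent and squared distance for two Gaussian centers."""
    mu = a * b / (a + b)
    r2 = sum((x - y) ** 2 for x, y in zip(A, B))
    return mu, r2

def overlap(a, A, b, B):
    """Overlap integral of two normalized s-type Gaussians."""
    mu, r2 = _pair_terms(a, A, b, B)
    return (2.0 * math.sqrt(a * b) / (a + b)) ** 1.5 * math.exp(-mu * r2)

def kinetic(a, A, b, B):
    """Kinetic-energy integral of two normalized s-type Gaussians."""
    mu, r2 = _pair_terms(a, A, b, B)
    return mu * (3.0 - 2.0 * mu * r2) * overlap(a, A, b, B)

def build_matrix(integral, basis):
    """Evaluate a two-center integral for every pair of basis functions."""
    return [[integral(ai, Ai, aj, Aj) for (aj, Aj) in basis]
            for (ai, Ai) in basis]

# Hypothetical basis: (exponent, center) per function, distances in bohr
basis = [
    (0.5, (0.0, 0.0, 0.0)),
    (0.5, (0.0, 0.0, 1.4)),
]

S = build_matrix(overlap, basis)   # overlap matrix
T = build_matrix(kinetic, basis)   # kinetic-energy matrix
print(S)
print(T)
```

Each matrix needs one integral per pair of basis functions, so the work grows quadratically with basis size for these terms and faster for the Coulomb term, which is the kind of cost that motivates moving the calculation to GPUs.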
NVIDIA’s goal for cuEST: moving high-fidelity material modeling from the lab to the fab.
Learn more about cuEST by visiting the NVIDIA demo booth and Synopsys’ booth at GTC, and dive deeper in the GTC session, “Next-Generation Discovery: Agentic AI for Science, AI-Driven Simulation and GPU-Accelerated Chemistry.”

