NVIDIA and Siemens Healthineers Are Teaching Ultrasound to Actually Listen

NVIDIA and Siemens Healthineers just dropped something that actually makes me excited about medical AI: NV-Raw2Insights-US. It’s a reconstruction model that skips the usual ultrasound image pipeline and works directly with raw sensor data. And no, this isn’t another “AI enhances images” press release. The model never starts from a finished image; it starts from the signals the probe records.

For the past few decades, ultrasound imaging has followed the same recipe: grab raw signals from the probe, run them through a hand-engineered beamforming pipeline, compress everything into a pretty image, and pretend the speed of sound is constant across all tissue. Which it isn’t. Sound moves faster through bone than fat, faster through muscle than blood. But traditional systems just pick one average number, conventionally about 1540 m/s for soft tissue, and call it a day.
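To put a rough number on that shortcut, here is a minimal Python sketch of the receive delays a delay-and-sum beamformer would compute for one on-axis focal point. The array geometry (64 elements, 0.3 mm pitch), the 40 mm focal depth, and the 1450 m/s "fat-like" speed are illustrative assumptions of mine, not figures from the announcement, and the 1540 m/s value is the conventional soft-tissue average.

```python
import math

def relative_delays_s(element_x_m, focus_depth_m, speed_m_s):
    """Receive delays for an on-axis focal point, relative to the earliest arrival."""
    times = [math.hypot(x, focus_depth_m) / speed_m_s for x in element_x_m]
    earliest = min(times)
    return [t - earliest for t in times]

# Hypothetical linear array: 64 elements, 0.3 mm pitch, centred on x = 0.
PITCH_M = 0.3e-3
element_x_m = [(i - 31.5) * PITCH_M for i in range(64)]

FOCUS_DEPTH_M = 0.040
ASSUMED_SPEED = 1540.0  # the single average value the scanner assumes, m/s
TISSUE_SPEED = 1450.0   # assumed fat-like tissue actually in the path, m/s

assumed = relative_delays_s(element_x_m, FOCUS_DEPTH_M, ASSUMED_SPEED)
needed = relative_delays_s(element_x_m, FOCUS_DEPTH_M, TISSUE_SPEED)

# Worst-case mismatch between the delays applied and the delays needed.
max_error_s = max(abs(a - n) for a, n in zip(assumed, needed))
print(f"max focusing delay error: {max_error_s * 1e9:.1f} ns")
```

With these made-up numbers the delay profile is off by roughly 44 ns at the edge of the aperture, about a fifth of a period at 5 MHz. That is the kind of mismatch a single average speed quietly builds into every focused image.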

NV-Raw2Insights-US says: screw that assumption. Instead of working from finished images, it learns directly from the raw channel data, the closest record we have of how sound actually interacts with the body. The model estimates a personalized sound-speed map for each patient, then uses that map to refocus the image in real time. One AI pass replaces what used to be a complex, time-consuming optimization problem. That’s the kind of practical win I can get behind.
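Why does a per-patient map matter for refocusing? A toy calculation helps. The sketch below is not how NV-Raw2Insights-US works internally; it only shows, under a straight-ray simplification, how travel time through a layered sound-speed map differs from the uniform-speed guess. The layer thicknesses and speeds are hypothetical values I picked for illustration.

```python
def travel_time_s(layers):
    """One-way travel time of a straight vertical ray through stacked layers.

    Each layer is a (thickness_m, speed_m_s) pair.
    """
    return sum(thickness_m / speed_m_s for thickness_m, speed_m_s in layers)

UNIFORM_SPEED = 1540.0  # conventional soft-tissue average, m/s

# Hypothetical patient: 15 mm of fat-like tissue over 25 mm of muscle-like tissue.
patient_layers = [(0.015, 1450.0), (0.025, 1580.0)]
true_depth_m = sum(thickness for thickness, _ in patient_layers)

true_time_s = travel_time_s(patient_layers)

# Depth at which a uniform-speed system would place a target at the true depth.
apparent_depth_m = true_time_s * UNIFORM_SPEED
depth_error_mm = (apparent_depth_m - true_depth_m) * 1e3
print(f"target misplaced by {depth_error_mm:.2f} mm")
```

A map of local speeds lets the beamformer use the travel time the sound actually took rather than the one the average predicts. Real tissue also refracts the beam, which this straight-ray toy ignores.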

The team calls this approach Raw2Insights. It’s a shift from processing ultrasound images to understanding the physics of each individual patient. The model doesn’t just see pixels; it hears how sound bends and bounces through your specific tissue.

Getting raw ultrasound data out of a clinical scanner is not trivial. Those signals are high-bandwidth and most scanners don’t expose them. NVIDIA’s solution is Holoscan Sensor Bridge, an open-source FPGA IP that streams raw data over DisplayPort outputs from an ACUSON Sequoia scanner to an Altera Agilex-7 FPGA, then over Ethernet to an NVIDIA IGX system. They call this Data over DisplayPort, and it’s clever because it works with existing scanner hardware.
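For a feel of what "high-bandwidth" means here, a back-of-the-envelope sketch. The channel count, sampling rate, and bit depth below are hypothetical round numbers, not published ACUSON Sequoia specifications, so treat the result as an order-of-magnitude illustration only.

```python
def raw_data_rate_gbit_s(channels, sample_rate_hz, bits_per_sample):
    """Data rate if every channel were sampled continuously, in Gbit/s."""
    return channels * sample_rate_hz * bits_per_sample / 1e9

# Hypothetical probe front end: 128 channels, 40 MHz sampling, 16-bit samples.
rate_gbit_s = raw_data_rate_gbit_s(channels=128, sample_rate_hz=40e6, bits_per_sample=16)
print(f"continuous raw channel data: {rate_gbit_s:.2f} Gbit/s")
```

Even with modest assumptions the total lands in the tens of gigabits per second, which is why a display-grade link is an appealing way to get the data out.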

Once the data hits GPU memory—Blackwell-class GPU, naturally—NV-Raw2Insights-US runs inference and streams the sound-speed estimate back to the scanner. The whole loop happens live. No offline processing, no waiting for results.

The architecture is modular, which I appreciate. You could swap in different AI models for different tasks without touching the hardware. The system is software-defined, so improvements come via updates rather than new machines. That’s a big deal for hospitals that can’t swap out scanners every year.
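Here is a small sketch of what "swap the model, keep the hardware" could look like in software. Every name in it (RawDataModel, ConstantSpeedModel, MeanAmplitudeModel, process_frame) is hypothetical and invented for illustration; none of it comes from the Holoscan SDK or the released code.

```python
from typing import Dict, Protocol, Sequence

Frame = Sequence[Sequence[float]]  # one list of samples per channel

class RawDataModel(Protocol):
    """Anything that turns one frame of raw channel data into named outputs."""
    def infer(self, channel_data: Frame) -> Dict[str, float]: ...

class ConstantSpeedModel:
    """Placeholder that ignores its input and reports a fixed sound speed."""
    def infer(self, channel_data: Frame) -> Dict[str, float]:
        return {"sound_speed_m_s": 1540.0}

class MeanAmplitudeModel:
    """Placeholder for a different task: average absolute signal level."""
    def infer(self, channel_data: Frame) -> Dict[str, float]:
        samples = [abs(s) for channel in channel_data for s in channel]
        return {"mean_amplitude": sum(samples) / len(samples)}

def process_frame(model: RawDataModel, channel_data: Frame) -> Dict[str, float]:
    """The pipeline only depends on the interface, not on a specific model."""
    return model.infer(channel_data)

frame = [[1.0, -3.0], [2.0, -2.0]]
print(process_frame(ConstantSpeedModel(), frame))
print(process_frame(MeanAmplitudeModel(), frame))
```

The point of the design is that the data path stays the same while the thing consuming the frame changes, so a new model is a software update rather than a new machine.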

Now, some honest skepticism: this is still investigational. The technology is under development, and we don’t have clinical trial data yet. The paper in IEEE Transactions on Medical Imaging is promising, but real-world validation is a different beast. Also, the hardware requirements—Blackwell GPU, IGX Thor, FPGA kit—are not cheap. This won’t show up in a rural clinic tomorrow.

But the direction is right. Learning from raw sensor data instead of reconstructed images reduces the errors baked into traditional assumptions. And the ability to adapt to each patient’s unique tissue properties is something we’ve needed for years. I’m curious to see how this evolves once more researchers get their hands on the model weights and dataset.

You can find the code on GitHub, download the model weights, and grab the dataset if you want to tinker. Links are in the original announcement. This is one of those rare AI papers where the idea is simple, the implementation is solid, and the potential impact is real.
