New deployment data from four inference providers shows where the savings actually come from — and what teams should evaluate ...
A diagnostic insight in healthcare. A character’s dialogue in an interactive game. An autonomous resolution from a customer service agent. Each of these AI-powered interactions is built on the same ...
Nvidia noted that cost per token went from 20 cents on the older Hopper platform to 10 cents on Blackwell. Moving to Blackwell’s native low-precision NVFP4 format further reduced the cost to just 5 ...
Achieving that 10x cost reduction is challenging, though, and it requires a huge up-front expenditure on Blackwell hardware.
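The arithmetic behind those savings can be sketched directly. A minimal example, assuming the cited prices are cents per million tokens served (the article does not state the unit basis) and a hypothetical monthly volume chosen purely for illustration:

```python
# Hedged sketch: rough cost comparison across the per-token prices cited
# in the article (20 cents on Hopper, 10 cents on Blackwell, 5 cents with
# NVFP4). The unit basis (cents per million tokens) and the monthly volume
# below are assumptions for illustration, not figures from Nvidia.

PRICES_CENTS = {              # assumed unit: cents per million tokens served
    "Hopper": 20,
    "Blackwell": 10,
    "Blackwell + NVFP4": 5,
}

MONTHLY_TOKENS_M = 500_000    # hypothetical workload: 500B tokens per month

def monthly_cost_usd(cents_per_million: int, volume_millions: int) -> float:
    """Convert a per-million-token price in cents to a monthly dollar cost."""
    return cents_per_million * volume_millions / 100

for platform, price in PRICES_CENTS.items():
    cost = monthly_cost_usd(price, MONTHLY_TOKENS_M)
    print(f"{platform:>18}: ${cost:,.0f}/month")
```

At these assumed prices, the Hopper-to-NVFP4 move alone is a 4x saving on serving costs; any larger figure would have to come from throughput and utilization gains on top of the per-token price.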
Nvidia announced today that it has launched ...
At its GPU Technology Conference, Nvidia announced several partnerships and launched updates to its software platforms that it claims will expand the potential inference market to 30 million ...
The AI chip giant says the open-source software library, TensorRT-LLM, will double the H100’s performance for running inference on leading large language models when it comes out next month. Nvidia ...
Every ChatGPT query, every AI agent action, every generated video is based on inference. Training a model is a one-time ...
Nvidia launched a hyperscale data center platform that combines the Tesla T4 GPU, TensorRT software and the Turing architecture to provide inference acceleration for voice, video and image ...
The history of computing teaches us that software necessarily lags hardware, and that lag can unfortunately stretch for many years when it comes to wringing the best performance out of new iron ...