NVIDIA Surpasses 1,000 TPS/User with Llama 4 Maverick and Blackwell GPUs

snifferius
Last updated: 2025/05/23 at 2:36 AM
snifferius Published May 23, 2025


Lawrence Jengar
May 23, 2025 02:10

NVIDIA achieves a world-record inference speed of over 1,000 TPS/user using Blackwell GPUs and Llama 4 Maverick, setting a new standard for AI model performance.




NVIDIA has set a new benchmark in artificial intelligence performance with its latest achievement, breaking the 1,000 tokens per second (TPS) per user barrier using the Llama 4 Maverick model and Blackwell GPUs. This accomplishment was independently verified by the AI benchmarking service Artificial Analysis, marking a significant milestone in large language model (LLM) inference speed.

Technological Advancements

The record was set on a single NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs, which delivered over 1,000 TPS per user on Llama 4 Maverick, a 400-billion-parameter model. NVIDIA positions Blackwell as the best hardware for deploying Llama 4 at either end of the trade-off: tuned for minimum latency it exceeds 1,000 TPS per user, and tuned for maximum throughput it reaches up to 72,000 TPS per server.

Optimization Techniques

NVIDIA applied extensive software optimizations in TensorRT-LLM to make full use of the Blackwell GPUs. The company also trained a speculative decoding draft model using EAGLE-3 techniques. Combined, these changes yielded a fourfold speed-up over previous baselines. The optimizations use FP8 data types for operations such as GEMMs and Mixture of Experts (MoE) while keeping accuracy comparable to BF16.
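
To give a feel for what FP8 precision means in practice, here is a minimal pure-Python sketch that rounds values to the FP8 E4M3 grid (3 mantissa bits, largest finite value 448) after per-tensor scaling. The article does not say which FP8 variant or scaling scheme NVIDIA used, so E4M3, per-tensor scaling, and the function names `round_to_e4m3` and `fake_quantize` are all assumptions made for illustration.

```python
import math

E4M3_MAX = 448.0      # largest finite E4M3 value
MIN_NORMAL_EXP = -6   # exponent of the smallest normal E4M3 value
MANTISSA_BITS = 3     # explicit mantissa bits in E4M3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, ex = math.frexp(a)                # a = m * 2**ex with 0.5 <= m < 1
    e = max(ex - 1, MIN_NORMAL_EXP)      # subnormals share the smallest exponent
    step = 2.0 ** (e - MANTISSA_BITS)    # spacing of representable values near a
    return sign * min(round(a / step) * step, E4M3_MAX)


def fake_quantize(values):
    """Scale so the largest magnitude maps to 448, round to E4M3, scale back.

    Per-tensor scaling is an assumption of this sketch, not a detail from the article.
    """
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) / scale for v in values]


if __name__ == "__main__":
    weights = [0.02, -0.5, 1.3, 3.7]
    for w, q in zip(weights, fake_quantize(weights)):
        print(f"{w:+.4f} -> {q:+.4f}")
```

With only three mantissa bits, each value in the normal range lands within a few percent of the original, which is why scaling and careful kernel design matter for keeping model accuracy close to BF16.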

Importance of Low Latency

In generative AI applications, throughput and latency must be balanced against each other: serving more users at once raises total throughput but slows each individual response. For applications that depend on rapid decision-making, per-user latency matters most, and the TPS/user record shows that NVIDIA’s Blackwell GPUs can be configured to minimize it. The same hardware can also be configured for high throughput, which makes it suitable for a range of AI workloads.
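
To make the TPS/user figure concrete, the short sketch below converts a per-user decode rate into the average time per token and the time needed to stream a complete response. The 500-token response length, the 100 TPS comparison point, and the function names are illustrative assumptions rather than figures from NVIDIA, and time to first token is ignored.

```python
def ms_per_token(tps_per_user: float) -> float:
    """Average gap between consecutive tokens seen by one user, in milliseconds."""
    return 1000.0 / tps_per_user


def response_seconds(num_tokens: int, tps_per_user: float) -> float:
    """Time to stream num_tokens at a steady per-user decode rate, in seconds."""
    return num_tokens / tps_per_user


if __name__ == "__main__":
    for tps in (100.0, 1000.0):  # hypothetical slower rate vs. the reported record rate
        print(
            f"{tps:6.0f} TPS/user: {ms_per_token(tps):5.1f} ms per token, "
            f"{response_seconds(500, tps):4.1f} s for a 500-token reply"
        )
```

At 1,000 TPS/user a token arrives roughly every millisecond, so the decode phase of a multi-hundred-token answer takes well under a second.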

CUDA Kernels and Speculative Decoding

NVIDIA optimized CUDA kernels for GEMM, MoE, and attention operations, using spatial partitioning and efficient memory loading to maximize performance. Speculative decoding accelerates LLM inference by having a smaller, faster draft model propose several tokens ahead, which the larger target LLM then verifies. The speed-up is largest when the draft model’s predictions are accurate, because more of its proposed tokens are accepted in each verification step.
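
The draft-and-verify loop can be sketched in a few lines of Python. This is a simplified greedy variant in which a drafted token is accepted only if it exactly matches the target model's own choice; it is not EAGLE-3 and not NVIDIA's implementation. The toy `target_next` and `draft_next` functions are hypothetical stand-ins for real models, and the verification loop stands in for what would be a single batched forward pass of the target model.

```python
def target_next(ctx):
    """Hypothetical 'large' model: deterministic next token from the context."""
    return (31 * sum(ctx) + 7 * len(ctx)) % 50


def draft_next(ctx):
    """Hypothetical 'small' model: agrees with the target except at every 4th length."""
    t = target_next(ctx)
    return t if len(ctx) % 4 else (t + 1) % 50


def speculative_generate(prompt, num_new, k, target_next, draft_next):
    """Generate num_new tokens, drafting k tokens per target verification pass."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < num_new:
        # 1. The draft model proposes k tokens autoregressively.
        ctx = list(out)
        draft = []
        for _ in range(k):
            tok = draft_next(ctx)
            draft.append(tok)
            ctx.append(tok)

        # 2. The target model checks every drafted position (counted as one pass).
        target_passes += 1
        ctx = list(out)
        accepted = []
        for tok in draft:
            expected = target_next(ctx)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong token and stop
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target_next(ctx))  # all accepted: one extra token for free

        out.extend(accepted)
    return out[: len(prompt) + num_new], target_passes


if __name__ == "__main__":
    tokens, passes = speculative_generate([1, 2, 3], 20, 3, target_next, draft_next)
    print(f"generated 20 tokens with {passes} target passes")
```

Because every emitted token is one the target model would have chosen itself, the output matches plain greedy decoding with the target alone; the gain comes from needing fewer target passes than tokens.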

Programmatic Dependent Launch

To further improve performance, NVIDIA used Programmatic Dependent Launch (PDL) to reduce GPU idle time between consecutive CUDA kernels. PDL lets a dependent kernel begin before the kernel it follows has fully finished, so the two overlap and the idle gaps between them shrink, improving GPU utilization.
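
The effect of overlapping consecutive kernels can be illustrated with a toy timing model. This is plain Python, not CUDA, and it does not use the PDL API; the setup and execution times, and the rule that the next kernel's setup may start as soon as the previous kernel begins executing, are assumptions chosen only to show why hiding per-kernel setup shortens a long chain of kernels.

```python
def serialized_time(n: int, setup_us: float, exec_us: float) -> float:
    """Each kernel's setup starts only after the previous kernel has finished."""
    return n * (setup_us + exec_us)


def overlapped_time(n: int, setup_us: float, exec_us: float) -> float:
    """The next kernel's setup runs while the previous kernel is still executing.

    Assumption of this model: setup for kernel i may begin when kernel i-1
    starts executing, but kernel i's main work waits for kernel i-1 to finish.
    """
    exec_end = 0.0
    prev_exec_start = 0.0
    for _ in range(n):
        ready = prev_exec_start + setup_us   # when this kernel's setup is done
        exec_start = max(ready, exec_end)    # also wait for the previous kernel
        exec_end = exec_start + exec_us
        prev_exec_start = exec_start
    return exec_end


if __name__ == "__main__":
    n, setup_us, exec_us = 100, 2.0, 10.0    # hypothetical values
    print("serialized:", serialized_time(n, setup_us, exec_us), "us")
    print("overlapped:", overlapped_time(n, setup_us, exec_us), "us")
```

In this model the setup cost is paid once instead of once per kernel whenever setup is shorter than execution, which matters most for decode workloads made of many short kernels.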

NVIDIA’s achievements underscore its leadership in AI infrastructure and data center technology, setting new standards for speed and efficiency in AI model deployment. The innovations in Blackwell architecture and software optimization continue to push the boundaries of what’s possible in AI performance, ensuring responsive, real-time user experiences and robust AI applications.

For more detailed information, visit the NVIDIA official blog.
