Inference Technique - Search News

Cursor Achieves 1.8x Inference Speedup on NVIDIA B200 GPUs

In testing on NVIDIA B200 hardware, the pipeline sustained 3.95 TB/s of memory bandwidth at a batch size of 32, or about 58% ...

InfoWorld

Google targets AI inference bottlenecks with TurboQuant

The technique aims to ease GPU memory constraints that limit how enterprises scale AI inference and long-context applications ...

MLCommons Releases New MLPerf Inference v6.0 Benchmark Results

Today, MLCommons ® announced new results for its industry-standard MLPerf ® Inference v6.0 benchmark suite. This release includes several important advances that ensure the benchmark suite tests ...

Geeky Gadgets

SteerLM a simple technique to customize LLMs during inference introduced by NVIDIA

Large language models (LLMs) have made significant strides in artificial intelligence (AI) natural language generation. Models such as GPT-3, Megatron-Turing, Chinchilla, PaLM-2, Falcon, and Llama 2 ...

11d

IndexCache, a new sparse attention optimizer, delivers 1.82x faster inference on long-context AI models

Researchers at Tsinghua University and Z.ai built IndexCache to eliminate redundant computation in sparse attention models ...

TechRepublic

DeepSeek-GRM: Introducing an Enhanced AI Reasoning Technique

Researchers from DeepSeek and Tsinghua University say combining two techniques improves the answers the large language model creates with computer reasoning techniques. Image: Envato/DC_Studio ...

Semiconductor Engineering

Review of Tools & Techniques for DL Edge Inference

A new technical paper titled “Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review” was published in “Proceedings of the IEEE” by researchers at University ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results