Nvidia’s new software doubles inference speed on H100 GPUs
Nvidia has announced a new open-source software package aimed at drastically improving the performance of large language model inference on its latest GPU accelerator, the H100. Inference speed refers to the rate at which a trained machine learning model can […]
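As a rough illustration of what "inference speed" means in practice, the sketch below measures the tokens-per-second throughput of a causal language model. It is not Nvidia's software or benchmark; the model name, prompt, and generation settings are placeholder assumptions chosen only to show how such a measurement is commonly taken.

```python
# Minimal sketch: measure LLM inference throughput in tokens per second.
# Model name, prompt, and settings are illustrative assumptions, not from the article.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the article concerns much larger models on H100 GPUs
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

prompt = "Explain what GPU inference acceleration means."
inputs = tokenizer(prompt, return_tensors="pt").to(device)

start = time.perf_counter()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

# Count only the newly generated tokens, excluding the prompt.
new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"Generated {new_tokens} tokens in {elapsed:.2f}s "
      f"({new_tokens / elapsed:.1f} tokens/sec)")
```

Throughput numbers like this (tokens per second, per GPU) are the kind of metric that a claim of "doubled inference speed" would be benchmarked against.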