Document Details
Clip:
Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches. Yishun Lu¹, Wesley Armour¹. ¹Department of Engineering Science, University of Oxford, Oxford, UK. Abstract: Modern GPUs are equipped with large amounts of high-bandwidth memory, enabling them to support mini-batch sizes of up to tens of thousands of training samples. However, most existing optimizers struggle to perform effectively at such a large batch size. As batch size increases, gradient noise decreases due to averaging over many samples, lim-
Filename:
2508.13898v2.pdf
Filetype:
application/pdf
Size:
618928 bytes
Uploaded On:
2025-10-24
Abstract:
Summary:
Tags:
Notes:
Visible:
1
Status:
Parsed
Author:
Yishun Lu; Wesley Armour
Creator:
arXiv GenPDF (tex2pdf:)
DOI:
https://doi.org/10.48550/arXiv.2508.13898
License:
http://creativecommons.org/licenses/by-nc-sa/4.0/
PTEX.Fullbanner:
This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5
Producer:
pikepdf 8.15.1
TemplateVersion:
2026.1
Title:
Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches
Trapped:
False
ArXivID:
https://arxiv.org/abs/2508.13898v2
Pages:
15