Document Details


2412.02595v2.pdf
Clip: arXiv:2412.02595v2 [cs.CL] 30 May 2025 Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset Dan Su*, Kezhi Kong*, Ying Lin*, Joseph Jennings, Brandon Norick, Markus Kliegl†, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro NVIDIA *
Filename: 2412.02595v2.pdf
Filetype: application/pdf
Size: 400612 bytes
Uploaded On: 2025-10-24
Abstract:
Summary:
Tags:
Notes:
Visible: 1
Status: Parsed
Author: Dan Su; Kezhi Kong; Ying Lin; Joseph Jennings; Brandon Norick; Markus Kliegl; Mostofa Patwary; Mohammad Shoeybi; Bryan Catanzaro
Creator: arXiv GenPDF (tex2pdf:)
DOI: https://doi.org/10.48550/arXiv.2412.02595
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
PTEX.Fullbanner: This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5
Producer: pikepdf 8.15.1
Title: Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset
Trapped: False
ArXivID: https://arxiv.org/abs/2412.02595v2
Pages: 17
