Document Library
- wang-probabilistic-address-crf.pdf
  Filetype: application/pdf; Size: 286140 bytes; Uploaded On: 2024-01-20; Status: Uploaded
  Clip: Wang, Minlue; Haberland, Valeriia; Yeo, Amos; Martin, Andrew; Howroyd, John and Bishop, Mark. 2016. 'A Probabilistic Address Parser Using Conditional Random Fields and Stochastic Regular Grammar'. In: IEEE 2016 International Conference on Data Mining, Barcelona, Spain, Dec 12-15, 2016. [Conference or Workshop Item] http://research.gold.ac.uk/19514/
- differentiable-genetic-programming.pdf
  Filetype: application/pdf; Size: 479217 bytes; Uploaded On: 2024-01-20; Status: Uploaded
  Clip: Differentiable Genetic Programming. Dario Izzo 1, Francesco Biscani 2, and Alessio Mereta 1. 1 Advanced Concepts Team, European Space Agency, Noordwijk 2201AZ, The Netherlands. dario.izzo@esa.int. Abstract: We introduce the use of high order automatic differentiation, implemented via the algebra of truncated Taylor polynomials, in genetic programming. Using the Cartesian Genetic Programming encoding we obtain a high-order Taylor representation of the program output that is then used to back-propagate errors during learning...
- 12320-56302-1-PB.pdf
  Filetype: application/pdf; Size: 867298 bytes; Uploaded On: 2024-01-20; Status: Uploaded
  Clip: Bidirectional Search That Is Guaranteed to Meet in the Middle. Robert C. Holte, Computing Science Dept., University of Alberta, Edmonton, Canada T6G 2E8 (rholte@ualberta.ca); Ariel Felner, ISE Department, Ben-Gurion University, Be'er-Sheva, Israel (felner@bgu.ac.il); Guni Sharon, ISE Department, Ben-Gurion University, Be'er-Sheva, Israel
- 4965-22627-1-PB.pdf
  Filetype: application/pdf; Size: 619733 bytes; Uploaded On: 2024-01-20; Status: Uploaded
  Clip: Learning SVM Classifiers with Indefinite Kernels. Suicheng Gu and Yuhong Guo, Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA. yuhong@temple.edu. Abstract: Recently, training support vector machines with indefinite kernels has attracted great attention in the machine learning community. In this paper, we tackle this problem by formulating a joint optimization model over SVM classifications and kernel principal component analysis. We first reformulate the kernel principal component analysis as a general kernel transformation framework, and then incorporate it into the SVM clas...
- 16647.full.pdf
  Filetype: application/pdf; Size: 995400 bytes; Uploaded On: 2024-01-20; Status: Uploaded
  Clip: PERSPECTIVE: Meal frequency and timing in health and disease. Mark P. Mattson, David B. Allison, Luigi Fontana, Michelle Harvie, Valter D. Longo, Willy J. Malaisse...
- 1912.02164.pdf
  Filetype: application/pdf; Size: 583717 bytes; Uploaded On: 2024-01-20; Status: Uploaded
  Clip: PLUG AND PLAY LANGUAGE MODELS: A SIMPLE APPROACH TO CONTROLLED TEXT GENERATION. Sumanth Dathathri (CMS, Caltech), Andrea Madotto (HKUST), Janice Lan (Uber AI), Jane Hung (Uber AI), Eric Frank (Uber AI), Piero Molino...
- 1901.02878v1.pdf
  Filetype: application/pdf; Size: 2970753 bytes; Uploaded On: 2024-01-20; Status: Uploaded
  Clip: A Constructive Approach for One-Shot Training of Neural Networks Using Hypercube-Based Topological Coverings. W. Brent Daniel, Enoch Yeung. Abstract: In this paper we presented a novel constructive approach for training deep neural networks using geometric approaches. We show that a topological covering can be used to define a class of distributed linear matrix inequalities, which in turn directly specify the shape and depth of a neural network architecture. The key insight is a fundamental relationship between linear matrix inequalities and their ability to bound the shape of data, and the rectified linear unit (ReLU) activation function employed in modern neural networks. We show that unit cover geometry and cover porosity are two design variables in cover-constructive learning that play a critical role in defining the complexity of the model and generalizability of the resulting...
- 697e76cf94e74ef34596d15b255c9feac2e2.pdf
  Filetype: application/pdf; Size: 3121481 bytes; Uploaded On: 2024-01-20; Status: Uploaded
- 8d4f27771fa0d882feaf780ae2b1d1dfb5b9b66f.pdf
  Filetype: application/pdf; Size: 1978719 bytes; Uploaded On: 2024-01-20; Status: Uploaded
  Clip: Published as a conference paper at ICLR 2019. PROXYLESSNAS: DIRECT NEURAL ARCHITECTURE SEARCH ON TARGET TASK AND HARDWARE. Han Cai, Ligeng Zhu, Song Han, Massachusetts Institute of Technology. {hancai, ligeng, songhan}@mit.edu. ABSTRACT: Neural architecture search (NAS) has a great impact by automatically designing effective neural network architectures. However, the prohibitive computational demand of conventional NAS algorithms (e.g. 10^4 GPU hours) makes it difficult to directly search the architectures on large-scale tasks (e.g. ImageNet). Differentiable NAS can reduce the cost of GPU hours via a continuous representation of network architecture but suffers from the high GPU memory consumption issue...
- spl.pdf
  Filetype: application/pdf; Size: 5364846 bytes; Uploaded On: 2024-01-20; Status: Uploaded
  Clip: Abstract—Face detection and alignment in unconstrained environment are challenging due to various poses, illuminations and occlusions. Recent studies show that deep learning approaches can achieve impressive performance on these two tasks. In this paper, we propose a deep cascaded multi-task framework which exploits the inherent correlation between detection and alignment to boost up their performance. In particular, our framework leverages a cascaded architecture with three stages of carefully designed deep convolutional networks to predict face and landmark location in a coarse-to-fine manner. In addition, we propose a new online hard sample mining strategy that further improves the performance in practice. Our method achieves superior accuracy over the state-of-the-art techniques on the challenging FDDB and WIDER FACE benchmarks for face detection, and...
- 1904.09658.pdf
  Filetype: application/pdf; Size: 1995504 bytes; Uploaded On: 2024-01-20; Status: Uploaded
  Clip: Probabilistic Face Embeddings. Yichun Shi and Anil K. Jain, Michigan State University, East Lansing, MI. shiyichu@msu.edu, jain@cse.msu.edu. Abstract: Embedding methods have achieved success in face recognition by comparing facial features in a latent semantic space. However, in a fully unconstrained face setting, the facial features learned by the embedding model could be ambiguous or may not even be present in the input face, leading to noisy representations. We propose Probabilistic Face Embeddings (PFEs), which represent each face image as a Gaussian distribution in the latent space. The mean of the distribution estimates the most likely feature values while the variance shows the uncertainty in the feature values...
- 1602.07360.pdf
  Filetype: application/pdf; Size: 924939 bytes; Uploaded On: 2024-01-21; Status: Uploaded
  Clip: Under review as a conference paper at ICLR 2017. SQUEEZENET: ALEXNET-LEVEL ACCURACY WITH 50X FEWER PARAMETERS AND <0.5MB MODEL SIZE. Forrest N. Iandola 1, Song Han 2, Matthew W. Moskewicz 1, Khalid Ashraf 1, William J. Dally 2, Kurt Keutzer 1
- 1503.03832.pdf
  Filetype: application/pdf; Size: 4705274 bytes; Uploaded On: 2024-01-21; Status: Uploaded
  Clip: FaceNet: A Unified Embedding for Face Recognition and Clustering. Florian Schroff (fschroff@google.com), Dmitry Kalenichenko (dkalenichenko@google.com), James Philbin (jphilbin@google.com), Google Inc. Abstract: Despite significant recent advances in the field of face recognition [10, ...], implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a...
- Chudasama_TherISuRNet_-_A_Computationally_Efficient_Thermal_Image_Super-Resolution_Network_CVPRW_2020_paper.pdf
  Filetype: application/pdf; Size: 2626906 bytes; Uploaded On: 2024-01-21; Status: Uploaded
  Clip: TherISuRNet - A Computationally Efficient Thermal Image Super-Resolution Network. Vishal Chudasama 1, Heena Patel 1, Kalpesh Prajapati 1, Kishor Upla 1,2, Raghavendra Ramachandra 2, Kiran Raja 2
- acm-data-cleaning.pdf
  Filetype: application/pdf; Size: 15219192 bytes; Uploaded On: 2024-01-21; Status: Uploaded
  Clip: ACM Books is a series of high-quality books published by ACM for the computer science community. ACM Books publications are widely distributed in print and digital formats by major booksellers and are available to libraries and library consortia. Individual ACM members may access ACM Books publications via separate annual subscription. Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data analytics results and incorrect business decisions. Poor data across businesses and the U.S. government are reported to cost trillions of dollars a year. Multiple surveys show that dirty data is the most common barrier faced by data scientists. Not surprisingly, developing effective and efficient data cleaning solutions...
- (Ancient Cultures) Markham J. Geller - Ancient Babylonian Medicine_ Theory and Practice (Ancient Cultures)-Wiley-Blackwell (2010).pdf
  Filetype: application/pdf; Size: 1916118 bytes; Uploaded On: 2024-01-21; Status: Uploaded
  Clip: A John Wiley & Sons, Ltd., Publication. Markham J. Geller, Ancient Babylonian Medicine: Theory and Practice
- (Osprey Elite (Book 204)) Raffaele D'Amato, Andrea Salimbeti - Sea Peoples of the Bronze Age Mediterranean c.1400 BC-1000 BC-Osprey Publishing (2015).pdf
  Filetype: application/pdf; Size: 4594867 bytes; Uploaded On: 2024-01-21; Status: Uploaded
- (Ancient Magic and Divination 9) Matthew Rutz - Bodies of Knowledge in Ancient Mesopotamia_ The Diviners of Late Bronze Age Emar and Their Tablet Collection-Brill Academic Publishers (2013).pdf
  Filetype: application/pdf; Size: 3778319 bytes; Uploaded On: 2024-01-21; Status: Uploaded
  Clip: Bodies of Knowledge in Ancient Mesopotamia. Ancient Magic and Divination, Editors: Tzvi Abusch, Ann K. Guinan, Nils P. Heesel, Francesca Rochberg, Frans A. M. Wiggermann. Volume 9. The titles published in this series are listed at brill.com/amd
- James Robinson - Nag Hammadi Library in English_ The Definitive Translation of the Gnostic Scriptures. Complete in One Volume-HarperOne (2000).pdf
  Filetype: application/pdf; Size: 23168205 bytes; Uploaded On: 2024-01-21; Status: Parsed
- Yogi Ramacharaka - Advanced Course in Yogi Philosophy and Oriental Occultism-Yoga Publication Society (1980).pdf
  Filetype: application/pdf; Size: 11717651 bytes; Uploaded On: 2024-01-21; Status: Parsed
  Clip: Ramacharaka, Yogi. Advanced course in yogi philosophy and oriental occultism. A Collection of Sacred-Magick.Com, The Esoteric Library
- Kenneth Grant - Hecate's Fountain-Skoob Books (1993).pdf
  Filetype: application/pdf; Size: 10859336 bytes; Uploaded On: 2024-01-21; Status: Parsed
- Khrulkov_Hyperbolic_Image_Embeddings_CVPR_2020_paper.pdf
  Filetype: application/pdf; Size: 1929454 bytes; Uploaded On: 2024-01-21; Status: Parsed
  Clip: Hyperbolic Image Embeddings. Valentin Khrulkov 1,4*, Leyla Mirvakhabova 1*, Evgeniya Ustinova 1, Ivan Oseledets 1,2, Victor Lempitsky 1,3. 1 Skolkovo Institute of Science and Technology (Skoltech), Moscow; 2 Institute of Numerical Mathematics of the Russian Academy of Sciences, Moscow; 3 Samsung AI Center, Moscow
- Magus Incognito, Paul Tice - The Secret Doctrine of the Rosicrucians-Book Tree (2000).pdf
  Filetype: application/pdf; Size: 927563 bytes; Uploaded On: 2024-01-21; Status: Parsed
  Clip: The Secret Doctrine of The Rosicrucians, Illustrated with The Secret Rosicrucian Symbols. By MAGUS INCOGNITO. ADVANCED THOUGHT PUBLISHING CO., 159 N. State St., Chicago, Ill.; L. N. FOWLER & CO., 7 Imperial Arcade, Ludgate Circus, London, Eng. [1918]
- Dating_the_war_of_the_Hyksos.pdf
  Filetype: application/pdf; Size: 12270207 bytes; Uploaded On: 2024-01-21; Status: Parsed
  Clip: Dating the war of the Hyksos. Abstract: The three Hyksos dynasties (XIV, XV, XVI) ruled Egypt approximately from 1750 to 1530 BCE and then disappeared abruptly after the death of Pharaoh Seqenenre Taa. Egyptian documents unanimously describe the departure of the Hyksos from Egypt to Palestine as a disaster. Modern Egyptologists pictured a 'war of the Hyksos'; however, no document speaks of war, only that Avaris, the Hyksos' capital, was looted and vandalized after their departure. Moreover, all accounts of former historians picture the Hyksos as the ancestors of the Hebrews, led into Palestine under the leadership of Moses. In addition, both biblical and Egyptian chronologies date the Hyksos departure to 1533 BCE, which implies the coincidence of these two dramatic events. The only way to date the so-called 'Hyksos war' is: gathering all historical and archaeological documents about the Hyksos, establishing a relative chronology of the 'Hyksos war', identifying who Apopi was and his links with the biblical Moses, determining where the Hyksos came from and where they went, dating the Hyksos war according to the Egyptian chronology through synchronisms dated by astronomy, and dating the Exodus according to the Israelite chronology (based on the masoretic text) checked by absolute dates.
- 2208.12262.pdf
  Filetype: application/pdf; Size: 1523217 bytes; Uploaded On: 2024-01-21; Status: Large File
- 2311.12908.pdf
  Filetype: application/pdf; Size: 12999512 bytes; Uploaded On: 2024-01-21; Status: Large File
- 2305.01569.pdf
  Filetype: application/pdf; Size: 6833945 bytes; Uploaded On: 2024-01-21; Status: Large File
- holographic_neural_architectures.pdf
  Filetype: application/pdf; Size: 3422924 bytes; Uploaded On: 2024-01-21; Status: Large File
- mathematics_cheat_sheet.pdf
  Filetype: application/pdf; Size: 4660997 bytes; Uploaded On: 2024-01-21; Status: Large File
- impact_of_a_nonrestrictive_satiating_diet_on_anthropometrics_satiety_responsiveness_and_eating_behaviour_traits_in_obese_men_displaying_a_high_or_a_low_satiety_phenotype.pdf
  Filetype: application/pdf; Size: 494138 bytes; Uploaded On: 2024-01-21; Status: Parsed
  Clip: Impact of a non-restrictive satiating diet on anthropometrics, satiety responsiveness and eating behaviour traits in obese men displaying a high or a low satiety phenotype. Hélène Arguin, Angelo Tremblay, John E. Blundell, Jean-Pierre Després, Denis Richard, Benoît Lamarche...
- IJAM_48_2_11.pdf
  Filetype: application/pdf; Size: 946650 bytes; Uploaded On: 2024-01-21; Status: Parsed
  Clip: The Quaternion Domain Fourier Transform and its Application in Mathematical Statistics. Mawardi Bahri, Amir Kamal Amir, Resnawati, and Chrisandi Lande. Abstract: Recently a generalization of the quaternion Fourier transform over quaternion domains, the so-called quaternion domain Fourier transform (QDFT), has been introduced, including its properties such as shift, modulation, convolution theorem and uncertainty principle. In the present paper we explore more properties of the QDFT such as the correlation and product theorems and propose its application in probability theory and mathematical statistics. Index Terms: quaternion domain Fourier transform, quaternion random variable
- Heat traps in installations 20120508.pdf
  Filetype: application/pdf; Size: 202054 bytes; Uploaded On: 2024-01-21; Status: Parsed
  Clip: ESBE AB, Bruksgatan 22, SE-330 21 Reftele, Sweden (sales@esbe.se, www.esbe.se). Reftele, Sweden, May 30, 2012. Have we forgotten to make heat traps? The use of heat traps, i.e. suitable piping to stop natural circulation, is long-established knowledge, but it seems to have been forgotten in many of today's installations. This is often confirmed when we follow up different installations in the field. Natural circulation leads to additional heat loss, but...
- glx137.pdf
  Filetype: application/pdf; Size: 1840284 bytes; Uploaded On: 2024-01-21; Status: Large File
- journal.pcbi.1002863.PDF
  Filetype: application/pdf; Size: 3832924 bytes; Uploaded On: 2024-01-21; Status: Large File
- journal.pcbi.1002606.PDF
  Filetype: application/pdf; Size: 1414201 bytes; Uploaded On: 2024-01-21; Status: Large File
- Jeong_et_al-2013-STEM_CELLS.pdf
  Filetype: application/pdf; Size: 1252860 bytes; Uploaded On: 2024-01-21; Status: Large File
- ja8b02322_si_001.pdf
  Filetype: application/pdf; Size: 3452462 bytes; Uploaded On: 2024-01-21; Status: Large File
- J. Biol. Chem.-2003-Tuli-41227-36.pdf
  Filetype: application/pdf; Size: 510132 bytes; Uploaded On: 2024-01-21; Status: Parsed
  Clip: Transforming Growth Factor-β-mediated Chondrogenesis of Human Mesenchymal Progenitor Cells Involves N-cadherin and Mitogen-activated Protein Kinase and Wnt Signaling Cross-talk. Received for publication, May 20, 2003, and in revised form, July 30, 2003. Published, JBC Papers in Press, July 31, 2003, DOI 10.1074/jbc.M305312200. Richard Tuli, Suraj Tuli, Sumon Nandi, Xiaoxue Huang, Paul A. Manner, William J. Hozack, Keith G. Danielson, David J. Hall, and Rocky S. Tuan. From the Cartilage Biology and Orthopaedics Branch, NIAMS, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland 20892; the Department of Orthopaedic Surgery, Thomas Jefferson University, Philadelphia, Pennsylvania 19107; and the Department of Orthopaedic Surgery, George Washington University, Washington, D.C. 20037. The multilineage differentiation potential of adult tissue-derived mesenchymal progenitor cells (MPCs)...
- lyophilzation of BM.pdf
  Filetype: application/pdf; Size: 202279 bytes; Uploaded On: 2024-01-21; Status: Parsed
  Clip: Zhang et al. / J Zhejiang Univ-Sci B (Biomed & Biotechnol) 2010 11(11):889-894. Preliminary study on the freeze-drying of human bone marrow-derived mesenchymal stem cells. Shao-zhi Zhang 1, Huan Qian 2, Zhen Wang...
- kim_sp_2019.pdf
  Filetype: application/pdf; Size: 1109122 bytes; Uploaded On: 2024-01-21; Status: Large File
- srep39302.pdf
  Filetype: application/pdf; Size: 1981955 bytes; Uploaded On: 2024-01-21; Status: Large File
- s41598-017-09892-w.pdf
  Filetype: application/pdf; Size: 2610133 bytes; Uploaded On: 2024-01-21; Status: Large File
- nihms314235.pdf
  Filetype: application/pdf; Size: 375975 bytes; Uploaded On: 2024-01-21; Status: Parsed
  Clip: Host–Bacterial Symbiosis in Health and Disease. Janet Chow 1, S. Melanie Lee 1, Yue Shen 1, Arya Khosravi 1, and Sarkis K. Mazmanian. Division of Biology, California Institute of Technology, Pasadena, California, USA. Abstract: All animals live in symbiosis. Shaped by eons of co-evolution, host-bacterial associations have developed into prosperous relationships creating mechanisms for mutual benefits to both microbe and host. No better example exists in biology than the astounding numbers of bacteria harbored by...
- vectorcalculus.pdf
  Filetype: application/pdf; Size: 43549037 bytes; Uploaded On: 2024-01-21; Status: Large File
- 2312.14135.pdf
  Filetype: application/pdf; Size: 5324146 bytes; Uploaded On: 2024-01-22; Status: Parsed
  Clip: V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs. Penghao Wu (UC San Diego, pew011@ucsd.edu), Saining Xie (New York University, saining.xie@nyu.edu). (Figure: V*, LLM-guided search with a Visual Working Memory (VWM) for VQA; targets located.)
- pdf
  Filetype: application/pdf; Size: 16271160 bytes; Uploaded On: 2024-01-22; Status: Parsed
  Clip: Under review as a conference paper at ICLR 2024. SDXL: IMPROVING LATENT DIFFUSION MODELS FOR HIGH-RESOLUTION IMAGE SYNTHESIS. Anonymous authors, paper under double-blind review. ABSTRACT: We present Stable Diffusion XL (SDXL), a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone, achieved by significantly increasing the number of attention blocks and including a second text encoder. Further, we design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. To ensure highest quality results, we also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL improves dramatically over previous versions of Stable Diffusion and achieves results competitive with those...
- 2307.03172.pdf
  Filetype: application/pdf; Size: 747542 bytes; Uploaded On: 2024-01-24; Status: Parsed
  Clip: Lost in the Middle: How Language Models Use Long Contexts. Nelson F. Liu 1, Kevin Lin 2, John Hewitt 1, Ashwin Paranjape 3, Michele Bevilacqua 3, Fabio Petroni 3, Percy Liang 1
- 2312.02133.pdf
  Filetype: application/pdf; Size: 42611083 bytes; Uploaded On: 2024-01-24; Status: Large File
- 2203.15556.pdf
  Filetype: application/pdf; Size: 6004349 bytes; Uploaded On: 2024-01-27; Status: Parsed
  Clip: Training Compute-Optimal Large Language Models. Jordan Hoffmann★, Sebastian Borgeaud★, Arthur Mensch★, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals and Laurent Sifre★ (★ equal contributions). We investigate the optimal model size and number of tokens for training a transformer language model...
- 2308.12350.pdf
  Filetype: application/pdf; Size: 2952020 bytes; Uploaded On: 2024-01-27; Status: Parsed
  Clip: Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation. Duo Peng 1, Ping Hu 2, Qiuhong Ke 3, Jun Liu 1,*. 1 Singapore University of Technology and Design; 2 Boston University...
- 2308.14963.pdf
  Filetype: application/pdf; Size: 415708 bytes; Uploaded On: 2024-01-27; Status: Parsed
  Clip: Vector Search with OpenAI Embeddings: Lucene Is All You Need. Jimmy Lin 1, Ronak Pradeep 1, Tommaso Teofili 2, Jasper Xian 1. 1 David R. Cheriton School of Computer Science, University of Waterloo; 2 Department of Engineering, Roma Tre University. Abstract...
- HEDI-2010-02.pdf
  Filetype: application/pdf; Size: 161880 bytes; Uploaded On: 2024-01-27; Status: Parsed
  Clip: It can be seen that he much preferred the education of his students to the small satisfaction derived from astonishment; he never believed that he had truly done enough for Science if he did not feel that he had added new truths to enrich it and the exposure of the simplicity of the idea which led him there. ... Of the sixteen professors attached to the Saint Petersburg Academy, eight were trained under him, and all are known through their works, have been awarded various academic distinctions, and are proud to add the...
- 442_isparse_output_informed_sparsi.pdf
  Filetype: application/pdf; Size: 454953 bytes; Uploaded On: 2024-01-27; Status: Parsed
  Clip: Under review as a conference paper at ICLR 2020. ISPARSE: OUTPUT INFORMED SPARSIFICATION OF NEURAL NETWORKS. Anonymous authors, paper under double-blind review. ABSTRACT: Deep neural networks have demonstrated unprecedented success in various knowledge management applications. However, the networks created are often very complex, with large numbers of trainable edges which require extensive computational resources. We note that many successful networks nevertheless often contain large numbers of redundant edges. Moreover, many of these edges may have negligible contributions towards the overall network performance. In this paper, we propose a novel iSparse framework, and experimentally show, that we can sparsify the network, by 30-50%, without impacting the network performance. iSparse leverages a novel edge significance score, E, to determine the importance...
- 2401.01335.pdf
  Filetype: application/pdf; Size: 1719766 bytes; Uploaded On: 2024-01-27; Status: Parsed
  Clip: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models. Zixiang Chen*, Yihe Deng*, Huizhuo Yuan*, Kaixuan Ji, Quanquan Gu. Abstract: Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the prospect...
- 1904.09751.pdf
  Filetype: application/pdf; Size: 4691115 bytes; Uploaded On: 2024-02-04; Status: Parsed
  Clip: Published as a conference paper at ICLR 2020. THE CURIOUS CASE OF NEURAL TEXT DEGENERATION. Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi. Paul G. Allen School of Computer Science & Engineering, University of Washington
- 2402.01878v1.pdf
  Filetype: application/pdf; Size: 1119744 bytes; Uploaded On: 2024-02-25; Status: Parsed
  Clip: LiPO: Listwise Preference Optimization through Learning-to-Rank. Tianqi Liu*1, Zhen Qin*1, Junru Wu 1, Jiaming Shen 1, Misha Khalman 2, Rishabh Joshi 2, Yao Zhao 2...
- 2402.04291.pdf
  Filetype: application/pdf; Size: 7299460 bytes; Uploaded On: 2024-02-25; Status: Parsed
  Clip: BiLLM: Pushing the Limit of Post-Training Quantization for LLMs. Wei Huang 1, Yangdong Liu 2, Haotong Qin 3,2, Ying Li 2, Shiming Zhang 1, Xianglong Liu 2, Michele Magno 3
- 2203.14680.pdf
  Filetype: application/pdf; Size: 692857 bytes; Uploaded On: 2024-02-27; Status: Parsed
  Clip: Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space. Mor Geva 1, Avi Caciularu 2, Kevin Ro Wang 3, Yoav Goldberg 1,2. 1 Allen Institute for AI; 2 Bar-Ilan University...
- 2402.17764.pdf
  Filetype: application/pdf; Size: 463748 bytes; Uploaded On: 2024-02-28; Status: Parsed
  Clip: The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits. Shuming Ma*, Hongyu Wang*, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei. https://aka.ms/GeneralAI. Abstract: Recent research, such as BitNet [WMD+23], is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant...
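  The arithmetic behind the "1.58 bits" in the title: each weight is restricted to the three values {-1, 0, +1}, and a three-way choice carries log2(3) ≈ 1.585 bits. A minimal sketch of an absmean-style ternary quantizer in that spirit (illustrative only; it may not match the paper's exact recipe):

  ```python
  import math
  import numpy as np

  # A ternary weight takes one of 3 values, hence log2(3) bits of information.
  print(math.log2(3))  # ~1.585

  def ternary_quantize(w: np.ndarray) -> np.ndarray:
      """Scale by the mean absolute value, then round and clip to {-1, 0, +1}.

      Sketch of an absmean-style quantizer; not necessarily the paper's code.
      """
      scale = np.abs(w).mean() + 1e-8
      return np.clip(np.round(w / scale), -1, 1)

  w = np.random.randn(4, 4)
  print(ternary_quantize(w))
  ```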
- 2012.00152.pdf
  Filetype: application/pdf; Size: 383653 bytes; Uploaded On: 2024-03-03; Status: Parsed
  Clip: Every Model Learned by Gradient Descent Is Approximately a Kernel Machine. Pedro Domingos (pedrod@cs.washington.edu), Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA 98195-2350, USA. Abstract: Deep learning's successes are often attributed to its ability to automatically discover new representations of the data, rather than relying on handcrafted features like other learning methods. We show, however, that deep networks learned by the standard gradient descent algorithm are in fact mathematically approximately equivalent to kernel machines, a learning method that simply memorizes the data and uses it directly for prediction via a similarity function (the kernel). This greatly enhances the interpretability of deep network weights, by elucidating that they are effectively a superposition of the training examples. The network architecture incorporates knowledge of the target function into the kernel.
- 2306.03819.pdf
  Filetype: application/pdf; Size: 787636 bytes; Uploaded On: 2024-03-04; Status: Parsed
  Clip: LEACE: Perfect linear concept erasure in closed form. Nora Belrose 1, David Schneider-Joseph 1, Shauli Ravfogel 2, Ryan Cotterell 3, Edward Raff 4, Stella Biderman 1,4. 1 EleutherAI...
- 2403.01643.pdf
  Filetype: application/pdf; Size: 392445 bytes; Uploaded On: 2024-03-05; Status: Parsed
  Clip: You Need to Pay Better Attention. Mehran Hosseini* (Department of Informatics, King's College London, London, UK; mehran.hosseini@kcl.ac.uk), Peyman Hosseini* (School of Electronic Engineering & Computer Science, Queen Mary University of London, London, UK; s.hosseini@qmul.ac.uk). Abstract: We introduce three new attention mechanisms that outperform standard multi-...
- 2401.02412v1.pdf
  Filetype: application/pdf; Size: 477679 bytes; Uploaded On: 2024-03-06; Status: Parsed
  Clip: LLM AUGMENTED LLMS: EXPANDING CAPABILITIES THROUGH COMPOSITION. Rachit Bansal 1, Bidisha Samanta 1, Siddharth Dalmia 2, Nitish Gupta 1, Shikhar Vashishth 1, Sriram Ganapathy 1, Abhishek Bapna...
- 2308.06259.pdf
  Filetype: application/pdf; Size: 1899674 bytes; Uploaded On: 2024-03-19; Status: Parsed
  Clip: Published as a conference paper at ICLR 2024. SELF-ALIGNMENT WITH INSTRUCTION BACKTRANSLATION. Xian Li, Ping Yu, Chunting Zhou, Timo Schick, Omer Levy, Luke Zettlemoyer, Jason Weston & Mike Lewis. Meta. {xianl,jase,mikelewis}@meta.com. ABSTRACT: We present a scalable method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions. Our approach, named instruction backtranslation, starts with a language model finetuned on a small amount of seed data, and a given web corpus. The seed model is used to construct training examples by generating instruction prompts for web documents (self-augmentation), and then selecting high quality examples from among these candidates (self-curation). This data is then used to finetune...
- 2402.01306.pdf
  Filetype: application/pdf; Size: 1055023 bytes; Uploaded On: 2024-03-19; Status: Parsed
  Clip: KTO: Model Alignment as Prospect Theoretic Optimization. Kawin Ethayarajh 1, Winnie Xu 2, Niklas Muennighoff 2, Dan Jurafsky 1, Douwe Kiela 1,2. Abstract: Kahneman & Tversky's prospect theory tells us that humans perceive random variables in a biased but well-defined manner (1992); for example...
- 2310.05914.pdf
  Filetype: application/pdf; Size: 801752 bytes; Uploaded On: 2024-03-28; Status: Parsed
  Clip: Preprint. NEFTUNE: NOISY EMBEDDINGS IMPROVE INSTRUCTION FINETUNING. Neel Jain 1*, Ping-yeh Chiang 1*, Yuxin Wen 1*, John Kirchenbauer 1, Hong-Min Chu 1, Gowthami Somepalli...
- 2310.07831.pdf
  Filetype: application/pdf; Size: 1933245 bytes; Uploaded On: 2024-04-07; Status: Parsed
  Clip: ADAPTIVE LEARNING RATE SCHEDULING BY REFINEMENT: When, Why and How Much? Aaron Defazio (adefazio@meta.com), FAIR, Meta; Ashok Cutkosky (ashok@cutkosky.com), Boston University; Harsh Mehta (harshm@google.com), Google Research; Konstantin Mishchenko (konsta.mish@gmail.com), Samsung AI Center. Abstract: Learning rate schedules used in practice bear little resemblance to those recommended by theory. We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules. Our key technical contribution is a refined analysis of learning rate schedules for a wide class of...
- 2404.03715.pdf
  Filetype: application/pdf; Size: 1104120 bytes; Uploaded On: 2024-04-08; Status: Parsed
  Clip: Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences. Corby Rosset*, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Awadallah*, Tengyang Xie*. Microsoft Research. Abstract: This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over itself. The typical approach for post-training LLMs involves Reinforcement Learning from Human Feedback (RLHF), which traditionally separates reward learning and subsequent policy optimization. However, such a reward maximization approach is limited by the nature of "point-wise" rewards (such as that...
- 2403.09629
  Filetype: application/pdf; Size: 971557 bytes; Uploaded On: 2024-05-01; Status: Parsed
  Clip: Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking. Eric Zelikman (Stanford University), Georges Harik (Notbad AI Inc), Yijia Shao (Stanford University), Varuna Jayasiri (Notbad AI Inc), Nick Haber (Stanford University), Noah D. Goodman (Stanford University). Abstract...
- 2401.15024v2 (1).pdf
  Filetype: application/pdf; Size: 617778 bytes; Uploaded On: 2024-05-13; Status: Parsed
  Clip: Published as a conference paper at ICLR 2024. SLICEGPT: COMPRESS LARGE LANGUAGE MODELS BY DELETING ROWS AND COLUMNS. Saleh Ashkboos (ETH Zurich, saleh.ashkboos@inf.ethz.ch), Maximilian L. Croci (Microsoft Research, t-mcroci@microsoft.com), Marcelo Gennari do Nascimento (Microsoft, marceloge@microsoft.com), Torsten Hoefler...
- 2405.15071
  Filetype: application/pdf; Size: 1165013 bytes; Uploaded On: 2024-05-27; Status: Parsed
  Clip: Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization. Boshi Wang, Xiang Yue*, Yu Su, Huan Sun. The Ohio State University; Carnegie Mellon University. {wang.13930,yue.149,su.809,sun.397}@osu.edu
- 2312.03732
  Filetype: application/pdf; Size: 876626 bytes; Uploaded On: 2024-06-05; Status: Parsed
  Clip: A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA. Damjan Kalajdzievski, Tenyx. Abstract: As large language models (LLMs) have become increasingly compute and memory intensive, parameter-efficient fine-tuning (PEFT) methods are now a common strategy to fine-tune LLMs. A popular PEFT method is Low-Rank Adapters (LoRA), which adds trainable low-rank "adapters" to selected layers. Each adapter consists of a low-rank matrix product, multiplicatively scaled by a rank-dependent factor. This scaling factor, which divides adapters by a factor of the rank, results in slowed learning and stunted performance for LoRA with higher-rank adapters. Consequently, the use of LoRA in practice has generally been limited to very low ranks. In this work, we study the impact of the scaling factor on the learning process and prove that LoRA adapters should be divided by a factor of the square root of the rank. Modifying LoRA with the appropriate scaling factor, which we call the rank-stabilized LoRA (rsLoRA) method, easily provides for a fine-tuning compute/performance trade-off, where larger ranks can be used to trade off increased computational resources during training for better fine-tuning performance, with no...
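  Per the abstract above, the change is a one-line modification of the adapter scaling: conventional LoRA scales the low-rank update by alpha/r, while rsLoRA divides by sqrt(r) instead. A minimal sketch under that reading (names and shapes are illustrative, not from the paper's code):

  ```python
  import numpy as np

  def lora_delta(A: np.ndarray, B: np.ndarray, alpha: float,
                 rank_stabilized: bool) -> np.ndarray:
      """Adapter update added to a frozen weight matrix.

      Conventional LoRA scales B @ A by alpha / r; the clip says rsLoRA
      divides by sqrt(r), so higher-rank adapters are not shrunk as fast.
      """
      r = A.shape[0]  # A: (r, d_in), B: (d_out, r)
      scale = alpha / np.sqrt(r) if rank_stabilized else alpha / r
      return scale * (B @ A)

  d_in, d_out, r, alpha = 16, 16, 8, 16.0
  A = np.random.randn(r, d_in) * 0.01
  B = np.random.randn(d_out, r) * 0.01
  print(np.abs(lora_delta(A, B, alpha, rank_stabilized=True)).max())
  ```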
- 2305.18290
  Filetype: application/pdf; Size: 1298077 bytes; Uploaded On: 2024-06-07; Status: Parsed
  Clip: Direct Preference Optimization: Your Language Model is Secretly a Reward Model. Rafael Rafailov*, Archit Sharma*, Eric Mitchell*, Stefano Ermon, Christopher D. Manning, Chelsea Finn
- 2406.02528
  Filetype: application/pdf; Size: 1255594 bytes; Uploaded On: 2024-06-09; Status: Parsed
  Clip: Scalable MatMul-free Language Modeling. Rui-Jie Zhu 1, Yu Zhang 2, Ethan Sifferman 1, Tyler Sheaves 3, Yiqiao Wang 4, Dustin Richmond 1, Peng Zhou...
- 2403.17031v1.pdf
  Filetype: application/pdf; Size: 2996825 bytes; Uploaded On: 2024-06-10; Status: Parsed
  Clip: The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization. Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall. Hugging Face; Mila, Université de Montréal; Fuxi AI Lab, NetEase. costa@huggingface.co. Abstract: This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work (Stiennon et al., 2020). We create an RLHF pipeline from scratch, enumerate over 20 key implementation details, and share key insights during the reproduction. Our RLHF-trained Pythia models demonstrate significant gains in response quality that scale with model size, with our 2.8B and 6.9B models outperforming OpenAI's released 1.3B checkpoint. We publicly release the trained model checkpoints...
- 2110.01786
  Filetype: application/pdf; Size: 1614636 bytes; Uploaded On: 2024-06-10; Status: Parsed
  Clip: MoEfication: Transformer Feed-forward Layers are Mixtures of Experts. Zhengyan Zhang 1,2, Yankai Lin 3, Zhiyuan Liu 1,2,4,5, Peng Li 3,6, Maosong Sun 1,2,4,5,7, Jie Zhou 3
- 2405.07987v1.pdf
  Filetype: application/pdf; Size: 3260958 bytes; Uploaded On: 2024-06-12; Status: Parsed
  Clip: The Platonic Representation Hypothesis. Minyoung Huh*1, Brian Cheung*1, Tongzhou Wang*1, Phillip Isola*1. Abstract: We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways by which different neural networks represent...
- 2405.07813v1.pdf
  Filetype: application/pdf; Size: 1328579 bytes; Uploaded On: 2024-06-16; Status: Parsed
  Clip: Localizing Task Information for Improved Model Merging and Compression. Ke Wang*1, Nikolaos Dimitriadis*1, Guillermo Ortiz-Jiménez 2,3, François Fleuret 4, Pascal Frossard 1. Abstract: Model merging and task arithmetic have emerged as promising scalable approaches to merge multiple single-task checkpoints to one multi-task...
- 2406.09325v1
  Filetype: application/pdf; Size: 1476779 bytes; Uploaded On: 2024-06-16; Status: Parsed
  Clip: REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space. Tomer Ashuach, Martin Tutek, Yonatan Belinkov. Technion – Israel Institute of Technology. {tomerashuach,martin.tutek,belinkov}@campus.technion.ac.il. Abstract: Large language models (LLMs) risk inadvertently memorizing and divulging sensitive or personally identifiable information (PII) seen in training data, causing privacy concerns. Current approaches to address this issue involve costly dataset scrubbing, or model filtering through unlearning and model editing, which can be bypassed through extraction attacks. We propose REVS, a novel model editing method for unlearning sensitive information from LLMs. REVS identifies and modifies a small subset of neurons relevant for each piece of sensitive information. By projecting these neurons to the vocabulary space (unembedding), we pinpoint the components driving its generation. We then compute a model edit based on the...
- 2405.00675
  Filetype: application/pdf; Size: 769690 bytes; Uploaded On: 2024-06-26; Status: Parsed
  Clip: Self-Play Preference Optimization for Language Model Alignment. Yue Wu*, Zhiqing Sun*, Huizhuo Yuan*, Kaixuan Ji, Yiming Yang, Quanquan Gu. Abstract...
- 2410.05258
  Filetype: application/pdf; Size: 993253 bytes; Uploaded On: 2024-10-08; Status: Parsed
  Clip: DIFFERENTIAL TRANSFORMER. Tianzhu Ye*, Li Dong*, Yuqing Xia*, Yutao Sun*, Yi Zhu, Gao Huang, Furu Wei
- 2405.12399
  Filetype: application/pdf; Size: 3735661 bytes; Uploaded On: 2024-10-13; Status: Parsed
  Clip: Diffusion for World Modeling: Visual Details Matter in Atari. Eloi Alonso* (University of Geneva), Adam Jelley* (University of Edinburgh), Vincent Micheli (University of Geneva), Anssi Kanervisto (Microsoft Research), Amos Storkey (University of Edinburgh)
- 2210.03629
  Filetype: application/pdf; Size: 633805 bytes; Uploaded On: 2024-10-13; Status: Parsed
  Clip: Published as a conference paper at ICLR 2023. REACT: SYNERGIZING REASONING AND ACTING IN LANGUAGE MODELS. Shunyu Yao*1, Jeffrey Zhao 2, Dian Yu 2, Nan Du 2, Izhak Shafran 2, Karthik Narasimhan 1
- 2111.09832
  Filetype: application/pdf; Size: 529404 bytes; Uploaded On: 2024-12-06; Status: Parsed
  Clip: Merging Models with Fisher-Weighted Averaging. Michael Matena, Colin Raffel. Department of Computer Science, University of North Carolina at Chapel Hill. {mmatena,craffel}@cs.unc.edu. Abstract: Averaging the parameters of models that have the same architecture and initialization can provide a means of combining their respective capabilities. In this paper, we take the perspective that this merging operation can be seen as choosing parameters that approximately maximize the joint likelihood of the posteriors of the models' parameters. Computing a simple average of the models' parameters therefore corresponds to making an isotropic Gaussian approximation to their posteriors. We develop an alternative merging procedure based on the Laplace approximation where we approximate each model's posterior as a Gaussian distribution whose precision matrix corresponds to its Fisher information. We first show that our...
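  The abstract above describes approximating each model's posterior as a Gaussian whose precision is its Fisher information; under a diagonal approximation this turns merging into a per-parameter weighted average. A minimal sketch of that idea (parameter and Fisher arrays are assumed inputs; not the authors' code):

  ```python
  import numpy as np

  def fisher_weighted_merge(params: list, fishers: list) -> np.ndarray:
      """Per-parameter average, weighting each model by its diagonal Fisher.

      With equal Fisher values this reduces to a plain average, matching the
      isotropic-Gaussian view described in the abstract.
      """
      num = sum(f * p for f, p in zip(fishers, params))
      den = sum(fishers) + 1e-12  # avoid division by zero
      return num / den

  theta_a, theta_b = np.array([1.0, 2.0]), np.array([3.0, 0.0])
  f_a, f_b = np.array([4.0, 1.0]), np.array([1.0, 1.0])
  print(fisher_weighted_merge([theta_a, theta_b], [f_a, f_b]))
  ```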
- 2504.01990v1.pdf
  Filetype: application/pdf; Size: 12952471 bytes; Uploaded On: 2025-04-28; Status: Large File
- 2504.16891v1.pdf
  Filetype: application/pdf; Size: 979610 bytes; Uploaded On: 2025-04-28; Status: Parsed
  Clip: 2025-4-24. AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset. Ivan Moshkov, Darragh Hanley, Ivan Sorokin, Shubham Toshniwal, Christof Henkel, Benedikt Schifferer, Wei Du, Igor Gitman. Abstract: This paper presents our winning submission to the AI Mathematical Olympiad - Progress Prize 2 (AIMO-2) competition. Our recipe for building state-of-the-art mathematical reasoning models relies on three key pillars. First, we create a large-scale dataset comprising 540K unique high-quality math problems, including olympiad-level problems, and their 3.2M long-reasoning solutions. Second, we develop a novel method to integrate code execution with long reasoning models through iterative training, generation, and quality filtering, resulting in 1.7M high-quality Tool-Integrated Reasoning solutions. Third, we create a pipeline to train models to select the most promising solution from many candidates. We show that such generative solution selection (GenSelect) can significantly improve upon majority voting baseline. Combining these ideas...
- 2507.21509
  Filetype: application/pdf; Size: 4817767 bytes; Uploaded On: 2025-08-02; Status: Parsed
  Clip: Preprint. PERSONA VECTORS: MONITORING AND CONTROLLING CHARACTER TRAITS IN LANGUAGE MODELS. Runjin Chen*, Andy Arditi†, Henry Sleight, Owain Evans, Jack Lindsey†. 1 Anthropic Fellows Program...
- 2503.05136
  Filetype: application/pdf; Size: 6504423 bytes; Uploaded On: 2025-09-22; Status: Parsed
  Clip: The Beginner's Textbook for Fully Homomorphic Encryption. Ronny Ko, LG Electronics Inc. Acknowledgments: Robin Geelen, Tianjian Yang, Yongwoo Lee. arXiv:2503.05136v15 [cs.CR] 8 Sep 2025. Preface: Fully Homomorphic Encryption (FHE) is a cryptographic scheme that enables computations to be performed directly on encrypted data, as if the data were in plaintext. After all computations are...
- 2509.15591v1.pdf
  Filetype: application/pdf; Size: 9609139 bytes; Uploaded On: 2025-09-23; Status: Parsed
  Clip: Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification. Zinan Lin* (Microsoft Research, Redmond, WA, USA), Enshu Liu (Tsinghua University, Beijing, China), Xuefei Ning (Tsinghua University, Beijing, China), Junyi Zhu†...
- 2510.17558v1.pdf
  Filetype: application/pdf; Size: 705469 bytes; Uploaded On: 2025-10-24; Status: Parsed
  Clip: The Free Transformer. François Fleuret, FAIR at Meta. We propose an extension of the decoder Transformer that conditions its generative process on random latent variables which are learned without supervision thanks to a variational procedure. Experimental evaluations show that allowing such a conditioning translates into substantial improvements on downstream tasks. Since their invention, the Transformer (Vaswani et al.), and more specifically the decoder-only Transformers used originally for the GPT series of models (Radford et al.), have become the core components of AI systems.
- 2505.08727v1.pdf
  Filetype: application/pdf; Size: 1177032 bytes; Uploaded On: 2025-10-24; Status: Parsed
  Clip: Memorization-Compression Cycles Improve Generalization. Fangyuan Yu, Temus. Abstract: We prove theoretically that generalization improves not only through data scaling but also by compressing internal representations. To operationalize this insight, we introduce the Information Bottleneck Language Modeling (IBLM) objective, which reframes language modeling as a constrained optimization problem: minimizing representation entropy subject to optimal prediction performance. Empirically, we observe an emergent memorization–compression cycle during LLM pretraining, evidenced by oscillating positive/negative gradient alignment between cross-entropy and Matrix-Based Entropy (MBE), a measure for representation entropy. This pattern closely mirrors the predictive–compressive trade-off prescribed by IBLM and also parallels the biological alternation between active learning and sleep consolidation...
- 2510.18554v1.pdf
  Filetype: application/pdf; Size: 2335201 bytes; Uploaded On: 2025-10-24; Status: Parsed
  Clip: 2025-10-22. Extracting alignment data in open models. Federico Barbero*, Xiangming Gu*, Christopher A. Choquette-Choo*, Chawin Sitawarin, Matthew Jagielski*, Itay Yona*...
- 2306.03081v2.pdf
  Filetype: application/pdf; Size: 666918 bytes; Uploaded On: 2025-10-24; Status: Parsed
  Clip: Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs. Alexander K. Lew (MIT, alexlew@mit.edu), Tan Zhi-Xuan (MIT, xuan@mit.edu), Gabriel Grand (MIT, grandg@mit.edu), Vikash K. Mansinghka (MIT, vkm@mit.edu). Abstract...
- Science from Fisher Information - Bernard Roy Frieden.pdf
  Filetype: application/pdf; Size: 2303780 bytes; Uploaded On: 2025-10-24; Status: Parsed
  Clip: SCIENCE FROM FISHER INFORMATION. The aim of this book is to show that information is at the root of all fields of science. These fields may be generated by use of the concept of "extreme physical information," or EPI. The physical information is defined to be the loss of Fisher information that is incurred in observing any scientific phenomenon. The act of observation randomly perturbs the phenomenon, and sets off a physical process that may be modelled as a mathematical game between the observer and a "demon" characterizing the phenomenon. The currency of the game is Fisher information. The output of the game is the distribution law characterizing the statistics of the effect and, in particular, the acquired data. Thus, in a sense, the act of measurement creates the very law that governs the measurement. It is self-realized. This second edition of Physics from Fisher Information has been rewritten throughout in addition to including much new material.
- Paraneter-based Fishers information of orthogonal polynomials.pdf
  Filetype: application/pdf; Size: 194746 bytes; Uploaded On: 2025-10-24; Status: Parsed
  Clip: Journal of Computational and Applied Mathematics 214 (2008) 136-147. www.elsevier.com/locate/cam. Parameter-based Fisher's information of orthogonal polynomials. J.S. Dehesa a,c,*, B. Olmos a,c, R.J. Yáñez b,c. a Departamento de Física Moderna, Universidad de Granada, 18071-Granada, Spain; b Departamento de Matemática Aplicada, Universidad de Granada, 18071-Granada, Spain; c Instituto Carlos I de Física Teórica y Computacional, Universidad de Granada, 18071-Granada, Spain
- design_axial_flow_co2_laser.pdf
  Filetype: application/pdf; Size: 2502136 bytes; Uploaded On: 2025-10-24; Status: Parsed
  Clip: DESIGN AND CONSTRUCTION OF AXIAL SLOW FLOW CONTINUOUS WAVE FOLDED CARBON DIOXIDE LASER. A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF THE MIDDLE EAST TECHNICAL UNIVERSITY
- 2509.15207v1.pdf
  Filetype: application/pdf; Size: 926461 bytes; Uploaded On: 2025-10-24; Status: Parsed
  Clip: FlowRL: Matching Reward Distributions for LLM Reasoning. Xuekai Zhu 1, Daixuan Cheng 6, Dinghuai Zhang 3, Hengli Li 5, Kaiyan Zhang 4, Che Jiang 4...
- 2509.13237v1.pdf
  Filetype: application/pdf; Size: 1139130 bytes; Uploaded On: 2025-10-24; Status: Parsed
  Clip: Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors. Aniket Didolkar 1,2, et al. 1 Meta; 2 Mila-Quebec AI Institute, University of Montreal; 3...
- 2508.13898v2.pdf
Clip: Fisher-Orthogonal Projection Methods for Natural Gradient Descent with Large Batches. Yishun Lu, Wesley Armour (Department of Engineering Science, University of Oxford, Oxford, UK). Abstract: Modern GPUs are equipped with large amounts of high-bandwidth memory, enabling them to support mini-batch sizes of up to tens of thousands of training samples. However, most existing optimizers struggle to perform effectively at such a large batch size. As batch size increases, gradient noise decreases due to averaging over many samples.
Filetype: application/pdf; Size: 618928 bytes; Uploaded On: 2025-10-24; Status: Parsed
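The clip breaks off before the method itself, so the following is only a generic damped natural-gradient step built from an empirical Fisher estimate, to show the quantity a Fisher-orthogonal projection would operate on; the function name, the damping constant, and the use of the empirical rather than the true Fisher are all illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def natural_gradient_step(grad, per_sample_grads, lr=1e-2, damping=1e-3):
    # Generic natural-gradient update: precondition the mean gradient
    # by the (damped) empirical Fisher F = (1/B) G^T G.
    G = np.asarray(per_sample_grads)          # (B, D) per-sample gradients
    F = G.T @ G / G.shape[0]                  # empirical Fisher estimate
    F += damping * np.eye(F.shape[0])         # damping keeps F invertible
    return -lr * np.linalg.solve(F, grad)     # update = -lr * F^{-1} grad
```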
- 2506.16640v2.pdf
Clip: Long-Context Generalization with Sparse Attention (arXiv:2506.16640v2 [cs.CL], 24 Jun 2025). Pavlo Vasylenko, Marcos Treviso, André F. T. Martins (Instituto Superior Técnico, University of Lisbon; Instituto de Telecomunicações; Unbabel).
Filetype: application/pdf; Size: 1291447 bytes; Uploaded On: 2025-10-24; Status: Parsed
- 2502.07864v5.pdf
Clip: TransMLA: MLA Is All You Need (arXiv:2502.07864v5 [cs.LG], 12 Jun 2025). Fanxu Meng, Pingzhi Tang, Xiaojuan Tang, Zengwei Yao, Xing Sun, Muhan Zhang.
Filetype: application/pdf; Size: 1795292 bytes; Uploaded On: 2025-10-24; Status: Parsed
- 2412.02595v2.pdf
Clip: Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset (arXiv:2412.02595v2 [cs.CL], 30 May 2025). Dan Su, Kezhi Kong, Ying Lin, Joseph Jennings, Brandon Norick, Markus Kliegl, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro (NVIDIA).
Filetype: application/pdf; Size: 400612 bytes; Uploaded On: 2025-10-24; Status: Parsed
- 2308.15136v2.pdf
Clip: CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs. Hiroyuki Ootomo (NVIDIA, Tokyo, Japan; ORCID: 0000-0002-9522-3789), Akira Naruse (NVIDIA, Tokyo, Japan; ORCID: 0000-0002-3140-0854).
Filetype: application/pdf; Size: 868848 bytes; Uploaded On: 2025-10-24; Status: Parsed
- 2201.02373v11.pdf
Clip: Mirror Learning: A Unifying Framework of Policy Optimisation. Jakub Grudzien Kuba, Christian Schroeder de Witt, Jakob Foerster. Abstract: Modern deep reinforcement learning (RL) algorithms are motivated by either the generalised policy iteration (GPI) or trust-region learning (TRL) frameworks. However, algorithms that strictly respect these theoretical frameworks have proven unscalable. Surprisingly, the only known scalable algorithms violate the GPI/TRL
Filetype: application/pdf; Size: 3138600 bytes; Uploaded On: 2025-10-24; Status: Parsed
- 2001.04451v2.pdf
Clip: REFORMER: THE EFFICIENT TRANSFORMER (published as a conference paper at ICLR 2020). Nikita Kitaev, U.C. Berkeley & Google Research (kitaev@cs.berkeley.edu); Łukasz Kaiser, Anselm Levskaya, Google Research ({lukaszkaiser,levskaya}@google.com). Abstract: Large Transformer models routinely achieve state-of-the-art results on a number of tasks, but training these models can be prohibitively costly, especially on long
Filetype: application/pdf; Size: 658515 bytes; Uploaded On: 2025-10-24; Status: Parsed
- 1909.00015v2.pdf
Clip: Adaptively Sparse Transformers. Gonçalo M. Correia (goncalo.correia@lx.it.pt), Vlad Niculae (vlad@vene.ro), André F. T. Martins (andre.martins@unbabel.com); Instituto de Telecomunicações, Lisbon, Portugal; Unbabel, Lisbon, Portugal.
Filetype: application/pdf; Size: 915609 bytes; Uploaded On: 2025-10-24; Status: Parsed
- 1805.02867v2.pdf
Clip: Online normalizer calculation for softmax (arXiv:1805.02867v2 [cs.PF], 28 Jul 2018). Maxim Milakov, NVIDIA (mmilakov@nvidia.com); Natalia Gimelshein, NVIDIA (ngimelshein@nvidia.com). Abstract: The Softmax function is ubiquitous in machine learning; multiple previous works have suggested faster alternatives for it. In this paper we propose a way to compute classical Softmax with fewer memory accesses and hypothesize that this reduction in memory accesses should improve Softmax performance on actual hardware. The benchmarks confirm this hypothesis: Softmax accelerates by up to 1.3x, and Softmax+TopK combined and fused by up to 5x.
Filetype: application/pdf; Size: 147347 bytes; Uploaded On: 2025-10-24; Status: Parsed
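The clip states the result rather than the algorithm, but the paper's core trick, maintaining the running maximum and the normalizer together so the scores are read once instead of twice, can be sketched in a few lines (pure-Python sketch of the online normalizer; the paper itself targets fused GPU kernels):

```python
import math

def online_softmax(scores):
    # Single pass over the scores: track the running maximum m and the
    # normalizer d, rescaling d whenever a new maximum appears.
    m, d = float("-inf"), 0.0
    for x in scores:
        m_new = max(m, x)
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    # A second pass only materializes the normalized outputs.
    return [math.exp(x - m) / d for x in scores]
```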
- 1602.02068v2.pdf
Clip: From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. André F. T. Martins (ANDRE.MARTINS@UNBABEL.COM), Ramón F. Astudillo (RAMON@UNBABEL.COM); Unbabel Lda, Rua Visconde de Santarém, 67-B, 1000-286 Lisboa, Portugal; Instituto de Telecomunicações (IT), Instituto Superior Técnico, Av. Rovisco Pais, 1, 1049-001 Lisboa, Portugal; Instituto de Engenharia de Sistemas e Computadores (INESC-ID), Rua Alves Redol, 9, 1000-029 Lisboa, Portugal.
Filetype: application/pdf; Size: 1562573 bytes; Uploaded On: 2025-10-24; Status: Parsed
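For readers skimming the library: sparsemax is the Euclidean projection of a score vector onto the probability simplex, so unlike softmax it can assign exactly zero probability to low-scoring entries. A minimal NumPy transcription of the standard sort-based evaluation (my sketch, not code from the paper):

```python
import numpy as np

def sparsemax(z):
    # Sort scores descending, find the support size k(z), compute the
    # threshold tau, and clip everything below it to zero.
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1.0 + k * z_sorted > cumsum   # k(z) = largest k satisfying this
    k_z = k[support][-1]
    tau = (cumsum[k_z - 1] - 1.0) / k_z
    return np.maximum(z - tau, 0.0)

# sparsemax([2.0, 1.0, -1.0]) -> array([1., 0., 0.]): a fully sparse output.
```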
- 1803.03635v5.pdf
Clip: THE LOTTERY TICKET HYPOTHESIS: FINDING SPARSE, TRAINABLE NEURAL NETWORKS (published as a conference paper at ICLR 2019). Jonathan Frankle, MIT CSAIL (jfrankle@csail.mit.edu); Michael Carbin, MIT CSAIL (mcarbin@csail.mit.edu). Abstract: Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance.
Filetype: application/pdf; Size: 4001475 bytes; Uploaded On: 2025-10-24; Status: Parsed
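The pruning the abstract alludes to is usually magnitude pruning; under that assumption (the paper's full procedure additionally rewinds the surviving weights to their original initialization, retrains, and iterates to the target sparsity), mask construction looks roughly like this sketch:

```python
import numpy as np

def magnitude_prune_masks(weights, prune_fraction=0.2):
    # Global magnitude pruning: zero the prune_fraction of weights with
    # the smallest |w| across all layers, returning per-layer 0/1 masks.
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights])
    threshold = np.quantile(all_mags, prune_fraction)
    return [(np.abs(w) > threshold).astype(w.dtype) for w in weights]
```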
- 2510.19093v1.pdf
Clip: W FOR. Atli Kosson, Jeremy Welborn, Yang Liu, Martin Jaggi, Xi Chen; Amazon FAR (Frontier AI & Robotics).
Filetype: application/pdf; Size: 1755933 bytes; Uploaded On: 2025-10-24; Status: Parsed
- 2510.23095v1.pdf
Clip: R V (under review as a conference paper at ICLR 2026). Jie Huang, Xuejing Liu, Sibo Song, Ruibing Hou, Hong Chang, Junyang Lin, Shuai Bai.
Filetype: application/pdf; Size: 5493452 bytes; Uploaded On: 2025-10-29; Status: Parsed
- 1702.08591
Clip: The Shattered Gradients Problem: If resnets are the answer, then what is the question? David Balduzzi, Marcus Frean, Lennox Leary, JP Lewis, Kurt Wan-Duo Ma, Brian McWilliams.
Filetype: application/pdf; Size: 1562725 bytes; Uploaded On: 2025-11-07; Status: Parsed
- 2504.13837v4.pdf
Clip: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (October 24, 2025). Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, and Gao Huang.
Filetype: application/pdf; Size: 2181369 bytes; Uploaded On: 2025-11-09; Status: Parsed