Document Details


Clip: Merging Models with Fisher-Weighted Averaging Michael Matena Colin Raffel Department of Computer Science University of North Carolina at Chapel Hill {mmatena,craffel}@cs.unc.edu Abstract Averaging the parameters of models that have the same architecture and initialization can provide a means of combining their respective capabilities. In this paper, we take the perspective that this “merging” operation can be seen as choosing parameters that approximately maximize the joint likelihood of the posteriors of the models' parameters. Computing a simple average of the models' parameters therefore corresponds to making an isotropic Gaussian approximation to their posteriors. We develop an alternative merging procedure based on the Laplace approximation where we approximate each model's posterior as a Gaussian distribution whose precision matrix corresponds to its Fisher information. We first show that our
Filename: 2111.09832
Filetype: application/pdf
Size: 529404 bytes
Uploaded On: 2024-12-06
Abstract:
Summary:
Tags:
Notes:
Visible: 1
Status: Parsed
Author:
CreationDate: 2022-08-29T01:23:21+00:00
Creator: LaTeX with hyperref
Keywords:
ModDate: 2022-08-29T01:23:21+00:00
PTEX.Fullbanner: This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2
Producer: pdfTeX-1.40.21
Subject:
Title:
Trapped: False
Pages: 16
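
Note on the clip: the abstract contrasts a simple parameter average (isotropic Gaussian posteriors) with a merge in which each model's posterior is a Gaussian whose precision is its Fisher information. The following is a minimal sketch of that contrast, not the authors' implementation. It assumes a diagonal Fisher approximation, equal weighting of the models, and flat NumPy parameter vectors; the function names, the `eps` stabilizer, and the toy values are all hypothetical and do not come from the clip.

```python
import numpy as np


def simple_average(params):
    """Merge by plain averaging (isotropic Gaussian posteriors)."""
    return np.mean(np.stack(params), axis=0)


def fisher_weighted_average(params, fishers, eps=1e-8):
    """Merge by precision-weighted averaging.

    Assumption: each Fisher matrix is diagonal and passed as a vector,
    so the maximizer of the product of Gaussians reduces to an
    elementwise weighted mean. `eps` (hypothetical) avoids division by
    zero where every model has zero Fisher information.
    """
    params = np.stack(params)
    fishers = np.stack(fishers)
    return (fishers * params).sum(axis=0) / (fishers.sum(axis=0) + eps)


# Toy example: two "models" with two parameters each (made-up values).
theta_a = np.array([1.0, 0.0])
theta_b = np.array([0.0, 1.0])
fisher_a = np.array([9.0, 1.0])  # model A is confident about parameter 0
fisher_b = np.array([1.0, 9.0])  # model B is confident about parameter 1

merged_simple = simple_average([theta_a, theta_b])
merged_fisher = fisher_weighted_average(
    [theta_a, theta_b], [fisher_a, fisher_b]
)
```

In this toy case the simple average lands halfway between the two models on every coordinate, while the Fisher-weighted merge moves each coordinate toward the model with the larger Fisher value for it. When all Fisher values are equal, the two merges coincide up to the `eps` term.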
