Document Details
Clip:
Merging Models with Fisher-Weighted Averaging Michael Matena Colin Raffel Department of Computer Science University of North Carolina at Chapel Hill {mmatena,craffel}@cs.unc.edu Abstract Averaging the parameters of models that have the same architecture and initialization can provide a means of combining their respective capabilities. In this paper, we take the perspective that this “merging” operation can be seen as choosing parameters that approximately maximize the joint likelihood of the posteriors of the models' parameters. Computing a simple average of the models' parameters therefore corresponds to making an isotropic Gaussian approximation to their posteriors. We develop an alternative merging procedure based on the Laplace approximation where we approximate each model's posterior as a Gaussian distribution whose precision matrix corresponds to its Fisher information. We first show that our
Filename:
2111.09832
Filetype:
application/pdf
Size:
529404 bytes
Uploaded On:
2024-12-06
Abstract:
Summary:
Tags:
Notes:
Visible:
1
Status:
Parsed
Author:
CreationDate:
2022-08-29T01:23:21+00:00
Creator:
LaTeX with hyperref
Keywords:
ModDate:
2022-08-29T01:23:21+00:00
PTEX.Fullbanner:
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2
Producer:
pdfTeX-1.40.21
Subject:
Title:
Trapped:
False
Pages:
16