Document Details


2510.23095v1.pdf
Clip: Under review as a conference paper at ICLR 2026. Jie Huang 1,2,*, Xuejing Liu 1,*, Sibo Song 1, Ruibing Hou 2,†, Hong Chang 2, Junyang Lin 1, Shuai Bai
Filename: 2510.23095v1.pdf
Filetype: application/pdf
Size: 5493452 bytes
Uploaded On: 2025-10-29
Visible: 1
Status: Parsed
Author: Jie Huang; Xuejing Liu; Sibo Song; Ruibing Hou; Hong Chang; Junyang Lin; Shuai Bai
Creator: arXiv GenPDF (tex2pdf:e76afa9)
DOI: https://doi.org/10.48550/arXiv.2510.23095
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
PTEX.Fullbanner: This is pdfTeX, Version 3.141592653-2.6-1.40.28 (TeX Live 2025) kpathsea version 6.4.1
Producer: pikepdf 8.15.1
Title: Revisiting Multimodal Positional Encoding in Vision-Language Models
Trapped: False
ArXivID: https://arxiv.org/abs/2510.23095v1
Pages: 16
