EAGLE: Exploring the Design Space for Multimodal Large Language Models with a Mixture of Encoders
The ability to accurately interpret complex visual information is a crucial focus…
MINT-1T: Scaling Open-Source Multimodal Data by 10x
Training frontier large multimodal models (LMMs) requires large-scale datasets with interleaved sequences…
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
The recent progress and advancement of Large Language Models has experienced a…
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
The recent advancements in the architecture and performance of Multimodal Large Language…
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
The advancements in large language models have significantly accelerated the development of…