Abstract: Multimodal object detection plays a crucial role in all-weather and multiscene applications of aerial imagery. Existing studies mainly focus on multimodal fusion and interlevel feature ...
Abstract: Transformer-based object detection models usually adopt an encoder-decoder architecture that mainly combines self-attention (SA) and multilayer perceptron (MLP) layers. Although this architecture ...