The core of the method is an alternating spatial-channel attention mechanism that adaptively balances local and global information through two complementary components: cross-layer channel attention and cross-feature spatial attention. Together, these components allocate computation dynamically according to both task and content complexity, applying lightweight convolutions to simple regions and more expressive but more costly self-attention to complex ones.
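The description above does not specify how content complexity is measured or how the routing decision is made. The sketch below illustrates only the content-adaptive routing idea, under assumptions that are ours rather than the method's. It uses per-channel variance across a region's tokens as the complexity proxy (the hypothetical `region_complexity`). A moving average stands in for the lightweight convolution, and single-head self-attention uses identity query/key/value projections. A scalar `threshold` stands in for the task-dependent budget. It does not model the cross-layer channel attention or cross-feature spatial attention components.

```python
import math
from typing import List, Tuple

Token = List[float]
Region = List[Token]


def region_complexity(region: Region) -> float:
    """Hypothetical complexity proxy: mean per-channel variance across the region's tokens."""
    n, d = len(region), len(region[0])
    total = 0.0
    for c in range(d):
        mean = sum(tok[c] for tok in region) / n
        total += sum((tok[c] - mean) ** 2 for tok in region) / n
    return total / d


def local_conv(region: Region, k: int = 3) -> Region:
    """Cheap path: depthwise moving-average 'convolution' over neighbouring tokens."""
    n, d = len(region), len(region[0])
    half = k // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        window = region[lo:hi]  # window is truncated at the region borders
        out.append([sum(tok[c] for tok in window) / len(window) for c in range(d)])
    return out


def self_attention(region: Region) -> Region:
    """Expensive path: single-head self-attention with identity Q/K/V projections."""
    d = len(region[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in region:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in region]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, region)) / z for c in range(d)])
    return out


def route_regions(regions: List[Region], threshold: float) -> Tuple[List[Region], List[str]]:
    """Send low-complexity regions to the cheap path and the rest to self-attention.

    `threshold` is a hypothetical stand-in for the task-dependent compute budget.
    """
    outputs, paths = [], []
    for region in regions:
        if region_complexity(region) < threshold:
            outputs.append(local_conv(region))
            paths.append("conv")
        else:
            outputs.append(self_attention(region))
            paths.append("attention")
    return outputs, paths


flat = [[1.0, 1.0] for _ in range(4)]
busy = [[0.0, 5.0], [5.0, 0.0], [0.0, 5.0], [5.0, 0.0]]
outs, paths = route_regions([flat, busy], threshold=1.0)
print(paths)  # ['conv', 'attention']
```

Here the uniform region has zero variance and takes the convolution path, while the alternating region (variance 6.25) exceeds the threshold and is routed to self-attention. The routing is a hard, non-learned decision chosen for clarity. An actual implementation would more plausibly operate on tensors with learned projections and might use a soft or learned gate. Lowering `threshold` sends more regions to the costly path, which is one simple way a task-level setting could trade accuracy for compute.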

