Efficiently capturing multi-scale local information and building long-range dependencies among pixels are both essential for medical image segmentation, because lesion regions and organs vary widely in size and shape. In this paper, we propose a multi-scale cross-axis attention (MCA) mechanism that addresses these challenges by enhancing axial attention. When trained on small datasets, axial attention learns positional bias insufficiently and captures only limited long-distance interaction; we therefore use a dual cross-attention mechanism in place of axial attention to strengthen global information capture. Meanwhile, to compensate for the lack of explicit attention to local information in axial attention, we apply multiple strip-shaped convolutions with different kernel sizes in each axial attention path, which makes MCA more efficient at encoding local information. By integrating MCA into the Multi-Scale Convolutional Attention Network (MSCAN) backbone, we obtain our network architecture, termed MCANet. With just over 4M parameters, MCANet outperforms previous heavyweight approaches (e.g., Swin Transformer-based methods) on four challenging tasks: skin lesion segmentation, nuclei segmentation, abdominal multi-organ segmentation, and polyp segmentation. The code is available at https://github.com/haoshao-nku/medical_seg.
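To give a rough sense of why strip-shaped kernels suit a lightweight design, the sketch below compares per-channel weight counts for a 1 x k strip and a k x k square kernel. It is illustrative arithmetic only, not the released implementation: it assumes depthwise convolutions with biases ignored, and the kernel sizes and the function names `strip_weights`, `square_weights`, and `compare` are hypothetical choices made for this example.

```python
# Illustrative arithmetic only; not taken from the MCANet code base.
# Assumption: depthwise convolutions (one filter per channel), biases ignored.

# Hypothetical kernel sizes, chosen only for illustration.
KERNEL_SIZES = (7, 11, 21)


def strip_weights(k: int) -> int:
    """Weights per channel for a 1 x k (or k x 1) strip-shaped kernel."""
    return k


def square_weights(k: int) -> int:
    """Weights per channel for a k x k square kernel covering the same extent."""
    return k * k


def compare(kernel_sizes=KERNEL_SIZES):
    """Return (k, strip weights, square weights) for each kernel size."""
    return [(k, strip_weights(k), square_weights(k)) for k in kernel_sizes]


if __name__ == "__main__":
    for k, strip, square in compare():
        print(f"k={k}: strip={strip} weights/channel, square={square} weights/channel")
```

Under these assumptions the strip cost grows linearly with k while the square cost grows quadratically, so several strips of different lengths can be combined in one path at a small parameter cost.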