Abstract:
To address the high computational cost and slow training of vision Transformer backbone networks, and to further improve the performance of Transformer architectures in medical image segmentation, a new lightweight U-shaped medical image segmentation network named BiUNet is proposed. The input medical image is split into patches, which are fed into BiFormer, a module built on the dynamic sparse attention mechanism of bi-level routing. In the encoder, downsampling layers are combined with BiFormer modules, each with a specific number of blocks, to build a multi-level pyramid structure for feature extraction. The feature map output by the encoder is then decoded by a second multi-level pyramid structure, built from upsampling and convolution modules, to produce pixel-level semantic segmentation. On three medical datasets, the model achieves mIoU values of 90.2%, 93.7% and 85.6%, respectively, with 5.55 GFLOPs and 28.10 M parameters. The results show that BiUNet effectively improves the accuracy of medical image segmentation while remaining lightweight.
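For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection over union (mIoU) is commonly computed for a predicted label map against a ground-truth label map. The abstract does not state the exact evaluation protocol, so the function name `mean_iou`, the choice to skip classes absent from both maps, and the toy inputs are illustrative assumptions rather than the authors' procedure.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in the prediction or the ground truth.

    pred and target are 2D lists of integer class labels with the same shape.
    Assumption: classes absent from both maps are skipped, not counted as 0 or 1.
    """
    ious = []
    for c in range(num_classes):
        inter = 0  # pixels labelled c in both maps
        union = 0  # pixels labelled c in at least one map
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                in_pred = p == c
                in_target = t == c
                if in_pred and in_target:
                    inter += 1
                if in_pred or in_target:
                    union += 1
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x2 example with two classes (hypothetical data, not from the paper).
pred = [[0, 1],
        [1, 1]]
target = [[0, 1],
          [0, 1]]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Dataset-level figures such as those reported above are typically obtained by averaging over images or by accumulating intersections and unions over the whole test set; which of the two was used here is not specified.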