Insufficient training data and class imbalance often limit the generalizability of deep learning models, particularly those designed to detect rare diseases. We hypothesize that increasing the number of positive samples can mitigate the difficulty of detecting breast lesions in whole slide images. To this end, we employ a controllable image synthesis framework for data augmentation, inspired by CycleGAN, in which a semantic mask guides the image-to-image translation between the healthy and pathological domains. Trained with adversarial learning, the model introduces pathology into healthy whole slide images at the locations specified by a binary mask. Because the masks encode the exact shape and location of the pathological features, the resulting images are realistic enough to be used alongside real data. We add these synthetic images to the real breast histology images of the publicly available BReAst Carcinoma Subtyping (BRACS) dataset and use the augmented data to detect lesions. Combined with classical data augmentation, the enriched dataset improves breast lesion detection, which could support earlier cancer diagnosis. Among the compared models, the one trained on the combined data achieved the area under the curve (AUC) closest to one, suggesting a lower risk of missing positive diagnoses and a better chance of identifying breast cancer cases early. These results indicate that synthetic images, used as an additional augmentation tool, can help address the scarcity of pathological data in biomedical imaging. Our code is available on GitHub.
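The abstract describes mask-guided translation between a healthy and a pathological domain in the spirit of CycleGAN. As a rough illustration only, the sketch below shows one common way to feed a binary mask to a generator (as an extra input channel) together with the standard CycleGAN cycle-consistency term, ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1. The channel-stacking interface, the function names, and the toy generators are assumptions for illustration, not the authors' implementation, and the adversarial losses are omitted.

```python
import numpy as np


def l1(a, b):
    """Mean absolute error between two arrays of the same shape."""
    return float(np.mean(np.abs(a - b)))


def condition_on_mask(image, mask):
    """Hypothetical conditioning: stack a binary mask as an extra channel.

    image: (H, W, C) array, mask: (H, W) array of 0/1 -> (H, W, C + 1) array.
    """
    return np.concatenate([image, mask[..., None].astype(image.dtype)], axis=-1)


def cycle_consistency_loss(x_healthy, y_patho, mask, G, F):
    """CycleGAN cycle loss with mask-conditioned generators (assumed interface).

    G maps a mask-conditioned healthy image to a pathological one and
    F maps a mask-conditioned pathological image back to a healthy one;
    both take (H, W, C + 1) inputs and return (H, W, C) outputs.
    """
    # healthy -> pathological -> healthy should reconstruct x
    x_rec = F(condition_on_mask(G(condition_on_mask(x_healthy, mask)), mask))
    # pathological -> healthy -> pathological should reconstruct y
    y_rec = G(condition_on_mask(F(condition_on_mask(y_patho, mask)), mask))
    return l1(x_rec, x_healthy) + l1(y_rec, y_patho)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((4, 4, 3))          # toy "healthy" patch
    y = rng.random((4, 4, 3))          # toy "pathological" patch
    m = np.zeros((4, 4))
    m[:2, :] = 1                       # lesion region: top half of the patch

    # Toy generators that only alter pixels inside the mask.
    G = lambda t: t[..., :-1] + 0.5 * t[..., -1:]
    F = lambda t: t[..., :-1] - 0.5 * t[..., -1:]
    print(cycle_consistency_loss(x, y, m, G, F))  # close to zero: F undoes G
```

Because the toy generators change only the masked pixels and exactly invert each other, the loss is near zero; replacing F with a generator that ignores the mask leaves a residual proportional to the masked area.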
|