The difficulty of obtaining a sufficient number of appropriately labelled samples is a major obstacle to learning class-discriminating features with Machine Learning (ML) algorithms for tumor diagnostics from Ultrasound (US) images. This is often mitigated by sample augmentation, whereby new samples are generated from existing ones by rotation and flipping operations, by Singular Value Decomposition (SVD), or by synthesizing images with Generative Adversarial Networks (GANs). Rotation and flipping do not generate genuinely new samples; SVD generates images that may not be easy to recognize as US tumor scans; and although GAN-generated images are visually convincing, their use for diagnostics may lead to overfitting and may be subject to adversarial attacks. We propose an innovative sample augmentation approach that utilizes our recently developed Tumor Margin Appending (TMA) scheme. The TMA scheme constructs the Convex Hull (CH) of the tumor region from a small set of radiologist-marked tumor boundary points and crops the image at different radial expansion ratios of the CH into the surrounding tissue. Various ML algorithms, based on handcrafted features and on Convolutional Neural Networks (CNNs), achieved acceptable diagnostic accuracies when trained with TMA images at different ratios. In this paper, our sample augmentation scheme expands the ML training datasets by including TMA samples at several expansion ratios. Experiments on training CNN diagnostic schemes for breast tumors show improved classification performance together with additional benefits: robustness to the different cropping practices inadvertently adopted at different hospitals, and a regularizing effect that reduces model overfitting when tested on unseen datasets obtained using unknown tumor segmentation and cropping procedures.
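The TMA-based augmentation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`convex_hull`, `expand_hull`, `crop_box`, `tma_augment`) are hypothetical, and two details are assumptions rather than statements from the text, namely that radial expansion scales the hull vertices about their mean point by a factor of (1 + ratio), and that each crop is the axis-aligned bounding box of the expanded hull, clipped to the image.

```python
from math import ceil, floor


def convex_hull(points):
    """Convex hull by Andrew's monotone chain (vertices in a consistent order)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); sign gives the turn direction
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Last point of each chain is the first point of the other chain
    return lower[:-1] + upper[:-1]


def expand_hull(hull, ratio):
    """ASSUMPTION: expand radially by scaling vertices about their mean point."""
    cx = sum(x for x, _ in hull) / len(hull)
    cy = sum(y for _, y in hull) / len(hull)
    scale = 1.0 + ratio
    return [(cx + scale * (x - cx), cy + scale * (y - cy)) for x, y in hull]


def crop_box(hull, width, height):
    """ASSUMPTION: crop to the hull's bounding box, clipped to the image."""
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    x0 = max(0, floor(min(xs)))
    y0 = max(0, floor(min(ys)))
    x1 = min(width, ceil(max(xs)) + 1)   # exclusive upper bounds
    y1 = min(height, ceil(max(ys)) + 1)
    return x0, y0, x1, y1


def tma_augment(image, boundary_points, ratios):
    """Return one crop of `image` (a list of pixel rows) per expansion ratio."""
    height, width = len(image), len(image[0])
    hull = convex_hull(boundary_points)
    crops = []
    for ratio in ratios:
        x0, y0, x1, y1 = crop_box(expand_hull(hull, ratio), width, height)
        crops.append([row[x0:x1] for row in image[y0:y1]])
    return crops
```

Calling `tma_augment(image, marked_points, [0.0, 0.5, 1.0])` on one labelled scan would then yield three training samples that share the scan's label but include progressively more surrounding tissue. The ratio values here are illustrative only and are not taken from the paper.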