Proceedings Article | 24 March 2016
KEYWORDS: Neural networks, Image segmentation, Lung, Machine learning, Computed tomography, Medical imaging, Image processing algorithms and systems, Detection and tracking algorithms, Machine vision, Computer vision technology
Deep Learning, refers to large set of neural network based algorithms, have emerged as promising machine- learning tools in the general imaging and computer vision domains. Convolutional neural networks (CNNs), a specific class of deep learning algorithms, have been extremely effective in object recognition and localization in natural images. A characteristic feature of CNNs, is the use of a locally connected multi layer topology that is inspired by the animal visual cortex (the most powerful vision system in existence). While CNNs, perform admirably in object identification and localization tasks, typically require training on extremely large datasets. Unfortunately, in medical image analysis, large datasets are either unavailable or are extremely expensive to obtain. Further, the primary tasks in medical imaging are organ identification and segmentation from 3D scans, which are different from the standard computer vision tasks of object recognition. Thus, in order to translate the advantages of deep learning to medical image analysis, there is a need to develop deep network topologies and training methodologies, that are geared towards medical imaging related tasks and can work in a setting where dataset sizes are relatively small. In this paper, we present a technique for stacked supervised training of deep feed forward neural networks for segmenting organs from medical scans. Each `neural network layer' in the stack is trained to identify a sub region of the original image, that contains the organ of interest. By layering several such stacks together a very deep neural network is constructed. Such a network can be used to identify extremely small regions of interest in extremely large images, inspite of a lack of clear contrast in the signal or easily identifiable shape characteristics. What is even more intriguing is that the network stack achieves accurate segmentation even when it is trained on a single image with manually labelled ground truth. We validate this approach,using a publicly available head and neck CT dataset. We also show that a deep neural network of similar depth, if trained directly using backpropagation, cannot acheive the tasks achieved using our layer wise training paradigm.