KEYWORDS: Matrices, Education and training, Interpolation, Image segmentation, Ultrasonography, Deep learning, Data modeling, Principal component analysis, Image classification, Diagnostics
Deep learning models (DLMs) encounter challenges in medical image segmentation and classification tasks, primarily due to the requirement for a substantial volume of annotated images, which are both time-consuming and expensive to acquire. In our work, we use neural style transfer (NST) to augment a small dataset of ultrasound images, significantly improving DLM performance. Additionally, we explore style interpolation to generate new target styles tailored specifically to ultrasound images. In summary, our objective is to demonstrate the potential utility of NST in scenarios with limited datasets, particularly in the context of ultrasound imaging, using the breast ultrasound dataset of Al-Dhabyani et al. (2020).
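A minimal sketch of one way to realize style interpolation, assuming an AdaIN-style transfer step in PyTorch; the abstract does not specify the NST variant, so the mechanism below is an assumption:

```python
import torch

def adain(content_feat, style_mean, style_std, eps=1e-5):
    """Re-normalize content feature maps (N, C, H, W) to target style statistics."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    return style_std * (content_feat - c_mean) / c_std + style_mean

def blended_style_stats(style_feat_a, style_feat_b, alpha):
    """Interpolate two styles by mixing their channel-wise feature statistics;
    alpha = 0 or 1 recovers a pure style."""
    mean = (alpha * style_feat_a.mean(dim=(2, 3), keepdim=True)
            + (1 - alpha) * style_feat_b.mean(dim=(2, 3), keepdim=True))
    std = (alpha * style_feat_a.std(dim=(2, 3), keepdim=True)
           + (1 - alpha) * style_feat_b.std(dim=(2, 3), keepdim=True))
    return mean, std

# mean, std = blended_style_stats(feat_a, feat_b, 0.5)
# stylized = adain(content_feat, mean, std)
```

Sweeping alpha between 0 and 1 would yield a continuum of new target styles from two reference ultrasound styles.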
Most bacteria classifiers built with neural networks and/or image processing methods fail to generalize across different image databases, even when the images come from the same type of acquisition system and the sample preparation is similar. In this work, we introduce an ensemble of deep neural networks designed for the classification of bacteria in a broad context. We use a dataset comprising Actinomyces, Escherichia, Staphylococcus, Lactobacillus, and Micrococcus bacteria with Gram staining, acquired through brightfield microscopy from various sources. To normalize the diversity of image characteristics, we applied domain generalization and adaptation techniques. Subsequently, we used phenotypic characteristics, such as the color reaction to Gram staining and morphology, to classify the bacteria.
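The abstract does not detail how the ensemble combines its members; a common choice, sketched below in PyTorch, is soft voting over the softmax outputs:

```python
import torch

@torch.no_grad()
def ensemble_predict(models, images):
    """Soft voting: average the softmax outputs of all member networks,
    then take the most probable class per image."""
    probs = torch.stack([m(images).softmax(dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)
```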
Deep neural networks extract features automatically; however, in many cases the features extracted by a classifier are biased by the classes seen during training. Analyzing 3D medical images can also be challenging because of the high number of channels, which leads to long training times with complex deep models. To address this issue, we propose a two-step approach: (i) we train an autoencoder to reconstruct the input images from a subset of the channels in the volume, obtaining a hidden representation of the images; (ii) shallow models are then trained on this hidden representation to classify the images using an ensemble of features. To validate the proposed method, we use 3D datasets from the MedMNIST archive. Our results show that the proposed model achieves similar or even better performance than ResNet models, despite having significantly fewer parameters (approximately 14,000).
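A hedged sketch of the two-step idea with PyTorch, sized for 28x28x28 MedMNIST volumes; the layer sizes are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    """Small 3D convolutional autoencoder; the encoder output z is the
    hidden representation later fed to shallow classifiers."""
    def __init__(self, in_ch=1, latent=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(in_ch, 8, 3, stride=2, padding=1), nn.ReLU(),   # 28 -> 14
            nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ReLU(),      # 14 -> 7
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(16, latent))
        self.dec = nn.Sequential(
            nn.Linear(latent, 16 * 7 ** 3), nn.Unflatten(1, (16, 7, 7, 7)),
            nn.ConvTranspose3d(16, 8, 4, stride=2, padding=1), nn.ReLU(),  # 7 -> 14
            nn.ConvTranspose3d(8, in_ch, 4, stride=2, padding=1))          # 14 -> 28
    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

# After training on reconstruction loss, encode every volume and fit, e.g.,
# sklearn.ensemble.RandomForestClassifier() on the latent vectors z.
```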
This study explores the impact of synthetic medical images, created with Stable Diffusion, on neural network training for lung condition classification. Using a hybrid dataset combining real and synthetic images, diverse state-of-the-art vision models were trained. Neural networks learned effectively from synthetic data, achieving performance similar or superior to models trained purely on real images, as long as training is carried out under equal conditions: same architecture, number of epochs, training style, and input image resolution. We selected ConvNeXt-small as our test architecture. Its best performance when trained with a hybrid dataset (synthetic and real images) was 89%, versus 85% when trained with purely real images. These results were obtained on an external validation dataset curated by a radiologist. However, hybrid models seem to hit a performance ceiling when different training techniques are explored. In contrast, a simpler architecture trained with only real images can take advantage of more complex training regimes to raise its final performance. In this regard, our best hybrid-trained model (ConvNeXt-small) achieved an external validation accuracy of 87%, while ResNet-34 attained 93% validation accuracy trained only on real images. Both models were evaluated on the real-image-only dataset provided by the radiologist. The study concludes by comparing our top AI models with radiologists' performance levels.
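A hybrid dataset of this kind can be assembled by concatenating real and synthetic image folders; a minimal PyTorch sketch (directory names are placeholders):

```python
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

tf = transforms.Compose([transforms.Grayscale(3),  # X-rays as 3-channel input
                         transforms.Resize((224, 224)),
                         transforms.ToTensor()])
real = datasets.ImageFolder("data/real_xrays", transform=tf)        # placeholder path
synthetic = datasets.ImageFolder("data/sd_synthetic", transform=tf)  # placeholder path
hybrid_loader = DataLoader(ConcatDataset([real, synthetic]),
                           batch_size=32, shuffle=True)
```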
KEYWORDS: Medical imaging, Data modeling, Visualization, Artificial intelligence, Visual process modeling, Deep learning, Image restoration, Brain, Tumors, Neuroimaging
Explainability and bias mitigation are crucial aspects of deep learning (DL) models for medical image analysis. Generative AI, particularly autoencoders, can enhance explainability by analyzing the latent space to identify and control variables that contribute to biases. By manipulating the latent space, biases can be mitigated at the classification layer. Furthermore, the latent space can be visualized to provide a more intuitive understanding of the model's decision-making process. In our work, we demonstrate how the proposed approach enhances the explainability of the decision-making process, surpassing the capabilities of traditional methods like Grad-CAM. Our approach identifies and mitigates biases in a straightforward manner, without requiring model retraining or dataset modification, showing how generative AI can play a pivotal role in addressing explainability and bias mitigation challenges and in enhancing the trustworthiness and clinical utility of DL-powered medical image analysis tools.
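The abstract leaves the latent-space manipulation unspecified; one simple instance of the idea is to remove the linear latent direction most correlated with a known binary bias attribute before classification. A NumPy sketch under that assumption:

```python
import numpy as np

def debias_latents(Z, bias_labels):
    """Project latent codes Z (n_samples, n_dims) onto the orthogonal
    complement of the direction separating the two bias groups."""
    d = Z[bias_labels == 1].mean(axis=0) - Z[bias_labels == 0].mean(axis=0)
    d /= np.linalg.norm(d)
    return Z - np.outer(Z @ d, d)  # remove the bias-aligned component
```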
Research on interpretable CNN classifiers involves comparing semantic segmentation masks with heat maps designed as visual explanations. A robust explanation accurately identifies or approximates the segmentation of an object. Our focus is on CNN classifiers with enhanced explainability, particularly in the middle layers. To explore this, we propose testing an encoder, trimmed to a middle layer, within a Fully Convolutional Network (FCN). Semantic segmentation is a pivotal computer vision task preceding object recognition, and it demands efficiency to optimize performance, energy consumption, and hardware costs. While various lightweight FCN proposals exist for distinct semantic segmentation tasks, their designs often introduce additional complexity compared to the more basic FCN design we advocate. Our goal is to see how well a minimal FCN works on a simple semantic segmentation task with medical images and how its accuracy changes as the training dataset shrinks. The study involves characterizing and comparing our minimal FCN against other lightweight deep segmentation models and analyzing accuracy curves with respect to the quantity of training data. Using chest CT imaging, we focus on segmenting the lungs. We highlight the importance of data consumption and model size as decisive factors in selecting an architecture, especially when differences in predictive accuracy are marginal. Characterizing deep architectures by their data requirements allows for a thorough comparison, fostering a deeper understanding of their suitability for specific applications.
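A sketch of the "trimmed encoder inside a basic FCN" idea in PyTorch; the 1x1 head and bilinear upsampling are assumptions, not necessarily the paper's exact design:

```python
import torch.nn as nn
import torch.nn.functional as F

class MinimalFCN(nn.Module):
    """An encoder trimmed at a middle layer, a 1x1 classifier head, and
    bilinear upsampling back to input resolution."""
    def __init__(self, trimmed_encoder, mid_channels, n_classes=2):
        super().__init__()
        self.encoder = trimmed_encoder  # e.g., the first blocks of a CNN classifier
        self.head = nn.Conv2d(mid_channels, n_classes, kernel_size=1)
    def forward(self, x):
        logits = self.head(self.encoder(x))
        return F.interpolate(logits, size=x.shape[2:], mode="bilinear",
                             align_corners=False)
```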
In machine learning projects, effective preparation of training datasets is essential. When dealing with image datasets, especially limited ones, data augmentation techniques play a crucial role in increasing dataset size and diversity. These techniques, spanning from basic to deformable to deep learning augmentations, offer varying effects, from simple noise addition to generating entirely new synthetic images.
In this study, we propose an alternative approach to augmenting a dataset using a technique from video processing called Video Frame Interpolation (VFI). Unlike traditional methods, VFI aims to produce images that are neither mere variations of the originals nor entirely synthetic, providing instead a middle ground: synthetic temporal variations of the original images. We propose to use pre-trained VFI networks in conjunction with transfer learning to develop specialized models capable of interpolating medical images with enough precision that a medical specialist would deem them clinically plausible.
For this study, we worked with a model developed by Niklaus et al. on cardiac ultrasound videos and images, alongside a seasoned cardiologist who provided an expert evaluation of the viability of this technique. Our findings indicate that the results produced by our fine-tuned model can indeed be considered realistic and that, depending on the use case, the results of the pre-trained model can also be useful.
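A hedged sketch of the transfer learning setup, assuming a pretrained VFI network with the hypothetical interface model(frame0, frame1) -> predicted middle frame (the actual Niklaus et al. code exposes its own API):

```python
import torch
import torch.nn.functional as F

def finetune_vfi(model, triplets, epochs=5, lr=1e-5):
    """Fine-tune a pretrained VFI network on (frame0, middle, frame1)
    triplets sampled from ultrasound videos."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for f0, fmid, f1 in triplets:
            pred = model(f0, f1)
            loss = F.l1_loss(pred, fmid)  # L1 is a common VFI training loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```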
Early-stage detection of Coronavirus Disease 2019 (COVID-19) is crucial for patient medical attention. Since the lungs are the most affected organs, monitoring them constantly is an effective way to observe disease evolution. The most common technique for lung imaging and evaluation is Computed Tomography (CT). However, its cost and its effects on human health have made Lung Ultrasound (LUS) a good alternative: LUS does not expose the patient to radiation and minimizes the risk of contamination. There is also evidence of a relation between different artifacts in LUS and lung diseases originating in the pleura, whose abnormalities are related to most acute respiratory disorders. However, LUS often requires expert clinical interpretation, which may increase diagnosis time or decrease diagnosis performance. This paper describes and compares machine learning classification methods, namely Naive Bayes (NB), Support Vector Machine (SVM), K-Nearest Neighbors (K-NN), and Random Forest (RF), over several LUS images. They classify lung images from COVID-19, pneumonia, and healthy patients, using image features previously extracted from the Gray Level Co-occurrence Matrix (GLCM) and histogram statistics. Furthermore, the paper compares these classic methods with different Convolutional Neural Networks (CNNs) that classify the images directly in order to identify these lung diseases.
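A sketch of the classic feature pipeline with scikit-image and scikit-learn (graycomatrix was named greycomatrix before scikit-image 0.19); the chosen distances, angles, and properties are illustrative:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.ensemble import RandomForestClassifier

def glcm_features(img_u8):
    """GLCM texture descriptors plus simple histogram statistics for one
    8-bit grayscale LUS image."""
    glcm = graycomatrix(img_u8, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    feats = [graycoprops(glcm, p).mean()
             for p in ("contrast", "homogeneity", "energy", "correlation")]
    feats += [img_u8.mean(), img_u8.std()]  # histogram statistics
    return np.array(feats)

# X = np.stack([glcm_features(im) for im in images])
# RandomForestClassifier().fit(X, labels)  # or NB / SVM / K-NN from sklearn
```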
Sargassum has affected the Mexican Caribbean coasts since 2015 in atypical amounts, causing economic and ecological problems. Removal once it reaches the coast is complex: it is not easily separated from the sand, its extraction damages dune vegetation, and the heavy transport involved compacts the sand and further deteriorates the coastline. It is therefore important to detect sargassum mats and estimate their paths to optimize collection efforts while they are still in the water. There have been improvements in systems that rely on satellite images to determine areas and possible paths of sargassum, but these methods do not solve the problem near the coastline, where the big mats observed in the deep sea break up into small mats that often do not show up in satellite images. Moreover, nearshore sargassum dynamics are characterized by finer temporal resolution. This paper focuses on cameras located near the coast of the Puerto Morelos reef lagoon, which record images of both the beach and the nearshore sea. First, we apply time-based preprocessing that allows us to discriminate the moving sargassum mats from the static sea bottom; then, using classic image processing techniques and neural networks, we detect, trace, and estimate the path of each mat toward its place of arrival on the beach. We compared classic algorithms with neural networks; among those tested are k-means and random forest for segmentation and dense optical flow to follow and estimate the path. This methodology allows real-time monitoring of the behavior of sargassum close to shore without complex technical support.
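A sketch of the dense-optical-flow step with OpenCV's Farnebäck implementation; accumulating the mean flow inside a detected mat mask frame by frame traces its path toward the shore:

```python
import cv2

def mat_displacement(prev_gray, next_gray, mask):
    """Mean dense-flow vector (pixels/frame) inside a sargassum mat mask."""
    # Positional args: pyr_scale, levels, winsize, iterations, poly_n,
    # poly_sigma, flags (typical default-like values shown).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    dx = flow[..., 0][mask > 0].mean()
    dy = flow[..., 1][mask > 0].mean()
    return dx, dy  # accumulate over frames to estimate the arrival point
```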
Epifluorescence microscopy imaging is a technique used by neuroscientists to observe hundreds of neurons at the same time, with single-cell resolution and at low cost, in living tissue. Recording, identifying, and tracking neurons and their activity in these observations is a crucial step for research. However, manual identification of neurons is a laborious task and prone to errors. For this reason, automated applications that process the recordings to identify functional neurons are required. Several proposals have emerged; they can be classified into four kinds of approaches: 1) matrix factorization, 2) clustering, 3) dictionary learning, and 4) deep learning. Unfortunately, they have proven inadequate for this problem. In fact, it remains an open problem, for two major reasons: 1) a lack of properly labeled datasets, and 2) existing approaches disregard the temporal dimension or consider only a tiny fraction of it; integrating all the frames into a single image is very common but inefficient because temporal dynamics are discarded. We propose an application for automatic segmentation of neurons with a deep learning approach, taking the temporal dimension into account through recurrent neural networks and using a dataset labeled by neuroscientists. Additional aspects considered in our proposal include motion correction and validation to ensure that segmentations correspond to truly functional neurons. Furthermore, we compare this application with a previous proposal that uses sophisticated digital image processing techniques on the same dataset.
This article presents the application of the full multiresolution active shape model (FMR-ASM) to the segmentation of the left ventricle in ultrasound images, as well as a comparison between the classical active shape model and the full multiresolution variant. Our objective is to evaluate the performance of the full multiresolution framework in a complex image segmentation task such as ultrasound of the left ventricle. The accuracy of the method is evaluated through the Dice coefficient between the expert annotation and the final segmentation of the FMR-ASM. The training and validation data were obtained from the CAMUS database, with 100 training and 60 validation images, on which we obtained a mean Dice coefficient of 0.76 with the FMR-ASM.
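For reference, the Dice coefficient used for evaluation can be computed directly from binary masks:

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())
```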
Deep learning (DL) is now widely used for tasks involving the analysis of biomedical imaging. However, the small number of annotated examples available for these types of images makes it difficult to use DL-based systems, since large amounts of data are required for adequate generalization and performance. For this reason, in recent years, Generative Adversarial Networks (GANs) have been used to obtain synthetic images that artificially increase the amount of data available. Despite this, the usual training instability of GANs, in addition to their empirical design, does not always allow for high-quality results. Through the neuroevolution of GANs it has been possible to reduce these problems, but many of these works use benchmark datasets with thousands of images, a scenario that does not reflect the real conditions under which data augmentation is needed because of the limited amount available. In this work, we present cDCGAN-PSO, an algorithm for the neuroevolution of GANs that adapts the concepts of DCGAN-PSO to a conditional DCGAN, allows the synthesis of three classes of chest X-ray images, and is trained with only 600 images of each class. The synthetic images obtained from the evolved GANs show good similarity with real chest X-ray images.
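For context, a minimal conditional DCGAN generator in PyTorch is sketched below; the evolved architectures found by cDCGAN-PSO will differ, and the 32x32 grayscale output size is an assumption:

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Conditional DCGAN generator: the class label is embedded and
    concatenated with the noise vector before upsampling."""
    def __init__(self, z_dim=100, n_classes=3, ch=64):
        super().__init__()
        self.embed = nn.Embedding(n_classes, z_dim)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(2 * z_dim, ch * 4, 4, 1, 0),  # 1x1 -> 4x4
            nn.BatchNorm2d(ch * 4), nn.ReLU(),
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1),     # 4 -> 8
            nn.BatchNorm2d(ch * 2), nn.ReLU(),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1),         # 8 -> 16
            nn.BatchNorm2d(ch), nn.ReLU(),
            nn.ConvTranspose2d(ch, 1, 4, 2, 1), nn.Tanh())   # 16 -> 32, grayscale
    def forward(self, z, labels):
        zc = torch.cat([z, self.embed(labels)], dim=1)
        return self.net(zc.view(zc.size(0), -1, 1, 1))
```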
Active contour-based methods are widely popular in the image segmentation field. Basically, they perform semiautomatic region identification by partitioning the image content into foreground and background. Nevertheless, accurate delimitation remains an important challenge, and it usually depends on how close the initial contour is placed to the object of interest (OI). Several applications of active contours require user interaction to give prior information about the initial position as a first step, which makes the tool substantially dependent on a manual process. This paper describes how to overcome this limitation by including the expertise provided by the training stage of a Convolutional Neural Network (CNN). Although CNN methods require a large dataset or data augmentation techniques to improve their results, the combined proposal accomplishes a presegmentation task with a reduced number of images to obtain the assumed location of each OI. These results are used to initialize a multiphase active contour model that follows a level set scheme, leading to a smoother multiregion segmentation with less effort. Experiments compare this approach with classic contour initialization techniques and show the benefits of our proposal.
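One way to wire the two stages together, sketched with scikit-image's morphological Chan-Vese as a single-phase stand-in for the paper's multiphase level set:

```python
from skimage.segmentation import morphological_chan_vese

def refine_with_level_set(image, cnn_mask, n_iter=100):
    """Refine a coarse CNN presegmentation with a morphological Chan-Vese
    active contour, using the mask as the initial level set. The iteration
    count is passed positionally because its keyword name changed across
    scikit-image versions (iterations -> num_iter)."""
    return morphological_chan_vese(image, n_iter,
                                   init_level_set=cnn_mask.astype(bool))
```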
Ultrasound (US) has become one of the most common forms of medical imaging in clinical practice. It is a non-invasive and safe modality that produces images in real time. It is also a technology with important challenges, such as low image quality and high variability (between manufacturers and institutions) [1]. This work applies a fast and accurate deep learning architecture to detect and locate the cerebellum in prenatal ultrasound images. Cerebellum biometry is used to estimate fetal age [2], and cerebellum segmentation could be applied to detect malformations [3]. YOLO (You Only Look Once) is a convolutional neural network (CNN) architecture for the detection, classification, and localization of objects in images [4]. YOLO was innovative because it solved a regression problem to predict the locations (coordinates and sizes) of bounding boxes and their associated classes. We used 316 ultrasound scans of fetal brains and their respective cerebellar segmentations. Of these, 78 images were randomly set aside as test images and the rest were available for training. Segmentation masks were converted to numerical descriptions of bounding boxes. To deal with the small dataset, transfer learning was applied by initializing the convolutional layers with weights pretrained on ImageNet [5]. We evaluated detection using the F1 score and localization using average precision (AP) on the 78 test images. Our best AP was 84.8% using 121 divisions or cells per image. Future work will focus on a segmentation task assisted by localization.
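The mask-to-box conversion mentioned above is a small but essential step; a sketch producing YOLO's normalized (center x, center y, width, height) description from a binary mask:

```python
import numpy as np

def mask_to_yolo_box(mask):
    """Convert a binary segmentation mask to a YOLO-style bounding box
    (center x, center y, width, height), all normalized to [0, 1]."""
    ys, xs = np.nonzero(mask)
    h, w = mask.shape
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    return ((x0 + x1) / 2 / w, (y0 + y1) / 2 / h,
            (x1 - x0 + 1) / w, (y1 - y0 + 1) / h)
```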
Superpixel algorithms oversegment an image by grouping pixels with similar local features such as spatial position, gray level intensity, color, and texture. Superpixels provide visually significant regions and avoid a large amount of redundant information, reducing dimensionality and complexity for subsequent image processing tasks. However, superpixel algorithms lose performance in images with high-frequency contrast variations in regions of uniform texture. Moreover, most state-of-the-art methods use only basic pixel information (spatial and color), yielding superpixels with low regularity, boundary smoothness, and adherence. The proposed algorithm adds texture information to the common superpixel representation. This information is obtained with the Hermite transform, which extracts local texture features in terms of Gaussian derivatives. A local iterative clustering with adaptive feature weights generates superpixels preserving boundary adherence, smoothness, regularity, and compactness. A feature adjustment stage is applied to improve algorithm performance. We tested our algorithm on the Berkeley Segmentation Dataset and evaluated it with standard superpixel metrics. We also demonstrate the usefulness and adaptability of our proposal in a medical imaging application.
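A rough sketch of the idea using SLIC with Gaussian-derivative responses appended as extra channels; this stands in for the Hermite transform features and adaptive weighting of the actual algorithm:

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import slic

def textured_superpixels(gray, n_segments=300):
    """Run SLIC on intensity plus Gaussian-derivative texture channels."""
    gx = ndimage.gaussian_filter(gray, sigma=2, order=(0, 1))  # d/dx response
    gy = ndimage.gaussian_filter(gray, sigma=2, order=(1, 0))  # d/dy response
    feats = np.dstack([gray, np.abs(gx), np.abs(gy)]).astype(float)
    # compactness needs tuning for non-RGB feature spaces
    return slic(feats, n_segments=n_segments, compactness=0.1, channel_axis=-1)
```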
In this paper we propose a semi-automatic method to segment the fetal cerebellum in ultrasound images. The method is based on an active shape model that includes profiles of Hermite features. To fit the shape model, we used a PCA of Hermite features. The model was tested on ultrasound images of the fetal brain taken from 20 pregnant women at gestational weeks 18 to 24. Compared with manual annotation, segmentation results show a mean Hausdorff distance of 6.85 mm using a conventional active shape model trained with gray-level profiles and 5.67 mm using Hermite profiles. We conclude that the Hermite profile model is more robust for segmenting the fetal cerebellum in ultrasound images.
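For reference, the Hausdorff distance used for evaluation can be computed from contour point sets with SciPy:

```python
from scipy.spatial.distance import directed_hausdorff

def hausdorff_mm(contour_a, contour_b, mm_per_px=1.0):
    """Symmetric Hausdorff distance between two contours given as (N, 2)
    arrays of pixel coordinates, scaled to millimeters."""
    d = max(directed_hausdorff(contour_a, contour_b)[0],
            directed_hausdorff(contour_b, contour_a)[0])
    return d * mm_per_px
```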
Texture is one of the most important elements used by the human visual system (HVS) to distinguish different objects in a scene. Early bio-inspired methods for texture segmentation partition an image into distinct regions by setting a criterion based on frequency response and local properties, and then perform a grouping task. Nevertheless, correct texture delimitation remains an important challenge in image segmentation. The aim of this study is a novel approach to discriminating different textures by comparing internal and external image content across a set of evolving curves. We propose a multiphase formulation with an active contour model applied to the highest-energy coefficients generated by the Hermite transform (HT). Local texture features such as scale and orientation are reflected in the HT coefficients, which guide the evolution of each curve. This process encloses similar characteristics in a region associated with a level set function. The efficiency of our proposal is evaluated on a variety of synthetic images and real textured scenes.
Periodic variations in patterns within a group of pixels provide important information about the surface of interest and can be used to identify objects or regions. Hence, a proper analysis can be applied to extract particular features according to specific image properties. Recently, texture analysis using orthogonal polynomials has gained attention, since polynomials characterize the pseudo-periodic behavior of textures through the projection of the pattern of interest onto a group of kernel functions. However, the maximum polynomial order is often linked to the size of the texture, which in many cases implies complex calculations and introduces instability at higher orders, leading to computational errors. In this paper, we address this issue and explore a pre-processing stage to compute the optimal size of the window of analysis, called the "texel." We propose Haralick-based metrics to find the main oscillation period, such that it represents the fundamental texture and captures the minimum information sufficient for classification tasks. This procedure avoids the computation of large polynomials and substantially reduces the feature space with small classification errors. Our proposal is also compared against different fixed-size windows. We also show similarities between full-image representations and texel-based ones in terms of visual structures and feature vectors, using two different orthogonal bases: Tchebichef and Hermite polynomials. Finally, we assess the performance of the proposal using well-known texture databases from the literature.
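As a rough illustration of estimating a main oscillation period (via autocorrelation here, which is a stand-in for the paper's Haralick-based criterion):

```python
import numpy as np

def main_period(row_profile):
    """Estimate the dominant oscillation period of a 1-D texture profile
    from the first peak of its autocorrelation (illustrative texel size)."""
    x = row_profile - row_profile.mean()
    ac = np.correlate(x, x, mode="full")[x.size - 1:]
    peaks = [i for i in range(1, ac.size - 1)
             if ac[i] > ac[i - 1] and ac[i] > ac[i + 1]]
    return peaks[0] if peaks else x.size
```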
Medical image analysis has become an important tool for improving medical diagnosis and planning treatments. It involves volume or still-image segmentation, which plays a critical role in understanding image content by facilitating extraction of the anatomical organ or region of interest, and it may help towards the construction of reliable computer-aided diagnosis systems. Specifically, level set methods have emerged as a general framework for image segmentation; such methods are mainly based on gradient information and provide satisfactory results. However, the noise inherent to images and the lack of contrast between adjacent regions hamper the performance of these algorithms; thus, other proposals have been suggested in the literature, for instance, characterizing regions with statistical parametric models to drive the level set evolution. In this paper, we study the influence of texture on level-set-based segmentation and propose the use of Hermite features, incorporated into the level set model, to improve organ segmentation; this may be useful for quantifying left ventricular blood flow. The proposal is also compared against other texture descriptors such as local binary patterns, image derivatives, and Hounsfield low attenuation values.
In recent years, the use of Magnetic Resonance Imaging (MRI) to detect different brain structures such as the midbrain, white matter, gray matter, corpus callosum, and cerebellum has increased. This fact, together with the evidence that the midbrain is associated with Parkinson's disease, has led researchers to consider midbrain segmentation an important issue. Nowadays, Active Shape Models (ASM) are widely used in the literature for organ segmentation where shape is an important discriminant feature. Nevertheless, this approach is based on the assumption that objects of interest are usually located on strong edges. Such a limitation may lead to a final shape far from the actual shape model. This paper proposes a novel method based on the combined use of ASM and Local Binary Patterns (LBP) for segmenting the midbrain. Furthermore, we analyzed several LBP methods and evaluated their performance. The joint model considers both global and local statistics to improve the final adjustment. The results show that our proposal performs substantially better than the ASM algorithm and provides better segmentation measurements.
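A sketch of an LBP local appearance descriptor with scikit-image, of the kind that can replace gray-level profiles in an ASM's local search:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(patch, P=8, R=1):
    """Rotation-invariant uniform LBP histogram of a local image patch;
    uniform coding with P neighbors yields P + 2 distinct codes."""
    codes = local_binary_pattern(patch, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist
```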
This paper describes a segmentation method for time series of 3D cardiac images based on deformable models. The goal of this work is to extend active shape models (ASM) of three-dimensional objects to the problem of 4D (3D + time) cardiac CT image modeling. The segmentation is achieved by constructing a point distribution model (PDM) that encodes the spatio-temporal variability of a training set, i.e., the principal modes of variation of the temporal shapes are computed using statistical parameters. An active search is used in the segmentation process: an initial approximation of the spatio-temporal shape is given, and the gray level information in the neighborhood of the landmarks is analyzed. The starting shape can deform to better fit the data, but only within the range allowed by the point distribution model. Several time series consisting of eleven 3D cardiac CT images are employed to validate the method. Results are compared with manual segmentation by an expert. The proposed application can be used for clinical evaluation of the left ventricle's mechanical function. Likewise, the results can serve as a first processing step for optical flow estimation algorithms.
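A sketch of the PDM construction with scikit-learn: stacking each cardiac cycle's landmarks into one vector makes the PCA modes spatio-temporal, and shape candidates found during the active search are clamped to the model subspace:

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_pdm(shapes, var_kept=0.98):
    """shapes: (n_samples, n_timepoints * n_landmarks * 3) array of aligned
    landmark coordinates; PCA keeps the modes explaining var_kept variance."""
    return PCA(n_components=var_kept).fit(shapes)

def constrain(pca, shape, n_std=3.0):
    """Project a candidate spatio-temporal shape into the model subspace and
    clamp each mode to +/- n_std standard deviations, as in ASM search."""
    b = pca.transform(shape[None])[0]
    lim = n_std * np.sqrt(pca.explained_variance_)
    return pca.inverse_transform(np.clip(b, -lim, lim)[None])[0]
```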