In modern neuroscience, functional magnetic resonance imaging (fMRI) has been a crucial and irreplaceable tool that provides a non-invasive window into the dynamics of whole-brain activity. Nevertheless, fMRI is limited by hemodynamic blurring as well as high cost, immobility, and incompatibility with metal implants. Electroencephalography (EEG) is complementary to fMRI and can directly record the cortical electrical activity at high temporal resolution, but has more limited spatial resolution and is unable to recover information about deep subcortical brain structures. The ability to obtain fMRI information from EEG would enable cost-effective, naturalistic imaging across a wider set of brain regions. Further, beyond augmenting the capabilities of EEG, cross-modality models would facilitate the interpretation of fMRI signals. However, as both EEG and fMRI are high-dimensional and prone to noise and artifacts, it is currently challenging to model fMRI from EEG. Indeed, although correlations between these two modalities have been widely investigated, few studies have successfully used EEG to directly reconstruct fMRI time series. To address this challenge, we propose a novel architecture that can predict fMRI signals directly from multi-channel EEG without explicit feature engineering. Our model achieves this by implementing a Sinusoidal Representation Network (SIREN) to learn frequency information in brain dynamics from EEG, which serves as the input to a subsequent encoder-decoder to effectively reconstruct the fMRI signal in a specific brain region. We evaluate our model using a simultaneous EEG-fMRI dataset with 8 subjects and investigate its potential for predicting subcortical fMRI signals. The present results reveal that our model outperforms a recent state-of-the-art model and indicate the potential of leveraging periodic activation functions in deep neural networks to model functional neuroimaging data.
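As a rough illustration of the periodic activations the model relies on, the sketch below implements a minimal SIREN-style layer in PyTorch. The layer widths, the omega_0 value, and the small stack at the end are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """A single SIREN layer: a linear map followed by a sine activation.

    The frequency scale omega_0 and the uniform weight initialization follow
    the original SIREN formulation; the dimensions below are illustrative only.
    """
    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            if is_first:
                bound = 1.0 / in_features
            else:
                bound = (6.0 / in_features) ** 0.5 / omega_0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

# A toy SIREN mapping a multi-channel EEG feature vector to a latent code
# that a downstream encoder-decoder could consume (dimensions are made up).
siren = nn.Sequential(
    SineLayer(64, 256, is_first=True),
    SineLayer(256, 256),
    nn.Linear(256, 128),
)
latent = siren(torch.randn(8, 64))  # batch of 8 EEG feature vectors
```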
KEYWORDS: Image segmentation, Medical imaging, Kidney, Ultrasonography, Monte Carlo methods, Performance modeling, Image enhancement, Data modeling, Reliability, Uncertainty analysis
The Segment Anything Model (SAM) is a recently developed general-purpose foundation model for image segmentation. It can use sparse manual prompts such as bounding boxes to generate pixel-level segmentation of natural images, but it struggles with medical images such as low-contrast, noisy ultrasound images. We propose a refined test-phase prompt augmentation technique designed to improve SAM’s performance in medical image segmentation. The method couples multi-box prompt augmentation with an aleatoric uncertainty-based false-negative (FN) and false-positive (FP) correction (FNPC) strategy. We evaluate the method on two ultrasound datasets and show that it improves SAM’s performance and robustness to inaccurate prompts without requiring further training or fine-tuning. Moreover, we present the Single-Slice-to-Volume (SS2V) method, which enables 3D pixel-level segmentation using only a bounding box annotation on a single 2D slice. Our results enable efficient use of SAM even in noisy, low-contrast medical images. The source code has been released at: https://github.com/MedICL-VU/FNPC-SAM
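To make the prompt-augmentation idea concrete, the following sketch shows one generic way to jitter a single bounding box into several prompts and aggregate the resulting binary masks into mean and variance maps. Here `predict_mask` is a hypothetical placeholder for a SAM box-prompt call, the jitter scale is arbitrary, and the aleatoric-uncertainty FN/FP correction itself is not reproduced.

```python
import numpy as np

def jitter_box(box, scale=0.05, n=8, rng=None):
    """Generate n perturbed copies of an [x0, y0, x1, y1] box.

    Each corner is shifted by Gaussian noise proportional to the box size,
    mimicking the effect of slightly inaccurate manual prompts.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    box = np.asarray(box, dtype=float)
    w, h = box[2] - box[0], box[3] - box[1]
    noise = rng.normal(scale=scale, size=(n, 4)) * np.array([w, h, w, h])
    return box + noise

def aggregate_masks(masks):
    """Combine binary masks produced from the augmented prompts.

    Returns the mean foreground probability per pixel and a variance map that
    can serve as a simple stand-in for per-pixel uncertainty.
    """
    stack = np.stack(masks).astype(float)
    return stack.mean(axis=0), stack.var(axis=0)

# Usage sketch: `predict_mask` is a placeholder for a segmentation call
# (e.g. a SAM box prompt); it is not part of any real API here.
# masks = [predict_mask(image, b) for b in jitter_box(user_box)]
# prob, uncertainty = aggregate_masks(masks)
# final = prob > 0.5   # high-variance pixels could then be re-examined
```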
For those experiencing severe-to-profound sensorineural hearing loss, the cochlear implant (CI) is the preferred treatment. Augmented reality (AR) aided surgery can potentially improve CI procedures and hearing outcomes. Typically, AR solutions for image-guided surgery rely on optical tracking systems to register pre-operative planning information to the display so that hidden anatomy or other important information can be overlaid and co-registered with the view of the surgical scene. In this paper, our goal is to develop a method that permits direct 2D-to-3D registration of the microscope video to the pre-operative Computed Tomography (CT) scan without the need for external tracking equipment. Our proposed solution involves using surface mapping of a portion of the incus in surgical recordings and determining the pose of this structure relative to the surgical microscope by performing pose estimation via the perspective-n-point (PnP) algorithm. This registration can then be applied to pre-operative segmentations of other anatomy-of-interest, as well as the planned electrode insertion trajectory, to co-register this information for the AR display. Our results demonstrate accurate registration, with an average rotation error of less than 25 degrees and translation errors of less than 2 mm, 3 mm, and 0.55% along the x, y, and z axes, respectively. Our proposed method has the potential to generalize to other surgical procedures while requiring only a monocular microscope intra-operatively.
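For readers unfamiliar with the registration step, the sketch below shows a standard perspective-n-point solve with OpenCV. The incus landmark coordinates, microscope intrinsics, and distortion assumptions are invented placeholders and do not come from the paper; real values would come from the surface-mapping step and a camera calibration.

```python
import numpy as np
import cv2

# Placeholder data: 3D landmarks on the incus surface from the pre-operative
# CT (in CT/world coordinates, mm) and their corresponding 2D detections in
# the microscope frame (pixels).
object_points = np.array([[0.0, 0.0, 0.0],
                          [2.1, 0.3, 0.5],
                          [1.0, 1.8, 0.2],
                          [0.4, 0.9, 1.6],
                          [1.7, 1.1, 1.1],
                          [0.8, 2.0, 0.9]], dtype=np.float64)
image_points = np.array([[410.0, 305.0], [455.0, 310.0], [430.0, 355.0],
                         [418.0, 332.0], [447.0, 336.0], [436.0, 362.0]],
                        dtype=np.float64)

# Assumed pinhole intrinsics of the calibrated surgical microscope.
K = np.array([[1800.0, 0.0, 512.0],
              [0.0, 1800.0, 384.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume negligible lens distortion

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)  # rotation matrix: CT frame -> camera frame
print(ok, R, tvec)
```

The recovered pose (R, tvec) could then be applied to other pre-operative segmentations or the planned electrode trajectory before projecting them into the microscope view.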
The accurate reconstruction of surgical scenes from surgical videos is critical for various applications, including intraoperative navigation and image-guided robotic surgery automation. However, previous approaches, mainly relying on depth estimation, have limited effectiveness in reconstructing surgical scenes with moving surgical tools. To address this limitation and provide accurate 3D position prediction for surgical tools in all frames, we propose a novel approach called SAMSNeRF that combines Segment Anything Model (SAM) and Neural Radiance Field (NeRF) techniques. Our approach generates accurate segmentation masks of surgical tools using SAM, which guides the refinement of the dynamic surgical scene reconstruction by NeRF. Our experimental results on public endoscopy surgical videos demonstrate that our approach successfully reconstructs high-fidelity dynamic surgical scenes and accurately reflects the spatial information of surgical tools. Our proposed approach can significantly enhance surgical navigation and automation by providing surgeons with accurate 3D position information of surgical tools during surgery. The code will be released soon at: https://github.com/AngeLouCN/SAMSNeRF
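The abstract does not spell out how the SAM masks enter the NeRF optimization, so the sketch below shows only one plausible mechanism: a mask-weighted photometric rendering loss. The weighting scheme and the `nerf_render` placeholder are assumptions for illustration, not the SAMSNeRF formulation.

```python
import torch

def mask_guided_render_loss(rendered, target, tool_mask, tool_weight=2.0):
    """Photometric loss where pixels inside the tool mask get extra weight.

    rendered, target: (H, W, 3) images; tool_mask: (H, W) with values in {0, 1}.
    This is only one plausible way a segmentation mask could guide a dynamic
    reconstruction; the actual SAMSNeRF loss may differ.
    """
    per_pixel = ((rendered - target) ** 2).mean(dim=-1)      # (H, W)
    weights = 1.0 + (tool_weight - 1.0) * tool_mask          # boost tool pixels
    return (weights * per_pixel).sum() / weights.sum()

# rendered = nerf_render(rays)   # placeholder for a NeRF forward pass
# loss = mask_guided_render_loss(rendered, frame, sam_mask)
```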
Depth estimation in surgical video plays a crucial role in many image-guided surgery procedures. However, it is difficult and time-consuming to create ground-truth depth map datasets for surgical videos, due in part to inconsistent brightness and noise in the surgical scene. Therefore, building an accurate and robust self-supervised depth and camera ego-motion estimation system is gaining attention from the computer vision community. Although several self-supervision methods alleviate the need for ground-truth depth maps and poses, they still require known camera intrinsic parameters, which are often missing or not recorded. Moreover, the camera intrinsic prediction methods in existing works depend heavily on the quality of the datasets. In this work, we aimed to build a self-supervised depth and ego-motion estimation system that predicts not only accurate depth maps and camera poses, but also camera intrinsic parameters. We propose a cost-volume-based supervision scheme that provides the system with auxiliary supervision for camera parameter prediction. The experimental results show that the proposed method improves the accuracy of camera parameter, ego-motion, and depth estimation.
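As background for the self-supervised objective, the sketch below shows the standard backproject-transform-project step in which the predicted intrinsics appear. The numbers are illustrative, and the cost-volume-based auxiliary supervision proposed here is a separate term not shown in this sketch.

```python
import numpy as np

def reproject(pixel, depth, K, T):
    """Warp one pixel from the target view into the source view.

    pixel: (u, v) in the target image; depth: predicted depth at that pixel;
    K: predicted 3x3 intrinsics; T: 4x4 relative pose (target -> source).
    This backproject-transform-project step underlies the usual
    self-supervised photometric loss.
    """
    u, v = pixel
    p_cam = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])  # 3D point, target camera
    p_src = (T @ np.append(p_cam, 1.0))[:3]                   # point in source camera
    uvw = K @ p_src                                           # project with intrinsics
    return uvw[:2] / uvw[2]

# Illustrative values only: focal length, depth, and a small forward motion.
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
T = np.eye(4); T[2, 3] = -0.01                                # 1 cm camera translation
print(reproject((300.0, 200.0), 0.05, K, T))
```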
KEYWORDS: Equipment, 3D modeling, Image segmentation, Image restoration, Education and training, Cameras, 3D mask effects, 3D image reconstruction, Surgery, 3D image processing
Surgical instrument tracking is an active research area that can provide surgeons with feedback about the location of their tools relative to anatomy. Recent tracking methods fall mainly into two categories: segmentation and object detection. However, both can only predict 2D information, which is limiting for application to real-world surgery. An accurate 3D surgical instrument model is a prerequisite for precise prediction of the pose and depth of the instrument. Recent single-view 3D reconstruction methods have only been applied to natural object reconstruction and do not achieve satisfactory reconstruction accuracy without 3D attribute-level supervision. Further, those methods are not well suited to surgical instruments because of their elongated shapes. In this paper, we propose an end-to-end surgical instrument reconstruction system, Self-supervised Surgical Instrument Reconstruction (SSIR). Within SSIR, we propose a multi-cycle-consistency strategy that helps capture texture information from slim instruments while requiring only a binary instrument label map. Experiments demonstrate that our approach improves the reconstruction quality of surgical instruments compared to other self-supervised methods and achieves promising results.
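Since the abstract only names the multi-cycle-consistency strategy, the sketch below shows two generic building blocks such a strategy could rest on: a soft-IoU silhouette loss against the binary label map and a simple latent cycle-consistency term. Both are assumptions for illustration rather than the SSIR formulation.

```python
import torch
import torch.nn.functional as F

def silhouette_loss(rendered_sil, label_map, eps=1e-6):
    """Soft IoU loss between a rendered silhouette and the binary label map.

    This is a common self-supervised signal when only instrument masks are
    available; SSIR's multi-cycle-consistency strategy would add further
    consistency terms not reproduced here.
    """
    inter = (rendered_sil * label_map).sum()
    union = (rendered_sil + label_map - rendered_sil * label_map).sum()
    return 1.0 - inter / (union + eps)

def cycle_consistency(z, z_cycled):
    """Penalize disagreement between a latent code and its re-encoded copy,
    e.g. encode(render(decode(z))), the basic ingredient of a
    cycle-consistency objective."""
    return F.l1_loss(z_cycled, z)
```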
KEYWORDS: Image segmentation, Medical imaging, Education and training, Content addressable memory, Control systems, Magnetic resonance imaging, Machine learning, Data modeling, White matter, Biomedical applications
Medical image harmonization aims to transform the image ‘style’ across heterogeneous datasets while preserving the anatomical content. It enables data-sensitive, learning-based approaches to fully leverage the power of large multi-site datasets with different image acquisitions. Recently, the attention mechanism has achieved excellent performance on image-to-image (I2I) translation of natural images. In this work, we further explore the potential of the attention mechanism to improve the performance of medical image harmonization. For the first time, we apply two attention-based frameworks with outstanding performance in the natural-image I2I setting to cross-scanner MRI harmonization. We compare them with existing, commonly used harmonization frameworks by evaluating their ability to enhance the performance of a downstream subcortical segmentation task on T1-weighted (T1w) MRI datasets from 1.5T vs. 3T scanners. Both qualitative and quantitative results show that the attention mechanism contributes to a noticeable improvement in harmonization ability.
Purpose: Segmenting medical images accurately and reliably is important for disease diagnosis and treatment. It is a challenging task because of the wide variety of objects’ sizes, shapes, and scanning modalities. Recently, many convolutional neural networks have been designed for segmentation tasks and have achieved great success. Few studies, however, have fully considered the sizes of objects; thus, most demonstrate poor performance for small object segmentation. This can have a significant impact on the early detection of diseases. Approach: We propose a context axial reverse attention network (CaraNet) to improve the segmentation performance on small objects compared with several recent state-of-the-art models. CaraNet applies axial reverse attention and channel-wise feature pyramid modules to mine the feature information of small medical objects. We evaluate our model using six different evaluation metrics. Results: We test our CaraNet on segmentation datasets for brain tumor (BraTS 2018) and polyp (Kvasir-SEG, CVC-ColonDB, CVC-ClinicDB, CVC-300, and ETIS-LaribPolypDB). Our CaraNet achieves the top-rank mean Dice segmentation accuracy, and the results show a distinct advantage of CaraNet in the segmentation of small medical objects. Conclusions: We proposed CaraNet to segment small medical objects; it outperforms state-of-the-art methods.
Segmenting medical images accurately and reliably is important for disease diagnosis and treatment. It is a challenging task because of the wide variety of objects’ sizes, shapes, and scanning modalities. Recently, many convolutional neural networks (CNNs) have been designed for segmentation tasks and have achieved great success. Few studies, however, have fully considered the sizes of objects, and thus most demonstrate poor performance for small object segmentation. This can have a significant impact on the early detection of diseases. This paper proposes a Context Axial Reverse Attention Network (CaraNet) to improve segmentation performance on small objects compared with several recent state-of-the-art models. We test our CaraNet on brain tumor (BraTS 2018) and polyp (Kvasir-SEG, CVC-ColonDB, CVC-ClinicDB, CVC-300, and ETIS-LaribPolypDB) segmentation datasets. Our CaraNet achieves the top-rank mean Dice segmentation accuracy, and the results show a distinct advantage of CaraNet in the segmentation of small medical objects. Code available: https://github.com/AngeLouCN/CaraNet
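For context, the sketch below implements a plain reverse-attention block of the kind CaraNet builds on, in which features are re-weighted by one minus the sigmoid of a coarse prediction. The axial-attention and channel-wise feature pyramid components of CaraNet are omitted, and the channel sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttention(nn.Module):
    """Plain reverse-attention block: emphasize regions the coarse prediction
    currently misses by weighting features with (1 - sigmoid(coarse_map)).

    CaraNet combines this idea with axial attention and a channel-wise
    feature pyramid; those components are not shown in this sketch.
    """
    def __init__(self, channels):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, feat, coarse_map):
        # coarse_map: (B, 1, h, w) logits from a deeper decoder stage
        coarse_up = F.interpolate(coarse_map, size=feat.shape[2:],
                                  mode="bilinear", align_corners=False)
        att = 1.0 - torch.sigmoid(coarse_up)     # focus on missed regions
        residual = self.refine(att * feat)       # predict a correction
        return coarse_up + residual              # refined prediction at this scale

block = ReverseAttention(32)
refined = block(torch.randn(2, 32, 44, 44), torch.randn(2, 1, 11, 11))
```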
Recently, deep learning has become much more popular in computer vision applications. Convolutional Neural Networks (CNNs) have brought a breakthrough in image segmentation, especially for medical images. In this regard, the U-Net is the predominant approach to medical image segmentation tasks. The U-Net performs well not only in segmenting multimodal medical images in general, but also in some difficult cases. We found, however, that the classical U-Net architecture has limitations in several respects. Therefore, we applied two modifications to the state-of-the-art U-Net model: 1) we designed an efficient CNN architecture to replace the encoder and decoder, and 2) we applied a residual module in place of the skip connections between the encoder and decoder. Following these modifications, we designed a novel architecture, DC-UNet, as a potential successor to the U-Net: we created a new, effective CNN block and built DC-UNet on top of it. We evaluated our model on three datasets with difficult cases and obtained relative improvements in performance of 2.90%, 1.49%, and 11.42%, respectively, compared with the classical U-Net. In addition, we used the Tanimoto similarity measure in place of the Jaccard measure for gray-to-gray image comparisons.
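Since the Tanimoto measure may be less familiar than the Jaccard index, the short sketch below computes it for gray-level images. It uses the standard continuous Tanimoto coefficient, which reduces to the Jaccard index for binary inputs; the example arrays are made up.

```python
import numpy as np

def tanimoto(a, b, eps=1e-8):
    """Tanimoto similarity between two gray-level images (values in [0, 1]).

    T(A, B) = <A, B> / (||A||^2 + ||B||^2 - <A, B>); for binary masks this
    reduces to the Jaccard index, which is why it is a natural replacement
    when comparing gray-to-gray images.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    dot = np.dot(a, b)
    return dot / (np.dot(a, a) + np.dot(b, b) - dot + eps)

# Example: identical images give ~1.0, disjoint binary masks give 0.0.
x = np.array([[0.2, 0.8], [0.5, 0.0]])
print(tanimoto(x, x))            # -> ~1.0
print(tanimoto([1, 0], [0, 1]))  # -> 0.0
```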