1. INTRODUCTION

This project covers an approach to dynamic imaging of the foot and ankle through the lens of image registration, in particular volume-to-projection registration. We explore a parametric, bone-by-bone approach, assuming that the movement of the individual bones can be modelled by rigid transformations. Each bone is segmented out of an initial static scan, and the volume containing one of the bones is matched to a sequence of fluoroscopic projections. We use the term fluoroscopy here to mean an "X-ray movie", i.e. X-ray exposures taken in quick succession to track the movement of the scanned object while the source, detector and sample stage remain stationary. The registration is performed with the help of the Flexible Algorithms for Image Registration (FAIR) package [1] for MATLAB, and to test this approach we use the phantom described in section 1.1.

1.1 Data acquisition

The data for this project was acquired at the CWI research institute in Amsterdam, using the state-of-the-art X-ray scanner FleXray. During acquisition the FleXray is completely sealed and can only scan small (up to 10 cm tall) objects, so we constructed a remotely controlled base and a simple phantom consisting of two "bones" made from gypsum plaster submerged in silicone. Silicone was chosen to reproduce the effect of cartilage and other soft tissue between and surrounding the bones. The data was collected at 78 kV at maximum power due to the high attenuation of gypsum and silicone. To remove lower-energy photons, layers of thin copper plates were attached to the source, as can be seen in fig. 1. Static frame-by-frame data of 40 individual positions was acquired, as well as data from continuously shooting X-ray beams (i.e. fluoroscopic data) from a few selected angles.

1.2 Image registration

Image registration is an umbrella term for aligning two or more sets of images.
It is often used in image processing for combining images, either to extend the view, as in panoramic images, or to enhance an existing view and reduce the signal-to-noise ratio. The purpose of image registration is often to identify similar features in different images in order to acquire further information about those features. This has a very clear application in medical imaging, where scans are often performed multiple times during treatment, sometimes even using different devices (e.g. MR and X-ray systems). To find the exact differences between the images, it is crucial to align them correctly, i.e. to perform registration. To avoid the need for experts to identify corresponding features for landmark-based registration, we instead rely on intensity-based registration. Intensity-based methods align images by comparing their intensity values and contrasts, and often require little to no preparation in terms of marker placement, although they can be quite time-consuming during the registration process. An example of a fully intensity-based method can be found in [2], where individual vertebrae in a 3D CT volume were robustly registered to the corresponding vertebrae in a 2D X-ray image. The registration was performed by optimising for one of the six degrees of freedom of the 3D bone at a time. We propose a method in which we use the FAIR toolbox to optimise for all six transformation parameters simultaneously.

1.3 Practical background

Image registration is classically formulated with both images in the same domain, i.e. both in Ω ⊂ ℝ^d, where d is the common dimension, usually 2 or 3. For some reference image uR : Ω → ℝ, we try to transform a template image uT : Ω → ℝ using a transformation ρ : ℝ^d → ℝ^d in a way that minimizes the following objective function:

    min_w  dist(uT(ρ(w, x)), uR(x)) + 𝓡(w).    (1)

Here, dist is some distance measure function, 𝓡 is a regularizer, x ∈ Ω is the set of coordinates and w is the vector of parameters of the transformation ρ.
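To make the same-domain formulation concrete, the following is a minimal sketch (hypothetical NumPy code, not part of the FAIR toolbox) of evaluating an SSD-type distance for a 2D rigid transformation, using nearest-neighbour sampling in place of FAIR's spline interpolation:

```python
import numpy as np

def rigid_2d(w, x):
    """rho(w, x): rotation by angle w[0] about the origin, then translation (w[1], w[2])."""
    c, s = np.cos(w[0]), np.sin(w[0])
    R = np.array([[c, -s], [s, c]])
    return x @ R.T + w[1:3]

def ssd_objective(w, uT, uR, grid):
    """dist(uT(rho(w, x)), uR(x)) with the sum of squared differences.

    grid: integer pixel coordinates of the domain, shape (n, 2)."""
    warped = rigid_2d(w, grid)                            # transformed coordinates rho(w, x)
    # nearest-neighbour sampling of the template at the warped coordinates
    idx = np.clip(np.rint(warped).astype(int), 0, np.array(uT.shape) - 1)
    residual = uT[idx[:, 0], idx[:, 1]] - uR[grid[:, 0], grid[:, 1]]
    return 0.5 * float(residual @ residual)
```

With w = 0 and identical images the distance vanishes; eq. 1 additionally adds the regularizer 𝓡(w) to this value.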
In our case, the template is a 3D bone s segmented out of the whole image uT, and the reference is a projection (or a set of projections) gR. Since the registration is to be performed between a 3D volume s and 2D measurement data gR, one must first generate digitally reconstructed radiographs (DRRs) from the 3D volume, and then perform the registration between the DRR(s) and the given X-ray measurement data. Let s : Ω → ℝ be the template image and gR : Ω′ → ℝ be the reference image. The domain Ω′ is the set of one or more stacked X-ray projections from different angles, Ω′ ⊂ ℝ^(d−1) × [0, 2π). Let also Aθ be the X-ray cone-beam forward operator, where θ is the set of indices representing the present projection angles. Then for 2D3D registration, eq. 1 becomes

    min_w  dist(Aθ(s(ρ(w, x))), gR(x)) + 𝓡(w).    (2)

As we are working with rigid bodies, our parameter vector w consists of 6 elements corresponding to the 6 degrees of freedom: 3 for rotation (w1, w2, w3) and 3 for translation (w4, w5, w6).

2. NUMERICAL REALIZATION AND RESULTS

Recall that the objective function we want to minimize is eq. 2. We set the distance function to be the squared ℓ2 norm, also known as the sum of squared differences, which we refer to as 𝜓. With the residual r = Aθ(s(ρ(w, x))) − gR(x), the dist function is

    dist = 𝜓(r) = ½ rᵀr.

In the discrete setting, where the image is defined on a grid of cells, 𝜓(r) is multiplied by a scaling factor h defined as the cell width. This is particularly useful for multiscale optimization, where different levels/scales correspond to different grid densities. As a constraint on the transformation parameters w, we use Tikhonov regularization, i.e. given a reference set of parameters wref,

    𝓡(w) = ½ (w − wref)ᵀ M (w − wref),

where M is a diagonal matrix of weights for the transformation parameters.

2.1 Pre-processing the data

For the two-dimensional reference images, one or two projection images are used for each of the 41 static scans (one initial position + 40 poses along a programmed motion path).
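The two terms of eq. 2 combine as in the following hypothetical NumPy sketch; the cone-beam operator Aθ is replaced here by a simple axis-sum stand-in, and warp stands for the rigid transformation plus interpolation:

```python
import numpy as np

def project(volume):
    """Toy stand-in for the forward operator A_theta: parallel sums along one axis."""
    return volume.sum(axis=0)

def objective(w, warp, s, gR, M, w_ref, h=1.0):
    """J(w) = h * psi(r) + R(w), with psi(r) = 0.5 r^T r and the Tikhonov term
    R(w) = 0.5 (w - w_ref)^T M (w - w_ref)."""
    r = (project(warp(w, s)) - gR).ravel()      # residual in the projection domain
    psi = 0.5 * float(r @ r)
    dw = w - w_ref
    reg = 0.5 * float(dw @ (M @ dw))
    return h * psi + reg
```

With warp the identity and w = wref, both terms vanish; moving w away from wref is penalised by the diagonal weights in M.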
The template image is obtained by reconstructing the static scan in the initial position and segmenting it with K-means [3] into 3 separate masks: air, silicone (as well as LEGO) and gypsum, corresponding to air, soft tissue and bone. Noise in the reconstruction uT is smoothed by convolution with a Gaussian filter. The bone mask is split into two masks corresponding to the two shapes, coded as m_tibia (top) and m_talus (bottom). These bones are then segmented out of the reconstruction by means of element-wise multiplication (⊚),

    s_tibia = uT ⊚ m_tibia,    s_talus = uT ⊚ m_talus,

to obtain a background volume

    u_bg = uT ⊚ (1 − m_tibia − m_talus),

which can be projected with the forward operator Aθ into the data domain,

    g_bg = Aθ(u_bg).

These background projections can then be subtracted from all following reference images to give the best chance to our intensity-based registration algorithm,

    g̃R = gR − g_bg.

The masks, segmented bones and an example of a projected background for the static initial position can be found in fig. 2. As the bones move further away from the initial position, the background extracted from it becomes a worse approximation to the true background, due to the decreasing overlap between the occlusion in the initial position and the occlusion in the present frame. However, as this only affects a small part of the background volume, there is still a benefit to subtracting the background.

2.2 Results

We run the registration problem on all 40 frames plus the initial position using 4 different strategies. The registration itself is performed using the Gauss-Newton (GN) method, and the template image s1 is always the bone (in our case the tibia, s_tibia) in the initial position, while the initial parameter vector is set to w0 = [0, 0, 0, 0, 0, 0]ᵀ.

Setup 1.
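This pre-processing pipeline can be sketched as follows (hypothetical NumPy code; plain intensity thresholds stand in for the K-means clustering, and the axis-sum projector stands in for Aθ):

```python
import numpy as np

def segment(u, t_soft, t_bone):
    """Split a reconstruction into air / soft-tissue / bone masks by intensity.
    (The thresholds are a stand-in for the K-means clustering used in the paper.)"""
    m_air = u < t_soft
    m_bone = u >= t_bone
    m_soft = ~m_air & ~m_bone
    return m_air, m_soft, m_bone

def remove_background(u, bone_masks, gR, project):
    """Project the background u * (1 - sum of bone masks) and subtract it from gR."""
    m_bones = np.clip(sum(bone_masks), 0, 1)
    g_bg = project(u * (1 - m_bones))           # projected background volume
    return gR - g_bg                            # cleaned reference projections

project = lambda v: v.sum(axis=0)               # toy stand-in for A_theta
```

Subtracting g_bg from gR ideally leaves only the bones' contributions in the reference projections, which is what the intensity-based registration then matches against.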
For each consecutive frame to be registered (iteration k + 1), the GN method is warm-started by setting wref to the parameters wk obtained from the registration of the previous frame k,

    wref = wk.

Meanwhile, the template image s for the next frame's registration is updated based on the previous iteration's output parameters, using the rigid transformation Trigid and spline interpolation TSI,

    sk+1 = TSI(Trigid(wk), sk).

To simplify notation, we combine the rigid transformation and the interpolation into one transformation operator T, so that sk+1 = T(wk, sk). The updated sk+1 is used as the template image for the following iteration. Results for setup 1 are visualized in fig. 4(a).

Setup 2. In this setup, we obtain the template image sk (for k > 1) directly from s1 via rigid transformation with parameters wk, while the reference parameters wref for the optimization stay the same for all iterations. In every iteration we thus obtain global transformation parameters, as opposed to setup 1, where the parameters are computed locally. For every iteration k = 1, 2, …, 40,

    wref = w0,    sk+1 = T(wk, s1).

Note that wref and s1 remain unchanged through all iterations. Results for setup 2 are visualized in fig. 4(b).

Setup 3. As in setup 2, except that the registration is now performed using a coarse-to-fine multiscale approach, with the goal of registering over larger distances. Results for setup 3 are visualized in fig. 4(c).

Setup 4. In this setup we propose a scheme that aggregates the optimized parameters and uses them together with s1. As can be seen in fig. 3, this scheme aggregates the incremental transformations w1, …, wk. In practice, this is performed stepwise using the nested_parameters function described in appendix A. This way, interpolation is applied only once to s1 to move it into the position of frame k. The variables for this setup are updated as follows. For k = 0,

    wnest,0 = w0.

For k = 1, 2, 3, …, 40,

    wnest,k = nested_parameters(wnest,k−1, wk),    sk+1 = T(wnest,k, s1).

The updated template s is used as initialization for the following iterations. Results for setup 4 are visualized in fig. 4(d).
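The bookkeeping of setup 4 can be sketched as follows (hypothetical Python; register stands for one Gauss-Newton registration of a frame, and nested_parameters is reduced here to a translation-only composition, whereas the real function also composes the rotation angles, see appendix A):

```python
import numpy as np

def nested_parameters(w_a, w_b):
    """Compose two parameter vectors. Translation-only placeholder: for pure
    translations the composition is just the sum; rotations need appendix A."""
    return w_a + w_b

def track(frames, register, w0):
    """Setup 4: warm-start each frame with the aggregated parameters and keep
    transforming the original template s1, so interpolation happens only once."""
    w_nest = np.asarray(w0, dtype=float)
    poses = []
    for g in frames:                    # g: reference projection(s) of frame k
        w_k = register(g, w_nest)       # incremental parameters for this frame
        w_nest = nested_parameters(w_nest, w_k)
        poses.append(w_nest.copy())     # template for frame k: T(w_nest, s1)
    return poses
```

Because each pose is expressed relative to s1, the recursive interpolation blur of setup 1 is avoided; the price is that one wrong increment contaminates all later aggregated poses.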
3. DISCUSSION

In this work we consider 2D3D parametric image registration between fluoroscopic X-ray data and a segmented volume. The registration is performed with spline interpolation and Gauss-Newton optimization as implemented in the FAIR toolbox [1], adjusted for our 2D3D problem. We create an objective function, eq. 2, which we optimize for the rigid transformation parameters that align the template bone with the reference projections; the Gauss-Newton algorithm functions as the optimizer. The main difficulty arises with the initialization, so we test four different setups. The first setup involves initializing the optimization of a consecutive frame with the previous frame's transformed template and optimal parameters. This leads to very blurry results due to what amounts to recursive interpolation (see fig. 4(a)). To avoid this, the second setup initializes the rigid transformation with all parameters w set to 0 to directly compute the transformation from the initial position to the currently processed timestep. However, as the bone moves further away from its initial position, the results become consistently worse. Then, as the bone moves back closer to the initial position, the algorithm is able to recover and realign the bones (see fig. 4(b)). To make up for the second setup's inability to account for large displacements, we repeat its initialization in setup 3, but now the registration is performed with a coarse-to-fine grid approach. As can be seen from fig. 4(c), though, this multiscale approach does not lead to better results. This is likely due to the presence of background elements in the reference image(s). In particular, the presence of the other bone causes the template to be aligned with it, as can be seen most clearly in fig. 4(c) with 2 projections.
Finally, for setup 4, we propose initializing the Gauss-Newton optimization with the previously optimised parameters and updating the template image from the initial position with aggregated parameters. These aggregated parameters come from nesting the previous transformations together into one single transformation (using the nested_parameters function, appendix A). In fig. 4(d), we can see that this approach does not manage to perform correct registration with just one projection: if the estimate is wrong in one frame, the mistake only gets amplified with each following frame and the algorithm can no longer recover. The moving bone can even end up completely outside the visible region of interest in the final few frames. However, two orthogonally directed projections (at 0° and 90°) constrain the problem enough to eliminate the ambiguity between motion and scaling, leading to the promising results depicted in fig. 4(d) on the right, with the transformed template bone tracking the true motion quite accurately.

ACKNOWLEDGEMENTS

The authors would like to thank NVIDIA Corporation for the GeForce Titan Xp GPU used to create the reconstructions presented here.

APPENDIX A. NESTED_PARAMETERS FUNCTION

The nested_parameters function aggregates previous registration parameters to move a rigid object from the initial position at t = 0 to the positions in all following timeframes. The nesting, or aggregation, of parameters is derived as follows. Recall that the rigid transformation ρ applied to x can be expressed as

    ρ(x) = Rx + τext,    (3)

where τext = (I − R)c + τ is the extended translation vector that contains the shift to the centre of the domain c, the rotation, and the shift back, as well as the actual translation vector τ = [τ1, τ2, τ3]ᵀ. This extended translation vector τext = (I − R)c + τ corresponds to the last three elements of the rigid transformation parameter vector w.
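The rigid map ρ(x) = Rx + τext and the nesting rule derived in this appendix can be sketched as follows (hypothetical NumPy code; the axis order of the rotation is an assumption, and FAIR's convention may differ):

```python
import numpy as np

def rotation(alpha, beta, gamma):
    """R = Rz(gamma) @ Ry(beta) @ Rx(alpha); the multiplication order is assumed."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def rigid(w, c):
    """rho(x) = R x + tau_ext with tau_ext = (I - R) c + tau: rotation about c."""
    R = rotation(*w[:3])
    tau_ext = (np.eye(3) - R) @ c + np.asarray(w[3:], dtype=float)
    return R, tau_ext

def nest(R1, t1, R2, t2):
    """rho'(rho(x)) = R2 (R1 x + t1) + t2 = (R2 R1) x + (R2 t1 + t2)."""
    return R2 @ R1, R2 @ t1 + t2
```

Nesting k incremental transforms this way collapses them into a single rotation/translation pair, which is why setup 4 only needs to interpolate s1 once.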
The first three parameters of w are the angles of rotation α, β and γ (with respect to the axes x, y and z), which are used to calculate the corresponding rotation matrix R. We want to be able to perform the transformation in eq. 3 recursively, to compute the effective rotation Rnest and translation τnest of two timeframes together. Let R1 and τext refer to the rotation and translation performed on the first timeframe, and R2 and τ′ext on the second one. Then the effective transformation ρ′ can be computed with the nested rotation matrix Rnest and nested translation vector τnest as follows:

    ρ′(ρ(x)) = R2(R1x + τext) + τ′ext = Rnest x + τnest,
    Rnest = R2R1,    τnest = R2τext + τ′ext.

Computing the translation parameters of τnest is straightforward, while finding the rotation angles a, b and c of Rnest is a little more involved. We know that, as a 3D rotation matrix, Rnest has the form of a product of rotations about the x, y and z axes by the angles a, b and c, and we solve for the rotation parameters (a, b and c) by equating this form with the computed product R2R1. To make the calculation simpler to follow, we replace cos α, cos β and cos γ by c(1), c(2) and c(3), and sin α, sin β, sin γ by s(1), s(2), s(3); analogously, for the parameters α′, β′, γ′ of the second timeframe, their sine and cosine functions become s′(1), s′(2), s′(3) and c′(1), c′(2), c′(3). Equating the entries of the two matrices expressed in these terms then yields the rotation parameters a, b and c from arctangents of ratios of the entries of Rnest. The whole nested transformation vector is then

    wnest = [a, b, c, τnest,1, τnest,2, τnest,3]ᵀ,

and serves as the output of the nested_parameters(w, w′) function, where w is the vector of transformation parameters of the first timeframe and w′ of the second. The process can be repeated to compute the effective rotation and translation for any number of steps, assuming wnest is a good estimate of all the previous timeframes' transformations.

REFERENCES

[1] Modersitzki, J.,
FAIR: Flexible Algorithms for Image Registration, SIAM, Philadelphia (2009). https://doi.org/10.1137/1.9780898718843
[2] Penney, G., Batchelor, P., Hill, D., Hawkes, D., and Weese, J., "Validation of a two- to three-dimensional registration algorithm for aligning preoperative CT images and intraoperative fluoroscopy images," Medical Physics 28, 1024–1032 (2001). https://doi.org/10.1118/1.1373400
[3] Jin, X. and Han, J., "K-Means Clustering," 563–564, Springer US, Boston, MA (2010).