Given the recent interest in the role of deep generative models (DGMs) in medical imaging pipelines, it is imperative to evaluate the capacity of such models to generate medically accurate images. Evaluation methods developed for natural images generated by generative adversarial networks (GANs), a type of DGM, are often applied to medical data. Such methods are insufficient to evaluate anatomical realism, representations of which include high-order spatial information. To our knowledge, no test exists for the faithful replication of spatial statistics beyond second order. In this work, purposefully designed stochastic object models (SOMs) are proposed to encode predetermined rules governing the prevalence of features within single images, thus encoding known high-order spatial information within each realization. These SOMs are independent of the network architecture being tested and can therefore be applied to any newly proposed architecture. Two popular GANs are trained on these SOM datasets, and the generated images are tested for the encoded statistics. It is observed that although ensemble statistics might be well replicated, this is not necessarily true of per-realization (i.e., per-image) statistics. Thus, GAN-generated images might not be ready for clinical use. With the proposed SOMs, the rate of image errors and the rate of feature malformation can be quantified for any architecture, providing one measure of GAN utility in a diagnostic scenario.
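The distinction between ensemble and per-image statistics can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not one of the SOMs of this work: each realization is reduced to the counts of two hypothetical feature types, A and B, and the toy rule is that every image must contain equally many of each. The function names, the rule, and the "unconstrained" sampler standing in for an imperfect generator are all hypothetical.

```python
import numpy as np


def sample_som_counts(n_images, rng, max_count=8):
    """Toy SOM (hypothetical): each image has k features of type A and exactly k of type B."""
    k = rng.integers(1, max_count + 1, size=n_images)
    return np.stack([k, k], axis=1)


def sample_unconstrained_counts(n_images, rng, max_count=8):
    """Stand-in for a generator that matches the marginal counts but ignores the per-image rule."""
    a = rng.integers(1, max_count + 1, size=n_images)
    b = rng.integers(1, max_count + 1, size=n_images)
    return np.stack([a, b], axis=1)


def ensemble_mean_counts(counts):
    """Ensemble statistic: mean number of A and B features over all images."""
    return counts.mean(axis=0)


def image_error_rate(counts):
    """Per-image statistic: fraction of images violating the rule count_A == count_B."""
    return float(np.mean(counts[:, 0] != counts[:, 1]))


rng = np.random.default_rng(0)
reference = sample_som_counts(10_000, rng)
generated = sample_unconstrained_counts(10_000, rng)

print("ensemble means (reference):", ensemble_mean_counts(reference))
print("ensemble means (generated):", ensemble_mean_counts(generated))
print("image error rate (reference):", image_error_rate(reference))
print("image error rate (generated):", image_error_rate(generated))
```

In this toy setting the two ensembles have nearly identical mean feature counts, yet most of the unconstrained samples break the per-image rule. This is the kind of discrepancy that an ensemble-level measure alone would not reveal.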