Action recognition is a popular research area in computer vision because it can be applied to many problems, particularly in security surveillance, behavior analysis, and healthcare. Well-known Convolutional Neural Networks (CNNs) for action classification using 3D convolution include C3D, I3D, and R(2+1)D. These 3D CNNs assume that the spatial and temporal dimensions of motion are uniform, so their 3D filters are uniformly shaped. However, motion can follow any path, and a uniformly shaped filter may fail to capture non-uniform spatial motion, which limits classification performance. To address this problem, we incorporate a 3D deformable filter into a C3D network for action classification. Deformable convolution adds offsets to the regular grid sampling locations of standard convolution, resulting in non-uniform sampling locations. We also investigate the performance of the network when the 3D deformable convolution is applied at different layers, as well as the effect of different dilation sizes of the 3D deformable filter. The UCF101 dataset is used in the experiments. We find that applying the deformable convolution in a lower layer yields better results than in other layers; placing the deformable convolution in Conv1a achieves an accuracy of 48.50%.
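As a rough illustration of the offset mechanism described above, the sketch below uses torchvision's 2D `deform_conv2d`; the paper's filter is the 3D extension of the same idea, and the channel sizes, initialization, and 112x112 input here are our assumptions rather than the authors' configuration.

```python
# Minimal 2D sketch of deformable convolution: a regular conv predicts a
# (dy, dx) offset for every kernel tap, shifting the sampling grid so the
# filter can follow non-uniform motion paths (the paper extends this to 3D).
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, dilation=1):
        super().__init__()
        pad = dilation * (k - 1) // 2
        # Offset predictor: 2 offset values (dy, dx) per kernel location.
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, k,
                                     padding=pad, dilation=dilation)
        # Zero-initialized offsets make the block start as a standard conv.
        nn.init.zeros_(self.offset_conv.weight)
        nn.init.zeros_(self.offset_conv.bias)
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.pad, self.dilation = pad, dilation

    def forward(self, x):
        offsets = self.offset_conv(x)  # non-uniform sampling locations
        return deform_conv2d(x, offsets, self.weight, self.bias,
                             padding=self.pad, dilation=self.dilation)

# Usage: substitute an early layer (the paper found Conv1a most effective).
block = DeformableConvBlock(3, 64)
out = block(torch.randn(1, 3, 112, 112))  # C3D-style 112x112 crops
print(out.shape)  # torch.Size([1, 64, 112, 112])
```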
Fisheries is one of the few disciplines in biology where data collection still relies on the removal and destruction of the objects being studied. This practice has come under increasing scrutiny in recent years: research projects have been terminated due to denied permits, research budgets have had to absorb the cost of purchasing quota for the fish captured, and results have been difficult to publish due to animal welfare concerns. In this paper, we propose a non-extractive sampling system to localize fish in underwater images obtained at aquaculture farms. These images suffer from several issues: 1) low luminance, which significantly hinders fish detection; 2) severe water turbidity due to the fish mass caged in one area; and 3) the protective enclosure designed for the camera, which causes fish to shy away from it. Images acquired in such highly turbid waters are difficult to restore because 1) the fish feeding process adds noise to the already turbid water, and 2) a healthy biodiversity exists at the aquaculture farms. In this work, we investigate the performance of Faster R-CNN in localizing fish in this highly turbid dataset under different base network architectures: MobileNet, MobileNetV2, DenseNet, and ResNet. Experimental results show that MobileNetV2, with a learning rate of 0.01, 500 iterations, and 15 epochs, achieves 87.52% classification accuracy with about 6.7M parameters requiring 27.2 MB of storage, making it the most feasible for deployment in a resource-constrained environment. These findings will be useful when equipment embedded with the Faster R-CNN is placed underwater for monitoring purposes.
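The abstract does not describe the authors' implementation, but as a hedged sketch, a MobileNetV2 backbone can be wired into Faster R-CNN with torchvision roughly as follows; the anchor sizes, momentum, and two-class setup (fish vs. background) are illustrative assumptions, with the 0.01 learning rate taken from the abstract.

```python
# Sketch: Faster R-CNN with a MobileNetV2 base network, following the
# standard torchvision custom-backbone recipe. Not the authors' exact setup.
import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# MobileNetV2 conv trunk as the feature extractor; it emits 1280 channels.
backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
backbone.out_channels = 1280

# Single-feature-map anchors and ROI pooling (sizes are assumptions).
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=["0"],
                                                output_size=7,
                                                sampling_ratio=2)

# num_classes=2: fish + background.
model = FasterRCNN(backbone, num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

# Training step with the abstract's 0.01 learning rate (momentum assumed).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
model.train()
images = [torch.rand(3, 480, 640)]
targets = [{"boxes": torch.tensor([[50., 60., 200., 220.]]),
            "labels": torch.tensor([1])}]
losses = model(images, targets)        # dict of RPN + ROI-head losses
sum(losses.values()).backward()
optimizer.step()
```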