International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 30
Published: August 2025
Authors: Zhipeng Liang, Xinqi Fu, Haijin Fu, Junfeng Zhang, Feng Zhao, Jinyu Hao, Yali Li
Zhipeng Liang, Xinqi Fu, Haijin Fu, Junfeng Zhang, Feng Zhao, Jinyu Hao, Yali Li. Oyster Meat Yield Estimation via Multimodal Fusion of Shape and Appearance Features with ViT and VAE. International Journal of Computer Applications 187(30), August 2025, 47-56. DOI: 10.5120/ijca2025925529
As oysters are an economically important species in aquaculture, their quality classification and meat yield assessment are crucial for industrial efficiency. Traditional manual assessment methods are inefficient and subjective. While computer vision-based approaches have been explored for oyster weight estimation, they primarily rely on manually measured morphological parameters and often overlook valuable visual appearance features inherent in the raw images. Furthermore, weight alone is an insufficient indicator of meat content, as large shells may contain little meat. To address these limitations, this study pioneers a multimodal oyster meat yield prediction model that synergistically combines shape and appearance features for quality grading. Specifically, a segmentation network extracts shape parameters and appearance image data to construct a multimodal dataset. A dual-branch feature extraction architecture is designed: the appearance branch utilizes self-attention mechanisms to capture pixel-level interactions, while the shape branch employs a variational autoencoder (VAE) to map features into robust latent representations. These modality-specific features are concatenated and processed through a Multilayer Perceptron (MLP) to directly predict meat yield. Experimental results demonstrate that the proposed multimodal fusion approach, which comprehensively leverages both morphological and visual characteristics, establishes significantly more robust and accurate mapping relationships than unimodal models relying solely on shape or appearance. The model effectively captures complementary information and adaptively modulates cross-modal influences, thereby enhancing prediction accuracy (R²=0.9567).
The key advantage of the proposed method lies in its ability to overcome the limitations of manual feature measurement and unimodal analysis by automatically extracting and fusing richer information, thereby achieving the superior prediction performance required for practical quality grading in oyster aquaculture.
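To make the dual-branch fusion concrete, the sketch below illustrates the data flow the abstract describes: a self-attention pass over patch embeddings (appearance branch), a VAE-style encoder with the reparameterization trick over measured shape parameters (shape branch), and concatenation followed by an MLP regression head. This is a minimal NumPy forward pass with random, untrained weights; all dimensions (patch grid, latent size, hidden width) and helper names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def appearance_branch(tokens, d_k=16):
    """Single-head self-attention over ViT-style patch tokens, mean-pooled."""
    d = tokens.shape[-1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))          # pairwise token interactions
    return (attn @ V).mean(axis=0)                  # pooled appearance feature (d_k,)

def shape_branch(shape_params, d_z=8):
    """VAE-style encoder: shape parameters -> sampled latent representation."""
    d = shape_params.shape[-1]
    W_mu = rng.standard_normal((d, d_z)) / np.sqrt(d)
    W_lv = rng.standard_normal((d, d_z)) / np.sqrt(d)
    mu, logvar = shape_params @ W_mu, shape_params @ W_lv
    eps = rng.standard_normal(d_z)
    return mu + np.exp(0.5 * logvar) * eps          # reparameterization trick

def mlp_head(x, hidden=32):
    """Two-layer MLP mapping the fused feature to a scalar meat-yield value."""
    W1 = rng.standard_normal((x.shape[-1], hidden)) / np.sqrt(x.shape[-1])
    W2 = rng.standard_normal((hidden, 1)) / np.sqrt(hidden)
    return (np.maximum(x @ W1, 0.0) @ W2).item()

patch_tokens = rng.standard_normal((49, 64))  # e.g. a 7x7 grid of patch embeddings
shape_params = rng.standard_normal(6)         # e.g. segmentation-derived measurements
fused = np.concatenate([appearance_branch(patch_tokens), shape_branch(shape_params)])
yield_pred = mlp_head(fused)                  # scalar meat-yield estimate
```

In a trained model the attention, encoder, and MLP weights would of course be learned jointly against ground-truth meat yield; the concatenation step is where the two modalities are allowed to complement each other before regression.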