Abstract:
This study designed, implemented and field-validated a cloud–edge–end coordinated multimodal data acquisition and management system for honeybee colonies. The system enabled non-intrusive, fully automated and long-term monitoring of colony behavior and environmental conditions with reliable temporal alignment, reducing reliance on manual inspection and visual annotation. The proposed system consisted of an intelligent hive terminal and a cloud-based pre-annotation and management module. The hive terminal integrated a rail-mounted dual-box structure with automatic comb-shifting imaging to sequentially capture dense comb surfaces, a guided hive-entrance video capture channel to constrain flight trajectories, and multi-point internal and external environmental sensing, covering temperature, humidity, carbon dioxide, total volatile organic compounds, particulate matter and internal acoustic signals. A layered cloud–edge–end architecture decoupled high-frequency video streams from low-frequency sensor data, providing unified clock synchronization, local buffering and stable data transmission under field network conditions. On the cloud side, a multimodal data processing pipeline performed timestamp alignment, automated visual pre-annotation and structured storage. The pipeline combined object detection and instance segmentation with time-series databases and relational metadata management, enabling unified spatiotemporal indexing, cross-modal association and efficient retrieval of heterogeneous data types. A continuous 30-day field deployment was conducted at a commercial apiary in Jiangning District, Nanjing, to evaluate system stability, data accuracy and annotation efficiency under unattended operating conditions. Throughout the deployment, the system maintained stable operation without manual intervention. 
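The timestamp alignment step described above can be illustrated with a minimal sketch. It assumes nearest-neighbor matching of each video frame timestamp to the closest low-frequency sensor reading within a tolerance window; the function name, data and tolerance are illustrative, not taken from the paper's implementation.

```python
from bisect import bisect_left

def align_nearest(frame_ts, sensor_ts, tolerance):
    """For each video frame timestamp, return the index of the nearest
    sensor reading within `tolerance` seconds, or None if none qualifies.
    `sensor_ts` must be sorted ascending (illustrative sketch)."""
    matches = []
    for t in frame_ts:
        i = bisect_left(sensor_ts, t)
        # candidates: the reading at/after t and the one before it
        candidates = []
        if i < len(sensor_ts):
            candidates.append(i)
        if i > 0:
            candidates.append(i - 1)
        best = min(candidates, key=lambda j: abs(sensor_ts[j] - t))
        if abs(sensor_ts[best] - t) <= tolerance:
            matches.append(best)
        else:
            matches.append(None)
    return matches

frames = [0.0, 0.5, 1.0, 1.5]   # hypothetical 2 fps video timestamps
sensors = [0.0, 1.0, 2.0]       # hypothetical 1 Hz environmental samples
print(align_nearest(frames, sensors, tolerance=0.3))  # [0, None, 1, None]
```

Frames without a sensor reading inside the tolerance window are left unmatched rather than paired with a stale value, which preserves temporal integrity across modalities sampled at different rates.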
The overall data packet loss rate remained below 6%, indicating reliable long-term transmission of both video streams and environmental sensor data. The web-based management interface supported five concurrent users for data browsing and downloading without observable blocking or performance degradation. Environmental measurements recorded by the system exhibited high consistency with reference instruments. For external hive temperature, the mean absolute error was 0.3 ℃, the root mean square error was 0.4 ℃, and the Pearson correlation coefficient reached 0.995. The largest measurement deviation occurred in internal carbon dioxide concentration, with a mean absolute error of 48 μmol/mol, a root mean square error of 62 μmol/mol, and a Pearson correlation coefficient of 0.937. Other environmental variables, including internal temperature, humidity and gas-related parameters, generally showed correlation coefficients above 0.98, demonstrating that the system met the accuracy requirements for long-term apicultural monitoring and behavioral analysis. The cloud-based automated visual pre-annotation module achieved an average per-frame processing time of approximately 10 ms, covering data loading, preprocessing, model inference and result storage. For hive-entrance videos, automated detection of individual bees achieved a detection accuracy of 98%, enabling efficient extraction of foraging activity information. For dense comb images, instance segmentation under frequent occlusion and adhesion between individuals achieved an average accuracy of 76%, reflecting the increased difficulty of delineating overlapping individuals on crowded comb surfaces. Compared with manual labeling, the proposed automated pipeline substantially improved annotation efficiency while maintaining accuracy sufficient for population-level behavioral studies. 
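The agreement figures reported above (mean absolute error, root mean square error and Pearson correlation) are standard metrics for paired system-versus-reference readings. A minimal sketch of their computation follows; the sample data are illustrative, not measurements from the deployment.

```python
import math

def agreement_metrics(system, reference):
    """MAE, RMSE and Pearson correlation between paired system and
    reference measurements (lists of equal length, illustrative sketch)."""
    n = len(system)
    errors = [s - r for s, r in zip(system, reference)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_s = sum(system) / n
    mean_r = sum(reference) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(system, reference))
    spread = math.sqrt(
        sum((s - mean_s) ** 2 for s in system)
        * sum((r - mean_r) ** 2 for r in reference)
    )
    return mae, rmse, cov / spread

# Hypothetical paired temperature readings (system vs. reference instrument)
mae, rmse, r = agreement_metrics([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
```

A high correlation alongside a small MAE/RMSE, as reported for most variables above, indicates both that the sensors track the reference dynamics and that absolute offsets remain within tolerance.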
Edge buffering and video compression further ensured compatibility with typical field network bandwidth while preserving the analytical integrity of selected frames and key feature segments. The results demonstrated that the proposed system enabled continuous, non-intrusive acquisition, automated pre-annotation and structured management of multimodal honeybee colony data within a unified cloud–edge–end framework. By integrating mechanical design, environmental sensing and data-driven annotation, the system formed a standardized data infrastructure to support quantitative behavioral modeling, pollination efficiency assessment and intelligent apiculture applications. Future work will focus on improving instance segmentation performance in densely populated comb scenes and extending long-term validation across multiple apiaries and ecological conditions to further enhance robustness and generalizability.
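The edge-buffering behavior mentioned above can be sketched as a bounded local queue that absorbs records during uplink outages and drains them on recovery. The class, its eviction policy (drop oldest first) and the interface are illustrative assumptions, not the paper's implementation.

```python
from collections import deque

class EdgeBuffer:
    """Bounded local buffer for field deployments: while the uplink is
    down, records accumulate up to `capacity`; when full, the oldest
    records are evicted first (illustrative policy)."""

    def __init__(self, capacity):
        self.queue = deque(maxlen=capacity)

    def enqueue(self, record):
        # deque with maxlen silently evicts the oldest item when full
        self.queue.append(record)

    def flush(self, send):
        """Drain buffered records through `send` once the link recovers;
        returns the number of records transmitted."""
        sent = 0
        while self.queue:
            send(self.queue.popleft())
            sent += 1
        return sent

buf = EdgeBuffer(capacity=2)
for record in (1, 2, 3):   # third enqueue evicts record 1
    buf.enqueue(record)
uplink = []
buf.flush(uplink.append)   # uplink now holds [2, 3]
```

Bounding the buffer trades completeness for predictable memory use on the hive terminal, which is consistent with the sub-6% packet loss tolerated at the system level.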