Abstract:
A multimodal big-data acquisition and management system is often required to reduce reliance on manual inspection and visual annotation. This study aimed to design, implement, and field-validate a cloud–edge–end coordinated system for honeybee colonies, realizing non-intrusive, fully automatic, long-term monitoring of colony behavior and environmental conditions with reliable temporal alignment. The system consisted of an intelligent hive terminal and a cloud-based pre-annotation module. The hive terminal integrated a rail-mounted dual-box structure with automatic comb-shifting imaging to sequentially capture dense comb surfaces. Hive-entrance video was captured in a channel that constrained flight trajectories, complemented by multi-point internal and external environmental sensing, covering temperature, humidity, carbon dioxide, total volatile organic compounds, particulate matter, and internal acoustic signals. A layered cloud–edge–end architecture decoupled the high-frequency video streams from the low-frequency sensor data, providing unified clock synchronization, local buffering, and stable data transmission under field network conditions. On the cloud side, a multimodal data processing pipeline performed timestamp alignment, automatic visual pre-annotation, and structured storage. Object detection and instance segmentation were combined with time-series databases and relational metadata management, enabling unified spatiotemporal indexing, cross-modal association, and efficient retrieval of heterogeneous data types. A continuous 30-day field deployment was conducted at a commercial apiary in Jiangning District, Nanjing City, China, and system stability, data accuracy, and annotation efficiency were systematically evaluated under unattended operating conditions. Stable operation was maintained throughout the deployment without manual intervention.
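The timestamp alignment between high-frequency video frames and low-frequency sensor readings described above can be sketched as a nearest-neighbor match on a unified clock. This is a minimal illustration, not the system's actual implementation; the function name and the tolerance value are assumptions.

```python
from bisect import bisect_left

def align_sensor_to_frames(frame_ts, sensor_records, tolerance_s=2.0):
    """Attach to each video frame the nearest-in-time sensor record.

    frame_ts       -- sorted frame timestamps (seconds, unified clock)
    sensor_records -- sorted list of (timestamp, reading) tuples
    tolerance_s    -- reject matches farther apart than this (assumed value)
    """
    sensor_ts = [t for t, _ in sensor_records]
    aligned = []
    for ft in frame_ts:
        i = bisect_left(sensor_ts, ft)
        # Candidate neighbors: the record just before and just after ft.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_ts)]
        best = min(candidates, key=lambda j: abs(sensor_ts[j] - ft), default=None)
        if best is not None and abs(sensor_ts[best] - ft) <= tolerance_s:
            aligned.append((ft, sensor_records[best][1]))
        else:
            aligned.append((ft, None))  # no sensor reading close enough
    return aligned
```

Keeping both streams in their native rates and joining them only at query time, as sketched here, is what allows the architecture to decouple video from sensor data while still supporting cross-modal association.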
The overall data packet loss rate remained below 6%, indicating reliable long-term transmission of both video streams and environmental sensor data. The web interface supported five concurrent users browsing and downloading data without observable blocking or performance degradation. Environmental measurements recorded by the system showed high consistency with reference instruments. For external hive temperature, the mean absolute error was 0.3 °C, the root mean square error was 0.4 °C, and the Pearson correlation coefficient reached 0.995. The largest deviation occurred in internal carbon dioxide concentration, with a mean absolute error of 48 μmol/mol, a root mean square error of 62 μmol/mol, and a Pearson correlation coefficient of 0.937. Other environmental variables, including internal temperature, humidity, and gas-related parameters, generally showed correlation coefficients above 0.98, fully meeting the accuracy requirements for long-term apicultural monitoring and behavioral analysis. The cloud-based visual pre-annotation module achieved an average per-frame processing time of approximately 10 ms, covering data loading, preprocessing, model inference, and storage. In hive-entrance videos, individual bees were detected with an accuracy of 98%, enabling efficient extraction of foraging activity information. In dense comb images, instance segmentation achieved an average accuracy of 76% under frequent occlusion and adhesion, reflecting the difficulty of delineating overlapping individuals on crowded comb surfaces. Compared with manual labeling, the pipeline substantially improved annotation efficiency while providing sufficient accuracy for population-level behavioral studies.
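The accuracy metrics reported above (mean absolute error, root mean square error, Pearson correlation coefficient) can be computed from paired system/reference readings as follows; the sample values in the usage note are illustrative, not measured data from the deployment.

```python
import math

def accuracy_metrics(system, reference):
    """Return (MAE, RMSE, Pearson r) between paired sensor readings."""
    n = len(system)
    diffs = [s - r for s, r in zip(system, reference)]
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    ms, mr = sum(system) / n, sum(reference) / n
    cov = sum((s - ms) * (r - mr) for s, r in zip(system, reference))
    var_s = sum((s - ms) ** 2 for s in system)
    var_r = sum((r - mr) ** 2 for r in reference)
    pearson = cov / math.sqrt(var_s * var_r)
    return mae, rmse, pearson
```

For example, `accuracy_metrics([20.0, 21.0, 22.0], [20.2, 20.9, 22.1])` yields an MAE near 0.13 °C and a Pearson r close to 0.99 for these made-up temperature pairs.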
Edge buffering and video compression were further compatible with typical field network bandwidth while preserving the analytical integrity of frames and key feature segments. The system thus realized continuous, non-intrusive acquisition, automatic pre-annotation, and structured management of multimodal honeybee colony data within a unified cloud–edge–end framework. Mechanical design, environmental sensing, and data-driven annotation were integrated into a standardized data infrastructure supporting quantitative behavioral modeling and pollination efficiency assessment in intelligent apiculture. Future work will improve instance segmentation in densely populated comb scenes, enhance robustness and generalization, and extend long-term validation to multiple apiaries and ecological conditions.