Abstract:
To address the problems of high energy consumption, frequent path turning, and insufficient convergence stability in multi-waypoint path planning for unmanned aerial vehicles (UAVs) operating in complex agricultural environments, this study proposes a low-energy path planning method that integrates improved Q-learning with a genetic algorithm, termed Improved Q-learning with Genetic Algorithm (IQL-GA). Unlike conventional path planning approaches that mainly focus on minimizing path length, the proposed method establishes an energy-sensitive path planning model by jointly considering flight distance, turning maneuvers, and obstacle avoidance risks. The UAV task is formulated as a constrained combinatorial optimization problem under multiple operational constraints, including obstacle safety distance, heading angle limitation, route connectivity, and return-to-start requirements. Geographic coordinates are transformed into planar Cartesian coordinates through the Mercator projection to construct a two-dimensional UAV operation environment map that supports route planning and simulation analysis. The proposed IQL-GA framework integrates evolutionary optimization and reinforcement learning through a three-stage cooperative optimization mechanism. First, an improved multi-strategy Genetic Algorithm (GA) is employed for global exploration and population initialization. Through selection, crossover, and mutation operations, a multimodal candidate path set is generated, which maintains diversity in the solution space and reduces the risk of becoming trapped in local optima. Logistic chaotic mapping enhances population diversity, while an energy-aware fitness function combining path length and turning smoothness guides the evolutionary process toward low-energy solutions. A Sigmoid-based adaptive crossover and mutation mechanism dynamically adjusts genetic parameters to balance exploration capability and convergence stability.
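As a rough illustration of two GA ingredients named above, the following Python sketch shows a logistic chaotic map used to seed waypoint visiting orders and a sigmoid-shaped schedule for crossover or mutation rates. The function names, the map parameter mu = 4.0, the rate bounds, the steepness k, and the convention that higher fitness is better are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def logistic_sequence(n, x0=0.37, mu=4.0):
    """Generate n values from the logistic map x <- mu * x * (1 - x)."""
    xs, x = [], x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        xs.append(x)
    return xs

def chaotic_permutation(n_waypoints, x0):
    """Turn a chaotic sequence into a visiting order by ranking its values."""
    keys = logistic_sequence(n_waypoints, x0=x0)
    return sorted(range(n_waypoints), key=lambda i: keys[i])

def chaotic_population(pop_size, n_waypoints, seed_x0=0.37):
    """Build pop_size permutations, each seeded from a different chaotic state."""
    seeds = logistic_sequence(pop_size, x0=seed_x0)
    # Hypothetical safeguard: keep seeds away from the degenerate values 0 and 1.
    return [chaotic_permutation(n_waypoints, min(max(s, 1e-6), 1 - 1e-6))
            for s in seeds]

def sigmoid_rate(fitness, avg_fitness, best_fitness, r_min, r_max, k=6.0):
    """Sigmoid-shaped adaptive rate (assumed form, higher fitness = better).

    Individuals near the best fitness get a rate close to r_min so good
    solutions are preserved; below-average individuals get a rate close
    to r_max so they are perturbed more aggressively.
    """
    span = best_fitness - avg_fitness
    if span <= 0:  # population has collapsed to one fitness value
        return r_max
    z = (fitness - avg_fitness) / span  # 0 at the average, 1 at the best
    return r_min + (r_max - r_min) / (1.0 + math.exp(k * z))
```

For example, `chaotic_population(5, 8)` yields five orderings of eight waypoints, and `sigmoid_rate` could be called once per individual with separate bounds for the crossover and mutation rates.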
Second, high-quality non-dominated solutions generated by the GA are mapped into the initial Q-table of the Q-learning module through a normalized energy-value transformation, which alleviates the cold-start problem and accelerates convergence. A Boltzmann exploration strategy with an exponentially decaying temperature coefficient replaces the conventional epsilon-greedy strategy, enabling an adaptive transition from global exploration to local exploitation. In addition, an energy penalty term derived from the proposed energy model is incorporated into the Bellman update equation, allowing the agent to optimize cumulative reward and energy efficiency simultaneously. The reward function further integrates path efficiency, turning smoothness, and obstacle avoidance safety to improve environmental adaptability. Third, a 2-opt local search strategy is introduced to eliminate path crossings and redundant flight segments generated during global optimization. By iteratively exchanging nonadjacent path edges, the local search module improves route smoothness and further reduces turning-related energy consumption. The combination of GA-based global exploration, Q-learning-based adaptive optimization, and 2-opt local refinement forms a cooperative global-local optimization architecture capable of balancing search breadth and solution precision. To validate the effectiveness of the proposed method, simulation experiments, ablation studies, comparative evaluations, and real-world flight tests were conducted. Ablation experiments demonstrate that IQL-GA outperforms the conventional GA and standalone Q-learning in energy utilization efficiency, reducing total path energy consumption by 5.8% and 8.4%, respectively, while maintaining a low collision rate and near-optimal path cost.
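The Boltzmann exploration and energy-penalized update described above can be sketched in a few lines of Python. This is a minimal tabular illustration under stated assumptions, not the paper's formulation: the initial temperature, decay rate, temperature floor, penalty weight `lam`, and the choice to subtract the energy term directly from the reward are all hypothetical.

```python
import math
import random

def temperature(episode, t0=1.0, decay=0.01, t_min=0.05):
    """Exponentially decaying temperature with a floor (assumed schedule)."""
    return max(t_min, t0 * math.exp(-decay * episode))

def boltzmann_probs(q_values, temp):
    """Softmax over Q-values; shifting by the max avoids overflow."""
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temp) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def choose_action(q_values, temp, rng=random):
    """Sample an action index with Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temp)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]

def q_update(q, s, a, reward, energy, s_next, alpha=0.1, gamma=0.9, lam=0.5):
    """One Q-learning step with an energy penalty folded into the target.

    target = reward - lam * energy + gamma * max_a' Q(s_next, a')
    The penalty weight lam and this additive form are assumptions.
    """
    target = reward - lam * energy + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]
```

At a high temperature the action probabilities are close to uniform, which favors exploration; as the temperature decays, the probability mass concentrates on the highest-valued action, which is the exploration-to-exploitation transition the abstract refers to.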
Comparative experiments against Simulated Annealing (SA), Particle Swarm Optimization (PSO), KMCMSS, and A-Ptr-Net further show that IQL-GA achieves lower energy consumption, shorter path length, and better optimization stability. Relative to these four baselines (SA, PSO, and the two hybrid algorithms), IQL-GA reduces average energy consumption by 11.8%. To further evaluate engineering applicability, field experiments were conducted using a DJI Mavic 3 UAV equipped with an RTK positioning module. Thirty-three real waypoints were selected for practical flight validation. Experimental results indicate that the proposed IQL-GA algorithm generates smoother and more compact flight trajectories with fewer redundant turns and shorter hovering durations. Compared with the other planning methods, the proposed algorithm achieves the lowest total energy consumption of 15,988.4 J and the shortest hovering time of 89 s in practical operations. These field test results further verify the engineering practicability of the proposed method. This study provides a reliable solution for UAV autonomous navigation under resource-constrained conditions and shows promising application prospects.