Low-energy cooperative path planning for UAVs based on a genetic algorithm and Q-learning

曾文岐; 陆健强; 兰玉彬; 王婷; 苏柳; 史凯涛

doi:10.11975/j.issn.1002-6819.202603256


A Hybrid Evolutionary-Reinforcement Learning Architecture for Global-Local Cooperative Energy-Efficient UAV Path Planning

Abstract

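Summary: To address the high energy consumption, frequent turning, and insufficient convergence stability of multi-waypoint path planning for unmanned aerial vehicles (UAVs) in complex agricultural environments, a low-energy path planning method that fuses improved Q-learning with a genetic algorithm (improved Q-learning genetic algorithm, IQL-GA) is proposed. First, a two-dimensional map of the UAV operating environment is constructed using the Mercator projection, and a comprehensive UAV energy consumption model is established that jointly accounts for path length, turning cost, and obstacle avoidance cost. Second, an improved genetic algorithm performs the global search, generating a multimodal set of candidate paths through selection, crossover, and mutation, which maintains diversity in the solution space and avoids local optima; the high-quality non-dominated solutions obtained by the genetic algorithm are mapped to initial Q-table values, and Q-learning combined with a Boltzmann exploration strategy and an energy penalty term performs local fine-tuning; a 2-opt local search then removes path crossings to further reduce flight energy consumption. Finally, the performance of the algorithm is verified through simulation experiments and field flight tests. Ablation experiments show that IQL-GA is significantly more energy-efficient than the standard GA and QL algorithms, reducing total path energy consumption by 5.8% and 8.4%, respectively, while maintaining a low collision rate and near-optimal path cost; comparative experiments show that, relative to simulated annealing (SA), particle swarm optimization (PSO), and two other hybrid algorithms, IQL-GA reduces average energy consumption by 11.8%. The field test results jointly verify the engineering practicality of the algorithm. This study provides an innovative solution for autonomous UAV navigation under resource-constrained conditions and has good application prospects.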
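The map construction and the combined energy model mentioned above can be illustrated with a minimal sketch. It assumes the spherical form of the Mercator projection and a simple weighted sum of path length and total turning angle; the weights `w_len` and `w_turn`, the function names, and the omission of the obstacle avoidance term are illustrative assumptions, not the model defined in the paper.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used here as a sphere radius


def mercator_xy(lat_deg, lon_deg):
    """Spherical Mercator projection of latitude/longitude (degrees) to metres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


def turn_angle(a, b, c):
    """Absolute heading change in radians at b when flying a -> b -> c."""
    h1 = math.atan2(b[1] - a[1], b[0] - a[0])
    h2 = math.atan2(c[1] - b[1], c[0] - b[0])
    d = abs(h2 - h1)
    return min(d, 2 * math.pi - d)


def path_energy(points, w_len=1.0, w_turn=10.0):
    """Hypothetical cost: weighted path length plus weighted total turning.

    The obstacle avoidance cost named in the abstract is left out of this sketch.
    """
    length = sum(math.dist(points[i], points[i + 1])
                 for i in range(len(points) - 1))
    turning = sum(turn_angle(points[i - 1], points[i], points[i + 1])
                  for i in range(1, len(points) - 1))
    return w_len * length + w_turn * turning
```

Mercator coordinates stretch distances by roughly 1/cos(latitude), so a field-scale planner would normally rescale them or use a local projection before treating them as metres.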

Abstract: To address the problems of high energy consumption, frequent path turning, and insufficient convergence stability in multi-waypoint path planning for unmanned aerial vehicles (UAVs) operating in complex agricultural environments, this study proposes a low-energy path planning method integrating improved Q-learning and a genetic algorithm, termed the Improved Q-learning with Genetic Algorithm (IQL-GA). Unlike conventional path planning approaches that mainly focus on minimizing path length, the proposed method establishes an energy-sensitive path planning model by jointly considering flight distance, turning maneuvers, and obstacle avoidance risks. The UAV task is formulated as a constrained combinatorial optimization problem under multiple operational constraints, including obstacle safety distance, heading angle limitation, route connectivity, and return-to-start requirements. Geographic coordinates are transformed into planar Cartesian coordinates through Mercator projection to construct a two-dimensional UAV operation environment map and to support accurate route planning and simulation analysis. The proposed IQL-GA framework integrates evolutionary optimization and reinforcement learning through a three-stage cooperative optimization mechanism. First, an improved multi-strategy Genetic Algorithm (GA) is employed for global exploration and population initialization. Through selection, crossover, and mutation operations, a multimodal candidate path set is generated, which maintains the diversity of the solution space and avoids falling into local optima. Logistic chaotic mapping enhances population diversity, while an energy-aware fitness function combining path length and turning smoothness guides the evolutionary process toward low-energy solutions. A Sigmoid-based adaptive crossover and mutation mechanism dynamically adjusts genetic parameters to balance exploration capability and convergence stability.
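As a rough illustration of the first stage, the sketch below seeds a population from a logistic chaotic sequence and computes a sigmoid-shaped adaptive operator rate. The abstract does not give the formulas, so the random-key decoding (sorting chaotic values to obtain a visiting order), the parameter `mu = 4`, the seed `x0`, and the exact sigmoid form with its `steepness` are all assumptions made for illustration.

```python
import math


def logistic_sequence(x0, n, mu=4.0):
    """Return n iterates of the logistic map x <- mu * x * (1 - x)."""
    xs = []
    x = x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        xs.append(x)
    return xs


def chaotic_population(n_waypoints, pop_size, x0=0.37):
    """Build visiting orders by sorting chaotic keys (random-key encoding)."""
    population = []
    x = x0
    for _ in range(pop_size):
        keys = logistic_sequence(x, n_waypoints)
        x = keys[-1]  # continue the same chaotic orbit for the next individual
        order = sorted(range(n_waypoints), key=lambda i: keys[i])
        population.append(order)
    return population


def adaptive_rate(fitness, f_avg, f_max, r_min, r_max, steepness=5.0):
    """Hypothetical sigmoid schedule for a crossover or mutation rate.

    Individuals well above the average fitness get a rate near r_min
    (they are preserved); below-average individuals get a rate near r_max.
    """
    if f_max <= f_avg:
        return r_max  # degenerate population: keep exploring
    z = (fitness - f_avg) / (f_max - f_avg)
    return r_min + (r_max - r_min) / (1.0 + math.exp(steepness * z))
```

For example, `chaotic_population(33, 50)` would give 50 candidate orders over 33 waypoints, and `adaptive_rate(f, f_avg, f_max, 0.6, 0.9)` could serve as a per-individual crossover probability when higher fitness means a better path.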
Second, high-quality non-dominated solutions generated by GA are mapped into the initial Q-table of the Q-learning module through normalized energy-value transformation, effectively alleviating the cold-start problem and accelerating convergence. A Boltzmann exploration strategy with exponentially decaying temperature coefficients replaces the conventional epsilon-greedy strategy, enabling adaptive transition from global exploration to local exploitation. In addition, an energy penalty term derived from the proposed energy model is incorporated into the Bellman update equation, allowing the agent to optimize cumulative reward and energy efficiency simultaneously. The reward function further integrates path efficiency, turning smoothness, and obstacle avoidance safety to improve environmental adaptability. Third, a 2-opt local search strategy is introduced to eliminate path crossings and redundant flight segments generated during global optimization. By iteratively exchanging nonadjacent path edges, the local search module improves route smoothness and further reduces turning-related energy consumption. The combination of GA-based global exploration, Q-learning-based adaptive optimization, and 2-opt local refinement forms a cooperative global-local optimization architecture capable of balancing search breadth and solution precision. To validate the effectiveness of the proposed method, simulation experiments, ablation studies, comparative evaluations, and real-world flight tests were conducted. Ablation experiments demonstrate that IQL-GA significantly outperforms conventional GA and standalone Q-learning methods in terms of energy utilization efficiency, reducing total path energy consumption by 5.8% and 8.4%, respectively, while maintaining a low collision rate and near-optimal path cost. 
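The second stage can be sketched as follows, under stated assumptions: the state is the current waypoint and the action is the next waypoint; GA paths seed the Q-table with a min-max normalized energy score; the temperature decays exponentially down to a floor; and the energy penalty is subtracted from the reward inside the temporal-difference target with a hypothetical weight `lam`. None of these specifics are given in the abstract, and the paper's actual update may differ.

```python
import math
import random


def seed_q_table(n, paths_with_energy):
    """Initialise an n x n Q-table from (path, energy) pairs produced by a GA."""
    energies = [e for _, e in paths_with_energy]
    e_min, e_max = min(energies), max(energies)
    q = [[0.0] * n for _ in range(n)]
    for path, e in paths_with_energy:
        # Lowest-energy path scores 1.0, highest-energy path scores 0.0.
        score = 1.0 if e_max == e_min else 1.0 - (e - e_min) / (e_max - e_min)
        for s, a in zip(path, path[1:]):
            q[s][a] = max(q[s][a], score)
    return q


def temperature(episode, t0=1.0, decay=0.01, t_min=0.05):
    """Exponentially decaying Boltzmann temperature with a lower bound."""
    return max(t_min, t0 * math.exp(-decay * episode))


def boltzmann_probs(q_values, temp):
    """Softmax over Q-values; the max is subtracted for numerical stability."""
    m = max(q_values)
    weights = [math.exp((v - m) / temp) for v in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(q_values, temp, rng=random):
    """Sample an action index according to the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, temp)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def q_update(q, s, a, reward, energy, s_next, alpha=0.1, gamma=0.9, lam=0.5):
    """One Q-learning step with an assumed energy penalty in the TD target."""
    target = reward - lam * energy + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]
```

A high temperature makes the action probabilities nearly uniform, while a low temperature concentrates them on the highest Q-value, which is the exploration-to-exploitation transition described above.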
Comparative experiments against Simulated Annealing (SA), Particle Swarm Optimization (PSO), KMCMSS, and A-Ptr-Net further show that IQL-GA achieves lower energy consumption, shorter path length, and better optimization stability. Compared with SA, PSO, and two other hybrid algorithms, the proposed IQL-GA algorithm reduces the average energy consumption by 11.8%. To further evaluate engineering applicability, field experiments were conducted using a DJI Mavic 3 UAV equipped with an RTK positioning module. Thirty-three real waypoints were selected for practical flight validation. Experimental results indicate that the proposed IQL-GA algorithm generates smoother and more compact flight trajectories with fewer redundant turns and shorter hovering durations. Compared with other planning methods, the proposed algorithm achieves the lowest total energy consumption of 15,988.4 J and the shortest hovering time of 89 s in practical operations. These field test results further verify the engineering practicability of the proposed method. This study provides an innovative and reliable solution for UAV autonomous navigation under resource-constrained conditions and shows promising application prospects.
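The 2-opt refinement used in the third stage can be sketched for a closed tour that returns to its start. This minimal version accepts a move whenever exchanging two nonadjacent edges shortens the Euclidean tour length; using length rather than the paper's energy model as the acceptance criterion is an assumption of this sketch, as are the function names.

```python
import math


def tour_length(tour, pts):
    """Length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]])
               for i in range(n))


def two_opt(tour, pts):
    """Repeatedly reverse segments while doing so shortens the closed tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share the start node
                a, b = pts[tour[i]], pts[tour[i + 1]]
                c, d = pts[tour[j]], pts[tour[(j + 1) % n]]
                # Replace edges (a, b) and (c, d) with (a, c) and (b, d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour
```

Because only positions after index 0 are ever reversed, the start waypoint stays first in the tour, which matches the return-to-start requirement. In the plane, two crossing edges can always be replaced by a shorter uncrossed pair, so a tour on which this procedure stops has no crossings.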
