Prediction-guided multi-objective reinforcement learning with corner solution search

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Many Reinforcement Learning (RL) tasks that feature conflicting objectives are now posed as multi-objective problems and solved using dedicated Multi-Objective RL (MORL) algorithms. In MORL, the aim is to find a set of trade-off policies (the Pareto optimal set) that optimize the featured objectives. To achieve this, several Evolutionary Multi-Objective Optimization (EMO) schemes have been employed in the literature. Although it is well established in the EMO community that the most important sub-tasks for efficiently approximating the Pareto front (the Pareto set in objective space) are those associated with the corner direction vectors, most MORL schemes do not prioritize these sub-tasks. In this paper, we therefore propose a mechanism that prioritizes the sub-tasks resulting from the corner direction weight vectors. Specifically, the sub-tasks are prioritized through a dynamic budget allocation scheme in which larger budgets are assigned to the important sub-tasks in the initial stage of the evolution process. In doing so, the Pareto corner solutions can be approximated early and contribute to an effective approximation of the Pareto front. The proposed scheme is incorporated into the Prediction-Guided MORL algorithm (PGMORL), a high-performing evolutionary MORL framework. The resulting algorithm, termed PGMORL with Corner Solution Search (csPGMORL), compares favorably with the baseline PGMORL algorithm on five continuous robot locomotion control problems.
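
The budget allocation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the allocation rule, so the function names (corner_mask, allocate_budget), the parameters boost and warmup, and the linear decay schedule are all assumptions made for illustration. The sketch treats a weight vector as a corner direction when all of its weight sits on a single objective, and gives such sub-tasks a larger share of a fixed budget during the early generations.

```python
import numpy as np

def corner_mask(weights, tol=1e-9):
    """Flag weight vectors that place all weight on one objective.

    Assumes non-negative weights, so max == sum means a single nonzero entry.
    """
    weights = np.asarray(weights, dtype=float)
    return np.isclose(weights.max(axis=1), weights.sum(axis=1), atol=tol)

def allocate_budget(weights, generation, total_budget, boost=3.0, warmup=10):
    """Split total_budget across sub-tasks, favoring corners early on.

    Hypothetical rule: corner sub-tasks start with priority `boost`, which
    decays linearly to 1 (uniform allocation) after `warmup` generations.
    """
    is_corner = corner_mask(weights)
    frac = max(0.0, 1.0 - generation / warmup)  # 1 at start, 0 after warmup
    priority = np.where(is_corner, 1.0 + (boost - 1.0) * frac, 1.0)
    shares = priority / priority.sum()          # normalize to sum to 1
    return shares * total_budget

# Two objectives: two corner directions and one balanced direction.
w = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
early = allocate_budget(w, generation=0, total_budget=100.0)
late = allocate_budget(w, generation=10, total_budget=100.0)
```

Under these assumed settings, the corner sub-tasks receive three times the budget of the balanced sub-task at generation 0, and all three sub-tasks receive equal budgets once the warm-up period has passed.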

Original language: English
Article number: 109964
Journal: Computers and Electrical Engineering
Volume: 122
State: Published - Mar 2025

Keywords

  • Indicator-based evolutionary algorithm
  • Multi-objective reinforcement learning
