Offline Reinforcement Learning for Railway Optimal Maintenance and Comparison with Online Reinforcement Learning Solutions

Code for the research project "Offline Reinforcement Learning for Railway Optimal Maintenance and Comparison with Online Reinforcement Learning Solutions" by David Streuli.

Abstract

This paper explores the application of offline reinforcement learning (RL) to optimising railway maintenance planning. Using historical data, we learn decision-making policies without further interaction with the environment, which addresses the impracticality of experimenting directly on the railway. We compare the performance of offline RL algorithms, such as Deep Q-Networks (DQN), Batch-Constrained Q-learning (BCQ) and Conservative Q-Learning (CQL), against traditional online RL methods. Our results demonstrate the potential of offline RL to improve maintenance decisions and highlight the importance of a balanced training dataset.

Setup

To set up the environment, run the following commands:

conda create -n rllib python=3.8.13
conda activate rllib
pip install -r requirements.txt

Environment

The environment is implemented in env.py.

Data Sampling

Running data_sampling.py generates a dataset of trajectories from a specified environment and reward matrix. Its configurable parameters include the degree of randomness in action selection, the number of trajectories and a random seed for reproducibility. The dataset is saved as a compressed pickle file.

Usage

python data_sampling.py --seed <seed> --randomness <randomness> --num_trajectories <num_trajectories>
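The sampling procedure described above can be sketched as follows. This is a minimal, self-contained illustration, not the code in data_sampling.py: the reward values, the toy transition function `toy_step`, the reference policy `greedy_policy` and the helpers `sample_dataset` and `save_dataset` are all hypothetical. It shows how a `randomness` parameter can mix a reference policy with uniformly random actions, how a single seed makes sampling reproducible, and how the result can be stored as a gzip-compressed pickle.

```python
import gzip
import pickle
import random

# Hypothetical 4-state x 3-action reward matrix REWARD[state][action]
# (actions: 0 = do nothing, 1 = maintain, 2 = replace). Illustrative values only.
REWARD = [
    [0.0, -1.0, -5.0],
    [-1.0, -1.5, -5.0],
    [-3.0, -2.0, -5.0],
    [-10.0, -4.0, -5.0],
]
N_STATES, N_ACTIONS = len(REWARD), len(REWARD[0])


def toy_step(state, action, rng):
    """Hypothetical dynamics (not env.py): replace resets, maintain improves, otherwise may degrade."""
    if action == 2:
        return 0
    if action == 1:
        return max(state - 1, 0)
    return min(state + 1, N_STATES - 1) if rng.random() < 0.5 else state


def greedy_policy(state):
    """Hypothetical reference policy: pick the action with the highest immediate reward."""
    row = REWARD[state]
    return max(range(N_ACTIONS), key=lambda a: row[a])


def sample_trajectory(length, randomness, rng):
    """Roll out one trajectory, taking a uniformly random action with probability `randomness`."""
    state = rng.randrange(N_STATES)
    transitions = []
    for t in range(length):
        if rng.random() < randomness:
            action = rng.randrange(N_ACTIONS)
        else:
            action = greedy_policy(state)
        next_state = toy_step(state, action, rng)
        # Each transition is stored as (state, action, reward, next_state, done).
        transitions.append((state, action, REWARD[state][action], next_state, t == length - 1))
        state = next_state
    return transitions


def sample_dataset(num_trajectories, randomness, seed, length=20):
    rng = random.Random(seed)  # one seeded RNG makes the whole dataset reproducible
    return [sample_trajectory(length, randomness, rng) for _ in range(num_trajectories)]


def save_dataset(dataset, path):
    with gzip.open(path, "wb") as f:  # gzip-compressed pickle
        pickle.dump(dataset, f)
```

Setting `randomness` to 0 reproduces the reference policy exactly, while values closer to 1 produce more exploratory data with broader state-action coverage.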

Training of Offline RL Algorithms

Train the offline RL algorithms with training.py. The script expects the sampled datasets to be available in datasets/ and saves the trained models in d3rlpy_logs/.

Usage

python training.py --optimalities <optimalities> --seed <seed> --num_trajectories_list <num_trajectories> --algos <algos>
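To make concrete what distinguishes CQL from plain Q-learning when training only on a fixed dataset, here is a minimal tabular sketch of discrete Conservative Q-Learning. It is a hypothetical illustration, not the code in training.py or any library's API; the function `tabular_cql`, its hyperparameters and the transition layout (state, action, reward, next_state, done) are assumptions. The per-sample loss is alpha * (logsumexp_b Q(s, b) - Q(s, a)) + 0.5 * (Q(s, a) - y)^2 with a fixed target y.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def tabular_cql(transitions, n_states, n_actions, alpha=1.0, gamma=0.9, lr=0.1, epochs=200, seed=0):
    """Tabular discrete CQL on a fixed batch of (s, a, r, s_next, done) tuples.

    Gradient of the loss w.r.t. Q(s, b):
      alpha * (softmax(Q(s, .))[b] - 1[b == a]) + 1[b == a] * (Q(s, a) - y),
    where y = r + gamma * max_b Q(s_next, b) is held fixed (semi-gradient).
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    data = list(transitions)
    for _ in range(epochs):
        rng.shuffle(data)
        for s, a, r, s_next, done in data:
            y = r if done else r + gamma * max(q[s_next])
            td_error = q[s][a] - y
            probs = softmax(q[s])
            for b in range(n_actions):
                # Conservative term: push all actions down in proportion to softmax,
                # push the dataset action up.
                grad = alpha * (probs[b] - (1.0 if b == a else 0.0))
                if b == a:
                    grad += td_error  # standard TD regression term
                q[s][b] -= lr * grad
    return q
```

With `alpha=0.0` the update reduces to ordinary semi-gradient Q-learning on the batch, and an action that never appears in the data keeps its initial value; with `alpha > 0` its value is pushed down, which discourages the learned policy from choosing actions the dataset gives no evidence for.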

Evaluating Algorithms

Evaluate the trained algorithms with evaluation_env.py, evaluation_fqe.py and evaluation_magic.py, which apply the respective methods: direct evaluation in the environment, Fitted Q Evaluation (FQE) and the MAGIC off-policy estimator. The results are saved in d3rlpy_logs/, in the directory of the corresponding algorithm.

Usage

python evaluation_env.py --optimalities <optimalities> --seed <seed> --num_episodes <num_episodes>

python evaluation_fqe.py --optimalities <optimalities> --seed <seed>

python evaluation_magic.py --optimalities <optimalities> --seed <seed>
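The idea behind FQE can be illustrated with a minimal tabular sketch: starting from Q = 0, repeatedly regress Q(s, a) onto the target r + gamma * Q(s', pi(s')) over the dataset transitions, then average Q(s0, pi(s0)) over initial states. This is a hypothetical illustration, not evaluation_fqe.py; the functions `tabular_fqe` and `estimate_policy_value` and the transition layout (state, action, reward, next_state, done) are assumptions.

```python
def tabular_fqe(transitions, policy, n_states, n_actions, gamma=0.9, iterations=100):
    """Tabular Fitted Q Evaluation of `policy` from (s, a, r, s_next, done) tuples.

    In the tabular case the least-squares fit for each (s, a) is simply the mean
    target over the transitions that start from (s, a); unvisited pairs keep
    their previous value.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iterations):
        sums = [[0.0] * n_actions for _ in range(n_states)]
        counts = [[0] * n_actions for _ in range(n_states)]
        for s, a, r, s_next, done in transitions:
            # Bootstrap with the evaluated policy's action, not the max.
            target = r if done else r + gamma * q[s_next][policy(s_next)]
            sums[s][a] += target
            counts[s][a] += 1
        q = [
            [sums[s][a] / counts[s][a] if counts[s][a] else q[s][a] for a in range(n_actions)]
            for s in range(n_states)
        ]
    return q


def estimate_policy_value(q, policy, initial_states):
    """Average estimated return of `policy` over the given initial states."""
    return sum(q[s][policy(s)] for s in initial_states) / len(initial_states)
```

Because FQE only needs logged transitions and the policy's action choices, it estimates a policy's value without any further interaction with the environment.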

