Online Stacking Using RL with Positional and Tactical Features

Research output: Contribution to book/anthology/report/proceeding › Article in proceedings › Research › peer-review



We study the scenario where items are stored temporarily in stacks and where it is not allowed to put an item on top of another item that leaves earlier. An arriving item is assigned to a stack based only on the arrival and departure times of the new item and of the items currently stored. The objective is to minimize the maximum number of stacks used over time. This problem is referred to as online stacking. We use Reinforcement Learning (RL) techniques to improve heuristics previously presented in the literature. Using an analogy to chess, we look at positional and tactical features: the former give high priority to stacking configurations that are well suited to meet long-term challenges, and the latter focus on using few stacks in the short term. We show how the RL approach finds the optimal mix of positional and tactical features to be used at different stages of the stacking process. We document quantitatively that positional features play a bigger role at stages of the stacking process with few items stored. We believe that the RL approach combining positional and tactical features can be used in many other online settings within operations research.
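To make the problem statement concrete, the following is a minimal sketch of online stacking with a simple best-fit rule. It is not the RL method or any heuristic from the paper; the function name `greedy_online_stacking`, the tie-handling (an item may sit on top of one leaving at the same time), and the convention that an item with departure time at or before the current arrival has already left are all assumptions made for illustration.

```python
from typing import List, Tuple


def greedy_online_stacking(items: List[Tuple[float, float]]) -> int:
    """Return the maximum number of non-empty stacks used over time.

    items: (arrival, departure) pairs. Hypothetical baseline rule:
    place each arriving item on the feasible stack whose top item
    departs soonest after it (best fit), else open a new stack.
    """
    stacks: List[List[float]] = []  # departure times, bottom to top
    peak = 0
    for arrival, departure in sorted(items):
        # Remove items that have already left. Within a stack the top
        # item always leaves first, so popping from the top suffices.
        for stack in stacks:
            while stack and stack[-1] <= arrival:
                stack.pop()
        stacks = [stack for stack in stacks if stack]

        # A stack is feasible if its top item does not leave earlier
        # than the arriving item.
        feasible = [stack for stack in stacks if stack[-1] >= departure]
        if feasible:
            best = min(feasible, key=lambda stack: stack[-1] - departure)
            best.append(departure)
        else:
            stacks.append([departure])
        peak = max(peak, len(stacks))
    return peak


if __name__ == "__main__":
    print(greedy_online_stacking([(0, 10), (1, 5), (2, 8), (3, 4)]))
```

The decision for each item uses only the departure times of items currently stored, which matches the online setting described above. A learned policy would replace the fixed best-fit rule with a weighted combination of features.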

Original language: English
Title of host publication: Learning and Intelligent Optimization - 14th International Conference, LION 14, 2020, Revised Selected Papers: LION 2020
Editors: Ilias S. Kotsireas, Panos M. Pardalos
Number of pages: 11
Place of publication: Cham
Publication date: 2020
ISBN (Print): 978-3-030-53551-3
Publication status: Published - 2020
Event: 14th International Conference, LION 14 - Athens, Greece
Duration: 24 May 2020 to 28 May 2020
Conference number: 14


Conference: 14th International Conference, LION 14
Series: Lecture Notes in Computer Science

