Improved Bounds for Delay Dependent Bandits
Owen Oertell*, Ahan Mishra*, Parker Rho, and Robert Kleinberg
Preprint
Abstract
Delay dependent bandits are a popular class of bandit problems where the each arm’s payoff function is dependent on the time since it was last pulled. These problems are popular for representing time dependence in choices. In this work, we consider two similar scenarios, the recharging bandit setting and the monotone last switch dependent setting, and provide tighter bounds for each of these through new algorithms and analysis. In particular, we show a new high probability bound for reward in the last switch dependent setting and a 5/6 approximation bound for the recharging bandit setting. Our analysis relies on no further assumptions for each of the settings.