Showing results 1-5 of 14 theses matching the search criteria above.

  1. Structured Stochastic Bandits

    Authors: Stefan Magureanu; Alexandre Proutiere; Emilie Kaufmann; KTH
    Keywords: ENGINEERING AND TECHNOLOGY; Multi-armed bandits; Learning to rank; reinforcement learning; Lipschitz Bandits; Electrical Engineering

    Abstract: In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlated arms. In particular, we investigate the case where the expected rewards are a Lipschitz function of the arm, as well as the learning-to-rank problem viewed from a MAB perspective.

  2. Minimizing Regret in Combinatorial Bandits and Reinforcement Learning

    Authors: Mohammad Sadegh Talebi Mazraeh Shahi; Alexandre Proutiere; Mikael Johansson; Ronald Ortner; KTH
    Keywords: Multi-armed Bandits; Reinforcement Learning; Regret Minimization; Statistics; Electrical Engineering

    Abstract: This thesis investigates sequential decision-making tasks that fall within the framework of reinforcement learning (RL). These tasks involve a decision maker who repeatedly interacts with an environment, modeled as an unknown finite Markov decision process (MDP), and who wishes to maximize a notion of reward accumulated during her experience.

  3. Reinforcement Learning and Dynamical Systems

    Authors: Björn Lindenberg; Karl-Olof Lindahl; Marc G. Bellemare; Linnéuniversitetet
    Keywords: NATURAL SCIENCES; artificial intelligence; distributional reinforcement learning; Markov decision processes; Bellman operators; deep learning; multi-armed bandits; Bayesian bandits; conjugate priors; Thompson sampling; linear finite dynamical systems; cycle orbits; fixed-point systems; Mathematics; Computer Science

    Abstract: This thesis concerns reinforcement learning and dynamical systems in finite discrete problem domains. Studying artificial intelligence through reinforcement learning involves developing models and algorithms for scenarios in which an agent interacts with an environment.

  4. Online Learning for Energy Efficient Navigation in Stochastic Transport Networks

    Author: Niklas Åkerblom; Chalmers tekniska högskola
    Keywords: NATURAL SCIENCES; Thompson Sampling; Online Minimax Path Problem; Multi-Armed Bandits; Online Learning; Online Shortest Path Problem; Machine Learning; Combinatorial Semi-Bandits; Energy Efficient Navigation

    Abstract: Reducing the transport sector's dependence on fossil fuels is crucial for a realistic chance of halting climate change. The automotive industry is therefore transitioning towards an electrified future at an unprecedented pace.

  5. Efficient Online Learning under Bandit Feedback

    Authors: Stefan Magureanu; Alexandre Proutiere; Odalric-Ambrym Maillard; KTH
    Keywords: ENGINEERING AND TECHNOLOGY; multi-armed bandits; reinforcement learning; learning to rank; Electrical Engineering

    Abstract: In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlated arms. In particular, we investigate the case where the expected rewards are a Lipschitz function of the arm, and extend these results to bandits with arbitrary structure that is known to the decision maker.