Search: "Mohammad Sadegh Talebi Mazraeh Shahi"
Found 2 theses containing the words Mohammad Sadegh Talebi Mazraeh Shahi.
1. Minimizing Regret in Combinatorial Bandits and Reinforcement Learning
Abstract: This thesis investigates sequential decision making tasks that fall in the framework of reinforcement learning (RL). These tasks involve a decision maker repeatedly interacting with an environment modeled by an unknown finite Markov decision process (MDP), who wishes to maximize a notion of reward accumulated during her experience.
2. Online Combinatorial Optimization under Bandit Feedback
Abstract: Multi-Armed Bandits (MAB) constitute the most fundamental model for sequential decision making problems with an exploration vs. exploitation trade-off. In such problems, the decision maker selects an arm in each round and observes a realization of the corresponding unknown reward distribution.