Search: "Alexandre Proutiere"
Showing results 11 - 15 of 20 theses containing the words Alexandre Proutiere.
11. Minimizing Regret in Combinatorial Bandits and Reinforcement Learning
Abstract: This thesis investigates sequential decision making tasks that fall within the framework of reinforcement learning (RL). These tasks involve a decision maker who repeatedly interacts with an environment modeled by an unknown finite Markov decision process (MDP) and who wishes to maximize a notion of reward accumulated during her experience.
12. Online Combinatorial Optimization under Bandit Feedback
Abstract: Multi-Armed Bandits (MAB) constitute the most fundamental model for sequential decision making problems with an exploration vs. exploitation trade-off. In such problems, the decision maker selects an arm in each round and observes a realization of the corresponding unknown reward distribution.
13. Consensus Algorithms in Dynamical Network Systems
Abstract: Dynamical network systems are complex interconnected systems that describe many real-world problems. The current trend is to connect more and more systems together while at the same time requiring continuous availability. To this end, it is crucial to understand the dynamic behaviors of networked systems.
14. Optimization and Control in Dynamical Network Systems
Abstract: Dynamical network systems are complex interconnected systems useful for describing many real-world problems. Advances in information technology have driven the current trend towards connecting more and more systems, creating "intelligent" systems, where the intelligence originates in the scale and complexity of the network.
15. Regret Minimization in Structured Reinforcement Learning
Abstract: We consider a class of sequential decision making problems in the presence of uncertainty, which belongs to the field of Reinforcement Learning (RL). Specifically, we study discrete Markov decision processes (MDPs), which model a decision maker or agent that interacts with a stochastic and dynamic environment and receives feedback from it in the form of a reward.