Currently Offered Thesis Topics

We offer these current topics for Bachelor and Master students at TU Darmstadt. In addition, we are usually happy to devise new topics on request to suit the abilities of excellent students. Please DIRECTLY contact the thesis advisor if you are interested in one of these topics.

Learning a Friction Hysteresis with MOSAIC

Scope: Master's thesis, Bachelor's thesis
Advisor: Jan Peters
Start: ASAP
Topic: Inspired by results in neuroscience, especially on the cerebellum, Kawato & Wolpert [1] introduced the idea of the MOSAIC (modular selection and identification for control) learning architecture. In this architecture, local forward models, i.e., models that predict future states and events, are learned directly from observations. Based on the prediction accuracy of these models, corresponding inverse models can be learned. In this thesis, we want to focus on the problem of learning to control a robot system with hysteresis in its friction.
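As a rough illustration of the selection mechanism (a minimal numpy sketch, not the full MOSAIC architecture; the two linear friction regimes and all names are invented for illustration), each module's forward model is scored by its prediction error, a softmax over these scores yields responsibilities, and the inverse-model commands are blended accordingly:

import numpy as np

def responsibilities(pred_errors, sigma=1.0):
    # Softmax of negative squared prediction errors: modules whose
    # forward model predicts the observation best get the most weight.
    scores = -np.asarray(pred_errors) / (2.0 * sigma**2)
    w = np.exp(scores - scores.max())
    return w / w.sum()

# Two toy forward/inverse model pairs for a 1-D system.
forward_models = [lambda x, u: x + 0.9 * u,  # low-friction regime
                  lambda x, u: x + 0.4 * u]  # high-friction regime
inverse_models = [lambda x, x_des: (x_des - x) / 0.9,
                  lambda x, x_des: (x_des - x) / 0.4]

x, u, x_next = 0.0, 1.0, 0.45  # one observed transition
w = responsibilities([(fm(x, u) - x_next)**2 for fm in forward_models])

# Blend the inverse-model commands by responsibility to reach x_des.
x_des = 1.0
u_cmd = sum(wi * im(x, x_des) for wi, im in zip(w, inverse_models))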

On the relation between actor-critic methods and generative adversarial networks

Scope: Master's thesis
Advisor: Boris Belousov
Start: Winter Semester 2017-2018
Topic: In actor-critic architectures for reinforcement learning, an actor performs actions which are subsequently evaluated by a critic. Using the principle of Lagrange duality, one can formalize the search for an optimal actor-critic pair as a minimax problem. Generative adversarial networks (GANs) were introduced in a different context but nevertheless bear a close resemblance to the actor-critic approach: they employ a generator for creating samples and a discriminator for evaluating their quality, and their optimization is again formalized as a minimax problem. High-level similarities between the approaches were pointed out by Finn et al. [1] and Pfau & Vinyals [2]. This thesis makes the connection concrete by embedding GANs in the framework of entropy-regularized policy search.
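For concreteness, the two minimax problems can be written side by side; the first line is the standard GAN objective, the second a schematic actor-critic saddle point (a sketch only, since the exact dual depends on the chosen entropy regularizer):

\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

\max_{\pi} \min_{V} \; \mathbb{E}_{(s,a) \sim \pi}\big[ r(s,a) + \gamma \, \mathbb{E}_{s'}[V(s')] - V(s) \big]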

Automated reward generation for goal directed dynamics

Scope: Master's thesis, Bachelor's thesis
Advisor: Boris Belousov
Start: Summer Semester 2018
Topic: Recent advances in optimization for control of multi-joint systems with contact, brought by Goal Directed Dynamics, allow an immediate cost to be used as a high-level command to a robot. Greedy optimization of such a short-term cost may yield unsatisfactory performance unless the cost happens to be a reasonable approximation of a long-term value function. This thesis investigates the feasibility of automatically generating immediate costs from high-level goals.
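The gap between greedy and value-aware optimization fits in two lines (an illustrative Python sketch, not the Goal Directed Dynamics API; cost, step, and V are placeholders for the immediate cost, the dynamics, and a long-term value function):

def greedy_action(s, actions, cost):
    # One-step greedy: myopic unless cost approximates the value function.
    return min(actions, key=lambda a: cost(s, a))

def value_corrected_action(s, actions, cost, step, V, gamma=0.99):
    # Immediate cost plus discounted value of the successor state.
    return min(actions, key=lambda a: cost(s, a) + gamma * V(step(s, a)))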

Object Recreation from RGB-D Images for Physics Simulations (temporary title)

Scope: Master's thesis
Advisor: Fabio Muratore
Start: Summer Semester 2018 or later (adaptable to the student's needs)
Topic: Running physics simulations requires specifying all entities in the scene, i.e., the robot as well as the objects. The current solution is to manually write this information into an XML file in a framework-specific format, which is loaded before the simulation starts. While this is a stable and easy-to-control solution, it becomes tedious for scenes with multiple objects and restricts us to invariant environments.
To take one step further in the direction of autonomous robots, we want to develop software that utilizes the information from an RGB-D camera (similar to the Kinect) and recreates the observed objects in the simulation environment. Therefore, it is necessary to classify the objects' shapes (e.g., sphere, cuboid, etc.), retrieve their positions in space, extract their physical properties (e.g., mass, extents, etc.), and finally pass this information to the simulator in order to conduct virtual experiments. This thesis will focus on automatic spatial recognition and shape classification of a given set of objects.
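As an illustration of the final pipeline stage, the sketch below turns a list of classified primitives into an XML scene description (Python; the tag and attribute names are made up, not a real simulator schema):

import xml.etree.ElementTree as ET

detected = [  # output of the (hypothetical) recognition stage
    {"shape": "sphere", "pos": (0.1, 0.2, 0.05), "size": (0.05,), "mass": 0.2},
    {"shape": "cuboid", "pos": (0.4, 0.0, 0.10), "size": (0.1, 0.1, 0.2), "mass": 1.0},
]

scene = ET.Element("scene")
for i, obj in enumerate(detected):
    body = ET.SubElement(scene, "body", name=f"object_{i}",
                         pos=" ".join(map(str, obj["pos"])))
    ET.SubElement(body, "geom", type=obj["shape"],
                  size=" ".join(map(str, obj["size"])),
                  mass=str(obj["mass"]))

print(ET.tostring(scene, encoding="unicode"))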
What you can expect from us:
(I) an opportunity to implement your own creative solutions with minimal restrictions imposed by the simulation framework used
(II) method-agnostic and supportive advisor(s)
(III) the possibility to work with the robotics group at the Honda Research Institute and/or the IAS at TU Darmstadt
What we expect from you:
(I) a strong background in computer vision and machine learning
(II) very good knowledge of C++ and Python
(III) high motivation and the ability to work independently

Deep Reinforcement Learning for Partially Observable Markov Decision Processes (POMDPs)

Scope: Master's thesis
Advisor: Gregor Gebhardt
Start: Summer Semester 2018
Topic: Reinforcement learning approaches based on deep neural networks usually try to learn a Q-function or a policy using the fully observable state as input. Recent advances towards RL in the partially observable setting take standard approaches and inject LSTM layers into the network structure [Hausknecht et al., Deep Recurrent Q-Learning for Partially Observable MDPs]. In this thesis, we want to develop a new approach based on the recurrent Kalman network (RKN) structure. The RKN learns not only an internal state representation but also a confidence value, which could potentially be exploited for actively gathering information about the state.
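The "inject an LSTM layer" idea can be summarized in a few lines (a PyTorch sketch with illustrative layer sizes; this is the Hausknecht et al. baseline, not the RKN itself):

import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim) of partial observations; the
        # LSTM state aggregates the history in place of the full state.
        z = torch.relu(self.encoder(obs_seq))
        z, hidden_state = self.lstm(z, hidden_state)
        return self.q_head(z), hidden_state  # Q-values per time step

q_net = RecurrentQNet(obs_dim=8, n_actions=4)
q_values, h = q_net(torch.randn(1, 10, 8))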

Deep Adversarial Learning of Object Disentangling

Scope: Master's thesis
Advisor: Oleg Arenz, Joni Pajarinen
Start: ASAP
Topic: When confronted with large piles of entangled or otherwise stuck-together objects, a robot has to separate the objects before further manipulation is possible. For example, in waste segregation the robot may put different types of objects into different containers. In this Master's thesis project, one robot will learn to disentangle objects and another, adversarial robot will learn to entangle them. Learning will be done on the real robots shown in the picture on the right. Background knowledge: robot learning
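Schematically, each training round could look as follows (a Python sketch only; entangler, disentangler, and rollout are placeholders for the learned policies and the real-robot execution, not an existing API):

def adversarial_round(entangler, disentangler, rollout):
    # One round of the zero-sum game between the two robots.
    pile = rollout(entangler, scene=None)        # adversary tangles the objects
    success = rollout(disentangler, scene=pile)  # protagonist separates them
    disentangler.update(reward=success)          # rewarded for separating
    entangler.update(reward=-success)            # rewarded where the other fails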

DETECT: A Deep End-To-End Calibration Thesis

Scope: Master's thesis, Bachelor's thesis
Advisor: Oleg Arenz, Joni Pajarinen
Start: ASAP
Topic: Methods for learning the relative pose of a robot-attached camera often rely on training data that include the pose of calibration objects. Because they depend on a robust and precise detection method, they are often cumbersome to apply or do not yield satisfactory results. This thesis tackles the problem of recovering the camera pose from easily attainable training data, namely robot configurations and corresponding RGB(D) camera images. Your task is to train a convolutional-deconvolutional auto-encoder that is capable of reproducing the images from a very low-dimensional bottleneck layer. To achieve this, the neural network has to memorize the environment of the robot and recover information about the camera pose in the bottleneck. Based on constraints derived from the known robot configuration, you should furthermore force the bottleneck layer to correspond to transformation parameters.
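A possible starting point for the architecture (a PyTorch sketch with illustrative sizes; the 6-D bottleneck for 3 translation plus 3 rotation parameters is an assumption):

import torch
import torch.nn as nn

class PoseBottleneckAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 16 * 16, 6))  # 6-D bottleneck
        self.dec = nn.Sequential(
            nn.Linear(6, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1))

    def forward(self, img):   # img: (batch, 3, 64, 64)
        pose = self.enc(img)  # reconstruction is forced through 6 numbers
        return self.dec(pose), pose

model = PoseBottleneckAE()
recon, pose = model(torch.randn(1, 3, 64, 64))
# Training would add a loss tying differences in `pose` to the known
# camera motion computed from the robot configuration.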

Super-human Decision Making Under Uncertainty

Scope: Master's thesis
Advisor: Joni Pajarinen
Start: ASAP
Topic: Google DeepMind recently showed how Monte Carlo Tree Search (MCTS) combined with neural networks can be used to play Go at a super-human level. However, one disadvantage of MCTS is that the search tree explodes exponentially with the planning horizon. In this Master's thesis, the student will integrate the main advantage of MCTS, that is, optimistic decision making, into a policy representation whose size is limited with respect to the planning horizon. The outcome will be an approach that can plan further into the future. The application domain will include partially observable problems where decisions can have far-reaching consequences.
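The "optimistic decision making" at the heart of MCTS is the UCT selection rule, sketched below in Python (the child-node dictionary structure is hypothetical):

import math

def uct_select(children, c=1.4):
    # children: list of dicts with visit count "n" and total value "w"
    # (a hypothetical structure, just for illustration).
    n_parent = sum(ch["n"] for ch in children)
    def ucb(ch):
        if ch["n"] == 0:
            return float("inf")  # always try unvisited actions first
        # Mean value plus an exploration bonus that shrinks with visits.
        return ch["w"] / ch["n"] + c * math.sqrt(math.log(n_parent) / ch["n"])
    return max(children, key=ucb)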

Targeted Exploration Using Value Bounds

Scope: Master's thesis
Advisor: Joni Pajarinen
Start: ASAP
Topic: Efficient exploration is one of the most prominent challenges in deep reinforcement learning. Exploration of the state space is critical for finding high-value states and connecting them to the actions that cause them. Exploration in model-free reinforcement learning has relied on classical techniques, empirical uncertainty estimates of the value function, or random policies. In model-based reinforcement learning, value bounds have been used successfully to direct exploration. In this Master's thesis project, the student will investigate how lower and upper value bounds can be used to target exploration in model-free reinforcement learning at the most promising parts of the state space.
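A minimal Python sketch of the idea (how the bounds are obtained, e.g., from an ensemble, is an assumption left open here): act greedily with respect to the upper bound, and use the gap between the bounds to measure where exploration is still worthwhile.

import numpy as np

def explore_action(q_upper, state):
    # Optimism in the face of uncertainty: the upper bound steers the
    # agent toward actions whose value may still be high.
    return int(np.argmax(q_upper[state]))

def bound_gap(q_lower, q_upper, state):
    # q_lower/q_upper: arrays of shape (n_states, n_actions). Large
    # gaps mark the most promising, least-explored parts of the space.
    return q_upper[state] - q_lower[state]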

Robotics Under Partial Observability

Scope: Master's thesis, Bachelor's thesis
Advisor: Joni Pajarinen
Start: ASAP
Topic: Partial observability is a defining property of robotics. Noisy sensors and actuators make state estimation difficult, and even with accurate sensors, occlusions prevent full observability. To gain full autonomy, a robot should use the available observations both for state estimation and to plan how to gather the information required for performing the assigned tasks. Recently, approaches that take partial observability into account have gained traction, for example in autonomous driving, household robotics, and interactive perception. This Bachelor's/Master's thesis focuses on surveying the literature on partial observability in robotics, categorizing different approaches, and discussing open questions.

Learning Locally Linear Dynamical Systems For Reinforcement Learning

Scope: Master's thesis, Bachelor's thesis
Advisor: Hany Abdulsamad
Start: ASAP
Topic: Model-based Reinforcement Learning is an approach to learning complex tasks given local approximations of the nonlinear dynamics of the environment and cost functions. It has proven to be a sample-efficient approach for learning on real robots. Classical approaches for learning such local models impose certain restrictions on the overall structure, for example the number of local components and the switching dynamics. State-of-the-art research has recently moved to more general settings with nonparametric approaches that require less structure. The aim of this thesis is to review the literature on this subject and to compare existing algorithms on real robots such as the BioRob or the Barrett WAM.
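A classical baseline such methods build on is a least-squares fit of a local linear model x' ≈ A x + B u + c on transitions gathered near the current trajectory (a numpy sketch):

import numpy as np

def fit_local_linear(X, U, X_next):
    # X, U, X_next: (N, dx), (N, du), (N, dx) arrays of transitions.
    Phi = np.hstack([X, U, np.ones((len(X), 1))])     # features [x, u, 1]
    W, *_ = np.linalg.lstsq(Phi, X_next, rcond=None)  # solve Phi @ W ~ X'
    dx, du = X.shape[1], U.shape[1]
    A, B, c = W[:dx].T, W[dx:dx + du].T, W[-1]
    return A, B, c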

Chance-Guarantees for Model-Based Reinforcement Learning

Scope: Master's thesis, Bachelor's thesis
Advisor: Hany Abdulsamad
Start: ASAP
Topic: In model-based Reinforcement Learning, the quality of the policy update and the rate of convergence depend heavily on the quality of the approximated nonlinear dynamics. The nonlinearities may arise either from a complex model structure, for example a 7-link robot under the influence of gravity, or from real system constraints such as state and action limits. Hard nonlinearities such as state and action constraints are rarely modelled in the general model-based Reinforcement Learning problem, leading to catastrophic approximation errors where the system hits the limits. The aim of this thesis is to incorporate a new type of constraint that bounds the probability of the system leaving a certain area of the state-action space, thus allowing for better approximations and faster overall convergence.
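Schematically, such a constraint could take the form of a chance constraint (a sketch only; the exact formulation is part of the thesis):

\max_{\pi} \; \mathbb{E}_{\pi}\Big[\textstyle\sum_{t=0}^{T} r(x_t, u_t)\Big] \quad \text{s.t.} \quad \Pr\big((x_t, u_t) \notin \mathcal{X} \ \text{for some}\ t \le T\big) \le \delta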

Benchmarking Reinforcement Learning Algorithms on Tetherball Games

Scope: Bachelor's thesis
Advisor: Hany Abdulsamad
Start: ASAP
Topic: Given the rapid development of Reinforcement Learning in recent years, a large number of new approaches has been introduced. While most approaches seem to reach good performance in simulation on a given task, it is often hard to compare different approaches in an adversarial setup on a real robot. For exactly this purpose we have built the Tetherball setup in our lab, where two robots can play/learn against/from each other with different controllers representing state-of-the-art RL algorithms. In the scope of this thesis, two algorithms are to be chosen and used to learn hitting policies on the real robot. Let the best AI win!

Learning to Support a Learning Agent in a Cooperative Setup

Scope: Master's thesis, Bachelor's thesis
Advisor: Hany Abdulsamad
Start: ASAP
Topic: A great challenge in applying Reinforcement Learning approaches is the need for human intervention to reset the scenario of a learned task, making the process very tedious and time-consuming. A clear example is learning table tennis, where we are either limited to using a ball gun with a predictable pattern of initial positions, or a human is needed to play against the robot. Given a second robotic player, however, we propose a new setup in which the two agents cooperate to develop two different strategies, where one agent learns to support the second in becoming a great table tennis player. It would be interesting to see whether, in such a scenario, the agents discover what might resemble a defensive and an aggressive strategy in table tennis. The thesis will concentrate on developing the concept of cooperation and testing the results in simulation and on our own real table tennis setup.

Minimizing Bellman Residual in Deep Deterministic Policy Gradient

Scope: Master's thesis
Advisor: Samuele Tosatto
Start: Summer Semester 2018
Topic: Deep Deterministic Policy Gradient (DDPG) is a Reinforcement Learning algorithm which uses deep neural networks to represent both the actor and the critic and to update them accordingly. The goal is to find an optimal deterministic policy which solves the given task. DDPG has been shown to perform well on a number of tasks, both with low-dimensional inputs and with high-dimensional ones (such as raw pixel images). The success of DDPG is also interesting for robotics, since a deterministic policy is often preferable to a stochastic one on a real system. In this thesis, the student will explore a modification of the critic's update rule that directly minimizes the Bellman residual and introduces a penalization term, which should better synchronize the update of the critic with that of the actor, such that the actor remains more greedy with respect to the critic. The thesis is intended to be theoretical, at least in the beginning, and the student will conduct several experiments in order to determine how the modified update impacts the optimization process. In a second phase, once the theoretical results are consolidated, the student, together with the advisor, will define a robot task in order to compare standard DDPG with the modified version.
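The difference between the standard critic update and the proposed one can be made explicit (a PyTorch sketch; the penalization term mentioned above is part of the thesis and therefore omitted here):

import torch

def td_loss(critic, target_critic, actor, s, a, r, s_next, gamma=0.99):
    # Standard DDPG: the TD target is treated as a constant (detach),
    # so the gradient only flows through Q(s, a).
    target = r + gamma * target_critic(s_next, actor(s_next)).detach()
    return ((critic(s, a) - target) ** 2).mean()

def bellman_residual_loss(critic, actor, s, a, r, s_next, gamma=0.99):
    # Bellman-residual minimization: the gradient flows through both
    # Q(s, a) and Q(s', pi(s')).
    residual = r + gamma * critic(s_next, actor(s_next)) - critic(s, a)
    return (residual ** 2).mean()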

  
