Currently Offered Thesis Topics

We offer these topics to Bachelor and Master students at TU Darmstadt; if you are interested in one of them, feel free to contact the thesis advisor DIRECTLY. Excellent external students from other universities may be accepted, but please email Jan Peters first. Note that we cannot provide funding for any of these thesis projects.

In addition, we are usually happy to devise new topics on request to suit the abilities of excellent students. Please contact the thesis advisor DIRECTLY if you are interested in one of these topics. When you contact the advisor, it would be nice if you could mention (1) WHY you are interested in the topic (dreams, parts of the problem, etc.) and (2) WHAT makes you special for the project (e.g., coursework, project experience, special programming or math skills, prior work, etc.). Supplementary materials (CV, grades, etc.) are highly appreciated. Of course, such materials are not mandatory, but they help the advisor to see whether the topic is too easy, just about right, or too hard for you.

AADD: Adaptive Autonomous Deep Driving

Scope: Master's thesis
Advisor: Joni Pajarinen, Dorothea Koert
Start: ASAP
Topic: This thesis project focuses on deep reinforcement learning for autonomous driving in challenging conditions. Due to pedestrians and bad weather, there can be high uncertainty about the true state of the world: humans are hard to detect and may behave erratically. This is especially true for small children. One example scenario is parking in a crowded parking lot. The thesis will focus on deep reinforcement learning methods which can take uncertainty into account. The learning will mainly be performed in a simulated environment (video: https://youtu.be/Hp8Dz-Zek2E). If desired, the student will also get the chance to participate in the large autonomous driving project at Darmstadt, which will have a real autonomous car.

Automatic Segmentation and Labeling for Robot Table Tennis Time Series

Scope: Master's thesis, Bachelor's thesis
Advisor: Sebastian Gomez at our Tübingen Lab at the Max Planck Institute for Intelligent Systems
Start: ASAP
Topic: Robot table tennis is an interesting test-bed for machine learning and reinforcement learning approaches. However, the data obtained from the vision system cannot always be trusted, which makes it difficult to obtain useful rewards or training data sets. The goal of this thesis is to design a trainable, well-founded procedure to clean and segment time series and then classify the segments according to desired criteria. For example: segmenting the table tennis data into different trials, removing outliers, finding bouncing points, identifying whether the ball successfully landed on the opponent's court, or whether the robot (or opponent) managed to return a particular ball.
Location: The robot table tennis hardware is not located at TU Darmstadt; it is at the Max Planck Institute in Tübingen.
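
To illustrate one of the sub-problems, here is a rough sketch of bounce-point detection via sign changes of the ball's vertical velocity; the array layout, frame rate, and thresholds are assumptions for illustration, not the lab's actual vision pipeline.

  # Rough sketch (assumptions: positions is an (N, 3) NumPy array of ball
  # positions at a fixed, assumed frame rate; the third column is height).
  import numpy as np

  def find_bounce_indices(positions, dt=1.0 / 180.0, min_speed=0.5):
      """Return frame indices where the vertical velocity flips from
      downward to upward, i.e., candidate bouncing points."""
      vz = np.diff(positions[:, 2]) / dt     # per-frame vertical velocity
      down_then_up = (vz[:-1] < -min_speed) & (vz[1:] > min_speed)
      return np.nonzero(down_then_up)[0] + 1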

Benchmarking Reinforcement Learning Algorithms on Tetherball Games

Scope: Bachelor's Thesis
Advisor: Hany Abdulsamad
Start: ASAP
Topic: Given the rapid development of Reinforcement Learning in recent years, a large number of new approaches has been introduced. While most approaches seem to reach good performance on a given task in simulation, it is often hard to compare different approaches in an adversarial setup on a real robot. For exactly this purpose we have built the tetherball setup in our lab, where two robots can play against and learn from each other, with different controllers representing state-of-the-art RL algorithms. In the scope of this thesis, two algorithms are to be chosen and used to learn hitting policies on the real robot. Let the best AI win!

Chance-Guarantees for Model-Based Reinforcement Learning

Scope: Master's Thesis, Bachelor's thesis
Advisor: Hany Abdulsamad
Start: ASAP
Topic: In model-based Reinforcement Learning, the quality of the policy update and the rate of convergence depend heavily on the quality of the approximated nonlinear dynamics. The nonlinearities may arise either from a complex model structure, for example a 7-link robot under the influence of gravity, or from real system constraints such as state and action limits. Hard nonlinearities such as state and action constraints are rarely modelled in the general model-based Reinforcement Learning problem, leading to catastrophic approximation errors where the system hits the limits. The aim of this thesis is to incorporate a new type of constraint that bounds the chance of the system leaving a certain region of the state-action space, thus allowing for better approximations and faster overall convergence.
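
As a rough sketch of the formulation, such a chance constraint can be written as follows; the admissible state-action region X and the violation probability delta are placeholders rather than quantities fixed by the project description:

  % Chance constraint: trajectories induced by policy \pi must stay inside
  % the admissible region \mathcal{X} with probability at least 1 - \delta.
  \Pr_{\tau \sim p_\pi}\!\left[ (x_t, u_t) \in \mathcal{X} \;\; \forall t \right] \;\geq\; 1 - \delta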

Comparison of Intrinsic Motivation Models

Scope: Master's thesis
Advisor: Svenja Stark, Daniel Tanneberg
Start: Summer Semester 2018 (adaptable to the student's needs)
Topic: How intrinsic motivation drives humans to explore their surroundings and how this behavior can be transferred to robots has been an active research topic in recent years. In this thesis, we will implement and compare proposed intrinsic motivation algorithms as well as design several benchmarking tasks for evaluation, such that we can provide a comparative survey. Furthermore, the survey should include comparisons to related or overlapping approaches (e.g., reward shaping, the exploration-exploitation trade-off); a minimal example of one such intrinsic reward signal is sketched after the list below.
What we expect from you:
(I) strong background in machine learning and, ideally, a background in psychology
(II) at least basic knowledge in Python or Matlab
(III) high motivation and the ability to work independently
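
As a rough illustration of one family of such algorithms, the sketch below computes a prediction-error ("curiosity") bonus; the forward_model argument and the scaling factor are hypothetical placeholders, not one of the specific methods to be compared.

  # Rough sketch (assumption: a learned forward model mapping (state,
  # action) to a predicted next state is available as `forward_model`).
  import numpy as np

  def intrinsic_reward(state, action, next_state, forward_model, eta=0.1):
      """Curiosity bonus: large where the forward model is still wrong."""
      predicted = forward_model(state, action)
      return eta * float(np.sum((predicted - next_state) ** 2))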

Deep Adversarial Learning of Object Disentangling

Scope: Master's thesis
Advisor: Oleg Arenz, Joni Pajarinen
Start: ASAP
Topic: When confronted with large piles of entangled or otherwise stuck-together objects, a robot has to separate the objects before further manipulation is possible. For example, in waste segregation the robot may put different types of objects into different containers. In this Master's thesis project, one robot will learn to disentangle objects and another, adversarial robot will learn to entangle objects. Learning will be done on real robots in our lab.
Background knowledge: robot learning

Deep Model Learning for Inverse Dynamics

Scope: Master's thesis, Bachelor's thesis
Advisor: Michael Lutter
Start: Anytime
Topic: One common robot learning task is to learn the inverse dynamics model using machine learning techniques. However, up to now, the very recent advances in Deep Learning and computing hardware have not been applied to learning inverse dynamics models online.

Therefore, this thesis should implement a Deep Learning model that learns the inverse dynamics and evaluate its performance on a real robotic system. So if you are excited to try out Deep Learning and want to get your hands dirty with real robots, this thesis is perfect for you. Additionally, you will present your thesis to ABB, a robot manufacturer that has engineered inverse dynamics models for decades, and show them how efficient model learning is. If you are interested in this topic, feel free to message me (michael@robot-learning.de).
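
As a rough sketch of the core online learning loop (the 7-DoF dimensionality, network size, and PyTorch setup are assumptions for illustration, not project requirements):

  # Rough sketch: one gradient step per observed sample of an inverse
  # dynamics model mapping (q, qd, qdd) to joint torques tau (7 DoF assumed).
  import torch
  import torch.nn as nn

  model = nn.Sequential(nn.Linear(21, 128), nn.ReLU(),
                        nn.Linear(128, 128), nn.ReLU(),
                        nn.Linear(128, 7))
  optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

  def online_update(q, qd, qdd, tau):
      """Update the model on a freshly observed (state, torque) sample."""
      x = torch.cat([q, qd, qdd])          # 3 x 7 joint quantities
      loss = nn.functional.mse_loss(model(x), tau)
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()
      return loss.item()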

TL;DR:

  • Learn Inverse Dynamics Models Online
  • Test your learned model on real robots
  • Impress ABB and publish your thesis at a conference/journal
  • Good knowledge of Machine Learning & Deep Learning required
  • Good programming skills in Python / C++ required

Deep Perceptual Primitives for Dynamic Environments

Scope: Master's thesis, Bachelor's thesis
Advisor: Michael Lutter
Start: Anytime
Topic: Learning primitives for robotics has mostly focused on learning open-loop policies for overly simplified static environments. Especially for two-armed robots such as Darias or YuMi, open-loop policies do not provide sufficient flexibility, as the static policies cannot adapt their movements w.r.t. the other arm. Additionally, the currently learned primitives rely on low-dimensional, engineered feature representations. With the recent advent of Deep Learning, which is especially suitable for feature learning in high-dimensional spaces, it should be possible to learn Deep Perceptual Primitives: closed-loop primitives that use raw sensor data, including visual data, for action selection. The Deep Perceptual Primitives should be able to implicitly learn a good feature representation, the coordination across arms, and the adaptation to different environments.

Therefore, this thesis should develop a new primitive based on Deep Learning and evaluate its performance on a real robotic system. Based on this research scope and depending on your time, your current knowledge, and your interests, we can define a specific topic which fits you best. With your thesis you also get the chance to impress ABB, a leading robot manufacturer, and publish your work at a leading conference. So if you are interested in this research scope, feel free to message me (michael@robot-learning.de).
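
To make "closed-loop primitive" concrete, here is a rough sketch of a policy network mapping raw camera images directly to joint commands; the image size, layer sizes, and velocity-command output are assumptions for illustration only, not the proposed architecture.

  # Rough sketch (assumptions: RGB input batches of shape (B, 3, 64, 64)
  # and a 7-DoF joint-velocity command).
  import torch.nn as nn

  class PerceptualPrimitive(nn.Module):
      """Closed-loop primitive: maps raw images to joint velocities."""
      def __init__(self, n_joints=7):
          super().__init__()
          self.encoder = nn.Sequential(
              nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),   # 64 -> 30
              nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),  # 30 -> 13
              nn.Flatten())
          self.head = nn.Sequential(nn.Linear(32 * 13 * 13, 128),
                                    nn.ReLU(), nn.Linear(128, n_joints))

      def forward(self, image):
          return self.head(self.encoder(image))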

TL;DR:

  • Develop Deep Learning Skills for robots
  • Test your learned skills on real robots
  • Impress ABB and publish your thesis at a conference/journal
  • Good knowledge of Machine Learning & Deep Learning required
  • Good programming skills in Python required

Deep multi-objective optimization

Scope: Master's thesis
Advisor: Simone Parisi
Start: February 2018
Topic: Recent advances in multi-objective optimization propose gradient-based approaches to learn an approximation of the Pareto frontier manifold over all possible policies. However, these works are limited to linear policies and linear manifold approximators. Furthermore, they require a differentiable indicator of the quality of each policy, but the most commonly used indicators are non-differentiable. Building on recent advances in multi-objective mathematical optimization and deep learning, we want to build differentiable approximators of the most commonly used indicators and to use deep networks to approximate the Pareto frontier manifold.
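
For concreteness, the hypervolume indicator is arguably the most widely used of these quality measures; it scores a solution set F by the volume it dominates up to a reference point r, and it is non-differentiable because of the underlying set operations (standard notation, not taken from the cited work):

  % Hypervolume of a solution set F w.r.t. reference point r: the Lebesgue
  % measure of the region dominated by F and bounded by r.
  \mathrm{HV}(F) = \Lambda\!\left( \bigcup_{f \in F} \{\, y \mid f \preceq y \preceq r \,\} \right)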

Deep Reinforcement Learning for Partially Observable Markov Decision Processes (POMDPs)

Scope: Master's Thesis
Advisor: Gregor Gebhardt
Start: Summer Semester 2018
Topic: Reinforcement learning approaches based on deep neural networks usually try to learn a Q-function or a policy using the fully observable state as the input. Recent advances towards RL in the partially observable setting use standard approaches and inject LSTM layers into the network structure (Hausknecht et al., "Deep Recurrent Q-Learning for Partially Observable MDPs"). In this thesis, we want to develop a new approach based on the recurrent Kalman network (RKN) structure. The RKN learns not only an internal state representation but also a confidence value, which could potentially be exploited for actively gathering information about the state.
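
As a rough sketch of the cited baseline idea, the snippet below replaces the usual feed-forward Q-network with an LSTM over observation histories; the sizes and the flat-observation assumption are illustrative only.

  # Rough sketch of a recurrent Q-network (assumptions: flat observations
  # of size obs_dim and a discrete action set).
  import torch.nn as nn

  class RecurrentQNetwork(nn.Module):
      def __init__(self, obs_dim, n_actions, hidden=64):
          super().__init__()
          self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
          self.q_head = nn.Linear(hidden, n_actions)

      def forward(self, obs_seq, hidden_state=None):
          # obs_seq: (batch, time, obs_dim); returns per-step Q-values.
          features, hidden_state = self.lstm(obs_seq, hidden_state)
          return self.q_head(features), hidden_state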

DETECT: A Deep End-To-End Calibration Thesis

Scope: Master's thesis
Advisor: Oleg Arenz, Joni Pajarinen
Start: Summer Semester 2018
Topic:

Methods for learning the relative pose of a robot-attached camera often rely on training data that include the pose of calibration objects. Because they depend on a robust and precise detection method, they are often cumbersome to apply or do not yield satisfactory results. This thesis tackles the problem of recovering the pose of the camera from easily attainable training data, namely robot configurations and corresponding RGB(D) camera images. Your task is to train a convolutional-deconvolutional auto-encoder that is capable of reproducing the images from a very low-dimensional bottleneck layer. In order to achieve this, the neural network has to memorize the environment of the robot and recover information about the camera pose in the bottleneck. Based on constraints leveraged from the known robot configuration, you should furthermore force the bottleneck layer to correspond to transformation parameters.
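
A rough sketch of the intended network shape: an encoder compresses the image to a handful of latent variables (six here, matching the parameters of a rigid-body transformation) and a decoder reconstructs the image from them; all sizes are assumptions for illustration.

  # Rough sketch (assumptions: 1 x 64 x 64 grayscale input and a 6-D
  # bottleneck intended to align with camera pose parameters).
  import torch.nn as nn

  encoder = nn.Sequential(
      nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
      nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
      nn.Flatten(),
      nn.Linear(32 * 16 * 16, 6))                            # bottleneck

  decoder = nn.Sequential(
      nn.Linear(6, 32 * 16 * 16), nn.Unflatten(1, (32, 16, 16)),
      nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
      nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid())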

Interactive Dance: At the Interface of High-Level Reactive Robotic Control and Human-Robot Interaction

Scope: Master's thesis
Advisor: Vincent Berenz (a collaborator at the Max Planck Institute for Intelligent Systems in Tübingen)
Start: ASAP
Topic:

Scripted robot dance is common. Interactive dance, on the other hand, in which the robot uses runtime sensory information to continuously adapt its moves to those of its (human) partner, remains challenging. It requires the integration of various sensors, action modalities, and cognitive processes. The selected candidate's objective will be to develop such an interactive dance, based on the software suite for simultaneous perception and motion generation our department has built over the years. The target robot on which the dance will be performed is the wheeled robot Pepper from SoftBank Robotics. This Master's thesis is with the Max Planck Institute for Intelligent Systems and is located in Tübingen. More information: https://am.is.tuebingen.mpg.de/jobs/master-thesis-interactive-dance-performed-by-sofbank-robotics-pepper

Improving Movement Grammars through Human Interaction

Scope: Master's Thesis
Advisor: Rudolf Lioutikov
Start: Already taken, but please contact me if this topic interests you
Topic: The rule-based structure of formal grammars, e.g., regular grammars or context-free grammars, provides an intuitive representation of complex, recursive relations between entities. This easily comprehensible representation is especially desirable when conveying hierarchical or structured information. We are currently exploring formal grammars as a means to sequence movement primitives while intuitively conveying the behavioral capabilities of the robot, w.r.t. a library of primitives, to non-expert users. In our highly dynamic world, however, it is unrealistic to expect a primitive library to cope with every possible scenario. Instead, this thesis focuses on exploring methods to improve and extend such grammars through physical interaction with the robot or direct interaction with the grammar.
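
As a rough illustration, a context-free movement grammar over a hypothetical primitive library could look as follows; the primitives and rules are invented for illustration only.

  # Hypothetical context-free movement grammar: each nonterminal maps to
  # its possible expansions; lowercase terminals name movement primitives.
  grammar = {
      "TASK":       [["APPROACH", "MANIPULATE", "RETREAT"]],
      "APPROACH":   [["reach"]],
      "MANIPULATE": [["grasp", "lift", "place"], ["push"]],
      "RETREAT":    [["home"]],
  }
  # A labeled demonstration corresponds to a derivation, e.g.
  # TASK -> APPROACH MANIPULATE RETREAT -> reach grasp lift place home.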

Joint Learning of Humans and Robots

Scope: Master's Thesis, Bachelor's thesis
Advisor: Marco Ewerton
Start: Already taken, but please contact me if this topic interests you
Topic: Recent research has leveraged Learning from Demonstrations and probabilistic movement representations to allow humans and robots to efficiently perform tasks together, such as moving objects from one location to another without hitting obstacles in the way (see "Co-manipulation with a Library of Virtual Guiding Fixtures"). In some situations, however, it might not be trivial to provide good demonstrations to the robot. Moreover, the robot and the human might need to adapt their respective behaviors over time in order to get used to each other and achieve better performance in previously unknown environments. Intelligent prosthetic limbs or exoskeletons could, for instance, adapt to human users as the users themselves get accustomed to those devices and as both agents face new environments. In this project, the student will explore Learning from Demonstrations, Probabilistic Movement Primitives, and Policy Search algorithms in order to enable robots to assist humans in shared control tasks.
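
For reference, a rough sketch of the Probabilistic Movement Primitive representation underlying this line of work: a trajectory point is a weighted combination of basis functions, and a distribution over the weights captures the variability across demonstrations (standard ProMP notation):

  % Trajectory point y_t from basis functions \Psi_t and weights w; the
  % weight distribution supports conditioning, e.g., on via-points.
  y_t = \Psi_t^{\top} w + \epsilon_y, \qquad
  w \sim \mathcal{N}(\mu_w, \Sigma_w), \qquad
  \epsilon_y \sim \mathcal{N}(0, \Sigma_y)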

Learning a Friction Hysteresis with MOSAIC

Scope: Master's thesis, Bachelor's thesis
Advisor: Jan Peters
Start: ASAP
Topic: Inspired by results in neuroscience, especially regarding the cerebellum, Kawato & Wolpert introduced the MOSAIC (modular selection and identification for control) learning architecture. In this architecture, local forward models, i.e., models that predict future states and events, are learned directly from observations. Based on the prediction accuracy of these models, corresponding inverse models can be learned. In this thesis, we want to focus on the problem of learning to control a robot system with a hysteresis in its friction.
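
A rough sketch of the selection mechanism: each module's forward model predicts the next state, and a responsibility signal computed from the prediction errors weights the corresponding inverse models (standard MOSAIC formulation; the noise scale sigma is a model parameter):

  % Responsibility of module i given the observed state x_t and the
  % module's prediction \hat{x}_t^{(i)}; accurate modules gain control.
  \lambda_t^{(i)} =
    \frac{\exp\!\left(-\|x_t - \hat{x}_t^{(i)}\|^2 / 2\sigma^2\right)}
         {\sum_j \exp\!\left(-\|x_t - \hat{x}_t^{(j)}\|^2 / 2\sigma^2\right)}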

Learning Forward Models for High Dimensional Tactile States

Scope: Master's Thesis
Advisor: Filipe Veiga
Start: SoSe 2018
Topic: For humans, the ability to predict changes in their visual and haptic perception as they execute actions is crucial for evaluating action success. For example, during the execution of a trajectory, and assuming good forward models, deviations from the predicted states can trigger online corrections such that the trajectory error is minimized. For tasks where haptic feedback is relevant (grasping, manipulation, object servoing, etc.), and as state-of-the-art sensing technologies provide ever more complex and high-dimensional signals, there is a lack of forward models that are sufficiently accurate for a controller to generate prediction-based control signals. This thesis focuses on building such models using unsupervised Deep Learning approaches (such as variational auto-encoders) and on integrating the models into a model predictive control framework.

Learning Locally Linear Dynamical Systems For Reinforcement Learning

Scope: Master's Thesis, Bachelor's thesis
Advisor: Hany Abdulsamad
Start: ASAP
Topic: Model-based Reinforcement Learning is an approach to learn complex tasks given local approximations of the nonlinear dynamics of the environment and cost functions. It has proven to be a sample-efficient approach for learning on real robots. Classical approaches for learning such local models place certain restrictions on the overall structure, for example on the number of local components and the switching dynamics. State-of-the-art research has recently moved to more general settings with nonparametric approaches that require less structure. The aim of this thesis is to review the literature on this subject and to compare existing algorithms on real robots like the BioRob or the Barrett WAM.
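
A rough sketch of the model class in question: a switching linear dynamical system approximates the nonlinear dynamics with K local linear models, one of which is active at each time step (the Gaussian noise assumption is illustrative):

  % With switching index z_t \in \{1, \dots, K\} active at time t:
  x_{t+1} = A_{z_t} x_t + B_{z_t} u_t + c_{z_t} + \epsilon_t, \qquad
  \epsilon_t \sim \mathcal{N}(0, \Sigma_{z_t})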

Learning to Support a Learning Agent in a Cooperative Setup

Scope: Master's Thesis, Bachelor's thesis
Advisor: Hany Abdulsamad, Marco Ewerton
Start: ASAP
Topic: A great challenge in applying Reinforcement Learning approaches is the need for human intervention to reset the scenario of a learned task, which makes the process very tedious and time-consuming. A clear example is learning table tennis, where we are either limited to using a ball gun with a predictable pattern of initial positions or a human is needed to play against the robot. Given a second robotic player, however, we propose a new setup in which the two agents cooperate to develop two different strategies, where one agent learns to support the second in becoming a great table tennis player. It will be interesting to see whether, in such a scenario, the agents discover what might resemble a defensive and an aggressive strategy in table tennis. The thesis will concentrate on developing the concept of cooperation and testing the results in simulation and on our own real table tennis setup.

Learning Variational Autoencoders for Movement Grammars

Scope: Master's Thesis
Advisor: Rudolf Lioutikov
Start: Already taken, but please contact me if this topic interests you
Topic: The rule-based structure of formal grammars, e.g., regular grammars or context-free grammars, provides an intuitive representation of complex, recursive relations between entities. This easily comprehensible representation is especially desirable when conveying hierarchical or structured information. We are currently exploring formal grammars as a means to sequence movement primitives while intuitively conveying the behavioral capabilities of the robot, w.r.t. a library of primitives, to non-expert users. These grammars are induced from previously labeled demonstrations, where each terminal symbol describes a movement primitive. In order to induce an easily understandable grammar, we define a prior over the grammar structure. A major problem of grammar induction is the lack of a proper distance or similarity measure between grammars. The goal of this thesis is to use movement grammars as inputs to Variational Autoencoders and to study the properties of the latent space. A latent space representing an abstract, continuous projection of the grammar space has the potential for a huge impact in various areas applying grammars, since it allows the use of methods based on distances and similarities, e.g., gradient descent and policy search, for grammar induction in a way not seen before.

Minimizing Bellman Residual in Deep Deterministic Policy Gradient

Scope: Master's Thesis
Advisor: Samuele Tosatto
Start: Summer Semester 2018
Topic: Deep Deterministic Policy Gradient (DDPG) is a Reinforcement Learning algorithm which uses deep neural networks to represent both the actor and the critic and to update them accordingly. The goal is to find an optimal deterministic policy which solves the given task. DDPG has shown great performance on several tasks, both with low-dimensional input and with high-dimensional input (such as raw pixel images). The success of DDPG is also interesting for the field of robotics, since a deterministic policy is often preferable to a stochastic one on a real system. In this thesis, the student will explore a modification of the critic's update rule: directly minimizing the Bellman residual and introducing a penalization term which should better "synchronize" the update of the critic with respect to the actor, in such a way that the actor remains greedy with respect to the critic. The thesis is intended to be theoretical, at least in the beginning, and the student will conduct several experiments in order to determine how the modified update impacts the optimization process. In a second phase, when the theoretical results are consolidated, the student, together with the advisor, will define a robot task in order to compare standard DDPG with the modified version.
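
As a rough sketch of the quantity involved: instead of regressing on bootstrapped targets, the Bellman residual objective differentiates through both sides of the Bellman equation; the penalization term is left abstract below, since its exact form is the subject of the thesis.

  % Bellman residual objective for critic Q_\theta under deterministic
  % policy \pi_\phi; \Omega denotes the proposed penalization term.
  L(\theta) = \mathbb{E}_{(s,a,r,s')}\!\left[
      \left( r + \gamma\, Q_\theta(s', \pi_\phi(s')) - Q_\theta(s, a) \right)^2
  \right] + \Omega(\theta, \phi)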

Recover from failure with stochastic recurrent networks

Scope: Master's thesis
Advisor: Daniel Tanneberg
Start: negotiable
Topic: One of the major challenges in robotics is the concept of developmental robots, i.e., robots that develop and adapt autonomously through lifelong learning. Especially the ability to recover from failures, such as a broken joint, is crucial for autonomous robots. Therefore, we want to investigate how this ability can be realized in a recent framework for online adaptation based on a bio-inspired stochastic recurrent network and intrinsic motivation signals [1, 2].
What we expect from you:
(I) (strong) background in machine learning
(II) good programming skills in python
(III) high motivation and ability to work independently (!)
(IV) flexibility regarding the duration of the project

Robotics Under Partial Observability

Scope: Master's thesis, Bachelor's thesis
Advisor: Joni Pajarinen
Start: ASAP

Topic: Partial observability is a defining property of robotics. Noisy sensors and actuators make state estimation difficult, and even with accurate sensors, occlusions prevent full observability. To gain full autonomy, a robot should use the available observations both for state estimation and to plan how to gain the information required for performing the assigned tasks. Recently, approaches which take partial observability into account have gained traction, for example in autonomous driving, household robotics, and interactive perception. This Bachelor's/Master's thesis focuses on surveying the literature with respect to partial observability in robotics, categorizing different approaches and discussing open questions. This thesis topic is a good fit for a student who likes searching for and categorizing information and wants to gain a deeper understanding of the state of the art.

Super-human Decision Making Under Uncertainty

Scope: Master's thesis
Advisor: Joni Pajarinen
Start: ASAP
Topic: Google DeepMind recently showed how Monte Carlo Tree Search (MCTS) combined with neural networks can be used to play Go at a super-human level. However, one disadvantage of MCTS is that the search tree grows exponentially with the planning horizon. In this Master's thesis, the student will integrate the advantage of MCTS, that is, optimistic decision making, into a policy representation whose size is limited with respect to the planning horizon. The outcome will be an approach that can plan further into the future. The application domain will include partially observable problems where decisions can have far-reaching consequences.
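
For context, a rough reminder of the optimistic selection rule at the heart of MCTS (the standard UCT rule, with exploration constant c):

  % UCT action selection at a tree node: value estimate plus an optimism
  % bonus that shrinks as action a is tried more often.
  a^{*} = \arg\max_{a} \; \hat{Q}(s, a) + c \sqrt{\frac{\ln N(s)}{N(s, a)}}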

Targeted Exploration Using Value Bounds

Scope: Master's thesis
Advisor: Joni Pajarinen
Start: ASAP
Topic: Efficient exploration is one of the most prominent challenges in deep reinforcement learning. In reinforcement learning, exploration of the state space is critical for finding high-value outcomes and attributing them to the actions that caused them. Exploration in model-free reinforcement learning has relied on classical techniques, empirical uncertainty estimates of the value function, or random policies. In model-based reinforcement learning, value bounds have been used successfully to direct exploration. In this Master's thesis project, the student will investigate how lower and upper value bounds can be used to target exploration in model-free reinforcement learning at the most promising parts of the state space. This thesis topic requires background knowledge in reinforcement learning, gained e.g. through machine learning or robot learning courses.
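
One rough way to make "targeting" concrete: given lower and upper bounds on the value function, acting optimistically and measuring the bound gap points exploration at the least understood parts of the state space (an illustrative sketch, not the method prescribed by the thesis):

  % With lower/upper value bounds \underline{Q} and \overline{Q}: act on
  % the upper bound; the gap flags state-action pairs worth exploring.
  a^{*} = \arg\max_{a} \, \overline{Q}(s, a), \qquad
  \text{gap}(s, a) = \overline{Q}(s, a) - \underline{Q}(s, a)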

Understanding through Reinforcement Learning

Scope: Master's Thesis
Advisor: Riad Akrour
Start: Summer Semester 2018
Topic: We quantify an agent's understanding of its environment by the expected number of times the agent needs to interact with its environment before solving a reinforcement learning problem. The expectation takes into account the stochasticity of the environment but, most importantly, is taken w.r.t. a distribution over reward functions (we do not consider only a single task). An agent has full understanding of its environment if it is able to return an optimal policy for any given reward function without any additional interaction. To endow the agent with such capabilities, we propose to learn (by RL) an actor-critic architecture that takes as input the current and target states and outputs the next action to apply (actor) and the expected time required to reach the target (critic). The resulting policy and critic can be seen as a (multi-step) inverse dynamics model and a (multi-step) probabilistic forward dynamics model, respectively. The impact of such models on the agent's understanding will be demonstrated by the ability to reproduce arbitrary trajectories in environments with complex dynamics.
