I am a PhD student at the Amsterdam Machine Learning Lab (AMLab), supervised by Jan-Willem van de Meent. Before September 2021, I was a PhD student at the Khoury College of Computer Sciences at Northeastern University.

I am interested in probabilistic modeling, inference, and ways to automate these tasks using probabilistic programming systems. My recent work has been focused on combining importance-sampling-based methods with variational inference, as well as developing corresponding programming abstractions for probabilistic programming.

# Publications

Zimmermann, H., Wu, H., Esmaeili, B., & van de Meent, J.-W. (2021, December). Nested Variational Inference. 35th Conference on Neural Information Processing Systems (NeurIPS).

We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing a forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonly-used importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments apply NVI to (a) sample from a multimodal distribution using a learned annealing path, (b) learn heuristics that approximate the likelihood of future observations in a hidden Markov model, and (c) perform amortized inference in hierarchical deep generative models. We observe that optimizing nested objectives leads to improved sample quality in terms of log average weight and effective sample size.

@inproceedings{zimmermann2021nested,
title = {Nested Variational Inference},
author = {Zimmermann, Heiko and Wu, Hao and Esmaeili, Babak and van de Meent, Jan-Willem},
booktitle = {35th Conference on Neural Information Processing Systems (NeurIPS)},
year = {2021}
}
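At a single level of nesting, learning a proposal by minimizing the forward (inclusive) KL divergence reduces to a self-normalized importance sampling estimate of the gradient. The toy sketch below is an illustrative assumption, not the paper's implementation: a 1-D Gaussian target and a Gaussian proposal whose mean is learned, in plain NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: unnormalized target log-density (a 1-D Gaussian
# with mean 3) and a Gaussian proposal q(x; mu) whose mean we learn.
def log_target(x):
    return -0.5 * (x - 3.0) ** 2

def log_proposal(x, mu):
    return -0.5 * (x - mu) ** 2

mu = 0.0   # initial proposal mean
lr = 0.5   # step size
for step in range(200):
    x = rng.normal(mu, 1.0, size=256)            # sample from q
    logw = log_target(x) - log_proposal(x, mu)   # log importance weights
    w = np.exp(logw - logw.max())
    w /= w.sum()                                 # self-normalize
    # Gradient of the forward KL(p || q) w.r.t. mu, estimated by
    # importance sampling: E_p[ d/dmu log q(x; mu) ] ~ sum_i w_i (x_i - mu)
    grad = np.sum(w * (x - mu))
    mu += lr * grad
```

At each step, samples from the current proposal are reweighted toward the target, and the weighted score of the proposal pulls its mean toward the target mean; NVI applies updates of this kind at every level of a nested sampler.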

Zimmermann, H., Stites, S., Wu, H., Sennesh, E., & van de Meent, J.-W. (2021, July). Learning Proposals for Probabilistic Programs with Inference Combinators. 37th Conference on Uncertainty in Artificial Intelligence (UAI).

We develop operators for construction of proposals in probabilistic programs, which we refer to as inference combinators. Inference combinators define a grammar over importance samplers that compose primitive operations such as the application of transition kernels and importance resampling. Proposals in these samplers can be parameterized using neural networks, which in turn can be trained by optimizing variational objectives. The result is a framework for user-programmable variational methods that are correct by construction and can be tailored to specific models. We demonstrate the flexibility of this framework in applications to advanced variational methods based on amortized Gibbs sampling and annealing.

@inproceedings{stites2021combinators,
title = {Learning Proposals for Probabilistic Programs with Inference Combinators},
author = {Zimmermann, Heiko and Stites, Sam and Wu, Hao and Sennesh, Eli and van de Meent, Jan-Willem},
booktitle = {37th Conference on Uncertainty in Artificial Intelligence (UAI)},
year = {2021}
}
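The combinator idea can be illustrated with a minimal sketch in which a sampler is a function returning samples and log-weights, and a combinator wraps samplers to produce new ones. Everything below — the N(2, 1) target, the specific primitives, the function names — is a hypothetical simplification, not the paper's actual grammar or API.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_p(x):
    """Unnormalized target log-density, here N(2, 1)."""
    return -0.5 * (x - 2.0) ** 2

def prior(n):
    """Primitive sampler: propose from N(0, 1), weight against the target."""
    x = rng.normal(0.0, 1.0, size=n)
    return x, log_p(x) + 0.5 * x ** 2

def resample(sampler):
    """Combinator: importance-resample, resetting the weights to uniform."""
    def wrapped(n):
        x, logw = sampler(n)
        w = np.exp(logw - logw.max())
        idx = rng.choice(n, size=n, p=w / w.sum())
        return x[idx], np.zeros(n)
    return wrapped

def move(sampler, step=0.5):
    """Combinator: one Metropolis-Hastings transition targeting log_p,
    which leaves the target of the weighted samples unchanged."""
    def wrapped(n):
        x, logw = sampler(n)
        prop = x + rng.normal(0.0, step, size=n)
        accept = np.log(rng.uniform(size=n)) < log_p(prop) - log_p(x)
        return np.where(accept, prop, x), logw
    return wrapped

sampler = move(resample(prior))   # samplers compose like functions
x, logw = sampler(4000)
```

Because each combinator maps properly weighted samplers to properly weighted samplers, a composition such as `move(resample(prior))` remains a valid importance sampler by construction — the property the paper's grammar is designed to enforce.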

Zimmermann, H., Wu, H., Esmaeili, B., Stites, S., & van de Meent, J.-W. (2021). Nested Variational Inference. Symposium on Advances in Approximate Bayesian Inference (AABI).

We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing an inclusive or exclusive KL divergence at each level of nesting. NVI is applicable to many commonly-used importance sampling strategies and additionally provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments apply NVI to learn samplers targeting (a) an unnormalized density using annealing and (b) the posterior of a hidden Markov model. We observe improved sample quality in terms of log average weight and effective sample size.

@article{zimmermann2021aabi,
title = {Nested Variational Inference},
author = {Zimmermann, Heiko and Wu, Hao and Esmaeili, Babak and Stites, Sam and van de Meent, Jan-Willem},
journal = {Symposium on Advances in Approximate Bayesian Inference (AABI)},
year = {2021}
}

Wu, H., Zimmermann, H., Sennesh, E., Le, T. A., & van de Meent, J.-W. (2020, July). Amortized Population Gibbs Samplers with Neural Sufficient Statistics. Proceedings of the International Conference on Machine Learning (ICML).

Amortized variational methods have proven difficult to scale to structured problems, such as inferring positions of multiple objects from video images. We develop amortized population Gibbs (APG) samplers, a class of scalable methods that frames structured variational inference as adaptive importance sampling. APG samplers construct high-dimensional proposals by iterating over updates to lower-dimensional blocks of variables. We train each conditional proposal by minimizing the inclusive KL divergence with respect to the conditional posterior. To appropriately account for the size of the input data, we develop a new parameterization in terms of neural sufficient statistics. Experiments show that APG samplers can train highly structured deep generative models in an unsupervised manner, and achieve substantial improvements in inference accuracy relative to standard autoencoding variational methods.

@inproceedings{wu2019amortized,
title = {Amortized Population Gibbs Samplers with Neural Sufficient Statistics},
author = {Wu, Hao and Zimmermann, Heiko and Sennesh, Eli and Le, Tuan Anh and van de Meent, Jan-Willem},
booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
year = {2020}
}
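The block-wise structure of an APG sweep can be sketched on a toy two-block Gaussian model. Here the conditional proposals are exact, so the incremental importance weights are uniform and resampling is omitted; in the paper the proposals are neural networks trained by minimizing the inclusive KL, and the weights correct for their mismatch. The model and all names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical two-block target: z1 ~ N(0, 1), z2 | z1 ~ N(z1, 1).
def gibbs_sweep(z1, z2, n):
    """One APG-style sweep: update each lower-dimensional block in turn
    from a conditional proposal (here the exact conditional posterior)."""
    z1 = rng.normal(z2 / 2.0, np.sqrt(0.5), size=n)  # p(z1 | z2)
    z2 = rng.normal(z1, 1.0, size=n)                 # p(z2 | z1)
    return z1, z2

n = 2000
z1 = rng.normal(size=n)   # arbitrary initialization of both blocks
z2 = rng.normal(size=n)
for _ in range(20):
    z1, z2 = gibbs_sweep(z1, z2, n)
```

Iterating over blocks keeps each individual proposal low-dimensional even when the joint latent space is large, which is what lets APG scale to structured models such as multi-object tracking.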

Wochner, I., Driess, D., Zimmermann, H., Haeufle, D. F. B., Toussaint, M., & Schmitt, S. (2020). Optimality Principles in Human Point-to-Manifold Reaching Accounting for Muscle Dynamics. Frontiers in Computational Neuroscience, 14, 38.

Human arm movements are highly stereotypical under a large variety of experimental conditions. This is striking due to the high redundancy of the human musculoskeletal system, which in principle allows many possible trajectories toward a goal. Many researchers hypothesize that through evolution, learning, and adaptation, the human system has developed optimal control strategies to select between these possibilities. Various optimality principles were proposed in the literature that reproduce human-like trajectories in certain conditions. However, these studies often focus on a single cost function and use simple torque-driven models of motion generation, which are not consistent with human muscle-actuated motion. The underlying structure of our human system, with the use of muscle dynamics in interaction with the control principles, might have a significant influence on what optimality principles best model human motion. To investigate this hypothesis, we consider a point-to-manifold reaching task that leaves the target underdetermined. Given hypothesized motion objectives, the control input is generated using Bayesian optimization, a machine-learning-based method that trades off exploitation and exploration. Using numerical simulations with Hill-type muscles, we show that a combination of optimality principles best predicts human point-to-manifold reaching when accounting for the muscle dynamics.

@article{20-wochner-frontiers,
title = {Optimality Principles in Human Point-to-Manifold Reaching Accounting for Muscle Dynamics},
author = {Wochner, Isabell and Driess, Danny and Zimmermann, Heiko and Haeufle, Daniel F. B. and Toussaint, Marc and Schmitt, Syn},
journal = {Frontiers in Computational Neuroscience},
volume = {14},
pages = {38},
year = {2020}
}
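The exploitation–exploration trade-off in the Bayesian-optimization loop can be sketched with a tiny Gaussian-process surrogate and an upper-confidence-bound (UCB) acquisition on a 1-D stand-in objective. The kernel, grid, and cost function below are illustrative assumptions, not the paper's musculoskeletal simulation.

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def objective(x):
    """Hypothetical 1-D stand-in for the motion cost (maximum at 0.7)."""
    return -(x - 0.7) ** 2

grid = np.linspace(0.0, 1.0, 200)
X = list(rng.uniform(0.0, 1.0, size=2))   # two random initial probes
y = [objective(x) for x in X]

for _ in range(15):
    Xa = np.array(X)
    K = rbf(Xa, Xa) + 1e-6 * np.eye(len(Xa))   # GP fit to data so far
    Ks = rbf(grid, Xa)
    mu = Ks @ np.linalg.solve(K, np.array(y))  # posterior mean on the grid
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    # UCB: exploit high predicted cost, explore high uncertainty.
    ucb = mu + 2.0 * np.sqrt(np.maximum(var, 0.0))
    x_next = grid[int(np.argmax(ucb))]
    X.append(x_next)
    y.append(objective(x_next))

best = X[int(np.argmax(y))]
```

Early iterations favor unexplored regions where the posterior variance is large; as the surrogate sharpens, the acquisition concentrates near the optimum — the same exploration-then-exploitation behavior exploited to generate control inputs in the paper.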

Driess, D., Zimmermann, H., Wolfen, S., Suissa, D., Haeufle, D., Hennes, D., Toussaint, M., & Schmitt, S. (2018, May). Learning to Control Redundant Musculoskeletal Systems with Neural Networks and SQP: Exploiting Muscle Properties. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).

Modeling biomechanical musculoskeletal systems reveals that the mapping from muscle stimulations to movement dynamics is highly nonlinear and complex, which makes it difficult to control those systems with classical techniques. In this work, we investigate not only whether machine learning approaches are capable of learning a controller for such systems, but especially whether the structure of the musculoskeletal apparatus exhibits properties that are favorable for the learning task. In particular, we consider learning a control policy from target positions to muscle stimulations. To account for the high actuator redundancy of biomechanical systems, our approach combines a learned forward model, represented by a neural network, with sequential quadratic programming to obtain the control policy. This also enables us to alter the co-contraction level, and hence to change the stiffness of the system and to include optimality criteria such as small muscle stimulations. Experiments on both a simulated musculoskeletal model of a human arm and a real biomimetic muscle-driven robot show that our approach is able to learn an accurate controller despite high redundancy and nonlinearity, while retaining sample efficiency.

@inproceedings{18-driess-ICRA,
title = {Learning to Control Redundant Musculoskeletal Systems with Neural Networks and {SQP}: Exploiting Muscle Properties},
author = {Driess, Danny and Zimmermann, Heiko and Wolfen, Simon and Suissa, Dan and Haeufle, Daniel and Hennes, Daniel and Toussaint, Marc and Schmitt, Syn},
booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
year = {2018}
}
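The control-policy computation can be sketched as a constrained optimization over muscle stimulations: reach a target position under a forward model while penalizing effort. Below, a small analytic function stands in for the learned neural-network forward model, and SciPy's SLSQP solver plays the role of the SQP step; all specifics are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical analytic stand-in for the learned neural-network forward
# model: maps two muscle-like stimulations u in [0, 1]^2 to a position.
def forward(u):
    return np.tanh(u[0] - u[1]) + 0.3 * u[0] * u[1]

target = 0.4   # desired end-effector position (illustrative)

# SQP-style step: minimize an effort term subject to reaching the target,
# mirroring the optimality criterion of small muscle stimulations.
res = minimize(
    lambda u: np.sum(u ** 2),                     # effort regularizer
    x0=np.array([0.5, 0.5]),
    method="SLSQP",
    bounds=[(0.0, 1.0), (0.0, 1.0)],
    constraints={"type": "eq", "fun": lambda u: forward(u) - target},
)
```

Because the redundancy leaves a whole set of stimulations that reach the target, additional constraints — for instance on the total stimulation level — could select solutions with a desired co-contraction and hence stiffness.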

Vien, N. A., Zimmermann, H., & Toussaint, M. (2018, February). Bayesian Functional Optimization. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).

Bayesian optimization (BayesOpt) is a derivative-free approach for sequentially optimizing stochastic black-box functions. Standard BayesOpt, which has shown many successes in machine learning applications, assumes a finite-dimensional domain which often is a parametric space. The parameter space is defined by the features used in the function approximations, which are often selected manually. Therefore, the performance of BayesOpt inevitably depends on the quality of chosen features. This paper proposes a new Bayesian optimization framework that is able to optimize directly on the domain of function spaces. The resulting framework, Bayesian Functional Optimization (BFO), not only extends the application domains of BayesOpt to functional optimization problems but also relaxes the performance dependency on the chosen parameter space. We model the domain of functions as a reproducing kernel Hilbert space (RKHS), and use the notion of Gaussian processes on a real separable Hilbert space. As a result, we are able to define traditional improvement-based (PI and EI) and optimistic (UCB) acquisition functions as functionals. We propose to optimize the acquisition functionals using analytic functional gradients that are also proved to be functions in an RKHS. We evaluate BFO in three typical functional optimization tasks: i) a synthetic functional optimization problem, ii) optimizing activation functions for a multi-layer perceptron neural network, and iii) a reinforcement learning task whose policies are modeled in an RKHS.

@inproceedings{18-ngo-AAAI,
title = {Bayesian Functional Optimization},
author = {Vien, Ngo Anh and Zimmermann, Heiko and Toussaint, Marc},
booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
year = {2018}
}
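The RKHS machinery BFO builds on can be illustrated in miniature: represent a candidate function by a finite kernel expansion f = Σ_i α_i k(·, c_i), so that a functional gradient becomes an ordinary gradient with respect to the coefficients. The toy functional below — fitting a target function, rather than BFO's acquisition functionals — and all names are illustrative assumptions.

```python
import numpy as np

def k(a, b, ls=0.5):
    """Gaussian (RBF) kernel; its span defines the RKHS."""
    return np.exp(-0.5 * (a - b) ** 2 / ls ** 2)

centers = np.linspace(-1.0, 1.0, 9)      # expansion points c_i
alpha = np.zeros(9)                      # coefficients of f in the RKHS

xs = np.linspace(-1.0, 1.0, 50)          # evaluation grid
Phi = k(xs[:, None], centers[None, :])   # Phi[j, i] = k(xs[j], c_i)
y = np.sin(2 * xs)                       # toy target function

def J(alpha):
    """Toy functional: squared distance of f to the target function."""
    return 0.5 * np.mean((Phi @ alpha - y) ** 2)

# Functional gradient descent: the gradient of J with respect to f,
# pulled back to the coefficients alpha of the kernel expansion.
for _ in range(2000):
    alpha -= 0.2 * Phi.T @ (Phi @ alpha - y) / len(xs)
```

Working with coefficients of a kernel expansion keeps every iterate inside the RKHS, which is what makes the analytic functional gradients in the paper well-defined.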