About me
I am a PhD student at the Amsterdam Machine Learning Lab (AMLab). I am interested in probabilistic modeling, approximate inference, and ways to automate these tasks using probabilistic programming systems.
Publications
@inproceedings{esmaeili2023topological,
  title     = {Topological Obstructions and How to Avoid Them},
  author    = {Esmaeili, Babak and Walters, Robin and Zimmermann, Heiko and van de Meent, Jan-Willem},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2023}
}
Generative flow networks (GFNs) are a class of probabilistic models for sequential sampling of composite objects, proportional to a target distribution that is defined in terms of an energy function or a reward. GFNs are typically trained using a flow matching or trajectory balance objective, which matches forward and backward transition models over trajectories. In this work we introduce a variational objective for training GFNs, which is a convex combination of the reverse and forward KL divergences, and compare it to the trajectory balance objective when sampling from the forward and backward model, respectively. We show that, in certain settings, variational inference for GFNs is equivalent to minimizing the trajectory balance objective, in the sense that both methods compute the same score-function gradient. This insight suggests that, in these settings, control variates, which are commonly used to reduce the variance of score-function gradient estimates, can also be used with the trajectory balance objective. We evaluate our findings and the performance of the proposed variational objective numerically by comparing it to the trajectory balance objective on two synthetic tasks.
@article{zimmermann2023variational,
  title   = {A Variational Perspective on Generative Flow Networks},
  author  = {Zimmermann, Heiko and Lindsten, Fredrik and van de Meent, Jan-Willem and Naesseth, Christian A},
  journal = {Transactions on Machine Learning Research},
  issn    = {2835-8856},
  year    = {2023},
  code    = {https://github.com/zmheiko/variational-perspective-on-gflownets}
}
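To make the comparison in the abstract above concrete, here is the objective in schematic form, using the standard GFN conventions (the notation is assumed here, not copied from the paper). With P_F the forward transition model, P_B the backward model, R the reward, and Z a learned normalization constant, the variational objective is a convex combination of two KL divergences,

  \mathcal{L}_\lambda = \lambda \, \mathrm{KL}(P_B \,\|\, P_F) + (1 - \lambda) \, \mathrm{KL}(P_F \,\|\, P_B), \qquad \lambda \in [0, 1],

while the trajectory balance objective for a complete trajectory \tau = (s_0 \to \dots \to s_n = x) is

  \mathcal{L}_{\mathrm{TB}}(\tau) = \left( \log \frac{Z \prod_t P_F(s_{t+1} \mid s_t)}{R(x) \prod_t P_B(s_t \mid s_{t+1})} \right)^2.

The equivalence result concerns the score-function gradients of these two objectives in certain settings.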
Accurate epidemiological models require parameter estimates that account for mobility patterns and social network structure. We demonstrate the effectiveness of probabilistic programming for parameter inference in these models. We consider an agent-based simulation that represents mobility networks as degree-corrected stochastic block models, whose parameters we estimate from cell phone co-location data. We then use probabilistic program inference methods to approximate the distribution over disease transmission parameters conditioned on reported cases and deaths. Our experiments demonstrate that the resulting models improve the quality of fit in multiple geographies relative to baselines that do not model network topology.
@article{smedemark2022probabilistic,
  title     = {Probabilistic program inference in network-based epidemiological simulations},
  author    = {Smedemark-Margulies, Niklas and Walters, Robin and Zimmermann, Heiko and Laird, Lucas and van der Loo, Christian and Kaushik, Neela and Caceres, Rajmonda and van de Meent, Jan-Willem},
  journal   = {PLOS Computational Biology},
  volume    = {18},
  number    = {11},
  year      = {2022},
  publisher = {Public Library of Science}
}
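As a toy illustration of the inference pattern described above, namely conditioning a transmission-rate parameter on reported case counts with a probabilistic programming system, here is a minimal Pyro sketch. The deterministic SIR-style dynamics, the priors, and all constants are illustrative assumptions, not the agent-based network model from the paper.

import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

def model(observed_cases, population=10000.0):
    # Prior over the transmission rate; the LogNormal choice is illustrative.
    beta = pyro.sample("beta", dist.LogNormal(-1.0, 0.5))
    gamma = 0.1                      # assumed fixed recovery rate
    s, i = population - 1.0, 1.0     # susceptible / infected counts
    for t in range(len(observed_cases)):
        new_infections = beta * s * i / population
        s = s - new_infections
        i = i + new_infections - gamma * i
        # Condition on reported cases through a Poisson observation model.
        pyro.sample(f"obs_{t}", dist.Poisson(new_infections.clamp(min=1e-3)),
                    obs=observed_cases[t])

cases = torch.tensor([2., 3., 5., 8., 12.])
mcmc = MCMC(NUTS(model), num_samples=300, warmup_steps=300)
mcmc.run(cases)
print("posterior mean beta:", mcmc.get_samples()["beta"].mean().item())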
We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing a forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonly used importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments apply NVI to (a) sample from a multimodal distribution using a learned annealing path, (b) learn heuristics that approximate the likelihood of future observations in a hidden Markov model, and (c) perform amortized inference in hierarchical deep generative models. We observe that optimizing nested objectives leads to improved sample quality in terms of log average weight and effective sample size.
@inproceedings{zimmermann2021nested,
  title     = {Nested Variational Inference},
  author    = {Zimmermann, Heiko and Wu, Hao and Esmaeili, Babak and van de Meent, Jan-Willem},
  booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
  year      = {2021}
}
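Schematically, and in assumed notation rather than the paper's exact extended-space construction: given a sequence of intermediate densities \pi_0, \dots, \pi_K bridging an initial proposal to the target, forward kernels q_k, and reverse kernels r_k, NVI minimizes one divergence per level of nesting,

  \mathcal{L} = \sum_{k=1}^{K} D\big( \pi_k(z_k)\, r_k(z_{k-1} \mid z_k) \;\big\|\; \pi_{k-1}(z_{k-1})\, q_k(z_k \mid z_{k-1}) \big),

where D is a forward or reverse KL divergence. Learning the intermediate \pi_k alongside the kernels is what yields the heuristics mentioned in the abstract.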
We develop operators for the construction of proposals in probabilistic programs, which we refer to as inference combinators. Inference combinators define a grammar over importance samplers that compose primitive operations such as application of transition kernels and importance resampling. Proposals in these samplers can be parameterized using neural networks, which in turn can be trained by optimizing variational objectives. The result is a framework for user-programmable variational methods that are correct by construction and can be tailored to specific models. We demonstrate the flexibility of this framework in applications to advanced variational methods based on amortized Gibbs sampling and annealing.
@inproceedings{stites2021combinators,
  title     = {Learning Proposals for Probabilistic Programs with Inference Combinators},
  author    = {Zimmermann, Heiko and Stites, Sam and Wu, Hao and Sennesh, Eli and van de Meent, Jan-Willem},
  booktitle = {37th Conference on Uncertainty in Artificial Intelligence (UAI)},
  year      = {2021}
}
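The compositional idea can be sketched in a few lines of Python. This is a toy grammar with hypothetical names (propose, resample) over samplers that return a value and a log weight; it is not the API of the paper's implementation.

import math, random

def log_target(x):
    # Unnormalized target density: a hypothetical example.
    return -0.5 * (x - 2.0) ** 2

def prior_sampler():
    # Primitive sampler: draw from N(0, 1) and report its log density.
    x = random.gauss(0.0, 1.0)
    return x, -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def propose(log_p, sampler):
    # Combinator: treat `sampler` as a proposal for the target `log_p`,
    # producing a properly weighted sample.
    def run():
        x, log_q = sampler()
        return x, log_p(x) - log_q
    return run

def resample(sampler, n=100):
    # Combinator: importance resampling over n particles; the new
    # log weight is the log average weight of the particle population.
    def run():
        particles = [sampler() for _ in range(n)]
        weights = [math.exp(lw) for _, lw in particles]
        x = random.choices([v for v, _ in particles], weights=weights)[0]
        return x, math.log(sum(weights) / n)
    return run

# Composed sampler, built by nesting combinators.
sampler = resample(propose(log_target, prior_sampler))
x, log_w = sampler()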
We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing an inclusive or exclusive KL divergence at each level of nesting. NVI is applicable to many commonly used importance sampling strategies and additionally provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments apply NVI to learn samplers targeting (a) an unnormalized density using annealing, and (b) the posterior of a hidden Markov model. We observe improved sample quality in terms of log average weight and effective sample size.
@article{zimmermann2021aabi,
  title   = {Nested Variational Inference},
  author  = {Zimmermann, Heiko and Wu, Hao and Esmaeili, Babak and Stites, Sam and van de Meent, Jan-Willem},
  journal = {Symposium on Advances in Approximate Bayesian Inference (AABI)},
  year    = {2021}
}
Amortized variational methods have proven difficult to scale to structured problems, such as inferring positions of multiple objects from video images. We develop amortized population Gibbs (APG) samplers, a class of scalable methods that frames structured variational inference as adaptive importance sampling. APG samplers construct high-dimensional proposals by iterating over updates to lower-dimensional blocks of variables. We train each conditional proposal by minimizing the inclusive KL divergence with respect to the conditional posterior. To appropriately account for the size of the input data, we develop a new parameterization in terms of neural sufficient statistics. Experiments show that APG samplers can train highly structured deep generative models in an unsupervised manner, and achieve substantial improvements in inference accuracy relative to standard autoencoding variational methods.
@inproceedings{wu2019amortized,
  title     = {Amortized Population Gibbs Samplers with Neural Sufficient Statistics},
  author    = {Wu, Hao and Zimmermann, Heiko and Sennesh, Eli and Le, Tuan Anh and van de Meent, Jan-Willem},
  booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
  year      = {2020}
}
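A schematic of the block-wise sweep described above, under assumed interfaces (a log_joint function for the model, and one learned conditional proposal per block with .sample() and .log_prob() methods, e.g. torch distributions); this mirrors the structure of an APG-style sweep, not the authors' implementation.

def apg_sweep(x, z_init, proposals, log_joint, num_sweeps=5):
    """One APG-style pass: update each block of latents in turn.

    proposals: dict mapping block name k to a function
               q_k(x, z_rest) -> distribution with .sample()/.log_prob().
    """
    z = dict(z_init)
    for _ in range(num_sweeps):
        for k, q_k in proposals.items():
            z_rest = {n: v for n, v in z.items() if n != k}
            q = q_k(x, z_rest)            # learned conditional proposal
            z_new = q.sample()
            # Incremental importance weight for the block update; during
            # training, this is the signal for the inclusive-KL objective.
            log_w = (log_joint(x, {**z_rest, k: z_new})
                     - log_joint(x, z)
                     + q.log_prob(z[k]) - q.log_prob(z_new))
            z[k] = z_new
    return z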
Human arm movements are highly stereotypical under a large variety of experimental conditions. This is striking given the high redundancy of the human musculoskeletal system, which in principle allows many possible trajectories toward a goal. Many researchers hypothesize that through evolution, learning, and adaptation, the human system has developed optimal control strategies to select between these possibilities. Various optimality principles have been proposed in the literature that reproduce human-like trajectories in certain conditions. However, these studies often focus on a single cost function and use simple torque-driven models of motion generation, which are not consistent with human muscle-actuated motion. The underlying structure of the human system, with muscle dynamics interacting with the control principles, might have a significant influence on which optimality principles best model human motion. To investigate this hypothesis, we consider a point-to-manifold reaching task that leaves the target underdetermined. Given hypothesized motion objectives, the control input is generated using Bayesian optimization, a machine-learning-based method that trades off exploration and exploitation. Using numerical simulations with Hill-type muscles, we show that a combination of optimality principles best predicts human point-to-manifold reaching when accounting for muscle dynamics.
@article{20-wochner-frontiers,
  title   = {Optimality Principles in Human Point-to-Manifold Reaching Accounting for Muscle Dynamics},
  author  = {Wochner, Isabell and Driess, Danny and Zimmermann, Heiko and Haeufle, Daniel F. B. and Toussaint, Marc and Schmitt, Syn},
  journal = {Frontiers in Computational Neuroscience},
  volume  = {14},
  pages   = {38},
  year    = {2020}
}
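For readers unfamiliar with the optimizer used above: a generic Bayesian optimization loop with a Gaussian-process surrogate and an expected-improvement acquisition looks roughly as follows. The one-dimensional cost function here is a made-up stand-in, not the reaching simulation from the paper.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expected_improvement(X_cand, gp, f_best, xi=0.01):
    # EI for minimization: how much each candidate is expected to
    # improve on the best cost observed so far.
    mu, sigma = gp.predict(X_cand, return_std=True)
    z = (f_best - mu - xi) / np.maximum(sigma, 1e-9)
    return (f_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def cost(u):                       # hypothetical black-box objective
    return (np.sin(3 * u) + 0.5 * u) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(3, 1))          # initial design
y = cost(X).ravel()
for _ in range(20):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                                  normalize_y=True).fit(X, y)
    X_cand = np.linspace(-2, 2, 400).reshape(-1, 1)
    ei = expected_improvement(X_cand, gp, y.min())
    x_next = X_cand[np.argmax(ei)]           # exploration/exploitation trade-off
    X = np.vstack([X, x_next])
    y = np.append(y, cost(x_next)[0])
print("best control:", X[np.argmin(y)], "cost:", y.min())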
Modeling biomechanical musculoskeletal systems reveals that the mapping from muscle stimulations to movement dynamics is highly nonlinear and complex, which makes it difficult to control those systems with classical techniques. In this work, we investigate not only whether machine learning approaches are capable of learning a controller for such systems, but especially whether the structure of the musculoskeletal apparatus exhibits properties that are favorable for the learning task. In particular, we consider learning a control policy from target positions to muscle stimulations. To account for the high actuator redundancy of biomechanical systems, our approach uses a learned forward model, represented by a neural network, together with sequential quadratic programming to obtain the control policy. This also enables us to vary the co-contraction level, and hence to change the stiffness of the system and to include optimality criteria such as small muscle stimulations. Experiments on both a simulated musculoskeletal model of a human arm and a real biomimetic muscle-driven robot show that our approach is able to learn an accurate controller despite high redundancy and nonlinearity, while retaining sample efficiency.
@inproceedings{18-driess-ICRA,
  title     = {Learning to Control Redundant Musculoskeletal Systems with Neural Networks and {SQP}: Exploiting Muscle Properties},
  author    = {Driess, Danny and Zimmermann, Heiko and Wolfen, Simon and Suissa, Dan and Haeufle, Daniel and Hennes, Daniel and Toussaint, Marc and Schmitt, Syn},
  booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2018}
}
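A compact sketch of the pipeline described above, with a toy stand-in for the musculoskeletal dynamics: fit a neural forward model from muscle stimulations to an end-effector coordinate, then solve for stimulations with an SQP method (here SciPy's SLSQP), including a small-stimulation penalty and bound constraints. Everything numeric below is an illustrative assumption.

import numpy as np
from scipy.optimize import minimize
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
U = rng.uniform(0, 1, size=(2000, 3))              # stimulations in [0, 1]
Y = np.tanh(U @ np.array([[1.0], [-0.6], [0.3]]))  # toy "dynamics", 1-D output
forward = MLPRegressor(hidden_layer_sizes=(32, 32),
                       max_iter=2000).fit(U, Y.ravel())

target, lam = 0.25, 0.05
def objective(u):
    # Reaching accuracy plus a penalty favoring small muscle stimulations.
    err = forward.predict(u.reshape(1, -1))[0] - target
    return err ** 2 + lam * np.sum(u ** 2)

res = minimize(objective, x0=np.full(3, 0.5), method="SLSQP",
               bounds=[(0, 1)] * 3)                # stimulations are bounded
print("stimulations:", res.x,
      "reached:", forward.predict(res.x.reshape(1, -1))[0])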
Bayesian optimization (BayesOpt) is a derivative-free approach for sequentially optimizing stochastic black-box functions. Standard BayesOpt, which has shown many successes in machine learning applications, assumes a finite-dimensional domain, which is often a parametric space. The parameter space is defined by the features used in the function approximations, which are often selected manually. Therefore, the performance of BayesOpt inevitably depends on the quality of the chosen features. This paper proposes a new Bayesian optimization framework that is able to optimize directly over function spaces. The resulting framework, Bayesian Functional Optimization (BFO), not only extends the application domains of BayesOpt to functional optimization problems but also relaxes the performance dependency on the chosen parameter space. We model the domain of functions as a reproducing kernel Hilbert space (RKHS) and use the notion of Gaussian processes on a real separable Hilbert space. As a result, we are able to define traditional improvement-based (PI and EI) and optimistic (UCB) acquisition functions as functionals. We propose to optimize the acquisition functionals using analytic functional gradients, which are proved to also be functions in an RKHS. We evaluate BFO on three typical functional optimization tasks: i) a synthetic functional optimization problem, ii) optimizing activation functions for a multi-layer perceptron neural network, and iii) a reinforcement learning task whose policies are modeled in an RKHS.
@inproceedings{18-ngo-AAAI,
  title     = {Bayesian Functional Optimization},
  author    = {Vien, Ngo Anh and Zimmermann, Heiko and Toussaint, Marc},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
  year      = {2018}
}
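In schematic notation (assumed here, not quoted from the paper): with posterior mean \mu_n and standard deviation \sigma_n of the Gaussian process over the function space, the UCB acquisition functional and its optimization by functional gradient ascent read

  \alpha_{\mathrm{UCB}}(f) = \mu_n(f) + \sqrt{\beta_n}\, \sigma_n(f), \qquad f_{t+1} = f_t + \eta \, \nabla \alpha_{\mathrm{UCB}}(f_t),

where, per the result mentioned in the abstract, the functional gradient \nabla \alpha(f) is itself a function in the RKHS and can therefore be represented in terms of the kernel.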