Hybrid machine-learning (ML) methods that combine deep learning with model-based approaches promise the "best of both worlds." While some methods can be combined in a common framework, e.g., mean-field variational message passing and variational autoencoders [1], realizing such hybrid methods is not trivial. One challenge, for example, is the computational complexity of model-based algorithms, which slows down the training of the ML part. The aim of this thesis is therefore to investigate hybrid inference methods that combine deep learning with model-based approaches. One example application is multi-pitch estimation [2]: the signals from tonal instruments can be modeled well as periodic signals (e.g., using a Fourier series), but non-tonal instruments such as a drum kit or other percussive instruments cannot be modeled in the same way.
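To make the periodic-signal model concrete, the following is a minimal sketch assuming the common harmonic model, in which a tonal note is a truncated Fourier series: a sum of L sinusoids at integer multiples l of a fundamental frequency f0, each with its own amplitude and phase. The function name `harmonic_signal` and all parameter values are hypothetical and chosen only for illustration; this is not the model or code used in the thesis.

```python
import math

def harmonic_signal(f0, amplitudes, phases, fs, n_samples):
    """Hypothetical helper: sum of L harmonics of fundamental f0 (Hz), sampled at fs (Hz).

    x[n] = sum_{l=1}^{L} a_l * cos(omega0 * l * n + phi_l), with omega0 = 2*pi*f0/fs.
    """
    omega0 = 2 * math.pi * f0 / fs  # normalized fundamental frequency in rad/sample
    return [
        sum(a * math.cos(omega0 * l * n + phi)
            for l, (a, phi) in enumerate(zip(amplitudes, phases), start=1))
        for n in range(n_samples)
    ]

# Illustrative values only: a 441 Hz tone with three harmonics at fs = 44100 Hz,
# so one period spans exactly fs / f0 = 100 samples.
x = harmonic_signal(441.0, [1.0, 0.5, 0.25], [0.0, 0.3, -1.2], 44100, 300)
```

Because every component is a multiple of the same fundamental, the resulting signal repeats every fs / f0 samples, and a few parameters (f0 plus one amplitude and phase per harmonic) describe it fully. A percussive hit generally lacks such a common fundamental, which is why it does not fit this kind of model.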