We recently developed a state-of-the-art multipitch-detection algorithm [1] that works on a frame-by-frame basis. However, the algorithm's performance is expected to improve if the appearance and disappearance of pitches (i.e., tones) are tracked over multiple frames. The aim of this thesis is therefore to use the frame-by-frame estimates of [1] as a pre-processing step for a multi-object tracking algorithm, e.g., one based on belief propagation (BP) and the sum-product algorithm (SPA) [2].