I was wondering how one could modify the Expectation-Maximization procedure for fitting mixtures (well, Gaussian mixtures, since that is the only nontrivial yet fairly general multivariate distribution that can be fitted easily) to support really many overlapping components in the mixture.

Randomization may be a solution to this problem. The expectation step of the EM algorithm seems like the right place to introduce it, but in that case the maximization step should receive some inertial, momentum-like corrections.

It is probably a good idea to recall how EM works. Two steps are computed iteratively:

  • (Expectation) we compute the probability that each particular event belongs to each component (i.e. which component ‘explains’ the observation best);
  • (Maximization) given these probabilities, we re-estimate the parameters of each component so that they maximize the likelihood (a minimal code sketch follows this list).
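To make the two steps concrete, here is a minimal sketch of one EM iteration for a Gaussian mixture in Python with numpy/scipy. The function name and array layout are my own choices for illustration, not a reference implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, weights, means, covs):
    """One EM iteration for a Gaussian mixture.

    X: (n_samples, n_features); weights: (n_components,);
    means: (n_components, n_features); covs: (n_components, n_features, n_features).
    """
    n, k = X.shape[0], len(weights)

    # E-step: responsibility of each component for each observation
    resp = np.empty((n, k))
    for j in range(k):
        resp[:, j] = weights[j] * multivariate_normal.pdf(X, means[j], covs[j])
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters from the responsibility-weighted data
    nk = resp.sum(axis=0)                       # effective counts per component
    new_weights = nk / n
    new_means = (resp.T @ X) / nk[:, None]
    new_covs = np.empty_like(covs)
    for j in range(k):
        d = X - new_means[j]
        new_covs[j] = (resp[:, j, None] * d).T @ d / nk[j]
    return new_weights, new_means, new_covs
```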

What if we sample events according to the distribution obtained at the expectation step? At each iteration we would attribute each event to one component of the mixture (in the simplest case), or perhaps to several of them (that is the main difference I want to achieve). This kind of randomization should prevent the component distributions from ‘shrinking’.
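A possible way to realize this, sketched below under my own assumptions (the helper name and the `n_draws` parameter are hypothetical): instead of passing the soft responsibilities to the M-step, sample component assignments from them, with `n_draws > 1` corresponding to attributing an event to several components.

```python
import numpy as np

def sampled_assignments(resp, rng, n_draws=1):
    """Replace soft responsibilities with sampled assignments.

    resp: (n_samples, n_components) responsibilities from the E-step.
    Returns a matrix of the same shape to be used in place of resp in the M-step.
    """
    n, k = resp.shape
    sampled = np.zeros_like(resp)
    for _ in range(n_draws):
        # draw one component per observation according to its responsibilities
        choices = np.array([rng.choice(k, p=resp[i]) for i in range(n)])
        sampled[np.arange(n), choices] += 1.0
    return sampled / n_draws

# usage sketch: rng = np.random.default_rng(0)
# feed sampled_assignments(resp, rng, n_draws=3) into the M-step instead of resp
```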

The core idea I am trying to introduce here is very similar to dropout, a trick that allowed researchers to train neural networks with more parameters than there are observations.