# Vector representations of categories

After thinking a while over the last point in my list of topics in machine learning I was going to think over...

I finally came to the thought that I’d better use some **graphical model**, since there is a great majority of possible algorithms without any warranty of their workability.

So, first thing — probably one can use LDA (Latent Dirichlet allocation, wiki) to find good representation for categories. In this case we should use bag-of-words model, and `categories` will stand for words.

Yes, the model of LDA doesn’t seem to be appropriate, though I’m sure it will give reasonable representations.

Another graphical model I came with, is much simpler (and probably was developed a long time ago, but I don’t know).

It’s generative model looks looks this:

- First, a `topic` of event is generated. That’s an n-dimensional vector $t_{event} \in \mathbb{R}^n$ drawn from gaussian distribution.
- For each category, say, `site_id`, each value of site_id has it’s own topic $t_{cat\_value}$ of same dimension $n$. The greater dot product $(t_{cat\_value}, t_{event})$, the higher probability that this value of category will be chosen. I shall stress here, that I’m not thinking over the arbitrary number of categories, I’m currently interested in the case, where the number and types of categories are fixed.
- To be more precise, the probability of each value within category to be drawn for the event with $t_{event}$ vector is proportional to $$p(\text{cat\_value} | \text{event}) \sim p(\text{cat\_value}) \times e^{(t_{cat\_value}, t_{event})} $$ so, to compute final probability one shall use softmax function.

PS. After writing this I understood that usage of decision trees to generate ids of leaves — new ’categories’ + usage of LibFM over these new features was a very interesting and good idea (this idea appeared recently in one of kaggle challenges).