If you have studied any theoretical mechanics, you certainly know that the most important transition in mechanics is the one from Lagrangian to Hamiltonian mechanics. Its name is the Legendre transform.

This transition is a kind of magic that, to my mind, is usually presented as a secret recipe. The recipe is
  • take the momenta: $p_i=\frac{\partial L}{\partial \dot{q}_i}$.
  • prove that Hamilton's equations hold: $$\frac{\partial H}{\partial q_j}=-\dot{p}_j,\qquad\frac{\partial H}{\partial p_j}=\dot{q}_j$$ where  $$ H\left(q,\;p,\;t\right)=\sum_i\dot{q}_i p_i-L(q,\;\dot{q},\;t) $$ with $\dot{q}$ expressed through $q$, $p$ and $t$.
And that is the whole idea. But why should I take this Hamiltonian and not some other function? Why should I take derivatives of the Lagrangian as the new variables instead of $\dot{q}_i$? I never found the answers in mechanics courses, though there is a simple intuitive justification. It was Pasha Gavrilenko who told me about it.

What do we have initially? An action on some interval of time, which is the integral
$$ S = \min_{q(\cdot)} \int_{t_1}^{t_2}  L(q, \dot{q}, t)\, dt$$ that should be minimized (locally); that is what Hamilton's principle states. (There are conditions on the endpoints, which I will omit.)

OK, the only trouble is that $q$ and $\dot{q}$ aren't independent; otherwise the Lagrange equations would be much simpler: $ \frac{\partial L}{\partial q} = \frac{\partial L}{\partial \dot{q}} = 0 $

Let us try to replace $\dot{q}$ with $v$, assuming they are equal: $$ S = \min_{q(\cdot)} \int_{t_1}^{t_2}  L(q, v, t)\bigg|_{v=\dot q} dt$$

Hmm. It seems nothing has changed. Now the trick. Let's add the summand
$$\delta[q,v] = \begin{cases} 0 & \dot{q}(t) = v(t) \text{ for all } t \\ +\infty & \text{otherwise} \end{cases}$$ and now we can minimize over all possible trajectories of $q$ and of $v$.  $$ S = \min_{q(\cdot), v(\cdot)} \left[ \int_{t_1}^{t_2}  L(q, v, t) dt + \delta[q,v] \right] $$

See? Now $q$ and $v$ are independent, at the cost of an additional summand. We can write $\delta$ in the following form (make sure you understand it):
$$\delta[q,v] = \max_{p(\cdot)} \int_{t_1}^{t_2}  p(t) (\dot{q}(t) - v(t)) dt $$
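One way to convince yourself, as a sketch that ignores the question of which function space $p$ lives in: if $\dot{q} = v$ everywhere, the integrand vanishes for every $p$, so the maximum is $0$. Otherwise, try the family $p = \lambda(\dot{q} - v)$ with $\lambda > 0$ (my choice for illustration, assuming $\dot{q} - v$ is continuous and not identically zero):
$$ \int_{t_1}^{t_2} p(t)\,(\dot{q}(t) - v(t))\, dt = \lambda \int_{t_1}^{t_2} (\dot{q}(t) - v(t))^2\, dt \;\to\; +\infty \quad \text{as } \lambda \to +\infty $$
since the remaining integral is strictly positive. So the maximum is $+\infty$ exactly when the constraint is violated somewhere.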

After substitution we get the problem of finding a saddle point of the functional:  $$ S = \min_{q(\cdot), v(\cdot)} \max_{p(\cdot)} \int_{t_1}^{t_2}  \left[ p(t) (\dot{q}(t) - v(t)) +  L(q, v, t) \right] dt $$ Note that all the variables $p,q,v$ are independent now. The solution we need is the trajectory $q(\cdot)$, but it comes with corresponding trajectories $v(\cdot)$ and $p(\cdot)$, which form a saddle point together with $q(\cdot)$.

As we know, at a saddle point all the partial derivatives are zero (assuming the function is differentiable). Calculating the variational derivatives with respect to $p,v,q$ gives, respectively, $$ \begin{aligned} \dot{q} &= v \\ p &= \frac{\partial L}{\partial v} \\ \dot{p} &= \frac{\partial L}{\partial q} \end{aligned} $$
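For the record, here is a sketch of that calculation, assuming the endpoint values of $q$ are fixed (so $\delta q(t_1)=\delta q(t_2)=0$) and writing $\frac{\partial L}{\partial v}$ for the derivative of $L$ in its second argument. The first variation of the integral is
$$ \delta \int_{t_1}^{t_2} \left[ p(\dot{q} - v) + L(q,v,t) \right] dt = \int_{t_1}^{t_2} \left[ (\dot{q} - v)\,\delta p + \left(\frac{\partial L}{\partial v} - p\right)\delta v + \left(\frac{\partial L}{\partial q} - \dot{p}\right)\delta q \right] dt + p\,\delta q\,\bigg|_{t_1}^{t_2} $$
where the $\dot{p}$ term comes from integrating $p\,\delta\dot{q}$ by parts. The boundary term vanishes, and since $\delta p$, $\delta v$, $\delta q$ are arbitrary, each bracket must vanish separately. Eliminating $v$ and $p$ from the three resulting equations returns the usual Lagrange equation $\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} = \frac{\partial L}{\partial q}$.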
Note that the energy function $H(q,p,t)$ also appeared in a natural way, as did the least action principle of Hamiltonian mechanics: $$ \begin{aligned} p(t) (\dot{q}(t) - v(t)) +  L(q, v, t) &= p(t) \dot{q}(t) - \left[  p(t)v(t) - L(q,v,t)  \right] \\ &= \{\text{changing the variables} \} =  p(t) \dot{q}(t) - H(q,p,t) \end{aligned} $$ Here changing the variables means expressing $v$ through $q$, $p$ and $t$ using the relation between $p$ and the derivative of $L$ found above.
You may have noticed that all I did was add a Lagrange multiplier to make the condition $\dot{q} = v$ hold.
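The bracket $p v - L$ can also be checked numerically. Below is a minimal Python sketch, assuming a harmonic oscillator $L(q,v) = \frac{m v^2}{2} - \frac{k q^2}{2}$ with made-up values of $m$ and $k$ (my example, not part of the argument above; the function names are hypothetical too). It maximizes $p v - L$ over $v$ at fixed $q$ and $p$, which is the pointwise version of minimizing over $v$ in the saddle problem, and compares the result with the textbook $H = \frac{p^2}{2m} + \frac{k q^2}{2}$.

```python
# Numerical Legendre transform: H(q, p) = max over v of [p*v - L(q, v)].
# Assumed example system (not from the text): a harmonic oscillator.

M = 2.0  # mass, hypothetical value
K = 3.0  # spring constant, hypothetical value


def lagrangian(q, v):
    """L(q, v) = M*v^2/2 - K*q^2/2 for the assumed oscillator."""
    return 0.5 * M * v * v - 0.5 * K * q * q


def legendre_transform(q, p, v_lo=-100.0, v_hi=100.0, iterations=200):
    """Maximize p*v - L(q, v) over v in [v_lo, v_hi] by ternary search.

    This works because L is convex in v, so p*v - L is concave in v.
    Returns (H, v_star), where v_star is the maximizing velocity.
    """
    def objective(v):
        return p * v - lagrangian(q, v)

    lo, hi = v_lo, v_hi
    for _ in range(iterations):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if objective(a) < objective(b):
            lo = a  # the maximum lies to the right of a
        else:
            hi = b  # the maximum lies to the left of b
    v_star = 0.5 * (lo + hi)
    return objective(v_star), v_star


def hamiltonian_exact(q, p):
    """Closed-form H(q, p) = p^2/(2M) + K*q^2/2 for comparison."""
    return p * p / (2.0 * M) + 0.5 * K * q * q


if __name__ == "__main__":
    q, p = 0.7, 1.5
    h_num, v_star = legendre_transform(q, p)
    print("numerical H:", h_num, "exact H:", hamiltonian_exact(q, p))
    print("maximizing v:", v_star, "expected p/M:", p / M)
```

The maximizing velocity should agree with $v = p/m$, which is just the condition $p = \frac{\partial L}{\partial v}$ solved for $v$. For a Lagrangian that is not convex in $v$ the ternary search would no longer be justified.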

Seen this way, the Legendre transformation looks more accessible to me. From that moment on I understood that Lagrange multipliers are a very powerful tool.