Essentially, the Viterbi algorithm computes the shortest path through a DAG. If you explicitly model each transition/emission event as a single node in a graph, and if you then convert all probabilities p to -log p, then any shortest path algorithm will find the same result as Viterbi: the path of maximum likelihood.
This is easy to prove: for probabilities p_1, p_2, p_3, ..., the path which maximizes p_1 * p_2 * p_3 * ... also maximizes log(p_1 * p_2 * p_3 * ...) = log(p_1) + log(p_2) + log(p_3) + ..., and thus minimizes -(log(p_1) + log(p_2) + log(p_3) + ...) = -log(p_1) - log(p_2) - log(p_3) - .... Because the probabilities all satisfy 0 <= p <= 1, their logs are <= 0, so choosing -log(p) as an edge weight gives non-negative weights (edges with p = 0 can simply be left out of the graph). So you can use any standard shortest path algorithm to minimize -log(p_1) - log(p_2) - log(p_3) - ... and thus maximize p_1 * p_2 * p_3 * ...
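A quick sketch of that equivalence, with made-up toy numbers rather than a real HMM: the path with the largest probability product is exactly the path with the smallest sum of -log p weights.

```python
import math

# Two hypothetical candidate paths, each a list of per-edge probabilities
paths = {
    "A": [0.9, 0.5, 0.8],  # product = 0.36
    "B": [0.7, 0.7, 0.7],  # product = 0.343
}

# Maximize the product of probabilities ...
best_by_product = max(paths, key=lambda k: math.prod(paths[k]))

# ... or equivalently minimize the sum of -log p (non-negative edge weights)
best_by_neglog = min(paths, key=lambda k: sum(-math.log(p) for p in paths[k]))

# both criteria pick the same path
assert best_by_product == best_by_neglog
```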
The Viterbi algorithm uses the same "trick" as the standard algorithm for finding shortest paths through a DAG with better complexity than Dijkstra's algorithm: you can just process the nodes in topological order (which can be found in linear time), instead of processing them based on their position in a priority queue (as in Dijkstra's algorithm). Even better, in an HMM, the topological ordering of the graph is already part of the input: it's just the sequence of observations.
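As a concrete sketch: here is a minimal bottom-up Viterbi decoder that processes the trellis in time order, i.e. in topological order. The weather-style parameters at the bottom are made up purely for illustration.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for `obs`.

    Nodes (s, t) are processed in topological order, which for an
    HMM trellis is just increasing t: the order of the observations.
    """
    # V[t][s] = log-probability of the best path ending in state s at time t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # best predecessor over all input arcs into node (s, t)
            prev, lp = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1],
            )
            V[t][s] = lp + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # trace back from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Made-up toy HMM: two hidden weather states, three observations
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
path = viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
```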
There was a pretty funny, to me and my buddy, moment in our Digital Communications course when we were learning this. We were both dual EE/CS students while the rest of the class was straight EE. The professor is drawing out the trellis diagrams and going through this elaborate description of things and most people (including me and my buddy) were completely lost. At some point, though, he and I both had the same lightbulb moment. We turned to each other and simultaneously just whispered “it’s a dynamic programming problem”. And then it was all easy from there.
Doesn't the algorithm work on directed graphs, rather than only DAGs, i.e., also on cyclic ones? That's a difference with Dijkstra, where cycles are excluded from the solution. Plus, it takes the graph and a sequence of transitions.
> Doesn't the algorithm work on directed graphs, rather than only DAGs, i.e., also on cyclic ones? That's a difference with Dijkstra, where cycles are excluded from the solution
With the Viterbi algorithm, you're decoding a sequence of observations, indexed by observation time. At each timestep there's an array of probabilities indexed by the hidden states. State s at time t has some probability of transitioning to state s' at time t+1, defining an arc from the node (s, t) to the node (s', t+1). That defines the "trellis" structure. Since time doesn't loop back on itself, an arc can never point back to an earlier time, so there aren't any cycles in the trellis.
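A tiny illustration of that trellis structure, with a hypothetical 2-state system over 3 timesteps: every arc goes from time t to time t+1, so the graph is acyclic by construction.

```python
# Hypothetical trellis: nodes are (state, time) pairs; arcs connect each
# node at time t to every node at time t+1.
states = ["s0", "s1"]
T = 3
arcs = [((s, t), (s2, t + 1))
        for t in range(T - 1) for s in states for s2 in states]

# every arc strictly increases t, hence no arc can close a cycle
assert all(src[1] < dst[1] for src, dst in arcs)
```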
As lqet explains, it's essentially computing a shortest path through the DAG.
The Viterbi algorithm could be reimplemented as Dijkstra's algorithm. But that wouldn't gain anything, and it'd make it harder to compute efficiently -- with Dijkstra's algorithm there's a sequential dependency: you pop a minimal-length path off the priority queue and expand it before you're able to pop and start processing the next path (as the path you're expanding might push the next minimal-length path onto the queue). The bottom-up Viterbi computation is closer to a BFS, where you could process the whole layer of arcs corresponding to timestep t in parallel -- apart from the argmin over input arcs at each node (s, t+1) in the next layer.
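A small sketch of that layer-parallel structure, with made-up log-probabilities: each state's score in the next layer is an independent max over its input arcs, so nothing stops you from computing every state of layer t+1 at once.

```python
# Hypothetical one-layer Viterbi update in log space.
scores_t = {"s0": -1.0, "s1": -2.0}  # best log-prob per state at time t
trans = {("s0", "s0"): -0.5, ("s0", "s1"): -1.5,   # log P(s -> s')
         ("s1", "s0"): -1.0, ("s1", "s1"): -0.3}

def update(s_next):
    # max over input arcs into node (s_next, t+1); depends only on the
    # previous layer, not on other states of the new layer
    return max(scores_t[s] + trans[(s, s_next)] for s in scores_t)

# each entry is independent, so this dict comprehension could be a
# parallel map over states
scores_t1 = {s: update(s) for s in scores_t}
```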
Another small difference is that the Viterbi algorithm deals with node weights as well as arc weights, corresponding to the probability of emitting the observation y_t at timestep t, supposing the system had been in the hidden state s_i at time t. I think that can be dealt with by absorbing the node weights into the arc weights -- each weight is a log probability, and log probabilities can be added.
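Concretely, with made-up numbers: fold the emission (node) log-probability at (s', t) into every arc that enters that node, and the combined value is still a single non-negative shortest-path weight.

```python
import math

# Hypothetical arc s -> s' and emission at the destination node
trans_logp = math.log(0.7)  # log P(s -> s'), the arc weight
emit_logp = math.log(0.5)   # log P(y_t | s'), the "node weight" at (s', t)

# absorb the node weight into the arc weight; negate for shortest-path form
arc_weight = -(trans_logp + emit_logp)

# same as weighting the arc by the product of the two probabilities
assert math.isclose(arc_weight, -math.log(0.7 * 0.5))
```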