Posts

Thoughts on Principal Components Analysis

This is a post with more questions than answers. I’ve been thinking about Principal Components Analysis (PCA) lately.

Sprinkle some Maximum Likelihood Estimation on that Contingency Table!

Maximum Likelihood Estimation yields consistent estimators and can be computed efficiently under many null hypotheses of practical interest.
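As a rough illustration of what this looks like for a 2x2 contingency table, here is a minimal sketch. It assumes two independent binomial samples and the null hypothesis that both groups share one success rate; the function name and the example counts are hypothetical, not taken from the post.

```python
def binomial_mles(successes_a, trials_a, successes_b, trials_b):
    """Return (p_a, p_b, p_pooled) for two independent binomial samples.

    p_a and p_b are the unrestricted maximum likelihood estimates.
    p_pooled is the maximum likelihood estimate under the assumed null
    hypothesis that both groups share a single success rate.
    """
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    p_pooled = (successes_a + successes_b) / (trials_a + trials_b)
    return p_a, p_b, p_pooled


# Hypothetical counts: 30 of 100 conversions in A, 45 of 100 in B.
p_a, p_b, p_pooled = binomial_mles(30, 100, 45, 100)
```

In this simple case the estimates have closed forms, which is one reason they are cheap to compute. Other null hypotheses may need a numerical optimizer instead.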

Contingency Tables Part II: The Binomial Distribution

In our last post, we introduced the potential outcomes framework as the foundational framework for causal inference. In the potential outcomes framework, each unit (e.g. each person) is represented by a pair of outcomes, corresponding to the result of the experience provided to them (treatment or control, A or B, etc.).

Contingency Tables Part I: The Potential Outcomes Framework

“Why can’t I take the results of an A/B test at face value? Who are you, the statistics mafia? I don’t need a PhD in statistics to know that one number is greater than another.” If this sounds familiar, it is helpful to remember that we do an A/B test to learn about different potential outcomes. Comparing potential outcomes is essential for smart decision making, and this framework is the cornerstone of causal inference.

Unshackle Yourself from Statistical Significance

Don’t be a prisoner of statistical significance. A/B testing should serve the business, not the other way around!

Commit Message Linting with Magit

I have a confession to make. I’ve been writing bad commit messages for years. It takes time to write good commit messages, and often I’m in a hurry. Or so I tell myself. But that’s a false dichotomy. I can have my cake and eat it too! Recently I discovered how to use magit to enforce best practices for commit messages.

Viterbi Algorithm, Part 2: Decoding

This is my second post describing the Viterbi algorithm. As before, our presentation follows Jurafsky and Martin closely, merely filling in some details omitted in the text.

Viterbi Algorithm, Part 1: Likelihood

The Viterbi algorithm finds the most likely sequence of hidden states given a sequence of observations emitted by those states, together with the transition and emission probabilities. It has applications in Natural Language Processing, such as part-of-speech tagging, in error-correcting codes, and more!
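To make the idea concrete, here is a minimal sketch of Viterbi decoding in Python. It assumes a non-empty observation sequence and probabilities stored in plain dictionaries; the function name, the argument layout, and the two-state weather example with its numbers are illustrative assumptions, not the code from the posts.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, probability of that path).

    start_p[s]    : probability of starting in state s
    trans_p[r][s] : probability of moving from state r to state s
    emit_p[s][o]  : probability that state s emits observation o
    Assumes observations is non-empty.
    """
    # best[t][s]: probability of the best path that ends in s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]
    for obs in observations[1:]:
        prev = best[-1]
        scores, pointers = {}, {}
        for s in states:
            prev_state = max(states, key=lambda r: prev[r] * trans_p[r][s])
            scores[s] = prev[prev_state] * trans_p[prev_state][s] * emit_p[s][obs]
            pointers[s] = prev_state
        best.append(scores)
        back.append(pointers)

    # Pick the best final state, then follow the back-pointers.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Illustrative two-state model: hidden weather, observed ice creams eaten.
states = ["HOT", "COLD"]
start_p = {"HOT": 0.8, "COLD": 0.2}
trans_p = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.5, "COLD": 0.5}}
emit_p = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}
path, prob = viterbi([3, 1, 3], states, start_p, trans_p, emit_p)
```

For long sequences the products underflow, so a practical version would usually sum log probabilities instead of multiplying raw ones.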

Minimum Edit Distance

Minimum Edit Distance is defined as the minimum number of edits (delete, insert, replace) needed to transform a source string into a target string. The algorithm uses dynamic programming both to calculate the minimum edit distance and to identify a corresponding sequence of edits.
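A minimal sketch of that dynamic program follows. It assumes every edit (delete, insert, replace) costs 1; some presentations charge 2 for a replace, which would change the distances. The function name and the edit descriptions are hypothetical.

```python
def min_edit_distance(source, target):
    """Return (distance, edits) for turning source into target.

    Assumes unit cost for delete, insert, and replace.
    """
    n, m = len(source), len(target)
    # dist[i][j]: cost of turning source[:i] into target[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # delete
                dist[i][j - 1] + 1,        # insert
                dist[i - 1][j - 1] + sub,  # replace or match
            )

    # Walk back from the corner to recover one optimal sequence of edits.
    edits = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = 1
        if i > 0 and j > 0:
            sub = 0 if source[i - 1] == target[j - 1] else 1
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + sub:
            if sub:
                edits.append(f"replace {source[i - 1]!r} with {target[j - 1]!r}")
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            edits.append(f"delete {source[i - 1]!r}")
            i -= 1
        else:
            edits.append(f"insert {target[j - 1]!r}")
            j -= 1
    edits.reverse()
    return dist[n][m], edits


distance, edits = min_edit_distance("kitten", "sitting")
```

Several edit sequences can achieve the same minimum; the backtrace above returns just one of them, preferring replace or match over delete, and delete over insert.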

Getting Things Done: Projects List and Next Actions

Lately I’ve been practicing David Allen’s “Getting Things Done” framework, which consists of components for getting tasks out of your head and into a system to improve productivity and reduce stress.