
Demystifying Bayes’ Decision Rule

7 min read · Jun 12, 2018

Bayesian decision theory is a fundamental statistical approach to classification problems. It uses prior probabilities and class-conditional densities to reach a decision. The Bayes classifier is often described as optimal among all classifiers, in the sense that it achieves the lowest possible probability of error. In simple terms, it aims to minimize the probability of misclassification.
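To make the roles of the priors and the class-conditional densities concrete, here is a minimal sketch of such a decision for two classes and a single feature. It assumes Gaussian class-conditional densities, and the priors, means, and standard deviations are hypothetical values picked only for illustration; the names `gaussian_pdf` and `bayes_decide` are likewise made up for this example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bayes_decide(x, priors, params):
    """Return the index of the class with the largest prior * likelihood.

    The evidence p(x) is identical for every class, so it can be dropped
    when comparing posteriors.
    """
    scores = [p * gaussian_pdf(x, m, s) for p, (m, s) in zip(priors, params)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical values, chosen only for illustration (an assumption).
PRIORS = [0.5, 0.5]                # P(class 1), P(class 2)
PARAMS = [(0.0, 1.0), (2.0, 1.0)]  # (mean, std) of p(x | class)

for x in (-1.0, 0.5, 1.5, 3.0):
    print(x, "-> class", bayes_decide(x, PRIORS, PARAMS) + 1)
```

With equal priors and equal variances, the decision boundary sits midway between the two means, at x = 1. Making one class more probable a priori pushes the boundary toward the other class's mean.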

If the Bayes classifier really is the best classifier for any classification problem, one might ask why we need other classification methods at all. And how do we define “best”? To answer these questions we need to get a little rigorous and see how Bayes’ theorem is built up from first principles.

Since I don’t want generalizations to obscure the central points, I will consider the case of only two classes and one feature. Let Ω ⊆ ℜᵈ be our sample space, the set of all possible feature vectors in a d-dimensional Euclidean space. Here d is the number of features, and in our case d = 1. Let that feature be denoted by x ∈ ℜ. We are concerned with only two classes for now, so we partition our space into two sets, Ω₁ and Ω₂. Since these two sets form a partition of the sample space, they have the following properties:
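For a partition into two sets, these are presumably the standard conditions, written here in LaTeX: the two sets together cover the whole space, and they do not overlap.

```latex
\Omega_1 \cup \Omega_2 = \Omega, \qquad \Omega_1 \cap \Omega_2 = \emptyset
```

In other words, every feature value x belongs to exactly one of Ω₁ and Ω₂.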

It is important to note that the decision D is not symmetric according to our definition. Before things get out…


Written by Anirudh Gupta
