Probability Models and Their Real-Life Applications with Data

Aarushi Ramesh
Published in Analytics Vidhya
4 min read · Jan 27, 2020

A reasonable probability is the only certainty. — E.W. Howe

The best way to describe probability is as a combination of very interesting concepts, with many things to keep in mind (such as the independence of a variable, or patterns and connections between two variables). I feel like probability has always been the math that's different, in the sense that it has so many real-world applications and can be used to model almost any problem!

Probability Models and PMFs

Whenever we consider the probability that a certain event occurs, we are also considering the total number of possible outcomes. How can we lay out the possible outcomes of a particular situation (e.g., flipping a fair coin n times, or counting k successes in t trials) and associate certain values with them, for example the number 1 to represent heads and the number 0 to represent tails? (Yeah, that was the longest question in the world, lol.)

To answer this question, we need to learn about the Probability Mass Function, aka PMF. The PMF basically maps certain values to their probabilities in an experiment or situation. But what are those 'certain values'? They are the values taken by a random variable, or more precisely a discrete random variable. A random variable assigns a number to each outcome of an experiment. We use a capital letter to represent a random variable.

So for example, let's say the experiment is to test 10 different circuits and see whether each one works (a success, denoted s) or needs improvement (an error, denoted e). Every observation in this case is a sequence of 10 letters (s or e), so the sample space, the set of all possible outcomes (sequences), has 2¹⁰ = 1024 elements. The random variable in this case, let's call it K, can be the number of successful circuits in a sequence. So for the outcome sssssseeee, K = number of successes = 6. Since each sequence has 10 letters, the range of K is the integers from 0 to 10. K is a discrete random variable, since the values in its range can be listed (that would still be true if the list were infinitely long).
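As a minimal sketch of the circuit example, the sample space can be enumerated directly and the random variable written as an ordinary function. The names `sample_space` and `K` are chosen here for illustration and are not from the original post.

```python
from itertools import product

# Sample space: every length-10 sequence of 's' (success) and 'e' (error).
sample_space = ["".join(seq) for seq in product("se", repeat=10)]

def K(outcome: str) -> int:
    """Random variable K: the number of successful circuits in one outcome."""
    return outcome.count("s")

# The range of K is the set of values it takes over the whole sample space.
range_of_K = sorted({K(outcome) for outcome in sample_space})

print(len(sample_space))   # number of possible sequences
print(K("sssssseeee"))     # value of K for one particular outcome
print(range_of_K)          # all values K can take
```

Running it should report 1024 sequences, K = 6 for sssssseeee, and a range of the integers 0 through 10.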

Goal Scoring Problem

OK, let's look at another example with PMFs and random variables (RVs). Suppose you are playing soccer and have two free kicks. A free kick can lead to two possibilities: a score (in the goal, denoted S) or no score (denoted N). What is the PMF of the random variable G, the number of free kicks scored?

We can make a table that lists each outcome, its probability, and the corresponding value of the random variable G! Since we are assuming that each of the four outcomes (SS, SN, NS, NN) is equally likely, the probability of scoring on the first try and missing on the second is just 1/2 × 1/2 = 1/4. The same holds for every other outcome.

Since the random variable G has three possible values {0, 1, 2}, the probabilities of these three values are: P[G=0] = 1/4, P[G=1] = 1/4 + 1/4 = 1/2 (from SN and NS), and P[G=2] = 1/4. One way of representing the PMF is by a plot or graph.

Note: The PMF is written as a P with the random variable as a subscript, followed by a lowercase version of the random variable in brackets, for example P_G(g). This notation reads as "the probability that the random variable G equals some value g". So g is an actual number in the range of G, the random variable.
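The goal-scoring PMF can also be checked by brute force. The sketch below assumes, as the example does, that the four outcomes are equally likely. The variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All outcomes of two free kicks: SS, SN, NS, NN (S = score, N = no score).
outcomes = ["".join(kicks) for kicks in product("SN", repeat=2)]

# Assumption from the example: every outcome is equally likely.
prob_each = Fraction(1, len(outcomes))

# Build the PMF of G by adding up the probability of each outcome
# that leads to the same number of goals g.
pmf_G = {}
for outcome in outcomes:
    g = outcome.count("S")
    pmf_G[g] = pmf_G.get(g, Fraction(0)) + prob_each

for g in sorted(pmf_G):
    print(f"P_G({g}) = {pmf_G[g]}")
```

Using `Fraction` keeps the probabilities exact, so they can be compared with the hand-computed values and checked to sum to 1.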

Binary Symmetric Channel

OK, this is a trickier problem, and what I really liked about it is that it has such a great connection with electrical engineering.

So, we are sending 1000 bits through a data channel, and the probability of a bit error (the wrong bit is received, for example a 1 arrives as a 0 or a 0 arrives as a 1) is P[error] = 0.02. That means the probability that a bit is received correctly is P[success] = 1 − P[error] = 0.98. So the question is: what is the probability that there are exactly 10 errors?

OK, so first we need to think about how many ways 10 errors can be placed among the 1000 bits sent into the channel. To count them we compute 1000 choose 10 (the combinations formula). We then use the binomial distribution (each bit is either received in error or not, and we assume the bits are independent of each other) to find the PMF at 10 errors, with p = P[error]:

P[10 errors] = (1000 choose 10) × p^10 × (1 − p)^990

If we plug in p = 0.02, we get P[10 errors] ≈ 0.0056, which is a relatively low probability!
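To double-check the arithmetic, here is a small sketch that evaluates the binomial PMF with Python's standard library. It assumes the bit errors are independent, and the function name `binomial_pmf` is made up for this example.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k 'errors' in n independent trials,
    where each trial is an error with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_bits = 1000     # bits sent through the channel
p_error = 0.02    # probability that a single bit is flipped

p_10_errors = binomial_pmf(10, n_bits, p_error)
print(f"P[10 errors] = {p_10_errors:.6f}")
```

The printed value should land close to the probability quoted above. The low number makes sense, because with 1000 bits and a 2% error rate we expect about 20 errors on average, so seeing only 10 is unusual.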

Originally published at https://rushiblogs.weebly.com.


Hello! I’m a student at the University of Texas at Austin. Welcome to my collection of thoughts. I like to write and blog. rushiblogs.weebly.com