The geometric random variable with parameter \(p\in\left(0,1\right)\) can be defined as the number of trials required to obtain the first success, where the probability of success on each trial is \(p\). Thus,
\begin{eqnarray*} p\left(k;p\right) & = & \left(1-p\right)^{k-1}p\quad k\geq1\\ F\left(x;p\right) & = & 1-\left(1-p\right)^{\left\lfloor x\right\rfloor }\quad x\geq1\\ G\left(q;p\right) & = & \left\lceil \frac{\log\left(1-q\right)}{\log\left(1-p\right)}\right\rceil \\ \mu & = & \frac{1}{p}\\ \mu_{2} & = & \frac{1-p}{p^{2}}\\ \gamma_{1} & = & \frac{2-p}{\sqrt{1-p}}\\ \gamma_{2} & = & \frac{p^{2}-6p+6}{1-p}.\end{eqnarray*}

The moment generating function is

\begin{eqnarray*} M\left(t\right) & = & \frac{p}{e^{-t}-\left(1-p\right)}\end{eqnarray*}

Implementation

The goal is to use Python to help us build intuition about complex concepts, empirically test theoretical results, and build algorithms from scratch. In this series, you will find articles covering topics such as random variables, sampling distributions, confidence intervals, significance tests, and more. At the end of each article, you can find exercises to test your knowledge. The solutions will be shared in the following week's article. Articles published so far:
As usual, the code is available on my GitHub.

Comparing Geometric Random Variables and Binomial Random Variables

Let's start by recapping what a Binomial Random Variable (RV) is. Here is the checklist:

- the trials are independent;
- each trial can be classified as either a success or a failure;
- there is a fixed number of trials;
- the probability of success on each trial is constant.
An example of such a variable could be X, the number of 5's after 10 rolls of a fair die. Notice that the trials are independent; rolling the die in the first trial does not impact rolling it in the second trial. We can clearly classify each trial as a success (we got a 5) or a failure (we got any number but a 5). We roll our die 10 times, so 10 is our number of trials. Finally, the probability of success is constant, as we are always using the same die again and again, so the probability of getting a 5 is always 1/6.

Then, what is the difference between a Binomial RV and a Geometric RV? See if you can spot the differences when I define the Geometric RV Y, the number of rolls until you get a 5 on a fair die. Think about it before continuing (refer to the checklist). The main difference is the fixed number of trials; we do not have one anymore. It could take 1 roll or a googol (10^100) rolls as far as we can tell.

Probability for a Geometric Random Variable

Cristiano Ronaldo has a penalty conversion rate of around 84%, being one of the best in the world. Now imagine that Ronaldo takes penalties until he misses one. Let X be the number of shots it takes to miss his first penalty shot. What is the probability that Ronaldo's first missed shot occurs on his 4th attempt?

\begin{eqnarray*} P\left(X=4\right) & = & 0.84^{3}\times0.16\end{eqnarray*}

You can see the equation above as the product of the probabilities of scoring the first 3 shots, 0.84³, times the probability of missing the 4th, 0.16. Recall that the attempts are independent; that is why we can multiply them like we do here. So the probability of Ronaldo scoring 3 penalties in a row and then missing the 4th is around 10%.

```python
0.84**3 * 0.16
```

    0.09483263999999998

We can generalize the Probability Mass Function of the Geometric distribution as follows:

\begin{eqnarray*} P\left(X=k\right) & = & \left(1-p\right)^{k-1}p\quad k\geq1\end{eqnarray*}

Cumulative Geometric Probability

When calculating the cumulative geometric probability, we can view it from two different perspectives.
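As an aside, both the single-shot probability above and the cumulative probability we are about to compute are implemented in `scipy.stats.geom`. Here is a quick sketch (assuming scipy is installed; the variable names are mine) cross-checking the penalty example:

```python
import numpy as np
from scipy.stats import geom

# "Success" here means a miss, so p = 0.16 and X is the attempt
# on which the first miss occurs.
p_miss = 0.16

first_miss_on_4th = geom.pmf(4, p_miss)   # (1-p)^3 * p = 0.84**3 * 0.16
miss_within_4 = geom.cdf(4, p_miss)       # 1 - (1-p)^4 = 1 - 0.84**4

print(first_miss_on_4th, miss_within_4)

assert np.isclose(first_miss_on_4th, 0.84**3 * 0.16)
assert np.isclose(miss_within_4, 1 - 0.84**4)
```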
Let's imagine that the probability of success is p=0.1 and that we want to calculate P(X<5). We can think of it in the following way: X can be 1, 2, 3, or 4, so we sum the probabilities of succeeding on each of the first 4 trials. Or we can think in the opposite direction: the only excluded outcome is having no success in the first 4 trials, and so P(X<5) = 1-(1-p)⁴.

```python
def geomcdf_1(p, x):
    # P(X < x+1): sum the PMF over the first x trials
    prob = 0
    for k in range(1, x + 1):
        prob += (1 - p) ** (k - 1) * p
    return prob

def geomcdf_2(p, x):
    # P(X < x+1): 1 minus the probability of no success in the first x trials
    return 1 - (1 - p) ** x

print(geomcdf_1(0.1, 4), geomcdf_2(0.1, 4))
```

We can see that we get the same result by implementing either of the approaches. However, the second one is significantly less computationally expensive, as we do not need to loop over all the cases.

Expected Value of a Geometric Random Variable

The expected value of any discrete RV is the sum of its probability-weighted outcomes. In a Geometric RV, we already know how to calculate the probabilities. For example, P(X=1) is the probability of succeeding on the first trial, therefore P(X=1)=p. In P(X=2), we have two trials, so necessarily one unsuccessful and one successful trial, therefore P(X=2)=(1-p)p, and so on. Writing out the expected value, and the same expression multiplied by (1-p),

\begin{eqnarray*} E\left(X\right) & = & p+2\left(1-p\right)p+3\left(1-p\right)^{2}p+\cdots\\ \left(1-p\right)E\left(X\right) & = & \left(1-p\right)p+2\left(1-p\right)^{2}p+3\left(1-p\right)^{3}p+\cdots\end{eqnarray*}

If we subtract these two equations,

\begin{eqnarray*} pE\left(X\right) & = & p+\left(1-p\right)p+\left(1-p\right)^{2}p+\cdots\\ E\left(X\right) & = & 1+\left(1-p\right)+\left(1-p\right)^{2}+\cdots\end{eqnarray*}

And there you go, we have a classic geometric series with a common ratio of (1-p). This is, in fact, one of the reasons why we call it a Geometric RV. The sum of a geometric series starting at 1 is 1/(1-r), where r is the common ratio, so E(X) = 1/(1-(1-p)) = 1/p.

Now that we have derived the expected value of our Geometric RV, let's experimentally see if we can arrive at the same result.

```python
import numpy as np

p = 0.2
# draw a large sample of geometric RVs and compare the sample mean with 1/p = 5
X = np.random.geometric(p, size=100000)
print(X.mean())
```

And there you go, we can feel comfortable with our derivation.

Comparing Poisson Random Variables and Binomial Random Variables

Can we approximate a Poisson Random Variable (RV) from a Binomial RV? Let's define our RV X as the number of UberEats motorcycles we see passing by our street in, say, one hour. If we try to build on this idea, we somehow get to an expected value of X that would look like

\begin{eqnarray*} E\left(X\right) & = & \lambda\end{eqnarray*}

the average number of motorcycles per hour. And if we treat each of the 60 minutes as an independent trial with success probability λ/60, our probability would look like this:

\begin{eqnarray*} P\left(X=k\right) & = & \binom{60}{k}\left(\frac{\lambda}{60}\right)^{k}\left(1-\frac{\lambda}{60}\right)^{60-k}\end{eqnarray*}

The problem is that we can easily get more than 1 motorcycle passing by our street in any given minute, and we cannot account for that with our current setup. But there is a way to do it, as we can be more granular. Now we are working with seconds, and the probability of seeing a motorcycle in any given second is quite small.
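To put a number on "quite small", here is a tiny sketch; the hourly rate λ = 5 is my arbitrary choice for illustration, not a figure from the article:

```python
# Spread an assumed hourly rate over finer and finer Bernoulli trials.
lam = 5                  # arbitrary: average motorcycles per hour

p_minute = lam / 60      # success probability with one trial per minute
p_second = lam / 3600    # success probability with one trial per second

print(p_minute, p_second)   # per-second trials are far rarer events
assert p_second < p_minute
```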
But we can still face the problem of having 2 motorcycles pass by our street in half a second, so we could get even more granular to account for that. I think you see where this is going: we need to understand what happens when n goes to infinity. Before diving in, let's revise three properties that we will need later in our proof. The first two are related to limits:

\begin{eqnarray*} \lim_{n\rightarrow\infty}\left(1-\frac{\lambda}{n}\right)^{n} & = & e^{-\lambda}\\ \lim_{n\rightarrow\infty}\left(1-\frac{\lambda}{n}\right)^{-k} & = & 1\end{eqnarray*}

And the third with combinatorics:

\begin{eqnarray*} \lim_{n\rightarrow\infty}\frac{n!}{\left(n-k\right)!\,n^{k}} & = & \lim_{n\rightarrow\infty}\frac{n\left(n-1\right)\cdots\left(n-k+1\right)}{n^{k}}=1\end{eqnarray*}

Using these properties, we can get the intuition of how the Poisson RV is actually a limit of a Binomial RV:

\begin{eqnarray*} P\left(X=k\right) & = & \lim_{n\rightarrow\infty}\binom{n}{k}\left(\frac{\lambda}{n}\right)^{k}\left(1-\frac{\lambda}{n}\right)^{n-k}\\ & = & \lim_{n\rightarrow\infty}\frac{n!}{\left(n-k\right)!\,n^{k}}\cdot\frac{\lambda^{k}}{k!}\cdot\left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-k}\\ & = & \frac{\lambda^{k}e^{-\lambda}}{k!}\end{eqnarray*}

We just arrived at the Probability Mass Function of a Poisson distribution.

Proving by computing values of Probability Mass Functions

We always want to be safe with our proofs and derivations, so let's first compute the Probability Mass Function of a Binomial RV for different values of n and compare the results with a Poisson RV with the same rate.

```python
from scipy.stats import binom, poisson

λ = 5   # rate: expected number of motorcycles per hour
k = 3   # value at which we evaluate the PMFs
for n in [60, 600, 3600]:
    print(n, binom.pmf(k, n=n, p=λ/n))
print('Poisson', poisson.pmf(k, λ))
```

Notice that our intuition was correct; as the value of n gets larger, we are indeed approximating the probability of a Poisson RV.

Proving by sampling

We can do the same exercise by sampling from the RVs that we defined above and then computing the probabilities.

```python
n1, n2, n3 = 60, 600, 3600
X1 = binom.rvs(n=n1, p=λ/n1, size=100000)
X2 = binom.rvs(n=n2, p=λ/n2, size=100000)
X3 = binom.rvs(n=n3, p=λ/n3, size=100000)
X4 = poisson.rvs(λ, size=100000)
# empirical P(X=k) under each distribution
print((X1 == k).mean(), (X2 == k).mean(), (X3 == k).mean(), (X4 == k).mean())
```

We can also plot the Probability Mass Functions for each distribution; notice how the approximation when n=3600 is actually a very good one.

```python
import matplotlib.pyplot as plt

_, ax = plt.subplots(2, 2, figsize=(25, 10), sharey=True, sharex=True)
for axis, X, title in zip(ax.ravel(),
                          [X1, X2, X3, X4],
                          [f'Binomial RV, n={n1}', f'Binomial RV, n={n2}',
                           f'Binomial RV, n={n3}', 'Poisson RV']):
    axis.hist(X, bins=range(16), density=True)
    axis.set_title(title)
plt.show()
```

Figure 2: Approximating a Poisson RV using a Binomial RV with different values of n; notice that as n gets larger, the shape of the sampling distribution of the Binomial RV gets closer to the sampling distribution of the Poisson RV.

Exercises

You will get the solutions in next week's article.
Answers from last week

The answers below use a small helper to compute factorials:

```python
def fact(n):
    # n! computed recursively
    if n == 0:
        return 1
    return n * fact(n - 1)
```

1. A company produces bottles of water. In its main factory, the number of defective bottles is 5%. A quality check consists of randomly selecting and testing 1000 bottles. What are the mean and standard deviation of the number of defective bottles in these samples?

```python
# mean and standard deviation of a Binomial(1000, 0.05) RV
print('μ_x={}'.format(1000*0.05))
print('σ_x={}'.format((1000*0.05*0.95)**0.5))
```

2. A wine company is running a promotion that says 1-in-4 boxes of the wine contain a surprise. Suppose that you will buy 5 boxes of this wine, and let X represent the number of surprises you can win in these boxes. Assume that these boxes represent a random sample and that surprises are independent between boxes. What is the probability that you win at most 1 surprise in the 5 boxes? (Can you solve this problem in 2 different ways? Tip: by sampling and by using the theoretical derivation that we did above.)

```python
def binompmf(k, n, p):
    # P(X=k) for X ~ Binomial(n, p)
    return fact(n) / (fact(k) * fact(n - k)) * p**k * (1 - p)**(n - k)

# P(X <= 1) = P(X=0) + P(X=1)
print(binompmf(0, 5, 0.25) + binompmf(1, 5, 0.25))
```
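For exercise 2, the tip mentions a sampling route as well. As a sketch of that approach (the seed and sample size are my choices), we can estimate P(X ≤ 1) empirically and compare it with the exact binomial value:

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ Binomial(n=5, p=0.25): surprises won in 5 boxes
wins = rng.binomial(n=5, p=0.25, size=1_000_000)
p_at_most_1 = (wins <= 1).mean()

# exact value: P(X=0) + P(X=1) = 0.75**5 + 5*0.25*0.75**4
exact = 0.75**5 + 5 * 0.25 * 0.75**4
print(p_at_most_1, exact)

assert abs(p_at_most_1 - exact) < 0.005
```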