There is nothing more exciting in the world right now than Machine Learning and Data Analytics! In this one video I will teach you a key part of the Math of Machine Learning and Data Analytics, which is Probability.
I took everything in a standard 500 page textbook on Probability and put it in this one video. I will cover every formula, and I will also solve a real world problem with each formula.
After this video on Probability I will continue with the Math of Machine Learning by covering Statistics, Linear Algebra and Calculus. If you want to see those videos click the Notification Bell.
►► Get my Python Programming Bootcamp Series for $9.99 ( Expires June 26th ) : https://bit.ly/SavePython12
►► Highest Rated Python Udemy Course + 26 Hrs + 114 Videos + New Videos Every Week
Transcript of Video
Probability Tutorial Probability focuses on finding the chance a random event will occur over the long term. * Probability is a number between 0 and 1, but is also represented by percentages. * A basic probability can be found by dividing the number of outcomes you want by the number of all possible outcomes. SLIDE For example a coin flip has a .5 probability of coming up heads and a .5 probability of coming up tails. SLIDE To find all possible outcomes if you roll 2 dice multiply 6 times 6 to get 36. You can see here all the possible outcomes of rolling 2 dice. This is a very important slide to understand because I'll be using this example many times over the course of this video. SLIDE P(A) stands for the probability of the event A occurring * A Set is a collection of possible outcomes * If we want the union of 2 sets we create a set that contains all values in either set. Union is similar to the word or. If the value is in set 1 OR set 2 it is part of the union of those sets * An intersection of 2 sets will contain only values in both sets. Intersection acts like an AND condition * If you have a set A with the values 1, 3 and 5 and a set with all values from 1 to 5, the complement of A would be 2 and 4. SLIDE Conditional probabilities deal with how probabilities change given other events occur * For example what is the probability that a die roll is 5 given that we know the die roll is an odd number? * To find the answer divide the probability of rolling a 5 using a 6 sided die, which is 1/6, by the probability of rolling an odd value, which is 1/2. When we do that we see that given we know the value is odd there is a 1 in 3 probability of rolling a 5. SLIDE Let's use the Conditional Probability Formula again. Let's say I have data on 200 men and women who exercise or not. If I randomly picked a person that exercises, what is the probability that that person is a woman?
* To find this I take 17/200, representing the women that exercise, and divide that by the total people who exercise, which is 39/200 * If I do this I find that there is a 43.6% chance that if I pick an exerciser that person is a woman. This is known as a contingency table. It contains probabilities depending on multiple outcomes. Intersections between rows and columns provide joint probabilities between events. SLIDE Now I want to find the probability that a randomly chosen person is a woman or an exerciser. For this we will use the Addition Rule. If 2 events are mutually exclusive we simply add their probabilities, but in this case it is possible for someone to be both a woman and an exerciser. So, to find our answer I must subtract out the intersection of women and exercisers * And, as you can see there is a .61 probability that a random person will be a woman or an exerciser. The intersection we subtracted is an example of a Joint Probability because it describes a person with both characteristics. If we said we wanted to look at exercisers and then from that group pull out just women that would be an example of a conditional probability. SLIDE Events can be either dependent or independent, meaning whether they affect each other. To test for independence check whether the probability of A given B is equal to the probability of A, and vice versa. If so they are independent, otherwise they are dependent * So if knowing that a die roll is odd changes the probability of rolling a particular value, we see dependence. However one die coming up odd doesn't affect the probability that a separate die roll is even or odd. SLIDE You find the probabilities of independent events and dependent events in different ways. If events are independent multiply their probabilities * For example to find the probability of rolling a 1 and then a 2, both of which have a probability of 1/6 or .167, multiply them to find a probability of 2.8%. SLIDE Mutually exclusive events can't occur at once.
With them you add their probabilities * To find the probability of rolling a 1 or an even number on a single die add 1/6 + 1/2 to get 2/3 * As you can see that makes sense because 4 values out of 6 is 2/3. SLIDE Venn Diagrams can be very useful for representing events and how they interact. They work best for marginal probabilities, meaning the probability of a single event regardless of other events. They also work well with Joint Probabilities, which measure the likelihood of events occurring together. They don't work well with conditional probabilities or when analyzing sequences of events. SLIDE You can see here how all the symbols you have seen so far work together in a Venn Diagram. The c represents the complement of the event. SLIDE Here I'll use our previous exercising table in a Venn Diagram format * You can see how the table data translates over. SLIDE Tree Diagrams are normally used when Venn Diagrams cannot be. They work best with multistage and conditional probability problems, which is the opposite of Venn Diagrams. They work poorly when events take place at the same time. Here I'll break down the process of flipping a coin twice. You can see each branch has a probability of .5. SLIDE Here is a tree diagram that represents the different toppings you can have on a burger. In this situation you can see that there are 16 possibilities with 1 burger option, which would double by adding another to the equation. We'll get into ways this can be more helpful when we deal with permutations, combinations and variations. SLIDE Here I'll find the total probability based on 2 options * To find it we sum, over each option B, the probability of A given B multiplied by the probability of B * We have 2 options of brake pads of differing quality * We will find the probability of a random purchase allowing us to travel 40,000 miles, which works out to 93.8%. Anytime you want the marginal probability of an event that can happen by way of several different conditions, find the total probability.
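The total probability calculation can be sketched in a few lines of Python. The transcript does not give the slide's inputs, so the market shares and reliability figures below are made-up assumptions, chosen only so that the result lands on the 93.8% quoted above. `total_probability` is a hypothetical helper name.

```python
def total_probability(branches):
    """Law of total probability: sum of P(A | B_i) * P(B_i) over all branches.

    `branches` is a list of (p_b, p_a_given_b) pairs whose p_b values sum to 1.
    """
    total_weight = sum(p_b for p_b, _ in branches)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(p_b * p_a_given_b for p_b, p_a_given_b in branches)


# Assumed inputs (NOT from the slide): 60% of purchases are brand 1 and
# 95% of those last 40,000 miles; 40% are brand 2 and 92% of those do.
brake_pads = [(0.60, 0.95), (0.40, 0.92)]
p_lasts = total_probability(brake_pads)
print(f"P(lasts 40,000 miles) = {p_lasts:.3f}")
```

The same helper works for any number of options, such as several car insurers, as long as each option's share and conditional probability are known.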
SLIDE Here is another example. If I know the satisfaction ratings and the percentage of market share owned by different car insurers I can find the probability I'll be satisfied with my car insurance. SLIDE Bayes' Theorem is used to find a conditional probability when we know other related probabilities. It comes in 2 forms which I'll demonstrate. SLIDE Here I'll find the probability that a person is a man given that I know they are an exerciser. * I'll take the data from the chart * and plug it into Bayes' to get a result of .564. * I can then double check my work by finding my result equals 22/39. SLIDE I'll use the other form of Bayes' here. If 3.98% of people are infected with a disease, 98% with the disease test positive and there is a 1% false positive rate, what is the probability that I'm infected if I test positive? * I multiply the probability of Infection by the probability of testing positive given I am infected. Then I add the product of infected and positive tests to the product of non-infected and false positives. I divide the first by the second and find that there is an 80.9% chance I'm infected, based on the errors in the test and the low probability of people being infected. SLIDE Bayes' is so useful in so many situations that I'll use it again. Here we have 3 toy manufacturing machines. Each makes a different quantity of toys and has a known percentage of flawed toys. What I want to do is figure out, if I know a toy is flawed, what is the probability that it was made by machine C. * X will represent the event that machine A, B, or C made it. * Y represents the probability that the toy is flawed based on the machine. * I'll find the sum of X multiplied by Y for each machine. I'll find the product of flawed toys from C by percentage of toys made by C. After I divide I find that there is an 11.69% chance C made the toy if it is flawed. SLIDE Combinatorics is concerned with the number of ways items can be ordered or selected. It includes Permutations, Variations, and Combinations.
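The exerciser example of Bayes' Theorem above can be checked with a short Python sketch. The 39 exercisers and 17 exercising women are the figures quoted earlier in the transcript. The split of 100 men and 100 women is an assumption (it is consistent with the .61 addition-rule result but is not stated outright), and `bayes` is a hypothetical helper name.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Simple form of Bayes' Theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# 39 exercisers, 17 of them women, out of 200 people: quoted in the transcript.
# 100 men out of 200 people is an ASSUMPTION, not a quoted figure.
total_people = 200
men = 100
exercisers = 39
men_who_exercise = exercisers - 17

p_man = Fraction(men, total_people)
p_exerciser = Fraction(exercisers, total_people)
p_exerciser_given_man = Fraction(men_who_exercise, men)

p_man_given_exerciser = bayes(p_exerciser_given_man, p_man, p_exerciser)
print(p_man_given_exerciser, round(float(p_man_given_exerciser), 3))
```

The assumed number of men cancels out of the product P(exerciser | man) * P(man), so the answer does not depend on that split.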
SLIDE Permutations focus on the number of ways items can be arranged. You find them by finding the factorial. For example there are 24 ways to arrange 4 items: 4 x 3 x 2 x 1. SLIDE You can find the number of permutations of x items chosen from a list of n items. You use the factorial again. nPx = n! / (n - x)! * If you were picking 3 runners for a relay from a group of 5 you'd find that that provides 60 permutations. * Maybe a more interesting example would be what is the probability of drawing the numbers 1 to 5 in order out of a possible 15 numbers? To save time cancel out the factorial of 10 from the top and bottom to get 360,360 permutations. If you divide 1 by that number you find it is very rare at .0000028 SLIDE When calculating combinations the order doesn't matter * Here I'll cover combinations without repetition. This means a value can only be used once. nCx = n! / (x! * (n - x)!), where x represents the number of items chosen and n represents the total available number of items to pick from. * When we calculate the number of combinations when we want 3 numbers from a group of 10 without repetition we find there are 120. SLIDE We can also find combinations with repetition with this formula: (x + n - 1)! / (x! * (n - 1)!) where n is the number of items to choose from and x is the number of items to pick. * This time we will calculate the number of lottery combinations with repetition and we see it jumps to 220. SLIDE Now we'll analyze the number of possible winning hands we can expect in a game of poker. There are 52 cards in a deck of poker cards. SLIDE We'll calculate the number of possible hands that provide 4 of a kind. We start off with the total number of suits being 4. SLIDE Then we will pick the value we want 4 of a kind of from a total of 13 values. There are 13 cards for each suit. SLIDE Now since we picked the value we want 4 of a kind of and we have that card for each of the 4 suits we now must pick our 5th card.
There are only 12 values left so we multiply by 12. SLIDE To finish off we must decide on the suit for our 5th card, which gives another factor of 4. After we multiply all of the above, 13 x 12 x 4, we find that there are 624 possible ways to create 4 of a kind in the game of poker. SLIDE When calculating the total number of full houses we must use our combination formula. First pick the value we want 3 of out of a possible 13 values. SLIDE Pick the value we want 2 of out of the remaining 12 values. SLIDE Now we pick 3 suits out of 4 for our 3 of a kind cards. Use the combination formula to find 4 combinations. SLIDE Then use the formula again to find 6 combinations when picking 2 suits out of 4 for our pair of cards. If we multiply those together we find that there are 3744 ways of getting a full house. SLIDE We can then divide by the total number of possible hands to find the probability of being dealt a full house, which is .14%. SLIDE Permutations are used with an equal number of elements and positions * Combinations are used when you only care which elements made it * And, variations are used when there are more elements than positions. SLIDE Let's say we want to find out how many possible values we can use to unlock a combination lock. * We take the number of elements to the power of the number of positions to get 1000. This is done when there is repetition. SLIDE When you don't want repetition you use a different formula. * How many variations of Pokemon cards can you create if you pick 3 cards from a total of 5? If we use our formula we see that that is 60. SLIDE Random variables represent the numeric results of a random experiment. There are 3 types being finite, countably infinite and uncountably infinite. SLIDE A probability mass function assigns probabilities to the values of a random variable. For a 6 sided die every possible roll has a probability of 1/6th. The most basic probability model is known as the discrete uniform distribution. An example would be a single die roll or a coin flip.
To be a DUD all possible values of x must be consecutive integers like 1 through 6 with die rolls. Each value for x must have an equal probability of occurring. * To calculate the probability mass of a die roll take 1 / (b - a + 1) where b is the max value and a is the lowest value. For a die roll that equals 1/6th. * Variance measures how far a set of numbers is spread out. To calculate that take (b - a + 2) * (b - a) / 12 where again b is the max value and a is the lowest value. With a die roll we see that that is 2.9 or about 3. SLIDE This chart provides the probability mass for each sum of 2 dice. * A relative frequency histogram shows how often values are expected for each sum of 2 dice. SLIDE This is a cumulative distribution chart which provides the cumulative probability associated with adding additional possible dice rolls. SLIDE Expected Value is the long-term expected average. It is found by summing the products of each x and the probability of x. * For our dice rolls that works out to 6.985 or 7 SLIDE σ : Sigma, μ = E(x) = Expected Value. Calculating Variance with multiple variables is a bit more complex. * Sum the total of all values of x squared multiplied by the probability * When we do that we get a value of 54.645 * Then subtract from that the square of the expected value calculated previously to get a value of 5.855 SLIDE The standard deviation is a measure of the amount of variation * If that number is low it means the values are close to the mean. If it is high that means they are spread over a large range. * To find it take the square root of the variance to find 2.419. SLIDE With a normal distribution 100% of the probability is contained under the curve, and the bell curve is symmetric with the same area on both sides of the mean * We also see here that the mean and median are equal. We also see that 68% of the total area is within 1 standard deviation of the mean. This continues with 95% within 2 SDs and 99.7% within 3 SDs. This is known as the empirical rule.
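The 68%, 95% and 99.7% figures can be reproduced from the normal curve itself. This is a minimal sketch using only Python's `math.erf`. `area_within` is a hypothetical helper name, not something from the video.

```python
import math


def area_within(k):
    """Area under a normal curve within k standard deviations of the mean.

    For any normal distribution this equals erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    inside = area_within(k)
    each_tail = (1 - inside) / 2  # the curve is symmetric, so split the rest
    print(f"within {k} SD: {inside:.4f}, each tail: {each_tail:.4f}")
```

Running it shows that the rule's round numbers are approximations; for example the area within 2 standard deviations is a little above 95%.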
I further break that down here into all the parts for a standard normal distribution. Also notice that just .3%, or .15% on each side, is all that is left beyond 3 standard deviations. SLIDE The Z Score gives us the number of standard deviations from the mean that corresponds to the percentile we want * For example if we want 95% of the data it tells us how many standard deviations are required. * The formula takes the distance from the mean to x and divides by the standard deviation. SLIDE This will make more sense with an example. Here is a Z Table. If we know our mean is 40.8, the standard deviation is 3.5 and we want the area to the left of the point 48 we perform our calculation to get 2.06. * We then find 2.0 on the left of the Z Table * and .06 on the top. * This tells us that the area under the curve to the left of 48 makes up .98030 of the total. SLIDE We can also find the area for a point to the left of the mean with a Negative Z Score Table. If we want to find the area to the left of 36.3 with a mean of 40.8 we perform our calculation to get -1.29. * Now look for -1.2 and 0.09 in the Negative Z Table * to find that the area is .09853 SLIDE Let's now go in reverse. Let's say I want to calculate the score I must get to score in the top 1% of my class. I know the mean is 79 and the standard deviation is 7.5. * I then look for .99 in the Z Table to find the Z Score of 2.33. * If I plug that into our formula and solve for x I find that I must score 96.48 to place in the top 1% on my test. SLIDE We have been using Point Estimates so far, which represent a single value. The mean would be an example of a Point Estimate. * While point estimates are easy to calculate they can be inaccurate. An alternative is the use of an interval or range of values. * So if I have 3 sample means taken from 3 different data sets I can create an interval that covers their range of values. * I then have to find how confident I am in that interval. We normally use confidence levels of 90%, 95%, or 99%.
* That means if we have a confidence of 90% we expect 9 out of 10 intervals to contain the mean value. SLIDE When calculating our interval we take our sample mean and find the values of x and y by subtracting and adding the margin of error. Alpha is the doubt we have. So, if we have a confidence of 90% then alpha is 10%. n represents our sample size. We will use a Z Table again to find our interval. SLIDE In this example I've gathered all our data. If we look up .025 on our Z Table we get .0580. * If we then plug in our data we find that x equals 42.97 * and y equals 43.029. SLIDE Now we will cover Binomial Probability, which focuses on whether an event will or won't occur. * There are conditions, however, to use this formula. You must have a fixed number of trials, each with an outcome of pass or fail. The outcome of each trial can't affect other trials. Finally the probability of success must be equal for each trial. SLIDE If you are rolling 1 die 100 times and want the binomial distribution of rolling a 6 that is ok because you meet all the conditions. * Rolling dice until you get 5 sixes doesn't apply because the number of trials isn't fixed. SLIDE Here is the Binomial Probability Formula. n represents the number of trials. r is the number of successes. We must find the number of combinations. p finally represents the probability of success on each trial. SLIDE Let's calculate the probability of getting exactly 4 sixes in 10 rolls. * First we have to find the number of combinations. This is known as the binomial coefficient. * Then we plug in the values and we find that our probability is .054. * As we analyze our formula further we can see the parts representing the probability of success and failure. SLIDE Poisson (pwa san) Distributions focus on the probability a specific number of events will occur over time, distance, area, etc. Examples include visits to a website per hour or day, the number of pizzas sold, red cars seen, or vegetables expected for harvest.
* The conditions include that you count occurrences over time, distance, or area. The mean occurrence must be the same over a similar time, area, etc. The count of occurrences in one interval can't depend on other intervals. And, intervals can't overlap. SLIDE In this example if we can grow an average of 6 heads of lettuce in .3 square meters, what is the probability I can grow 8? * Lambda equals 6 and if I plug in my values I get .103. * As you can see the more x diverges from lambda the lower the probability. * It is important to make sure x has a value no lower than 0. * And, if you want to find the probability of getting at least 8 heads of lettuce sum the probabilities for 0 through 7 and subtract that sum from 1. SLIDE Geometric Probability calculates the probability that the first success occurs on a given trial. For example how many dice rolls are required to get a 6? * The conditions for using it include that trials must be independent, outcomes are pass or fail, the probability of success must be constant, and the number of trials need not be fixed like with binomials. * As you can see we are multiplying the probability of success by the probability of failure to the power of the number of trials minus 1. SLIDE To find the probability of rolling the first 6 on the 3rd dice roll multiply the probability of success .167 by the failure probability .833 taken to the power of trials minus 1. * You see that that is .116. * If you want to find the probability of success in 3 or fewer rolls sum the probabilities for rolls 1 through 3, which works out to .422. SLIDE We can find the probability that it will take more than 3 dice rolls simply by subtracting the previous probability from 1. * We can calculate how many dice rolls it will take on average to get a success. This is called the mean and we can see success is expected in 6 dice rolls. * I also show how to calculate variance and standard deviation. SLIDE I want to briefly talk about sampling. When gathering samples it is important to meet the conditions of the Central Limit Theorem.
* The conditions are that samples must be random, the samples must be representative of the population, and that you use sampling with replacement or sample less than 10% of the population. SLIDE Now we'll talk about the Negative Binomial Probability. The geometric model finds the number of trials required for the first success, such as the number of dice rolls before you get a 6. * The negative binomial model finds the number of trials until the nth success. For example how many rolls are required until you get 3 sixes. * The conditions are that you must have independent random trials with either a pass or fail. The probability of success is equal for each trial. Finally you must keep track of the number of trials up to the target success. SLIDE Very often the most confusing decision is which formula you should use and when. The Binomial Model is used when you want to count the number of successes with a fixed number of trials. * The Negative Binomial Model counts the number of trials required to find a fixed number of successes. * Finally the Geometric Model is used to find the number of trials required for 1 success. SLIDE Here is the formula for the negative binomial. p is the probability of success, k is the number of successes we aim to have, and n is the number of trials. * Now let's say we want to calculate the probability of rolling 3 sixes in 10 rolls. * First we have to find the number of combinations taking 3 from 10 with repetition which works out to 220. * Then we plug in all other values to find a probability of 28% SLIDE You can see here our probability of rolling 3 sixes maximizes at 15 rolls at .354. Then it starts falling. Can you guess why? Well if our goal is to get 3 sixes at some point we have an increased probability of rolling 4 sixes which is not what we want. SLIDE Hypergeometric Distribution is used with samples without replacement to find the probability a specific number of items fit a defined characteristic.
* With Binomial and Hypergeometric you find how many people have a characteristic. Whether you sample with replacement defines which to use. SLIDE The required conditions are that you sample without replacement. Each member of the population has an equal chance of being sampled. The population is in 2 groups, being those with the important characteristic and those without it. The number of people with the characteristic and the total population size must be known. x represents the number of people with the characteristic you're interested in. SLIDE Here is the formula used to calculate Hypergeometric Distribution Probabilities. I provide a pictogram to help explain the parts. SLIDE Here I'll work through an example. What is the probability of getting exactly 2 black cards if I draw 5 from a deck without replacement? There are 26 possible black or success cards. There are 2 successes in my sample. The total number of cards is 52. Finally I'm drawing 5 cards for the whole sample. * If I find the combinations for the formula and work through it you'll see I have a .325 probability of success. SLIDE I'll calculate the expected value which is the number of successes you expect in the sample. When I plug in the values I get 2.5. * Then I'll find Variance which is the average squared deviation from the mean over the long term. And, if you plug in those values and work them out you'll find you get a value of 1.15 SLIDE Understanding Continuous Probability requires us to cover a few other definitions. Continuous Random variables are variables with an uncountably infinite number of values. * The Probability Density Function provides the density of the concentration of probability at point x. * And, Continuous Uniform Distribution is a continuous random variable used when all values of the probability density function are the same or are unknown. This variable is between 2 values a and b. It will make more sense with an example. A Continuous Uniform Distribution is used to find probabilities of success depending on all possible values between a and b.
It is used to find probability for an interval rather than a point. For example what is the probability that x is between 5 and 6? SLIDE Here are a few more things to know before moving onto an example. The Domain of X is the interval that represents possible values of x. * The probability is the area under the curve called the Density Function of x. * Some rules you must adhere to include that all values of f(x) must be greater than or equal to zero. Also the total area under the curve for all possible values of x is equal to 1. SLIDE Here is the formula you use to find Continuous Probability for x in a given interval. * You can also see that you can find b if you know the height and a. * I also provide 2 examples if you want to calculate probabilities when x is less than 3 and when x is greater than 2. SLIDE Now we finish up by covering Exponential Distribution. Before moving on though I want to define what an Exponential Function is for those that don't know. It is simply a function of the form a times b to the power of x where b is a positive number and x occurs as an exponent. * An Exponential Distribution is a Continuous Distribution, like we just saw, whose probability density has the shape of an Exponential Function. * This distribution is often used to model the time elapsed between events. * Its Density Function has the form f(x) = lambda e to the power of negative lambda times x where x is greater than or equal to 0 and lambda is a positive constant. Lambda in this situation is called the parameter of the exponential distribution. * A large lambda causes the curve to drop quickly to zero and a small lambda causes it to drop slowly. * And like previously the probability of an Exponential Distribution is the area under the curve. SLIDE There are 3 formulas for calculating Exponentials. You use different versions if x is less than a value, greater than a value, or lies between values a and b.
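The three cases can be sketched as below, using the cumulative form 1 - e^(-lambda x) that follows from the density function given above. The rate of 0.5 and the cut-off values are illustrative assumptions, not numbers from the slides, and the function names are hypothetical.

```python
import math


def exp_less_than(a, lam):
    """P(X < a) for an exponential distribution with rate lam."""
    return 1 - math.exp(-lam * a)


def exp_greater_than(a, lam):
    """P(X > a): the area in the tail beyond a."""
    return math.exp(-lam * a)


def exp_between(a, b, lam):
    """P(a < X < b): the area under the density between a and b."""
    return math.exp(-lam * a) - math.exp(-lam * b)


lam = 0.5  # assumed rate, for example 0.5 events per minute
print(f"P(X < 2)     = {exp_less_than(2, lam):.4f}")
print(f"P(X > 2)     = {exp_greater_than(2, lam):.4f}")
print(f"P(1 < X < 3) = {exp_between(1, 3, lam):.4f}")
```

The first two results always add up to 1, which is a quick way to check that a less-than or greater-than calculation was set up correctly.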
SLIDE In the next few slides I'll calculate differing probabilities based on whether x is less than, greater than or lies between values of a and b. Here also is how to calculate Expected Value and Variance.
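As a closing check, several of the worked numbers from the counting and distribution slides in this transcript can be reproduced with Python's standard library. This is a sketch, not the code behind the slides. The function and variable names are hypothetical, and exact fractions such as 1/6 are used where the transcript rounds to .167, so small differences from the spoken values are expected.

```python
import math

# --- Counting slides ---
relay_orders = math.perm(5, 3)                  # 3 runners picked in order from 5
draw_1_to_5 = math.perm(15, 5)                  # ordered draws of 5 numbers from 15
lottery_no_repeat = math.comb(10, 3)            # 3 from 10, order ignored
lottery_with_repeat = math.comb(10 + 3 - 1, 3)  # 3 from 10, repeats allowed
four_of_a_kind = 13 * 12 * 4                    # value, 5th card value, 5th card suit
full_house = 13 * 12 * math.comb(4, 3) * math.comb(4, 2)
p_full_house = full_house / math.comb(52, 5)


# --- Distribution slides ---
def binomial_pmf(r, n, p):
    """P(exactly r successes in n independent trials)."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)


def poisson_pmf(x, lam):
    """P(exactly x events) when the mean count per interval is lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def geometric_pmf(n, p):
    """P(the first success happens on trial n)."""
    return (1 - p) ** (n - 1) * p


def hypergeometric_pmf(x, successes, population, draws):
    """P(x successes in `draws` draws without replacement)."""
    return (
        math.comb(successes, x)
        * math.comb(population - successes, draws - x)
        / math.comb(population, draws)
    )


six = 1 / 6
print("relay orders, ordered draws:", relay_orders, draw_1_to_5)
print("lottery combinations:", lottery_no_repeat, lottery_with_repeat)
print("four of a kind, full house:", four_of_a_kind, full_house)
print(f"P(full house)             = {p_full_house:.4%}")
print(f"4 sixes in 10 rolls       = {binomial_pmf(4, 10, six):.3f}")
print(f"8 lettuces, lambda = 6    = {poisson_pmf(8, 6):.3f}")
print(f"first six on roll 3       = {geometric_pmf(3, six):.3f}")
print(f"six within 3 rolls        = {sum(geometric_pmf(n, six) for n in (1, 2, 3)):.3f}")
print(f"2 black cards in 5 draws  = {hypergeometric_pmf(2, 26, 52, 5):.3f}")
```

`math.comb` and `math.perm` need Python 3.8 or later.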