Excel help to count missing numbers

Icewynd

Member
Can anyone give me some help with excel?

What I am interested in is how many games, on average, it takes for all 49numbers to be played. And, what happens when only a few are still outstanding.

The result could be a list of unique numbers in a range of x draws, or a count of unique numbers in x draws. This would then produce a column of numbers that could be averaged.

It seems that there are several ways to do this with one column of numbers (eg. advanced filter), but I can't seem to find a way to deal with data like lotto numbers (7 columns across). :confused:

A web search turned up the following formula, but I can't get it to work
=SMALL(IF(COUNTIF(tbl, ROW($1:$9))=0, ROW($1:$9), ""), ROWS(B8:$B$8)) + CTRL + SHIFT + ENTER

Any help greatly appreciated.
 

blitzed

Member
hiya Icewynd, dunno...however, my old 649_Lottery.ods spreadsheet here has probability chart.
http://crazynuts.hollosite.com/lottery_spreadsheets/

I have isolated 76draws as the point where there should be a 0% probability that a number has not been drawn yet....times drawn/percent probability:
0x 0.00%
1x 0.05%
2x 0.27%
3x 0.93%
4x 2.37%
5x 4.77%
6x 7.88%
7x 10.99%
8x 13.23%
9x 13.95%
10x 13.04%
11x 10.92%
12x 8.25%
13x 5.67%
14x 3.56%
15x 2.05%
16x 1.09%
17x 0.54%
18x 0.25%
19x 0.10%
20x 0.04%
21x 0.02%
22x 0.01%
23x 0.00%

cheers!
blitzed:thumb:
 

Icewynd

Member
Hi Blitzed,

76 draws would be an upper bound, and higher than I thought! At the other end of the distribution, 7 draws would be the theoretical lower bound (7 draws of no repeats for 6/6 + bonus). I'm looking for the average length of time that it takes before all 49 numbers come out.

I'm not sure how to interpret the probability chart. Is this the probability that any single number will be drawn a given number of times in 76 draws?

Thanks for the quick response. :thumb:
 

blitzed

Member
hiya Icewynd, yes the probability chart is for how many times a given ball can be drawn in a 6/49 game.

so if it is a 6/49 with bonus ball from remaining 43 balls, here is probability chart ifya look at it as a 7/49...65draws is the upperbound.

7of49 65Draws
0x 0.00%
1x 0.05%
2x 0.26%
3x 0.90%
4x 2.33%
5x 4.73%
6x 7.88%
7x 11.07%
8x 13.38%
9x 14.12%
10x 13.18%
11x 10.98%
12x 8.24%
13x 5.60%
14x 3.46%
15x 1.96%
16x 1.02%
17x 0.49%
18x 0.22%
19x 0.09%
20x 0.03%
21x 0.01%
22x 0.00%
 

Icewynd

Member
Thanks, Blitzed. That information gets me part of the way to where I hope to go.

What I am hoping to find out was what happens when, say, 39 of the 49 numbers have been drawn from a set starting point, (e.g. last 15 draws). Are those 10 numbers drawn fairly quickly, on average, and in clumps of 2 or 3? If so, they should be played. But if the last 10 numbers are fairly slow to come out with lots of draws where none of them appear, then they could be eliminated from play.
 

Frank

Member
What I can tell you from the UK lotto which is 6/49 is that over the 1654 draws so far, Including bonus balls the average period to draw all 49 balls is 30.6 draws with the minimum being 19 and maximum being 62 draws.

If you ignore the bonus ball it takes a little longer 35.3 being the average number of draws to draw all 49, the minimum being 21 and the maximum 57 draws. This analysis depends on which draw you start counting from, I started from draw 1 to the present, as soon as all 49 had been drawn (draw 26) I reset the counter and started again from draw 27 etc.
The longest absence of a ball was 73 draws but was split between the intervals I was counting in.
 

Icewynd

Member
Frank, thanks for this info. I find it interesting that there are enough repeats that it takes 30.6 draws on average to get all the numbers out. Of course many of these are probably due to one straggler that gets lost for a long stretch.

Did you do this analysis in Excel? If so, could you share the method you used to do the counts?

Cheers, :beer:
 

blitzed

Member
cool, makes sense that the balls would all be drawn at about midpoint of the upperbound of probability.

goodluck:agree:
blitzed:thumb:
 

Frank

Member
Well, my spreadsheets are invariably complicated and bristling with complex formulas, which I haven't got time to go into. Briefly the spreadsheet uses the Frequency function in an array formula to count the number (and identity) of balls in an adjustable range. By using 2 spinners, one to set where to count from (draw number) and another to increment draw by draw the size of the range being counted (in draws) I can manually adjust until all balls are accounted for (start point the oldest draw) and then note both the number of draws in the range, and the next draw number, which is the reset point.

So for the UK lotto counting from draw 1 (1654 draws ago as the first spinner setting) and incrementing the size of the counted range, it is when I flip from 25 to 26 draws enclosed in the range that my 'missing balls' counter drops from 1 to zero.
So first data point is 26 and 1628 (1654-26 ..the reset draw).
I spin my start point to draw 1628 , set my "count draws" spinner to 1 - and the whole cycle starts again. Done manually this is a laborious and time consuming task, so I automated it using a macro. This runs through what i've just described and tabulates the data.
The most recent period may contain erroneous data simply because the last 20 draws may not have yet eliminated all the balls within the available cycle - which might be incomplete until future draws set the missing balls to zero. so the last data point should be discarded.

I can choose whether to include the bonus ball or not by selecting a radio button.
It can of course use any 6/49 set of results.
That's it in a nutshell, its not designed for inexperienced people to use, it may malfunction if strict rules for use are not adhered to, so I'm pondering whether to share it or not. :)
 

Icewynd

Member
Whew! That does sound complex. Now that you have explained your process, it sounds a bit over my head. I think what I need to do is to learn to use array formulas and try again with the formula I referrenced in the first post on this thread.

However, I would like to express my sincere thanks for the time and effort you have taken to reply.

Cheers, :beer:
 

Frank

Member
Well the formula you quote wont work for this purpose unfortunately. It is designed to list missing numbers from a range named "tbl". Because in a whole lottery of more than 60 results there are no missing numbers, it would be useless. If in the example I quote above for the UK lotto, you named the first 26 draws across 7 columns as "Tbl" then it would give a result of 0, telling you that you had stumbled on the right size of range to test. You would have to keep changing the range (which you call tbl) from 1 row to 2 rows to 3 rows etc. until the answer was 0. That's even more tedious than my method. Then you would have to reset the baseline for your range to the next draw into the future and repeat again for another batch of draws. Its basically what I'm doing, but the bells and whistles on my version make it much more user friendly.

It actually isn't complicated as a method, its just that it needs complicated formulas to make it work.

Look at this screenshot :-

zybkl.jpg


The green grid shows the ball numbers (as green header) and below is how many of each number there are in the range being tested. The missing balls column AB counts down as each ball is drawn (by manually changing the top spinner to make the number of draws being tested larger).
It is set for main balls only, 156 balls out so far and the missing ball total (cell AB17) is zero. So this is the first time in the lottery history that all balls have been drawn, and its after 26 draws.
To make any more progress, everything had to be reset to zero as we move to the next draw (1628) as the "start" of the next period of counting the presence of 49 balls.
The next screenshot shows that we are now examining a range starting from draw 1628 and so far, by incrementing the top spinner, we cover 28 draws. In this period, 168 balls have been drawn, but number 39 is still missing.
I can tell you that I will have to increment the range to include 33 draws in order to find ball 39 being drawn. I will then have to reset the start point for my next range of examination to draw 1595.

2aeytsl.jpg


So for actual lottery results there is no magic single formula that will miraculously give you an answer in one fell swoop, because of the nature of what you are asking. It requires methodically counting from the start untill all 49 are drawn - and then resetting and recounting from that point. I just wrote a spreadsheet to make that process easy.
If you tell me that you understand the concept and why it has to be done that way I'll find a way of letting you have a copy so you can paste in your own 6/49 results and have a go. :cool:
 

Icewynd

Member
Frank, thanks for your detailed reply and screenshots. Very helpful information.

As I mentioned earlier, my interest in this is in seeing how the numbers come out over a range of games. For example, if I set my starting game as "x" then I would want to see how the numbers emerge in draw x+1, x+2...x+n until all 49 numbers have been played. It is my hope that observing the mix of "new" numbers and numbers that have already been played for the average of all x+1...x+n games over time will give me some information about which numbers to retain and which to eliminate.

I gather I could use your process to set the starting draw and then increment by 1, but I'm not sure if I would be able to collect the game-by-game data that I want, rather than an average of the number of games that it takes to get all 49 numbers out.

Actually, I occurs to me as I write this, that I could use COUNTIF for each of the numbers 1-49 copied down until there are no zeros remaining (i.e. all the numbers are out). Rinse and repeat.
 

Frank

Member
Your COUNTIF method would work, it would just take a bit of time to go through the cycles, and to keep altering the range you are counting.

Well my spreadsheet does all you describe from any draw to any draw, its easier to alter the range as you go. However if you want to log the identity of missing balls i.e the ball numbers of balls still to be drawn, you have to manually do the stepping and manually record the ball numbers yet to be drawn, from the table as you step forward in time. The Frequency function (or in your case the COUNTIF function) only count balls drawn so far in the cycle and by inference the balls undrawn so far. There is no logging process to note down the undrawn ball numbers as you proceed. That would be a laborious manual process of you spotting when a zero against a particular 'ball drawn flag' switched from 0 to 1.
My spreadsheet is not designed to do that logging, but you could do that manually. I really don't see how the order (ball number wise) that the stragglers are drawn in one cycle can help you in another cycle, if that's what you are suggesting, since its a random game.

What it does do, when run as a macro is log within each cycle the count of balls undrawn against step value within cycle to show the decay curve for that cycle. You need to select the values it reads out on completion to create the chart as required. See example:-

Example of decay curve for main balls only , draws 1 to 26 , UK lotto.
sztvo6.gif


First 26 draws , after 1 draw (obviously) there are 43 balls undrawn if you disregard the bonus ball. After that as balls are drawn and duplicates of previously drawn balls come out, the curve flattens out. generally speaking after 15 draws in a cycle there are ususually about 6 balls still to appear. In the example shown, all balls are drawn by step value 26.


If you wish, you can contact me via the address on my Lotterygen website and I'll point you to a link where you can try my sheet.
 

Icewynd

Member
Thanks, Frank.

I do realize that I will have to do some manual work to get the game-by-game averages, however it shouldn't be much more than copy and paste.

I don't need to know which numbers are still "out", just how many. My idea is to create an average decay curve, which will approximate reality (to a greater or lesser extent) in any future set of games. What I find of interest in your decay curve is how much it flattens out at the end, with several games where no additional new numbers are drawn. If that holds for the average set of games, those numbers could fairly safely be omitted from play.

I will get in touch with you via the other site.
 

Sidebar

Top