The square law of database self comparison

Frank

Member
Have you ever wondered how you can improve your chances of winning a lottery without a system, relying on chance alone? Of course you have, and you already know the answer: buy more tickets.

So you double the number of tickets and double your chances of a win. If the lottery operator holds 2 draws on the day and your tickets are valid for both draws, then your chances of winning are doubled again, improving them by a factor of four.
Taking it further, if you bought three tickets, the operator held three draws and all your tickets were valid for all the draws, then your chances would improve by a factor of 3 x 3 = 9.

It's beginning to look like a square law, related to (number of tickets) x (number of draws your tickets are valid for); I just happen to have made the two factors equal, so it's a square law.
So what if you bought 2000 tickets and, in some parallel universe, they actually held 2000 draws on the same day and all your tickets were valid for those draws? Would you agree that your chances of a win might be improved by 2000 x 2000 = 4 million? It makes sense to me, anyway.
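The square law is exact for the expected number of wins, tickets x draws x p. For the chance of at least one win it is an approximation that holds while tickets x draws x p stays small. Here is a minimal Python sketch; the 6/49 jackpot probability of 1 in 13,983,816 is an assumed illustrative value, and the function names are hypothetical:

```python
import math
from math import comb

# Illustrative assumption: chance of a 6/49 jackpot with one ticket in one draw.
JACKPOT_P = 1 / comb(49, 6)  # 1 in 13,983,816

def expected_wins(tickets, draws, p=JACKPOT_P):
    """Expected number of wins when every ticket is valid in every draw."""
    return tickets * draws * p

def chance_of_any_win(tickets, draws, p=JACKPOT_P):
    """Chance of at least one win, assuming all tickets are different
    (so at most one can hit a given draw) and draws are independent."""
    per_draw = tickets * p
    # 1 - (1 - per_draw) ** draws, computed accurately for tiny probabilities
    return -math.expm1(draws * math.log1p(-per_draw))

for n in (1, 2, 3, 2000):
    wins_factor = expected_wins(n, n) / expected_wins(1, 1)
    chance_factor = chance_of_any_win(n, n) / chance_of_any_win(1, 1)
    print(n, round(wins_factor), round(chance_factor))
```

Under these assumptions the expected number of wins scales by exactly 4 million at 2000 x 2000, while the chance of at least one win comes out at roughly 3.5 million times the single-ticket chance, because that chance saturates as it grows.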

OK, then suppose you were a lottery analyst looking back over 2000 draws of your lottery's history, and you decided to compare every past result with every other past result. Would your chances of finding matches be the same as the published odds? A simple way of looking at it is to pretend that one result is a 'ticket' and compare it with each of the 1999 other draws. That's an improvement over the odds by a factor of 1999 for that one draw matching the others. Agreed? But wait, you've got another 1999 'tickets', and you need to check each of them against the 1999 draws other than itself. So a rough-and-ready calculation would suggest that your chances are improved by 2000 x 1999, just short of 4 million.

Some of the more astute readers will have spotted that in this process each pair is compared twice, e.g. draw 10 with draw 1560 and, later, draw 1560 with draw 10, so the above shows twice as many matches as you would actually get. If there were prizes for the matches, then it's the matching pairs you need to count; that's how many prizes you would get.
So you really need to ask how many 'ticket and draw' pairs you need to compare, and the answer is =COMBIN(2000,2), which works out at 1,999,000 pairs of draws being compared, about 2 million pairs. That is how many times better your chance is of finding whatever match you are looking for in your past results set, compared with the odds of a match when comparing one draw with one other draw.
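The difference between checking every draw against every other draw and counting each pair once can be confirmed with a short Python sketch; the helper names are hypothetical, and `math.comb` plays the role of Excel's COMBIN:

```python
from itertools import combinations
from math import comb

def ordered_comparisons(n):
    """Every draw checked against every other draw: each pair is seen twice."""
    return n * (n - 1)

def unordered_pairs(n):
    """Each pair of draws counted once, like COMBIN(n, 2)."""
    return comb(n, 2)

# Brute-force check on a small history that the two counts differ by a factor of 2.
small = 10
assert sum(1 for _ in combinations(range(small), 2)) == unordered_pairs(small)
assert ordered_comparisons(small) == 2 * unordered_pairs(small)

print(ordered_comparisons(2000))  # 3998000
print(unordered_pairs(2000))      # 1999000
```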

If you just listed the draw numbers of any draws that had at least one match to any other draw, then your list would name about 2 x COMBIN(2000,2) = 3,998,000 x the raw probability of that match, i.e. 3,998,000 divided by the raw odds.
I refer to the raw odds as the original published or calculated odds for a comparison of 1 draw with 1 ticket.

To quote a specific example, say you were looking for match 5's among 2000 old results of the UK Lotto (649 format). The raw odds of a match 5 for 1 ticket in 1 draw are 1 in 55,492.
I predict that, on average, you would find =COMBIN(2000,2) x 1/55,492 match 5's among those 2000 compared draws, i.e. about 36 matched pairs, involving about 72 named draws with at least one match to another draw.
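The match 5 arithmetic can be checked in a few lines of Python. This sketch simply restates the formula above; `expected_matched_pairs` is a hypothetical helper name, and treating each pair as naming two distinct draws ignores the rare draw that matches more than one other draw:

```python
from math import comb

RAW_ODDS_MATCH5 = 55_492  # 1 in 55,492, the raw odds quoted above

def expected_matched_pairs(n_draws, raw_odds):
    """Expected matching pairs when n_draws are compared each with the other."""
    return comb(n_draws, 2) / raw_odds

pairs = expected_matched_pairs(2000, RAW_ODDS_MATCH5)
named_draws = 2 * pairs  # each matched pair names two draws
print(round(pairs, 1), round(named_draws, 1))  # 36.0 72.0
```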

In turn, this means the actual odds of finding a match 5 within 2000 draws are =55,492 x 2000 / COMBIN(2000,2) = 55.5 to 1, i.e. 2000 draws divided by the expected 36 pairs. This is also equal to the raw odds x 2000.

You can test this theory by downloading the match 5 finder https://www.mediafire.com/?10j88uk6ca9s753 and seeing for yourself.
This compares 2000 results each with the other (click the button) and finds match 5's consistent with the above rules. It has a feature built into the counter to prevent double counting of matched pairs, so it only displays one of the two (or more) draw numbers that match. You can check back from any draw number in the result by typing that number into cell B1 and using the filter to find match 5's. This will show you all the matches found for that draw.
Try any random set of 2000 draws in 649 format and see what happens. According to Moses, you won't find any matches, as there aren't enough draws for a 1 in 55,492 event to happen. I think the UK set has 32 such matches, within the normal range.
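If you would rather try this outside a spreadsheet, here is a hedged Python sketch of the same experiment on simulated draws. It is not the match 5 finder itself: instead of comparing all 1,999,000 pairs one by one, it buckets each draw's six 5-number subsets, because two draws sharing exactly 5 numbers have exactly one such subset in common. The function names and the seed are arbitrary choices. Note that for two 6-number draws the probability of sharing exactly 5 main numbers is 6 x 43 / COMBIN(49,6), about 1 in 54,201, slightly better than the published 1 in 55,492 (which excludes the bonus-ball case), so the simulated average is nearer 37 pairs than 36.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb

def random_draws(n, pool=49, picks=6, seed=None):
    """n simulated draws of `picks` distinct main numbers from 1..pool, each sorted."""
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(range(1, pool + 1), picks))) for _ in range(n)]

def count_match5_pairs(draws):
    """Count unordered pairs of sorted 6-number draws sharing exactly 5 numbers.

    Two draws sharing exactly 5 numbers have exactly one 5-number subset in
    common, so counting how many draws contain each 5-subset finds every such
    pair. Identical draws share all six of their 5-subsets and are removed.
    """
    buckets = Counter(sub for d in draws for sub in combinations(d, 5))
    at_least_5 = sum(comb(c, 2) for c in buckets.values())
    identical = sum(comb(c, 2) for c in Counter(draws).values())
    return at_least_5 - 6 * identical

draws = random_draws(2000, seed=1)
p_exact5 = 6 * 43 / comb(49, 6)  # share exactly 5 of the 6 main numbers
print("found:", count_match5_pairs(draws))
print("expected on average:", round(comb(2000, 2) * p_exact5, 1))
```

Rerunning with different seeds gives a spread of counts around the expected value, which is the "normal range" a real history should be judged against.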

Other events with higher odds against them, as illustrated below, are subject to the same rules as I have described. Just multiply their chances of happening (one draw compared with one draw) by the number of draws in the database being searched. I have a spreadsheet that tackles this type of low-probability match, shows you the matches in position and works out the TRUE odds of this type of thing happening when you have free rein to cherry-pick over thousands of draws, choosing both ticket and results.

Moses said:
In section below I will provide some duplicated triples from reverse wheel which overrule the law of statistics;
15/09/1999, 32,09,27,14,49,30,03
25/10/2014, 43,15,11,17,49,30,03

29/06/2013, 45,34,07,13,31,25,04
03/11/2007, 10,49,03,22,31,25,04

29/08/2009, 09,18,39,34,47,37,05
05/02/2011, 40,34,33,41,47,37,05

28/04/2012, 46,03,01,09,23,37,06
18/06/2014, 26,10,11,18,23,37,06

12/06/2010, 01,19,16,33,34,17,09
11/05/2011, 33,38,30,48,34,17,09

25/01/1997, 35,31,47,01,28,24,09
05/07/2014, 15,34,08,23,28,24,09

13/09/2003, 08,30,32,09,03,04,12
12/03/1997, 05,22,25,16,03,04,12 ---------- 18/03/2009, 39,02,41,05,22,25,27


18/04/2007, 30,41,17,08,11,32,47
11/04/2007, 08,42,35,21,11,32,47
Yes, really!! All above examples are collected from UK lotto and from 1900 results (fixed database).
I will make those available later after you have had time to digest and test this for yourself.

I understand that some of you will be reluctant to comment in this thread if you have been keeping up with developments, that is, if you agree with me. You may not. However, you are perfectly welcome to contact me privately about this work; most of you will know how to contact me via e-mail.

Thank you, the truth is out there.
 

Frank

Icewynd said:
Random is often less random than we anticipate! Thanks for this detailed explanation Frank!

Good Luck!
Thank you for your response, Icewynd. It's not something we think about much, so we might easily be fooled at first sight.



Well, I need to correct part of what I said in the post above, since the statement true odds = raw odds x 2000 is incorrect.
The figure calculated from 55,492 x 2000 / COMBIN(2000,2) = 55.5 to 1 at 2000 draws IS correct.

A more straightforward, approximately correct relationship between the number of draws in the database and the raw odds is:

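true odds = raw odds / (N/2)

where raw odds = 1/(raw probability) for very low-probability events, and N is the number of draws being compared. (Exactly, true odds = raw odds x N / COMBIN(N,2) = raw odds / ((N-1)/2), which is practically the same as raw odds / (N/2) once N is in the thousands.)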

So the true odds are not a fixed figure; they depend on the number of draws in the database.

So, where raw probability = 1/55,492:

N (draws)   True odds     Calculation
2000        55.49 to 1    55,492 / 1000
4000        27.75 to 1    55,492 / 2000
8000        13.87 to 1    55,492 / 4000
16000       6.94 to 1     55,492 / 8000

You will note that doubling the number of draws in the history halves the true odds.
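The table is easy to reproduce, and it is worth setting the rule of thumb beside the exact form, raw odds x N / COMBIN(N,2), which simplifies to raw odds / ((N-1)/2). A minimal Python sketch; the function names are hypothetical:

```python
from math import comb

def true_odds_rule(raw_odds, n_draws):
    """Rule of thumb: raw odds divided by half the number of draws."""
    return raw_odds / (n_draws / 2)

def true_odds_exact(raw_odds, n_draws):
    """Draws per expected matched pair: n_draws / (COMBIN(n_draws, 2) / raw_odds)."""
    return raw_odds * n_draws / comb(n_draws, 2)

for n in (2000, 4000, 8000, 16000):
    print(n, round(true_odds_rule(55_492, n), 2), round(true_odds_exact(55_492, n), 2))
```

At 2000 draws the exact figure is about 55.52 to 1 against 55.49 for the rule of thumb, and the gap shrinks as N grows.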



Well, that spreadsheet was just a taster to show proof of concept, but a more comprehensive sheet was required to find, highlight and count low-probability matches like:

15/09/1999, 32,09,27,14,49,30,03
25/10/2014, 43,15,11,17,49,30,03

Also, the earlier sheet had a fixed database size of 2000 draws. In fact, the true odds of any match shrink in inverse proportion to the number of draws in the self-compared database, so a spreadsheet was required that allows larger databases of differing sizes to be examined, to demonstrate this effect.

The example above illustrates a case where position matters, as well as values, in arriving at a probability. Just how common are such matches in a set of random 749 results, or in past results? The specific match type is a linked triple, where the balls in question follow in sequence in drawn order; the pair above is an example of a match between 2 historic draws. When 7 balls are drawn, any result contains five such linked triples. So if the results were in columns BCDEFGH, the triples would be in positions BCD, CDE, DEF, EFG and FGH. The linked triples illustrated above are in positions FGH.

The easiest way to find and compare such relatively rare events is to sort the results left to right on the particular triple of positions. For example, if you sort the whole database by column B, then by column C, then by column D, identical sequences in those columns will lie directly above each other in your database. With the help of formulas that compare adjacent draws, one can quickly flag matching triples in columns BCD, provided you have a filter to find them. With conditional formatting to help, you can see them all before your very eyes.
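The sort-and-compare-adjacent idea can be sketched in Python. This illustrates the method described above, not the spreadsheet's own formulas; it assumes each draw is stored as a 7-tuple in drawn order, with columns B to H mapped to positions 0 to 6, and `linked_triple_blocks` is a hypothetical name. Runs of identical adjacent triples are reported by block size (2 for a matched pair, 3 for three draws sharing a triple, and so on):

```python
from collections import Counter

# Columns B..H map to indices 0..6 of each drawn-order tuple.
GROUPS = {"BCD": 0, "CDE": 1, "DEF": 2, "EFG": 3, "FGH": 4}

def linked_triple_blocks(draws, group):
    """Sort on one positional triple and count runs of identical adjacent triples.

    Returns a Counter mapping block size to how many such blocks were found.
    """
    start = GROUPS[group]
    keys = sorted(d[start:start + 3] for d in draws)
    blocks = Counter()
    run = 1
    for prev, cur in zip(keys, keys[1:]):
        if cur == prev:
            run += 1  # same triple as the row above: extend the current block
            continue
        if run > 1:
            blocks[run] += 1
        run = 1
    if run > 1:  # close a block that runs to the last row
        blocks[run] += 1
    return blocks

# The first pair quoted by Moses, in drawn order.
draws = [
    (32, 9, 27, 14, 49, 30, 3),   # 15/09/1999
    (43, 15, 11, 17, 49, 30, 3),  # 25/10/2014
]
for name in GROUPS:
    print(name, dict(linked_triple_blocks(draws, name)))
```

Run on that pair, only the FGH group reports a block, {2: 1}; the other four groups report nothing.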

What are the raw odds of a linked triple in drawn order for a 749 lottery? Well, the raw odds are 1 in (49 x 48 x 47) = 1 in 110,544. So on a 1 draw to 1 other draw basis, you would not expect a single result to match another result in your lifetime at 2 draws a week. However, results from a list like the above do not arise from a normal one-to-one comparison, so you already know that in a 2000-draw database you can divide those odds by half the number of draws, 1000. That brings the true odds down to about 110 to 1, so in 2000 draws expect about 18 such matches in those exact positions. You can also expect around 18 such matches in each of the other positional groups, CDE, DEF, EFG and FGH. If you do not specify any particular position for such a triple match, then the odds reduce by a factor of 5, making the overall odds of a match in any of the five groups about 22 to 1 for a 2000-draw history pool.
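The arithmetic for a 2000-draw pool as a short Python check. It follows the rule of dividing the raw odds by half the number of draws; dividing by 5 for "any group" treats the five positional groups as roughly independent, which is an approximation:

```python
from math import comb

raw_odds = 49 * 48 * 47                  # one ordered triple in three fixed positions
pairs_2000 = comb(2000, 2) / raw_odds    # expected matched pairs in one positional group
true_odds_one_group = raw_odds / (2000 / 2)
true_odds_any_group = true_odds_one_group / 5  # groups BCD, CDE, DEF, EFG, FGH

print(raw_odds)                       # 110544
print(round(pairs_2000, 1))           # 18.1
print(round(true_odds_one_group, 1))  # 110.5
print(round(true_odds_any_group, 1))  # 22.1
```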

I have a spreadsheet which has buttons to do all the sorting for the 5 triple group positions.

https://www.mediafire.com/?mxihbfcv77xiu20

To find them, first click the appropriate sort button, which lines up each triple group one above the other in sequence. There is a matching BLUE filter for that column group, which you need to set to find 1's. This makes all the matches visible in a contiguous block. Usually the filter shows just 2 matching draws per triple, but if you have 16000 draws in the database there will be many more multiple matches of the same linked triple.

The blue filter is just a way of revealing the matching triples adjacent to each other, but there is also a purple filter. This uses other formulas to actually count blocks of multiple matches. It shows how many matching pairs of draws it found when the filter is set to 2.
It shows how many blocks of 3 matching linked triples it found when the purple filter is set to 3. With 16000 draws in the database, it will even find how many blocks of 4 matching triples there are. The figure displayed on row 2 is the actual counter.
It is important that only ONE of the filters is set at any one time, so if you are using the purple filter, the blue one must show ALL, and vice versa.
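How many blocks of 3 or 4 should you expect? One rough estimate, which is a Poisson-style approximation rather than anything the spreadsheet computes, is that a given set of k draws all share one positional triple with probability (1/110,544)^(k-1), so the expected number of such sets is COMBIN(N,k) / 110,544^(k-1). A hedged Python sketch with a hypothetical helper name:

```python
from math import comb

RAW_ODDS_TRIPLE = 49 * 48 * 47  # ordered triple in fixed positions, 749 drawn order

def expected_groups(n_draws, k, raw_odds=RAW_ODDS_TRIPLE):
    """Approximate expected number of sets of k draws that all share the same
    linked triple in one positional group: COMBIN(n_draws, k) * p**(k - 1),
    with p = 1 / raw_odds."""
    return comb(n_draws, k) / raw_odds ** (k - 1)

for n in (2000, 16000):
    print(n, [round(expected_groups(n, k), 2) for k in (2, 3, 4)])
```

At 16000 draws this gives roughly 1158 pairs, about 56 sets of 3 and about 2 sets of 4 in one positional group, consistent with blocks of 4 only turning up in the larger pools; at 2000 draws even a block of 3 is unlikely (about 0.1 expected). Because a block of 4 also contains four sets of 3, counts of blocks of exactly each size will come out a little lower than these figures.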

One could go through all the purple filter results one by one, noting the counter value for each filter value and entering them into the results sheet (green cells) for that particular sorted column group, in this example BCD. The sheet then works out empirical odds from the results you find.

I have already done this for pool sizes of 1000, 2000, 4000, 8000, 16000, 24000 and 32000 draws and worked out empirical odds from the counter results. They agree with the rule I stated above: true odds = raw odds / (half the number of draws) = 110,544 / (N/2). You could fill in your own counts in the green cells for your own random set of draws (max size 16000 draws in this version) and see what you get. Random 749 sets are available from Random.org and here: https://www.mediafire.com/?b4jby5fe25wru8b
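To repeat the pool-size experiment on simulated data, here is a hedged Python sketch. It groups identical triples with a dictionary instead of sorting, counts a block of k draws as COMBIN(k,2) matched pairs, and takes the empirical true odds to be N / (matched pairs); that definition is an assumption, since the sheet's exact formula isn't shown. Draws are simulated with Python's `random` module rather than taken from Random.org, and the seed is arbitrary:

```python
import random
from collections import Counter
from math import comb

RAW_ODDS = 49 * 48 * 47  # linked triple in fixed positions, 7 balls in drawn order

def random_749_draws(n, seed=None):
    """n simulated 7-from-49 draws kept in drawn order (no sorting)."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(1, 50), 7)) for _ in range(n)]

def matched_pairs(draws, start=0):
    """Matched pairs on the triple at positions start..start+2 (start=0 is BCD).
    A block of k draws sharing one triple contributes COMBIN(k, 2) pairs."""
    counts = Counter(d[start:start + 3] for d in draws)
    return sum(comb(c, 2) for c in counts.values())

for n in (2000, 4000, 8000, 16000):
    pairs = matched_pairs(random_749_draws(n, seed=7))
    empirical = n / pairs if pairs else float("inf")
    rule = RAW_ODDS / (n / 2)
    print(f"{n:>6} draws  pairs={pairs:>5}  empirical={empirical:6.2f}  rule={rule:6.2f}")
```

On runs like this the empirical column should track the rule to within sampling noise, which is largest for the smaller pools where only a handful of pairs turn up.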

Perhaps I should apologise to Moses for the earlier version of this spreadsheet that was available in another forum, where I double counted the number of matches, making the odds calculations half of what they are in this version.
However, the true odds are not fixed anyway; they vary with the size of your history. I have shown the folly of just comparing old results each with the other and then declaring that the lottery must be rigged because there are too many low-probability matches. Comparing real lotteries with random ones would have shown similar behaviour in each and made this exercise unnecessary.
I hope I've made this clear enough to follow, and close this book forever.
:)