Lex Index idea/question


What software will eliminate lines from a wheel that are within a
certain range of already hit Lex values?

Think of predicting the next draw as predicting the next Lex value.
You could eliminate TONS of lines by simply predicting the right
direction. Once the correct direction is selected, then the lines
below, and even a certain number of lines above (or vice-versa), could
be eliminated. Consider the following rough graphical depiction:

Lex-Index line

^                                                              ^
Lowest                                                    Highest

(the markers for draws 1, 2 & 3 were lost from the original diagram)

1, 2 & 3 are the last three draws,
draw #3 being the most recent.
^, ^ being the lowest and highest Lex values to date from the entire history.
Also note that the gap between draw #1 and draw #3 is well below the
average gap between draws.

What direction vector would you choose for the next drawing? Why?
Would you predict a target range of Lex values?

Most users would probably predict a higher Lex value. But how much
vector would you apply? In this case you would at least want to apply
the range from the start point up to and including draw #1. You might
even consider applying the distance from draw #3 to draw #1 beyond
draw #1, or even some average or percentage of the average gap beyond draw #1.

In designing lottery software, I think we all have missed the target
by attempting to predict what will win, when there is so much more to
predict NOT to win. Millions of combinations can be eliminated in
this manner, not to mention how many more can be eliminated by other
already existing methods (all even/odd, etc.).
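As a toy illustration of the elimination idea (my own sketch, not from the thread; it uses a tiny 3/7 game so the full wheel fits on screen, but the same logic applies to 6/49's ~14M lines):

```python
from itertools import combinations

# Tiny 3/7 game: 35 possible lines, generated in lexicographic (lex-index) order.
all_lines = list(combinations(range(1, 8), 3))
last_lex = 20  # hypothetical lex-index of the most recent draw

# Predicting a HIGHER lex value next: eliminate every line at or below it.
survivors = [line for i, line in enumerate(all_lines, start=1) if i > last_lex]
print(len(all_lines), len(survivors))  # 35 15
```

More than half the wheel is gone from a single direction call; a target range (upper bound as well as lower) would cut it further.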



Excuse me for almost failing grade 13 calculus and algebra, but what is a Lex-value? Could you explain the whole idea in layman's terms, perhaps with an example? :D


Lexicographical index

A lex-index is simply an index value used to identify draws.

An index value of 1 in a 6/49 lottery is draw 1-2-3-4-5-6,
index 2 is 1-2-3-4-5-7,
and the last index value, 13,983,816, is draw 44-45-46-47-48-49.

Got it now??

It's a linear representation of a 6-dimensional space (or however many balls are drawn).
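To make the mapping concrete, here is a small Python sketch (illustrative, not from the thread): a draw's 1-based position in lexicographic order is its lex-index.

```python
from itertools import combinations, islice

# Draws in lexicographic order; the enumeration position is the lex-index.
# Only the first three are printed; the full list has 13,983,816 entries.
for lex, draw in enumerate(islice(combinations(range(1, 50), 6), 3), start=1):
    print(lex, draw)
# 1 (1, 2, 3, 4, 5, 6)
# 2 (1, 2, 3, 4, 5, 7)
# 3 (1, 2, 3, 4, 5, 8)
```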

I posted some response at the rng but a more detailed answer follows.

A Lotto Game is a Uniform Distribution.
In your case the y axis shall be the standard deviation of each draw and
the x axis shall be the sum of the numbers of each draw.
The central moments of the graph are shown on the link below:

Your Graph space shall be as follows:

Draw -- Sum -- Std dev
1 2 3 4 5 6 -- 21 -- 1.870828693
1 2 3 4 5 49 -- 64 -- 18.83259586
1 2 3 47 48 49 -- 150 -- 25.21110866
1 45 46 47 48 49 -- 236 -- 18.83259586
44 45 46 47 48 49 -- 279 -- 1.870828693

Now you can add the points of your Lotto game (6/49 in this case)
within this space.
The same method can be used for analysis of n-Tuplets.
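The five boundary points above can be reproduced in a few lines of Python (using the sample standard deviation, which matches the figures in the table):

```python
from statistics import stdev  # sample standard deviation

# The five corner draws of the (sum, std dev) graph space for 6/49.
corners = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 49],
    [1, 2, 3, 47, 48, 49],
    [1, 45, 46, 47, 48, 49],
    [44, 45, 46, 47, 48, 49],
]
for draw in corners:
    print(draw, sum(draw), round(stdev(draw), 9))
```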

If you follow the above you should arrive at the method described as
k-nearest neighbour, with a short description below.

k-nearest neighbour

k-nearest neighbour is a technique for prediction by similarity. As mentioned above, points or records near each other in a multi-dimensional space share many properties. Therefore, the behaviour of these near-neighbour records can be used to predict the behaviour of the record under investigation. By calculating the distance to near neighbours, a measure of the strength of the relationship can be found. k refers to the number of nearest neighbours considered.

Generally, the shortest distance between two records (points) is a straight line (excepting special cases such as the shortest distance over the surface of a sphere). In two-dimensional space (Euclidean distance), if two records are regarded as two vertices of a right triangle, their distance apart is given by the formula:

D = √(x² + y²)

Where x and y are the differences apart on the x and y axes (which have been normalised to produce meaningful results). The smaller D is, the nearer the two records are, and the more reliable a predictor of behaviour they are likely to be. 10-nearest neighbour would calculate the average behaviour of the 10 nearest neighbours and use this average as a predictor for the record under investigation. There are other methods of measuring distance, e.g. Mahalanobis distance, but they are beyond the scope of this work.
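A minimal Python sketch of the distance calculation and k-nearest selection (the sample points are just the (sum, std dev) corner draws from earlier, used as stand-ins for real records; normalisation of the axes, which the text rightly calls for, is skipped for brevity):

```python
import math

def euclidean(a, b):
    """Straight-line distance between two records in n dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(records, query, k):
    """Return the k records closest to the query point."""
    return sorted(records, key=lambda r: euclidean(r, query))[:k]

points = [(21, 1.87), (64, 18.83), (150, 25.21), (236, 18.83), (279, 1.87)]
print(k_nearest(points, (150, 20.0), 1))  # [(150, 25.21)]
```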

One problem of nearest neighbour techniques is complexity. k-nearest neighbour does not scale very well to large data sets. Each record in a database with a thousand records must be compared with every other record, creating a million comparisons. Secondly, k-nearest neighbour is not a Data Mining method that learns - it merely searches for records with similar attribute values and assumes that their behaviour will be similar depending on their distance apart. Since the search technique is actually based on the data, it is one of the purest techniques, since the search cannot be polluted. k-nearest neighbour should also improve on naive prediction (as should any method considered).

In k-nearest neighbour, each attribute of a record represents a dimension in a multi-dimensional space, e.g. marketing records containing attributes for age, gender, marital status, number of children, income, employment status, employment type, car ownership, house ownership, house type, credit card ownership, liabilities, assets, etc., all of which can be justified from a marketing perspective, have an equal number of dimensions (13 in this case). A multi-dimensional space with 20 dimensions and a million records is extremely sparsely populated, and k-nearest neighbour does not work well. In addition, and a very real problem for k-nearest neighbour, each record (point) in such a space is an almost equal distance from any other point. Since k-nearest neighbour relies on the distance between records to predict their similarity and hence their behaviour, this undermines the whole rationale behind k-nearest neighbour. In contrast, a million records in a three-dimensional space, for example using just income, age and gender from the example before, is reasonably crowded and can produce meaningful results. This can be overcome by specifying which attributes are important in which Data Mining searches and mining only on those attributes.

Generally, as discussed earlier, Data Mining algorithms should have a complexity of O(n log n); k-nearest neighbour therefore does not scale up well (since the number of comparisons made expands quadratically with the number of records) and is used on already identified sub-sets of data.

More info:
W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, "Numerical Recipes", 2nd edn., Cambridge University Press, Cambridge (1992).
xls examples with code


Boy, that last post didn't help much :) But based on winhunter's description, it sounds similar to something I've tried before involving changes in each position (1 thru 6). I was trying to determine limits to how much each position changes from draw to draw, and if one position went up last draw, would it be down the next draw? I didn't get overly involved in testing the whole thing out, but it seemed like the most common changes were that the next number in that position would be within +/- 4 of the last, and quite often when a position went up, it would go down next time..

I think after I decided on that theory without actually testing it out, I thought of another idea and somehow position change went to the back burner.. My program does still generate a position change file each time it runs though :)


Wow, I've looked at position change for some time now, just never got so technical about it. Didn't even have a name for it. :agree2:


and have you come up with similar results Peter? Would you say I am somewhat correct with the theory I came up with about position change? I should go back to it and try to find the best proven theory about it, but I have another project I'd like to get started and finished, which would be a massive wheeler with lots and lots of grouping features..


I must confess Snides, I look more at positional change on a draw to draw basis (see positional chart on this last draw, in discussions). I wish I would have kept snapshots from the beginning, but I didn't. However, I concur with your conclusions. :agree2: I always thought you do great work, although I admit I didn't always understand the charts, but that's my shortcoming. Keep them up here please, a lot of smart people here, to help us all crack this baby. :agree2:


Positional information

Analysis of numbers per position (N1 to N6) can give interesting information. For example:

- If the average and the standard deviation are calculated by position, roughly 2/3 of the numbers will be within +/- 1 standard deviation of the average. The actual data is:
- N1: 7.4 -- 5.9 -- 72.8%
- N2: 14.6 -- 7.6 -- 64.4%
- N3: 21.9 -- 8.2 -- 67.5%
- N4: 29.0 -- 8.0 -- 66.0%
- N5: 36.2 -- 7.2 -- 69.1%
- N6: 43.2 -- 5.4 -- 73.1%

This is for 2027 draws and the data is posted in the following order: Position: Average -- Standard deviation -- % of numbers within one standard deviation of the average
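The per-position figures come from a straightforward calculation; here is a sketch of it (with a made-up four-draw history for illustration, since the real 2027-draw file isn't posted):

```python
from statistics import mean, stdev

# Hypothetical draw history: each row is a sorted 6/49 draw.
history = [
    [3, 14, 22, 29, 35, 44],
    [7, 12, 25, 31, 38, 42],
    [5, 18, 20, 27, 33, 45],
    [9, 15, 23, 30, 40, 47],
]

for pos in range(6):
    col = [draw[pos] for draw in history]          # all numbers in position N
    avg, sd = mean(col), stdev(col)
    within = sum(abs(x - avg) <= sd for x in col) / len(col)
    print(f"N{pos + 1}: {avg:.1f} -- {sd:.1f} -- {within:.0%}")
```

Output follows the same "Position: Average -- Standard deviation -- % within one standard deviation" layout as the table above.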


Well Peter, since you confessed, I'll do the same :) After looking at your positional chart, I don't understand it either.. I remember your discussing open loops, and trying to figure them out, but for some reason it isn't making any sense again..

Here's how I look at the position changes..

2020 8 2 2 -6 -2 0
2021 4 1 7 10 9 2
2022 -1 1 -6 -3 0 0
2023 -14 -11 1 7 6 2
2024 4 -1 -3 -15 -19 -20
2025 -6 3 8 8 15 15
2026 11 11 -4 4 -3 -4
2027 -2 -8 -10 -7 2 7

red is within +/- 4 of the prior draw's number

and I look at if it went up last time, does it go down next time?
Once again, red indicates that yes, sometimes this idea is true..

2020 __8 __2 __2 _-6 _-2 __0
2021 __4 __1 __7 _10 __9 __2
2022 _-1 __1 _-6 _-3 __0 __0
2023 -14 -11 __1 __7 __6 __2
2024 __4 _-1 _-3 -15 -19 -20
2025 _-6 __3 __8 __8 _15 _15
2026 _11 _11 _-4 __4 _-3 _-4
2027 _-2 _-8 -10 _-7 __2 __7

man it's hard to edit this stuff live on this website.. I'll hafta ask Mandi to do that for me in the future.. :D


This is uncharted territory for me. The math would probably take me a month or two of study to understand. Calculus was kind of like a photon torpedo for me. Fire and forget after you take the test. But I can still hear my calculus professor telling me: to find a solution to the problem, start from first principles.
Mathematics is the language of the universe. If I had only taken the time to understand it better when I was still a spring chicken.
As for that positional change, how do you know what range that position encompasses, and how much to go down or up?


mmm, Gilles, aren't you talking about standard deviation from average per position, while Peter & Snides are talking positional changes? I keep those as 2 separate charts, and the within-4 rule seems more applicable to the standard deviation from average chart than it does the posit change. Although my database is only 12 games. :lol: You can still kinda get an idea:
34/72 are +/-4 on standard dev chart 47%
22/74 are +/-4 on posit change chart 29%

the up/down pattern is more visible with the posit changes than the stan dev..... Just thought you might want a look at another game's patterns.

ps. Godload, I'm impressed that you took calc. I'm the cut & paste type myself, math was a horror for me.... but I do have an IQ of 145. go figure!


Can anyone post an algorithm to create the Lex Index....

I've been beating my head trying to put one together but I'm getting nowhere :dizzy:!! I can only seem to make two-number combinations work.... I wanted a generic algorithm.....

Thanks in advance for this one.... :wavey:


I think that different representations are being considered here from what Andrew is proposing. I don't understand all of Nick's math but think he's talking sums and Peter and Snides are looking at individual numbers movement.

What the lexigraph, if I can call it that, would show is beyond my imagination in size, but basically what it presents is all ~14M possibilities for each draw. So for CN49 I picture a base line ~14M points long, and a y-axis with the 2K+ draws, which will 'grow' with each new draw, or some version of that ....

have you considered using a log scale for the base line? It may skew the 'visuals' a bit but may be more manageable ... it has been a while since I worked with logs, so again the image in my head is poor, kinda rambling here ...


jumping in the middle

WINHunter uses a couple of calculations to determine positional changes.

These two are:

Position Gap
Position Vector

Position Gap is calculated (in the Calc_Position_Gap subroutine, in the clsProcessor8.cls code) from the following concepts:

The processor loops through the specified history, and calculates each position. It stores two values for the Position Gap. First it totals up the Gross Position value, in order to get an average (I called it a mean, but I guess you would call it an average).

[code]mGrossPositionValue(n) = mGrossPositionValue(n) + mThisDigit(n)
mMeanPositionValue(n) = mGrossPositionValue(n) / mCount[/code]

Then it calculates the BasePositionGap by adding the difference between each draw and its previous draw to get a sum total, which is then divided by the total number of draws calculated (minus one, because it is a total of the differences, not a total of the positions themselves).

[code]mBasePositionGap(n) = mBasePositionGap(n) + Abs(mThisDigit(n) - mLastDigit(n))
mPositionGap(n) = mBasePositionGap(n) / (mCount - 1)[/code]

It does this for the specified amount of history (as specified by the parent Filter object.)

Next it calculates the Vectors for each draw, based on the previous draw position to the current draw position, for the specified history given. It works backwards, calculating the most recent vector from the latest draw to the one previous to that, and so on through the history, while storing each vector from the two draws. There are three vector types: 1, 0 & -1.

Also during this calculation, it stores the Min/Max position values for each Position which are used during the Vector scoring routine.

WINHunter then basically SUMS the Vectors together for each Position, and gets a resultant vector:

[code]'Total all the vectors together
'And see what we get
lTempResult = lTempResult + mPositionVectors(i - mStart, n)

'What is the resultant vector
Select Case lTempResult
    Case Is > 0
        mPositionVectorSum(n) = Increases
    Case Is < 0
        mPositionVectorSum(n) = Decreases
    Case Is = 0
        mPositionVectorSum(n) = Same
End Select[/code]

Once it has the resultant vector, it scores the numbers from the MEAN position up to the maximum position value, or from the MEAN position down to the minimum value. Keep in mind here that the min/max/mean (or avg) are still based upon the SPECIFIED amount of history (start and stop points).

What I don't know is what to do with a no-change vector!

With this type of vector information, there are tons of ways to trend it. But the thing is, I have not given it much thought lately. I'm sure this processor holds a lot of potential for future WINHunter possibilities. Your ideas are welcomed here!!
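In Python terms, the Position Gap and resultant Position Vector described above might look like this (a paraphrase of the idea, not a port of the actual clsProcessor8.cls code; the history data is made up):

```python
def sign(x):
    """Vector type of a change: 1 (increase), -1 (decrease), 0 (same)."""
    return (x > 0) - (x < 0)

def position_stats(history, pos):
    col = [draw[pos] for draw in history]
    mean_value = sum(col) / len(col)
    # Position Gap: average absolute draw-to-draw change in this position
    # (divided by count - 1, since gaps are differences, not positions).
    gaps = [abs(b - a) for a, b in zip(col, col[1:])]
    position_gap = sum(gaps) / len(gaps)
    # Position Vector: sum the per-step vectors into a resultant vector.
    resultant = sign(sum(sign(b - a) for a, b in zip(col, col[1:])))
    return mean_value, position_gap, resultant

history = [[3, 14, 22, 29, 35, 44],
           [7, 12, 25, 31, 38, 42],
           [5, 18, 20, 27, 33, 45],
           [9, 15, 23, 30, 40, 47]]
print(position_stats(history, 0))  # mean 6.0, gap 10/3, resultant 1 (increases)
```

A resultant of 1 would score numbers from the mean up toward the maximum, -1 from the mean down toward the minimum; the 0 (no-change) case is the open question from the post.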



Brad said:
I think that different representations are being considered here from what Andrew is proposing. I don't understand all of Nick's math but think he's talking sums and Peter and Snides are looking at individual numbers movement.

What the lexigraph, if I can call it that, would show is beyond my imagination in size, but basically what it presents is all ~14M possibilities for each draw. So for CN49 I picture a base line ~14M points long, and a y-axis with the 2K+ draws, which will 'grow' with each new draw, or some version of that ....

have you considered using a log scale for the base line? It may skew the 'visuals' a bit but may be more manageable ... it has been a while since I worked with logs, so again the image in my head is poor, kinda rambling here ...
Yes, the Lexigraph would be a linear representation. But what I am envisioning, instead of a line with bars for each index number, is a picture. Each index value maps to an x,y pixel location. A full-blown picture would be a scaled representation, with each pixel representing a cluster of index values. The index values would start at the upper left-hand corner of the screen, with that being index value 1; the next immediate pixel to the right is index #2.

The only problem with this concept is that the image would be 1 pixel high and 14+ million pixels long. So, how would you wrap that around, and how would you zoom it out? The idea is that if you wrap it around and zoom it out, some sort of clustering, or lack thereof, would become apparent. But what wrap-around would produce logical clustering of index values?

Anyway, the clusters would combine their pixel colour values and arrive at new values. I'm figuring a greyscale picture would be in order first, since grey intensities could be easily summed and displayed.
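One way to sketch the wrap-around in Python (the width of 4096 is an arbitrary guess of mine; the open question in the post is exactly which wrap width would make clusters visible):

```python
def lex_to_pixel(lex_index, width):
    """Map a 1-based lex-index to an (x, y) pixel, filling row by row."""
    i = lex_index - 1  # 0-based pixel offset
    return i % width, i // width

WIDTH = 4096  # arbitrary wrap width; 13,983,816 / 4096 ~ 3414 rows
print(lex_to_pixel(1, WIDTH))         # (0, 0): the first pixel
print(lex_to_pixel(13983816, WIDTH))  # (71, 3414): the last draw's pixel
```

Summing hit counts per pixel and rendering them as grey intensities would then give the zoomed-out greyscale picture described above.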



thornc said:
Can anyone post an algorhitm to create the Lex Index....
Here are three functions to achieve your goal.

Combin is a generic function returning the number of combinations.

GetLex returns the lex-index of a set.

FromLex returns a set that is made from a given lex-index.

I hope this is what you're looking for:[CODE]Private Function Combin(ByVal m As Integer, ByVal n As Integer) As Double
Dim i As Integer
Dim n1 As Integer
Dim p As Integer
Dim r As Double

n1 = n
p = m - n1

If n1 < p Then
p = n1
n1 = m - p
End If

r = n1 + 1

If p = 0 Then
r = 1
End If

If p > 1 Then
For i = 2 To p
r = (r * (n1 + i)) / i
Next i
End If

Combin = r
End Function

Function GetLex(ByVal pool As Integer, ByVal ball() As Integer) As Double
Dim i As Integer
Dim p As Integer
Dim x As Integer
Dim v As Integer
Dim m As Integer
Dim k As Double

p = UBound(ball)
k = 1
x = 1
v = 0

For i = 1 To p
m = p - x
v = v + 1

While v < ball(i)
k = k + Combin(pool - v, m)
v = v + 1
End While

x = x + 1
Next i

GetLex = k
End Function

Function FromLex(ByVal pool As Integer, ByVal pick As Integer, ByVal lex As Double) As Object
Dim c(pick) As Integer
Dim i As Integer
Dim v As Integer
Dim p As Integer
Dim m As Integer
Dim k As Double
Dim r As Double

k = 0
v = 0
p = pick - 1

For i = 0 To p - 1
m = p - i

While k < lex
v = v + 1
r = Combin(pool - v, m)
k = k + r
End While

k = k - r
c(i + 1) = v
Next i

c(pick) = v + (lex - k)

FromLex = c
End Function[/CODE]
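For anyone working outside VB, here is a rough Python equivalent of GetLex/FromLex (my own translation of the same lexicographic scheme; `math.comb` replaces the hand-rolled Combin):

```python
from math import comb

def get_lex(pool, balls):
    """1-based lexicographic index of a sorted combination."""
    k, prev, p = 1, 0, len(balls)
    for i, b in enumerate(balls):
        for v in range(prev + 1, b):        # values skipped at this position
            k += comb(pool - v, p - i - 1)  # combinations they account for
        prev = b
    return k

def from_lex(pool, pick, lex):
    """Sorted combination for a given 1-based lexicographic index."""
    balls, v = [], 0
    for i in range(pick - 1):
        m = pick - i - 1
        while True:
            v += 1
            r = comb(pool - v, m)
            if lex <= r:
                break
            lex -= r
        balls.append(v)
    balls.append(v + lex)  # the last ball comes straight from the remainder
    return balls

print(get_lex(49, [1, 2, 3, 4, 5, 7]))  # 2
print(from_lex(49, 6, 13983816))        # [44, 45, 46, 47, 48, 49]
```

The two functions are inverses, so a draw can be round-tripped through its lex-index and back.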


Goswinus said:
Here are three functions to achieve your goal.
Great, thanks Goswin! I only needed the algorithm, but these functions will do fine!!!

Thanks again! :agree2: I'll start :read: as soon as possible!