Tuesday 30 June 2015

on emerging patterns

So, consider the English expression "I see a pattern forming". Well, maybe we can encode that idea in BKO. Let's say we have a series of examples. Individually they just look like noise. But, if we add them in a BKO sense, then "I see a pattern forming" corresponds to a distinctive shape emerging from the noise.

The idea is that some of the kets in the examples correspond to signal, and some to noise. As we add the examples up, the signal kets "reinforce" (ie, their coefficients increase), but presumably the noise is random from sample to sample, so the noise kets' coefficients remain small.

We can extract the "signal" using something like this (using some operator foo):
foo |signal> => drop-below[t] (foo |example 1> + foo |example 2> + foo |example 3> + ... + foo |example n>)
I hope that makes sense.
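To make this concrete, here is a toy Python sketch (my own names and made-up data, not console code): represent superpositions as dicts of ket -> coefficient, add the samples, then threshold.

```python
# Toy sketch of the emerging-pattern idea (hypothetical data). Superpositions
# are dicts of ket -> coefficient; signal kets recur across samples, noise
# kets don't, so after summing, drop-below keeps only the signal.
def add_sp(*sps):
    result = {}
    for sp in sps:
        for ket, coeff in sp.items():
            result[ket] = result.get(ket, 0) + coeff
    return result

def drop_below(sp, t):
    return {k: v for k, v in sp.items() if v >= t}

samples = [
    {'signal-1': 1, 'signal-2': 1, 'noise-a': 1},
    {'signal-1': 1, 'signal-2': 1, 'noise-b': 1},
    {'signal-1': 1, 'signal-2': 1, 'noise-c': 1},
]
print(drop_below(add_sp(*samples), 2))  # {'signal-1': 3, 'signal-2': 3}
```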

Update: I guess I have a closer-to-real-world application of this idea. Consider the list of WWII leaders: Roosevelt, Churchill, Stalin and Hitler.

Then in BKO we might do something like:
sa: everything-we-know-about |*> #=> apply(supported-ops|_self>,|_self>)

sa: the-list-of |WWII leaders> => |Roosevelt> + |Churchill> + |Stalin> + |Hitler>
sa: coeff-sort everything-we-know-about the-list-of |WWII leaders>
And hopefully what emerges is something about WWII.

Update: for want of a better place to put this. The above makes me think of:
sa: everything-we-know-about |*> #=> apply(supported-ops|_self>,|_self>)
sa: map[everything-we-know-about,everything] some |list>
sa: similar[everything] |object>
This should be a quite general way to find the similarity between objects. I haven't tested it, but I'm pretty sure it is correct.

Update: again, for want of a better place, we can also do this. Consider we have knowledge on quite a few animals, including what they like to eat. We also have a lot of knowledge on foxes, but we don't know what they eat. But, we can guess:
guess-what-eat |fox> => select[1,1] coeff-sort eat select[1,5] similar[everything] |fox>
ie, in words, find the 5 most similar animals given what we know. Find what they eat. Sort that list. Return the result with the highest coeff.
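A toy Python sketch of this guess, with made-up animal data. The simm here is the rescaled similarity, sum-of-mins over max-of-sums:

```python
# Toy sketch of guess-what-eat, with hypothetical animal data.
def simm(f, g):
    keys = set(f) | set(g)
    denom = max(sum(f.values()), sum(g.values()))
    if denom == 0:
        return 0
    return sum(min(f.get(k, 0), g.get(k, 0)) for k in keys) / denom

everything = {
    'fox':  {'legs': 4, 'fur': 1, 'wild': 1},
    'wolf': {'legs': 4, 'fur': 1, 'wild': 1},
    'dog':  {'legs': 4, 'fur': 1},
    'duck': {'legs': 2, 'feathers': 1},
}
eat = {'wolf': {'rabbit': 2}, 'dog': {'kibble': 1}, 'duck': {'pond-weed': 1}}

def guess_what_eat(animal, top=5):
    # find the most similar animals, sum what they eat, return the top ket
    ranked = sorted((a for a in everything if a != animal),
                    key=lambda a: simm(everything[animal], everything[a]),
                    reverse=True)[:top]
    totals = {}
    for a in ranked:
        for food, coeff in eat.get(a, {}).items():
            totals[food] = totals.get(food, 0) + coeff
    return max(totals, key=totals.get)

print(guess_what_eat('fox'))  # 'rabbit', borrowed from its most similar animal, the wolf
```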

Update: we can also use this everything as a way to help with language translation. Maybe something like:
best-guess-German-for |*> #=> select[1,1] similar[English-everything,German-everything] |_self>
Kind of hard to test this idea at the moment. I need some way to map words to everything we know about a word. Heh, cortical.io word SDRs would be a nice start! I wonder how they made them?

Update: a little more on the idea of emerging patterns. Simple enough, the time gap between two events.

Start with a web log file. For each IP, find the time gap between retrievals. I imagine this will be quite distinctive. eg, a robot slurping down a page every x seconds should have a nice big spike around the x-second mark (though how broad this peak is depends on how fine-grained your time sample is: the wider your bucket size, the sharper the peak).

Next, if you use the random wait, as in wget:
--random-wait               wait from 0.5*WAIT...1.5*WAIT secs between retrievals
then that should have a distinctive pattern too.

Finally, you should get a clear signal of roughly how often you press refresh on a website when you are bored. This will probably be quite noisy, so the smooth operator should help. It is also quite likely to give an indication of how long you are asleep. Say you normally sleep for about 8 hours; then there should be at least some kets (roughly 1 per day) with a time delta greater than 8 hours. Whether you web surf at work would also potentially show up.
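A minimal Python sketch of the per-IP time-gap idea, assuming we have already parsed the log into (ip, unix-time) pairs (my own names, hypothetical log data):

```python
# Minimal sketch: bucket the time gap between successive retrievals per IP.
# Assumes the log has already been parsed into (ip, unix_time) pairs.
from collections import defaultdict

def gap_spectrum(events, bucket_secs=1):
    times = defaultdict(list)
    for ip, t in events:
        times[ip].append(t)
    spectra = {}
    for ip, ts in times.items():
        ts = sorted(ts)
        spectrum = defaultdict(int)
        for a, b in zip(ts, ts[1:]):
            spectrum[int((b - a) // bucket_secs)] += 1
        spectra[ip] = dict(spectrum)
    return spectra

# a robot slurping a page every 10 seconds gives a single sharp spike:
events = [('1.2.3.4', t) for t in range(0, 100, 10)]
print(gap_spectrum(events))  # {'1.2.3.4': {10: 9}}
```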

Last example: apparently every person has a distinctive typing pattern. We could find that simply enough, just by measuring the time delta between different characters on a keyboard. eg, when typing "I'm", the delta between "I" and "'", and between "'" and "m". Or when typing "The", the time between "T" and "h", and "h" and "e". Or when typing "rabbit", the delta between "r" and "a", "a" and "b", "b" and "b", and so on. Presumably, if you have a big enough sample, and you map this to a superposition, then we could run similar[typing-delta] |person: X> and guess who typed it.
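A small Python sketch of the typing-delta idea, with made-up keystroke timings; each "a->b" ket gets the mean time delta for that character pair:

```python
# Sketch of the typing-delta idea, with hypothetical keystroke timings.
from collections import defaultdict

def typing_deltas(keystrokes):
    sums, counts = defaultdict(float), defaultdict(int)
    for (c1, t1), (c2, t2) in zip(keystrokes, keystrokes[1:]):
        pair = c1 + '->' + c2
        sums[pair] += t2 - t1
        counts[pair] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}

# typing "rabbit", with made-up timestamps in seconds:
sample = [('r', 0.00), ('a', 0.12), ('b', 0.25), ('b', 0.41), ('i', 0.52), ('t', 0.61)]
print(typing_deltas(sample))
```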

Sunday 28 June 2015

ebook letter frequencies

I wrote this one roughly a year ago, but figured I may as well add it to the blog. Given ebooks (mostly from Project Gutenberg), find their letter frequencies. Not super interesting, but let's add it anyway.

Here is the code, and the resulting sw file.
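Not the linked code, just a minimal Python sketch of the counting step:

```python
# Minimal sketch of the letter-counting step:
from collections import Counter

def letter_counts(text):
    return Counter(c for c in text.lower() if 'a' <= c <= 'z')

counts = letter_counts("The quick brown fox jumps over the lazy dog.")
print(counts['o'])   # 4
print(len(counts))   # 26 -- a pangram covers every letter
```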

Now a couple of matrices in the console:
sa: load ebook-letter-counts.sw
sa: matrix[letter-count]
[ a ] = [  9083   26317  142241  23325  76232   35669  260565  35285  23871  ] [ Alice-in-Wonderland  ]
[ b ]   [  1621   4766   25476   4829   15699   6847   50138   6117   4763   ] [ Frankenstein         ]
[ c ]   [  2817   9055   37297   7379   21938   11349  72409   10725  6942   ] [ Gone-with-Wind       ]
[ d ]   [  5228   16720  85897   12139  37966   18763  144619  18828  15168  ] [ I-Robot              ]
[ e ]   [  15084  45720  228415  37293  117608  59029  440119  54536  37230  ] [ Moby-Dick            ]
[ f ]   [  2248   8516   34779   5940   20363   9936   73859   9105   6270   ] [ nineteen-eighty-four ]
[ g ]   [  2751   5762   38283   6037   20489   9113   61948   8023   6822   ] [ Shakespeare          ]
[ h ]   [  7581   19400  119901  16803  61947   28093  234301  28284  19130  ] [ Sherlock-Holmes      ]
[ i ]   [  7803   21411  101987  20074  62942   30304  214275  27361  18380  ] [ Tom-Sawyer           ]
[ j ]   [  222    431    1501    346    915     310    2955    421    465    ]
[ k ]   [  1202   1722   18290   2370   8011    3512   32029   3590   3136   ]
[ l ]   [  5053   12603  79783   12870  42338   18395  156371  17276  12426  ]
[ m ]   [  2245   10295  39595   6534   22871   10513  101507  11391  7255   ]
[ n ]   [  7871   24220  123989  21302  65429   31516  231652  29337  20858  ]
[ o ]   [  9245   25050  130230  24555  69648   34287  299732  34452  24251  ]
[ p ]   [  1796   5939   23979   5148   16553   8058   50638   6987   4766   ]
[ q ]   [  135    323    1270    321    1244    397    2998    416    182    ]
[ r ]   [  6400   20708  105074  17003  52446   25861  224994  25378  16262  ]
[ s ]   [  6980   20808  107430  18044  62734   28382  232317  27105  17852  ]
[ t ]   [  11631  29706  157163  28316  86983   42127  311911  39232  28389  ]
[ u ]   [  3867   10340  50453   9483   26933   12903  121631  13527  9376   ]
[ v ]   [  911    3788   15224   3062   8540    4252   36692   4471   2451   ]
[ w ]   [  2696   7335   43623   6761   21174   11225  78929   10754  7735   ]
[ x ]   [  170    675    1700    508    1037    779    4867    567    326    ]
[ y ]   [  2442   7743   37639   6552   16849   9071   90162   9267   6830   ]
[ z ]   [  79     243    1045    208    598     303    1418    150    155    ]

sa: norm |*> #=> normalize[100] letter-count |_self>
sa: map[norm,normalized-letter-count] rel-kets[letter-count]
sa: matrix[normalized-letter-count]
[ a ] = [  7.75   7.75   8.12   7.85   8.11   7.91   7.38   8.16   7.92   ] [ Alice-in-Wonderland  ]
[ b ]   [  1.38   1.4    1.45   1.62   1.67   1.52   1.42   1.41   1.58   ] [ Frankenstein         ]
[ c ]   [  2.4    2.67   2.13   2.48   2.34   2.52   2.05   2.48   2.3    ] [ Gone-with-Wind       ]
[ d ]   [  4.46   4.92   4.9    4.08   4.04   4.16   4.09   4.35   5.03   ] [ I-Robot              ]
[ e ]   [  12.87  13.46  13.04  12.55  12.52  13.09  12.46  12.61  12.36  ] [ Moby-Dick            ]
[ f ]   [  1.92   2.51   1.98   2.0    2.17   2.2    2.09   2.1    2.08   ] [ nineteen-eighty-four ]
[ g ]   [  2.35   1.7    2.18   2.03   2.18   2.02   1.75   1.85   2.26   ] [ Shakespeare          ]
[ h ]   [  6.47   5.71   6.84   5.65   6.59   6.23   6.63   6.54   6.35   ] [ Sherlock-Holmes      ]
[ i ]   [  6.66   6.3    5.82   6.75   6.7    6.72   6.06   6.32   6.1    ] [ Tom-Sawyer           ]
[ j ]   [  0.19   0.13   0.09   0.12   0.1    0.07   0.08   0.1    0.15   ]
[ k ]   [  1.03   0.51   1.04   0.8    0.85   0.78   0.91   0.83   1.04   ]
[ l ]   [  4.31   3.71   4.55   4.33   4.51   4.08   4.43   3.99   4.12   ]
[ m ]   [  1.92   3.03   2.26   2.2    2.43   2.33   2.87   2.63   2.41   ]
[ n ]   [  6.72   7.13   7.08   7.17   6.96   6.99   6.56   6.78   6.92   ]
[ o ]   [  7.89   7.38   7.43   8.26   7.41   7.6    8.48   7.96   8.05   ]
[ p ]   [  1.53   1.75   1.37   1.73   1.76   1.79   1.43   1.62   1.58   ]
[ q ]   [  0.12   0.1    0.07   0.11   0.13   0.09   0.08   0.1    0.06   ]
[ r ]   [  5.46   6.1    6.0    5.72   5.58   5.73   6.37   5.87   5.4    ]
[ s ]   [  5.96   6.13   6.13   6.07   6.68   6.29   6.58   6.27   5.93   ]
[ t ]   [  9.93   8.75   8.97   9.53   9.26   9.34   8.83   9.07   9.42   ]
[ u ]   [  3.3    3.04   2.88   3.19   2.87   2.86   3.44   3.13   3.11   ]
[ v ]   [  0.78   1.12   0.87   1.03   0.91   0.94   1.04   1.03   0.81   ]
[ w ]   [  2.3    2.16   2.49   2.27   2.25   2.49   2.23   2.49   2.57   ]
[ x ]   [  0.15   0.2    0.1    0.17   0.11   0.17   0.14   0.13   0.11   ]
[ y ]   [  2.08   2.28   2.15   2.2    1.79   2.01   2.55   2.14   2.27   ]
[ z ]   [  0.07   0.07   0.06   0.07   0.06   0.07   0.04   0.03   0.05   ]
sa: save ebook-letter-counts--normalized.sw
And I guess that is it.

Update: while we are here, may as well give the simm matrix:
sa: simm |*> #=> 100 self-similar[letter-count] |_self>
sa: map[simm,simm-matrix] rel-kets[letter-count]
sa: matrix[simm-matrix]
[ Alice-in-Wonderland  ] = [  100.0  94.94  96.52  97.32  96.76  97.11  95.57  97.09  97.49  ] [ Alice-in-Wonderland  ]
[ Frankenstein         ]   [  94.94  100.0  95.97  96.01  95.22  96.48  95.24  96.52  95.54  ] [ Frankenstein         ]
[ Gone-with-Wind       ]   [  96.52  95.97  100.0  96.0   96.98  97.01  95.91  97.12  97.17  ] [ Gone-with-Wind       ]
[ I-Robot              ]   [  97.32  96.01  96.0   100.0  97.3   97.87  96.06  97.35  97.12  ] [ I-Robot              ]
[ Moby-Dick            ]   [  96.76  95.22  96.98  97.3   100.0  98.05  96.07  97.39  96.85  ] [ Moby-Dick            ]
[ nineteen-eighty-four ]   [  97.11  96.48  97.01  97.87  98.05  100.0  95.55  97.88  97.1   ] [ nineteen-eighty-four ]
[ Shakespeare          ]   [  95.57  95.24  95.91  96.06  96.07  95.55  100    97.08  95.89  ] [ Shakespeare          ]
[ Sherlock-Holmes      ]   [  97.09  96.52  97.12  97.35  97.39  97.88  97.08  100    97.54  ] [ Sherlock-Holmes      ]
[ Tom-Sawyer           ]   [  97.49  95.54  97.17  97.12  96.85  97.1   95.89  97.54  100    ] [ Tom-Sawyer           ]
So we see that English text has largely the same letter frequencies over different ebooks. Which makes sense of course, but nice to see it visually.
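For reference, a small Python sketch of the rescaled simm used here (scaled to 100), applied to a few of the normalized letter frequencies (e, t, a) from the matrix above:

```python
# Sketch of the rescaled simm, scaled to 100:
# simm(f, g) = 100 * sum_k min(f[k], g[k]) / max(sum(f), sum(g))
def simm(f, g):
    keys = set(f) | set(g)
    denom = max(sum(f.values()), sum(g.values()))
    if denom == 0:
        return 0
    return 100 * sum(min(f.get(k, 0), g.get(k, 0)) for k in keys) / denom

alice      = {'e': 12.87, 't': 9.93, 'a': 7.75}
tom_sawyer = {'e': 12.36, 't': 9.42, 'a': 7.92}
print(round(simm(alice, tom_sawyer), 2))
```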

And it would be nice to have an "unscaled-similar[op]" operator. The problem is that it would require an entire new function in the new_context class, which I am reluctant to add, since unscaled-simm is a rare use case. Currently, for special occasions, it can be done by changing the simm function in new_context.pattern_recognition() to unscaled_simm(A,B).

Friday 26 June 2015

simple prolog vs BKO example

Just a comparison between some prolog from wikipedia and the BKO equivalent.

First the prolog:
mother_child(trude, sally).
 
father_child(tom, sally).
father_child(tom, erica).
father_child(mike, tom).
 
sibling(X, Y)      :- parent_child(Z, X), parent_child(Z, Y).
 
parent_child(X, Y) :- father_child(X, Y).
parent_child(X, Y) :- mother_child(X, Y).
This results in the following query being evaluated as true:
 ?- sibling(sally, erica).
 Yes
Now in BKO:
|context> => |context: prolog example>

mother |sally> => |trude>
child |trude> => |sally>

father |sally> => |tom>
child |tom> => |sally>

father |erica> => |tom>
child |tom> +=> |erica>

father |tom> => |mike>
child |mike> => |tom>

parent |*> #=> mother |_self> + father |_self>
sibling |*> #=> child parent |_self>          -- this being the BKO equivalent of: sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y) 
sibling-of |*> #=> clean drop (child parent |_self> + -|_self>)
now put it to use:
sa: sibling |sally>
|sally> + |erica>

sa: sibling |erica>
|sally> + |erica>

sa: sibling-of |sally>
|erica>

sa: sibling-of |erica>
|sally>
Finally, we can ask the question: "is X a sibling of Sally?"
sa: is-a-sibling-of-sally |*> #=> do-you-know mbr(|_self>, sibling-of|sally>)

sa: is-a-sibling-of-sally |erica>
|yes>

sa: is-a-sibling-of-sally |george>
|no>
And I guess that is enough.

Update: perhaps we should tweak our operator names to be a little closer to English and NLP?
mother-of |sally> => |trude>
child-of |trude> => |sally>

father-of |sally> => |tom>
child-of |tom> => |sally>

father-of |erica> => |tom>
child-of |tom> +=> |erica>

father-of |tom> => |mike>
child-of |mike> => |tom>

parents-of |*> #=> mother-of |_self> + father-of |_self>
siblings-of |*> #=> clean drop (child-of parents-of |_self> + -|_self>)
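For comparison, the same sibling derivation as a plain Python sketch (my own names, same family data):

```python
# The sibling derivation in plain Python, same data as the BKO version.
mother_of = {'sally': 'trude'}
father_of = {'sally': 'tom', 'erica': 'tom', 'tom': 'mike'}
child_of = {'trude': ['sally'], 'tom': ['sally', 'erica'], 'mike': ['tom']}

def parents_of(x):
    return [p for p in (mother_of.get(x), father_of.get(x)) if p is not None]

def siblings_of(x):
    # child-of parents-of |x>, then drop |x> itself (the "clean drop" step)
    sibs = {c for p in parents_of(x) for c in child_of.get(p, [])}
    return sorted(sibs - {x})

print(siblings_of('sally'))   # ['erica']
print(siblings_of('erica'))   # ['sally']
print(siblings_of('george'))  # []
```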

difference and smooth in the MatSumSig model

This time a couple of operators in the MatSumSig model.

Difference:
f[k] => - f[k-1]/2 + f[k] - f[k+1]/2

[ f0  ]       [  1 -1  0  0  0  0  0  0  0  0  0 ] [ f0  ]
[ f1  ]       [ -1  2 -1  0  0  0  0  0  0  0  0 ] [ f1  ]
[ f2  ]       [  0 -1  2 -1  0  0  0  0  0  0  0 ] [ f2  ]
[ f3  ]       [  0  0 -1  2 -1  0  0  0  0  0  0 ] [ f3  ]
[ f4  ]       [  0  0  0 -1  2 -1  0  0  0  0  0 ] [ f4  ]
[ f5  ] = 1/2 [  0  0  0  0 -1  2 -1  0  0  0  0 ] [ f5  ]
[ f6  ]       [  0  0  0  0  0 -1  2 -1  0  0  0 ] [ f6  ]
[ f7  ]       [  0  0  0  0  0  0 -1  2 -1  0  0 ] [ f7  ]
[ f8  ]       [  0  0  0  0  0  0  0 -1  2 -1  0 ] [ f8  ]
[ f9  ]       [  0  0  0  0  0  0  0  0 -1  2 -1 ] [ f9  ]
[ f10 ]       [  0  0  0  0  0  0  0  0  0 -1  1 ] [ f10 ]
And note that we don't have currency conservation, since the columns sum to 0 instead of 1. Originally I thought this thing would be useful (eg, for edge detection in images), but so far, not particularly.
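A quick Python sketch of difference as a 1D kernel, with the boundary rows handled as in the matrix above. A constant signal maps to zero, and a step edge produces a spike pair:

```python
# Sketch of the difference operator: f[k] => -f[k-1]/2 + f[k] - f[k+1]/2,
# with the boundary rows handled as in the matrix above.
def difference(f):
    n = len(f)
    out = []
    for k in range(n):
        left = f[k - 1] if k > 0 else f[k]       # top boundary row
        right = f[k + 1] if k < n - 1 else f[k]  # bottom boundary row
        out.append(-left / 2 + f[k] - right / 2)
    return out

print(difference([2, 2, 2, 2]))        # a constant maps to all zeros
print(difference([0, 0, 0, 1, 1, 1]))  # a step edge gives a spike pair
```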

Next is smooth, and this one is clearly useful.
Smooth:
f[k] => f[k-1]/4 + f[k]/2 + f[k+1]/4

[ f0  ]       [ 3 1 0 0 0 0 0 0 0 0 0 ] [ f0  ]
[ f1  ]       [ 1 2 1 0 0 0 0 0 0 0 0 ] [ f1  ]
[ f2  ]       [ 0 1 2 1 0 0 0 0 0 0 0 ] [ f2  ]
[ f3  ]       [ 0 0 1 2 1 0 0 0 0 0 0 ] [ f3  ]
[ f4  ]       [ 0 0 0 1 2 1 0 0 0 0 0 ] [ f4  ]
[ f5  ] = 1/4 [ 0 0 0 0 1 2 1 0 0 0 0 ] [ f5  ]
[ f6  ]       [ 0 0 0 0 0 1 2 1 0 0 0 ] [ f6  ]
[ f7  ]       [ 0 0 0 0 0 0 1 2 1 0 0 ] [ f7  ]
[ f8  ]       [ 0 0 0 0 0 0 0 1 2 1 0 ] [ f8  ]
[ f9  ]       [ 0 0 0 0 0 0 0 0 1 2 1 ] [ f9  ]
[ f10 ]       [ 0 0 0 0 0 0 0 0 0 1 3 ] [ f10 ]
Now an example representation in BKO:
smooth |f0> => 0.75|f0> + 0.25|f1>
smooth |f1> => 0.25|f0> + 0.5|f1> + 0.25|f2>
smooth |f2> => 0.25|f1> + 0.5|f2> + 0.25|f3>
smooth |f3> => 0.25|f2> + 0.5|f3> + 0.25|f4>
smooth |f4> => 0.25|f3> + 0.5|f4> + 0.25|f5>
smooth |f5> => 0.25|f4> + 0.5|f5> + 0.25|f6>
smooth |f6> => 0.25|f5> + 0.5|f6> + 0.25|f7>
smooth |f7> => 0.25|f6> + 0.5|f7> + 0.25|f8>
smooth |f8> => 0.25|f7> + 0.5|f8> + 0.25|f9>
smooth |f9> => 0.25|f8> + 0.5|f9> + 0.25|f10>
smooth |f10> => 0.25|f9> + 0.75|f10>

sa: matrix[smooth]
[ f0  ] = [  0.75  0.25  0     0     0     0     0     0     0     0     0     ] [ f0  ]
[ f1  ]   [  0.25  0.5   0.25  0     0     0     0     0     0     0     0     ] [ f1  ]
[ f2  ]   [  0     0.25  0.5   0.25  0     0     0     0     0     0     0     ] [ f2  ]
[ f3  ]   [  0     0     0.25  0.5   0.25  0     0     0     0     0     0     ] [ f3  ]
[ f4  ]   [  0     0     0     0.25  0.5   0.25  0     0     0     0     0     ] [ f4  ]
[ f5  ]   [  0     0     0     0     0.25  0.5   0.25  0     0     0     0     ] [ f5  ]
[ f6  ]   [  0     0     0     0     0     0.25  0.5   0.25  0     0     0     ] [ f6  ]
[ f7  ]   [  0     0     0     0     0     0     0.25  0.5   0.25  0     0     ] [ f7  ]
[ f8  ]   [  0     0     0     0     0     0     0     0.25  0.5   0.25  0     ] [ f8  ]
[ f9  ]   [  0     0     0     0     0     0     0     0     0.25  0.5   0.25  ] [ f9  ]
[ f10 ]   [  0     0     0     0     0     0     0     0     0     0.25  0.75  ] [ f10 ]
Some notes:
1) this clearly has currency conservation, since all columns sum to 1.
2) this thing rapidly approaches a Gaussian smooth, if you iterate it a few times. eg, in image edge enhancement, smooth^20 gave good results. See here. In the case of mapping posting times to 1440 buckets in a day, 300 to 500 smooths gave the best results.
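A quick Python sketch of smooth, iterated. Note the total "currency" is conserved under iteration:

```python
# Sketch of smooth as the kernel [1/4, 1/2, 1/4], boundary rows as in the
# matrix above. smooth^n approaches a Gaussian blur as n grows.
def smooth(f):
    n = len(f)
    out = []
    for k in range(n):
        left = f[k - 1] if k > 0 else f[k]
        right = f[k + 1] if k < n - 1 else f[k]
        out.append(left / 4 + f[k] / 2 + right / 4)
    return out

def smooth_n(f, n):
    for _ in range(n):
        f = smooth(f)
    return f

spike = [0, 0, 0, 0, 100, 0, 0, 0, 0]
blurred = smooth_n(spike, 20)
print(sum(blurred))  # 100.0 -- currency conservation
```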

simm in the MatSumSig model

So, we have basics like simple logic, and set union and intersection in the MatSumSig model. Interestingly, we have a version of simm too! ie, simm becomes somewhat biologically plausible. Heh, but to tell the truth, I don't care if the brain doesn't actually use simm, since I have found it to be very useful.

Anyway, recall one definition of simm:
simm(w,f,g) = \Sum_k w[k] min(f[k],g[k]) / max(w*f,w*g)
If we ignore the max(w*f,w*g) denominator, here is a MatSumSig version of simm:
[ r ] = [ sigmoid[x1] ] [ w1 w2 w3 w4 ] [ pos[x1] ] [ 1 -1 -1  0  0  0  0  0  0  0  0  0 ] [ pos[x1]  ] [  1  1  0  0  0  0  0  0 ] [ f1 ]
                                        [ pos[x2] ] [ 0  0  0  1 -1 -1  0  0  0  0  0  0 ] [ pos[x2]  ] [  1 -1  0  0  0  0  0  0 ] [ g1 ]
                                        [ pos[x3] ] [ 0  0  0  0  0  0  1 -1 -1  0  0  0 ] [ pos[x3]  ] [ -1  1  0  0  0  0  0  0 ] [ f2 ]
                                        [ pos[x4] ] [ 0  0  0  0  0  0  0  0  0  1 -1 -1 ] [ pos[x4]  ] [  0  0  1  1  0  0  0  0 ] [ g2 ]
                                                                                           [ pos[x5]  ] [  0  0  1 -1  0  0  0  0 ] [ f3 ]
                                                                                           [ pos[x6]  ] [  0  0 -1  1  0  0  0  0 ] [ g3 ]
                                                                                           [ pos[x7]  ] [  0  0  0  0  1  1  0  0 ] [ f4 ]
                                                                                           [ pos[x8]  ] [  0  0  0  0  1 -1  0  0 ] [ g4 ]
                                                                                           [ pos[x9]  ] [  0  0  0  0 -1  1  0  0 ]
                                                                                           [ pos[x10] ] [  0  0  0  0  0  0  1  1 ]
                                                                                           [ pos[x11] ] [  0  0  0  0  0  0  1 -1 ]
                                                                                           [ pos[x12] ] [  0  0  0  0  0  0 -1  1 ]
where it is assumed w_k >= 0.

If we extract out the intersection component, see last post, we have:
[I1,I2,I3,I4] = 2* [min(f1,g1), min(f2,g2), min(f3,g3), min(f4,g4)]

[ r ] = [ sigmoid[x1] ] [ w1 w2 w3 w4 ] [ I1 ]
                                        [ I2 ]
                                        [ I3 ]
                                        [ I4 ]
Now, the above can be considered a space based simm. We can also do a time based one. I think it goes like this, though I haven't given this much thought in a long, long time!
[ r ] = [ sum[x1,t2] ] [ sigmoid[x1,t1] ] [ 1 -1 -1 ] [ pos[x1] ] [  1  1 ] [ f ]
                                                      [ pos[x2] ] [  1 -1 ] [ g ]
                                                      [ pos[x3] ] [ -1  1 ]
where [ sum[x1,t2] ] is the time based equivalent of [ w1 w2 w3 w4 ]

set union and intersection in the MatSumSig model

A version of set union and intersection in the MatSumSig model.

First, note that union corresponds to max(a,b), and intersection to min(a,b).
Then note these identities:
abs(x) = pos(x) + pos(-x)
abs(a - b) = pos(a - b) + pos(-a + b)
a + b + abs(a - b) = 2*max(a,b)
a + b - abs(a - b) = 2*min(a,b)
So, for example:
[ r1 ]   [ 1  1  1 ] [ pos[x1] ] [  1  1 ] [ a ]
[ r2 ] = [ 1 -1 -1 ] [ pos[x2] ] [  1 -1 ] [ b ]
                     [ pos[x3] ] [ -1  1 ]
expands to:
r1 = a + b + pos(a - b) + pos(-a + b) = 2*max(a,b)
r2 = a + b - pos(a - b) - pos(-a + b) = 2*min(a,b)
These are our two functions we want to reproduce:
def set_union(f, g):
  return [max(f[k], g[k]) for k in range(len(f))]

def set_intersection(f, g):
  return [min(f[k], g[k]) for k in range(len(f))]
And here they are in MatSumSig (ignoring a factor of 2):
[ U1 ]   [ 1  1  1  0  0  0  0  0  0  0  0  0 ] [ pos[x1]  ] [  1  1  0  0  0  0  0  0 ] [ f1 ]
[ I1 ] = [ 1 -1 -1  0  0  0  0  0  0  0  0  0 ] [ pos[x2]  ] [  1 -1  0  0  0  0  0  0 ] [ g1 ]
[ U2 ]   [ 0  0  0  1  1  1  0  0  0  0  0  0 ] [ pos[x3]  ] [ -1  1  0  0  0  0  0  0 ] [ f2 ]
[ I2 ]   [ 0  0  0  1 -1 -1  0  0  0  0  0  0 ] [ pos[x4]  ] [  0  0  1  1  0  0  0  0 ] [ g2 ]
[ U3 ]   [ 0  0  0  0  0  0  1  1  1  0  0  0 ] [ pos[x5]  ] [  0  0  1 -1  0  0  0  0 ] [ f3 ]
[ I3 ]   [ 0  0  0  0  0  0  1 -1 -1  0  0  0 ] [ pos[x6]  ] [  0  0 -1  1  0  0  0  0 ] [ g3 ]
[ U4 ]   [ 0  0  0  0  0  0  0  0  0  1  1  1 ] [ pos[x7]  ] [  0  0  0  0  1  1  0  0 ] [ f4 ]
[ I4 ]   [ 0  0  0  0  0  0  0  0  0  1 -1 -1 ] [ pos[x8]  ] [  0  0  0  0  1 -1  0  0 ] [ g4 ]
                                                [ pos[x9]  ] [  0  0  0  0 -1  1  0  0 ]
                                                [ pos[x10] ] [  0  0  0  0  0  0  1  1 ]
                                                [ pos[x11] ] [  0  0  0  0  0  0  1 -1 ]
                                                [ pos[x12] ] [  0  0  0  0  0  0 -1  1 ]
which expands to:
[U1,U2,U3,U4] = 2* [max(f1,g1), max(f2,g2), max(f3,g3), max(f4,g4)]
[I1,I2,I3,I4] = 2* [min(f1,g1), min(f2,g2), min(f3,g3), min(f4,g4)]
So I guess you could say they are space based union and intersection. We can also do a time based one:

[ U[t] ] = [ 1  1  1 ] [ pos[x1] ] [  1  1 ] [ f ]
[ I[t] ]   [ 1 -1 -1 ] [ pos[x2] ] [  1 -1 ] [ g ]
                       [ pos[x3] ] [ -1  1 ]
where U[t] is union of f[t] and g[t] with respect to time, and I[t] is intersection.
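A quick numeric check (in Python) of the max/min identities driving all of the above:

```python
# Numeric check of the identities: with pos(x) = max(x, 0),
#   a + b + pos(a-b) + pos(-a+b) = 2*max(a,b)
#   a + b - pos(a-b) - pos(-a+b) = 2*min(a,b)
def pos(x):
    return x if x > 0 else 0

def union_intersection(a, b):
    u = a + b + pos(a - b) + pos(-a + b)
    i = a + b - pos(a - b) - pos(-a + b)
    return u / 2, i / 2

print(union_intersection(3, 5))  # (5.0, 3.0)
```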

Thursday 25 June 2015

simple logic in the MatSumSig model

Again, just simple foundational ideas. Simple logic in the MatSumSig model:
d = a OR b OR c
[ d ] = [ BF[x1] ] [ 1 1 1 ] [ BF[x1] ] [ a ]
                             [ BF[x2] ] [ b ] 
                             [ BF[x3] ] [ c ]

d = a AND b AND c
[ d ] = [ BF[x1] ] [ 1/3 1/3 1/3 ] [ BF[x1] ] [ a ]
                                   [ BF[x2] ] [ b ] 
                                   [ BF[x3] ] [ c ]

d = a XOR b XOR c
[ d ] = [ XF[x1] ] [ 1 1 1 ] [ BF[x1] ] [ a ]
                             [ BF[x2] ] [ b ] 
                             [ BF[x3] ] [ c ]

f = (a AND b AND c) OR (d AND e)
[ f ] = [ BF[x1] ] [ 1 1 ] [ BF[x1] ] [ 1/3 1/3 1/3 0   0   ] [ BF[x1] ] [ a ]
                           [ BF[x2] ] [ 0   0   0   1/2 1/2 ] [ BF[x2] ] [ b ]
                                                              [ BF[x3] ] [ c ]
                                                              [ BF[x4] ] [ d ]
                                                              [ BF[x5] ] [ e ]
where BF[x] and XF[x] are sigmoids:
def binary_filter(x):
  if x <= 0.96:
    return 0
  else:
    return 1

def xor_filter(x):
  if 0.96 <= x and x <= 1.04:
    return 1
  else:
    return 0   
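Using binary_filter, here is a small Python check of the compound example f = (a AND b AND c) OR (d AND e), with each function-matrix layer written as a weighted sum followed by a sigmoid (my own layer helper, not project code):

```python
# Check of the compound gate f = (a AND b AND c) OR (d AND e), with each
# function-matrix layer written as: weighted sum of inputs, then a sigmoid.
def binary_filter(x):
    return 0 if x <= 0.96 else 1

def layer(matrix, vec, sigmoid):
    return [sigmoid(sum(m * v for m, v in zip(row, vec))) for row in matrix]

def f(a, b, c, d, e):
    x = [binary_filter(v) for v in (a, b, c, d, e)]
    hidden = layer([[1/3, 1/3, 1/3, 0, 0],
                    [0, 0, 0, 1/2, 1/2]], x, binary_filter)
    return layer([[1, 1]], hidden, binary_filter)[0]

print(f(1, 1, 1, 0, 0))  # 1
print(f(0, 0, 0, 1, 1))  # 1
print(f(1, 1, 0, 1, 0))  # 0
```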
Update: now let's do the same in BKO. A 2 element truth table:
pattern |row-1> => 0|a> + 0|b>
pattern |row-2> => |a> + 0|b>
pattern |row-3> => 0|a> + |b>
pattern |row-4> => |a> + |b>

OR-2 |a> => |x1>
OR-2 |b> => |x1>

AND-2 |a> => 0.5|x1>
AND-2 |b> => 0.5|x1>

OR |*> #=> push-float binary-filter OR-2 binary-filter pattern |_self>
AND |*> #=> push-float binary-filter AND-2 binary-filter pattern |_self>
XOR |*> #=> push-float xor-filter OR-2 binary-filter pattern |_self>

sa: table[row,pattern,OR,AND,XOR] ket-sort starts-with |row->
+-------+----------+----+-----+-----+
| row   | pattern  | OR | AND | XOR |
+-------+----------+----+-----+-----+
| row-1 | 0 a, 0 b | 0  | 0   | 0   |
| row-2 | a, 0 b   | 1  | 0   | 1   |
| row-3 | 0 a, b   | 1  | 0   | 1   |
| row-4 | a, b     | 1  | 1   | 0   |
+-------+----------+----+-----+-----+
Cool, huh!

Update: let's do a 3 element truth table:
pattern |row-1> => 0|a> + 0|b> + 0|c>
pattern |row-2> => 0|a> + 0|b> + |c>
pattern |row-3> => 0|a> + |b> + 0|c>
pattern |row-4> => 0|a> + |b> + |c>
pattern |row-5> => |a> + 0|b> + 0|c>
pattern |row-6> => |a> + 0|b> + |c>
pattern |row-7> => |a> + |b> + 0|c>
pattern |row-8> => |a> + |b> + |c>

OR-3 |a> => |x1>
OR-3 |b> => |x1>
OR-3 |c> => |x1>

AND-3 |a> => 0.333|x1>
AND-3 |b> => 0.333|x1>
AND-3 |c> => 0.333|x1>

OR |*> #=> push-float binary-filter OR-3 binary-filter pattern |_self>
AND |*> #=> push-float binary-filter AND-3 binary-filter pattern |_self>
XOR |*> #=> push-float xor-filter OR-3 binary-filter pattern |_self>

sa: table[row,pattern,OR,AND,XOR] ket-sort starts-with |row->
+-------+---------------+----+-----+-----+
| row   | pattern       | OR | AND | XOR |
+-------+---------------+----+-----+-----+
| row-1 | 0 a, 0 b, 0 c | 0  | 0   | 0   |
| row-2 | 0 a, 0 b, c   | 1  | 0   | 1   |
| row-3 | 0 a, b, 0 c   | 1  | 0   | 1   |
| row-4 | 0 a, b, c     | 1  | 0   | 0   |
| row-5 | a, 0 b, 0 c   | 1  | 0   | 1   |
| row-6 | a, 0 b, c     | 1  | 0   | 0   |
| row-7 | a, b, 0 c     | 1  | 0   | 0   |
| row-8 | a, b, c       | 1  | 1   | 0   |
+-------+---------------+----+-----+-----+
That is even cooler! And I hope it helps to show the mapping between the MatSumSig model, and the BKO scheme.

I guess it would also be useful to give the OR-k and AND-k matrices:
sa: matrix[OR-2]
[ x1 ] = [  1  1  ] [ a ]
                    [ b ]

sa: matrix[AND-2]
[ x1 ] = [  0.5  0.5  ] [ a ]
                        [ b ]


sa: matrix[OR-3]
[ x1 ] = [  1  1  1  ] [ a ]
                       [ b ]
                       [ c ]

sa: matrix[AND-3]
[ x1 ] = [  0.33  0.33  0.33  ] [ a ]
                                [ b ]
                                [ c ]
Update: Let's do the compound example:
f = (a AND b AND c) OR (d AND e)
[ f ] = [ BF[x1] ] [ 1 1 ] [ BF[x1] ] [ 1/3 1/3 1/3 0   0   ] [ BF[x1] ] [ x1 ]
                           [ BF[x2] ] [ 0   0   0   1/2 1/2 ] [ BF[x2] ] [ x2 ]
                                                              [ BF[x3] ] [ x3 ]
                                                              [ BF[x4] ] [ x4 ]
                                                              [ BF[x5] ] [ x5 ]
Now in BKO:
-- define our patterns:
pattern |row-1> => 0|x1> + 0|x2> + 0|x3> + 0|x4> + 0|x5>
pattern |row-2> => |x1> + 0|x2> + 0|x3> + 0|x4> + 0|x5>
pattern |row-3> => 0|x1> + |x2> + 0|x3> + 0|x4> + 0|x5>
...

-- define our operators:
AND-3-2 |x1> => 0.333|x1>
AND-3-2 |x2> => 0.333|x1>
AND-3-2 |x3> => 0.333|x1>
AND-3-2 |x4> => 0.5|x2>
AND-3-2 |x5> => 0.5|x2>

OR-2 |x1> => |x1>
OR-2 |x2> => |x1>

f |*> #=> push-float binary-filter OR-2 binary-filter AND-3-2 binary-filter pattern |_self>
And the matrices:
sa: matrix[OR-2]
[ x1 ] = [  1  1  ] [ x1 ]
                    [ x2 ]

sa: matrix[AND-3-2]
[ x1 ] = [  0.33  0.33  0.33  0    0    ] [ x1 ]
[ x2 ]   [  0     0     0     0.5  0.5  ] [ x2 ]
                                          [ x3 ]
                                          [ x4 ]
                                          [ x5 ]
Update: let's be more terse with our sigmoid names.
binary-filter => BF
xor-filter => XF
And then we have:
OR |*> #=> push-float BF OR-3 BF pattern |_self>
AND |*> #=> push-float BF AND-3 BF pattern |_self>
XOR |*> #=> push-float XF OR-3 BF pattern |_self>
f |*> #=> push-float BF OR-2 BF AND-3-2 BF pattern |_self>
Cool. I like the compact version better.

simple inhibitory signals in the MatSumSig model

These are simple enough. It is clear that in the brain some circuits can switch off other circuits when they are active. Here is a simple example in the MatSumSig model:
[ filtered-signal ] = [ pos[x1] ] [ 1 -1 ] [ signal      ]
                                           [ off-current ]
where pos[x] is the simplest of the sigmoids (and also corresponds to the fact that you can't have negative numbers of spikes):
def pos(x):
  if x <= 0:
    return 0
  else:
    return x
and:
signal is a time varying signal.
off-current is a time varying off-current. (In this case an inhibitory signal of roughly the same strength as the signal)
filtered-signal is the result
an example of a strongly inhibitory off-current:
[ filtered-signal ] = [ pos[x1] ] [ 1 -10 ] [ signal      ]
                                            [ off-current ]
an example of a weakly inhibitory off-current:
[ filtered-signal ] = [ pos[x1] ] [ 1 -0.2 ] [ signal      ]
                                             [ off-current ]
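A minimal Python sketch of these three cases, with the inhibition strength as a parameter:

```python
# Sketch of the inhibitory filter: the off-current subtracts from the
# signal, and pos() clips the result at zero.
def pos(x):
    return x if x > 0 else 0

def filtered_signal(signal, off_current, strength=1.0):
    return [pos(s - strength * off) for s, off in zip(signal, off_current)]

signal      = [5, 5, 5, 5]
off_current = [0, 5, 0, 5]
print(filtered_signal(signal, off_current))        # comparable strength
print(filtered_signal(signal, off_current, 10))    # strongly inhibitory
print(filtered_signal(signal, off_current, 0.2))   # weakly inhibitory
```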
And now a BKO example:
M |yes> => |yes> + -1|no>
M |no> => -1|yes> + |no>

sa: matrix[M]
[ no  ] = [  1   -1  ] [ no  ]
[ yes ]   [  -1  1   ] [ yes ]
Now some examples:
sa: drop M |yes>
|yes>

sa: drop M |no>
|no>

sa: drop M (|yes> + |no>)
|>

sa: drop M (0.8|yes> + 0.2|no>)
0.6|yes>

sa: drop M (0.2|yes> + 0.8|no>)
0.6|no>
All simple enough, and corresponds to the case when you have objects/concepts that are mutually exclusive.

introducing the MatSumSig model

I haven't really given this beasty much thought, but it was the original motivator for a lot of my later ideas, so I should mention it! And at this point it doesn't even matter if my MatSumSig model is wrong! The BKO scheme it motivated is, by itself, interesting and useful.

It makes use of my function matrix notation, and I consider it a simplified model of a single neuron (in the physics tradition of: if something is too hard, simplify it until you have something you can actually work with, at least as a starting point).
[ y1 ]   [ s1[x1,t1] ] [ sum[x1,t] ] [ a1 a2 a3 a4 a5 ] [ x1 ]
[ y2 ]   [ s2[x1,t2] ]                                  [ x2 ]
[ y3 ]   [ s3[x1,t3] ]                                  [ x3 ]
[ y4 ] = [ s4[x1,t4] ]                                  [ x4 ]
[ y5 ]   [ s5[x1,t5] ]                                  [ x5 ]
[ y6 ]   [ s6[x1,t6] ]                                  
[ y7 ]   [ s7[x1,t7] ]                                  
[ y8 ]   [ s8[x1,t8] ] 
where:
{a1,a2,a3,a4,a5} are reals/floats and can be positive or negative.
sum[x,t] sums the input x for a time-slice of length t (with output 0 during this time slice), then spits out the result at the end of that time slice. 
If we don't include the sum[] term, assume t = 0.
Indeed, we only need t > 0 if we want time-dependence. 
s_k[x,t_k] are sigmoids, with passed in parameter t_k.
Note that there are a lot of free parameters here, and I have no idea how the brain tweaks them! We have {a1,a2,a3,a4,a5}, {t,t1,t2,...,t8}, and then we have the sigmoids {s1,s2,...,s8}. Indeed, until we have some idea how to fill in all these parameters, we can't actually make use of this model/representation.
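Ignoring the time-dependent sum[] stage (ie, taking t = 0), a single MatSumSig neuron reduces to a weighted sum of the inputs followed by one sigmoid per output. A toy Python sketch, reusing the pos and binary-filter sigmoids from above:

```python
# Toy sketch of one MatSumSig neuron with t = 0 (no time slicing):
# multiply by the weights, sum, then apply each output sigmoid s_k.
def pos(v):
    return v if v > 0 else 0

def binary_filter(v):
    return 0 if v <= 0.96 else 1

def mat_sum_sig(weights, sigmoids, x):
    s = sum(a * xi for a, xi in zip(weights, x))  # the Mat and Sum stages
    return [sig(s) for sig in sigmoids]           # the Sig stage, one y_k per sigmoid

print(mat_sum_sig([1, -1, 0.5], [pos, binary_filter], [2, 1, 4]))  # [3.0, 1]
```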

introducing function matrices

Just some simple notation that merges the idea of functions and matrices into the same structure.

Let's start with the simplest possible example:
[ d ]   [ fn1[x1] ] [ a ]
[ e ] = [ fn2[x2] ] [ b ]
[ f ]   [ fn3[x3] ] [ c ]
which expands to:
d = fn1[a]
e = fn2[b]
f = fn3[c]
Now, a slightly more interesting example:
[ d ]   [ bah1[x3]       ] [ a ]
[ e ] = [ bah2[x2,x1]    ] [ b ]
[ f ]   [ bah3[x1,x2,x3] ] [ c ]
which expands to:
d = bah1[c]
e = bah2[b,a]
f = bah3[a,b,c]
And we note a couple of things:
1) x1 corresponds to a, x2 corresponds to b, x3 corresponds to c. And we can use them in any row of the function section of the function-matrix. eg, x1 is used in rows 2 and 3, and x3 is used in rows 1 and 3. The point is, we want to name the elements in the applied vector, so we choose x_i. And we use x as the name for the entire vector.
2) the functions in the function-matrix can have more than one parameter, eg, bah2[] and bah3[].

Now, another property is that function-matrices can have stored data. Yeah, they have their own memory I suppose you could say. Here is a simple example:
[ d ]   [ foo[L1,x] ] [ a ]
[ e ] = [ foo[L2,x] ] [ b ]
[ f ]   [ foo[L3,x] ] [ c ]
which expands to:
d = foo[L1,(a,b,c)]
e = foo[L2,(a,b,c)]
f = foo[L3,(a,b,c)]
Note that x (without a subscript) refers to the entire incoming vector.

Now, for example, define foo[] as:
foo[u,v] = dot-product[u,v]
And for example, define L_i as:
L1 = (m1,m2,m3)
L2 = (m4,m5,m6)
L3 = (m7,m8,m9)
Then:
[ d ]   [ foo[L1,x] ] [ a ]
[ e ] = [ foo[L2,x] ] [ b ]
[ f ]   [ foo[L3,x] ] [ c ]
expands to the standard matrix:
[ d ]   [ m1 m2 m3 ] [ a ]
[ e ] = [ m4 m5 m6 ] [ b ]
[ f ]   [ m7 m8 m9 ] [ c ]
ie:
d = m1*a + m2*b + m3*c
e = m4*a + m5*b + m6*c
f = m7*a + m8*b + m9*c
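For concreteness, here is a minimal Python sketch of that reduction. The names foo, L1..L3 and the sample numbers are mine, purely for illustration: each row of the function-matrix is a function applied to the incoming vector, and with foo as dot-product the whole thing collapses to an ordinary matrix product.

```python
# Minimal sketch of a function matrix. Each row is a function of the
# incoming vector x. With foo = dot-product and stored rows L1..L3,
# this reduces to standard matrix multiplication.

def foo(u, v):
    """dot-product[u, v]"""
    return sum(ui * vi for ui, vi in zip(u, v))

def apply_function_matrix(rows, x):
    """Apply each row function to the whole input vector x."""
    return [row(x) for row in rows]

L1, L2, L3 = (1, 2, 3), (4, 5, 6), (7, 8, 9)     # the m1..m9 entries
rows = [lambda x, L=L: foo(L, x) for L in (L1, L2, L3)]

print(apply_function_matrix(rows, (1, 0, 2)))    # -> [7, 16, 25], ie M x
```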
The next thing to note is that we can stack many layers of these things. eg:
[ d ]   [ foo1[x1] ] [ 5 6 1 0 2 ] [ fn1[x1]  ] [ m1 m2 m3 ] [ a ]
[ e ] = [ foo1[x2] ] [ 8 8 7 2 1 ] [ fn2[x3]  ] [ m4 m5 m6 ] [ b ]
[ f ]   [ foo2[x1] ]               [ fn3[x2]  ] [ m7 m8 m9 ] [ c ]
[ g ]   [ foo2[x2] ]               [ bah1[x1] ]
                                   [ bah2[x3] ]
It is ugly, but we can expand this beast out:
[ d ]   [ foo1[x1] ] [ 5 6 1 0 2 ] [ fn1[x1]  ] [ m1*a + m2*b + m3*c ]
[ e ] = [ foo1[x2] ] [ 8 8 7 2 1 ] [ fn2[x3]  ] [ m4*a + m5*b + m6*c ] 
[ f ]   [ foo2[x1] ]               [ fn3[x2]  ] [ m7*a + m8*b + m9*c ]
[ g ]   [ foo2[x2] ]               [ bah1[x1] ]
                                   [ bah2[x3] ]

[ d ]   [ foo1[x1] ] [ 5 6 1 0 2 ] [ fn1[m1*a + m2*b + m3*c]  ]
[ e ] = [ foo1[x2] ] [ 8 8 7 2 1 ] [ fn2[m7*a + m8*b + m9*c]  ] 
[ f ]   [ foo2[x1] ]               [ fn3[m4*a + m5*b + m6*c]  ]
[ g ]   [ foo2[x2] ]               [ bah1[m1*a + m2*b + m3*c] ]
                                   [ bah2[m7*a + m8*b + m9*c] ]

[ d ]   [ foo1[x1] ] [ 5*fn1[m1*a + m2*b + m3*c] + 6*fn2[m7*a + m8*b + m9*c] + 1*fn3[m4*a + m5*b + m6*c] + 0*bah1[m1*a + m2*b + m3*c] + 2*bah2[m7*a + m8*b + m9*c] ]
[ e ] = [ foo1[x2] ] [ 8*fn1[m1*a + m2*b + m3*c] + 8*fn2[m7*a + m8*b + m9*c] + 7*fn3[m4*a + m5*b + m6*c] + 2*bah1[m1*a + m2*b + m3*c] + 1*bah2[m7*a + m8*b + m9*c] ] 
[ f ]   [ foo2[x1] ]               
[ g ]   [ foo2[x2] ]               
                                   
d = foo1[5*fn1[m1*a + m2*b + m3*c] + 6*fn2[m7*a + m8*b + m9*c] + 1*fn3[m4*a + m5*b + m6*c] + 0*bah1[m1*a + m2*b + m3*c] + 2*bah2[m7*a + m8*b + m9*c]]
e = foo1[8*fn1[m1*a + m2*b + m3*c] + 8*fn2[m7*a + m8*b + m9*c] + 7*fn3[m4*a + m5*b + m6*c] + 2*bah1[m1*a + m2*b + m3*c] + 1*bah2[m7*a + m8*b + m9*c]]
f = foo2[5*fn1[m1*a + m2*b + m3*c] + 6*fn2[m7*a + m8*b + m9*c] + 1*fn3[m4*a + m5*b + m6*c] + 0*bah1[m1*a + m2*b + m3*c] + 2*bah2[m7*a + m8*b + m9*c]]
g = foo2[8*fn1[m1*a + m2*b + m3*c] + 8*fn2[m7*a + m8*b + m9*c] + 7*fn3[m4*a + m5*b + m6*c] + 2*bah1[m1*a + m2*b + m3*c] + 1*bah2[m7*a + m8*b + m9*c]]
Now, we need to note that, unlike standard matrices, these things are in general not invertible. A simple proof: consider the function layer to be a secure hash. Then:
[ y1 ] = [ secure-hash[x1] ] [ m1 m2 ] [ a ]
[ y2 ]   [ secure-hash[x2] ] [ m3 m4 ] [ b ]

y1 = secure-hash[m1*a + m2*b]
y2 = secure-hash[m3*a + m4*b]
ie, given y1 and y2 it is impossible, except by brute force, to find the input vector (a,b).
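As a quick sanity check of that argument, here is a sketch using sha256 from Python's standard library as the secure hash (the m's and (a,b) are made-up numbers):

```python
# The function layer as a secure hash: the map (a, b) -> (y1, y2)
# is then one-way, so this function matrix is not invertible.
import hashlib

def secure_hash(value):
    return hashlib.sha256(repr(value).encode()).hexdigest()

m1, m2, m3, m4 = 2, 3, 5, 7
a, b = 1, 4

y1 = secure_hash(m1 * a + m2 * b)   # hash of the first linear combination
y2 = secure_hash(m3 * a + m4 * b)   # hash of the second
print(y1, y2)   # two 64-hex-digit digests; no way back to (a, b)
```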

I guess that is about it. Now to put them to use in the next few posts.

Update: recall above I gave this function matrix:
[ d ]   [ foo[L1,x] ] [ a ]
[ e ] = [ foo[L2,x] ] [ b ]
[ f ]   [ foo[L3,x] ] [ c ]
And if the Li are vectors of the same length as x (which in this case is just [a,b,c]), and foo[u,v] := dot-product[u,v], then this function matrix is identical to a standard matrix. OK. That is simple enough. But the comment I want to make is, if foo[u,v] := simm[u,v], then this function matrix is essentially a simple if-then machine. The question then becomes, what properties do multiple layers of these have? And what is the best approach for finding all the Li, if we use this system as a replacement for standard ANNs?
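Here is a rough sketch of the simm version, assuming the usual definition of simm for non-negative vectors (sum of element-wise mins over the larger of the two sums); the stored patterns are made up:

```python
# simm-based function matrix rows act like if-then rules: a row fires
# strongly when the input is close to its stored pattern L_i.
def simm(u, v):
    """Similarity in [0, 1] for non-negative vectors (assumed definition)."""
    if not any(u) or not any(v):
        return 0.0
    return sum(min(ui, vi) for ui, vi in zip(u, v)) / max(sum(u), sum(v))

L1 = (1, 0, 0, 1)       # "if x looks like pattern 1 ..."
L2 = (0, 1, 1, 0)       # "if x looks like pattern 2 ..."

x = (1, 0, 0, 1)        # exactly pattern 1
print([simm(L1, x), simm(L2, x)])   # -> [1.0, 0.0]
```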

Tuesday 23 June 2015

she is out of my league

A quick one today. Let's encode "she is out of my league" in BKO.
-- load up some data about some fictional women:
features |my perfect woman> => |beautiful> + |smart> + |skinny> + |educated> + |loving> + |sexy>
features |Mary> => |loving> + |skinny>
features |Liz> => |smart> + |educated> + |loving>
features |Jane> => |skinny> + |sexy>
features |Mia> => |smart> + |skinny> + |educated> + |loving>
features |Emma> => |athletic> + |skinny> + |sexy> + |beautiful> + |religious>
features |Donna> => |beautiful> + |smart> + |skinny> + |educated> + |sexy>
features |the goddess> => |beautiful> + |smart> + |skinny> + |educated> + |loving> + |sexy>

-- define an operator to tidy our results:
rename-simm |simm> => | >

-- define some operators to work on this:
she-is-out-of-my-league |*> #=> coeff-in-range[80,100] 100 rename-simm ket-simm(features |_self>,features |my perfect woman>)
she-is-in-my-league |*> #=> coeff-in-range[50,80] 100 rename-simm ket-simm(features |_self>,features |my perfect woman>)
not-all-that-interested-in-her |*> #=> coeff-in-range[0,49] 100 rename-simm ket-simm(features |_self>,features |my perfect woman>)
Now, let's see what we have:
sa: table[woman,not-all-that-interested-in-her,she-is-in-my-league,she-is-out-of-my-league] rel-kets[features]
+------------------+--------------------------------+---------------------+-------------------------+
| woman            | not-all-that-interested-in-her | she-is-in-my-league | she-is-out-of-my-league |
+------------------+--------------------------------+---------------------+-------------------------+
| my perfect woman |                                |                     | 100.00                  |
| Mary             | 33.33                          |                     |                         |
| Liz              |                                | 50                  |                         |
| Jane             | 33.33                          |                     |                         |
| Mia              |                                | 66.67               |                         |
| Emma             |                                | 50                  |                         |
| Donna            |                                |                     | 83.33                   |
| the goddess      |                                |                     | 100.00                  |
+------------------+--------------------------------+---------------------+-------------------------+
And as is typical of BKO, this is all rather general. You can define the features with arbitrary superpositions. And it doesn't have to be restricted to the "she is out of my league" domain either. A similar construct can be used for other things.
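For what it's worth, the table can be reproduced in plain Python. For 0/1 feature superpositions the similarity reduces to |A ∩ B| / max(|A|, |B|), times 100 (my reading of ket-simm; the thresholds follow the coeff-in-range operators above):

```python
# Reproduce the league table for 0/1 feature sets.
perfect = {"beautiful", "smart", "skinny", "educated", "loving", "sexy"}
women = {
    "Mary": {"loving", "skinny"},
    "Liz": {"smart", "educated", "loving"},
    "Jane": {"skinny", "sexy"},
    "Mia": {"smart", "skinny", "educated", "loving"},
    "Emma": {"athletic", "skinny", "sexy", "beautiful", "religious"},
    "Donna": {"beautiful", "smart", "skinny", "educated", "sexy"},
}

def simm(a, b):
    """For 0/1 superpositions ket-simm reduces to this set overlap."""
    return 100 * len(a & b) / max(len(a), len(b))

for name, features in women.items():
    score = simm(features, perfect)
    if score > 80:
        verdict = "she is out of my league"
    elif score >= 50:
        verdict = "she is in my league"
    else:
        verdict = "not all that interested in her"
    print(f"{name}: {score:.2f} ({verdict})")
```

which gives, for example, Donna: 83.33 (out of my league) and Mia: 66.67 (in my league), matching the table.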

Update: we can of course tweak the table to show only yes or no.
Simple enough (just prepend "is-" to the coeff-in-range[] operator):
-- define our new operators:
is-not-all-that-interesting |*> #=> is-coeff-in-range[0,49] 100 rename-simm ket-simm(features |_self>,features |my perfect woman>) 
is-in-my-league |*> #=> is-coeff-in-range[50,80] 100 rename-simm ket-simm(features |_self>,features |my perfect woman>)
is-out-of-my-league |*> #=> is-coeff-in-range[80,100] 100 rename-simm ket-simm(features |_self>,features |my perfect woman>) 

-- show the table:
sa: table[woman,is-not-all-that-interesting,is-in-my-league,is-out-of-my-league] rel-kets[features]
+------------------+-----------------------------+-----------------+---------------------+
| woman            | is-not-all-that-interesting | is-in-my-league | is-out-of-my-league |
+------------------+-----------------------------+-----------------+---------------------+
| my perfect woman | no                          | no              | yes                 |
| Mary             | yes                         | no              | no                  |
| Liz              | no                          | yes             | no                  |
| Jane             | yes                         | no              | no                  |
| Mia              | no                          | yes             | no                  |
| Emma             | no                          | yes             | no                  |
| Donna            | no                          | no              | yes                 |
| the goddess      | no                          | no              | yes                 |
+------------------+-----------------------------+-----------------+---------------------+
OK. Kind of cool. And note that using text for operator labels really is a big step in the direction of NLP (natural language processing).

Update: we can also represent the features superpositions using matrix notation:
sa: load in-my-league.sw
sa: matrix[features]
[ athletic  ] = [  0  1  0  0  0  0  0  0  ] [ Donna            ]
[ beautiful ]   [  1  1  0  0  0  0  1  1  ] [ Emma             ]
[ educated  ]   [  1  0  0  1  0  1  1  1  ] [ Jane             ]
[ loving    ]   [  0  0  0  1  1  1  1  1  ] [ Liz              ]
[ religious ]   [  0  1  0  0  0  0  0  0  ] [ Mary             ]
[ sexy      ]   [  1  1  1  0  0  0  1  1  ] [ Mia              ]
[ skinny    ]   [  1  1  1  0  1  1  1  1  ] [ my perfect woman ]
[ smart     ]   [  1  0  0  1  0  1  1  1  ] [ the goddess      ]

Sunday 21 June 2015

some more similar[inverse-links-to] results

This time using 300,000 pages of Wikipedia (out of 15,000,000 total), so roughly 2% of the total. Even with EC2, I don't really have the processing power (with the current code) to use much larger sets than this.
sa: load 300k--wikipedia-links.sw
sa: find-inverse[links-to]
sa: H |*> #=> how-many inverse-links-to merge-labels(|WP: > + |_self>)
sa: S |*> #=> table[wikipage,coeff] select[1,60] 100 self-similar[inverse-links-to] merge-labels(|WP: > + |_self>)
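Roughly, S says two pages are similar when the sets of pages that link to them overlap. A toy Python sketch, with made-up link data and the same set-overlap reading of simm as assumption (none of this is from the actual 300k set):

```python
# Invert links-to (like find-inverse[links-to]), then score pages by the
# overlap of their in-link sets (like similar[inverse-links-to]).
from collections import defaultdict

links_to = {                       # made-up toy data
    "page-A": ["Love", "Pride"],
    "page-B": ["Love", "Pride", "Fear"],
    "page-C": ["Love"],
    "page-D": ["Fear"],
}

inverse_links_to = defaultdict(set)
for page, targets in links_to.items():
    for target in targets:
        inverse_links_to[target].add(page)

def simm(a, b):
    return 100 * len(a & b) / max(len(a), len(b))

scores = {p: simm(inverse_links_to["Love"], inverse_links_to[p])
          for p in inverse_links_to if p != "Love"}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
# Pride shares 2 of Love's 3 in-links, Fear shares 1 of 3
```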

sa: S |Love>
+-----------------------------------+--------+
| wikipage                          | coeff  |
+-----------------------------------+--------+
| Love                              | 100.0  |
| Pride                             | 17.391 |
| Pleasure                          | 13.043 |
| Jealousy                          | 13.043 |
| Philotes_(mythology)              | 13.043 |
| Imagination                       | 13.043 |
| Pity                              | 13.043 |
| Envy                              | 13.043 |
| Peace                             | 12.121 |
| Matter                            | 12     |
| Fear                              | 8.696  |
| Measurement                       | 8.696  |
| Number                            | 8.696  |
| Observation                       | 8.696  |
| Misanthropy                       | 8.696  |
| Piety                             | 8.696  |
| Courage                           | 8.696  |
| Hope                              | 8.696  |
| Lust                              | 8.696  |
| Asteria                           | 8.696  |
| Orthrus                           | 8.696  |
| Modesty                           | 8.696  |
| Punishment                        | 8.696  |
| Idea                              | 8.696  |
| Politeness                        | 8.696  |
| Learning                          | 8.696  |
| Luck                              | 8.696  |
| Sexual_attraction                 | 8.696  |
| Necessity                         | 8.696  |
| Physical_intimacy                 | 8.696  |
| Wrath                             | 8.696  |
| Gluttony                          | 8.696  |
| Prediction                        | 8.696  |
| Darkness                          | 8.696  |
| Safety                            | 8.696  |
| Optimism                          | 8.696  |
| Doubt                             | 8.696  |
| Moderation                        | 8.696  |
| Compassion                        | 8.696  |
| Respect                           | 8.696  |
| Nomenclature                      | 8.696  |
| Courtship                         | 8.696  |
| Jonathan_Barnes                   | 8.696  |
| Diels–Kranz_numbering_system      | 8.696  |
| John_Raven                        | 8.696  |
| De_amore_(Andreas_Capellanus)     | 8.696  |
| Infatuation                       | 8.696  |
| Category:Love                     | 8.696  |
| Contempt                          | 8.696  |
| Memory                            | 8.696  |
| Quantity                          | 8.696  |
| cyclops                           | 8.696  |
| Curiosity                         | 8.696  |
| Passion_(emotion)                 | 8.696  |
| Category:Philosophy_of_love       | 8.696  |
| nonverbal_communication           | 8.696  |
| Air                               | 8.696  |
| Neikea                            | 8.696  |
| Peter_Kingsley_(scholar)          | 8.696  |
| Inquiry                           | 8.696  |
+-----------------------------------+--------+
  Time taken: 1 hour, 42 minutes, 23 seconds, 210 milliseconds

sa: S |Knowledge>
+----------------------------+--------+
| wikipage                   | coeff  |
+----------------------------+--------+
| Knowledge                  | 100.0  |
| Inquiry                    | 16     |
| Measurement                | 12     |
| Pride                      | 12     |
| Idea                       | 12     |
| Learning                   | 12     |
| Prediction                 | 12     |
| Experience                 | 12     |
| Memory                     | 12     |
| Intelligence_(trait)       | 12     |
| understanding              | 10.345 |
| Imre_Lakatos               | 8.333  |
| Beauty                     | 8      |
| Outline_of_education       | 8      |
| Faith                      | 8      |
| Love                       | 8      |
| Meaning_of_life            | 8      |
| Metaphor                   | 8      |
| Nominalism                 | 8      |
| Number                     | 8      |
| Observation                | 8      |
| Platonic_idealism          | 8      |
| Pain                       | 8      |
| Pathological_science       | 8      |
| Problem_of_other_minds     | 8      |
| Misanthropy                | 8      |
| Piety                      | 8      |
| Virtue                     | 8      |
| Lust                       | 8      |
| Discovery_(observation)    | 8      |
| Ineffability               | 8      |
| Belief                     | 8      |
| Organization               | 8      |
| Modesty                    | 8      |
| Placebo                    | 8      |
| Punishment                 | 8      |
| Quasi-empirical_method     | 8      |
| Pleasure                   | 8      |
| Jealousy                   | 8      |
| Authority                  | 8      |
| Karl_Mannheim              | 8      |
| Paradigm                   | 8      |
| Intensionality             | 8      |
| Problem_of_induction       | 8      |
| Necessity                  | 8      |
| Elegance                   | 8      |
| Pratītyasamutpāda          | 8      |
| Moderation                 | 8      |
| Phenomenalism              | 8      |
| Nomenclature               | 8      |
| Potentiality_and_actuality | 8      |
| Max_Scheler                | 8      |
| Matter                     | 8      |
| Panpsychism                | 8      |
| Information                | 8      |
| knowledge_management       | 8      |
| Lev_Shestov                | 8      |
| Interpretation_(logic)     | 8      |
| Outline_of_philosophy      | 8      |
| Outline_of_logic           | 8      |
+----------------------------+--------+
  Time taken: 1 hour, 48 minutes, 29 seconds, 868 milliseconds

sa: H |Google>
|number: 704>

sa: S |Google>
+---------------------------------------+--------+
| wikipage                              | coeff  |
+---------------------------------------+--------+
| Google                                | 100.0  |
| Apple_Inc.                            | 14.063 |
| Microsoft                             | 12.732 |
| Facebook                              | 11.222 |
| Yahoo!                                | 9.375  |
| World_Wide_Web                        | 8.807  |
| IBM                                   | 8.093  |
| Sun_Microsystems                      | 7.955  |
| Android_(operating_system)            | 7.812  |
| Internet                              | 7.487  |
| Amazon.com                            | 7.102  |
| Intel                                 | 6.676  |
| Linux                                 | 6.537  |
| Hewlett-Packard                       | 6.25   |
| Stanford_University                   | 6.108  |
| Twitter                               | 6.108  |
| web_browser                           | 6.108  |
| HTML                                  | 5.824  |
| operating_system                      | 5.803  |
| YouTube                               | 5.657  |
| Forbes                                | 5.384  |
| Massachusetts_Institute_of_Technology | 5.324  |
| Java_(programming_language)           | 4.83   |
| AOL                                   | 4.687  |
| smartphone                            | 4.687  |
| open_source                           | 4.687  |
| C_(programming_language)              | 4.608  |
| Silicon_Valley                        | 4.545  |
| Nokia                                 | 4.403  |
| C++                                   | 4.403  |
| Microsoft_Windows                     | 4.354  |
| JavaScript                            | 4.261  |
| Wired_(magazine)                      | 4.261  |
| Motorola                              | 4.119  |
| XML                                   | 4.119  |
| Wall_Street_Journal                   | 4.119  |
| CNET                                  | 4.119  |
| copyright                             | 4.119  |
| software                              | 4.119  |
| Oracle_Corporation                    | 3.977  |
| Sony                                  | 3.977  |
| Unix                                  | 3.977  |
| Mac_OS_X                              | 3.977  |
| Wikipedia                             | 3.977  |
| Internet_Explorer                     | 3.835  |
| OS_X                                  | 3.835  |
| source_code                           | 3.835  |
| eBay                                  | 3.835  |
| computer_science                      | 3.748  |
| University_of_California,_Berkeley    | 3.732  |
| IP_address                            | 3.693  |
| Larry_Page                            | 3.693  |
| iPhone                                | 3.693  |
| algorithm                             | 3.693  |
| free_software                         | 3.693  |
| University_of_Michigan                | 3.551  |
| GNU_General_Public_License            | 3.551  |
| database                              | 3.551  |
| Carnegie_Mellon_University            | 3.409  |
| Cisco_Systems                         | 3.409  |
+---------------------------------------+--------+
  Time taken: 1 day, 18 hours, 53 minutes, 1 second, 791 milliseconds

sa: H |Blog>
|number: 32>

sa: S |Blog>
+-----------------------------------------------------------------+-------+
| wikipage                                                        | coeff |
+-----------------------------------------------------------------+-------+
| Blog                                                            | 100   |
| Active_Server_Pages                                             | 9.375 |
| Desktop_publishing                                              | 9.375 |
| Online_chat                                                     | 9.375 |
| CAPTCHA                                                         | 9.375 |
| RSS                                                             | 9.302 |
| Dynamic_HTML                                                    | 6.25  |
| Malware                                                         | 6.25  |
| Chat_room                                                       | 6.25  |
| Content_management_system                                       | 6.25  |
| ABC_World_News_Tonight                                          | 6.25  |
| Cross-site_scripting                                            | 6.25  |
| Primetime_(TV_series)                                           | 6.25  |
| Phishing                                                        | 6.25  |
| home_page                                                       | 6.25  |
| Open_source_software                                            | 6.25  |
| impact_factor                                                   | 6.25  |
| Terminate_and_Stay_Resident                                     | 6.25  |
| electronic_mailing_list                                         | 6.25  |
| Podcast                                                         | 6.25  |
| Google_Scholar                                                  | 6.25  |
| OPML                                                            | 6.25  |
| feed_aggregator                                                 | 6.25  |
| peer-review                                                     | 6.25  |
| Social_networking_service                                       | 6.25  |
| Digg                                                            | 6.25  |
| carbon_copy                                                     | 6.25  |
| online_community                                                | 6.25  |
| Freemium                                                        | 6.25  |
| Microsoft_Silverlight                                           | 6.25  |
| Wikia                                                           | 6.25  |
| Peer-to-peer_file_sharing                                       | 6.25  |
| Fully_qualified_domain_name                                     | 6.25  |
| Category:Internet_forums                                        | 6.25  |
| Category:American_broadcast_news_analysts                       | 6.25  |
| arXiv.org                                                       | 6.25  |
| preprint                                                        | 6.25  |
| Cicada_3301                                                     | 6.25  |
| fansite                                                         | 6.25  |
| Affiliate_marketing                                             | 6.25  |
| Category:American_television_news_anchors                       | 6.25  |
| Category:ABC_News_personalities                                 | 6.25  |
| Category:American_television_reporters_and_correspondents       | 6.25  |
| Lisa_McRee                                                      | 6.25  |
| Category:Electronic_publishing                                  | 6.25  |
| Kevin_Newman_(journalist)                                       | 6.25  |
| Robin_Roberts_(sportscaster)                                    | 6.25  |
| Internet_Information_Services                                   | 6.061 |
| newsmagazine                                                    | 5.882 |
| George_Stephanopoulos                                           | 5.714 |
| news_presenter                                                  | 5.714 |
| FAQ                                                             | 5.556 |
| Internet_meme                                                   | 5.405 |
| Common_Gateway_Interface                                        | 5.263 |
| Bulletin_board_system                                           | 5.172 |
| Internet_slang                                                  | 5     |
| news_anchor                                                     | 4.651 |
| Document_Object_Model                                           | 4.444 |
| Staff_writer                                                    | 4.444 |
| web_application                                                 | 4.348 |
+-----------------------------------------------------------------+-------+
  Time taken: 2 hours, 12 minutes, 50 seconds, 381 milliseconds

sa: H |arXiv.org>
|number: 3>

sa: S |arXiv.org>
+------------------------------------------------------------------------------------+--------+
| wikipage                                                                           | coeff  |
+------------------------------------------------------------------------------------+--------+
| arXiv.org                                                                          | 100    |
| citation_impact                                                                    | 40     |
| serials_crisis                                                                     | 40     |
| NEC_Research_Institute                                                             | 40     |
| postprint                                                                          | 40     |
| institutional_repository                                                           | 40     |
| OAIster                                                                            | 40     |
| SHERPA_(organisation)                                                              | 40     |
| Category:Electronic_publishing                                                     | 40     |
| Paul_Ginsparg                                                                      | 33.333 |
| preprint                                                                           | 27.273 |
| self-archiving                                                                     | 25     |
| Category:Academic_publishing                                                       | 23.077 |
| Methodological_naturalism                                                          | 20     |
| Presocratics                                                                       | 20     |
| Cryptology_ePrint_Archive                                                          | 20     |
| Open_publishing                                                                    | 20     |
| Hubble_diagram                                                                     | 20     |
| GZK_paradox                                                                        | 20     |
| List_of_unsolved_problems_in_physics                                               | 20     |
| Print_on_demand                                                                    | 20     |
| TeV                                                                                | 20     |
| Boundary_condition                                                                 | 20     |
| Black_body_radiation                                                               | 20     |
| Subscriptions                                                                      | 20     |
| R.P._Feynman                                                                       | 20     |
| Citeseer                                                                           | 20     |
| Citation_index                                                                     | 20     |
| File:Solvay_conference_1927.jpg                                                    | 20     |
| File:Senenmut-Grab.JPG                                                             | 20     |
| bioacoustics                                                                       | 20     |
| pattern_formation                                                                  | 20     |
| University_Physics                                                                 | 20     |
| File:Archimedes-screw_one-screw-threads_with-ball_3D-view_animated_small.gif       | 20     |
| Bryn_Mawr_Classical_Review                                                         | 20     |
| File:Acceleration_components.JPG                                                   | 20     |
| Delayed_open-access_journal                                                        | 20     |
| Astronomical_ceiling_of_Senemut_Tomb                                               | 20     |
| quantitative_finance                                                               | 20     |
| File:CMS_Higgs-event.jpg                                                           | 20     |
| James_Madison_Award                                                                | 20     |
| Public_Knowledge_Project                                                           | 20     |
| the_central_science                                                                | 20     |
| Difference_between_chemistry_and_physics                                           | 20     |
| theses                                                                             | 20     |
| Optical_physics                                                                    | 20     |
| analytic_solution                                                                  | 20     |
| weakly_interacting_massive_particle                                                | 20     |
| superclusters                                                                      | 20     |
| Open_Humanities_Press                                                              | 20     |
| iBooks_Author                                                                      | 20     |
| econophysics                                                                       | 20     |
| ultrasonics                                                                        | 20     |
| OAI-PMH                                                                            | 20     |
| Journal_of_Library_Administration                                                  | 20     |
| File:Einstein1921_by_F_Schmutzer_2.jpg                                             | 20     |
| Ancient_Greek_poetry                                                               | 20     |
| Publish_or_perish                                                                  | 20     |
| higher_dimension                                                                   | 20     |
| IBEX                                                                               | 20     |
+------------------------------------------------------------------------------------+--------+
  Time taken: 41 minutes, 16 seconds, 954 milliseconds

sa: H |Theory_of_everything>
|number: 13>

sa: S |Theory_of_everything>
+-------------------------------------------------------------+--------+
| wikipage                                                    | coeff  |
+-------------------------------------------------------------+--------+
| Theory_of_everything                                        | 100.0  |
| Ultimate_fate_of_the_universe                               | 21.429 |
| Planck_scale                                                | 17.391 |
| Big_Rip                                                     | 15.385 |
| Eddington_limit                                             | 15.385 |
| Supersymmetry                                               | 15.385 |
| Arrow_of_time                                               | 15.385 |
| Dimensionless_physical_constant                             | 15.385 |
| Plumian_Professor_of_Astronomy_and_Experimental_Philosophy  | 15.385 |
| Sir_Roger_Penrose                                           | 15.385 |
| Bakerian_Lecture                                            | 15.385 |
| grand_unified_theory                                        | 15.385 |
| Big_Freeze                                                  | 15.385 |
| Topological_order                                           | 15.385 |
| Baryon_asymmetry                                            | 15.385 |
| Neutrino_mass                                               | 15.385 |
| Unified_field_theory                                        | 15.385 |
| Membrane_(M-theory)                                         | 15.385 |
| Static_forces_and_virtual-particle_exchange                 | 15.385 |
| Generation_(particle_physics)                               | 15.385 |
| Stellar_nucleosynthesis                                     | 14.286 |
| Compact_Muon_Solenoid                                       | 13.333 |
| Cosmic_inflation                                            | 13.333 |
| neutrino_oscillation                                        | 12.5   |
| Hermann_Bondi                                               | 11.765 |
| Category:Presidents_of_the_Royal_Astronomical_Society       | 11.111 |
| YangMills_theory                                            | 11.111 |
| anthropic_principle                                         | 10.345 |
| Dark_matter                                                 | 9.524  |
| James_Watson                                                | 9.091  |
| CP_violation                                                | 8      |
| Anisotropy                                                  | 7.692  |
| Antiparticle                                                | 7.692  |
| Acts                                                        | 7.692  |
| Centripetal_force                                           | 7.692  |
| Graviton                                                    | 7.692  |
| Gluon                                                       | 7.692  |
| Hydrogen_atom                                               | 7.692  |
| Liquid_crystal                                              | 7.692  |
| Main_sequence                                               | 7.692  |
| Morphogenesis                                               | 7.692  |
| Panspermia                                                  | 7.692  |
| Proton_decay                                                | 7.692  |
| Qubit                                                       | 7.692  |
| Tokamak                                                     | 7.692  |
| Quintessence_(physics)                                      | 7.692  |
| Sonoluminescence                                            | 7.692  |
| Gravitational_lens                                          | 7.692  |
| High-temperature_superconductor                             | 7.692  |
| Fact                                                        | 7.692  |
| Timeline_of_gravitational_physics_and_relativity            | 7.692  |
| Timeline_of_stellar_astronomy                               | 7.692  |
| List_of_astronomers                                         | 7.692  |
| Astrophysicist                                              | 7.692  |
| Triple-alpha_process                                        | 7.692  |
| Religious                                                   | 7.692  |
| Quark_matter                                                | 7.692  |
| Gravity_assist                                              | 7.692  |
| Theory_of_Everything                                        | 7.692  |
| Color_confinement                                           | 7.692  |
+-------------------------------------------------------------+--------+
  Time taken: 1 hour, 8 minutes, 42 seconds, 470 milliseconds
OK. Some cool results in there. Actually, I think they are amazing! I think I have done enough examples of this now.

Though maybe I should note that the bigger the number H returns, the better the result. Which presumably means that if we used even more of wikipedia, we would get even better results! And it brings to mind the question: how many wikipages do we need to know more than the average human?

BTW, I don't think I have linked to this yet: the full wikipedia link structure in sw notation. It compresses with bzip2 down to about 2 GB, I seem to recall.

Sunday 14 June 2015

non-linear resonance

The idea is this. The brain stores a vast collection of patterns, and in the BKO scheme we store these patterns as superpositions. Then, given an input pattern, the brain can resonate if it detects a specific pattern. In the extreme case of a "non-linear resonance" the pattern has to be very precise to trigger the resonance. Though we can have weak resonance too, in which case even a loosely similar pattern can trigger it.

Anyway, words are boring, so let's give an example in the console:
-- load up this data:
sa: dump
----------------------------------------
|context> => |context: non-linear resonance>

non-linear-resonance |*> #=> 1000 drop-below[0.99] simm(""|_self>, ""|g>) |g>
weak-resonance |*> #=> 200 drop-below[0.6] simm(""|_self>, ""|g>) |g>

 |g> => |a> + |b> + |c> + |d>

 |f1> => |a>
 |f2> => |a> + |b>
 |f3> => |a> + |b> + |c>
 |f4> => |a> + |b> + |c> + 0.9|d>
 |f5> => 0.95|a> + |b> + |c> + |d>
 |f6> => |a> + |b> + |c> + |d>
 |f7> => |a> + |b> + |c> + |d> + |e>
 |list> => |f1> + |f2> + |f3> + |f4> + |f5> + |f6> + |f7>
----------------------------------------
where g is our incoming pattern, and the f_k are our stored patterns.
We have also defined our weak-resonance and non-linear-resonance operators: the weak resonance needs at least 60% similarity, the non-linear resonance needs at least 99%, and the amplitude of the non-linear resonance is much higher.
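As an aside, the numbers in the table below can be reproduced with a few lines of Python, assuming simm rescales both superpositions to sum 1 and then sums the pointwise minimum coefficients (my reading of the operator; a sketch, not the actual engine code):

```python
# Sketch of the resonance operators, assuming simm rescales both
# superpositions to sum 1 and then sums the pointwise minimums.

def simm(f, g):
    """Similarity between two superpositions (dicts of ket -> coeff)."""
    sf, sg = sum(f.values()), sum(g.values())
    if sf == 0 or sg == 0:
        return 0.0
    return sum(min(f[k] / sf, g[k] / sg) for k in f.keys() & g.keys())

def resonance(f, g, threshold, amplitude):
    """amplitude * drop-below[threshold] simm(f, g)"""
    s = simm(f, g)
    return amplitude * s if s >= threshold else 0.0

g = {'a': 1, 'b': 1, 'c': 1, 'd': 1}
f5 = {'a': 0.95, 'b': 1, 'c': 1, 'd': 1}

print(round(resonance(f5, g, 0.6, 200), 2))    # weak-resonance: 198.1
print(round(resonance(f5, g, 0.99, 1000), 2))  # non-linear-resonance: 990.51
```

With f4 = a + b + c + 0.9 d the similarity works out to about 0.981, which survives the 0.6 threshold but not the 0.99 one, matching the table.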

And then, let's look at the resulting table:
sa: table[pattern,weak-resonance,non-linear-resonance] "" |list>
+---------+----------------+----------------------+
| pattern | weak-resonance | non-linear-resonance |
+---------+----------------+----------------------+
| f1      |                |                      |
| f2      |                |                      |
| f3      | 150 g          |                      |
| f4      | 196.15 g       |                      |
| f5      | 198.10 g       | 990.51 g             |
| f6      | 200 g          | 1000 g               |
| f7      | 160 g          |                      |
+---------+----------------+----------------------+
I'm not sure how that table looks to others, but to me it is very, very beautiful. It is showing hints of stuff I have been thinking about for a long, long time now.

Anyway, some comments:
1) the patterns can of course be anything. eg, a very specific sequence of sounds could non-linearly resonate with the "frog" neuron. The specific sequence of letters for "beach" might weak-resonate with the "sun", "sand", "waves" and "beach-goers" neurons.
2) the above of course has a lot of similarity with the similar[op] operator.
3) I suspect something very similar to that table happens in the hippocampus. But that is for later!

Update: the above is essentially a 1-D version of the landscape function:
L(f,x) = simm(f,g(x))
Though here we have f and g swapped, so it is:
L(g,x) = simm(g,f(x))

Update: on reflection, I don't think the "beach" thing is a good example of a weak resonance. I'll have to see if I can think of a better example.

Update: we can also have a "square resonance":
square-resonance |*> #=> 200 clean drop-below[0.6] simm(""|_self>, ""|g>) |g>
Now, look at the table:
sa: table[pattern,weak-resonance,square-resonance,non-linear-resonance] "" |list>
+---------+----------------+------------------+----------------------+
| pattern | weak-resonance | square-resonance | non-linear-resonance |
+---------+----------------+------------------+----------------------+
| f1      |                |                  |                      |
| f2      |                |                  |                      |
| f3      | 150 g          | 200 g            |                      |
| f4      | 196.15 g       | 200 g            |                      |
| f5      | 198.10 g       | 200 g            | 990.51 g             |
| f6      | 200 g          | 200 g            | 1000 g               |
| f7      | 160 g          | 200 g            |                      |
+---------+----------------+------------------+----------------------+
Anyway, that should be clear enough. And of course, we can shape the resonance profile in (almost) arbitrary ways.

Also, we could call "weak resonance" a "fuzzy resonance". Ie, the pattern only has to be fuzzily close, yet it still resonates.
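In the same spirit, clean appears to set every surviving coefficient to 1, which is what flattens the square resonance to a constant 200. A self-contained Python sketch (same assumed definition of simm as above, ie my reading of the operator, not the engine code):

```python
def simm(f, g):
    """Rescale both superpositions to sum 1, then sum pointwise minimums."""
    sf, sg = sum(f.values()), sum(g.values())
    return sum(min(f[k] / sf, g[k] / sg) for k in f.keys() & g.keys())

def square_resonance(f, g, threshold=0.6, amplitude=200):
    """amplitude * clean drop-below[threshold] simm(f, g).
    The clean step maps any surviving similarity to exactly 1."""
    return amplitude if simm(f, g) >= threshold else 0

g = {'a': 1, 'b': 1, 'c': 1, 'd': 1}
print(square_resonance({'a': 1, 'b': 1, 'c': 1}, g))  # f3: 200
print(square_resonance({'a': 1, 'b': 1}, g))          # f2: 0
```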

Wednesday 10 June 2015

wikipedia fragment to frequency list

Now, we had a lot of success in mapping wikipedia to its link structure and finding semantic similarities. That used "similar[op]". This time we map wikipedia (well, a small piece of it) to frequency lists, and see how well "find-topic[op]" works.

Here is my code to convert wikipedia xml to single word frequencies.
$ ./play_with_wikipedia_freq_list.py data/fragments/0.xml
10,604 minutes later, we have this.
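The script itself is too long to include, but the core step is just mapping each page's text to a word frequency list. A toy sketch of that mapping (a hypothetical helper, not the actual play_with_wikipedia_freq_list.py code):

```python
import re
from collections import Counter

def words_1(text):
    """Map text to a single-word frequency superposition (word -> count)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

sp = words_1("The cat sat on the mat. The mat was flat.")
print(sp.most_common(2))  # → [('the', 3), ('mat', 2)]
```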

Now, some examples:
sa: T |*> #=> table[wikipage,coeff] select[1,300] 100 intn-find-topic[words-1] |_self>
sa: T |river torrens>
+-----------------------------+----------+
| wikipage                    | coeff    |
+-----------------------------+----------+
| Murray_River                | 2210.857 |
| The_Bronx                   | 1450.875 |
| South_Australia             | 1243.607 |
| Adelaide                    | 1130.552 |
| Prince_Edward_Island        | 746.164  |
| Gypsum                      | 710.633  |
| Port_Adelaide_Football_Club | 678.331  |
| June_14                     | 552.714  |
| Trade                       | 552.714  |
| October_25                  | 497.443  |
| Dinosaur                    | 226.11   |
+-----------------------------+----------+
  Time taken: 27 minutes, 19 seconds, 709 milliseconds

sa: T |adelaide university>
+-------------------------------------------+--------+
| wikipage                                  | coeff  |
+-------------------------------------------+--------+
| Macquarie_University                      | 90.953 |
| Immanuel_Kant                             | 74.416 |
| Robert_Menzies                            | 71.625 |
| David_Hume                                | 71.625 |
| Theology                                  | 68.214 |
| Adelaide                                  | 65.114 |
| Austin,_Texas                             | 65.114 |
| Yoga                                      | 65.114 |
| Gregor_Mendel                             | 63.951 |
| Mike_Moore_(New_Zealand_politician)       | 63.951 |
| New_South_Wales                           | 61.393 |
| Perth                                     | 61.393 |
| Aristophanes                              | 61.393 |
| Bob_Hawke                                 | 61.393 |
| Culture_of_Canada                         | 61.393 |
| John_Milton                               | 61.393 |
| West_Bengal                               | 61.393 |
| Brewing                                   | 61.393 |
| Fyodor_Dostoyevsky                        | 61.393 |
| Hunter_College                            | 61.393 |
| John_Stuart_Mill                          | 61.393 |
...
  Time taken: 27 minutes, 8 seconds, 965 milliseconds

sa: T |apple juice>
+---------------------------------------+---------+
| wikipage                              | coeff   |
+---------------------------------------+---------+
| Vinegar                               | 402.189 |
| McIntosh_(apple)                      | 367.835 |
| Fruit                                 | 361.97  |
| Cuisine_of_the_United_States          | 329.064 |
| Drink                                 | 321.751 |
| Vietnamese_cuisine                    | 321.751 |
| List_of_cocktails                     | 321.751 |
| Hungarian_language                    | 294.268 |
| Arsenic                               | 289.576 |
| Chardonnay                            | 289.576 |
| Pear                                  | 271.478 |
| Swedish_cuisine                       | 271.478 |
| Cuisine_of_the_Southern_United_States | 271.478 |
| Food_preservation                     | 241.314 |
| Turkish_cuisine                       | 241.314 |
| Mead                                  | 241.314 |
| French_cuisine                        | 217.182 |
| Mojito                                | 206.84  |
...
  Time taken: 27 minutes, 25 seconds, 378 milliseconds

sa: T |russia china japan australia new zealand egypt>
+-----------------------------------------------------------+--------+
| wikipage                                                  | coeff  |
+-----------------------------------------------------------+--------+
| Tram                                                      | 77.349 |
| List_of_national_capitals_and_largest_cities_by_country   | 75.967 |
| General_Motors                                            | 74.448 |
| 2000s_(decade)                                            | 70.903 |
| History_of_painting                                       | 70.903 |
| 2010s                                                     | 67.68  |
| British_Empire                                            | 67.68  |
| Foreign_relations_of_China                                | 67.68  |
| Self-determination                                        | 67.68  |
| Foreign_relations_of_Taiwan                               | 67.68  |
| Toyota                                                    | 67.68  |
| Dwight_D._Eisenhower                                      | 65.991 |
| Psychology                                                | 65.991 |
| 2008                                                      | 63.813 |
| List_of_former_sovereign_states                           | 63.813 |
| Foreign_relations_of_Indonesia                            | 63.813 |
| Foreign_relations_of_Japan                                | 63.813 |
| Foreign_relations_of_North_Korea                          | 63.813 |
| Peninsula                                                 | 63.813 |
| Pandemic                                                  | 63.813 |
| United_Nations_Security_Council                           | 63.813 |
| 1996                                                      | 63.813 |
| List_of_mountains                                         | 63.813 |
...
  Time taken: 1 hour, 39 minutes, 16 seconds, 194 milliseconds
Anyway, largely rubbish results! That doesn't mean find-topic[op] is completely useless (eg, it seems to work well for finding name types: male, female, last), it just doesn't work that well on wikipedia.

even more inverse simm results

OK. I have been throwing example after example at my similar[inverse-links-to] code, and almost all of them gave great results. Some even with only 6 incoming links. The worst I think was "kangaroo" which only had 4 incoming links.

Anyway, too big to paste in here, so here is the text file.
And here are my operators:
H |*> #=> how-many inverse-links-to |_self>
S |*> #=> table[wikipage,coeff] select[1,200] 100 self-similar[inverse-links-to] |_self>
I guess the point is made. Not sure how much more I want to do with it, at least for now.

Update: here is a thought. You know the game: one person says a word/concept, then the other person answers with the first thing that crosses their mind. We could do that (though the current code would be rather slow!). Something like:
what-do-you-think-of-when-you-hear |*> #=> select[1,1] clean similar[inverse-links-to] |_self>
Noting that we don't need the multiply by 100, and we use similar[op] not self-similar[op].
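In plain Python, version 1 amounts to: rank pages by similarity, keep the top 5, shuffle, and take the first. A sketch with made-up similarity scores (the page names and numbers here are purely illustrative):

```python
import random

def what_do_you_think_of_when_you_hear(ranked):
    """ranked: list of (page, coeff) pairs sorted best-first."""
    top5 = ranked[:5]                   # select[1,5]
    page, _coeff = random.choice(top5)  # shuffle, then select[1,1]
    return page                         # clean: keep just the ket

ranked = [('Hyena', 0.31), ('Weasel', 0.29), ('Coyote', 0.27),
          ('Jackal', 0.25), ('Otter', 0.22), ('Badger', 0.10)]
print(what_do_you_think_of_when_you_hear(ranked))
```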
So, some examples in the console:
sa: what-do-you-think-of-when-you-hear |*> #=> select[1,1] clean similar[inverse-links-to] |_self>
sa: what-do-you-think-of-when-you-hear |WP: Pen>
|WP: Marker_pen>

sa: what-do-you-think-of-when-you-hear |WP: Fox>
|WP: Hyena>
Cool! Though we can add a random element in there too:
-- version 1:
sa: what-do-you-think-of-when-you-hear |*> #=> clean select[1,1] shuffle select[1,5] similar[inverse-links-to] |_self>

-- version 2:
sa: what-do-you-think-of-when-you-hear |*> #=> clean pick-elt select[1,5] similar[inverse-links-to] |_self>

sa: what-do-you-think-of-when-you-hear |WP: Fox>
|WP: Weasel>

sa: what-do-you-think-of-when-you-hear |WP: Diamond>
|WP: Selenium>

-- version 3:
sa: what-do-you-think-of-when-you-hear |*> #=> clean pick-elt select[1,5] similar[inverse-links-to] merge-labels(|WP: > + |_self>)

sa: what-do-you-think-of-when-you-hear |Physics>
|WP: Chemistry>

-- version 4:
sa: what-do-you-think-of-when-you-hear |*> #=> extract-value clean pick-elt select[1,5] similar[inverse-links-to] merge-labels(|WP: > + |_self>)

sa: what-do-you-think-of-when-you-hear |Student>
|Uniform>

sa: what-do-you-think-of-when-you-hear |Quentin_Tarantino>
|Brian_De_Palma>

sa: what-do-you-think-of-when-you-hear |House>
|Amateur_astronomy>

sa: what-do-you-think-of-when-you-hear |Elephant>
|Ant>

sa: what-do-you-think-of-when-you-hear |Elephant>
|Donkey>

sa: what-do-you-think-of-when-you-hear |Rock_and_roll>
|Jimmie_Rodgers_(country_singer)>

sa: what-do-you-think-of-when-you-hear |Rock_and_roll>
|American_Bandstand>

sa: what-do-you-think-of-when-you-hear |Paris>
|Moscow>

sa: what-do-you-think-of-when-you-hear |Paris>
|Rome>

sa: what-do-you-think-of-when-you-hear |Cryptography>
|Data_Encryption_Standard>

sa: what-do-you-think-of-when-you-hear |Cryptography>
|one-time_pad>

sa: what-do-you-think-of-when-you-hear |Drake_equation>
|Kepler_(spacecraft)>

sa: what-do-you-think-of-when-you-hear |Drake_equation>
|Frank_Drake>

sa: what-do-you-think-of-when-you-hear |Slashdot>
|Bruce_Schneier>

sa: what-do-you-think-of-when-you-hear |Slashdot>
|Adrian_Carmack>

sa: what-do-you-think-of-when-you-hear |Slashdot>
|Neal_Stephenson>

sa: what-do-you-think-of-when-you-hear |Scotland>
|Edinburgh>

sa: what-do-you-think-of-when-you-hear |Scotland>
|Spain>

sa: what-do-you-think-of-when-you-hear |Food>
|Wood>

sa: what-do-you-think-of-when-you-hear |Food>
|Cement>

sa: what-do-you-think-of-when-you-hear |Food>
|glycemic_index>

sa: what-do-you-think-of-when-you-hear |The_Terminator>
|T-1000>

sa: what-do-you-think-of-when-you-hear |The_Terminator>
|Aliens_(film)>

sa: what-do-you-think-of-when-you-hear |Algebra>
|Axiom>

sa: what-do-you-think-of-when-you-hear |Algebra>
|Constructivism_(mathematics)>

sa: what-do-you-think-of-when-you-hear |Functional_analysis>
|Measure_theory>

sa: what-do-you-think-of-when-you-hear |Functional_analysis>
|spanning_tree>
OK. Some weird ones in there, but some fun ones too. And if we use more of wikipedia then we should get even better results.

Update: we can now also do this:
sa: table[wikipage] exp[what-do-you-think-of-when-you-hear,10] |The_Terminator>
+-----------------------------+
| wikipage                    |
+-----------------------------+
| The_Terminator              |
| Sylvester_Stallone          |
| Eddie_Murphy                |
| Kevin_Costner               |
| Michelle_Pfeiffer           |
| Sharon_Stone                |
| Frasier                     |
| sitcom                      |
| Cheers                      |
| CBS_Television_Distribution |
+-----------------------------+
  Time taken: 1 hour, 18 minutes, 41 seconds, 294 milliseconds
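As I understand exp[op,n], it applies the operator repeatedly (up to n times) and accumulates everything reached along the way, which is why the table above reads like a chain of associations. A rough Python analogue, using a toy association map rather than the wikipedia data:

```python
def exp_chain(step, start, n):
    """Accumulate the kets reached by applying `step` up to n times."""
    seen = [start]
    current = start
    for _ in range(n):
        current = step(current)
        if current is None:
            break
        if current not in seen:
            seen.append(current)
    return seen

# Toy what-do-you-think-of-when-you-hear style map, purely illustrative:
links = {'The_Terminator': 'Sylvester_Stallone',
         'Sylvester_Stallone': 'Eddie_Murphy',
         'Eddie_Murphy': 'The_Terminator'}
print(exp_chain(links.get, 'The_Terminator', 10))
# → ['The_Terminator', 'Sylvester_Stallone', 'Eddie_Murphy']
```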
Update: I applied similar[inverse-links-to] to a 300,000-page sample of wikipedia (as opposed to the 30,000 used here) and the results were even better!

Friday 5 June 2015

more inverse-simm results

OK. In the last post we discovered that similar[inverse-links-to] seems to give some good results. Let's expand our test set, and try it on a few more examples.
-- define our test set:
|list> => |WP: Erwin_Schrödinger> + |WP: Richard_Feynman> + |WP: Cat> + |WP: Dog> + |WP: Apple> + |WP: Adelaide> + |WP: University_of_Adelaide> + |WP: Particle_physics> + |WP: Lisp_(programming_language)> + |WP: APL_(programming_language)> + |WP: SQL> + |WP: SPARQL> + |WP: The_Doors> + |WP: Rugby> + |WP: Australian_Football_League>

-- how many incoming links?
sa: how-many-in-links |*> #=> how-many inverse-links-to |_self>
sa: table[wikipage,how-many-in-links] "" |list>
+-----------------------------+-------------------+
| wikipage                    | how-many-in-links |
+-----------------------------+-------------------+
| Erwin_Schrödinger           | 53                |
| Richard_Feynman             | 79                |
| Cat                         | 14                |
| Dog                         | 24                |
| Apple                       | 21                |
| Adelaide                    | 81                |
| University_of_Adelaide      | 10                |
| Particle_physics            | 17                |
| Lisp_(programming_language) | 64                |
| APL_(programming_language)  | 24                |
| SQL                         | 41                |
| SPARQL                      | 6                 |
| The_Doors                   | 41                |
| Rugby                       | 0                 |
| Australian_Football_League  | 30                |
+-----------------------------+-------------------+

-- create the data:
sa: inverse-simm-op |WP: *> #=> select[1,500] 100 self-similar[inverse-links-to] |_self>
sa: |null> => map[inverse-simm-op,inverse-simm] "" |list>

-- define an operator to explore the resulting data:
sa: T |*> #=> table[wikipage,coeff] select[1,20] inverse-simm |_self>

-- now our examples:
sa: T |WP: Erwin_Schrödinger>
+---------------------------+--------+
| wikipage                  | coeff  |
+---------------------------+--------+
| Erwin_Schrödinger         | 100.0  |
| Max_Born                  | 32.075 |
| Niels_Bohr                | 31.646 |
| Schrödinger_equation      | 30.189 |
| Paul_Dirac                | 29.31  |
| Wolfgang_Pauli            | 28.302 |
| Werner_Heisenberg         | 28.049 |
| Max_Planck                | 26.984 |
| uncertainty_principle     | 26.415 |
| photoelectric_effect      | 24.528 |
| Roger_Penrose             | 22.642 |
| Bohr_model                | 20.755 |
| Arnold_Sommerfeld         | 20.755 |
| Louis_de_Broglie          | 20.755 |
| wave_function             | 20.755 |
| Copenhagen_interpretation | 18.868 |
| quantum_state             | 18.868 |
| Ernest_Rutherford         | 17.742 |
| Maxwell's_equations       | 17.241 |
| Pauli_exclusion_principle | 16.981 |
+---------------------------+--------+

sa: T |WP: Richard_Feynman>
+------------------------------------+--------+
| wikipage                           | coeff  |
+------------------------------------+--------+
| Richard_Feynman                    | 100.0  |
| Werner_Heisenberg                  | 24.39  |
| special_relativity                 | 20.792 |
| Niels_Bohr                         | 20.253 |
| Paul_Dirac                         | 20.253 |
| particle_physics                   | 20.225 |
| classical_mechanics                | 20.0   |
| fermion                            | 18.987 |
| spin_(physics)                     | 18.987 |
| Standard_Model                     | 17.722 |
| Schrödinger_equation               | 17.722 |
| quantum_field_theory               | 17.722 |
| electromagnetism                   | 17.241 |
| Erwin_Schrödinger                  | 16.456 |
| Pauli_exclusion_principle          | 16.456 |
| quark                              | 16.456 |
| Stephen_Hawking                    | 16.456 |
| quantum_electrodynamics            | 16.456 |
| Julian_Schwinger                   | 16.456 |
| Category:Concepts_in_physics       | 16.279 |
+------------------------------------+--------+

sa: T |WP: Cat>
+----------+--------+
| wikipage | coeff  |
+----------+--------+
| Cat      | 100.0  |
| Horse    | 31.25  |
| Donkey   | 28.571 |
| Goat     | 28.571 |
| Elephant | 21.429 |
| Pig      | 21.429 |
| Rabbit   | 21.429 |
| Deer     | 21.429 |
| Mule     | 21.429 |
| Goose    | 21.429 |
| Dog      | 20.833 |
| Sheep    | 20     |
| Lion     | 18.75  |
| Almond   | 14.286 |
| Alder    | 14.286 |
| Ant      | 14.286 |
| Bear     | 14.286 |
| Bee      | 14.286 |
| Fox      | 14.286 |
| Lizard   | 14.286 |
+----------+--------+

sa: T |WP: Dog>
+------------------+--------+
| wikipage         | coeff  |
+------------------+--------+
| Dog              | 100.0  |
| Horse            | 29.167 |
| coyote           | 22.222 |
| Gray_wolf        | 21.429 |
| Arctic_fox       | 20.833 |
| Cat              | 20.833 |
| Canidae          | 20.833 |
| Elephant         | 20.833 |
| bobcat           | 20.833 |
| red_fox          | 20.833 |
| Donkey           | 20.833 |
| red_wolf         | 20.833 |
| Rabbit           | 16.667 |
| African_wild_dog | 16.667 |
| gray_wolf        | 16.667 |
| Domestic_sheep   | 16.667 |
| dingo            | 16.667 |
| Goat             | 16.667 |
| Cattle           | 16.667 |
| otter            | 16.667 |
+------------------+--------+

sa: T |WP: Apple>
+----------------+--------+
| wikipage       | coeff  |
+----------------+--------+
| Apple          | 100.0  |
| Strawberry     | 33.333 |
| Cranberry      | 23.81  |
| Grape          | 23.81  |
| Tomato         | 23.81  |
| Cherry         | 23.81  |
| Kiwifruit      | 19.048 |
| Blackberry     | 19.048 |
| plum           | 19.048 |
| Lime_(fruit)   | 19.048 |
| Pineapple      | 19.048 |
| Lemon          | 19.048 |
| Blueberry      | 19.048 |
| peach          | 17.5   |
| pear           | 17.391 |
| Orange_(fruit) | 16.667 |
| Pear           | 14.286 |
| Banana         | 14.286 |
| Peach          | 14.286 |
| Squash_(plant) | 14.286 |
+----------------+--------+

sa: T |WP: Adelaide>
+-------------------------------------+--------+
| wikipage                            | coeff  |
+-------------------------------------+--------+
| Adelaide                            | 100.0  |
| Brisbane                            | 37.079 |
| Perth                               | 32.099 |
| South_Australia                     | 26.042 |
| Melbourne                           | 18.687 |
| Canberra                            | 15.044 |
| The_Age                             | 14.634 |
| Sydney                              | 14.583 |
| Australian_Broadcasting_Corporation | 13.445 |
| Australian_rules_football           | 12.346 |
| Auckland                            | 12.195 |
| Australian_Labor_Party              | 11.111 |
| Darwin,_Northern_Territory          | 11.111 |
| Triple_J                            | 11.111 |
| Seven_Network                       | 11.111 |
| States_and_territories_of_Australia | 11.111 |
| Karachi                             | 10.989 |
| Australian_Football_League          | 9.877  |
| The_Australian                      | 9.877  |
| Western_Australia                   | 8.911  |
+-------------------------------------+--------+

sa: T |WP: University_of_Adelaide>
+----------------------------------------+-------+
| wikipage                               | coeff |
+----------------------------------------+-------+
| University_of_Adelaide                 | 100.0 |
| Port_Adelaide_Football_Club            | 20    |
| Adelaide_Oval                          | 20    |
| Adelaide_city_centre                   | 20    |
| University_of_South_Australia          | 20    |
| Port_Adelaide                          | 20    |
| Australian_Grand_Prix                  | 20    |
| State_Bank_of_South_Australia          | 20    |
| Mount_Lofty                            | 20    |
| South_Eastern_Freeway                  | 20    |
| Southern_Expressway_(Australia)        | 20    |
| Government_of_South_Australia          | 20    |
| Flinders_University_of_South_Australia | 20    |
| South_Australian_Museum                | 20    |
| Adelaide_Crows                         | 20    |
| Glenelg,_South_Australia               | 20    |
| Australian_Central_Standard_Time       | 20    |
| Australian_Central_Daylight_Time       | 20    |
| Fleurieu_Peninsula                     | 20    |
| River_Torrens                          | 20    |
+----------------------------------------+-------+

sa: T |WP: Particle_physics>
+----------------------------------------+--------+
| wikipage                               | coeff  |
+----------------------------------------+--------+
| Particle_physics                       | 100    |
| Optics                                 | 20     |
| Cosmology                              | 20     |
| Acoustics                              | 17.647 |
| Condensed_matter_physics               | 17.647 |
| Fluid_dynamics                         | 17.647 |
| Thermodynamics                         | 17.647 |
| kinematics                             | 17.647 |
| atomic,_molecular,_and_optical_physics | 17.647 |
| cosmic_inflation                       | 17.647 |
| Fluid_statics                          | 17.647 |
| Lambda-CDM_model                       | 17.647 |
| Biophysics                             | 17.647 |
| Category:Physics                       | 17.647 |
| Lev_Landau                             | 15.789 |
| Nuclear_physics                        | 14.286 |
| virtual_particle                       | 14.286 |
| quantum_chemistry                      | 13.636 |
| American_Physical_Society              | 13.333 |
| Elementary_particle                    | 13.043 |
+----------------------------------------+--------+

sa: T |WP: Lisp_(programming_language)>
+------------------------------------+--------+
| wikipage                           | coeff  |
+------------------------------------+--------+
| Lisp_(programming_language)        | 100    |
| Smalltalk                          | 28.125 |
| Pascal_(programming_language)      | 24.675 |
| Fortran                            | 23.881 |
| Scheme_(programming_language)      | 23.438 |
| Ruby_(programming_language)        | 23.188 |
| object-oriented_programming        | 21.875 |
| PHP                                | 20.779 |
| Prolog                             | 20.312 |
| Haskell_(programming_language)     | 20.312 |
| Ada_(programming_language)         | 18.75  |
| APL_(programming_language)         | 18.75  |
| BASIC                              | 18.75  |
| COBOL                              | 18.75  |
| functional_programming             | 18.75  |
| John_McCarthy_(computer_scientist) | 18.75  |
| C_Sharp_(programming_language)     | 18.75  |
| programming_language               | 17.647 |
| JavaScript                         | 17.526 |
| compiler                           | 17     |
+------------------------------------+--------+

sa: T |WP: APL_(programming_language)>
+------------------------------------+--------+
| wikipage                           | coeff  |
+------------------------------------+--------+
| APL_(programming_language)         | 100.0  |
| Kenneth_E._Iverson                 | 33.333 |
| John_McCarthy_(computer_scientist) | 25.0   |
| John_Backus                        | 25.0   |
| Prolog                             | 23.333 |
| Alan_Kay                           | 20.833 |
| AWK                                | 20.833 |
| Grace_Hopper                       | 20.833 |
| ML_(programming_language)          | 20.833 |
| Niklaus_Wirth                      | 20.833 |
| logic_programming                  | 20.833 |
| J_(programming_language)           | 20.833 |
| bytecode                           | 19.231 |
| Lisp_(programming_language)        | 18.75  |
| programmer                         | 17.857 |
| Objective-C                        | 17.241 |
| BASIC                              | 17.188 |
| Mathematica                        | 17.143 |
| SQL                                | 17.073 |
| ALGOL                              | 16.667 |
+------------------------------------+--------+

sa: T |WP: SQL>
+----------------------------------------+--------+
| wikipage                               | coeff  |
+----------------------------------------+--------+
| SQL                                    | 100.0  |
| Haskell_(programming_language)         | 22.727 |
| PHP                                    | 19.481 |
| APL_(programming_language)             | 17.073 |
| Category:Cross-platform_software       | 17.073 |
| Visual_Basic                           | 17.073 |
| relational_database                    | 17.073 |
| COBOL                                  | 16.327 |
| PostgreSQL                             | 14.634 |
| R_(programming_language)               | 14.634 |
| Run_time_(program_lifecycle_phase)     | 14.634 |
| Ruby_(programming_language)            | 14.493 |
| JavaScript                             | 13.402 |
| C_Sharp_(programming_language)         | 13.208 |
| database                               | 12.644 |
| Lisp_(programming_language)            | 12.5   |
| Common_Lisp                            | 12.195 |
| Graphical_user_interface               | 12.195 |
| MySQL                                  | 12.195 |
| Mathematica                            | 12.195 |
+----------------------------------------+--------+

sa: T |WP: SPARQL>
+--------------------------------------------------------------------------------------+--------+
| wikipage                                                                             | coeff  |
+--------------------------------------------------------------------------------------+--------+
| SPARQL                                                                               | 100.0  |
| Web_Ontology_Language                                                                | 33.333 |
| Agris:_International_Information_System_for_the_Agricultural_Sciences_and_Technology | 33.333 |
| W3C_XML_Schema                                                                       | 33.333 |
| GRDDL                                                                                | 33.333 |
| Conceptual_interoperability                                                          | 33.333 |
| Category:Web_services                                                                | 33.333 |
| Resource_Description_Framework                                                       | 20     |
| Analytic_geometry                                                                    | 16.667 |
| DHTML                                                                                | 16.667 |
| Interpolation                                                                        | 16.667 |
| GNU_nano                                                                             | 16.667 |
| Pico_(text_editor)                                                                   | 16.667 |
| Relational_database                                                                  | 16.667 |
| Sir_Charles_Lyell                                                                    | 16.667 |
| Synchronized_Multimedia_Integration_Language                                         | 16.667 |
| Semantic_network                                                                     | 16.667 |
| Backronym                                                                            | 16.667 |
| Interoperability                                                                     | 16.667 |
| RAS_syndrome                                                                         | 16.667 |
+--------------------------------------------------------------------------------------+--------+

sa: T |WP: The_Doors>
+---------------------------------------+--------+
| wikipage                              | coeff  |
+---------------------------------------+--------+
| The_Doors                             | 100.0  |
| Jim_Morrison                          | 31.707 |
| Ray_Manzarek                          | 21.951 |
| Jefferson_Airplane                    | 17.073 |
| The_Band                              | 14.634 |
| Bee_Gees                              | 14.634 |
| Janis_Joplin                          | 12.195 |
| Summer_of_Love                        | 12.195 |
| Timothy_Leary                         | 12.195 |
| folk_rock                             | 12.195 |
| Governor_of_American_Samoa            | 12.195 |
| Cream_(band)                          | 12.195 |
| Sgt._Pepper's_Lonely_Hearts_Club_Band | 12.195 |
| The_Byrds                             | 12.195 |
| Joan_Baez                             | 12.195 |
| Iron_Butterfly                        | 12.195 |
| Brian_Jones                           | 12.195 |
| Creedence_Clearwater_Revival          | 12.195 |
| Johnny_Winter                         | 12.195 |
| The_Yardbirds                         | 12.195 |
+---------------------------------------+--------+

sa: T |WP: Rugby>
+----------+-------+
| wikipage | coeff |
+----------+-------+
+----------+-------+

sa: T |WP: Australian_Football_League>
+---------------------------------+--------+
| wikipage                        | coeff  |
+---------------------------------+--------+
| Australian_Football_League      | 100.0  |
| West_Coast_Eagles               | 40     |
| Richmond_Football_Club          | 40     |
| Sydney_Swans                    | 36.667 |
| St_Kilda_Football_Club          | 36.667 |
| Collingwood_Football_Club       | 36.667 |
| Hawthorn_Football_Club          | 36.667 |
| Australian_rules_football       | 33.871 |
| Essendon_Football_Club          | 33.333 |
| North_Melbourne_Football_Club   | 33.333 |
| Western_Bulldogs                | 33.333 |
| Carlton_Football_Club           | 33.333 |
| Brownlow_Medal                  | 33.333 |
| Seven_Network                   | 32.353 |
| Melbourne_Cricket_Ground        | 30     |
| Australian_Bureau_of_Statistics | 30     |
| Special_Broadcasting_Service    | 30     |
| Melbourne_Football_Club         | 30     |
| Fitzroy_Football_Club           | 30     |
| 2001_AFL_season                 | 30     |
+---------------------------------+--------+
Wow! That works unbelievably well. I don't know exactly why it works as well as it does. And I don't know why similar[inverse-links-to] works better than similar[links-to] either, though perhaps inbound links act a little like co-citations: two pages that are linked to by the same set of pages tend to be about related topics, while the outgoing links a page makes are noisier. A question that comes to mind is, if we use a larger subset of wikipedia, will these results get better or worse? I suspect better, but I'm not sure.
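For anyone wondering what similar[op] is doing under the hood, here is a minimal sketch in plain Python. It assumes the usual BKO simm metric (overlap of two non-negative superpositions, rescaled by the larger total), and the toy inverse-links-to data below is made up purely for illustration — it is not the wikipedia data used above.

```python
# Toy sketch of the similarity behind similar[op], assuming the BKO simm metric:
#   simm(f, g) = sum_i min(f_i, g_i) / max(sum_i f_i, sum_i g_i)
# for superpositions f, g with non-negative coefficients.

def simm(f, g):
    """Similarity of two {ket: coeff} superpositions, in [0, 1]."""
    total_f = sum(f.values())
    total_g = sum(g.values())
    if total_f == 0 or total_g == 0:
        return 0.0
    overlap = sum(min(f.get(k, 0), g.get(k, 0)) for k in set(f) | set(g))
    return overlap / max(total_f, total_g)

# Hypothetical inverse-links-to data: which pages link TO each page.
inverse_links_to = {
    "SQL":        {"MySQL": 1, "PostgreSQL": 1, "database": 1, "PHP": 1},
    "PostgreSQL": {"MySQL": 1, "database": 1, "SQL": 1},
    "The_Doors":  {"Jim_Morrison": 1, "Ray_Manzarek": 1},
}

print(simm(inverse_links_to["SQL"], inverse_links_to["PostgreSQL"]))  # 0.5
print(simm(inverse_links_to["SQL"], inverse_links_to["The_Doors"]))   # 0.0
```

So similar[inverse-links-to] |WP: SQL> just runs this against every page's inbound-link superposition and coeff-sorts the result (scaled to percent in the tables above).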

BTW, this is from a sample of only 30,000 pages, ie:
100*30000/15559125
= 0.19 %
of the total English wikipedia.

And the resulting sw file is here.