Thursday 4 February 2016

introducing the if-then machine

I think this thing is going to be seriously interesting! Let me define it, and then I will try to explain it:
seq |node 1: 1> => sp1-1
seq |node 1: 2> => sp1-2
seq |node 1: 3> => sp1-3
...
seq |node 1: n> => sp1-n

then |node 1: *> => sp2
next (*) #=> then drop-below[t] similar-input[seq] |_self>
where {sp1-k} and sp2 are all superpositions. And we call this collection of rules the if-then machine.

This has the useful and cool property that if any one of the sp1-k superpositions matches close enough to the input (we can call this a pooling of the sp1-k superpositions), then it spits out sp2 as a kind of prediction. t is a parameter that determines how close that match has to be for there to be any output. If the input is not close enough, then "next input-sp" just returns the empty ket |>. In the world of mathematics where we want things to be black and white and true and false, we might want t = 0.99 or something. If we want to be generous and tolerant, then maybe t = 0.6 (ie, 60% match is good enough). Or we could make it dynamic. If this particular if-then machine "fires too much" then increase t. If it "fires too little" then decrease it. The exact details of that is something we can work out later.

BTW, I don't think I have defined similar-input[op] just yet. It is very close to its brother similar[op] ket (see eg here). Indeed, they are only a couple of lines of python apart.
similar[some-op] |x>
returns a list of kets that have superpositions defined with respect to some-op, sorted by their similarity to the superposition defined by "some-op |x>".

In contrast:
similar-input[some-op] some-sp
returns a list of kets that have superpositions defined with respect to some-op, sorted by their similarity to "some-sp".

In context with our if-then machine defined above, this means "seq" is the operator that determines which "layer" of if-then machines to match our input sp with. If seq is not defined for a series of nodes, then similar-input[seq] won't compare the input against those nodes. Say seq2 is instead defined for those nodes, then similar-input[seq2] will compare the input against those. To see what nodes are in a particular "layer", in the console type "rel-kets[seq]" or "rel-kets[seq2]".

Next. In general there is no restriction on the superpositions sp1-k. But sometimes it is useful for them to be somewhat "orthogonal". Though I don't mean that in a strict maths sense of vector dot product equals zero, or similar. I mean in the sense that if one of sp1-k matches input-sp, then ideally we don't want any of the others to do so. Because if they did, we would get a double counting effect.

Let me try to show this double counting effect. Say "node 1: 3" matches input-sp with 80% and "node 1: 17" matches with 92% (and the rest barely at all), then we would have:
drop-below[0.75] similar-input[seq] input-sp
= 0.92|node 1: 17> + 0.8|node 1: 3>

then drop-below[0.75] similar-input[seq] input-sp
= then (0.92|node 1: 17> + 0.8|node 1: 3>)
= 0.92 then |node 1: 17> + 0.8 then |node 1: 3>
= 0.92 sp2 + 0.8 sp2
= 1.72 sp2
And note the coeff of sp2 is greater than 1. So what can we do about it? One possibility, and probably the one I will go for, is average-categorize. Though probably a tweak of that, but it is close to what I want. Basically, if the superpositions are similar enough with respect to simm(), add them, if not, then keep them distinct. But I should note that sometimes we do want this double counting, or triple counting, or whatever, depending on how many matches there are above the t threshold.

I guess now I should give an example. How about a simple 3 if-then machine example:
----------------------------------------
|context> => |context: if-then machine>

seq |node 1: 1> => |X>
seq |node 1: 2> => |Y>
seq |node 1: 3> => |X> + |Y>
then |node 1: *> => |X or Y>

seq |node 2: 1> => |X> + |Y>
then |node 2: *> => |X and Y>

seq |node 3: 1> => |X> + |Y> + |Z>
then |node 3: *> => |X and Y and Z>
----------------------------------------
Now, test it's properties. First, without the "then" operator, and the drop-below[t] operator:
sa: similar-input[seq] |Y>
|node 1: 2> + 0.5|node 1: 3> + 0.5|node 2: 1> + 0.333|node 3: 1>

sa: similar-input[seq] (|X> + |Y>)
|node 1: 3> + |node 2: 1> + 0.667|node 3: 1> + 0.5|node 1: 1> + 0.5|node 1: 2>

sa: similar-input[seq] (|X> + |Y> + |Z>)
|node 3: 1> + 0.667|node 1: 3> + 0.667|node 2: 1> + 0.333|node 1: 1> + 0.333|node 1: 2>
Set t = 0.8 and repeat:
sa: drop-below[0.8] similar-input[seq] |Y>
|node 1: 2>

sa: drop-below[0.8] similar-input[seq] (|X> + |Y>)
|node 1: 3> + |node 2: 1>

sa: drop-below[0.8] similar-input[seq] (|X> + |Y> + |Z>)
|node 3: 1>
Finally, apply the "then" operator:
sa: then drop-below[0.8] similar-input[seq] |X>
|X or Y>

sa: then drop-below[0.8] similar-input[seq] |Y>
|X or Y>

sa: then drop-below[0.8] similar-input[seq] |Z>
|>

sa: then drop-below[0.8] similar-input[seq] (|X> + |Y>)
|X or Y> + |X and Y>

sa: then drop-below[0.8] similar-input[seq] (|X> + |Y> + |Z>)
|X and Y and Z>
Now, once more, this time using t = 0.6
sa: then drop-below[0.6] similar-input[seq] |X>
|X or Y>

sa: then drop-below[0.6] similar-input[seq] |Y>
|X or Y>

sa: then drop-below[0.6] similar-input[seq] |Z>
|>

sa: then drop-below[0.6] similar-input[seq] (|X> + |Y>)
|X or Y> + |X and Y> + 0.667|X and Y and Z>

sa: then drop-below[0.6] similar-input[seq] (|X> + |Y> + |Z>)
|X and Y and Z> + 0.667|X or Y> + 0.667|X and Y>
OK. That is a pretty simple example! But in practice this thing is very general and very powerful. Mostly because superpositions can be almost anything.

Now, on to the really interesting bit. I propose that the if-then machine roughly corresponds to a single neuron. It certainly has a lot more power than a support vector machine (for example, the input does not need to be linearly separable), which also makes that claim. The if-then machine is itself very powerful, which means an entire brain of them would be unimaginably powerful.

Some notes:
1) Jeff Hawkins and the HTM (hierarchical temporal memory) guys claim SDR's (sparse distributed representations) are the brains data-type. Superpositions are more general than SDR's since they can have coeffs other than {0,1}, so I in turn claim superpositions are the brains data-type.
2) we can tweak our if-then machine in various ways. We could put in sigmoids, or an inhibition layer so that nearby neurons inhibit their neighbours and so on. Plus probably other tweaks.
3) the current parser does not yet handle learning superpositions rules. eg, in this case: "next (*) #=> ...". Hence I had to type the full "then drop-below[0.8] similar-input[seq]". If we could handle them our examples would instead look like this:
sa: next (*) #=> then drop-below[0.8] similar-input[seq] |_self>
sa: next |X>
sa: next |Y>
sa: next |Z>
sa: next (|X> + |Y>)
sa: next (|X> + |Y> + |Z>)
4) the above if-then machine used a static learn rule: "then |node 1: *> => ...". We could instead use some dynamic rule: "then |node 1: *> #=> some-action". NB: the symbol for a stored-rule "#=>".
5) our if-then machine is very close to my claimed "general supervised pattern recognition algo".
6) we should be able to use the if-then machine to learn sequences. I need to look into that, but I'm pretty sure of it. And to find the k'th element in the learned sequence we would do something like: "next^k SP". Something like this I suppose:
seq |node 1: 1> => sp1
then |node 1: *> => sp2

seq |node 2: 1> => sp2
then |node 2: *> => sp3

seq |node 3: 1> => sp3
then |node 3: *> => sp4

seq |node 4: 1> => sp4
then |node 4: *> => sp5

seq |node 5: 1> => sp5
then |node 5: *> => sp6
7) we could then pool a sequence into a single output:
seq2 |node P: 0> => SP
seq2 |node P: 1> => next SP
seq2 |node P: 2> => next^2 SP
seq2 |node P: 3> => next^3 SP
...
seq2 |node P: n> => next^n SP
then |node P: *> => |the-SP-sequence>
8) we need some mechanism to find the desired superpositions.
9) if not obvious, I should mention the if-then machine is not restricted to Boolean values. It can be considered a generalization, though if you set t close enough to 1, then it emulates a Boolean world.

Update: a bigger slightly more interesting example:
-- define our context:
sa: context bigger if-then machine

-- define our object -> superposition operator:
sa: ngrams |*> #=> letter-ngrams[1,2,3] |_self>

-- define our 2 if-then machines:
sa: seq |node 1> => ngrams |the cat sat on the mat>
sa: then |node 1> => |the cat sat on the mat>
sa: seq |node 2> => ngrams |the man on the moon>
sa: then |node 2> => |the man on the moon>

-- see what we have:
sa: dump
----------------------------------------
|context> => |context: bigger if-then machine>

ngrams |*> #=> letter-ngrams[1,2,3] |_self>

seq |node 1> => 5|t> + 2|h> + 2|e> + 5| > + |c> + 3|a> + |s> + |o> + |n> + |m> + 2|th> + 2|he> + 2|e > + | c> + |ca> + 3|at> + 2|t > + | s> + |sa> + | o> + |on> + |n > + | t> + | m> + |ma> + 2|the> + 2|he > + |e c> + | ca> + |cat> + 2|at > + |t s> + | sa> + |sat> + |t o> + | on> + |on > + |n t> + | th> + |e m> + | ma> + |mat>
then |node 1> => |the cat sat on the mat>

seq |node 2> => 2|t> + 2|h> + 2|e> + 4| > + 2|m> + |a> + 3|n> + 3|o> + 2|th> + 2|he> + 2|e > + 2| m> + |ma> + |an> + 2|n > + | o> + 2|on> + | t> + |mo> + |oo> + 2|the> + 2|he > + 2|e m> + | ma> + |man> + |an > + |n o> + | on> + |on > + |n t> + | th> + | mo> + |moo> + |oon>
then |node 2> => |the man on the moon>
----------------------------------------

-- now put it to use (we don't have drop-below[t], so effectively t = 0)
-- though first define a short-cut for our if-then machine:
sa: guess-phrase |*> #=> then similar-input[seq] ngrams |_self>

sa: guess-phrase |the >
0.381|the cat sat on the mat> + 0.37|the man on the moon>

sa: guess-phrase |the cat>
0.548|the cat sat on the mat> + 0.37|the man on the moon>

sa: guess-phrase |the cat sat>
0.717|the cat sat on the mat> + 0.356|the man on the moon>

sa: guess-phrase |the cat sat on the>
0.862|the cat sat on the mat> + 0.544|the man on the moon>

-- NB: I deliberately chose "the cat sat on the moon" to lift up the probability of "the man on the moon":
sa: guess-phrase |the cat sat on the moon>
0.866|the cat sat on the mat> + 0.675|the man on the moon>

sa: guess-phrase |the man on the>
0.805|the man on the moon> + 0.602|the cat sat on the mat>

-- now with the exact phrases:
sa: guess-phrase |the cat sat on the mat>
1.0|the cat sat on the mat> + 0.59|the man on the moon>

sa: guess-phrase |the man on the moon>
1.0|the man on the moon> + 0.59|the cat sat on the mat>
So, you have to read the coeffs above carefully, but if you do, you will see it works really quite well. Much better than a Boolean system of logic.

No comments:

Post a Comment