Sunday, 17 April 2016

Meta-Heuristics and Universal Power Law-Based Signalling Algorithm

Introduction - Defining the criteria of heuristic and meta-heuristic behaviour:

Heuristic algorithms typically intend to find a good solution to an optimization problem by ‘trial-and-error’ in a reasonable amount of computing time. Here ‘heuristic’ means to ‘find’ or ‘search’ by trials and tallying the hits and misses. There is no guarantee to find the best or optimal solution, though it might be a better or improved solution than an educated guess.

Any reasonably good solution, often suboptimal or near optimal, would be good enough for such problems. Broadly speaking, local search methods are heuristic methods because their parameter search is focused on the local variations, and the optimal or best solution can be well outside this local region. However, a high-quality feasible solution in the local region of interest is usually accepted as a good solution in many optimization problems in practice if time is the major constraint.

Metaheuristic algorithms are higher-level heuristic algorithms. Here, ‘meta-’ means ‘higher-level’ or ‘beyond’, so metaheuristic means literally to find the solution, based on a "endowment", often based on a power law, that creates the appearance of learned behavior. Moreover, like heuristic algorithms, metaheuristic algorithms also include randomization, trial-and-error, process elements aswell. The endowment and randomization behaviour together creates behaviour in the metaheuristic which is greater than the sum of the individual parts, making for non-programmed emergent behaviours. 

Broadly speaking, metaheuristics are considered as higher-level techniques or strategies which intend to combine lower-level techniques and tactics for exploration and exploitation of the huge space for parameter search.

This kind of search can be represented in this way, for example you imagine that you have a particle which travels along a search space landscape function f(x) which contain a shallow hole and a deep hole. 

The deep holes can represent the search space global optimum (i.e. best solution possible for f(x)) and the shallow hole is an approximations of the solution, i.e. a varying local optimum solution of f(x).

The objective of the search procedure across this space should therefore be to get to the particle to the bottom of the global optimum and furthermore stay there. 

If you were to simply exploit an instruction to move a particular direction downhill, depending on where the exploited procedure assumes the best direction lies to obtain the best solution, then it is likely that the particle will fall into a local optima rather than a global optima and will furthermore stay there as the procedure would be designed to find the global optimum by travelling downhill and therefore would not be able to go uphill.

However if you introduce randomization - which could be noise, heat i.e. if you introduced "simulated annealing" so that there is a random motion superimposed on the exploited procedure searching for a hole to go down then with the "right" amount of randomness or simulated annealing, then you can effectively make sure the particle can escape the shallow local optima but cannot escape the deep global optima. 

The "right" amount of randomness introduced to the exploited procedure is determined by some kind of predetermined or learned tuning of randomness introduced to the system so that this global optimization can happen if the search is allowed to run over a long enough period of time. Running the metaheuristic procedure long enough then will allow a global optimum to eventually be found. This is in essence what a metaheuristic algorithm is all about. 

Meta-Heuristics and Universal Power Law-Based Signalling Algorithm:

The most basic meta-heuristic signalling algorithm we can construct is one based on some sort of power law for signalling.

Light signalling can of course be represented by a power law. Light signal transmission, like light in general, obeys the inverse-square law in its propagation through space. That is to say, the intensity of the light, I, is inversely proportional to the square of the distance from the light source.

Therefore the light signal gets weaker and weaker as the distance increases.

The meta-heuristic algorithm therefore must contain a function which monotonically decreases its signalling power between each transceiver node with distance, r, under a discrete light absorption coefficient for the physical medium g.

Under a given physical medium, the Gaussian form of the light intensity is then determined by:

The universal power law signalling function of is then written to be monotonically decreasing:

Where r is the distance between the different nodes.

m is the power law parameter for the signal, for light signals it is m=2.(for signals that use audio, chemical, etc, power laws it will be different)
Where β0 is the emitted signal at r=0

The distance between any two nodes, i and j at Xi and Xj respectively, is the Cartesian distance as follows:

Where Xi,j is the (k)th component of the spatial coordinate Xi of the (i)th node.

Where d is the number of dimensions. 

The time evolution of each individual node, i, over its signalling cycle is the governed by the fact each typical node, i, is coupled to the most intense, i.e. brightest, node it sees, j, by the following equation:

The second term is the signalling term, for most cases in implementation  β0 = 1. This is the physical endowment, the power law, exploitation process in the metaheuristic.

The third term is the term that defines the randomization of the possible propagation path’s taken which uses a randomization parameter, a, which for most cases in implementation is distributed in the domain of [0,1]. This is the heuristic, "trial and error", exploration process in the metaheuristic.

Interestingly, we can write this equation symmetrically as:

Which when presented in terms of node distance reads:

We can then define the signalling Activation Function, A:

Which reduces our equation to a more fundamental signal field representation:

Based on these formulae, we have the corresponding rules:

·         Each node in the field will emit and absorb discrete light signals equally.

·         The signals are proportional to the intensity of the light emitted, which both decrease in proportion to increasing distance between the nodes.

·         For any 2 flashing nodes in the field, the signal absorbed with the highest intensity will induce the strongest coupling

·         If there is no signal detected with higher intensity than any one particular node, it will designate itself as being isolated and signal randomly

·         The light insanity of any signalling node is determined by the landscape of the field, itself determined by the nature of the activation function itself.

Observing Metaheuristic Exploration VS Exploitation in a Search Space:

From the rules derived from the physical parameters used to construct the pre-programmed signalling mechanism we can define that the metaheuristic nature of the signalling algorithm is based the behaviours of exploration and exploitation

Exploration in the metaheuristic algorithm is achieved, in the case of the above signalling algorithm, by the use of randomization which enables the algorithm to have the ability to jump out of any local state of coupling between 2 or more  signalling nodes so as to explore the search for more couplings in the network over a potentially larger area.

Randomization is also used for local search around the current state if the signalling are limited to a local region. When the signalling frequency is long, randomization then allows for exploration of the the search space on a larger scale. Fine-tuning the right amount of randomness and balancing local search and global search are crucially important in controlling the performance of the synchronizing metaheuristic algorithm.

Exploitation is the use of local knowledge of the search and solutions found so far so that new search moves can concentrate on the local regions or neighbourhood where the optimality may be close; however, this local optimum may not be the global optimality. Exploitation tends to use strong local information such as gradients, the shape of the mode such as convexity, and the history of the search process.

Observations, using simulations, of the convergence behaviour of common optimisation algorithms suggests that exploitation tends to increase the speed of convergence, while exploration tends to decrease the convergence rate of the algorithm. On the other hand, too much exploration increases the probability of finding the global optimum, while strong exploitation tends to make the algorithm being trapped in a local optimum.

The relationship between exploration and exploitation both can be seen with a simple implementation of the metaheuristic derived above in the context of sensory exploration and contrastive learning exploitation based on the activation signalling function

The sample code is implemented as:

    for k=1:N  % Generate sensed signals, given that the initial values are instincts to exploit
        z(k,1)=sigmaz2(k,1)*(0.5-randn); % Generate internal signalling system based on activation function A(r)

        y(k,1)=x(k,1)+z(k,1); % Generate input – i.e. an external signal that is sensed - exploration
        Y=[y(k,1); Y(1:(n-1),1)]; % Learn by contrast – i.e. shift regression vector and load in new value - exploitation (or endowment)
        xhat(k,1)=mean(Y); %Get the mean of the contrastive learning procedure

We plot the exploration and exploitation convergence plots for 100, 1000 and 10,000 iterations:

100 iterations:

1,000 iterations:

10,000 iterations:

As seen and discussed, there is a fine balance between the right amount of exploration and the right degree of exploitation. Despite its importance, there has never been any known practical guideline, apart from generic tuning, for this balance as regards to learning behaviour.

It does not seem to be practical, in any sense, to find a guideline based on the balances between exploration and exploitation by simply reading convergence plots alone, which are simply data, which is at least partially recalcitrant as the overall behaviour of the system that follows the algorithm has not itself been pre-programmed but nor is the system based entirely on probability.

Therefore, it makes more sense to think about the behaviour and data we empirically observe using an abstract model of a physical system that reduces the behaviour, otherwise taken for granted, to a simple signalling action principle based on the environmental parameters of the system. We will be exploring the consequences of this further and search for different ways of understanding metaheuristics in the realm of information theory, in particular in the behaviour of systems which do not occur from prior programming.

Next Part:  Integrating the Concept of Meta-heuristics to Neural Networks