Exercise "Artificial Neural Networks"

(6 points)
In this exercise we construct a simple artificial neural network that will be trained to interpolate tabulated functions.
It is an ordinary three-layer neural network with one neuron in the input layer, several neurons in the hidden layer, and one neuron in the output layer:
```
                           [hidden neuron]
                          /               \
                         /                 \
                        /                   \
x  --->[identity neuron]---[hidden neuron]---[summation neuron]--->  y=F(x)
                        \                   /  
                         \                 /  
                          \               /  
                           [hidden neuron]
```
The input neuron is an identity neuron: it simply sends the input, a real number x, to all hidden neurons.
The output neuron is a summation neuron: it sums the outputs of the hidden neurons and sends the result to the output.
The hidden neurons are ordinary neurons: neuron number i transforms its input signal, x, into its output signal, y, as
y=f((x-a_i)/b_i)*w_i,
where f is the activation function (the same for all hidden neurons) and where a_i, b_i, w_i are the parameters of the neuron number i.
The activation function can be
- a Gaussian wavelet, f(x)=x×exp(-x²),
- a Gaussian, f(x)=exp(-x²),
- a wavelet, f(x)=cos(5x)×exp(-x²),
or any other suitable function.
The whole network then functions as one big non-linear multi-parameter function y=F_p(x), where p={a_i,b_i,w_i}_i=1..n is the set of parameters of the network.
Given the tabulated function, {x_k,y_k}_k=1..N, the training of the network consists of tuning its parameters to minimize the deviation
δ(p)=∑_k=1..N(F_p(x_k)-y_k)²,
that is, training amounts to minimizing δ(p) in the space of the network parameters. This minimization can be done with your own quasi-Newton minimization routine or with a minimization routine from GSL.
A C-structure to hold this network could be in the form
typedef struct {int n; double (*f)(double); gsl_vector* data;} ann;
where n is the number of neurons in the hidden layer, f is the pointer to the activation function, and gsl_vector* data keeps the parameters {a_i, b_i, w_i}_i=1..n.
You should implement the following functions:
```
ann* ann_alloc(int number_of_hidden_neurons, double(*activation_function)(double));
void ann_free(ann* network);
double ann_feed_forward(ann* network, double x);
void ann_train(ann* network, gsl_vector* xlist, gsl_vector* ylist);
```
(3 points) Modify the previous exercise such that the network, after training, can also approximate the derivative and the anti-derivative of the tabulated function.
(1 point) Implement an artificial neural network that can be trained to approximate a solution to the differential equation
Φ[y(x)]≡Φ(y'',y',y,x)=0,
(where Φ is generally a non-linear function of its arguments) on an interval [a,b] with boundary conditions at a given point c,
y(c)=y_c, y'(c)=y'_c,
where c∈[a,b] and y_c and y'_c are given numbers.
The cost function to minimize might be
δ(p)=∫_a^b|Φ[F_p,x]|dx +|F_p(c)-y_c|(b-a) +|F_p'(c)-y'_c|(b-a),
or
δ(p)=∫_a^b|Φ[F_p,x]|²dx +|F_p(c)-y_c|²(b-a) +|F_p'(c)-y'_c|²(b-a).
(alternative 1 point exercise) Build a network to recognize the 3x5 font digits
```
  x  xxx  xxx  x x  xxx  xxx  xxx  xxx  xxx  xxx
 xx    x    x  x x  x    x      x  x x  x x  x x
  x  xxx  xxx  xxx  xxx  xxx    x  xxx  xxx  x x
  x  x      x    x    x  x x    x  x x    x  x x
  x  xxx  xxx    x  xxx  xxx    x  xxx  xxx  xxx
```
displayed on a dot-matrix display of a larger size, say, 5x7.
There are then 5x7=35 input neurons, several hidden neurons, and 10 output neurons. Each input neuron gets the brightness of one of the pixels of the display. Each output neuron outputs the probability that the display shows the corresponding digit.
The training data might consist of a set of 3x5 digits positioned at different locations on a 5x7 matrix display, like
```
.....  .....  .x...
.....  ..xxx  xx...
xxx..  ....x  .x...
..x..  ..xxx  .x...
xxx..  ....x  .x...
..x..  ..xxx  .....
xxx..  .....  .....
```
The application data might then be a set of the 3x5 digits positioned randomly on a 5x7 display where one of the pixels of the display is randomly distorted.
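Generating such training and application samples could be sketched as follows. The sketch assumes that the display is stored row by row as 35 brightness values (1.0 for a lit pixel, 0.0 for a dark one) and that a glyph is given as five strings of three characters with 'x' marking a lit pixel. The names place_glyph, flip_pixel and DIGIT_TWO are hypothetical.

```c
#define GLYPH_W 3
#define GLYPH_H 5
#define DISP_W 5
#define DISP_H 7

/* The digit 2 in the 3x5 font, one string per row ('x' = lit pixel) */
const char* const DIGIT_TWO[GLYPH_H] = { "xxx", "  x", "xxx", "x  ", "xxx" };

/* Draw a 3x5 glyph on an otherwise dark 5x7 display with its top-left
   corner at column dx, row dy. Pixels are stored row by row.
   Returns 0 on success, -1 if the glyph would not fit on the display. */
int place_glyph(const char* const glyph[GLYPH_H], int dx, int dy,
                double pixels[DISP_W * DISP_H]) {
    if (dx < 0 || dy < 0 || dx + GLYPH_W > DISP_W || dy + GLYPH_H > DISP_H)
        return -1;
    for (int i = 0; i < DISP_W * DISP_H; i++) pixels[i] = 0.0;
    for (int r = 0; r < GLYPH_H; r++)
        for (int c = 0; c < GLYPH_W; c++)
            if (glyph[r][c] == 'x')
                pixels[(dy + r) * DISP_W + (dx + c)] = 1.0;
    return 0;
}

/* Distort one pixel by inverting its brightness */
void flip_pixel(double pixels[DISP_W * DISP_H], int index) {
    pixels[index] = 1.0 - pixels[index];
}
```

With dx=0 and dy=2 the digit 2 lands in the lower-left corner, as in the first training example above. There are 3x3=9 valid positions per digit, and a randomly chosen index passed to flip_pixel would produce a distorted application sample.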