Application toward Roadmap¶
Theory and Fundamentals¶
Principles¶
Machine Learning (ML) is an algorithmic approach. An Artificial Neural Network (ANN) is a computer architecture which can be trained (basics of ML and AI.ipynb: Table 1) using data as input, instead of depending on explicit programming.
An ANN is said to Learn, $\mathcal{L}$, from experience $E$ with respect to some task $T$, according to a performance Measure $P$:
$\text{if } P(T)$ improves with $E$ $\Rightarrow$ $\mathcal{L}(E)$
$E$, Experience $\Rightarrow$ the acquisition of new Task Data, $E(T(Data)_{new})$
Currently,
$AI$ := ANN trained by ML to perform a task $T$
Where $T$ is some algorithmic process. Modern AI faces several practical limitations:

1. Semantics
$\quad\quad$Modern AI systems cannot currently handle ambiguity, subjectivity, and common-sense (cultural/social sense) oriented tasks, i.e. non-algorithmic tasks.
2. Power Consumption
Total energy consumption = energy for training + energy for inference (a worked numeric example follows this list):
$$E_{total} = ( P_{hardware}\cdot t_{train}) + ( P_{inference}\cdot t_{inference}\cdot N_{inference}) $$
$\quad$where,
$\quad\quad P_{hardware}$ = total power consumption of the hardware
$\quad\quad t_{train}$ = training time
$\quad\quad P_{inference}$ = power consumption per inference
$\quad\quad t_{inference}$ = the time it takes to perform a single inference
$\quad\quad N_{inference}$ = total number of inferences performed
3. Training Time
$\quad\quad$ For large AI models, training can last from days to months $\Rightarrow$ a very energy-expensive process.
4. Training Data and Storage
$\quad\quad$Large AI models require huge (terabytes to petabytes) amounts of highly specific and structured data, spanning as much of the experience $(E)$ space as possible.
$\quad\quad$The storage of this data is no easy task.
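As a worked example of the energy formula in point 2, here is a minimal Python sketch; every power, time, and count figure below is an illustrative assumption, not a measurement.

```python
# Sketch of E_total = (P_hardware * t_train) + (P_inference * t_inference * N_inference).
# All numbers are illustrative assumptions, not measured values.

def total_energy_joules(p_hardware_w: float, t_train_s: float,
                        p_inference_w: float, t_inference_s: float,
                        n_inferences: int) -> float:
    """Return E_total in joules: training energy plus lifetime inference energy."""
    e_train = p_hardware_w * t_train_s                      # P_hardware * t_train
    e_infer = p_inference_w * t_inference_s * n_inferences  # P_inference * t_inference * N_inference
    return e_train + e_infer

# Hypothetical figures: a 300 W accelerator training for 10 days,
# then serving 1e9 inferences at 50 W and 20 ms each.
e = total_energy_joules(300.0, 10 * 24 * 3600, 50.0, 0.020, 1_000_000_000)
print(f"E_total = {e:.3e} J = {e / 3.6e6:.1f} kWh")
```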
ANN Structure and Organization¶
An ANN is a simplified simulation (model) of the structure and function of the brain's neural networks. ANNs are implemented in software as organized arrangements of interconnected nodes (Cells) arranged in Layers. Fjodor van Veen (2016, 2017) from the Asimov Institute has compiled, from primary literature sources, diagrammatic representations of the structure and function of many common network and cell architectures. His work shows that there are only a few basic cell types used in constructing various neural network architectures.
Basic Cell: Perceptron/Feed-Forward Network¶
Figure 2 – Schematic representation of a Perceptron, equivalent to a single-node feed-forward ANN.
The fundamental unit in an ANN is the Basic Cell = Feed-Forward Network = Node = Perceptron (Figure 2). The basic cell aims to mimic the function of the biological neuron in the brain. Modern ANN architectures are considered fully-connected, although the introduction of layers $\rightarrow$ organization into modules and reduced connectivity $\rightarrow$ higher-order functions.
The connection between a pre-synaptic cell in the pre-synaptic layer and a post-synaptic cell in a post-synaptic layer is labelled by index notation:
$i^{th}$ Connection,
of $j^{th}$ Cell (post-synaptic),
in $L^{th}$ Layer (post-synaptic),
from $k^{th}$ Cell in $(L-1)$ Layer (pre-synaptic)
$S^{L}_{i,j}$, the synapse strength of connection $i$ to cell $j$ in layer $L$, is the product of the I/O value $(x^{L}_{i,j})$ into the $i^{th}$ connection to the $j^{th}$ cell in the $L^{th}$ layer AND the connection weight $(w^{L}_{i,j})$. The weight modulates the importance of the connection. When the ANN is initially configured, the default weight for each connection is often just a random Real number (a negative or positive number of any size, or zero). Unless it is imperative to specify a particular connection, the principles apply generally, so index notation is not shown explicitly, except for $i$, which designates a post-synaptic cell/connection: $$S_i = x_i \cdot w_i \quad\quad\quad (1)$$
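A one-line numeric illustration of Eq. (1), with made-up values for $x_i$ and $w_i$:

```python
# Eq. (1): synapse strength = I/O value * connection weight.
x_i = 0.8          # I/O value entering connection i (illustrative)
w_i = -0.25        # connection weight; often a random real number at initialization
S_i = x_i * w_i    # S_i = x_i * w_i = -0.2
print(S_i)
```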
Each cell integrates all I/O values into a weighted sum = the net input $(z^{L}_{j})$ of all connections to cell $j$ in layer $L$, with an associated bias $b^{L}_j$ for the cell, a constant similar in magnitude to the I/O values:
$$z_{j} = \sum_{i=1}^{n} S_{i} + b_j = \sum_{i=1}^{n} (x_{i}\cdot w_{i}) + b_{j} \quad\quad\quad (2)$$
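A minimal NumPy sketch of Eq. (2) for one cell with three incoming connections (all values are illustrative):

```python
import numpy as np

# Eq. (2): net input z_j = sum_i (x_i * w_i) + b_j for a single post-synaptic cell j.
x = np.array([0.5, -1.0, 0.8])   # I/O values from three pre-synaptic cells (illustrative)
w = np.array([0.2, 0.4, -0.1])   # connection weights, typically random at initialization
b_j = 0.05                       # bias for cell j
z_j = np.dot(x, w) + b_j         # weighted sum of synapse strengths plus bias
print(z_j)                       # -0.33
```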
The net input is translated into an output value by an Activation Function $(F)$. $F$ translates/maps the net input to the subsequent I/O value $y^{L+1}_{i,j}$ $\rightarrow$ the I/O value into a connection with a cell in the next layer of the network $\rightarrow$ input propagation
$$ F(z_j) = y^{L+1}_{i,j} \quad\quad\quad\quad (3)$$
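Putting Eqs. (1)–(3) together, here is a minimal forward pass through a single basic cell; the sigmoid used for $F$ is just one common choice of activation function, assumed here for illustration.

```python
import numpy as np

def perceptron_forward(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """One basic cell: Eqs. (1)-(3).

    S_i = x_i * w_i           (1) synapse strengths
    z   = sum_i S_i + b       (2) net input
    y   = F(z)                (3) activation -> I/O value propagated to the next layer
    """
    S = x * w                        # Eq. (1), elementwise
    z = S.sum() + b                  # Eq. (2)
    return 1.0 / (1.0 + np.exp(-z))  # Eq. (3), with F = sigmoid (one common choice)

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 0.8])   # inputs from the pre-synaptic layer (illustrative)
w = rng.standard_normal(3)       # weights initialized to random real numbers
print(perceptron_forward(x, w, b=0.05))
```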