why did i ever choose to do IB
@@ -175,6 +175,12 @@ $$
\end{aligned}
$$
The above example formulates a fixed-index mutator function coupled with a self-attention layer that filters noise from higher-confidence inputs: specific transformations are applied at specific indices of the input matrix. Here, $\mathbf{X}\in\mathbb{R}^{n\times d}$ is the input tensor for a given layer and $d$ is the embedding dimensionality. The function $\mathcal{R}$ applies specific, discontinuous logic based on patterns identified at particular indices of the input. The mutated embedding is then processed through the standard attention mechanism to produce the output representation $\mathbf{Z}$, which is concatenated with the parallel layer(s) of the original network, expanding the network's operational capacity without compromising the original token-based throughput.
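As a rough illustration of this pipeline (not the paper's implementation), the following NumPy sketch applies hand-picked transformations at two fixed row indices of $\mathbf{X}$ and then runs the mutated matrix through a single scaled dot-product attention head to obtain $\mathbf{Z}$; the function names, the choice of rules, and the index-to-rule assignment are all illustrative assumptions.

\begin{verbatim}
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def fixed_index_mutator(X, rules):
    # Apply each rule (a function on a single row) at its fixed index.
    X = X.copy()
    for idx, rule in rules.items():
        X[idx] = rule(X[idx])
    return X

def self_attention(X, Wq, Wk, Wv):
    # Standard single-head scaled dot-product self-attention.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V

n, d = 8, 16
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))

# Hypothetical rules assigned to fixed indices of the input matrix.
rules = {0: lambda r: np.zeros_like(r),        # suppress row 0
         3: lambda r: np.clip(r, -1.0, 1.0)}   # bound row 3

Wq, Wk, Wv = [rng.normal(size=(d, d)) for _ in range(3)]
Z = self_attention(fixed_index_mutator(X, rules), Wq, Wk, Wv)
print(Z.shape)  # (8, 16); Z is then concatenated with the parallel layers
\end{verbatim}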
{\raggedright \normalsize \textit{Location-Based Rule Selection}}
A critical aspect of the methodology is the mechanism that determines which rule applies at each position within the input matrix $\mathbf{X}$. Rather than relying on stochastic selection, this approach implements a deterministic, location-based rule-selection strategy that leverages the contextual information encoded in the model's representations. A major advantage of this fixed-index approach is that it minimizes dynamic surfaces in the model's cost function, reducing noise in the output and reducing the amount of training required, since overfitting is a non-issue in the absence of randomness.
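A minimal sketch of such a deterministic selector is shown below, again with purely hypothetical names: the mapping from positions to rules is a fixed lookup table (the text does not specify the concrete scheme), so the rule applied at a position is a pure function of that position and involves no sampling.

\begin{verbatim}
import numpy as np

# Hypothetical rule table: each listed position maps to exactly one rule;
# unlisted positions fall back to the identity rule, so selection is
# deterministic and reproducible across runs.
RULE_TABLE = {0: "suppress", 3: "bound", 5: "renorm"}

RULES = {
    "identity": lambda r: r,
    "suppress": lambda r: np.zeros_like(r),
    "bound":    lambda r: np.clip(r, -1.0, 1.0),
    "renorm":   lambda r: r / (np.linalg.norm(r) + 1e-8),
}

def select_rule(position):
    # Location-based selection: the rule depends only on the index.
    return RULES[RULE_TABLE.get(position, "identity")]

def apply_rules(X):
    return np.stack([select_rule(i)(row) for i, row in enumerate(X)])

X = np.random.default_rng(1).normal(size=(8, 16))
assert np.allclose(apply_rules(X), apply_rules(X))  # no stochasticity
\end{verbatim}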
%%%%Works cited
\newpage