EEMLA.tex
@@ -26,6 +26,7 @@
%
\usepackage{fancyhdr}
\usepackage{float}
\usepackage{amsmath}
\pagestyle{fancy}
\lhead{}
\chead{}
@@ -143,6 +144,38 @@ This research aims to investigate the potential integration of rule-based tensor
\textbf{RQ:} How can deterministic rule-based tensor mutations be embedded within LLM architectures to enable more accurate and efficient mathematical operations?
\end{quote}
The significance of this line of inquiry lies in its potential to address a fundamental limitation of current generative AI systems such as ChatGPT and Anthropic's Claude. While specialized numeric compute systems exist (e.g., RAG backed by Wolfram Alpha), they operate independently of the SIMD, low-latency execution of the LLM itself, introducing sizable communication latency. This is especially pronounced in workflows that combine mathematical and linguistic reasoning. Embedding the required computation directly within the LLM could substantially reduce the resources needed for complex tasks that involve both natural language processing and mathematical reasoning.
This investigation focuses specifically on the following mathematical operations (a brief illustrative sketch follows the list):
\begin{itemize}
\item Basic arithmetic (addition, subtraction, multiplication, division)
\item Matrix Operations (multiplication, inversion, determinant)
\item Binary Operations (XOR, AND, NAND, left shift, right shift, OR, complement)
\item Array Operations (array sum, as well as the mean, median, mode, standard deviation, variance, and other single variable metrics of a data set)
\end{itemize}
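
To make the intended behavior concrete, the following is a minimal sketch of how these operations could be realized as deterministic, parameter-free tensor functions in PyTorch. The function names, the grouping into four helpers, and the dictionary return format are illustrative assumptions and do not describe the proposed module's actual interface.

\begin{verbatim}
# Illustrative sketch only: deterministic tensor-level versions of the
# target operations, written against PyTorch. Names and grouping are
# hypothetical, not part of the proposed architecture itself.
import torch

def basic_arithmetic(a: torch.Tensor, b: torch.Tensor) -> dict:
    # Elementwise arithmetic computed exactly, with no learned parameters.
    return {"add": a + b, "sub": a - b, "mul": a * b, "div": a / b}

def matrix_ops(m: torch.Tensor) -> dict:
    # Square-matrix operations from the itemized list.
    return {
        "matmul": m @ m,
        "inverse": torch.linalg.inv(m),
        "det": torch.linalg.det(m),
    }

def binary_ops(x: torch.Tensor, y: torch.Tensor) -> dict:
    # Bitwise operations on integer tensors.
    return {
        "xor": torch.bitwise_xor(x, y),
        "and": torch.bitwise_and(x, y),
        "nand": torch.bitwise_not(torch.bitwise_and(x, y)),
        "or": torch.bitwise_or(x, y),
        "complement": torch.bitwise_not(x),
        "lshift": torch.bitwise_left_shift(x, y),
        "rshift": torch.bitwise_right_shift(x, y),
    }

def array_ops(v: torch.Tensor) -> dict:
    # Single-variable descriptive statistics over a 1-D float tensor.
    return {
        "sum": v.sum(),
        "mean": v.mean(),
        "median": v.median(),
        "mode": v.mode().values,
        "std": v.std(),
        "var": v.var(),
    }
\end{verbatim}
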
Furthermore, as previously mentioned, the scope of the experiment is limited to implementing these operations within existing open-source LLM architectures of moderate scale (1--7 billion parameters), rather than developing entirely new architectures. This choice both eliminates sources of subject variability, which helps in ascertaining statistical significance, and takes advantage of readily available pretrained weights. Specifically, the target model for this paper is the Llama-3-3B model, chosen for its light weight and fully open-source nature.
{\raggedright \normalsize \textbf{Related Works}}
Prior research has explored various approaches to improving mathematical reasoning capabilities in LLMs, including specialized training on mathematical corpora \parencite[4]{ahn2024largelanguagemodelsmathematical}. Additional work has fine-tuned responses to mathematically demanding prompts using reinforcement learning \parencite[1]{cobbe2021trainingverifierssolvemath}. Others still have added secondary inference passes, or \textit{verifiers}, to judge the accuracy of model outputs that contain computations \parencite[2]{cobbe2021trainingverifierssolvemath}. The immediately evident disadvantage of these approaches is the need for extended training cycles and copious amounts of new corpora. Furthermore, the training corpora must closely resemble the test samples, since the strategies outlined above do not grant models a mechanical performance edge.
{\raggedright \normalsize \textbf{Methodology}}
$$
\begin{aligned}
&\text{Define a rule-based function } \mathcal{R} : \mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d} \text{ such that:} \\
&\mathcal{R}(\mathbf{X})_i =
\begin{cases}
\mathbf{X}_i + \mathbf{X}_{i+1}, & \text{if rule is ``sum with right neighbor''} \\
\det(\mathbf{X}_{i:i+2,\, j:j+2}), & \text{if rule is ``3$\times$3 determinant over submatrix''} \\
\mathbf{X}_i, & \text{otherwise}
\end{cases} \\
&\text{Then pass } \mathcal{R}(\mathbf{X}) \text{ into the modified attention layer: } \mathbf{Z} = \text{Attention}(\mathcal{R}(\mathbf{X}))
\end{aligned}
$$
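
As a concrete reading of this formulation, the following is a minimal sketch of how $\mathcal{R}$ might be interposed before a single attention layer of the target model using a forward pre-hook. The checkpoint identifier, layer index, choice of rule, and hook plumbing are assumptions made for illustration; the module path \texttt{model.model.layers[i].self\_attn} reflects the Hugging Face \texttt{transformers} layout for Llama-family models, not a finalized design.

\begin{verbatim}
# Illustrative sketch only: apply a rule-based mutation R to the hidden
# states entering one attention layer of a Llama-family model. The
# checkpoint name, layer index, and fixed rule are assumptions.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")

def rule_based_R(hidden: torch.Tensor) -> torch.Tensor:
    # "Sum with right neighbor" rule from the definition above, applied
    # along the sequence dimension (wraps at the boundary; boundary
    # handling is left unspecified in the formulation).
    return hidden + torch.roll(hidden, shifts=-1, dims=1)

def pre_attention_hook(module, args, kwargs):
    # With with_kwargs=True the hook sees both positional and keyword
    # inputs; hidden states may arrive either way depending on the
    # transformers version, so both cases are handled.
    if args:
        return (rule_based_R(args[0]),) + args[1:], kwargs
    kwargs["hidden_states"] = rule_based_R(kwargs["hidden_states"])
    return args, kwargs

# Z = Attention(R(X)) for the first decoder layer only.
model.model.layers[0].self_attn.register_forward_pre_hook(
    pre_attention_hook, with_kwargs=True
)
\end{verbatim}

In the full system, the hook would dispatch on the detected rule rather than applying a single fixed mutation, but the interception point remains the same.
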
%%%%Works cited
\newpage
\begin{center}