1412 words god save me

This commit is contained in:
2025-04-29 20:55:20 -05:00
parent 0d0c092368
commit 2e285f9b6d
10 changed files with 607 additions and 355 deletions


@@ -7,6 +7,7 @@
\usepackage{times}
\geometry{top=1.0in, bottom=1.0in, left=1.0in, right=1.0in}
\usepackage[style=mla,backend=biber]{biblatex}
\usepackage{comment}
%
%Doublespacing
@@ -24,6 +25,7 @@
%Fancy-header package to modify header/page numbering (insert last name)
%
\usepackage{fancyhdr}
\usepackage{float}
\pagestyle{fancy}
\lhead{}
\chead{}
@@ -102,6 +104,44 @@ Benchmarks for evaluating Large Language Models (LLMs) assess their performance
Adding to the complexity of creating increasingly performant models are the computational and capital costs of building AI-capable supercomputers, clusters, and data centers to house training corpora, or CLM text databases. Improvements in model architecture are sought before attempts to increase the scale of models and their parameter counts because of the prohibitive scaling laws of neural networks. Experimentally, it has been found that computational cost in FLOPs grows rapidly with parameter count \parencite[2]{hoffmann2022trainingcomputeoptimallargelanguage}, while the corresponding gain in CLM accuracy slows as compute increases \parencite[5]{hoffmann2022trainingcomputeoptimallargelanguage}. Taken together, this implies that there is a point at which scaling a model purely to gain accuracy becomes unsustainable. The Chinchilla scaling law is an empirical conjecture which states that, for a given architecture and compute budget, increasing parameter count without a proportional increase in training data yields diminishing returns, with performance gains tending toward zero as the number of parameters tends to infinity. Although some teams claim to have statistically significant results disproving it, those results have not been reproduced by third parties.
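For concreteness, the following first-order approximation (included here purely as illustration, using a commonly cited relation rather than the cited work's own notation) connects training compute $C$ in FLOPs to parameter count $N$ and training-token count $D$, alongside the compute-optimal proportionality reported by \textcite{hoffmann2022trainingcomputeoptimallargelanguage}:
\begin{equation}
C \approx 6\,N\,D, \qquad N_{\mathrm{opt}} \propto C^{a}, \quad D_{\mathrm{opt}} \propto C^{b}, \quad a \approx b \approx 0.5 .
\end{equation}
Under this approximation, enlarging a compute-optimal model's parameter count without a matching increase in training data moves it off the optimal frontier, which is the sense in which scaling parameters alone yields diminishing returns.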
{\raggedright \normalsize \textbf{Problem Statement}}
\begin{table}[H]
\centering
\caption{Comparison of LLM Sizes and Their Computational Requirements}
\label{tab:model-sizes}
\begin{tabular}{|l|r|r|r|r|}
\hline
\textbf{Model Name} & \textbf{Parameters} & \textbf{Training Compute} & \textbf{Inference Time} & \textbf{Memory Usage} \\
& \textbf{(billions)} & \textbf{(PF-days)} & \textbf{(ms/token)} & \textbf{(GB)} \\
\hline
GPT-2 & 1.5 & 5.6 & 12 & 3 \\
\hline
GPT-3 & 175 & 3,640 & 75 & 350 \\
\hline
Llama-2-7B & 7 & 184 & 18 & 14 \\
\hline
Llama-2-13B & 13 & 368 & 32 & 26 \\
\hline
Llama-2-70B & 70 & 1,720 & 145 & 140 \\
\hline
Claude 2 & $\sim$100 & N/A & 82 & $\sim$200 \\
\hline
GPT-4 & $\sim$1,500 & $\sim$25,000 & 210 & $\sim$3,000 \\
\hline
\end{tabular}
\begin{flushleft}
\small{Note: Training compute is measured in petaflop-days. Inference time is measured for a single A100 GPU. Memory usage refers to the VRAM required during inference. Some values for proprietary models are estimated based on public information.}
\end{flushleft}
\end{table}
Despite the aforementioned significant advancements in LLMs and their steadily improving reasoning capabilities, as reflected in Table \ref{tab:model-sizes}, the mathematical capabilities of models have increased in a sub-linear fashion. This trend coincides with diminishing returns in performance as model sizes scale up, even within the same generation; for example, Table \ref{tab:model-sizes} lists the Llama-2 generation across three sizes. Current LLMs approach mathematical operations through pattern recognition learned from their training data rather than through formal algorithmic processing, resulting in inconsistent performance when handling numerical calculations beyond simple arithmetic \parencite[3]{hendrycks2021measuringmathematicalproblemsolving}.
This research aims to investigate the potential integration of rule-based tensor mutations within an existing LLM architecture as a mechanism to enable low-cost, rule-driven (as opposed to pattern-driven) mathematical computation. The intended outcomes of the experiment detailed below include an increase in mathematical accuracy and a decrease in inference time for prompts that require mathematical computation.
\begin{quote}
\textbf{RQ:} How can deterministic rule-based tensor mutations be embedded within LLM architectures to enable more accurate and efficient mathematical operations?
\end{quote}
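As a purely illustrative sketch of what such a mechanism might look like (all names here, such as \texttt{RuleBasedMathHook}, \texttt{detector}, and \texttt{encoder}, are hypothetical assumptions rather than an existing API), the following Python fragment shows a deterministic ``mutation'' hook that detects an arithmetic sub-task in the hidden-state pipeline and overwrites the pattern-matched representation with an exactly computed result:
\begin{verbatim}
# Hypothetical sketch of a rule-based "tensor mutation" layer.
# detector and encoder are assumed, injected components, not an existing API.
import torch

class RuleBasedMathHook(torch.nn.Module):
    def __init__(self, detector, encoder):
        super().__init__()
        self.detector = detector  # hidden states -> parsed expression or None
        self.encoder = encoder    # exact numeric result -> embedding vector

    def forward(self, hidden_states):
        expr = self.detector(hidden_states)        # e.g. ("add", 1234, 5678)
        if expr is None:
            return hidden_states                   # no arithmetic detected
        op, a, b = expr
        result = {"add": a + b, "mul": a * b}[op]  # deterministic rule, not learned
        # Replace the final token position with an exact encoding of the result.
        mutated = hidden_states.clone()
        mutated[:, -1, :] = self.encoder(result)
        return mutated
\end{verbatim}
Any practical realization would depend on how arithmetic sub-tasks are detected in the hidden states and how exact results are re-encoded into the model's embedding space; the sketch above only fixes the overall shape of the mechanism under investigation.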
%%%%Works cited
\newpage