2512 words, along with related works section
@@ -21,7 +21,7 @@
\IEEEauthorblockA{\textit{dept. name of organization (of Aff.)} \\
\textit{name of organization (of Aff.)}\\
City, Country \\
email address or ORCID}
krishna@ayyalasomayajula.net}
}
\maketitle
@@ -48,6 +48,38 @@ The proposed system modifies the Llama 3 3B model, an open-weight transformer, t
This work contributes to the broader discourse on integrating symbolic computation into neural architectures. Prior efforts in neural-symbolic computing have explored symbolic regression, logic programming over neural graphs, and reinforcement learning for tool use \cite{wang2024neuralsymbolicoverview}. Unlike these approaches, our method does not require training the model to learn mathematical operations; instead, it injects them at runtime, inside the forward pass. This design avoids training overhead entirely while preserving inference-time efficiency.
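
As a minimal sketch of this runtime-injection idea (the hook body and the rule it applies are illustrative assumptions, not the exact implementation), a deterministic operation can be attached to one MLP sub-block with a PyTorch forward hook:

\begin{verbatim}
import torch

def symbolic_hook(module, inputs, output):
    # Illustrative rule: write the exact sum of two reserved channels
    # into a third, bypassing the learned activation for that slot.
    patched = output.clone()
    patched[..., 2] = output[..., 0] + output[..., 1]
    return patched

# Attach to one MLP sub-block of a loaded Llama 3 model
# (attribute path follows the Hugging Face port):
# model.model.layers[10].mlp.register_forward_hook(symbolic_hook)
\end{verbatim}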
\section{Related Work}
Mathematical reasoning in artificial intelligence is broadly categorized into two complementary paradigms: \textit{symbolic computation} and \textit{statistical pattern learning}. Symbolic computation refers to the manipulation of mathematical objects using discrete logic, such as arithmetic operations, algebraic simplifications, or equation solving. These processes are deterministic, meaning that given the same inputs, they yield the same outputs independent of statistical variation. In contrast, statistical pattern learning, as embodied by neural networks, involves learning probabilistic relationships between tokens or symbols through exposure to large datasets. While statistical learning captures distributional patterns across language, it does not inherently encode the rules of mathematics that govern the manipulation of numbers and expressions.
Historically, symbolic artificial intelligence systems such as theorem provers, expert systems, and computer algebra systems (e.g., Mathematica, SymPy) have excelled at mathematical reasoning due to their reliance on explicit rule sets and logic engines. These systems require handcrafted rules but offer precise, explainable solutions. Neural networks, including modern large language models, learn representations of symbols as continuous vectors in high-dimensional spaces, enabling them to generate coherent text and recognize syntactic patterns. However, without explicit rules or external reasoning engines, their mathematical capabilities remain fragile and reliant on memorized patterns rather than systematic reasoning. Bridging the gap between these paradigms has become a critical area of research in neural-symbolic computing.
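
To make the contrast concrete, the short example below (ours, for illustration) uses SymPy: given the same input, the engine applies rewrite rules and returns the same exact result every time, with no statistical component:

\begin{verbatim}
import sympy as sp

x = sp.symbols("x")
print(sp.simplify((x**2 - 1) / (x - 1)))  # x + 1, by rewrite rules
print(sp.solve(sp.Eq(3 * x + 5, 20), x))  # [5], exact and repeatable
\end{verbatim}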
Efforts to improve mathematical competence in language models generally fall into three categories. The first is \textit{data-centric approaches}, where models are fine-tuned on curated datasets containing mathematical problems, equation patterns, and arithmetic exercises. While this improves recall of memorized problem structures, it does not enable novel symbolic manipulation. The second is \textit{tool-augmented inference}, where models are coupled with external symbolic engines like Wolfram Alpha or SymPy at runtime. These tools enable accurate computation but introduce latency, architectural complexity, and reliance on external dependencies. The third is \textit{architectural modification}, where symbolic components are embedded directly into the model’s computational graph. This approach aims to enable the model to compute symbolically during inference, preserving end-to-end differentiability and eliminating external dependencies.
Several conventions have emerged in the study of neural mathematical reasoning. Researchers distinguish between \textit{in-context learning} of symbolic patterns (where a model memorizes examples during pretraining), \textit{emergent reasoning} (where generalization arises without explicit training on mathematical tasks), and \textit{symbolic execution}, where operations follow deterministic pathways independent of model weights. Additionally, evaluations often distinguish between \textit{single-step} arithmetic, such as evaluating ``3 + 5,'' and \textit{multi-step} problems, such as solving algebraic expressions or nested equations. Performance on benchmarks like MATH~\cite{hendrycksmath2021} and GSM8K has revealed that while LLMs handle natural language problem descriptions well, they frequently err in the computation stage, demonstrating their probabilistic nature.
Thus, the challenge is not simply a matter of increasing dataset size or model parameters but of rethinking how computation is performed within neural networks. Approaches like program synthesis, intermediate variable reasoning, and explicit mathematical instruction tuning have made progress but remain constrained by the probabilistic nature of neural inference. Embedding deterministic operations directly into the model's inference pathways represents a fundamentally different approach. Instead of predicting the answer token by token, the model can deterministically compute intermediate results within its tensor operations. This paper contributes to this emerging direction by proposing a mechanism for rule-based tensor mutations applied at specific locations within a transformer's multi-layer perceptron (MLP) sub-blocks, enabling precise symbolic computation without external tools or fine-tuning.
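
Schematically, such a block might look as follows (layer names, the rule interface, and its placement between the linear layers are assumptions for illustration, not the exact design):

\begin{verbatim}
import torch
import torch.nn as nn

class MutatedMLP(nn.Module):
    """Two-layer MLP with a fixed, rule-based mutation interposed
    between its linear layers (a sketch, not the exact design)."""

    def __init__(self, d_model, d_hidden, rule):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_hidden)
        self.w2 = nn.Linear(d_hidden, d_model)
        self.rule = rule  # deterministic; holds no trainable parameters

    def forward(self, x):
        h = torch.relu(self.w1(x))
        h = self.rule(h)  # rule-based tensor mutation in the forward pass
        return self.w2(h)
\end{verbatim}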
The gap between probabilistic language modeling and deterministic symbolic reasoning has been a persistent challenge in the development of large language models (LLMs). Hendrycks et al.~\cite{hendrycksmath2021} introduced the MATH dataset, a large-scale benchmark designed to assess symbolic problem-solving abilities in neural networks. Their results indicated that pretrained LLMs—even those fine-tuned on mathematical content—frequently fail to correctly solve algebraic expressions, arithmetic chains, and multi-step symbolic equations. These failures highlight that while LLMs excel at reproducing syntactic patterns observed during training, they do not inherently perform symbolic manipulation, instead relying on probabilistic co-occurrence statistics.
Ahn et al.~\cite{ahn2024largelanguagemodelsmathematical} further explored this discrepancy, identifying key bottlenecks in the way LLMs generalize mathematical concepts. Their survey outlines how token-level models struggle with operator precedence, recursive computations, and intermediate variable handling. They observe that, unlike humans who approach mathematics through compositional reasoning and intermediate abstractions, LLMs tend to memorize shallow patterns from training data. The authors emphasize the need for architectural interventions that can separate symbolic execution from probabilistic context modeling—a gap that this paper's rule-based mutation pathways directly address.
While one tempting solution is to scale models larger, Besiroglu et al.~\cite{besiroglu2024chinchillascalingreplicationattempt} provide evidence that such scaling has diminishing returns. Their attempt to replicate the Chinchilla scaling laws confirms that increases in model size and training data improve overall perplexity but fail to proportionally improve performance on arithmetic tasks. This suggests that arithmetic reasoning is not merely a data-scaling problem but a fundamental architectural shortcoming. Their work motivates alternative solutions beyond brute-force parameter expansion, such as modifying the internal computation pathways of transformer blocks.
The broader neural-symbolic learning community has investigated ways to integrate explicit symbolic reasoning into neural networks. Besold et al.~\cite{besold2017neuralsymboliclearningreasoningsurvey} categorize these approaches into external symbolic reasoning engines and embedded symbolic layers. External engines, such as Prolog interpreters or SMT solvers, provide high reasoning accuracy but introduce significant inference-time latency and disrupt the end-to-end differentiable flow. Embedded symbolic modules attempt to perform symbolic operations within the neural model itself but face challenges aligning symbolic operations with gradient-based optimization. This paper follows the embedded approach, but bypasses gradient concerns by employing fixed rule-based operations during the forward pass, allowing symbolic computation to coexist with trainable layers.
Program-aided models offer another perspective. Gao et al.~\cite{gao2023palprogramaidedlanguagemodels} proposed PAL, where language models generate executable Python code to solve mathematical problems. By offloading arithmetic and logical tasks to external interpreters, PAL improves accuracy on formal reasoning benchmarks. However, this introduces runtime inefficiencies and dependency on non-neural components. Unlike PAL, our work proposes symbolic operations that are computed directly on GPU tensor cores as part of the LLM's forward pass, avoiding context switches and preserving inference latency.
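
A deliberately minimal sketch of the PAL-style division of labor follows (PAL's actual prompting and sandboxing are more involved):

\begin{verbatim}
# Stand-in for code emitted by the language model:
generated = "answer = (17 + 25) * 3"

namespace = {}
exec(generated, namespace)  # run by the host Python interpreter,
print(namespace["answer"])  # not by the network; prints 126
\end{verbatim}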
Fine-tuning techniques remain a popular method for improving mathematical accuracy. Xu et al.~\cite{xu2024chatglmmathimprovingmathproblemsolving} introduced ChatGLM-Math, a pipeline where the model critiques its own mathematical outputs and refines them iteratively. While effective, this process requires task-specific fine-tuning, increasing both training and inference costs. Moreover, Petruzzellis et al.~\cite{petruzzellis2024assessingemergentsymbolicreasoning} showed that even when fine-tuned, LLaMA models exhibit inconsistent symbolic reasoning abilities, with success rates highly dependent on input complexity and dataset familiarity. This inconsistency suggests that fine-tuning alone cannot fully bridge the symbolic reasoning gap.
These works converge on a common insight: language models can pattern-match symbolic expressions but lack internal mechanisms for performing symbolic operations themselves. Existing solutions either rely on fine-tuning to statistically approximate symbolic outcomes or delegate computation to external engines. In contrast, this paper proposes embedding deterministic, rule-based tensor mutations directly into the model’s internal linear layers. By masking specific tensor regions, applying deterministic arithmetic functions—such as addition, subtraction, multiplication, division, exponentiation, bitwise logic, and shifts—and reintegrating the results within the inference pass, the model gains native support for symbolic computation.
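
A minimal sketch of this mask-compute-reintegrate pattern (function and variable names are ours, for illustration; bitwise rules would additionally require an integer view of the masked region):

\begin{verbatim}
import torch

def apply_rule(h, mask, op):
    # h:    hidden-state tensor from an MLP sub-block
    # mask: boolean tensor selecting the region to mutate
    # op:   fixed deterministic function, e.g. lambda t: t * 2.0
    with torch.no_grad():        # the symbolic path carries no gradient
        mutated = op(h[mask])    # exact arithmetic on the masked region
    out = h.clone()
    out[mask] = mutated          # reintegrate into the inference pass
    return out
\end{verbatim}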
Critically, this approach does not replace the probabilistic language modeling capabilities of the transformer but augments them with deterministic pathways optimized for mathematical reasoning. Symbolic operations are performed without gradient flow, ensuring that the core model remains a probabilistic language generator while gaining deterministic subroutines where needed. This architecture represents a middle ground between pure neural-symbolic systems and hybrid models with external engines, achieving both architectural elegance and computational efficiency.
\section{Methods}
\subsection{Baseline MLP Feed-Forward Block}
@@ -91,7 +123,7 @@ where \(w1\) and \(w2\) are linear layers and \texttt{relu} is the chosen activa
Graphically, the data flow is:
\[
x \rightarrow \text{Linear}(W_1) \rightarrow f(\cdot) \rightarrow \text{Linear}(W_2) \rightarrow \text{Output}.
x \rightarrow \text{Linear}(W_1) \rightarrow f(\circ) \rightarrow \text{Linear}(W_2) \rightarrow \text{Output}.
\]
This architecture applies sequential transformations, where each layer processes the output of the previous layer.
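
In code, the baseline block reads as follows (a sketch with assumed dimensions, following the $w_1$/\texttt{relu}/$w_2$ form described above; the production Llama 3 MLP uses a gated variant):

\begin{verbatim}
import torch
import torch.nn as nn

class BaselineMLP(nn.Module):
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_hidden)
        self.w2 = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        # x -> Linear(W1) -> relu -> Linear(W2) -> output
        return self.w2(torch.relu(self.w1(x)))
\end{verbatim}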