Science
Integer addition algorithm could reduce energy needs of AI by 95%
From TechXplore
Figure: 16-bit and 8-bit floating-point number formats defined in IEEE 754 and on various hardware for tensor computations, and the 16-bit integer format. MSB stands for most significant bit and LSB for least significant bit. Credit: arXiv (2024). DOI: 10.48550/arxiv.2410.00907
___________________________________________________________________________
A team of engineers at AI inference technology company BitEnergy AI reports a method to reduce the energy needs of AI applications by 95%. The group has published a paper describing their new technique on the arXiv preprint server.
...
The new technique is basic: instead of using complex floating-point multiplication (FPM), the method uses integer addition. Apps use FPM to handle extremely large or small numbers, allowing applications to carry out calculations using them with extreme precision. It is also the most energy-intensive part of AI number crunching.
The researchers call their new method Linear-Complexity Multiplication; it works by approximating FPMs using integer addition. They claim that testing thus far has shown that the new approach reduces electricity demand by 95%.
The one drawback it has is that it requires different hardware than that currently in use. But the research team also notes that the new type of hardware has already been designed, built and tested.
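The article doesn't spell out the mechanics, but the general idea -- approximating a floating-point multiply with a single integer addition on the raw bit patterns -- can be sketched in a few lines of Python. This is a minimal illustration of the classic bit-pattern trick (Mitchell's approximation), not necessarily the paper's exact L-Mul algorithm; it assumes positive, finite float32 values:

```python
import struct

BIAS = 127 << 23  # IEEE-754 float32 exponent bias, aligned to the exponent field

def f32_bits(x: float) -> int:
    """Raw 32-bit pattern of a float32 as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """Reinterpret a 32-bit pattern as a float32."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def approx_mul(x: float, y: float) -> float:
    """Approximate x * y for positive finite floats using one integer add.

    Adding the bit patterns sums the exponent fields and mantissa fields;
    subtracting BIAS corrects the doubled exponent bias. The mantissa
    cross term is dropped, so the result can be up to ~11% low."""
    return bits_f32(f32_bits(x) + f32_bits(y) - BIAS)

print(approx_mul(2.0, 4.0))   # exact when both mantissas are zero: 8.0
print(approx_mul(1.5, 1.5))   # ~2.0 versus the true 2.25
```

The error never exceeds roughly 11% of the true product, which is why a small corrective offset (as in the paper) helps.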
a little bit more ...
littlemissmartypants
(25,483 posts)
Original study:
Addition is All You Need for Energy-efficient Language Models
Hongyin Luo, Wei Sun
https://arxiv.org/abs/2410.00907
❤️
eppur_se_muova
(37,388 posts)
Suppose large factorizations are suddenly 20x easier -- how does that affect Web security?
Jim__
(14,456 posts)
My reading is not that they are changing any underlying algorithms; they are changing the implementation of the algorithms. Apparently AI algorithms perform an extremely high number of floating-point multiplications. They are converting those floating-point multiplications to integer addition, and that uses significantly less electricity.
The computations are less energy intensive. I don't believe they are any faster or any more powerful than the computations they are replacing.
Again that's based on my reading, you may be seeing something different.
eppur_se_muova
(37,388 posts)
... (AFAIK) is something called the irrational base discrete weighted (fast Fourier) transform, which replaces the integer multiply with multiple floating-point calculations. Weirdly, this sounds like the reverse of the technique in the OP. Either one of these represents a really fundamental change in the algorithm -- it's not anything like fine-tuning. This link gives a nice short intro to what's involved, as well as a further development which does not seem to be practical just yet: https://theconversation.com/weve-found-a-quicker-way-to-multiply-really-big-numbers-114923
Detailed technical discussion is found in the Wikipedia articles https://en.wikipedia.org/wiki/Fast_Fourier_transform and https://en.wikipedia.org/wiki/Multiplication_algorithm. Also briefly discussed in Knuth (vol. 2) and in more detail in Crandall and Pomerance.
Public-key cryptographic security is often based on the assumption that factoring very large integers is so resource-intensive that it will almost never be worth the effort involved to crack a protected file. As the algorithms to find factors of large integers become faster and more powerful, larger and larger keys are required.
Many cryptographic protocols are based on the difficulty of factoring large composite integers or a related problem -- for example, the RSA problem. An algorithm that efficiently factors an arbitrary integer would render RSA-based public-key cryptography insecure.
https://en.wikipedia.org/wiki/Integer_factorization
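To make the "resource-intensive" point concrete: even the most naive factoring method shows why key sizes matter. This toy Python sketch (illustrative only -- real factoring attacks use far better algorithms) takes on the order of sqrt(n) steps, so every two extra digits in n multiplies the brute-force work by about ten:

```python
def smallest_factor(n: int) -> int:
    """Find the smallest prime factor of n > 1 by trial division.

    The loop runs up to sqrt(n): a d-digit number needs on the order
    of 10**(d/2) steps, which is why factoring-based keys must keep
    growing as factoring algorithms improve."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n  # no factor found below sqrt(n), so n is prime

print(smallest_factor(91))      # 7 (91 = 7 * 13)
print(smallest_factor(65537))   # 65537 is prime
```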
Jim__
(14,456 posts)
This post is based on quick reads of the AI paper and the papers you referenced. So, it's sort of food for thought.
In the AI problem, the number of multiplications is O(n²), where n is roughly based on the number of elements in a tensor. They are working with neural networks here, so my current understanding (I've read through the paper, but I need to read it more thoroughly to get a better understanding) is that this is based on the number of nodes (synapses) in the neural net. From the paper - my best attempt at copying this:
Modern LLM training and inference involves a large number of FP calculations in tensor computation. Consider calculating the element-wise and dot products of two 2-D tensors:

Y1 = A ∘ X, Y2 = A · X^T; A, X ∈ R^(N,k)
Calculating Y1 involves N² FP multiplications (Mul). If A and X are both fp32 tensors, A ∘ X consumes 37 times more energy than adding two int32 matrices of the same size. Similarly, calculating Y2 involves (m × n × k) FP Mul and the same number of FP additions (Add). When A and X are fp32 tensors, each Mul-Add operation for two numbers consumes 0.9 + 3.7 = 4.6 pJ of energy. If we replace the fp32 Mul with int32 Add, the energy cost becomes 0.1 + 0.9 = 1.0 pJ, only 21.7% of the original cost. Similarly, if the inference is conducted in fp16, replacing fp16 Mul with int16 Add results in a 1 - (0.05 + 0.4) / (1.1 + 0.4) = 70% energy saving.
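The arithmetic in that passage checks out; here is the same back-of-the-envelope calculation in Python, using only the per-operation energy figures (in picojoules) quoted above:

```python
# Per-operation energy costs in picojoules, as quoted in the paper excerpt.
FP32_MUL, FP32_ADD = 3.7, 0.9
INT32_ADD = 0.1
FP16_MUL, FP16_ADD = 1.1, 0.4
INT16_ADD = 0.05

# fp32 case: replace the Mul in each Mul-Add with an int32 Add.
orig_cost = FP32_MUL + FP32_ADD       # 4.6 pJ per Mul-Add
new_cost = INT32_ADD + FP32_ADD       # 1.0 pJ per Mul-Add
print(new_cost / orig_cost)           # ~0.217, i.e. 21.7% of the original cost

# fp16 case: saving = 1 - (int16 Add + fp16 Add) / (fp16 Mul + fp16 Add)
saving = 1 - (INT16_ADD + FP16_ADD) / (FP16_MUL + FP16_ADD)
print(saving)                         # ~0.70, i.e. a 70% energy saving
```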
...
We propose L-Mul, a FP multiplication algorithm with O(n) complexity, where n is the bit size of its FP operands. Consider two FP numbers x, y, whose exponents and fractions are xe, ye and xm, ym respectively; the vanilla FP Mul result is

Mul(x, y) = (1 + xm) · 2^xe · (1 + ym) · 2^ye
= (1 + xm + ym + xm · ym) · 2^(xe + ye)

plus an xor operation (⊕) to decide the sign of the result. Assume xm and ym are mantissas of m bits. The O(m²) mantissa multiplication operation is the complexity bottleneck of this calculation. We remove this operation and introduce a new multiplication algorithm that processes mantissas with a computational complexity of O(m):

L-Mul(x, y) = (1 + xm + ym + 2^(-l(m))) · 2^(xe + ye),

l(m) = | m if m ≤ 3,
       | 3 if m = 4,
       | 4 if m > 4.
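As a sanity check, that formula can be transcribed into a short Python sketch (my own hypothetical transcription -- it assumes positive, finite inputs and ignores the sign xor and any rounding or normalization the real kernel would do):

```python
import math

def l_of_m(m: int) -> int:
    """The paper's piecewise offset exponent l(m)."""
    if m <= 3:
        return m
    if m == 4:
        return 3
    return 4

def l_mul(x: float, y: float, m: int = 8) -> float:
    """L-Mul sketch: replace the O(m^2) mantissa product xm*ym
    with the constant 2**-l(m), leaving only additions."""
    fx, xe = math.frexp(x)            # x = fx * 2**xe with fx in [0.5, 1)
    fy, ye = math.frexp(y)
    xm, xe = 2 * fx - 1, xe - 1       # rewrite as (1 + xm) * 2**xe, xm in [0, 1)
    ym, ye = 2 * fy - 1, ye - 1
    return (1 + xm + ym + 2.0 ** -l_of_m(m)) * 2.0 ** (xe + ye)

print(l_mul(3.0, 5.0))   # 14.5 versus the exact 15.0
```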
And in the encryption and large prime searches, the O(n²) is based on the large number of digits in the very large numbers involved. From the link you referenced from The Conversation (https://theconversation.com/weve-found-a-quicker-way-to-multiply-really-big-numbers-114923):
In 1960, Anatoly Karatsuba, a 23-year-old mathematics student in Russia, discovered a sneaky algebraic trick that reduces the number of multiplications needed.
For example, to multiply four-digit numbers, instead of needing 4² = 16 multiplications, Karatsuba's method gets away with only nine. When using his method, twice as many digits means only three times as much work.
This stacks up to an impressive advantage as the numbers get bigger. For numbers with a thousand digits, Karatsuba's method needs about 17 times fewer multiplications than long multiplication.
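The three-products-instead-of-four recursion described in the quote can be sketched in Python (my own illustrative implementation, not tuned for performance):

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply non-negative ints with 3 recursive products instead of 4."""
    if x < 10 or y < 10:                  # base case: single-digit operand
        return x * y
    half = max(len(str(x)), len(str(y))) // 2
    p = 10 ** half
    a, b = divmod(x, p)                   # x = a*p + b
    c, d = divmod(y, p)                   # y = c*p + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    # Karatsuba's trick: (a+b)(c+d) - ac - bd equals ad + bc,
    # so the four "schoolbook" products collapse to three.
    mid = karatsuba(a + b, c + d) - ac - bd
    return ac * p * p + mid * p + bd

print(karatsuba(1234, 5678) == 1234 * 5678)   # True
```

Three recursive calls on half-size numbers give the O(n^1.585) running time mentioned in the article, versus O(n²) for long multiplication.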
Can the solution to the AI problem have some impact on the solution to the problems of multiplying large numbers? In the long-term there may be some connections between the solutions. But, I don't think there is a currently recognized connection between them.
muriel_volestrangler
(102,473 posts)
The paper says their "L-Mul algorithm" achieves higher precision than 8-bit FP multiplication (though 8-bit is not very precise). But encryption is for integers, and I wouldn't think factorization can be "close enough".
hunter
(38,919 posts)
As with a lot of AI, these games didn't have to precisely reflect any physical model of reality, they just had to "look" right.
Of course "Natural Intelligence" may take such shortcuts too, which might partly explain why even highly intelligent people sometimes do stupid things.
It would be interesting to compare this energy efficient AI to an otherwise identical floating point or large integer AI.