Developer unlocks Apple's M4 chip

Apple's M4 processors pack significant computing power for AI workloads, but the company traditionally keeps its components under tight control. In practice, app developers can use the Neural Engine only for inference, that is, for running pre-trained models; they cannot train new models on it from scratch.

A researcher who goes by the online alias “0x0SojalSec” has published source code on GitHub that shows how to tap the chip’s full potential, reaching 15.8 TFLOPS of otherwise inaccessible computing power for machine learning. While that number isn’t a record today, the feat is remarkable because it was achieved entirely outside Apple’s official development environment.

Because the company’s security settings do not allow direct communication with the Neural Engine for such advanced tasks, the project’s author had to work without official tools such as Core ML or Metal, and he could not fall back on the GPU either. Instead, he built his own intermediate language from scratch. This custom layer acts as a bridge, enabling full backpropagation and the training of transformer models directly on the Neural Engine.
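To make the difference between inference and training concrete, here is a minimal Python sketch of one backpropagation loop on a toy linear model. It is a generic illustration, not the project's code: the function name train_step, the data, and the learning rate are all assumptions chosen for the example, and nothing here touches the Neural Engine.

```python
def train_step(w, b, xs, ys, lr=0.05):
    """One training step for the toy model y = w*x + b (hypothetical example)."""
    n = len(xs)
    # Forward pass: this part alone is what inference does.
    preds = [w * x + b for x in xs]
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / n  # mean squared error

    # Backward pass: propagate the error back to each parameter.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2 * sum(errors) / n

    # Update: move the parameters against the gradient.
    return w - lr * grad_w, b - lr * grad_b, loss


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0
losses = []
for _ in range(500):
    w, b, loss = train_step(w, b, xs, ys)
    losses.append(loss)
```

After the loop, w and b should sit close to 2 and 1. A transformer repeats the same forward, backward, and update cycle across millions of parameters, which is the part Apple's official tools do not expose on the Neural Engine.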

Because of the hardware's built-in limitations, several ingenious workarounds were needed to keep the operating system stable. If a process hangs or freezes during the intensive training phase, the custom programming language issues a special execution command that restarts it. The process then picks up from a refreshed state and continues processing data without crashing the entire application.
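The restart idea can be sketched in Python as a small supervisor that kills a stalled worker and relaunches it from its last recorded step. This is only an illustration under stated assumptions: the names WORKER and train_with_restarts, the checkpoint file, the simulated stall, and the timeout values are all hypothetical, and the project's actual restart mechanism may work quite differently.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical worker: resumes from the step stored in a checkpoint file and
# deliberately stalls once, at step 3 of the first attempt, to simulate a hang.
WORKER = r"""
import sys, time
path, total = sys.argv[1], int(sys.argv[2])
try:
    with open(path) as f:
        step = int(f.read())
except FileNotFoundError:
    step = 0
first_run = step == 0
while step < total:
    if first_run and step == 3:
        time.sleep(60)            # simulated stall
    step += 1
    with open(path, "w") as f:    # record progress after every step
        f.write(str(step))
"""


def train_with_restarts(total_steps=6, timeout_s=3.0, max_restarts=3):
    """Run the worker, restarting it from its checkpoint whenever it stalls."""
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = os.path.join(tmp, "step.txt")
        restarts = 0
        while True:
            try:
                # subprocess.run kills the child if the timeout expires.
                subprocess.run(
                    [sys.executable, "-c", WORKER, ckpt, str(total_steps)],
                    timeout=timeout_s,
                    check=True,
                )
                break
            except subprocess.TimeoutExpired:
                restarts += 1
                if restarts > max_restarts:
                    raise
        with open(ckpt) as f:
            return int(f.read()), restarts


steps_done, restarts = train_with_restarts()
```

The supervisor never inspects why the worker stalled. It simply discards the stuck process and starts a fresh one, so only the work since the last recorded step is lost.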

Speed was another significant challenge for such demanding workloads. To keep the training process running as smoothly as possible, the developer configured the system to write all data directly to working memory (RAM). Deliberately avoiding the much slower flash storage kept the entire operation fast.
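As a rough illustration of keeping intermediate data in RAM rather than on storage, the Python sketch below writes and reads values through an in-memory buffer instead of a file on disk. The helper names write_values and read_values and the sample numbers are assumptions made for this example; they say nothing about how the project lays out its own memory.

```python
import io
import struct


def write_values(buf, values):
    """Append 32-bit little-endian floats to an in-memory buffer."""
    buf.write(struct.pack(f"<{len(values)}f", *values))


def read_values(buf, count):
    """Read back `count` floats from the start of the buffer."""
    buf.seek(0)
    return list(struct.unpack(f"<{count}f", buf.read(4 * count)))


# io.BytesIO behaves like a file but lives entirely in RAM,
# so no write ever reaches flash storage.
ram_buffer = io.BytesIO()
write_values(ram_buffer, [0.5, 1.5, -2.0])
restored = read_values(ram_buffer, 3)
```

The trade-off is the usual one: data held only in memory is fast to reach but is lost if the process or machine goes down.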