Compilers for neural processing circuits

Human capabilities for analyzing visual, audio and other sensory data, and for responding rapidly to external stimuli, are very limited. Microelectronic devices and computers are very helpful in this area. Methods based on artificial intelligence and neural networks are being developed to analyze complex data such as video. The most versatile option for such implementations is a powerful server with dedicated graphics or tensor accelerators. All information from the sensors is transmitted to these servers or accelerators, processing takes place "in the cloud", and decisions are made based on the results of this processing. However, this approach is often not applicable, because it requires transferring large amounts of data, and the combined transmission and processing time is unacceptably high.

A solution to this problem is the emerging field of so-called edge computing. This approach implies distributed data processing, in which computations take place at the edge of the environment, close to the sensors and detectors. In video processing, for example, this means that a neurochip can be placed directly next to the camera. The chip performs primary data processing for detection or classification and sends data to the server only when necessary. Another option is to reduce the data flow by transmitting only the coordinates of detected objects. At the extreme, the device can become fully autonomous by making decisions on the spot.
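The following Python sketch gives a sense of scale for the data-flow reduction described above. It compares the bandwidth of an uncompressed video stream with that of a stream carrying only bounding-box coordinates. The resolution, frame rate, number of objects and 16-bit coordinate encoding are illustrative assumptions, not figures from the project.

```python
# Illustrative assumptions (not from the project): a 1080p RGB camera at
# 30 frames per second with 8 bits per colour channel.
WIDTH, HEIGHT, CHANNELS, FPS = 1920, 1080, 3, 30

# Assumed coordinate message: up to 10 objects per frame, each described by a
# bounding box (x, y, w, h) stored as four 16-bit integers.
MAX_OBJECTS, COORDS_PER_BOX, BYTES_PER_COORD = 10, 4, 2


def raw_video_rate() -> int:
    """Bytes per second needed to stream uncompressed frames to a server."""
    return WIDTH * HEIGHT * CHANNELS * FPS


def coordinates_rate() -> int:
    """Bytes per second needed to send only the object bounding boxes."""
    return MAX_OBJECTS * COORDS_PER_BOX * BYTES_PER_COORD * FPS


if __name__ == "__main__":
    raw = raw_video_rate()
    coords = coordinates_rate()
    print(f"raw video:   {raw:,} B/s")
    print(f"coordinates: {coords:,} B/s")
    print(f"reduction:   {raw // coords:,}x")
```

With these assumed values the raw stream needs 186,624,000 bytes per second (about 187 MB/s), while the coordinate stream needs 2,400 bytes per second. Real cameras normally compress video, so the practical gap is smaller, but it usually remains large.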

Hardware support for edge computing is currently of great scientific and practical interest. Universal and specialized neuroprocessors and accelerators with different architectures and different degrees of reprogrammability are being researched and developed. The main approach is to implement a System-on-Chip (SoC) that includes a graphics processor or a neural accelerator. These contain a large number of computing cores that work in parallel to quickly perform the tensor computations inherent in neural networks. This approach can be found in many single-board computers and accelerator modules from Nvidia (Jetson), Intel (Movidius) and Google (Coral), as well as in Russian chips from ELVEES (SKIF, RoboDeus). Such solutions are highly versatile, since they can implement virtually any neural network architecture. Memory management and data supply are handled in software by a microcontroller, while massive computations are offloaded to the hardware accelerator.

The AlphaCHIP team is exploring a different approach to implementing neural chips for edge computing. The main idea is to move part of the versatility to the design stage. This makes the end devices more specialized and therefore better suited to edge computing. Software tools are being developed to generate hardware descriptions of specialized neural chips optimized for specific tasks. Versatility at the design level means the ability to implement various neural network architectures under given constraints on performance, area, type of memory used, etc. Versatility at the end-device level (if needed) is limited to reflashing the weight coefficients, which allows the device to solve different problems of the same class. This approach promises significant advantages for the manufactured devices in terms of price, energy efficiency, speed and area. In addition, it significantly reduces the level of expertise required from end users. Specialized user software allows flashing new weight coefficients, which makes it possible to retrain or fine-tune the device on new data.
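A minimal Python sketch can illustrate reflashing weight coefficients on a fixed architecture. It assumes a hypothetical chip whose layer shapes (`ARCHITECTURE`) are frozen at design time. The chip is assumed to store each layer as a 4-byte float scale followed by signed 8-bit coefficients. The memory layout, the symmetric int8 quantization and the names `quantize_int8` and `build_weight_image` are illustrative assumptions, not the AlphaCHIP format.

```python
import struct
from typing import Dict, List, Sequence, Tuple

# Hypothetical architecture fixed at design time:
# layer name -> number of weight coefficients. Only the values may change.
ARCHITECTURE: Dict[str, int] = {"conv1": 8, "fc": 4}


def quantize_int8(values: Sequence[float]) -> Tuple[float, List[int]]:
    """Symmetric per-tensor quantization to signed 8-bit integers."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid division by zero
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, q


def build_weight_image(weights: Dict[str, Sequence[float]]) -> bytes:
    """Pack new weights into a memory image for a chip with a fixed layout.

    Raises ValueError if the weights do not match the hardware architecture,
    because the assumed chip can only be reflashed, not reconfigured.
    """
    image = bytearray()
    for layer, count in ARCHITECTURE.items():  # fixed layer order
        values = weights.get(layer)
        if values is None or len(values) != count:
            raise ValueError(f"layer {layer!r} must have {count} weights")
        scale, q = quantize_int8(values)
        image += struct.pack("<f", scale)       # 4-byte little-endian scale
        image += struct.pack(f"<{count}b", *q)  # int8 coefficients
    return bytes(image)
```

Because the shapes are checked against the fixed layout, new weights for another task of the same class can be loaded, but a different network topology is rejected. This mirrors the limited end-device versatility described above.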

The user will be able to select a model from a list and configure various settings and parameters, such as the depth of the neural network, the number and bit width of parameters, architectural features, etc. It will also be possible to set priorities for power, area and performance, which the program will automatically take into account when synthesizing the neurochip architecture. The result of the synthesis is a hardware description of the neurochip at the register-transfer level (RTL). This description can then be synthesized for different target technologies (FPGA, ASIC). In general, the device consists of three interconnected main components: memory, a control unit and a systolic matrix multiplier, which performs the bulk of the model's computations.
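The following Python sketch shows how a tool might trade off area against performance, framed as a tiny design-space search. All of its parts are hypothetical: the cost model, the candidate array sizes and the names `Design`, `choose_design` and `weight_memory_bytes`. The cost model assumes one multiply-accumulate per processing element per cycle, and area proportional to the number of elements times the square of the operand bit width. A real generator would rely on synthesis results or calibrated models.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Design:
    rows: int  # number of processing-element (PE) rows in the systolic array
    cols: int  # number of PE columns
    bits: int  # operand bit width chosen by the user

    def macs_per_cycle(self) -> int:
        # Assumption: each PE performs one multiply-accumulate (MAC) per cycle.
        return self.rows * self.cols

    def area(self) -> int:
        # Toy cost model (assumption): multiplier area grows roughly with the
        # square of the operand width, so area ~ PEs * bits^2 (arbitrary units).
        return self.rows * self.cols * self.bits ** 2


def weight_memory_bytes(num_params: int, bits: int) -> int:
    """On-chip memory needed to store all weights at the given bit width."""
    return (num_params * bits + 7) // 8


def choose_design(bits: int, min_macs: int, max_area: int,
                  priority: str = "area",
                  sizes: Tuple[int, ...] = (2, 4, 8, 16, 32, 64)) -> Optional[Design]:
    """Pick an array shape that satisfies the user's constraints.

    priority="area": smallest area that still reaches min_macs per cycle.
    priority="performance": highest throughput that still fits max_area.
    Ties are broken in favour of the most nearly square array.
    """
    candidates = (Design(r, c, bits) for r in sizes for c in sizes)
    feasible = [d for d in candidates
                if d.macs_per_cycle() >= min_macs and d.area() <= max_area]
    if not feasible:
        return None
    if priority == "area":
        return min(feasible, key=lambda d: (d.area(), abs(d.rows - d.cols), d.rows))
    if priority == "performance":
        return max(feasible, key=lambda d: (d.macs_per_cycle(), -abs(d.rows - d.cols)))
    raise ValueError(f"unknown priority: {priority!r}")
```

For example, with 8-bit operands, at least 100 MACs per cycle and an area budget of 100,000 units, `choose_design(8, 100, 100_000, "area")` returns an 8×16 array, while `priority="performance"` returns a 32×32 array. Under the same assumptions, one million 4-bit weights need 500,000 bytes of weight memory.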

Additional firmware tools for the neural chip will also be developed for object detection and other edge computing tasks.

The computational core of the neurochip is a systolic matrix multiplier. Matrix multiplication is the basis of all neural network models, so an efficient hardware implementation of such a module has value and commercial potential for hardware developers and design centers. The project will implement an efficient and versatile hardware description generator (soft IP) and a CAD tool that automates the design of the multiplier and optimizes it for specific tasks and user constraints.
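The following Python sketch simulates, cycle by cycle, how a systolic array computes a matrix product. It uses an output-stationary dataflow, in which each processing element keeps one element of the result while operands pass between neighbours. This is only one of several possible dataflows (weight-stationary is another common choice), and the text does not say which one the project uses. The simulated array has one processing element per output element, and tiling larger matrices onto a fixed-size array is left out.

```python
from typing import List

Matrix = List[List[int]]


def systolic_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Cycle-level simulation of an output-stationary systolic array.

    PE(i, j) keeps the partial sum C[i][j]. Elements of A move one PE to the
    right per cycle, and elements of B move one PE down per cycle. Row i of A
    and column j of B are delayed (skewed) by i and j cycles, so matching
    a[i][k] and b[k][j] meet in PE(i, j) at cycle t = i + j + k.
    """
    n, depth, m = len(a), len(b), len(b[0])
    assert all(len(row) == depth for row in a), "inner dimensions must match"

    acc = [[0] * m for _ in range(n)]    # stationary partial sums
    a_reg = [[0] * m for _ in range(n)]  # value each PE passes to the right
    b_reg = [[0] * m for _ in range(n)]  # value each PE passes downwards

    total_cycles = depth + n + m - 2     # last MAC happens at t = total_cycles - 1
    for t in range(total_cycles):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Left-edge PEs read the skewed A input; others read the left neighbour.
                if j == 0:
                    k = t - i
                    a_in = a[i][k] if 0 <= k < depth else 0
                else:
                    a_in = a_reg[i][j - 1]
                # Top-edge PEs read the skewed B input; others read the upper neighbour.
                if i == 0:
                    k = t - j
                    b_in = b[k][j] if 0 <= k < depth else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # one multiply-accumulate per PE per cycle
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b      # registers update at the clock edge
    return acc
```

For A = [[1, 2, 3], [4, 5, 6]] and B = [[7, 8], [9, 10], [11, 12]], `systolic_matmul(A, B)` returns [[58, 64], [139, 154]] after 5 simulated cycles, the same result as an ordinary matrix product. The input skew lets every processing element work only with data from its immediate neighbours, which keeps the hardware wiring short and regular.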