Prediction of Electric Load Neural Network Prediction Model for Big Data

Yi Zhiqiang 1, *

1School of electronic and information engineering, Xiamen University Malaysia, Sepang, Malaysia
*Corresponding author: EEE1909259@xmu.edu.my

Abstract: Spike sorting is an important step in processing the signals collected by neural signal acquisition systems. One way to realize spike detection in hardware is to use an absolute-value detector. In this paper, a common circuit topology is built using static CMOS and pass-transistor logic (PTL). Then, taking the delay and energy of a unit reference inverter as the baseline, an optimal energy-delay model is obtained by changing the transistor sizes and Vdd simultaneously: at the cost of 50% extra delay, it saves nearly 83% of the energy dissipated by the minimum-delay design.

Keywords: Absolute-value Detector, CMOS, Optimal Energy-delay Model.

1. Introduction

With recent advances in neural signal acquisition systems, spike sorting algorithms have attracted the interest of the research community. The algorithm has two advantages in the neuron signal processing step [1]. The first advantage is that the algorithm can help neuroscientists understand which spikes are coming from which neurons. A second advantage is the algorithm's ability to perform data reduction on-chip prior to signal transmission.

The spike detection algorithm involves two main steps: (1) the pre-emphasis of the spike and (2) the application of a threshold [2]. A simple, common detection method is to apply a threshold to the waveform voltage. This threshold can be applied to the original waveform or to the absolute value of the waveform [3]. Applying the threshold to the absolute value of the signal is more intuitive, since spikes can be either positive or negative. The spike classification hardware must be low power, to prevent tissue damage from overheating, and low area, to be implantable [4]. Within these constraints, the computing performance should be made as high as possible.

The absolute value detector can be implemented using a very large scale integration (VLSI) approach suitable for neural signal applications. The use of VLSI methods in neural signal acquisition systems can help reduce circuit size and area and increase speed [5]. In this paper, one approach to VLSI implementation of neural networks, CMOS technology, is discussed.

2. Methodology

2.1 Circuit topology and CMOS type

This paper focuses on the delay-energy optimization model of an absolute value comparator, whose logic structure is built up from modules. The design idea is as follows. First, the absolute value comparator is divided into two parts: the absolute value output and the comparator. Second, the absolute value output is divided by function into an adder part and a data selector part. The data selector chooses the correct output according to the sign of the input: if the input is positive, it is passed through directly; if the input is negative, its complement is output instead. The adder performs the complement operation, that is, it adds 1 to the bitwise inverse of the negative input [6]. Finally, there is the comparator part. Since the absolute value output is always non-negative and one of the four input bits is a sign bit, the comparator needs only three bits to cover all cases. The overall topology of the absolute-value detection circuit is shown in Figure 1.
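
The functional decomposition described above (inverters plus adder, data selector, comparator) can be sketched as a small behavioural model. This is only an illustration under stated assumptions: the function names `magnitude_4bit` and `exceeds_threshold` are hypothetical, and the strict "greater than" comparison is assumed, since the text does not specify the comparator's exact condition.

```python
def magnitude_4bit(x: int) -> int:
    """Return the 3-bit magnitude of a 4-bit two's-complement word.

    `x` holds the raw bits (0..15); bit 3 is the sign bit.
    """
    if not 0 <= x <= 0b1111:
        raise ValueError("x must be a 4-bit word")
    sign = (x >> 3) & 1
    inverted = ~x & 0b1111                # inverters: bitwise complement
    complement = (inverted + 1) & 0b1111  # adder: add 1 to the inverted word
    selected = complement if sign else x  # data selector driven by the sign bit
    # Only three bits go on to the comparator. Note that -8 (0b1000) has no
    # 4-bit positive counterpart and comes out as 0 in this sketch.
    return selected & 0b111


def exceeds_threshold(x: int, threshold: int) -> bool:
    """Comparator stage (assumed strict): True when |x| > threshold."""
    if not 0 <= threshold <= 0b111:
        raise ValueError("threshold must be a 3-bit value")
    return magnitude_4bit(x) > threshold


if __name__ == "__main__":
    for word in (0b0011, 0b1101, 0b1111):
        print(f"{word:04b} -> magnitude {magnitude_4bit(word)}")
```

For example, the word `1101` (decimal -3) is inverted to `0010`, incremented to `0011`, and selected because its sign bit is set, so the comparator sees a magnitude of 3.
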
The next step is the transistor-level design of the logic circuit. The circuit contains five different logic gates: inverter, NAND, NOR, XOR and 2-bit MUX. NAND and NOR adopt a static complementary CMOS structure. The advantage of this structure is that the pull-up network is the exact dual of the pull-down network, so in steady state there is no conducting path between the supply and ground for either a high or a low input. The static current is therefore very small, which greatly reduces energy consumption.

Both XOR and MUX combine static CMOS with pass-transistor logic (PTL). The benefit of PTL is that it simplifies the circuit: an XOR gate built with transmission gates saves 4 transistors compared with a static CMOS implementation of the same function. The logical effort and parasitic effort of the XOR are shown in Figure 2.

2.2 Critical path

Delay is an important measure of a chip's performance: a smaller delay means lower latency. Because the logic structure of a chip is complex, different signals may travel along different paths, in different orders, and with different delays. As in the barrel effect, the minimum delay of a logic circuit is limited by its slowest path, which is therefore called the critical path [7]. To calculate the minimum delay of a logic circuit, its critical path must first be found.

After the circuit design is complete, the minimum delay of the circuit is obtained by calculating the minimum delay of its critical path, that is, the path with the maximum delay, which is generally the longest path. Inspection of the whole circuit yields two candidate critical paths. Because the two paths contain very similar numbers and types of logic gates in series, qualitative analysis cannot tell which one is the true critical path; the minimum delay of each path has to be calculated and compared, as done in the next section. Potential critical path 1 is shown in Figure 3.

The first path (as shown in Figure 3) starts from A0, passes through an inverter to the first half adder, follows the carry output to the second half adder and then to the third half adder, and goes from the NAND output through the MUX to the comparator. From the third input of the comparator it then passes through the sequence inverter -- 2-input NAND -- inverter -- 2-input NOR -- 3-input NAND -- inverter -- 3-input NOR -- inverter. Potential critical path 2 is shown in Figure 4.

The second path (as shown in Figure 4) is the same as the first path until it reaches the inverter; its second half then follows the sequence 2-input NAND -- inverter -- 2-input NOR -- 3-input NAND -- inverter -- 3-input NOR -- inverter.

The second path has one fewer stage than the first, but one of its inverters drives two branches.

### 2.3 Minimum delay calculation

This section describes how the minimum delay of a path is calculated using logical effort, a method commonly used to calculate the delay of logic circuits [8]. The steps are as follows:

1. Select the reference inverter. In this paper, the ratio between the width-to-length ratios of its PMOS and NMOS transistors is 2:1.

2. Determine the logical effort and parasitic effort of every stage. For the XOR, the logical effort is $g = (4 + 2)/3 = 2$ and the parasitic effort is $p = (4 + 5)/3 = 3$. The logical efforts of the NAND, NOR, and MUX are obtained in the same way and are listed in Table 1.

#### Table 1: Logic effort parameters

<table>
<thead>
<tr>
<th>Gate Type</th>
<th>g (1 input)</th>
<th>g (2 inputs)</th>
<th>g (3 inputs)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inverter</td>
<td>1</td>
<td></td>
<td></td>
</tr>
<tr>
<td>NAND</td>
<td></td>
<td>4/3</td>
<td>5/3</td>
</tr>
<tr>
<td>NOR</td>
<td></td>
<td>5/3</td>
<td>7/3</td>
</tr>
<tr>
<td>Multiplexer</td>
<td></td>
<td>2</td>
<td>/</td>
</tr>
<tr>
<td>XOR</td>
<td></td>
<td>2</td>
<td>/</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Gate Type</th>
<th>Parasitic Delay</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inverter</td>
<td>pinv</td>
</tr>
<tr>
<td>NAND</td>
<td>2npinv</td>
</tr>
<tr>
<td>NOR</td>
<td>2npinv</td>
</tr>
<tr>
<td>Multiplexer</td>
<td>3pinv</td>
</tr>
<tr>
<td>XOR</td>
<td>3pinv</td>
</tr>
</tbody>
</table>

3. Based on the data in Table 1, mark the $g$ and $p$ of every logic gate in paths 1 and 2. With path logical effort $G$, branching effort $B$ and fan-out $F$, the path effort is $H = G \cdot B \cdot F$, the stage effort that minimizes the delay is $f = \sqrt[N]{H}$, and the minimum path delay is given by

$$D = t_{p0} \left( N \cdot f + P \right)$$

For the first path, fan-out $F = \frac{C_{out}}{C_{gate1}} = 32/2 = 16$; branching $B = 2$; path logical effort $G = 61.454$; parasitic effort $P = 28$; stages $N = 16$; so $H = 1966.529$. Hence the stage effort is $f = 1.606$ and $D = 53.696\,t_{p0}$. Similarly, for the second path, fan-out $F = \frac{C_{out}}{C_{gate1}} = 32/2 = 16$; branching $B = 4$; path logical effort $G = 61.454$; parasitic effort $P = 27$; stages $N = 15$; so $H = 3933.05$ and the delay is $53.045\,t_{p0}$. Since the first path has the larger delay, it is the critical path of the circuit, and the minimum delay of the critical path is $53.696\,t_{p0}$.
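The comparison of the two candidate paths can be reproduced with a short calculation. The sketch below assumes the standard logical-effort relations (path effort $H = G \cdot B \cdot F$, stage effort $f = H^{1/N}$, delay $D = N f + P$ in units of $t_{p0}$), which are the relations that match the numbers quoted above; the function name `min_path_delay` is hypothetical.

```python
def min_path_delay(G: float, B: float, F: float, P: float, N: int):
    """Return (H, f, D) for a path under the logical-effort model.

    G: path logical effort, B: branching effort, F: fan-out (C_out / C_in),
    P: total parasitic effort, N: number of stages.
    D is expressed in units of t_p0.
    """
    H = G * B * F          # path effort
    f = H ** (1.0 / N)     # stage effort that minimises the delay
    D = N * f + P          # minimum path delay
    return H, f, D


# Parameters quoted in the text for the two candidate paths.
H1, f1, D1 = min_path_delay(G=61.454, B=2, F=16, P=28, N=16)
H2, f2, D2 = min_path_delay(G=61.454, B=4, F=16, P=27, N=15)

critical = "path 1" if D1 > D2 else "path 2"

if __name__ == "__main__":
    print(f"path 1: H={H1:.3f}, f={f1:.3f}, D={D1:.3f} t_p0")
    print(f"path 2: H={H2:.3f}, f={f2:.3f}, D={D2:.3f} t_p0")
    print("critical path:", critical)
```

The results agree with the quoted delays to within rounding (the text rounds $f$ to three decimals before multiplying), and path 1 comes out as the slower of the two.
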

After determining the critical path, the size of each logic gate under the minimum delay can be calculated based on the data obtained above. The specific calculation formula is as follows.

$$C_{in} = \frac{b \cdot g \cdot C_{out}}{f}$$

$C_{in}$ is the input capacitance of the logic gate expressed in units of the unit inverter's input capacitance; this ratio between the logic gate and the unit inverter is also the size of the logic gate. $b$ and $g$ are the branching effort and logical effort of the gate, $C_{out}$ is the capacitance it drives, and $f$ is the stage effort at the minimum path delay. The calculation starts at the output load and works backward toward the input. The resulting size of each logic gate on the critical path is shown in Figure 5.

![Figure 5 Size of critical path](image)
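The backward sizing pass can be illustrated with a minimal sketch. It assumes the usual logical-effort sizing rule, $C_{in} = b \cdot g \cdot C_{out} / f$, applied from the load toward the input. The three-stage chain (inverter, 2-input NAND, inverter) and its load are hypothetical values chosen for illustration, not the critical path of this paper.

```python
from math import prod


def size_gates(g, b, c_load, f):
    """Size every stage backward from the load: C_in = b * g * C_out / f."""
    sizes = [0.0] * len(g)
    c_out = c_load
    for i in reversed(range(len(g))):
        sizes[i] = b[i] * g[i] * c_out / f
        c_out = sizes[i]  # this stage is the load of the previous one
    return sizes


# Hypothetical chain: inverter -> 2-input NAND -> inverter, no branching.
g = [1.0, 4.0 / 3.0, 1.0]
b = [1.0, 1.0, 1.0]
c_in_first = 1.0   # input capacitance of the first gate (unit inverter)
c_load = 6.0       # output load in unit-inverter capacitances

H = prod(g) * prod(b) * (c_load / c_in_first)  # path effort
f = H ** (1.0 / len(g))                        # optimal stage effort
sizes = size_gates(g, b, c_load, f)

if __name__ == "__main__":
    print("stage effort:", round(f, 3))
    print("sizes:", [round(s, 3) for s in sizes])
```

A useful sanity check on such a pass is that the size computed for the first stage matches the input capacitance assumed when computing the fan-out, as it does here.
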

3. Results

3.1 Energy optimization by changing size and VDD

Another important indicator for an integrated chip is energy consumption. Excessive energy consumption not only wastes resources but also generates a large amount of heat, which leads to overheating and performance degradation. The energy is given by the formula

$$E = V_{DD}^2 \cdot C_L$$

so the energy consumption can be reduced by changing the size of the transistors (and hence the load capacitance $C_L$) or the supply voltage $VDD$. However, a change in either of these two factors also changes the path delay. The next step is therefore to explore the influence of the two factors on delay and energy and to find the optimal set of parameters. Three cases are considered: the first keeps the size and changes $VDD$; the second keeps $VDD$ and changes the size; the third changes both size and $VDD$ [8].

3.2 Energy optimization by changing VDD only

When the size is fixed, the capacitance is also fixed, so the energy depends only on $VDD$. The relation between delay and supply voltage $VDD$ is given by the following equation [9],

$$D \propto \frac{V_{DD}}{(V_{DD} - V_T)^2}$$

Introducing a proportionality coefficient $k$ to turn this relation into an equality and differentiating with respect to $VDD$, with $V_T = 0.2V$, shows that $D$ increases with $VDD$ when $0<VDD<0.2V$ and decreases with increasing $VDD$ when $0.2<VDD<1V$. Considering the subthreshold effect, $VDD$ should be greater than twice $VT$, that is, 0.4V. Taking $VDD=1V$ as the reference voltage, the target delay $D'=1.5D$ is reached at $VDD'=0.775V$. At this point, $CL=C_{total}=217.80C$, so $E=217.80 \times 0.775^2=130.82$, which is 39.93% lower than the reference energy.
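
The supply voltage that meets the 1.5x delay target can be found numerically. The sketch below assumes the delay scales as $V_{DD}/(V_{DD}-V_T)^2$ with $V_T = 0.2$ V, which is the form that reproduces the 0.775 V figure quoted above, and uses plain bisection on the interval $[0.4, 1]$ V where the delay decreases monotonically with $VDD$. The function names are hypothetical.

```python
V_T = 0.2        # threshold voltage in volts (2 * V_T = 0.4 V in the text)
C_TOTAL = 217.80 # total path capacitance in unit-inverter capacitances


def delay_factor(vdd: float, vt: float = V_T) -> float:
    """Voltage-dependent part of the delay (assumed model)."""
    return vdd / (vdd - vt) ** 2


def solve_vdd(slowdown: float, v_ref: float = 1.0, v_min: float = 0.4) -> float:
    """Find VDD in [v_min, v_ref] whose delay is `slowdown` times the reference."""
    target = slowdown * delay_factor(v_ref)
    if delay_factor(v_min) < target:
        raise ValueError("target delay is not reachable above v_min")
    lo, hi = v_min, v_ref
    for _ in range(60):                  # bisection; delay falls as VDD rises
        mid = 0.5 * (lo + hi)
        if delay_factor(mid) > target:
            lo = mid                     # too slow: raise the voltage
        else:
            hi = mid                     # fast enough: try a lower voltage
    return 0.5 * (lo + hi)


vdd_opt = solve_vdd(1.5)
energy = C_TOTAL * vdd_opt ** 2          # E = VDD^2 * C_L
saving = 100.0 * (1.0 - energy / C_TOTAL)

if __name__ == "__main__":
    print(f"VDD' = {vdd_opt:.3f} V, E = {energy:.2f}, saving = {saving:.1f}%")
```

Under these assumptions the solver returns a voltage of about 0.775 V and an energy close to the quoted 130.82; small differences come from rounding the voltage to three decimals.
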

### 3.3 Energy optimization by changing size only

This section describes the second way, keeping $VDD$ unchanged while changing the size. The calculation uses the formula for the delay of a single logic gate [10],

$$d_i = g_i h_i + p_i \tag{6}$$

and the total delay of the path is the sum of the delays of its gates. With a delay budget of 1.5 times the minimum delay ($1.5 \times D_{min} = 80.544\,t_{p0}$), this formula shows that every gate on the path can be reduced to size 1, which gives a total path capacitance of 48 and a path delay of 80.16. The energy in this case is therefore 48, which is 77.96% lower than the reference energy. The size-only optimization for minimum energy is shown in Table 2.

<table>
<thead>
<tr>
<th>Size</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Delay</td>
<td>81.7</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
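
The trade-off behind size-only optimization can be sketched by summing the per-gate delays $d_i = g_i h_i + p_i$ for different sizings. The example below is a hypothetical three-stage chain (inverter, 2-input NAND, inverter) with assumed parasitic values; it is not the 16-stage critical path of this paper and only illustrates how shrinking gates lowers capacitance at the cost of delay.

```python
def path_delay(g, p, sizes, c_load):
    """Sum d_i = g_i * h_i + p_i along a chain without branching.

    h_i is the electrical effort of stage i: the capacitance it drives
    (the next stage's size, or the load for the last stage) divided by
    its own input capacitance (its size).
    """
    total = 0.0
    for i in range(len(g)):
        c_next = sizes[i + 1] if i + 1 < len(g) else c_load
        h = c_next / sizes[i]
        total += g[i] * h + p[i]
    return total


# Hypothetical chain and parasitics (in units of the inverter parasitic).
g = [1.0, 4.0 / 3.0, 1.0]
p = [1.0, 2.0, 1.0]
c_load = 6.0

fast_sizes = [1.0, 2.0, 3.0]   # sizing for minimum delay
small_sizes = [1.0, 1.0, 1.0]  # every gate at unit size

d_fast = path_delay(g, p, fast_sizes, c_load)
d_small = path_delay(g, p, small_sizes, c_load)
cap_fast = sum(fast_sizes)
cap_small = sum(small_sizes)

if __name__ == "__main__":
    print(f"min-delay sizing: delay {d_fast:.2f}, capacitance {cap_fast:.0f}")
    print(f"unit sizing:      delay {d_small:.2f}, capacitance {cap_small:.0f}")
```

In this toy chain, unit sizing halves the gate capacitance while the delay grows by roughly a quarter, which is the same qualitative behaviour reported for the critical path.
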

### 3.4 Optimization solution with consideration to size and VDD

This section demonstrates the third way, changing both $VDD$ and size. Using the formulas listed below together with the boundary conditions on each parameter, the minimum-energy solution that meets the delay target can be obtained with the help of an $f_{min}$ function [10-11],

$$\begin{aligned}
D &= \sum_{i=1}^{N} \left( g_i h_i + p_i \right) \\
D &= k \cdot \frac{V_{DD}}{(V_{DD}-V_T)^2} \\
mD &= k \cdot \frac{V_{DD}'}{(V_{DD}'-V_T)^2} \\
0.4 &< V_{DD}' \leq 1 \\
h_i &\geq 1 \\
m &> 0
\end{aligned} \tag{7}$$

The optimum occurs when all gates have size 1 except the last stage, which has size 2, and $VDD$ is 0.87V. The capacitance of the path is then 49, so the energy is $0.87^2 \times 49 = 37.09$, which is 82.97% lower than the reference energy. This is the best solution found in this paper. The size-Vdd optimization for minimum energy is shown in Table 3.

<table>
<thead>
<tr>
<th>Size</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>1.0</th>
<th>2.0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Delay</td>
<td>68.0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
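
The three strategies can be compared directly with $E = V_{DD}^2 \cdot C_L$. The sketch below uses only the capacitance and voltage figures quoted in Sections 3.2 to 3.4 and the reference energy of 217.80 (the minimum-delay design at 1 V); the dictionary keys are hypothetical labels.

```python
E_REF = 217.80  # reference energy: C_total = 217.80 at VDD = 1 V

# (path capacitance, supply voltage) quoted for each strategy
strategies = {
    "vdd_only": (217.80, 0.775),
    "size_only": (48.0, 1.0),
    "size_and_vdd": (49.0, 0.87),
}


def energy(c_load: float, vdd: float) -> float:
    """E = VDD^2 * C_L, in the same normalised units as the text."""
    return vdd ** 2 * c_load


def saving_percent(e: float, e_ref: float = E_REF) -> float:
    """Energy reduction relative to the reference, in percent."""
    return 100.0 * (1.0 - e / e_ref)


results = {
    name: (energy(c, v), saving_percent(energy(c, v)))
    for name, (c, v) in strategies.items()
}
best = min(results, key=lambda name: results[name][0])

if __name__ == "__main__":
    for name, (e, s) in results.items():
        print(f"{name:13s} E = {e:7.2f}  saving = {s:5.2f}%")
    print("lowest energy:", best)
```

With these inputs, combined size and voltage scaling gives the lowest energy, followed by size-only and then voltage-only scaling.
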

### 3.5 Discussion

According to the analysis and calculation of the three methods, several conclusions can be drawn:

1. Starting from the minimum-delay design, reducing $VDD$ gives a small percentage energy reduction relative to the percentage delay increase, which suggests that, within a certain range, delay is more sensitive than energy to changes in $VDD$. Reducing the size, by contrast, gives a large percentage energy reduction relative to the percentage delay increase. Therefore, within a certain range, energy is more sensitive to size than delay is.

2. When considering the method of only changing $VDD$ or only changing SIZE, the energy optimization obtained by changing SIZE is far greater than that obtained by changing $VDD$. 

3. Changing $VDD$ and size at the same time achieves a larger energy reduction than either of the previous two schemes. The underlying idea is to apply, at each point, whichever of the two adjustments gives the steepest decrease in energy. A schematic diagram of the energy-delay optimization strategy is shown in Figure 6.

![Figure 6 Schematic diagram of energy-delay optimization strategy](image)

4. Conclusions

This paper explores the delay and energy optimization of the critical path of the absolute value comparator, taking the unit inverter with a width to length ratio of 2:1 as the reference. First, two similar potential critical paths are compared by their minimum path delay. The first path has 16 stages and a minimum delay of $53.696\,t_{p0}$; the second path has 15 stages and a minimum delay of $53.045\,t_{p0}$. The critical path is therefore Path 1. The sizes of the logic gates on this path at minimum delay are then calculated.

Next, this paper divides the energy optimization of the critical path into three categories. The first category changes only the power supply voltage Vdd without changing the device size. The optimal voltage is 0.775V, and the optimal energy consumption is 130.82. The second category changes the transistor size without changing the power supply voltage. The optimal size is shown in section 3.3, and the optimal energy consumption is 48. The third category changes both the supply voltage and the size. The optimal voltage is 0.87V, and the optimal energy consumption is 37.09.

The optimal energy-delay model achieves nearly 83% energy savings at the expense of only 50% additional delay. This greatly reduces the area and heat generation of the chip and provides a more feasible option for applications in neural signal acquisition.

References


