|
1. INTRODUCTION
Internet of Things (IoT) technology places increasingly stringent requirements on the area and power consumption of SoCs (systems on chip). Memory units such as SRAM (static random access memory) usually occupy 1/2 to 2/3 of the chip area, which makes them a prime target for area and power optimization. The 6T SRAM provided by foundries is designed for general use, and its area and power consumption are not ideal. The basic DRAM memory cell considered here is a 2T structure, which has a smaller size and lower leakage current [1]. Replacing SRAM with DRAM can therefore significantly reduce chip area and power consumption. However, the retention time of DRAM is short, so the stored charge must be refreshed periodically, and this may lead to a small number of bit errors. To guarantee data correctness, the erroneous bits must be corrected, and the correction circuit must be simple; otherwise the storage array would account for too small a share of the total DRAM area, which defeats the optimization goal [2-3]. Space radiation often causes errors in several adjacent bits, so an interleaving circuit is adopted in the design to correct them. Access time is an important memory metric, and the error search circuit accounts for most of it. A parallel search circuit is much faster than a serial one, so a p-channel parallel Chien search is adopted. BCH codes are linear error-correcting codes that correct multiple random errors, with strong correction capability, convenient construction and simple encoding. Against this background, a BCH-based ECC (error-correcting code) circuit is designed and implemented; it effectively corrects random errors in DRAM and ensures data accuracy and interference immunity [4].

2. DRAM DATA ERROR ANALYSIS
DRAM data errors are mainly caused by data attenuation, refresh errors and particle radiation.
2.1 2T storage data attenuation characteristics, α particles and space radiation
The 2T storage structure and its read/write circuit are simple, but the cell has no latch structure. The data is held mainly by the gate parasitic capacitance, so the retention time is short and data errors appear over time. The attenuation of the stored data over time is shown in Figure 1. Without optimization, the data retention time is only about 100 μs in a 55 nm process. In large-scale integrated circuits, packaging materials contain trace amounts of radioactive substances, and α particles are generated as these substances decay [5]. Because the 2T memory cell stores data on the gate parasitic capacitance, it is more vulnerable to α particles and other high-energy particles. The error rates for 90 nm and 55 nm processes are shown in Figure 2.

2.2 Sub-operation interleaved implicit refresh
The 2T storage cells are arranged in a folded structure to form the storage array. The write operation is row-destructive: writing the target cell overwrites the other data in the same row [6-8]. The design therefore adopts sub-operation interleaving with implicit refresh, in which the refresh operation is hidden between two external accesses and runs in parallel with them. The refresh takes place during the second half of the previous access and the first half of the next access, as shown in Figure 3. When a refresh and an access target the same row, the gate circuit is briefly in an uncertain switching state, which can cause data errors.

3. BCH CODING ERROR CORRECTION
Given the error characteristics of the memory cells described above, and considering the area and frequency targets of the DRAM, interleaved BCH coding is used to correct erroneous data.

3.1 BCH coding principle and code interleaving
The BCH code is a cyclic linear code. Its generator polynomial is given in Formula 1.
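In its standard textbook form, given here as a reference sketch rather than as the exact expression used in the design, the generator polynomial of a binary BCH code that corrects t errors can be written as follows. The symbols m_i(x), the minimal polynomial of α^i over GF(2), and α, a primitive element of GF(2^m), are conventional notation assumed for this illustration.

```latex
g(x) = \mathrm{LCM}\bigl[\, m_1(x),\ m_2(x),\ \ldots,\ m_{2t}(x) \,\bigr]
```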
LCM denotes the least common multiple. The minimum code distance satisfies d ≥ 2t + 1, so the code can correct t errors. In the encoding process, the message polynomial is m(x), the generator polynomial is g(x), and the transmitted polynomial is r(x) = m(x)g(x). In the decoding process, the received polynomial is r(x) and the remainder mod(r(x), g(x)) is computed. If the remainder is nonzero, there is a bit error, and the erroneous bit can be located through the error locator polynomial; if the remainder is zero, there is no bit error. Encoding can be implemented with a shift register with a feedback loop, whose connections are determined by the coefficients of the generator polynomial. The feedback shift register is shown in Figure 4.
Bit errors caused by space radiation are generally bursts of consecutive bits. If the data is stored consecutively without interleaving, a0, a1, a2 and a3 share one check circuit, which cannot correct more than 2 erroneous bits. The non-interleaved arrangement is shown in Figure 5(a). The data is therefore arranged in four interleaved groups, so that a0, b0, c0 and d0 each use an independent check circuit. With four check circuits, each correcting up to 2 bits, a burst of up to 8 consecutive bit errors can be corrected. The interleaved arrangement is shown in Figure 5(b).

3.2 BCH coding optimization
The BCH code is finally converted into a series of XOR operations to facilitate DRAM layout and functional tuning. Some of the intermediate terms are as follows.
s1 = bit1 ^ bit2 ^ bit4 ^ bit5 ^ bit7 ^ bit9 ^ bit11 ^ bit12 ^ bit14 ^ bit16 ^ bit18
s2 = bit1 ^ bit3 ^ bit4 ^ bit6 ^ bit7 ^ bit10 ^ bit11 ^ bit13 ^ bit14 ^ bit17 ^ bit18
s3 = bit2 ^ bit3 ^ bit4 ^ bit8 ^ bit9 ^ bit10 ^ bit11 ^ bit15 ^ bit16 ^ bit17 ^ bit18
Let:
t1 = bit5 ^ bit12
t2 = bit6 ^ bit13
t3 = bit8 ^ bit15
t12 = bit1 ^ bit7 ^ bit14
t13 = bit2 ^ bit9 ^ bit16
t23 = bit3 ^ bit10 ^ bit17
t123 = bit4 ^ bit11 ^ bit18
Then:
s1 = t1 ^ t12 ^ t13 ^ t123
s2 = t2 ^ t12 ^ t23 ^ t123
s3 = t3 ^ t13 ^ t23 ^ t123
Computed directly, the three 11-input expressions need 10 two-input XOR gates each, 30 in total. With the shared terms extracted, the seven intermediate terms need 11 gates and the three outputs need 9, so only 20 XOR gates are required, a saving of about 33% in XOR gate count and the corresponding area.

3.3 Implementation and optimization of parallel Chien search circuit
A p-channel parallel Chien search is adopted instead of a serial Chien search to locate and correct the erroneous bits. It needs only 1/p of the original search time, which greatly shortens error decoding and raises the achievable operating frequency of the DRAM. The p-channel parallel Chien search circuit is shown in Figure 6 [9], in which α to α^t are the multiplier coefficients. Each multiplier is ultimately implemented with XOR gates, so the area and wiring overhead are small.

4. SIMULATION AND VERIFICATION
A Python model of the ECC algorithm was built for simulation; the results are shown in Figure 7. The black circles represent the original data before it is written to the DRAM, an arithmetic sequence with a step of 1; the blue squares represent the data after random errors (at most 2 erroneous bits, at arbitrary positions) have been injected in the DRAM; and the red triangles represent the data after correction. The abscissa is the accessed address and the ordinate is the data value. The figure shows that after ECC correction the data is identical to the original data, that is, the circles and triangles coincide completely.

5. CONCLUSION
A digital IP that accurately checks and corrects DRAM data using BCH coding has been designed and implemented.
Data interleaving is used in the ECC circuit to improve the error correction capability, common terms are extracted to reduce the number of gates, and a parallel Chien search circuit is used to shorten the error location time. The design occupies a small area and is suitable for integration in large-scale standard logic processes. It has been verified in a 55 nm SoC chip project: at a main frequency of 200 MHz, DRAM read, write and refresh operate normally and the data is correct.

REFERENCES
[1] Ansari, M. and Singh, J., “Capacitorless 2T-DRAM for Higher Retention Time and Sense Margin,” IEEE T. Electron Dev 67(3), 902–906 (2020). https://doi.org/10.1109/TED.16
[2] Bang, S., Han, K., Kahng, A. B. and Luo, M., “Delay uncertainty and signal criticality driven routing channel optimization for advanced DRAM products,” ASP-DAC, 697–704 (2016).
[3] Shin, W., Choi, J., Jang, J., Suh, J., Moon, Y. and Kwon, Y., “DRAM-Latency Optimization Inspired by Relationship between Row-Access Time and Refresh Timing,” IEEE T. Comput 65(10), 3027–3040 (2016). https://doi.org/10.1109/TC.2015.2512863
[4] Micheloni, R., Ravasio, R., Marelli, A., Alice, E., Altieri, V. and Bovino, A., “A 4Gb 2b/cell NAND Flash Memory with Embedded 5b BCH ECC for 36MB/s System Read Throughput,” ISSCC, 497–506 (2006).
[5] Sasada, T., Ichikawa, S. and Kanai, T., “In-flight measurement of space radiation effects on commercial DRAM,” ICM, 480–483 (2004).
[6] Pattabiraman, K., Zorn, B. G., Liu, S. and Moscibroda, T., “Flikker: Saving DRAM Refresh-power through Critical Data Partitioning,” ASPLOS (2011).
[7] Tillinghast, C. W., Cohen, M. S. and Voshell, T. W., “Temperature-dependent DRAM refresh circuit,” US (1994).
[8] Ohsawa, T., Kai, K. and Murakami, K., “Optimizing the DRAM refresh count for merged DRAM/logic LSIs,” IEEE Cat. No. 98TH8379, 82–87 (1998).
[9] Chen, Y. and Parhi, K. K., “Small area parallel Chien search architectures for long BCH codes,” IEEE T. VLSI Syst 12(5), 545–549 (2004). https://doi.org/10.1109/TVLSI.2004.826203
|