Hardware Acceleration of EDA Algorithms- P10: Single-threaded software applications have ceased to see significant gains in performance on a general-purpose CPU, even with further scaling in very large scale integration (VLSI) technology. This is a significant problem for electronic design automation (EDA) applications, since the design complexity of VLSI integrated circuits (ICs) is continuously growing. In this research monograph, we evaluate custom ICs, field-programmable gate arrays (FPGAs), and graphics processors as platforms for accelerating EDA algorithms, instead of the general-purpose single-threaded CPU...
10.5 Experiments

Table 10.2 Speedup for circuit simulation

Ckt name       # Trans.   Total # eval.   OmegaSIM (s)   AuSIM (s)   SpeedUp
                                          CPU-alone      GPU+CPU
Industrial_1   324        1.86×10^7       49.96          34.06       1.47×
Industrial_2   1,098      2.62×10^9       118.69         38.65       3.07×
Industrial_3   1,098      4.30×10^8       725.35         281.5       2.58×
Buf_1          500        1.62×10^7       27.45          20.26       1.35×
Buf_2          1,000      5.22×10^7       111.5          48.19       2.31×
Buf_3          2,000      2.13×10^8       486.6          164.96      2.95×
ClockTree_1    1,922      1.86×10^8       345.69         132.59      2.61×
ClockTree_2    7,682      1.92×10^8       458.98         182.88      2.51×
Avg                                                                  2.36×

Table 10.2 compares the runtime of AuSIM (which is OmegaSIM with our approach integrated; AuSIM runs partly on the GPU and partly on the CPU) against the original OmegaSIM (running on the CPU alone). Columns 1 and 2 report the circuit name and the number of transistors in the circuit, respectively. The number of evaluations required for full circuit simulation is reported in column 3. Columns 4 and 5 report the CPU-alone and GPU+CPU runtimes (in seconds), respectively. The speedups are reported in column 6. The circuits Industrial_1, Industrial_2, and Industrial_3 perform the functionality of an LFSR. Circuits Buf_1, Buf_2, and Buf_3 are buffer insertion instances for buses of three different sizes. Circuits ClockTree_1 and ClockTree_2 are symmetrical H-tree clock distribution networks. These results show that an average speedup of 2.36× can be achieved over a variety of circuits. Also, note that with an increase in the number of transistors in the circuit, the speedup obtained is higher. This is because the GPU memory latencies can be better hidden when more device evaluations are issued in parallel.

The NVIDIA 8800 GPU device supports IEEE 754 single precision floating point operations. However, the BSIM3 model code uses IEEE 754 double precision floating point computations. We first converted all the double precision computations in the BSIM3 code into single precision before modifying it for use on the GPU.
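The effect of narrowing a device equation from double to single precision can be illustrated with a small sketch. It does not use BSIM3: it assumes a toy ideal-diode equation, and the names `to_float32`, `diode_current_f64`, `diode_current_f32`, and `precision_errors`, as well as the parameter values, are hypothetical and chosen only for illustration. Single precision is emulated on the CPU by rounding every intermediate result to an IEEE 754 32-bit value, which approximates (but is not identical to) what a GPU would compute.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def diode_current_f64(v: float, i_s: float = 1e-14, v_t: float = 0.02585) -> float:
    """Toy ideal-diode current in double precision (a stand-in, not BSIM3)."""
    return i_s * (math.exp(v / v_t) - 1.0)


def diode_current_f32(v: float, i_s: float = 1e-14, v_t: float = 0.02585) -> float:
    """Same equation, with every intermediate value rounded to single precision."""
    f = to_float32
    ratio = f(f(v) / f(v_t))
    e = f(math.exp(ratio))  # approximation of a single-precision exp
    return f(f(i_s) * f(e - 1.0))


def precision_errors(n: int = 10_000):
    """Return (max absolute, average absolute, average relative) error over n bias points."""
    max_abs = 0.0
    sum_abs = 0.0
    sum_rel = 0.0
    for k in range(1, n + 1):
        v = 0.8 * k / n  # sweep the bias from just above 0 V to 0.8 V
        ref = diode_current_f64(v)
        err = abs(diode_current_f32(v) - ref)
        max_abs = max(max_abs, err)
        sum_abs += err
        sum_rel += err / abs(ref)
    return max_abs, sum_abs / n, sum_rel / n


if __name__ == "__main__":
    max_abs, avg_abs, avg_rel = precision_errors()
    print(f"max abs error: {max_abs:.3e} A")
    print(f"avg abs error: {avg_abs:.3e} A")
    print(f"avg rel error: {avg_rel:.3e}")
```

The three quantities returned are the same kinds of metric the chapter reports (maximum absolute, average absolute, and average relative error), but the numbers this toy model produces are not comparable to the BSIM3 measurements.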
We determined the error that was incurred in this process. We found that the accuracy obtained by our GPU-based implementation of device model evaluation (using single precision floating point) is extremely close to that of a CPU-based double precision floating point implementation. In particular, we computed the error over 10^6 device model evaluations and found that the maximum absolute error was 9.0×10^−22 Amperes, and the average error was 2.88×10^−26 Amperes. The relative average error was 4.8×10^−5. NVIDIA has announced the availability of GPU devices which support double precision floating point operations. Such devices will further improve the accuracy of our approach.

Figures 10.1 and 10.2 show the voltage plots for the Industrial_2 and Industrial_3 circuits, obtained by running AuSIM and comparing it with SPICE. Notice that the plots completely overlap.

Fig. 10.1 Industrial_2 waveforms
Fig. 10.2 Industrial_3 waveforms

10.6 Chapter Summary

Given the key role of SPICE in the design process, there has been significant interest in accelerating SPICE. A large fraction (on average 75%) of the SPICE runtime is spent in evaluating transistor model equations. The chapter reports our efforts to accelerate transistor model evaluations using a GPU. We have integrated this accelerator with a commercial fast SPICE tool and have shown significant speedups (2.36× on average). The asymptotic speedup that can be obtained is about 4×. With the recently announced quad GPU systems, this speedup could be enhanced further, especially for larger designs.
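The headline numbers can be rechecked with a few lines of arithmetic. The sketch below recomputes the per-circuit speedups and their average from the runtimes in Table 10.2, and evaluates the Amdahl's law bound under the assumption that the 75% of runtime spent in model evaluation is the only part that is accelerated. The function names `speedup` and `amdahl_bound` are hypothetical helpers, not part of any tool described in the chapter.

```python
# (CPU-alone seconds, GPU+CPU seconds) per circuit, copied from Table 10.2
RUNTIMES = {
    "Industrial_1": (49.96, 34.06),
    "Industrial_2": (118.69, 38.65),
    "Industrial_3": (725.35, 281.5),
    "Buf_1": (27.45, 20.26),
    "Buf_2": (111.5, 48.19),
    "Buf_3": (486.6, 164.96),
    "ClockTree_1": (345.69, 132.59),
    "ClockTree_2": (458.98, 182.88),
}


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Speedup of the GPU+CPU run relative to the CPU-alone run."""
    return cpu_seconds / gpu_seconds


def amdahl_bound(accelerated_fraction: float) -> float:
    """Upper bound on overall speedup when only the given fraction of runtime
    is accelerated and that part becomes infinitely fast (Amdahl's law)."""
    return 1.0 / (1.0 - accelerated_fraction)


speedups = {name: speedup(cpu, gpu) for name, (cpu, gpu) in RUNTIMES.items()}
average_speedup = sum(speedups.values()) / len(speedups)

if __name__ == "__main__":
    for name, value in speedups.items():
        print(f"{name:<13} {value:.2f}x")
    print(f"{'Avg':<13} {average_speedup:.2f}x")
    # Assumes 75% of the runtime is transistor model evaluation.
    print(f"Amdahl bound  {amdahl_bound(0.75):.2f}x")
```

With 75% of the runtime accelerated, the remaining 25% runs at its original speed, so the bound is 1 / 0.25 = 4, which matches the asymptotic figure quoted in the summary.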