
Hardware Acceleration of EDA Algorithms - P7

Pages: 20      File type: PDF      Size: 217.36 KB

Document information:

Hardware Acceleration of EDA Algorithms - P7: Single-threaded software applications have ceased to see significant gains in performance on a general-purpose CPU, even with further scaling in very large scale integration (VLSI) technology. This is a significant problem for electronic design automation (EDA) applications, since the design complexity of VLSI integrated circuits (ICs) is continuously growing. In this research monograph, we evaluate custom ICs, field-programmable gate arrays (FPGAs), and graphics processors as platforms for accelerating EDA algorithms, instead of the general-purpose single-threaded CPU...
Content extracted from the document:
... NVIDIA GeForce GTX 280 GPU card. Experimental results indicate that this approach can obtain an average speedup of about 818× as compared to a serial CPU implementation. With the recently announced cards with quad GTX 280 GPUs, we estimate that our approach would attain a speedup of over 2,400×.

• Accelerating Fault Simulation on a Graphics Processor

In today's complex digital designs, with possibly several million gates, the number of faulty variations of the design can be dramatically higher. Fault simulation is an important but expensive step of the VLSI design flow, and it helps to identify faulty designs. Given a digital design and a set of input vectors V defined over its primary inputs, fault simulation evaluates the number of stuck-at faults Fsim that are tested by applying the vectors V. The ratio of Fsim to the total number of faults in the design, Ftotal, is a measure of the fault coverage. The task of finding this ratio is often referred to as fault grading in the industry. Given the high computational cost of fault simulation, it is extremely important to explore ways to accelerate this application. The ideal fault simulation approach should be fast, scalable, and cost-effective. In Chapter 8, we study the acceleration of fault simulation on a GPU. Fault simulation is inherently parallelizable, and the large number of threads that can be executed in parallel on a GPU can be employed to perform a large number of gate evaluations in parallel. We implement a pattern- and fault-parallel fault simulator, which fault-simulates a circuit in a levelized fashion. We ensure that all threads of the GPU compute identical instructions, but on different data. Fault injection is also performed along with gate evaluation, with each thread using a different fault injection mask. Since GPUs have an extremely large memory bandwidth, we implement each of our fault simulation threads (which execute in parallel with no data dependencies) using memory lookup (see the first sketch after this outline). Our experiments indicate that our approach, implemented on a single NVIDIA GeForce GTX 280 GPU card, can simulate on average 47× faster when compared to an industrial fault simulator. On a Tesla (8-GPU) system, our approach is potentially 300× faster.

• Fault Table Generation Using a Graphics Processor

A fault table is essential for fault diagnosis during VLSI testing and debug. Generating a fault table requires extensive fault simulation, with no fault dropping. This is extremely expensive from a computational standpoint. We explore the generation of a fault table using a GPU in Chapter 9. We employ a pattern parallel approach, which utilizes both bit parallelism and thread-level parallelism. Our implementation is a significantly modified version of FSIM, which is a pattern parallel fault simulation approach for single-core processors. Like FSIM, our approach utilizes critical path tracing and the dominator concept to reduce runtime by pruning unnecessary simulations. Further modifications to FSIM allow us to maximally harness the GPU's immense memory bandwidth and high computational power. In this approach we do not store the circuit (or any part of the circuit) on the GPU. We implement efficient parallel reduction operations to speed up fault table generation (see the second sketch after this outline). In comparison to FSIM∗, which is FSIM modified to generate a fault table on a single-core processor, our approach on a single NVIDIA Quadro FX 5800 GPU card can generate a fault table 15× faster on average. On a Tesla (8-GPU) system, our approach can potentially generate the same fault table 90× faster.

• Fast Circuit Simulation Using Graphics Processor

SPICE-based circuit simulation is a traditional workhorse in the VLSI design process. Given the pivotal role of SPICE in the IC design flow, there has been significant interest in accelerating SPICE. Since a large fraction (on average 75%) of the SPICE runtime is spent in evaluating transistor model equations, a significant speedup can be achieved if these evaluations are accelerated (see the third sketch after this outline). We study the speedup obtained by implementing the transistor model evaluation on a GPU and porting it to a commercial fast SPICE tool in Chapter 10. Our experiments demonstrate that significant speedups (2.36× on average) can be obtained for the commercial fast SPICE tool. The asymptotic speedup that can be obtained is about 4×. We demonstrate that with circuits consisting of as few as 1,000 transistors, speedups in the neighborhood of this asymptotic value can be obtained.
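To make the memory-lookup idea in the fault simulation bullet concrete, here is a minimal CUDA sketch: every thread executes the same instructions, evaluates a 2-input gate by indexing a 4-entry truth table in memory, and applies its own fault injection mask fused with the evaluation. All names (gate_eval_kernel, the 0x00/0xFF mask encoding, the toy AND gate) are illustrative assumptions, not the book's actual implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread evaluates the same 2-input gate for a different (pattern, fault)
// pair: a truth-table lookup replaces branching, and a per-thread mask injects
// that thread's fault at the gate output. Hypothetical names throughout.
__global__ void gate_eval_kernel(const unsigned char* tt,        // 4-entry truth table
                                 const unsigned char* a,         // input a, per thread
                                 const unsigned char* b,         // input b, per thread
                                 const unsigned char* inj_mask,  // 0x00 fault-free, 0xFF inject
                                 const unsigned char* inj_val,   // stuck-at value where injected
                                 unsigned char* out, int n)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n) return;
    unsigned char good = tt[(a[t] << 1) | b[t]];                 // gate evaluation by lookup
    out[t] = (good & ~inj_mask[t]) | (inj_val[t] & inj_mask[t]); // fused fault injection
}

int main() {
    const int n = 4;
    unsigned char tt[4]  = {0, 0, 0, 1};             // truth table of an AND gate
    unsigned char a[n]   = {0, 1, 1, 1};
    unsigned char b[n]   = {1, 1, 1, 1};
    unsigned char msk[n] = {0x00, 0x00, 0xFF, 0xFF}; // threads 2,3 inject stuck-at-0
    unsigned char val[n] = {0, 0, 0, 0};
    unsigned char out[n];

    unsigned char *d_tt, *d_a, *d_b, *d_m, *d_v, *d_o;
    cudaMalloc(&d_tt, 4); cudaMalloc(&d_a, n); cudaMalloc(&d_b, n);
    cudaMalloc(&d_m, n);  cudaMalloc(&d_v, n); cudaMalloc(&d_o, n);
    cudaMemcpy(d_tt, tt, 4, cudaMemcpyHostToDevice);
    cudaMemcpy(d_a, a, n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_m, msk, n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_v, val, n, cudaMemcpyHostToDevice);

    gate_eval_kernel<<<1, n>>>(d_tt, d_a, d_b, d_m, d_v, d_o, n);
    cudaMemcpy(out, d_o, n, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("thread %d: gate output = %d\n", i, out[i]);      // expect 0 1 0 0
    return 0;
}
```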
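The parallel reduction mentioned in the fault table bullet can be sketched as a standard shared-memory tree reduction whose combining operator is bitwise OR over per-thread fault-detection words (bit k set when the fault is detected by pattern k). The kernel name and data layout below (detect_reduce, one 32-pattern word per thread) are hypothetical, not FSIM∗'s.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// OR-reduction of per-thread fault-detection words in shared memory. Bit k of
// detect[i] is set when slice i detects the fault under pattern k; each block
// collapses its slice into one partial fault-table row word.
__global__ void detect_reduce(const unsigned int* detect, unsigned int* row, int n)
{
    extern __shared__ unsigned int sh[];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    sh[tid] = (i < n) ? detect[i] : 0u;
    __syncthreads();
    // Classic tree reduction; the combining operator is bitwise OR.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sh[tid] |= sh[tid + s];
        __syncthreads();
    }
    if (tid == 0) row[blockIdx.x] = sh[0];  // one partial row word per block
}

int main() {
    const int n = 256, threads = 128, blocks = 2;
    unsigned int h[n];
    for (int i = 0; i < n; ++i) h[i] = 1u << (i % 32);  // toy detection data
    unsigned int *d_in, *d_out, part[blocks];
    cudaMalloc(&d_in, n * sizeof(unsigned int));
    cudaMalloc(&d_out, blocks * sizeof(unsigned int));
    cudaMemcpy(d_in, h, n * sizeof(unsigned int), cudaMemcpyHostToDevice);
    detect_reduce<<<blocks, threads, threads * sizeof(unsigned int)>>>(d_in, d_out, n);
    cudaMemcpy(part, d_out, blocks * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    printf("row words: 0x%08x 0x%08x\n", part[0], part[1]);  // expect 0xffffffff twice
    return 0;
}
```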
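The claim that transistor model evaluation parallelizes well can be illustrated with one thread per device. The sketch below evaluates a textbook square-law MOSFET model; a commercial fast SPICE tool evaluates far more complex BSIM-class equations, so this is only a stand-in for the structure of the computation, and the names and parameter values are made up.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per transistor: evaluate a square-law MOSFET drain-current model.
// Only the embarrassingly parallel structure matters here; real model
// evaluation is far richer. All names are illustrative.
__global__ void mosfet_eval(const float* vgs, const float* vds, float* ids,
                            int n, float k, float vth)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float vov = vgs[i] - vth;                                        // overdrive voltage
    float id;
    if (vov <= 0.0f)        id = 0.0f;                               // cutoff
    else if (vds[i] < vov)  id = k * (vov - 0.5f * vds[i]) * vds[i]; // triode
    else                    id = 0.5f * k * vov * vov;               // saturation
    ids[i] = id;
}

int main() {
    const int n = 3;
    float vgs[n] = {0.5f, 1.2f, 2.0f};
    float vds[n] = {1.0f, 0.2f, 1.5f};
    float ids[n];
    float *d_vgs, *d_vds, *d_ids;
    cudaMalloc(&d_vgs, n * sizeof(float));
    cudaMalloc(&d_vds, n * sizeof(float));
    cudaMalloc(&d_ids, n * sizeof(float));
    cudaMemcpy(d_vgs, vgs, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_vds, vds, n * sizeof(float), cudaMemcpyHostToDevice);
    mosfet_eval<<<1, n>>>(d_vgs, d_vds, d_ids, n, 2e-4f, 0.7f);  // made-up k, Vth
    cudaMemcpy(ids, d_ids, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("device %d: Id = %.3e A\n", i, ids[i]);
    return 0;
}
```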
Chapter 7
Accelerating Statistical Static Timing Analysis Using Graphics Processors

7.1 Chapter Overview

In this chapter, we explore the implementation of Monte Carlo based statistical static timing analysis (SSTA) on a graphics processing unit (GPU). SSTA via Monte Carlo simulations is a computationally expensive, but important, step required to achieve design timing closure. It provides an accurate estimate of delay variations and their impact on design yield. The large number of threads that can be computed in parallel on a GPU suggests a natural fit for the problem of Monte Carlo based SSTA to the GPU platform. Our implementation performs multiple delay simulations for a single gate in parallel. A parallel implementation of the Mersenne Twister pseudo-random number generator on the GPU, followed by Box–Muller transformations (also implemented on the GPU), is used for generating gate delay numbers from a normal distribution (a sketch of this step appears below). The μ and σ of the pin-to-output delay distributions for all inputs of every gate are obtained using a memory lookup, which benefits from the large memory bandwidth of the GPU. Threads ...
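A minimal sketch of the Box–Muller step described above: each thread converts a pair of uniform samples into one normally distributed delay, scaling by μ and σ fetched from a per-gate lookup table. The uniforms are supplied by the host here for brevity (the chapter generates them with a parallel Mersenne Twister on the GPU); the kernel and array names, and the μ/σ values, are illustrative.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Box–Muller on the GPU: each thread turns two uniforms in (0,1] into one
// N(0,1) variate and scales it with (mu, sigma) fetched by table lookup.
__global__ void box_muller_delays(const float* u1, const float* u2,
                                  const float* mu, const float* sigma,
                                  const int* gate_id, float* delay, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float r = sqrtf(-2.0f * logf(u1[i]));  // radius from the first uniform
    float z = r * cospif(2.0f * u2[i]);    // cospif(x) = cos(pi * x)
    int g = gate_id[i];                    // memory lookup of distribution params
    delay[i] = mu[g] + sigma[g] * z;
}

int main() {
    const int n = 8;
    float u1[n], u2[n], delay[n];
    for (int i = 0; i < n; ++i) {          // host-side uniforms in (0,1]
        u1[i] = (rand() + 1.0f) / ((float)RAND_MAX + 1.0f);
        u2[i] = (rand() + 1.0f) / ((float)RAND_MAX + 1.0f);
    }
    float mu[2]    = {100.0f, 250.0f};     // made-up pin-to-output delay means (ps)
    float sigma[2] = {10.0f, 25.0f};       // made-up standard deviations (ps)
    int   gid[n]   = {0, 0, 0, 0, 1, 1, 1, 1};

    float *d_u1, *d_u2, *d_mu, *d_sg, *d_dl; int *d_g;
    cudaMalloc(&d_u1, n * sizeof(float)); cudaMalloc(&d_u2, n * sizeof(float));
    cudaMalloc(&d_mu, 2 * sizeof(float)); cudaMalloc(&d_sg, 2 * sizeof(float));
    cudaMalloc(&d_dl, n * sizeof(float)); cudaMalloc(&d_g, n * sizeof(int));
    cudaMemcpy(d_u1, u1, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_u2, u2, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_mu, mu, 2 * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_sg, sigma, 2 * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_g, gid, n * sizeof(int), cudaMemcpyHostToDevice);

    box_muller_delays<<<1, n>>>(d_u1, d_u2, d_mu, d_sg, d_g, d_dl, n);
    cudaMemcpy(delay, d_dl, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("sample %d (gate %d): delay = %.1f ps\n", i, gid[i], delay[i]);
    return 0;
}
```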