
Hardware Acceleration of EDA Algorithms - P7

Pages: 20      File type: PDF      Size: 217.36 KB

Document information:

Hardware Acceleration of EDA Algorithms - P7: Single-threaded software applications have ceased to see significant gains in performance on a general-purpose CPU, even with further scaling in very large scale integration (VLSI) technology. This is a significant problem for electronic design automation (EDA) applications, since the design complexity of VLSI integrated circuits (ICs) is continuously growing. In this research monograph, we evaluate custom ICs, field-programmable gate arrays (FPGAs), and graphics processors as platforms for accelerating EDA algorithms, instead of the general-purpose single-threaded CPU...
Content extracted from the document:
... NVIDIA GeForce GTX 280 GPU card. Experimental results indicate that this approach can obtain an average speedup of about 818× as compared to a serial CPU implementation. With the recently announced cards with quad GTX 280 GPUs, we estimate that our approach would attain a speedup of over 2,400×.

• Accelerating Fault Simulation on a Graphics Processor

In today's complex digital designs, with possibly several million gates, the number of faulty variations of the design can be dramatically higher. Fault simulation is an important but expensive step of the VLSI design flow, and it helps to identify faulty designs. Given a digital design and a set of input vectors V defined over its primary inputs, fault simulation evaluates the number of stuck-at faults Fsim that are tested by applying the vectors V. The ratio of Fsim to the total number of faults in the design, Ftotal, is a measure of the fault coverage. The task of finding this ratio is often referred to as fault grading in the industry. Given the high computational cost of fault simulation, it is extremely important to explore ways to accelerate this application. The ideal fault simulation approach should be fast, scalable, and cost-effective. In Chapter 8, we study the acceleration of fault simulation on a GPU. Fault simulation is inherently parallelizable, and the large number of threads that can be executed in parallel on a GPU can be employed to perform a large number of gate evaluations in parallel. We implement a pattern- and fault-parallel fault simulator, which fault-simulates a circuit in a levelized fashion. We ensure that all threads of the GPU compute identical instructions, but on different data. Fault injection is also performed along with gate evaluation, with each thread using a different fault injection mask. Since GPUs have an extremely large memory bandwidth, we implement each of our fault simulation threads (which execute in parallel with no data dependencies) using memory lookup (see the first sketch after this outline). Our experiments indicate that our approach, implemented on a single NVIDIA GeForce GTX 280 GPU card, can simulate on average 47× faster when compared to an industrial fault simulator. On a Tesla (8-GPU) system, our approach is potentially 300× faster.

• Fault Table Generation Using a Graphics Processor

A fault table is essential for fault diagnosis during VLSI testing and debug. Generating a fault table requires extensive fault simulation, with no fault dropping. This is extremely expensive from a computational standpoint. We explore the generation of a fault table using a GPU in Chapter 9. We employ a pattern parallel approach, which utilizes both bit parallelism and thread-level parallelism. Our implementation is a significantly modified version of FSIM, which is a pattern parallel fault simulation approach for single-core processors. Like FSIM, our approach utilizes critical path tracing and the dominator concept to reduce runtime by pruning unnecessary simulations. Further modifications to FSIM allow us to maximally harness the GPU's immense memory bandwidth and high computational power. In this approach we do not store the circuit (or any part of the circuit) on the GPU. We implement efficient parallel reduction operations to speed up fault table generation (see the second sketch after this outline). In comparison to FSIM∗, which is FSIM modified to generate a fault table on a single-core processor, our approach on a single NVIDIA Quadro FX 5800 GPU card can generate a fault table 15× faster on average. On a Tesla (8-GPU) system, our approach can potentially generate the same fault table 90× faster.

• Fast Circuit Simulation Using Graphics Processor

SPICE-based circuit simulation is a traditional workhorse in the VLSI design process. Given the pivotal role of SPICE in the IC design flow, there has been significant interest in accelerating SPICE. Since a large fraction (on average 75%) of the SPICE runtime is spent in evaluating transistor model equations, a significant speedup can be achieved if these evaluations are accelerated (see the third sketch after this outline). We study the speedup obtained by implementing the transistor model evaluation on a GPU and porting it to a commercial fast SPICE tool in Chapter 10. Our experiments demonstrate that significant speedups (2.36× on average) can be obtained for the commercial fast SPICE tool. The asymptotic speedup that can be obtained is about 4×. We demonstrate that with circuits consisting of as few as 1,000 transistors, speedups in the neighborhood of this asymptotic value can be obtained.
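To make the memory-lookup idea in the fault simulation bullet concrete, here is a minimal CUDA sketch: every thread executes the same instructions, evaluates a 2-input gate by indexing a 4-entry truth table in memory, and applies its own fault injection mask fused with the evaluation. All names (gate_eval_kernel, the 0x00/0xFF mask encoding, the toy AND gate) are illustrative assumptions, not the book's actual implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread evaluates the same 2-input gate for a different (pattern, fault)
// pair: a truth-table lookup replaces branching, and a per-thread mask injects
// that thread's fault at the gate output. Hypothetical names throughout.
__global__ void gate_eval_kernel(const unsigned char* tt,        // 4-entry truth table
                                 const unsigned char* a,         // input a, per thread
                                 const unsigned char* b,         // input b, per thread
                                 const unsigned char* inj_mask,  // 0x00 fault-free, 0xFF inject
                                 const unsigned char* inj_val,   // stuck-at value where injected
                                 unsigned char* out, int n)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n) return;
    unsigned char good = tt[(a[t] << 1) | b[t]];                 // gate evaluation by lookup
    out[t] = (good & ~inj_mask[t]) | (inj_val[t] & inj_mask[t]); // fused fault injection
}

int main() {
    const int n = 4;
    unsigned char tt[4]  = {0, 0, 0, 1};             // truth table of an AND gate
    unsigned char a[n]   = {0, 1, 1, 1};
    unsigned char b[n]   = {1, 1, 1, 1};
    unsigned char msk[n] = {0x00, 0x00, 0xFF, 0xFF}; // threads 2,3 inject stuck-at-0
    unsigned char val[n] = {0, 0, 0, 0};
    unsigned char out[n];

    unsigned char *d_tt, *d_a, *d_b, *d_m, *d_v, *d_o;
    cudaMalloc(&d_tt, 4); cudaMalloc(&d_a, n); cudaMalloc(&d_b, n);
    cudaMalloc(&d_m, n);  cudaMalloc(&d_v, n); cudaMalloc(&d_o, n);
    cudaMemcpy(d_tt, tt, 4, cudaMemcpyHostToDevice);
    cudaMemcpy(d_a, a, n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_m, msk, n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_v, val, n, cudaMemcpyHostToDevice);

    gate_eval_kernel<<<1, n>>>(d_tt, d_a, d_b, d_m, d_v, d_o, n);
    cudaMemcpy(out, d_o, n, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("thread %d: gate output = %d\n", i, out[i]);      // expect 0 1 0 0
    return 0;
}
```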
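The parallel reduction mentioned in the fault table bullet can be sketched as a standard shared-memory tree reduction whose combining operator is bitwise OR over per-thread fault-detection words (bit k set when the fault is detected by pattern k). The kernel name and data layout below (detect_reduce, one 32-pattern word per thread) are hypothetical, not FSIM∗'s.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// OR-reduction of per-thread fault-detection words in shared memory. Bit k of
// detect[i] is set when slice i detects the fault under pattern k; each block
// collapses its slice into one partial fault-table row word.
__global__ void detect_reduce(const unsigned int* detect, unsigned int* row, int n)
{
    extern __shared__ unsigned int sh[];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    sh[tid] = (i < n) ? detect[i] : 0u;
    __syncthreads();
    // Classic tree reduction; the combining operator is bitwise OR.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sh[tid] |= sh[tid + s];
        __syncthreads();
    }
    if (tid == 0) row[blockIdx.x] = sh[0];  // one partial row word per block
}

int main() {
    const int n = 256, threads = 128, blocks = 2;
    unsigned int h[n];
    for (int i = 0; i < n; ++i) h[i] = 1u << (i % 32);  // toy detection data
    unsigned int *d_in, *d_out, part[blocks];
    cudaMalloc(&d_in, n * sizeof(unsigned int));
    cudaMalloc(&d_out, blocks * sizeof(unsigned int));
    cudaMemcpy(d_in, h, n * sizeof(unsigned int), cudaMemcpyHostToDevice);
    detect_reduce<<<blocks, threads, threads * sizeof(unsigned int)>>>(d_in, d_out, n);
    cudaMemcpy(part, d_out, blocks * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    printf("row words: 0x%08x 0x%08x\n", part[0], part[1]);  // expect 0xffffffff twice
    return 0;
}
```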
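The claim that transistor model evaluation parallelizes well can be illustrated with one thread per device. The sketch below evaluates a textbook square-law MOSFET model; a commercial fast SPICE tool evaluates far more complex BSIM-class equations, so this is only a stand-in for the structure of the computation, and the names and parameter values are made up.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One thread per transistor: evaluate a square-law MOSFET drain-current model.
// Only the embarrassingly parallel structure matters here; real model
// evaluation is far richer. All names are illustrative.
__global__ void mosfet_eval(const float* vgs, const float* vds, float* ids,
                            int n, float k, float vth)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float vov = vgs[i] - vth;                                        // overdrive voltage
    float id;
    if (vov <= 0.0f)        id = 0.0f;                               // cutoff
    else if (vds[i] < vov)  id = k * (vov - 0.5f * vds[i]) * vds[i]; // triode
    else                    id = 0.5f * k * vov * vov;               // saturation
    ids[i] = id;
}

int main() {
    const int n = 3;
    float vgs[n] = {0.5f, 1.2f, 2.0f};
    float vds[n] = {1.0f, 0.2f, 1.5f};
    float ids[n];
    float *d_vgs, *d_vds, *d_ids;
    cudaMalloc(&d_vgs, n * sizeof(float));
    cudaMalloc(&d_vds, n * sizeof(float));
    cudaMalloc(&d_ids, n * sizeof(float));
    cudaMemcpy(d_vgs, vgs, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_vds, vds, n * sizeof(float), cudaMemcpyHostToDevice);
    mosfet_eval<<<1, n>>>(d_vgs, d_vds, d_ids, n, 2e-4f, 0.7f);  // made-up k, Vth
    cudaMemcpy(ids, d_ids, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("device %d: Id = %.3e A\n", i, ids[i]);
    return 0;
}
```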
Chapter 7
Accelerating Statistical Static Timing Analysis Using Graphics Processors

7.1 Chapter Overview

In this chapter, we explore the implementation of Monte Carlo based statistical static timing analysis (SSTA) on a graphics processing unit (GPU). SSTA via Monte Carlo simulations is a computationally expensive, but important, step required to achieve design timing closure. It provides an accurate estimate of delay variations and their impact on design yield. The large number of threads that can be computed in parallel on a GPU suggests a natural fit for the problem of Monte Carlo based SSTA to the GPU platform. Our implementation performs multiple delay simulations for a single gate in parallel. A parallel implementation of the Mersenne Twister pseudo-random number generator on the GPU, followed by Box–Muller transformations (also implemented on the GPU), is used for generating gate delay numbers from a normal distribution (a sketch of this step appears below). The μ and σ of the pin-to-output delay distributions for all inputs of every gate are obtained using a memory lookup, which benefits from the large memory bandwidth of the GPU. Threads ...
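A minimal sketch of the Box–Muller step described above: each thread converts a pair of uniform samples into one normally distributed delay, scaling by μ and σ fetched from a per-gate lookup table. The uniforms are supplied by the host here for brevity (the chapter generates them with a parallel Mersenne Twister on the GPU); the kernel and array names, and the μ/σ values, are illustrative.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Box–Muller on the GPU: each thread turns two uniforms in (0,1] into one
// N(0,1) variate and scales it with (mu, sigma) fetched by table lookup.
__global__ void box_muller_delays(const float* u1, const float* u2,
                                  const float* mu, const float* sigma,
                                  const int* gate_id, float* delay, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float r = sqrtf(-2.0f * logf(u1[i]));  // radius from the first uniform
    float z = r * cospif(2.0f * u2[i]);    // cospif(x) = cos(pi * x)
    int g = gate_id[i];                    // memory lookup of distribution params
    delay[i] = mu[g] + sigma[g] * z;
}

int main() {
    const int n = 8;
    float u1[n], u2[n], delay[n];
    for (int i = 0; i < n; ++i) {          // host-side uniforms in (0,1]
        u1[i] = (rand() + 1.0f) / ((float)RAND_MAX + 1.0f);
        u2[i] = (rand() + 1.0f) / ((float)RAND_MAX + 1.0f);
    }
    float mu[2]    = {100.0f, 250.0f};     // made-up pin-to-output delay means (ps)
    float sigma[2] = {10.0f, 25.0f};       // made-up standard deviations (ps)
    int   gid[n]   = {0, 0, 0, 0, 1, 1, 1, 1};

    float *d_u1, *d_u2, *d_mu, *d_sg, *d_dl; int *d_g;
    cudaMalloc(&d_u1, n * sizeof(float)); cudaMalloc(&d_u2, n * sizeof(float));
    cudaMalloc(&d_mu, 2 * sizeof(float)); cudaMalloc(&d_sg, 2 * sizeof(float));
    cudaMalloc(&d_dl, n * sizeof(float)); cudaMalloc(&d_g, n * sizeof(int));
    cudaMemcpy(d_u1, u1, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_u2, u2, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_mu, mu, 2 * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_sg, sigma, 2 * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_g, gid, n * sizeof(int), cudaMemcpyHostToDevice);

    box_muller_delays<<<1, n>>>(d_u1, d_u2, d_mu, d_sg, d_g, d_dl, n);
    cudaMemcpy(delay, d_dl, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("sample %d (gate %d): delay = %.1f ps\n", i, gid[i], delay[i]);
    return 0;
}
```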