Danh mục tài liệu

Database Systems: The Complete Book- P8

Số trang: 50      Loại file: pdf      Dung lượng: 4.32 MB      Lượt xem: 19      Lượt tải: 0    
Xem trước 5 trang đầu tiên của tài liệu này:

Thông tin tài liệu:

Database Systems: The Complete Book- P8: Database Systems and Database Design and Application courses offered at the junior, senior and graduate levels in Computer Science departments. Written by well-known computer scientists, this introduction to database systems offers a comprehensive approach, focusing on database design, database use, and implementation of database applications and database management systems
Nội dung trích xuất từ tài liệu:
Database Systems: The Complete Book- P8 680 CHAPTER 14. MULTIDI-kiEiVSIONAL AND BITh,fAP INDEXES 14.2. H,ISH-LIKE STRLCTURES FOR A4ULTIDI~lEhrSIONA4L DATA 681 Partial-Match Queries Examples of this query ~vould include find all customers aged 50, or find all customers with a salary of S200K. Sow, ive need to look at all the buckets in a row or column of the bucket matrix. The number of disk 110s can be quite high if there are many buckets in a row or column, but only a small fraction of all the buckets will be accessed. R a n g e Queries A range query defines a rectangular region of the grid, and all points found in the buckets that cover that region will be answers to the query, with the exception of some of the points in buckets on the border of the search region. For example, if we want to find all customers aged 35-45 with a salary of 50-100, then we need to look in the four buckets in the lower left of Fig. 14.6. In this case, all buckets are on the border, so we may look a t a good number of points Figure 14.8: Insertion of the point (52,200) followed by splitting of buckets that are not answers to the query. However, if the search region involves a large number of buckets, then most of them must be interior, and all their points are answers. For range queries, the number of disk I / 0 7 smay be large, as we may in Fig. 14.6 lay along the diagonal. Then no matter where we placed the grid be required to examine many buckets. Ho~vever,since range queries tend to lines, the buckets off the diagonal would have to be empty. produce large answer sets, we typically will examine not too many more blocks . However, if the data is well distributed, and the data file itself is not too than the minimum number of blocks on which the answer could be placed by large, then we can choose grid lines so that: any organization ~vhatsoever. 1. There are sufficientlyfew buckets that we can keep the bucket matris in Nearest-Neighbor Queries main memory, thus not incurring disk I/O to consult it, or to add ro~i-s Given a point P, xve start by searching the bucket in which that point belongs. or columns to the matrix when we introduce a new grid line. If we find at least one point there. we have a candidate Q for the nearest neighbor. However. it is possible that there are points in adjacent buckets that 2. We can also keep in memory indexes on the values of the grid lines in are closer to P than Q is: the situation is like that suggested in Fig. 14.3. We each dimension (as per the box Accessing Buckets of a Grid File), or have to consider n-hether the distance between P and a border of its bucket is we can avoid the indexes altogether and use main-memory binary seasch less than the distance from P to Q. If there arc such horders, then the adjacent of the values defining the grid lines in each dimension. buckets on the other side of each such border must be searched also. In fact, ...