
Handbook of Multimedia for Digital Entertainment and Arts- P26


Document information:

Handbook of Multimedia for Digital Entertainment and Arts- P26: The advances in computer entertainment, multi-player and online games, technology-enabled art, culture and performance have created a new form of entertainment and art, which attracts and absorbs its participants. The fantastic success of this new field has influenced the development of the new digital entertainment industry and related products and services, which has impacted every aspect of our lives.
Content extracted from the document:
756 M. Fink et al.

Within-Query Consistency

Once the query frames are individually matched to the audio database using the efficient hashing procedure, the potential matches are validated. Simply counting the number of frame matches is inadequate, since a database snippet might have many frames matched to the query snippet but with a completely wrong temporal structure.

To ensure temporal consistency, each hit is viewed as support for a match at a specific query-to-database offset. For example, if the eighth descriptor ($q_8$) in the 5-s, 415-frame-long 'Seinfeld' query snippet, $q$, hits the 1,008th database descriptor ($x_{1008}$), this supports a candidate match between the 5-s query and frames 1,001 through 1,415 in the database. Other matches mapping $q_n$ to $x_{1000+n}$ ($1 \le n \le 415$) would support this same candidate match.

In addition to temporal consistency, we need to account for frames in which conversations temporarily drown out the ambient audio. We use the model of interference from [7]: that is, an exclusive switch between ambient audio and interfering sounds. For each query frame $i$, there is a hidden variable, $y_i$: if $y_i = 0$, the $i$th frame of the query is modeled as interference only; if $y_i = 1$, the $i$th frame is modeled as coming from clean ambient audio. Taking this extreme view (pure ambient or pure interference) is justified by the extremely low precision with which each audio frame is represented (32 bits), and it is softened by providing additional bit-flip probabilities for each of the 32 positions of the frame vector under each of the two hypotheses ($y_i = 0$ and $y_i = 1$). Finally, the frame transitions between ambient-only and interference-only states are treated as a hidden first-order Markov process, with transition probabilities derived from training data. We re-used the 66-parameter probability model given by Ke et al. [7].
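The two validation ideas described above (hits voting for a query-to-database offset, and a two-state hidden Markov model over 32-bit frames) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the 0-based indexing, the `init` distribution, and every parameter value are assumptions made for illustration; the source only states that a 66-parameter model from Ke et al. [7] was re-used. The sketch also assumes the hidden states are summed out with the standard forward algorithm, which the source does not specify.

```python
import math
from collections import Counter


def vote_offsets(hits):
    """Count hash hits per query-to-database offset.

    hits: iterable of (query_index, database_index) pairs (0-based here).
    A hit of query frame 7 on database frame 1007 votes for offset 1000.
    """
    return Counter(db_idx - q_idx for q_idx, db_idx in hits)


def bit_differences(a, b):
    """Return 32 flags; flag k is 1 when bit k differs between two descriptors."""
    x = (a ^ b) & 0xFFFFFFFF
    return [(x >> k) & 1 for k in range(32)]


def _log_emission(diff, flip_prob):
    # Independent per-bit flips: p if the bit differs, 1 - p if it agrees.
    return sum(math.log(p if d else 1.0 - p) for d, p in zip(diff, flip_prob))


def _logaddexp(a, b):
    # Numerically stable log(exp(a) + exp(b)).
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_match_likelihood(query, database, offset, flip_prob, trans, init):
    """Log-likelihood of a non-empty query aligned at `offset`.

    flip_prob[y][k]: probability that bit k differs in hidden state y
                     (y = 0 interference only, y = 1 clean ambient audio).
    trans[a][b]:     P(y_n = b | y_{n-1} = a).
    init[y]:         P(y_1 = y); hypothetical, not given in the source.
    Hidden states are summed out with the forward algorithm (an assumption).
    """
    alpha = None
    for n, q in enumerate(query):
        diff = bit_differences(q, database[offset + n])
        emit = [_log_emission(diff, flip_prob[y]) for y in (0, 1)]
        if alpha is None:
            alpha = [math.log(init[y]) + emit[y] for y in (0, 1)]
        else:
            alpha = [
                emit[y] + _logaddexp(alpha[0] + math.log(trans[0][y]),
                                     alpha[1] + math.log(trans[1][y]))
                for y in (0, 1)
            ]
    return _logaddexp(alpha[0], alpha[1])
```

In use, one would typically take the most-voted offsets from `vote_offsets` as candidates and rank them with `log_match_likelihood`, so that the costlier probabilistic check runs only on alignments that already have hash support.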
In summary, the final model of the match probability between a query vector, $q$, and an ambient-database vector with an offset of $N$ frames, $x_N$, is:

$$P(q \mid x_N) = \prod_{n=1}^{415} P(\langle q_n, x_{N+n} \rangle \mid y_n)\, P(y_n \mid y_{n-1}),$$

where $\langle q_n, x_m \rangle$ denotes the bit differences between the two 32-bit frame vectors $q_n$ and $x_m$. This model incorporates both the temporal consistency constraint and the ambient/interference hidden Markov model.

Post-Match Consistency Filtering

People often talk with others while watching television, resulting in sporadic yet strong acoustic interference, especially when using laptop-based microphones for sampling the ambient audio. Given that most conversational utterances are 2–3 s in duration [2], a simple exchange might render a 5-s query unrecognizable.

33 Mass Personalization: Social and Interactive Applications 757

To handle these intermittent low-confidence mismatches, we use post-match filtering. We use a continuous-time hidden Markov model of channel switching with an expected dwell time (i.e., time between channel changes) of $L$ seconds. The social-application server indicates the highest-confidence match within the recent past (along with its "discounted" confidence) as part of the state information associated with each client session. Using this information, the server selects either the content-index match from the recent past or the current index match, based on whichever has the higher confidence.

We use $M_h$ and $C_h$ to refer to the best match for the previous time step (5 s ago) and its respective log-likelihood confidence score. If we simply apply the Markov model to this previous best match, without taking another observation, then our expectation is that the best match for the current time is that same program sequence, just 5 s further along, and our confidence in this expectation is $C_h - l/L$, where $l = 5$ s is the query time step.
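The discounting and selection step described above can be written as a minimal Python sketch. It is an illustration under stated assumptions, not the authors' server code. A match is represented as a hypothetical `(program_id, position_seconds)` tuple, the function names are invented here, and the dwell time $L$ must be supplied by the caller because the source gives no value for it.

```python
import math


def discounted_confidence(c_prev, step_s, dwell_s):
    """Log-likelihood confidence after one step without a new observation.

    Implements C_h - l/L, with l = step_s and L = dwell_s (both in seconds).
    """
    return c_prev - step_s / dwell_s


def select_match(m_prev, c_prev, m_cur, c_cur, step_s, dwell_s):
    """Pick between the updated historical match and the current match.

    m_prev, m_cur: (program_id, position_seconds) tuples (hypothetical layout).
    c_prev, c_cur: log-likelihood confidence scores.
    Returns the (match, confidence) pair to use now and carry forward.
    """
    c_hist = discounted_confidence(c_prev, step_s, dwell_s)
    if c_hist > c_cur:
        program, position = m_prev
        # Same program sequence, advanced by one query time step.
        return (program, position + step_s), c_hist
    return m_cur, c_cur


def stay_probability(step_s, dwell_s):
    """Probability of not switching channels during one step, exp(-l/L)."""
    return math.exp(-step_s / dwell_s)
```

Because the confidences are log likelihoods, subtracting `step_s / dwell_s` is the same as multiplying the underlying likelihood by `stay_probability(step_s, dwell_s)`. A longer assumed dwell time therefore makes the filter slower to abandon the historical match.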
This discount of $l/L$ in the log likelihood corresponds to the Markov model probability, $e^{-l/L}$, of not switching channels during the $l$-length time step.

An alternative hypothesis is generated by the audio match for the current query. We use $M_0$ to refer to the best match for the current audio snippet: that is, the match that is generated by the audio fingerprinting software. $C_0$ is the log-likelihood confidence score given by the audio fingerprinting process.

If these two hypotheses (the updated historical expectation and the current snippet observation) give different matches, we select the one with the higher confidence score:

$$\{M_0, C_0\} = \begin{cases} \{M_h,\; C_h - l/L\} & \text{if } C_h - l/L > C_0 \\ \{M_0,\; C_0\} & \text{otherwise} \end{cases}$$

where the selected $M_0$ is the match that is used by the social-application server for selecting related content, and $M_0$ and $C_0$ are carried forward to the next time step as $M_h$ and $C_h$.

Evaluation of System Performance

In this section, we provide a quantitative evaluation of the ambient-audio identification system. The first set of experiments provides in-depth results with our matching system. The second set of results provides an overview of the performance of an integrated system running in a live environment.

Empirical Evaluation

Here, we examine the performance of our audio-matching system in detail. We ran a series of experiments using 4 days of video footage. The footage was captured from 3 days of one broadcast station and 1 day from a different station. We jack-knifed this data to provide disjoint query/database sets: whenever we used a query to probe the database, we removed the minute that contained that query audio from consideration. In this way, we were able to test 4 days of que ...