Friday, July 4, 2008

Explicit Semantic Events Detection and Development of Realistic Applications for Broadcasting Baseball Videos

Reference: Chu, W.-T. and Wu, J.-L., "Explicit semantic events detection and development of realistic applications for broadcasting baseball videos," Multimedia Tools and Applications, vol. 38, no. 1, pp. 27-50, 2008.

=================
IN ALL
=================

Features:
i) Shot occurrences --> Obtained from the different video shot types produced by their shot boundary detection algorithm... this algo, btw, is color based (using a color adjacency histogram to distinguish in-field from out-field views, and horizontal and vertical projection profiles for the pitch shot view)... see the sketch after this list

ii) Also take shot transitions, temporal duration and motion into consideration... particular events correspond to particular combinations of such features, and hence these features are used for event detection (E.D.)
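(Side sketch, just so I remember the idea: a toy version of the pitch-shot check using horizontal/vertical projection profiles. The field-colour rule, the thresholds and the top/bottom heuristic are all my own guesses for illustration, NOT the paper's actual values, and the colour adjacency histogram part isn't reproduced here.)

```python
# Toy sketch (my own guessed parameters, not the paper's):
# flag a frame as a possible pitch-shot view from the projection
# profiles of its "field colour" pixels.
import numpy as np

def field_mask(frame_rgb, g_margin=30):
    """Very naive 'field colour' mask: green clearly dominates red and blue.
    (Hypothetical rule; the paper uses colour adjacency histograms instead.)"""
    r = frame_rgb[:, :, 0].astype(int)
    g = frame_rgb[:, :, 1].astype(int)
    b = frame_rgb[:, :, 2].astype(int)
    return (g > r + g_margin) & (g > b + g_margin)

def projection_profiles(mask):
    """Horizontal profile = field-pixel ratio per row,
    vertical profile = field-pixel ratio per column."""
    hori = mask.mean(axis=1)   # one value per row
    verti = mask.mean(axis=0)  # one value per column
    return hori, verti

def looks_like_pitch_shot(frame_rgb, top_ratio=0.5, bottom_ratio=0.2):
    """Guessy heuristic: in a pitch shot the upper part of the frame is
    mostly field, while the lower part (batter/catcher/umpire) is not."""
    hori, _ = projection_profiles(field_mask(frame_rgb))
    h = len(hori)
    return hori[: h // 2].mean() > top_ratio and hori[h // 2:].mean() < bottom_ratio
```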

Technique for E.D.:
i) K-NEAREST NEIGHBOR (KNN) - K = 8!!!

Results:
Good... at least 0.85 PRECISION and 0.90 RECALL!!!

=================


Objectives:



  • Detect events in baseball videos --> Only interested in this one...
  • Come up with practical user applications

Framework:

Starts with Shot Classification (there are a few classes of shots), then uses shot information as one of the inputs for event detection, finally creates applications.

My focus in this paper (Event Detection):

How do they do it? --> Rule based + Model based (when confusion occurs)

Rule Based (Domain Knowledge of Baseball) -->
1. (Caption) TEXT information extraction


  • a) Character pixels are determined first --> HIGH INTENSITY as compared to the BACKGROUND.
  • b1) Character template construction (1) --> Each identified character region is represented by 13-dimensional ZERNIKE moments
  • b2) Character template construction (2) --> For each digit (e.g. 4), a 30-sec video clip is used as training
  • b3) Character template construction (3) --> The character template for the digit 4 is constructed by averaging the ZERNIKE moments over all those frames!!!
  • c) Character Recognition --> Test vectors (unseen data, that is) are compared with ALL templates' vectors... look at the VECTOR ANGLE!!! (I think this is just the included angle between the 13-D feature vectors, i.e. cosine similarity, not a Zernike phase angle.) The SMALLEST INCLUDED ANGLE with a particular digit's template is considered a character match! See the sketch below.
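(Sketch of the matching step as I understand it: build each digit's template by averaging its 13-D Zernike moment vectors, then match a test vector to the template with the smallest included angle. I'm assuming the Zernike moments are already computed elsewhere; the function names here are mine.)

```python
# Sketch of the 'smallest included angle' matching step. The 13-D Zernike
# moment vectors are assumed to be computed already (one per character
# region per frame); how the paper computes them isn't reproduced here.
import numpy as np

def build_template(moment_vectors):
    """Template for one digit = average of its 13-D Zernike moment vectors
    over all training frames (the ~30 s clip mentioned above)."""
    return np.mean(np.asarray(moment_vectors), axis=0)

def included_angle(u, v):
    """Angle between two feature vectors (the 'vector angle' I think the
    paper means - not a Zernike phase)."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def recognize_character(test_vector, templates):
    """templates: dict like {'0': vec, '1': vec, ...}.
    Returns the digit whose template forms the smallest included angle."""
    return min(templates, key=lambda d: included_angle(test_vector, templates[d]))
```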
2. (Caption) SYMBOL information extraction


  • a) Uses the same Intensity Comparison with the Background to determine symbols
  • b) Based on pre-indicated symbol regions, the BASE-OCCUPATION SITUATION is read off according to whether the corresponding base icon is highlighted or not

  • b2) In the above case (image), this means the FIRST BASE is occupied... that's what I understand, anyway :)
  • c) Then, in the duration between two PITCH SHOTS, changes in the number of outs, the score and the base-occupation situation are taken as further 'evidence' for event detection... see the sketch below
  • d) A few other domain rules are applied based on those three criteria (outs, score, base occupation)...
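(Toy version of the symbol/evidence part: a base icon counts as 'highlighted' if its pre-indicated caption region is clearly brighter than the caption background, and the 'evidence' between two pitch shots is just the change in outs, score and base occupation. The regions, threshold and state dicts below are made up for illustration.)

```python
# Toy version of the symbol check and the between-pitch-shot 'evidence'.
# Regions and thresholds below are invented for illustration only.
import numpy as np

BASE_REGIONS = {            # (row_slice, col_slice) of each base icon - hypothetical
    "first":  (slice(30, 40), slice(600, 615)),
    "second": (slice(20, 30), slice(590, 605)),
    "third":  (slice(30, 40), slice(580, 595)),
}

def base_occupation(gray_frame, background_level=80.0, margin=40.0):
    """Return e.g. {'first': True, 'second': False, 'third': False}:
    a base is occupied if its icon region is clearly brighter than background."""
    return {
        base: gray_frame[rows, cols].mean() > background_level + margin
        for base, (rows, cols) in BASE_REGIONS.items()
    }

def caption_changes(state_before, state_after):
    """'Evidence' gathered between two pitch shots: changes in outs, score
    and base occupation (states are simple dicts I made up for this sketch)."""
    return {
        "d_outs":  state_after["outs"]  - state_before["outs"],
        "d_score": state_after["score"] - state_before["score"],
        "d_bases": {b: (state_before["bases"][b], state_after["bases"][b])
                    for b in state_after["bases"]},
    }
```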

3) All of the above are concatenated into one feature vector f_{i,i+1} (for the segment between pitch shots i and i+1)...


  • Then, another set of rules determines whether the feature vector is LEGAL or ILLEGAL... see the sketch below
  • Only LEGAL feature vectors are considered :)
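(Sketch of the LEGAL/ILLEGAL filter. The two sanity rules below are just examples of the kind of baseball checks I imagine; the paper's actual rule set isn't in my notes.)

```python
# Sketch of the legal/illegal filter. The rules are examples I made up;
# the paper's actual rules are not listed here.
def is_legal(change):
    """change = output of caption_changes() above."""
    if change["d_score"] < 0:             # score can never go down
        return False
    if not (0 <= change["d_outs"] <= 3):  # outs change by at most 3 between pitches
        return False
    return True

def keep_legal(changes):
    """Only LEGAL feature vectors go on to event identification."""
    return [c for c in changes if is_legal(c)]
```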

4) Event Identification --> Determined at the leaves of a DECISION TREE!!!

  • Event identification is treated as a classification task into subsets of predefined event sets
  • Tree traversal is based on predefined rules over the changes in OUTS, SCORE and BASE-OCCUPATION SITUATION... see the sketch below
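(My own toy version of the decision-tree idea: branch on the caption changes until we land on an event, or on a set of candidate events that the model-based step still has to sort out. The branches are illustrative, not the paper's actual tree.)

```python
# Toy decision function over the caption 'evidence'; branches are mine,
# chosen only to show the flavour of the rule-based traversal.
def identify_event(change):
    if change["d_score"] > 0:
        # run(s) scored: could be several scoring events - left ambiguous here
        return {"homerun", "hit_with_rbi", "sacrifice_fly"}   # model-based step decides
    if change["d_outs"] > 0:
        first_before, first_after = change["d_bases"]["first"]
        if first_before and not first_after:
            return {"double_play", "force_out"}               # still ambiguous
        return {"strikeout", "fly_out", "ground_out"}
    return {"walk", "single", "stolen_base"}                  # no outs, no runs
```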

Model Based

1) Some rules are common to several events... hence these cases need to be disambiguated further by examining contextual (visual + temporal) information

2) Look at the combined occurrences of shot types (e.g. pitch shots, field shots, close-up shots), the time differences between shots (i.e. pitch and pivot), the field view duration (in frames) and also the motion of the pivot shot...

3) All these shot context features are normalized to [0,1]... and 20 training sequences are manually selected!

4) In the end, train and test using the K-NEAREST NEIGHBOUR algorithm... (the authors say it's because this algo is simple to use... just that?) :P --> K is set to 8... see the sketch below
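(Sketch of the model-based step with scikit-learn, which is my choice of library, not necessarily what the authors used; the exact layout of the shot context features is only assumed here.)

```python
# KNN on [0,1]-normalized shot context features. Each row is one ambiguous
# segment's features (shot-type occurrences, pitch-to-pivot time, field view
# duration, pivot-shot motion, ...); the ordering is assumed, not the paper's.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

def train_knn(X_train, y_train, k=8):
    """Normalize every shot context feature to [0, 1], then fit KNN with K = 8."""
    scaler = MinMaxScaler()
    X_scaled = scaler.fit_transform(np.asarray(X_train, dtype=float))
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_scaled, y_train)
    return scaler, knn

def classify_event(scaler, knn, x_test):
    """Disambiguate one confusing segment into its final event label."""
    return knn.predict(scaler.transform([x_test]))[0]
```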

========================

My Thoughts About This Technique

  • Quite related to my idea of event and important segment detection :) The basic framework is almost the same, ... but I believe I can bring in novelty due to:
  1. Different domain (SOCCAH!!! ... ok. Soccer :D)
  2. Different technique to process context (Because the whole event and segment detection framework is different, might be able to make use of another context based classification technique... other than KNN, insya-Allaah)
  3. Different way/approach of processing caption text...
  4. Oh! And that DECISION TREE part is neat... might be able to use it in the analysis of my features... :) I will use rules also I reckon...

Insya-Allaah, let's just see...

BUT NOT ONLY SEE!!! MUST DO THE WORK!!!! Ameen :D
