Tuesday, January 12, 2010

New Idea

The idea for a SOCCER EVENT DETECTION framework

1) Mention that the framework is divided into two levels...
* Feature selection and extraction
- Here, one novelty is in using textual cues itself
- Secondly, we can focus on further reduction of the search space!!!! This can become a whole chapter on its own! People have never touched this topic that much, except that Min lady :) So, I think it would be worthwhile to concentrate on this :D - A CONTRIB!!!!

* Event Detection
- For this, 2 chapters can come out...
i. Rarely occurring events (need to define 'rarely', maybe based on the amount of training data available)...
ii. Frequently occurring events

Considerations:
1. During event detection... how to compare? - maybe... ok: WITH video space reduction and WITHOUT video space reduction! ---- can do for BOTH sets of events...
* if done like this, there's a bit less of a headache thinking about HOW to detect events... which algorithm to use... Markov or whatever else... and I can focus on the reduction phase instead...
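If I ever get to that comparison experiment, the scoring part is simple enough to sketch already. Everything below is hypothetical (events are just made-up (label, minute) pairs); it only shows the with-reduction vs. without-reduction comparison I mean:

```python
def precision_recall(detected, ground_truth):
    """Precision and recall of detected events against a ground-truth set."""
    detected, ground_truth = set(detected), set(ground_truth)
    tp = len(detected & ground_truth)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Hypothetical run: the same detector over the FULL video vs. the REDUCED search space.
ground_truth = {("goal", 12), ("foul", 34), ("goal", 78)}
full_space   = {("goal", 12), ("foul", 34), ("goal", 70), ("foul", 90)}
reduced      = {("goal", 12), ("foul", 34), ("goal", 78)}

print(precision_recall(full_space, ground_truth))  # lower precision and recall
print(precision_recall(reduced, ground_truth))     # (1.0, 1.0)
```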

Tuesday, July 29, 2008

Pseudo-Chef Illegal @ M01 Restu

To Eat... Makan (Malay)... Mangez du riz ('eat rice' in French, if I'm not mistaken), A kill (Arabic, maybe?) and of course... MANGAN (eat in Javanese) and MADANG (eat rice, also in Javanese)...

Okes lah. Today, I would like to chat a bit about cooking :D My roommate and I have successfully and illegally brought some stuff into our hostel room at M01 Desasiswa Restu. The accessories range from a FABER rice cooker to... to... one big 14-kg cylindrical 'thinggy' that can provide some sort of 'fire' power :D We also have a fridge... but the small bar-type one (GLOBAL brand...)...

Anywayz, the main thing is that we can now start cooking in our room. Even though it's a bit dangerous... and the smoke detector lurks just above our heads... we still have to cook. If not, aiya-gazamborina-aida-rahim... costs will go up!!! At least if we cook, insya-Allaah our food expenditure will be so, so much lower :D Which would then lead to great amounts of savings!!!
Annywayz...

On the first night, Azrul attempted nasi-kurma-briyani Al-Jantan? (is that right, mate? does your rice actually have a name? :P)... DELICIOUS! MARBELES!!! The next night, it was my turn to try my hand at cooking --> Oyster-Chicken-With-Too-Much-Salt... a bit salty, but Alhamdulillaah, palatable :D

Annywayz... this morning's was the best yet. Cuz we experimented with curry powder and also some spices such as cinnamon, star anise, black and white pepper and so on :D The recipe is mostly our Mr Azrul's... and basically the steps are:

1. Sauté the garlic and shallots
2. Add in the diced potatoes
3. Add in the finely cut chicken (btw, this chicken had already been left overnight in the fridge in a curry powder, salt and pepper marinade... oh, and some Arabic olive oil :D)
4. Add in a few cherry tomatoes (we only put them in to finish them off)... quite sweet, this type of tomato... highly recommended :D
5. Add some curry powder... to color and taste :D (is 'to color' even a thing?)... oh! And also salt to taste... :D
6. Stir, stir and stir... stir until satisfied with what you're stirring (ok, that does not make sense... but it did to us!)
7. Add rice and water...
8. Leave to cook until the RICE COOKER button flips to 'KEEP WARM'
(Mate... correct me if I got the order of these ingredients wrong :D)
:D

Alhamdulillaahi-robbili'aalameen :D

The dish came out MARBELES (that's Azrul's term...)!!! For our taste, that is...
Even though a bit hot and stormy, it was not only palatable... but also almost the same as what the Arabs are selling in front of the mosque!... and they're charging RM6 per pack for their stuff! :( Overpriced... overpriced...

Wannywayes... Let us enjoy the pictures of the:

Steamed Curry Rice with Curry Chicken Deluxe :D (btw, if anyone has any other easy recipes... do let us know, ye :D)

Okes... thank you for reading. Assalaam aleykom WBT and Have A Good One...

The sautéing and chicken-cooking process, complete
Adding rice and water
Another look before the lid is closed
The end result...
A slight close-up :D

My packed lunch for today :D

Monday, July 28, 2008

Tovinkere - Detecting Semantic Events in Soccer Games - Towards A Complete Solution

Reference: Tovinkere, V. and R.J. Qian. Detecting Semantic Events in Soccer Games: Towards a Complete Solution. in IEEE International Conference on Multimedia and Expo 2001 (ICME 2001).

Objective: This paper puts forward a knowledge- and rule-based method to detect soccer events!

Methodology: From what I've read, the authors go down to a tee on detecting soccer events: they encode the domain knowledge of soccer using XML, then use player and ball tracking (along with physics-related information, e.g. ball bounce angle) in a rule-based system. (The authors claim that other methods [such as Machine Learning perhaps?] can also be used besides rules...)

1. Firstly, after understanding the LAWS OF SOCCER, SOCCER GAME FLOW and IDENTIFYING ALL POSSIBLE SOCCER EVENTS, authors conceptually model the domain knowledge of soccer using a hierarchical Entity Relationship model.

2. This model is then translated to XML (why don't they model in XML straight away, eh?)

3. The system then takes the inputs below to detect events:
* Domain Knowledge (the XML)
* Player and Ball tracking information



PHASE 1 of DETECTION
- Compute the derived information from player motion and orientation to identify all sections of tracking data containing player-ball interactions
- Player-ball interactions are determined by getting rid of deflections that involve bouncing off the ground or the goal post


PHASE 2 of DETECTION
- Determines which rules (from Domain Knowledge) will be used
- These rules evaluate game situation and execute relevant rules
- The appropriate segments are then marked as VALID or INVALID events, depending on how the evaluation goes :)


=================


Main soccer events are detected by first detecting BASIC ACTIONS.

These BASIC ACTIONS are then used in combination (i.e. how they are represented in the XML schema) to detect the more COMPLEX EVENTS :)

e.g. Deflection (BASIC) is evaluated... according to XML (domain knowledge)... and in the end a Save (COMPLEX) event is detected.
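A toy sketch of that basic-to-complex evaluation (the rule table and action names below are my own inventions standing in for the paper's XML domain knowledge, not their actual schema):

```python
# Hypothetical rule table: a tuple of required BASIC actions (in order)
# maps to the COMPLEX event it implies.
RULES = [
    (("shot_on_goal", "deflection_by_goalkeeper"), "save"),
    (("pass", "pass", "shot_on_goal"), "attack_build_up"),
]

def detect_complex_events(basic_actions):
    """Scan the basic-action sequence for sub-sequences matching a rule;
    return (complex_event, start_index) for every match."""
    events = []
    for pattern, event in RULES:
        n = len(pattern)
        for i in range(len(basic_actions) - n + 1):
            if tuple(basic_actions[i:i + n]) == pattern:
                events.append((event, i))
    return events

actions = ["pass", "pass", "shot_on_goal", "deflection_by_goalkeeper"]
print(detect_complex_events(actions))  # finds both a save and an attack build-up
```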

Wednesday, July 16, 2008

A Novel Framework for Semantic Annotation and Personalized Retrieval of Sports Video

Reference: Xu, C., et al., A Novel Framework for Semantic Annotation and Personalized Retrieval of Sports Video. Multimedia, IEEE Transactions on, 2008. 10(3): p. 421-436.

Objectives: To detect important sporting events (experiments are shown in the soccer and basketball domains), create an index, and finally cater for personalized querying by users.

In a nutshell

The authors are approaching semantic indexing and retrieval of digital video (namely sports) in a rather different way. Instead of solely analyzing the low-level features of the aural and visual modalities, they make extensive use of OUTSIDE/EXTERNAL information. The external source here is text from websites (also referred to as web-casting text... or some call them weblogs?) - examples such as those found on ESPN's http://www.soccernet.com (under the weblog link, if I'm not mistaken).

Basically, their method is divided into three parts...

1. Text Analysis

The FIRST part (TEXT ANALYSIS) involves querying the web server at ESPN or the BBC (for example), and then looking for text regions of interest (ROI). In short, look for text areas that describe the game at hand.

This is followed by keyword identification. This is done via matching the keywords found on the web-casting text website with the particular sports keyword(s) that the authors define.

Finally, the authors come up with TEXT EVENTS based on the matched event keywords. This involves keyword matching combined with STEMMING, PHONIC and FUZZY search...
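A toy version of the stemming + fuzzy part (phonic matching left out; the keyword list and the crude suffix stemmer are my own stand-ins, not what the authors actually use):

```python
import difflib

EVENT_KEYWORDS = {"goal", "save", "foul", "card", "offside"}  # hypothetical list

def crude_stem(word):
    """Very crude suffix-stripping stemmer (a real system would use e.g. Porter's)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def match_keywords(text):
    """Return event keywords found in web-casting text via stemming + fuzzy matching."""
    found = set()
    for token in text.lower().split():
        stem = crude_stem(token.strip(".,!?"))
        if stem in EVENT_KEYWORDS:          # exact match on the stem...
            found.add(stem)
        else:                               # ...else fuzzy match to tolerate typos
            close = difflib.get_close_matches(stem, EVENT_KEYWORDS, n=1, cutoff=0.8)
            if close:
                found.add(close[0])
    return found

print(match_keywords("Gerrard fouled near the box, then a stunning goall!"))
```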

2. Video Analysis

FIRSTLY, after Shot Boundary Detection (using M2-Edit Pro), they classify shots into FAR VIEW, MEDIUM VIEW and CLOSE-UP VIEW. These views are common in sports video, and serve to vary (where necessary) the viewers' attention... This process is done by:
  • Classify each frame within a shot into one of the aforementioned three views by analyzing COLOR, EDGE and MOTION features,

  • Simple Weighted Majority Voting of FRAMES is done (within a shot boundary) to finally classify the shot.
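The voting step is simple enough to sketch; uniform weights by default, with optional per-frame weights as an assumed extension (the paper just says "simple weighted majority voting"):

```python
from collections import Counter

def classify_shot(frame_views, weights=None):
    """Weighted majority vote over per-frame view labels to classify a shot.
    weights: optional per-frame weights (e.g. classifier confidences); defaults to 1."""
    if weights is None:
        weights = [1.0] * len(frame_views)
    tally = Counter()
    for view, w in zip(frame_views, weights):
        tally[view] += w
    return tally.most_common(1)[0][0]

frames = ["FAR", "FAR", "CLOSE", "FAR", "MEDIUM"]
print(classify_shot(frames))  # -> "FAR"
```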

SECONDLY, Replay Detection is done... enough said :)

THIRDLY, Video Event Modeling is done. Here, the authors structure the two previous steps' results as mid-level features to model the event.

i.e. EVENT = [Si, Si+1, ... , Sj], whereby Si = the beginning shot of the event and Sj = the ending shot. Each shot (Sk, where i<=k<=j) is represented by a feature vector Fk = (SCk, Rk, Lk):

* SCk = Shot Class of shot K,

* Rk = Replay Detection Flag (1 or 0), indicates whether Sk is included in a replay,

* Lk = Length of Sk


3. Text/Video Alignment

Basically, they detect the region where the game clock is present. Then they do their own OCR (which only recognizes the digits 0-9, which is neat btw).

The recognized clock time is then matched with the text event time... this time (on the video frame) becomes the starting point for further EVENT BOUNDARY DETECTION (EBD)! (i.e. finding the start and finish of the event). EBD is done via MODELING THE VISUAL TRANSITION PATTERNS of the video events:-

i. Linking (or matching) of game clock time and text event time

ii. Events are modeled via HMM (trained using the mid-level features mentioned previously)

iii. The (candidate) SHOT containing the event is selected as the reference (Sref)

iv. The search range starts FROM THE FIRST FAR VIEW SHOT BEFORE Sref and ends at THE FIRST FAR VIEW SHOT after Sref. (The authors say that the temporal transition event patterns will occur within this range)

i.e. Search Range (FarView-CloseUpView-FARVIEW(START HERE)-CloseUpView-SREF-MedView-CloseUpView-FARVIEW(END HERE)-MidView.......)

v. The trained HMMs are then used to calculate probability scores of all possible partitions within the search range (aligned to shot boundaries). The partition with the highest probability score is selected as the detected event candidate...
- The start and end boundaries are then the first and last shots within that partition.
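A stripped-down sketch of the partition-scoring step: score candidate partitions inside the search range with a discrete HMM's forward algorithm and keep the best one. All HMM parameters below are invented for illustration; in the paper they are trained on the mid-level shot features:

```python
import numpy as np

# Toy 2-state discrete HMM over shot classes (0=FAR, 1=MEDIUM, 2=CLOSE).
pi = np.array([0.8, 0.2])                 # initial state distribution
A = np.array([[0.6, 0.4], [0.5, 0.5]])    # state transition probabilities
B = np.array([[0.7, 0.2, 0.1],            # P(shot class | state)
              [0.1, 0.3, 0.6]])

def forward_loglik(obs):
    """Scaled forward algorithm: log P(obs | HMM)."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return loglik

def best_partition(shot_classes, ref):
    """Try every contiguous partition containing the reference shot; keep the
    one with the highest per-shot average log-likelihood."""
    best, best_score = None, -np.inf
    for start in range(0, ref + 1):
        for end in range(ref, len(shot_classes)):
            seq = shot_classes[start:end + 1]
            score = forward_loglik(seq) / len(seq)   # length-normalised
            if score > best_score:
                best, best_score = (start, end), score
    return best

shots = [0, 2, 0, 2, 1, 2, 0, 1]   # shot classes inside the search range
print(best_partition(shots, ref=4))
```

The real method scores only the partitions the trained event model allows; exhaustively trying every contiguous range, as above, is just the simplest stand-in.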

Friday, July 4, 2008

Explicit Semantic Events Detection and Development of Realistic Applications for Broadcasting Baseball Videos

Reference: Chu, W.-T. and J.-L. Wu, Explicit semantic events detection and development of realistic applications for broadcasting baseball videos. Multimedia Tools and Applications, 2008. 38(1): p. 27-50

=================
IN ALL
=================

Features:
i) Shot occurrences --> Obtained from the different video shots generated by their shot boundary detection algorithm... this algo, btw, is color-based... (using a color adjacency histogram to distinguish between in-field and out-field views, and horizontal and vertical projection profiles for the pitch shot view)

ii) Take into consideration shot transitions, temporal duration and motion... for particular events, there's a combination of such features... and hence these features are used for E.D.

Technique for E.D.:
i) K-NEAREST NEIGHBOR (KNN) - K = 8!!!

Results:
Good... at least: 0.85 PRECISION and 0.90 RECALL!!!

=================


Objectives:



  • Detect events in baseball videos --> Only interested in this one...
  • Come up with practical user applications

Framework:

Starts with Shot Classification (there are a few classes of shots), then uses shot information as one of the inputs for event detection, finally creates applications.

My focus in this paper (Event Detection):

How do they do it? --> Rule-based + Model-based (when confusion occurs)

Rule Based (Domain Knowledge of Baseball) -->
1. (Caption) TEXT information extraction


  • a) Character pixels are determined first --> HIGH INTENSITY compared to the BACKGROUND.
  • b1) Character template construction (1) --> The identified character region is represented by 13-dimension ZERNIKE moments
  • b2) Character template construction (2) --> For each digit (e.g. 4), a 30-sec vid. clip is used as training
  • b3) Character template construction (3) --> The character template for the digit 4 is constructed by averaging the ZERNIKE moments over all the frames!!!
  • c) Character Recognition --> Test vectors (unseen data, that is) are compared with ALL templates' vectors... look at the VECTOR ANGLE!!! (so Zernike can come up with angles?). The SMALLEST INCLUDED ANGLE with a particular digit's template is considered a character match!
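The angle test in step (c) is just the angle between feature vectors; with made-up moment vectors (random 13-D stand-ins, not real Zernike moments) it can be sketched as:

```python
import numpy as np

def included_angle(u, v):
    """Angle (radians) between two feature vectors, e.g. Zernike moment vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def recognize(test_vec, templates):
    """Return the digit whose template makes the smallest included angle."""
    return min(templates, key=lambda d: included_angle(test_vec, templates[d]))

# Hypothetical 13-D templates: in the paper, each is the average Zernike moment
# vector over a 30-second training clip for that digit.
rng = np.random.default_rng(0)
templates = {d: rng.random(13) for d in range(10)}

noisy_four = templates[4] + rng.normal(0, 0.01, 13)  # a slightly perturbed '4'
print(recognize(noisy_four, templates))  # -> 4
```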
2. (Caption) SYMBOL information extraction


  • a) Uses the same Intensity Comparison with Bground to determine symbols
  • b) Based on pre-indicated symbol regions, the BASE-OCCUPATION SITUATION is displayed according to whether the corresponding base is highlighted or not

  • b2) in the above case (image), this means FIRST BASE is occupied.... this is what I understand, though :)
  • c) Then, in the duration between two PITCH SHOTS, look at changes in the number of outs, the score and the base-occupation situation to further come up with 'evidence' for event detection
  • d) A few other domain rules are followed based on the three criteria in italics (as above)...

3) All of the above are concatenated into one feature vector fi,i+1...


  • Then, another set of rules determines whether the feature vector is LEGAL or ILLEGAL
  • Only LEGAL feature vectors are considered :)

4) Event Identification --> Determined at the leaves of a DECISION TREE!!!

  • Event identification is treated as a classification task into subsets of predefined event sets
  • Tree traversal is based on predefined rules over OUTS, SCORE and BASE-OCCUPATION SITUATION (the three criteria above)
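A tiny hand-rolled version of that tree idea (the branch rules and event names below are invented for illustration, not the paper's actual tree):

```python
def identify_event(d_outs, d_score, d_bases):
    """Walk a tiny hand-written decision tree over the changes (deltas) in
    outs, score and base occupation between two pitch shots."""
    if d_score > 0:
        # runners cleared the bases entirely vs. some still on base
        return "homerun" if d_bases == 0 else "scoring_hit"
    if d_outs > 0:
        return "strikeout_or_out"
    if d_bases > 0:
        return "hit_or_walk"
    return "no_event"

print(identify_event(d_outs=0, d_score=1, d_bases=0))  # -> "homerun"
print(identify_event(d_outs=1, d_score=0, d_bases=0))  # -> "strikeout_or_out"
```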

Model Based

1) Some rules are common for some events... hence needs to be determined further by examining contextual (visual + temporal) information

2) Look at the combinational occurrences of shot types (e.g. pitch shots, field shots, close-up shots), differences in time between shots (i.e. pitch and pivot), field view duration (in frames) and also the motion of the pivot shot...

3) All these shot context features are normalized between [0,1]... and 20 training sequences are manually selected!

4) In the end, train and test using the K-NEAREST NEIGHBOUR algorithm... (the authors say it's because this algo is simple to use... that's it?) :P --> K is set to 8
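Since KNN with K=8 keeps coming up, here's a minimal pure-Python version (the training data is fabricated; features are assumed already normalised to [0, 1] as in step 3):

```python
from collections import Counter

def knn_classify(x, train, k=8):
    """Plain k-nearest-neighbour vote (k=8, as in the paper) over feature
    vectors; train is a list of (feature_vector, label) pairs."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    neighbours = sorted(train, key=lambda item: dist(x, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical training set: (shot-context feature vector, event label)
train = [([0.1 * i, 0.05 * i], "infield_event") for i in range(10)] + \
        [([1.0 - 0.1 * i, 0.9], "outfield_event") for i in range(10)]
print(knn_classify([0.2, 0.1], train))  # -> "infield_event"
```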

========================

My Thoughts About This Technique

  • Quite related to my idea of event and important-segment detection :) The basic framework is quite similar, ... but I believe I can bring in novelty due to:
  1. Different domain (SOCCAH!!! ... ok. Soccer :D)
  2. Different technique to process context (Because the whole event and segment detection framework is different, might be able to make use of another context based classification technique... other than KNN, insya-Allaah)
  3. Different way/approach of processing caption text...
  4. Oh! And that DECISION TREE part is neat... might be able to use it in the analysis of my features... :) I will use rules also I reckon...

Insya-Allaah, let's just see...

BUT NOT ONLY SEE!!! MUST DO THE WORK!!!! Ameen :D