Recently, there has been a trend of employing various data-mining approaches [61, 72, 91] to extract knowledge from video databases. Consequently, many video mining approaches have been proposed, which can be roughly classified into five categories: video pattern mining, video clustering and classification, video association mining, video structure mining, and video motion mining.
3.1 Video structure mining
Since video data is an unstructured stream, efficient access to video is not an easy task. The main objective of video structure mining is therefore to identify the content structure and patterns that support fast random access to the video database.
As video structure represents the syntactic-level composition of the video content, its basic structure is a hierarchy constituted by the video program, scene, shot and key-frame, as shown in Fig. 1. Video structure mining is defined as the process of discovering the fundamental logical structure from the preprocessed video program by adopting data-mining methods such as classification, clustering and association rules.
It is essential to analyze video content semantically and to fuse multi-modality information from both the video sequences and the audio streams in order to bridge the gap between human semantic concepts and computer low-level features. Video structure mining obtains not only the patterns that construct the video content but also the semantic information among those patterns. Video structure mining is executed in the following steps: (1) video shot detection, (2) scene detection, (3) scene clustering, and (4) event mining.
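Step (1), shot detection, is commonly implemented by thresholding the color-histogram difference between consecutive frames. The following is a minimal sketch of that idea, with synthetic histograms and an invented threshold standing in for real decoded frames:

```python
# Minimal sketch of shot detection via color-histogram differences.
# The "frames" are hand-made normalized histograms; in a real system
# they would be computed from decoded video frames.

def hist_diff(h1, h2):
    """L1 distance between two normalized histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_shot_boundaries(histograms, threshold=0.5):
    """Return frame indices where a new shot is assumed to start."""
    boundaries = []
    for i in range(1, len(histograms)):
        if hist_diff(histograms[i - 1], histograms[i]) > threshold:
            boundaries.append(i)
    return boundaries

# Two synthetic shots: frames 0-2 are reddish, frames 3-4 are bluish.
frames = [[0.9, 0.05, 0.05]] * 3 + [[0.1, 0.1, 0.8]] * 2
print(detect_shot_boundaries(frames))  # [3]
```

Real detectors additionally handle gradual transitions (fades, dissolves), which simple frame-to-frame thresholding misses.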
Fu et al. defined two kinds of structural knowledge, namely video semantic and syntactic structure knowledge, leading to the concepts of video semantic and syntactic structure mining. Syntactic structure mining is based on the basic video structure and adopts data-mining methods over similar video units and video-unit features. It generally acquires syntactic rules for forms such as dialogues, interviews, news, and talk shows. These syntactic rules are structural knowledge that drives the process of mining constructional patterns in the video structure units and exploring relations between video units and their features. Semantic structure mining is a process that discovers semantics and events in the basic structure units by exploring relations among video-unit features, such as color and texture patterns in an explosion scene, light and texture patterns in an indoor or outdoor scene, or audio patterns in a highlight scene. These relations are represented by association rules between video-unit features.
Current research in this area focuses on mining object semantic information and on event detection. A video event represents the occurrence of certain semantic concepts. Chen et al. [12, 14, 15] presented a shot-based video event detection framework that follows a three-level architecture, proceeding through low-level descriptor extraction, mid-level descriptor extraction, and high-level analysis. Heuristic rules can be used to partially bridge the semantic gap between the low-level features and the high-level subjective concepts. A decision-tree classification algorithm is then applied to the combination of multimodal mid-level descriptors and low-level feature descriptors for event detection. Zhao et al. proposed the Hierarchical Markov Model Mediator mechanism to efficiently store, organize, and manage the low-level features, multimedia objects, and semantic events, along with high-level user perceptions such as user preferences, in a multimedia database management system.
3.2 Video clustering and classification
Video clustering and classification are used to cluster and classify video units into different categories. Clustering is a significant unsupervised learning technique for discovering knowledge from a dataset. Clustering video sequences in order to infer and extract activities from a single video stream is an extremely important problem, with significant potential in video indexing, surveillance, activity discovery and event recognition [97, 103]. In video surveillance systems, clustering analysis is used to find patterns and groups of moving objects. Clustering similar shots into one unit eliminates redundancy and, as a result, produces a more concise video content summary [116, 117]. Clustering algorithms are categorized into partitioning methods, hierarchical methods, density-based methods, grid-based methods and model-based methods.
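As a toy illustration of a partitioning method, the sketch below runs a naive k-means over made-up two-dimensional shot feature vectors; the features, initialization and parameters are all invented, and production systems would use an optimized library implementation:

```python
# Naive k-means over shot feature vectors, illustrating the
# partitioning family of clustering methods mentioned above.

def assign(p, centers):
    """Index of the nearest center by squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))

def kmeans(points, k, iters=10):
    centers = [list(p) for p in points[:k]]  # naive initialization
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[assign(p, centers)].append(p)
        for j, c in enumerate(clusters):
            if c:  # recompute center as the mean of its members
                centers[j] = [sum(d) / len(c) for d in zip(*c)]
    return centers, clusters

# Four synthetic shot features forming two visually similar pairs.
shots = [(0.1, 0.2), (0.15, 0.22), (0.9, 0.8), (0.88, 0.85)]
centers, clusters = kmeans(shots, k=2)
print(sorted(len(c) for c in clusters))  # [2, 2]
```

Grouping the two near-duplicate pairs into one cluster each is exactly the redundancy elimination that shot clustering exploits for summarization.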
Video classification aims at grouping together videos with similar content and separating videos with dissimilar content, thus categorizing a pattern set, or assigning class labels to it, under supervision. It is the primary step for retrieval, and classification approaches are techniques that split videos into predefined categories. Semantic video classification approaches can be divided into two categories. The first is the rule-based approach, which uses domain knowledge to define perceptional rules and achieve semantic video classification; it makes it easy to insert, delete and modify rules when the nature of the video classes changes, but it is attractive only for video domains such as news and films that have well-defined story structures for the semantic units (i.e., film- and news-making rules). The second is the statistical approach, which uses statistical machine learning to bridge the semantic gap; it supports more effective semantic video classification by discovering non-obvious correlations (i.e., hidden rules) among different video patterns.
Key features are used to categorize videos into predefined genres. Video classification is based on spatial and temporal characteristics and is necessary for efficient access, understanding and retrieval of videos. Pan et al. proposed a video graph tool for mining and visualizing the structure of the plot of a video sequence. The video graph of a video clip is a directed graph in which every node corresponds to a shot group and edges indicate temporal succession. The algorithm is used to "stitch" together similar scenes even if they are not consecutive and to derive video graphs automatically. It derives the number of recurrent shot groups for video mining and classification, distinguishing between different video types, e.g., news stories versus commercials.
Pan et al. presented a video cube tool to classify a video clip into one of n given classes (e.g., "news", "commercials", etc.). It automatically derives a "vocabulary" from each class of video clips using Independent Component Analysis, incorporating spatial and temporal information and working on both video and audio streams. It creates a vocabulary that describes the images, motions and audio parts of the video and thus provides a way to extract features automatically. The video and audio features reveal the essential characteristics of a genre class and are closely related to the neural signals used in the human perceptual process. The VCube algorithm uses the video bases of the genre classes to classify a video clip and the audio bases to classify clips based on their audio information.
Building an activity recognition and classification system is a challenging task because of variations in the environment, objects and actions. Variations in the environment can be caused by cluttered or moving backgrounds, camera motion, occlusion, weather and illumination changes. Variations in the objects arise from differences in appearance, size or posture, or from self-motion that is not part of the activity. Variations in the action itself can make semantically equivalent actions difficult to recognize as such; for example, imagine the many ways to jump over an obstacle or the different ways to throw a stick.
Nowozin et al. proposed a classifier over sequence representations for action classification in videos that retains the temporal order of a video. They first proposed the LPBoost classifier for sequential representations and then used the discriminative PrefixSpan subsequence mining algorithm to find the optimal discriminative subsequence patterns. Brezeale et al. presented a survey on video classification. They found that features are drawn from three modalities, yielding four groups of automatic video classification approaches: text-based, audio-based, visual-based, and combinations of text, audio and visual features. Tien et al. extracted high-level audiovisual features to describe video segments, which were then transformed into symbolic streams, and applied an efficient mining technique to derive all frequent patterns that characterize tennis events. After mining, they categorized the frequent patterns into several kinds of events and thus achieved event detection for tennis videos by checking the correspondence between mined patterns and events.
3.3 Video association mining
Video association mining is the process of discovering associations in a given video. The video knowledge is explored in two stages: the first is video content processing, in which the video clip is segmented into analysis units and their representative features are extracted; the second is video association mining, which extracts knowledge from the feature descriptors. Mongy et al. presented a framework for video usage mining that generates user profiles on a video search engine in the context of movie production; it analyzes user behavior on a set of video data to create suitable tools that help people browse and search large amounts of video data.
In video association mining, video processing and existing data-mining algorithms are seamlessly integrated to mine video knowledge. Zhu et al. proposed multilevel sequential association mining to explore associations between audio and visual cues, and classified the associations by assigning each a class label based on its appearances in the video, in order to construct video indices. They integrated the traditional association measures (support and confidence) with video temporal information to evaluate video associations.
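The idea of combining support and confidence with temporal information can be sketched as follows. This is not Zhu et al.'s actual algorithm: the cue sequence, the maximum temporal gap, and the rule itself are fabricated for illustration.

```python
# Sketch: evaluate an audio-visual association rule a -> b with
# support and confidence, restricted by a temporal window.

def temporal_association(sequence, a, b, max_gap=2):
    """sequence: list of (time, cue). Count occurrences of cue `a`
    followed by cue `b` within `max_gap` time units, and return the
    (support, confidence) of the rule a -> b."""
    n_a = sum(1 for _, cue in sequence if cue == a)
    hits = 0
    for i, (t1, c1) in enumerate(sequence):
        if c1 != a:
            continue
        if any(c2 == b and 0 < t2 - t1 <= max_gap
               for t2, c2 in sequence[i + 1:]):
            hits += 1
    support = hits / len(sequence)
    confidence = hits / n_a if n_a else 0.0
    return support, confidence

# Invented cues indexed by shot number: an audio 'whistle' cue
# repeatedly precedes a visual 'goal' cue within two shots.
cues = [(0, 'whistle'), (1, 'goal'), (2, 'crowd'),
        (3, 'whistle'), (4, 'goal')]
print(temporal_association(cues, 'whistle', 'goal'))  # (0.4, 1.0)
```

The temporal gap constraint is what distinguishes this from plain market-basket support/confidence: the same pair of cues far apart in time would not count.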
Sivaselvan et al. presented a video association mining approach consisting of two key phases. First, the transformation phase converts the original input video into an alternative transactional format, namely a cluster sequence. Second, the frequent temporal pattern mining phase generates patterns subject to temporal-distance and support thresholds.
Lin et al. developed a video semantic concept discovery framework that utilizes multimodal content analysis and association rule mining to discover semantic concepts from video data. The framework used the Apriori algorithm and association rule mining to find frequent itemsets in the feature data set and generated classification rules to classify video shots into different concepts (semantics). Chen and Shyu proposed a hierarchical temporal association mining approach that integrates association rule mining and sequential pattern discovery to systematically determine the temporal patterns of target events. Goyani et al. applied the Apriori algorithm to detect semantic concepts in cricket video: a top-down event detection and classification was first performed using a hierarchical tree, and the higher-level concept was then identified by applying the Apriori algorithm. Maheshkumar proposed a method that automatically extracts salient events from video and classifies each event sequence into a concept by sequential association mining. A hierarchical framework was used for soccer (football) video event sequence detection and classification; the association of the events in each excitement clip was computed with an Apriori mining algorithm, and the sequential association distance was used to classify the association of the excitement clip into semantic concepts.
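In the spirit of these frameworks, a minimal Apriori pass over shot-level feature "transactions" might look as follows; the items (hypothetical mid-level detections per shot) and the minimum-support threshold are invented for the sketch:

```python
# Sketch of the Apriori level-wise search for frequent itemsets over
# per-shot feature transactions, as used to derive classification rules.
from itertools import combinations

def apriori_frequent(transactions, min_support=0.5):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    frequent, size = {}, 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        level = {c: v / n for c, v in counts.items() if v / n >= min_support}
        frequent.update(level)
        size += 1
        # Candidate generation: join frequent sets of the previous level.
        candidates = list({a | b for a, b in combinations(level, 2)
                           if len(a | b) == size})
    return frequent

# Hypothetical mid-level detections for three shots of a sports video.
shots = [frozenset(t) for t in
         [{'grass', 'crowd'}, {'grass', 'crowd', 'goalpost'},
          {'grass', 'goalpost'}]]
freq = apriori_frequent(shots, min_support=2 / 3)
print(len(freq))  # 5 frequent itemsets
```

Frequent itemsets such as {grass, goalpost} would then be turned into classification rules mapping feature combinations to concepts.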
Kea et al. developed a method based on the frequent pattern tree (FP-tree) for mining association rules in video retrieval. Their temporal frequent pattern tree growth algorithm mines temporal frequent patterns from the TFP-tree to find rules for motion events.
3.4 Video motion mining
Motion is a key feature that essentially characterizes the contents of a video: it represents the temporal information of the video and is more objective and consistent than features such as color and texture. Several approaches have been proposed to extract camera motion and motion activity from video sequences. Object-tracking algorithms are usually designed on the basis of a known object region in the frames, so the most challenging problem in visual information retrieval is the recognition and detection of objects in moving videos. Camera motion plays a vital role, and the key scenarios in video motion detection are: the camera is in a static location while the objects move (surveillance video, sports video); the camera moves with the moving objects (movies); or multiple cameras record the same objects. The camera motion itself contains copious knowledge about the action of the whole match. The important types of camera motion are pan (left and right), zoom (in and out), tilt (up and down), and unknown (camera motions that are not pan, zoom, or tilt).
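A rough heuristic for labeling the camera-motion types just listed from block motion vectors might look like the sketch below. The vector fields, block positions and thresholds are fabricated for illustration; real systems typically fit a parametric global motion model to MPEG motion vectors instead.

```python
# Heuristic camera-motion labeling from per-block motion vectors.
# A uniform horizontal field suggests a pan, a uniform vertical field
# a tilt, and a field pointing toward/away from the center a zoom.

def classify_camera_motion(vectors, positions, tol=0.3):
    """vectors: per-block (dx, dy); positions: block centers relative
    to the frame center. Returns 'pan', 'tilt', 'zoom' or 'unknown'."""
    n = len(vectors)
    mean_dx = sum(dx for dx, _ in vectors) / n
    mean_dy = sum(dy for _, dy in vectors) / n
    # Zoom: average radial component (vectors aligned with positions).
    radial = sum(dx * px + dy * py
                 for (dx, dy), (px, py) in zip(vectors, positions)) / n
    if abs(radial) > tol:
        return 'zoom'
    if abs(mean_dx) > tol and abs(mean_dy) <= tol:
        return 'pan'
    if abs(mean_dy) > tol and abs(mean_dx) <= tol:
        return 'tilt'
    return 'unknown'

pos = [(-1, -1), (1, -1), (-1, 1), (1, 1)]  # four block centers
print(classify_camera_motion([(1, 0)] * 4, pos))               # pan
print(classify_camera_motion([(-1, -1), (1, -1),
                              (-1, 1), (1, 1)], pos))          # zoom
```

Vectors that satisfy none of the tests fall into the unknown class, mirroring the four-way taxonomy above.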
Wu et al. proposed a scheme for extracting global motion and object trajectories in a video shot for content-based video retrieval. For instance, while browsing video obtained from a surveillance system or watching sports programs, users often need to find an object moving in some particular direction. Zang and Klette proposed an approach for extracting a moving object from the background and tracking it.
Mining patterns from the movements of moving objects is called motion mining. First, features (physical, visual, aural, and motion features) are extracted using object detection and tracking algorithms; then the significance of the features, the trends of moving-object activities and the patterns of events are mined by computing association relations and spatial-temporal relations among the features.
3.5 Video pattern mining
Video pattern mining detects special patterns that are modeled in advance and are usually characterized as video events, such as dialogue or presentation events in medical video. The existing work can be divided into two categories: mining similar motion patterns and mining similar objects.
Sivic et al. described a method for obtaining the principal objects, characters and scenes in a video by measuring the reoccurrence of spatial configurations of viewpoint-invariant features. It has three stages: the first extracts the neighborhoods occurring in more than a minimum number of key frames as candidates for clustering; the second matches the significant neighborhoods using a greedy progressive clustering algorithm; and the third merges the resulting clusters based on both spatial and temporal overlap. Burl et al. presented an algorithm to extract information from raw, surveillance-style video of an outdoor scene containing a mix of people, bicycles, and motorized vehicles. A feature extraction algorithm based on background estimation and subtraction, followed by spatial clustering and multi-object tracking, processes sequences of video frames into a track set, which encodes the positions, velocities, and appearances of the various objects as functions of time; the track set is then mined to answer user-generated queries. Lai et al. proposed a motion model that measures the similarities among different animal movements with high precision; a clustering method separates the recurring movements from infrequent random movements.
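The background estimation and subtraction step of such surveillance pipelines can be sketched as a per-pixel median background with thresholded deviations. The tiny synthetic frames and the threshold below are stand-ins for real grayscale video.

```python
# Sketch of background estimation (per-pixel median over frames)
# and subtraction (thresholded deviation) for surveillance video.

def estimate_background(frames):
    """Per-pixel median across a list of equally sized 2-D frames."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[sorted(f[y][x] for f in frames)[len(frames) // 2]
             for x in range(w)] for y in range(h)]

def foreground_mask(frame, bg, thresh=50):
    """1 where the pixel deviates from the background, else 0."""
    return [[1 if abs(p - b) > thresh else 0
             for p, b in zip(row, brow)]
            for row, brow in zip(frame, bg)]

# Static dark 2x2 scene; in the last frame a bright "object"
# enters at pixel (1, 1).
frames = [[[10, 10], [10, 10]] for _ in range(4)]
frames.append([[10, 10], [10, 200]])
bg = estimate_background(frames)
print(foreground_mask(frames[-1], bg))  # [[0, 0], [0, 1]]
```

The resulting foreground masks are what the spatial clustering and multi-object tracking stages would consume to build tracks.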
Fleischman et al. presented an approach in which temporal information is captured by representing events with a lexicon of hierarchical patterns of human movement mined from large corpora of un-annotated video data. These patterns are used as features in a discriminative model of event classification that exploits tree kernels in a Support Vector Machine. Systems in the second category aim at grouping frequently appearing objects in videos, since commonly occurring objects, characters and scenes are useful for various applications. There are several such applications: first, they provide entry points for visual search in video databases; second, they can be used to form video summaries, since the basic elements of a summary often involve the commonly occurring objects, which are then displayed as a storyboard; third, they help detect product placements in a film, where frequently occurring logos or labels are prominent. Mining repeated short clips from video collections and streams is essential for video syntactic segmentation, television broadcast monitoring, commercial skipping, content summarization and personalization, as well as video redundancy detection and many other applications.
Xie and Chang investigated pattern mining strategies for video streams. They applied different pattern mining models (deterministic and statistical; static and temporal) and devised pattern combination strategies to generate a rich set of pattern hypotheses. Statistical clustering methods such as K-means, HMM and HHMM, as well as deterministic algorithms, were considered for video clustering. Yang et al. proposed a method for repeated-clip mining and knowledge discovery from video data. Their mining framework unifies the detection of both unknown video repeats and known video clips of arbitrary length through the same feature extraction and matching process; the structure analysis method is effective in discovering and modeling the syntactic structure of news videos, and the main objective is to detect unknown video repeats from the video stream without prior knowledge. Su et al. presented a method to achieve effective content-based video retrieval by mining temporal patterns in video content, constructing a special index on video temporal patterns for efficient retrieval (the Fast-Pattern-Index tree) together with a unique search strategy for effective retrieval (pattern-based search).
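Unknown-repeat detection in this spirit can be sketched by fingerprinting every fixed-length window of frame signatures and reporting windows that occur more than once. The signatures and the window length below are invented; real systems hash robust audio/visual fingerprints rather than exact values.

```python
# Sketch: find repeated clips in a stream of per-frame signatures by
# hashing fixed-length sliding windows and collecting collisions.
from collections import defaultdict

def find_repeats(signatures, window=3):
    """Return {window_fingerprint: [start positions]} for every
    window of `window` consecutive signatures that occurs twice+."""
    seen = defaultdict(list)
    for i in range(len(signatures) - window + 1):
        seen[tuple(signatures[i:i + window])].append(i)
    return {fp: pos for fp, pos in seen.items() if len(pos) > 1}

# A stream in which the clip (7, 8, 9) airs twice,
# e.g. a repeated commercial in a broadcast.
stream = [1, 2, 7, 8, 9, 3, 4, 7, 8, 9, 5]
print(find_repeats(stream))  # {(7, 8, 9): [2, 7]}
```

Grouping overlapping repeated windows would then recover repeats of arbitrary length, which is the harder part the unified frameworks above address.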