The three main categories of Web mining are:
1. Web usage mining (WUM)
2. Web structure mining (WSM), and
3. Web content mining (WCM).
1.2.3 Web Structure Mining
Web structure mining tries to discover valuable knowledge from the hyperlink structure of the Web, taking advantage of the relations between Web pages. Web structure mining can be divided into two kinds according to the type of Web structure data:
1. Extracting patterns from hyperlinks in the Web: a hyperlink is a structural component that connects a Web page to a different location.
2. Mining the document structure: analyzing the structure within a page, such as the tree-like structure of its HTML or XML tags, to describe how those tags are used.
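Both kinds of structure data can be illustrated with a minimal Python sketch that uses only the standard library. The parser collects each page's outgoing hyperlinks (kind 1) and records tag usage and nesting depth (kind 2). The resulting link graph is then scored with a basic PageRank iteration, one well-known hyperlink-analysis technique, using an assumed damping factor of 0.85. The names `StructureParser` and `pagerank` and the three sample pages are hypothetical illustrations, not the author's method.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags that never have a closing tag, so they do not deepen the tag tree.
VOID_TAGS = {"br", "img", "meta", "link", "input", "hr"}


class StructureParser(HTMLParser):
    """Hypothetical helper: collects hyperlinks and tag-tree statistics for one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []       # absolute URLs from <a href="...">
        self.tag_counts = {}  # how often each tag is used
        self.depth = 0        # current nesting level in the tag tree
        self.max_depth = 0    # deepest nesting level seen

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))
        if tag not in VOID_TAGS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def pagerank(graph, damping=0.85, iterations=50):
    """Basic power-iteration PageRank over {page: [linked pages]}."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical three-page site.
pages = {
    "http://example.com/": '<html><body><a href="/a">A</a><a href="/b">B</a></body></html>',
    "http://example.com/a": '<html><body><p><a href="/b">B</a></p></body></html>',
    "http://example.com/b": '<html><body><a href="/">Home</a><br></body></html>',
}

graph, depths = {}, {}
for url, html in pages.items():
    parser = StructureParser(url)
    parser.feed(html)
    graph[url] = sorted(set(parser.links))
    depths[url] = parser.max_depth

ranks = pagerank(graph)
for url in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{ranks[url]:.3f}  {url}  (max tag depth {depths[url]})")
```

Page `/b` ends up with the highest score because both other pages link to it. Page `/a` has the deepest tag tree because its link is nested inside a `<p>` element.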
In this sense, Web usage mining discovers interesting usage patterns from Web data in order to understand and better serve the needs of Web-based applications. By this definition, Web usage mining is the process of extracting useful information from server logs. For example, it can discover sequential patterns in the way Web files are accessed.
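The log-based process described above can be sketched in Python as follows. The sketch makes several assumptions:

- The logs use the Apache-style Common Log Format.
- A visitor is identified by IP address alone, which is a simplification.
- A 30-minute inactivity timeout splits each visitor's requests into sessions.
- A "sequential pattern" is a consecutive page pair that recurs in at least `min_support` sessions.

The log lines and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status bytes
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')
SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout


def parse_line(line):
    """Return (ip, time, path) for a successful GET request, else None."""
    m = LOG_RE.match(line)
    if not m:
        return None
    ip, ts, method, path, status = m.groups()
    if method != "GET" or not status.startswith("2"):
        return None
    return ip, datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z"), path


def sessions(lines):
    """Group requests by IP and split them wherever the gap exceeds SESSION_GAP."""
    by_ip = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec:
            ip, when, path = rec
            by_ip[ip].append((when, path))
    result = []
    for visits in by_ip.values():
        visits.sort()
        current = [visits[0][1]]
        for (prev_t, _), (t, path) in zip(visits, visits[1:]):
            if t - prev_t > SESSION_GAP:
                result.append(current)
                current = []
            current.append(path)
        result.append(current)
    return result


def frequent_transitions(sess, min_support=2):
    """Consecutive page pairs that occur in at least min_support sessions."""
    counts = Counter()
    for s in sess:
        counts.update(set(zip(s, s[1:])))  # count each pair once per session
    return {pair: c for pair, c in counts.items() if c >= min_support}


LOG = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2023:13:56:10 +0000] "GET /products.html HTTP/1.1" 200 1024',
    '10.0.0.1 - - [10/Oct/2023:13:57:02 +0000] "GET /cart.html HTTP/1.1" 200 256',
    '10.0.0.2 - - [10/Oct/2023:14:01:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2023:14:02:30 +0000] "GET /products.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [10/Oct/2023:14:03:00 +0000] "GET /missing.html HTTP/1.1" 404 0',
]

patterns = frequent_transitions(sessions(LOG))
print(patterns)
```

Here the transition from `/index.html` to `/products.html` occurs in both sessions and is kept as a pattern. The failed `404` request is filtered out before sessions are built.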
1.2.5 Web Content Mining
Web content mining is the discovery of useful data, information, and knowledge from Web page content, data, or documents. Web content includes text, images, audio, video, metadata, and hyperlinks. In short, Web content mining deals directly with information: its goal is to mine the content of Web documents in order to build knowledge from it. This knowledge may be latent, or simply difficult to analyze in a straightforward way.
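As a minimal illustration of the content types listed above, the following Python sketch (standard library only) sorts a hypothetical page into text, images, audio, video, metadata, and hyperlinks. It then counts word frequencies in the visible text as a very simple first step toward extracting knowledge. The class `ContentExtractor`, the function `term_frequencies`, and the sample page are assumptions made for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Hypothetical helper: sorts one page's content by type."""

    MEDIA = {"img": "images", "audio": "audio", "video": "video"}

    def __init__(self):
        super().__init__()
        self.content = {"text": [], "images": [], "audio": [], "video": [],
                        "metadata": {}, "hyperlinks": []}
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a" and a.get("href"):
            self.content["hyperlinks"].append(a["href"])
        elif tag == "meta" and a.get("name"):
            self.content["metadata"][a["name"]] = a.get("content", "")
        elif tag in self.MEDIA and a.get("src"):
            self.content[self.MEDIA[tag]].append(a["src"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep visible text only; script and style bodies are skipped.
        if not self._skip and data.strip():
            self.content["text"].append(data.strip())


def term_frequencies(chunks):
    """Lower-cased word counts over the extracted text."""
    return Counter(re.findall(r"[a-z]+", " ".join(chunks).lower()))


PAGE = """<html><head><meta name="keywords" content="web mining">
<title>Web Mining</title><script>var x = 1;</script></head>
<body><h1>Web mining</h1><p>Mining the web for knowledge.</p>
<img src="chart.png"><video src="talk.mp4"></video>
<a href="/wcm.html">Content mining</a></body></html>"""

extractor = ContentExtractor()
extractor.feed(PAGE)
tf = term_frequencies(extractor.content["text"])
print(extractor.content["images"], extractor.content["metadata"])
print(tf.most_common(2))
```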
From all the data, three variables are analyzed:
1.
Webpage content mining and
2. Search result mining
Webpage content mining: the Web is searched via page content.
Search result mining: the results returned by a previous search are mined further.
When a user searches for a specific keyword or Web page, a number of links or results are displayed, but not all of them are relevant. Retrieving the required data from the Web is therefore becoming a challenge. The user issues query terms (keywords) to a search engine, and the search engine returns a set of pages that may be related to the query topics or terms.
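Search result mining can be illustrated by re-ranking the pages that a search engine has already returned. The Python sketch below is one simple approach, not a description of any particular search engine. It scores each hypothetical (URL, snippet) result with TF-IDF weights for the query terms, computed over the returned results only, and drops results that contain none of the terms. The function names and sample results are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def rerank(query, results):
    """Order (url, snippet) pairs by TF-IDF score for the query; drop zero scores."""
    docs = [Counter(tokenize(snippet)) for _, snippet in results]
    n = len(docs)
    terms = set(tokenize(query))
    # Smoothed inverse document frequency over the returned results only.
    idf = {}
    for t in terms:
        df = sum(1 for d in docs if t in d)
        idf[t] = math.log((n + 1) / (df + 1)) + 1
    scored = []
    for (url, _), d in zip(results, docs):
        length = sum(d.values()) or 1
        score = sum((d[t] / length) * idf[t] for t in terms)
        if score > 0:  # irrelevant results are filtered out
            scored.append((score, url))
    scored.sort(reverse=True)
    return [url for _, url in scored]


RESULTS = [
    ("a.html", "Web usage mining of server logs"),
    ("b.html", "Recipes for apple pie"),
    ("c.html", "Web content mining extracts knowledge from page content"),
]

ranked = rerank("content mining", RESULTS)
print(ranked)
```

For the query `content mining`, `c.html` ranks first because it contains both terms, including the rarer term `content` twice. The unrelated `b.html` is removed from the list.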
In their paper, Raymond Kosala and Hendrik Blockeel [28] described Web content mining from two different points of view: the Information Retrieval (IR) view and the Database (DB) view.
For the IR view, they summarized the research on unstructured documents (free text) and semi-structured documents (HTML).
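The difference between the two document types in the IR view can be sketched in Python. Free text is reduced to a bag of words, meaning word counts with all order and structure discarded. For an HTML document, each word is also tagged with the element that encloses it, so a word in a `<title>` can be told apart from the same word in body text. This tag-prefixing scheme is only one assumed example of using HTML structure, and the names `bag_of_words` and `TaggedWords` are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "meta", "link", "input", "hr"}  # never closed


def bag_of_words(text):
    """Unstructured document: just the multiset of its words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


class TaggedWords(HTMLParser):
    """Semi-structured document: each word is paired with its enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []            # currently open tags
        self.features = Counter()  # "tag:word" -> count

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag, tolerating unclosed children.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        tag = self.stack[-1] if self.stack else "root"
        for word in re.findall(r"[a-z]+", data.lower()):
            self.features[f"{tag}:{word}"] += 1


text_doc = "Web mining finds patterns in web data."
html_doc = ("<html><head><title>Web mining</title></head>"
            "<body><p>Patterns in <b>web</b> data.</p></body></html>")

bow = bag_of_words(text_doc)
tagged = TaggedWords()
tagged.feed(html_doc)
print(bow.most_common(2))
print(sorted(tagged.features))
```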
In the DB view, Web content is represented with the Object Exchange Model (OEM), which differs from the representations used in the IR view. OEM represents semi-structured data as a labeled graph: the data is viewed as a graph, with objects as vertices and labels on the