Benefits Of Data Mining Techniques

1. Abstract
On the Internet, information on a web page is typically extracted through keyword search, which retrieves whatever content matches the keywords. Much of a web page, such as animations, images and Flash content, is unimportant to the user, and keyword-based extraction cannot cover every domain on the web. Both factors degrade the performance of web information extraction. To overcome this limitation, web information is extracted directly from the links of dynamic websites using semantic, text-based extraction of the site's resources, so that current information can be obtained starting from the site's main link. In this link-based information extraction, the exact information the user expects is extracted from the web page source using the Minimum Description Length (MDL) and Min-Hash techniques, and web documents are retrieved more effectively than with ordinary search engines.

2. Introduction
Information on a web page is typically extracted through keyword search, which retrieves whatever content matches the keywords; this is often not the most relevant content for the user. Elements such as animations, images and Flash content are unimportant to the user, and keyword-based extraction cannot cover every domain on the web, which degrades the performance of web information extraction. To overcome this problem, web information is extracted simply by way of website links.

In link-based information extraction, the exact information the user expects is extracted from the web page source using the Minimum Description Length (MDL) technique. Extraction is driven by the website's links: the site's URLs are clustered, and unrelated extensions are removed from each URL. The user then obtains accurate information online from the extracted URLs with high performance and minimal web resources. Because the search is restricted to the particular dynamic website and its pages, the expected information is retrieved quickly; images, phone numbers, email addresses and other text-based information can be extracted in the least amount of time. All content except video can be extracted from the web pages.


3. Problem Statement
Conventional search engines such as Google, Yahoo and Bing retrieve the whole web source and extract content through keyword-based search. A single website may host several different domains, each producing different information, so keyword-based extraction cannot cover every domain.
In link-based information extraction, the exact information the user expects is extracted from the web page source using the MDL technique. The Document Object Model (DOM) is used for representing and interfacing with objects in HTML and XML documents. The Min-Hash technique is used to quickly estimate how similar two sets are, to detect duplicate web pages and eliminate them from search results, and to cluster web documents by the similarity of their word sets. Information is extracted for all domains according to the rules the user expects for the particular website.
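To illustrate the Min-Hash idea described above, the following C# sketch estimates the Jaccard similarity of two word sets by comparing the minimum hash values produced by a family of seeded hash functions. This is a minimal sketch, not the project's actual code; the class name MinHasher and the choice of 100 hash functions are illustrative assumptions.

using System;
using System.Collections.Generic;
using System.Linq;

// Minimal Min-Hash sketch: two sets are similar roughly in proportion to
// how often their minimum hash values agree across many hash functions.
class MinHasher
{
    private readonly int[] seeds;

    public MinHasher(int numHashes)          // e.g. new MinHasher(100)
    {
        var rng = new Random(42);
        seeds = Enumerable.Range(0, numHashes).Select(i => rng.Next()).ToArray();
    }

    // One signature value per hash function: the minimum hash over the
    // (non-empty) word set.
    public int[] Signature(IEnumerable<string> words)
    {
        var items = words.ToList();
        return seeds.Select(seed =>
            items.Min(w => (w.GetHashCode() ^ seed) & int.MaxValue)).ToArray();
    }

    // The fraction of matching signature positions approximates the
    // Jaccard similarity of the underlying sets.
    public static double EstimateSimilarity(int[] a, int[] b)
    {
        int matches = 0;
        for (int i = 0; i < a.Length; i++)
            if (a[i] == b[i]) matches++;
        return (double)matches / a.Length;
    }
}

Two pages whose word-set signatures agree in, say, 90 of 100 positions have an estimated Jaccard similarity of about 0.9; near-duplicate pages can be detected and clustered this way without comparing the full word sets.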

4. Objectives of the Study
In the previous system, information is extracted as the whole web page, including advertisements and other unwanted content. The proposed system consists of URL extraction and clustering and introduces a new automatic information extraction approach for all domains. Two efficient data mining techniques, Min-Hash and Minimum Description Length, are used to filter out unwanted information and extract the text-based information the user expects.

5. Literature Survey
The main disadvantage of the existing system is that information is extracted as the whole web page, including advertisements and other unwanted content, and extraction cannot cover every domain on the web. The proposed system consists of URL extraction and clustering and introduces a new automatic information extraction approach for all domains. Two efficient data mining techniques, Min-Hash and Minimum Description Length, are used to filter out unwanted information and extract the text-based information the user expects.

6. Software Requirement Specification
6.1 Functional Requirements
Functional requirements specify which outputs should be produced from the given inputs; they describe the relationship between the input and the output of the system. For each functional requirement, a detailed description of all data inputs, their sources and the range of valid inputs must be specified.
6.2 Non-functional Requirements
Non-functional requirements describe user-visible aspects of the system that are not directly related to its functional behavior. They include quantitative constraints such as response time (how fast the system reacts to user commands) and accuracy (how precise the system's numerical answers are).
6.3 Software & Hardware Requirements
6.3.1 Hardware Requirements (min)
Processor : Intel Core i3, 1.80 GHz
Hard disk : 40 GB
Monitor : 14" Color HD Display
RAM : 4 GB
6.3.2 Software Specification
Operating System : Windows XP
Front End : Visual Studio .NET
Coding Language : C#
Back End : SQL Server 2005

6.3.3 Limitations of the Software:
Extracting information across pages from different websites is not supported, and video content cannot be extracted.

7. Conceptual modeling
Unified Modeling Language (UML) is a specification language used in the software engineering field. It is a general-purpose language that uses a graphical notation to create an abstract model of a system, known as a UML model. The Object Management Group is responsible for defining UML via the UML metamodel.
7.1 Use Case Diagram
The use case diagram represents the various use cases involved in the interaction between the system and its actors.

Fig 7.1 Use Case Diagram

7.2 Class Diagram
The class diagram represents the member variables and member functions of the interacting classes. Each actor is considered a class.

Fig 7.2 Class Diagram

7.3 Activity Diagram
The activity diagram represents the workflow of stepwise activities and events with support for selection, iteration and concurrency in graphical representation. It explains the overall flow of control.

Fig.7.3 Activity Diagram

8. Interaction scenario
8.1 Sequence Diagram
The sequence diagram represents the objects and classes involved in the scenario and the series of messages exchanged between the objects to carry out the functionality of the scenario.

Fig 8.1 Sequence diagram

9. Methodology and Approach
9.1 Proposed Technique

In the proposed system, the information the user expects is extracted automatically based on the MDL technique. Extraction is driven by the website's links, so the whole website is covered. The site's URLs are clustered and unrelated extensions are removed from each URL; the relevant links are then predicted, and the exact information is retrieved online from the extracted URLs with high performance and minimal web resources. Images, phone numbers, email addresses and other text-based information can be extracted in the least amount of time. The techniques used are Min-Hash and Minimum Description Length.
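As a rough illustration of how Minimum Description Length can drive template extraction, the C# sketch below scores a cluster of pages by the cost of describing them with a shared template: the tokens common to every page form the template, and each page pays only for its residual tokens. This is a simplified, assumed cost model for illustration, not the project's actual implementation.

using System.Collections.Generic;
using System.Linq;

// MDL-style cost of a cluster: encode the shared template once, then
// encode each page's tokens that the template does not cover. Clusterings
// with a lower total cost describe the pages more compactly.
static class MdlCost
{
    public static int Cost(List<HashSet<string>> pages)
    {
        if (pages.Count == 0) return 0;

        // Template = tokens present in every page of the cluster.
        var template = new HashSet<string>(pages[0]);
        foreach (var page in pages.Skip(1)) template.IntersectWith(page);

        // Template size + per-page residuals.
        return template.Count +
               pages.Sum(page => page.Count(t => !template.Contains(t)));
    }
}

Under this cost model, merging two pages into one cluster is justified when the merged cost is lower than the sum of the separate costs, i.e. when the pages share enough tokens to be described by one template.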

Advantages:
• Extracting scientific data from different websites.
• Identifying products in supermarkets and shopping malls online.
• Grouping the important information from different users' web blogs.
9.2 Modules
1. Generating Linked URL.
2. Clustering URL.
3. Rule Match.
4. Storing in DB.

9.2.1 Generating Linked URL
This module crawls the given website and generates the set of linked URLs found on its pages. A limitation of the technique is that it assumes the web documents come from a single template; the documents must therefore be clustered so that those in the same group belong to the same template. The accuracy of template extraction thus depends on the quality of the clustering.
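A minimal C# sketch of this step, assuming the classic WebClient API of the .NET Framework: it downloads a page, captures the href value of each anchor tag with a regular expression, and resolves relative links to absolute URLs. The class name LinkedUrlGenerator is illustrative.

using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Download one page and collect the absolute URLs of all links on it.
class LinkedUrlGenerator
{
    public static List<string> GetLinkedUrls(string pageUrl)
    {
        string html;
        using (var client = new WebClient())
            html = client.DownloadString(pageUrl);

        var urls = new List<string>();
        // Capture the href attribute of each anchor tag.
        foreach (Match m in Regex.Matches(html, "<a[^>]+href=\"([^\"]+)\"",
                                          RegexOptions.IgnoreCase))
        {
            // Resolve relative links against the page's own URL.
            Uri absolute;
            if (Uri.TryCreate(new Uri(pageUrl), m.Groups[1].Value, out absolute))
                urls.Add(absolute.AbsoluteUri);
        }
        return urls;
    }
}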

9.2.2 Clustering URL
Uniform Resource Locator (URL) normalization is a central activity in web mining. Normalizing URLs makes web data retrieval smoother and avoids a great deal of redundant computation in the web mining process. Clustering is then performed based on the content, structure and semantic similarity of the given set of URLs.
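A minimal sketch of URL normalization in C#, assuming a simple canonical form: lower-case scheme and host, default ports, fragments and query strings stripped, and no trailing slash. Equivalent URLs then map to one key that the clustering step can group on.

using System;

// Reduce a URL to a canonical form so that equivalent URLs compare equal.
static class UrlNormalizer
{
    public static string Normalize(string url)
    {
        var uri = new Uri(url);
        // AbsolutePath already excludes the query string and fragment;
        // trim the trailing slash so /news/ and /news agree.
        string path = uri.AbsolutePath.TrimEnd('/');
        return uri.Scheme.ToLowerInvariant() + "://" +
               uri.Host.ToLowerInvariant() +
               (uri.IsDefaultPort ? "" : ":" + uri.Port) +
               path;
    }
}

For example, both http://Example.com:80/news/?ref=home and http://example.com/news normalize to http://example.com/news, so the two land in the same cluster.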


9.2.3 Rule Matching
Rule matching is essentially template matching: the heterogeneous content of a newly extracted page is matched against the stored template, which then replaces the previously extracted one. If a template has already been extracted from a web page and the same page is extracted again, the match confirms that the stored template should be updated.
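One way to realize this, sketched in C# under assumed semantics (a page matches when it contains at least a threshold fraction of the template's tokens, and a match refreshes the template to the shared tokens); the names and the threshold are illustrative, not the project's actual code:

using System.Collections.Generic;
using System.Linq;

// Match a page against a stored template and update the template on a hit.
static class RuleMatcher
{
    public static bool MatchAndUpdate(HashSet<string> template,
                                      HashSet<string> pageTokens,
                                      double threshold)   // e.g. 0.8
    {
        if (template.Count == 0) return false;
        int shared = template.Count(t => pageTokens.Contains(t));
        if ((double)shared / template.Count < threshold) return false;

        // The page matches: keep only the tokens it still shares with the
        // template, so the template tracks the page across re-extractions.
        template.IntersectWith(pageTokens);
        return true;
    }
}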

9.2.4 Storing in DB
The database stores the templates extracted from the web pages in an efficient and compressed manner, grouping similar templates together. "Efficient" means that the data stored in the DBMS can be accessed quickly, and "compressed" means that the data takes up very little space in the computer's storage. The phrase "related data" means that the stored data pertains to a particular topic.
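A minimal sketch of persisting an extracted template to the SQL Server back end named in Section 6.3.2, using a parameterized INSERT. The table name Templates and its columns SiteUrl and Template are assumptions; the real schema is project-specific.

using System.Data.SqlClient;

// Persist one extracted template for a site into the SQL Server back end.
class TemplateStore
{
    private readonly string connectionString;

    public TemplateStore(string connectionString)
    {
        this.connectionString = connectionString;
    }

    public void Save(string siteUrl, string templateText)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "INSERT INTO Templates (SiteUrl, Template) VALUES (@url, @tpl)",
            conn))
        {
            // Parameterized values avoid SQL injection and quoting issues.
            cmd.Parameters.AddWithValue("@url", siteUrl);
            cmd.Parameters.AddWithValue("@tpl", templateText);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}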

10. Screenshots
10.1 Testing
10.1.1 Functionality Testing

Fig 10.1: Testing for wrong user id and password
10.1.2 Integration testing
Software integration testing is the incremental testing of two or more integrated software components on a single platform, with the aim of exposing failures caused by interface defects. Its task is to verify that the components, or software applications at the company level, interact without error.

10.2 Results
10.2.1 The main form Design

10.2.2 Generating Linked URL

10.2.3 Clustering URL

10.2.4 Rule Matching

10.2.5 Text Template Design

10.2.6 Viewing the GSM Report


10.2.7 Viewing the Rediff Report

10.2.8 Viewing the Yelp Report

11. Conclusion
In the previous system, information was extracted as the whole web page, including advertisements and other unwanted content. A new automatic information extraction approach, consisting of URL extraction and clustering, has been introduced for all domains. Two efficient data mining techniques, Min-Hash and Minimum Description Length, are used to filter out the unwanted information and extract the text-based information the user expects.
