International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-5, May- 2016]
Infogain Publication (Infogainpublication.com) ISSN : 2454-1311
Information Upload and retrieval using SP Theory
of Intelligence
Supriya P, Koushik S
Dept of ISE, MS Ramaiah Institute of Technology, Bangalore, India
Abstract— Cloud computing has become an important aspect of today's technology, and storing data on the cloud is of high importance because the need for virtual space to hold massive amounts of data has grown over the years. However, upload and download are limited by processing time, so there is a need to handle large data sets and their processing efficiently. Another common problem is de-duplication: as cloud services grow at a rapid rate, increasingly large volumes of data are stored on remote cloud servers, and many of the remotely stored files are duplicates, because the same file is uploaded by different users at different locations. A recent survey by EMC estimates that about 75% of the digital data on the cloud consists of duplicate copies. To overcome these two problems, in this paper we use the SP theory of intelligence with lossless compression of information, which makes big data smaller and thus reduces the problems of storing and managing large amounts of data.
Keywords— Cloud Computing, Big data processing, Data
De-duplication, SP theory of intelligence, Lossless
Compression.
I. INTRODUCTION
The SP theory (Simplicity and Power theory) is designed to simplify and integrate concepts across artificial intelligence, conventional computing, and human perception and cognition, and it interprets all of these as compression of information via the matching and unification of patterns. The SP theory is conceived as a hypothetical brain-like system that can receive New information and store it by relating it to already available Old information.
The SP theory of intelligence combines conceptual simplicity with explanatory and descriptive power. The SP machine offers the potential to simplify computation and to save the time, cost and effort involved in developing many applications.
SP is short for Simplicity and Power, because compression of information may be seen as a process of reducing informational redundancy, thus increasing simplicity, while retaining as much as possible of its non-redundant expressive power.
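As a rough illustration of this idea, the Python sketch below (a toy simplification, not the SP computer model itself, which works with multiple alignments) stores each repeated pattern once as Old information and replaces later occurrences by references to the stored copy:

    # Toy illustration of compression by matching and unification of patterns.
    # This is a simplification for exposition, not the SP computer model.
    def unify(new_patterns):
        old = []          # Old information: each distinct pattern stored once
        encoding = []     # compressed stream: references into 'old'
        for pattern in new_patterns:
            if pattern in old:            # matching: the pattern is already known
                index = old.index(pattern)
            else:                         # unification: merge it into Old information
                old.append(pattern)
                index = len(old) - 1
            encoding.append(index)
        return old, encoding

    # Example: five input patterns are unified into three stored patterns.
    old, encoding = unify(["the cat", "sat on", "the cat", "the mat", "sat on"])
    print(old)        # ['the cat', 'sat on', 'the mat']
    print(encoding)   # [0, 1, 0, 2, 1]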
The majority of computing tasks today involve data that have been collected and stored in databases; such data make a stationary target. Increasingly, however, vital insights can be gained from analyzing information that is on the move. This approach is called streams analytics [1]. Rather than placing the data in a database first, the computer analyzes it as it arrives from a wide range of sources, continuously refining its understanding of the data as conditions change. This is much the way humans process information. Although, in its unsupervised learning, the SP system processes information in batches, it lends itself to an incremental approach: the SP system is designed to incorporate New information into a constantly growing body of compressed Old information.
Massive data sets introduce problems of data management [2]. In order to reduce the amount of storage required, it is necessary to reduce computation time and to represent the data in a suitable form.
One major roadblock to using cloud services for processing large data sets is the problem of transmitting the data over a network. Maintaining a communications network is very expensive and only marginally profitable, so to minimize these network charges system designers have looked for ways to minimize the energy used for processing data. The SP system supports the efficient transmission of data by dividing the information into smaller parts.
SP theory in managing massive data [3]: large-scale data sets introduce many problems for data management.
a. Volume: The size of large-scale data sets can be reduced by compressing the information into chunks; by identifying duplicate chunks, the storage of identical chunks can be eliminated and storage efficiency achieved (a rough worked example follows this list).
b. Variety: Each format of information requires a different kind of processing. Text files in different formats (.txt, .pdf, .doc) each need to be analyzed differently. The SP system provides a universal framework for processing these diverse formats.
c. Velocity: Instead of simply placing the data in a database, the data must first be analyzed and its content understood. In this way the time taken to analyze moving data can be minimized.
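As a rough worked example of the Volume point (the 100 GB figure is hypothetical, and the small overhead of the chunk index is ignored): if a fraction d of the chunks in an upload are duplicates of chunks already stored, the volume actually written is approximately (1 - d) x V for a raw volume V. With d = 0.75, as in the EMC estimate cited in the abstract, a 100 GB upload would require only about 25 GB of new chunk storage.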
II. LITERATURE SURVEY
Wolff et al. [1] explain how the SP theory of intelligence, and its realization in the SP machine, may be applied to the processing and management of big data.
J. Gerard Wolff et al. [3] give an overview of the SP theory of intelligence, which is designed to simplify and integrate concepts across artificial intelligence, conventional computing, and human perception and cognition, with information compression as a unifying theme. The theory is conceived as a brain-like system that receives New data and stores it, in compressed form, as Old information, and it is realized in the form of a computer model, a first version of the SP machine. The matching and unification of patterns and the concept of multiple alignment are the central ideas behind the theory.
J. Gerard Wolff et al. [5] provide evidence for the idea that much of artificial intelligence, human perception and cognition, and conventional computing may be understood as compression of information via the matching and unification of patterns. This is the basis for the SP theory of intelligence, outlined in the paper and fully described elsewhere.
Robert Escriva et al. [12] present HyperDex, a distributed key-value store that provides a unique search primitive enabling queries on secondary attributes. The key concept of HyperDex is hyperspace hashing, in which objects with multiple attributes are mapped onto a multidimensional hyperspace. This mapping leads to efficient implementations of partially-specified secondary-attribute searches and range queries, as well as retrieval by primary key.
J. Gerard Wolff et al. [13] describe existing and expected benefits of the SP theory of intelligence and some of its potential applications. The theory is designed to simplify and integrate ideas across artificial intelligence, conventional computing, and human perception and cognition, with information compression as a theme. It combines simplicity with explanatory and descriptive power in numerous areas of computation and cognition.
Kruus et al. [20] present work similar to ours. They propose a two-stage chunking algorithm that re-chunks transitional and non-duplicated big CDC chunks into smaller CDC chunks. The significance of their work is to reduce the number of chunks while attaining the same duplicate-elimination ratio as a baseline CDC algorithm.
III. PROPOSED SYSTEM
Figure 1 shows the system architecture of the proposed system, which establishes the basic structural framework of the system. When a user uploads a file, say File X, the text file is divided into blocks and a hash code is generated for each block. Each time a new file, say File Y, is uploaded, its block hash codes are compared against the stored hash codes to detect duplication before the file is written to cloud storage.
A. Information Compression and Block Creation
The main aim of this work is to overcome the problems of big data using the SP Theory of Intelligence. To achieve this goal, big data is subjected to compression techniques. Compression of information is achieved by pattern matching, and using such a system improves the processing of big data. The SP Theory provides pattern recognition, information storage and retrieval, and information compression.
Fig.1: Architecture of proposed system
Figure 2 shows the block diagram of the proposed SP-theory-of-intelligence system. The system has two modules, user and admin. In the first stage, the user and admin log into the system with their login credentials. In the second stage, the user uploads a text file for processing, and the block-creation process divides the text into blocks of fixed or variable size. In the last stage, the de-duplication process computes a hash value for each block, compares the result with the already stored index to detect duplicates, then updates the index and stores the data.
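A minimal sketch of this last stage is given below. It assumes fixed-size blocks, a SHA-256 hash and an in-memory index; the paper does not prescribe a particular block size or hash function, and in the real system the index would be held alongside the cloud store rather than in memory:

    import hashlib

    BLOCK_SIZE = 4096   # assumed fixed block size for this sketch
    index = {}          # hash -> block, standing in for the stored index

    def split_into_blocks(text, size=BLOCK_SIZE):
        # Block creation: divide the text into fixed-size, non-overlapping blocks.
        return [text[i:i + size] for i in range(0, len(text), size)]

    def deduplicate(text):
        stored, duplicates = 0, 0
        for block in split_into_blocks(text):
            digest = hashlib.sha256(block.encode("utf-8")).hexdigest()
            if digest in index:       # hash already in the index: duplicate block
                duplicates += 1
            else:                     # new block: update the index and store the data
                index[digest] = block
                stored += 1
        return stored, duplicates

    # Uploading the same content twice stores its blocks only once.
    print(deduplicate("some text " * 1000))   # first upload: all blocks stored
    print(deduplicate("some text " * 1000))   # second upload: all blocks are duplicates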
The process of block creation, or chunking, divides the data stream into smaller, non-overlapping blocks. There are different approaches to block creation: static chunking, content-defined chunking and file-based chunking. Our system uses content-defined chunking, in which chunk boundaries are derived from the content itself by computing one fingerprint for each substring of length w (i.e., one fingerprint per sliding window). The processing overhead of fingerprinting depends on the window length: a small window gives good performance but poor chunking properties, while a large window gives good chunking properties at a higher cost.
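A simplified sketch of content-defined chunking along these lines is shown below, using a polynomial rolling fingerprint over a window of length w and a hypothetical boundary condition (fingerprint divisible by a chosen divisor). Production systems typically use Rabin fingerprints and impose minimum and maximum chunk sizes, but the structure is the same:

    # Simplified content-defined chunking: one fingerprint per window of length w,
    # with a chunk boundary declared whenever the fingerprint meets a condition.
    # The constants below are illustrative, not values taken from the paper.
    W = 16            # window length w
    D = 64            # boundary divisor: expected average chunk size of roughly D bytes
    BASE = 257
    MOD = 1 << 31

    def cdc_chunks(data: bytes, w: int = W, divisor: int = D):
        chunks, start, fp = [], 0, 0
        power = pow(BASE, w - 1, MOD)      # weight of the byte that leaves the window
        for i, byte in enumerate(data):
            if i >= w:
                fp = (fp - data[i - w] * power) % MOD   # drop the oldest byte
            fp = (fp * BASE + byte) % MOD               # add the newest byte
            if i + 1 >= start + w and fp % divisor == 0:
                chunks.append(data[start:i + 1])        # content-defined boundary
                start = i + 1
        if start < len(data):
            chunks.append(data[start:])                 # final partial chunk
        return chunks

    # Identical content yields identical chunks even after insertions elsewhere,
    # which is what makes content-defined chunking well suited to de-duplication.
    print([len(c) for c in cdc_chunks(b"some sample text, repeated. " * 50)])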
Ad

More Related Content

What's hot (16)

A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
ijcsit
 
Using BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined NetworkingUsing BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined Networking
IJCSIS Research Publications
 
An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional Data
IJSTA
 
Anonymization of data using mapreduce on cloud
Anonymization of data using mapreduce on cloudAnonymization of data using mapreduce on cloud
Anonymization of data using mapreduce on cloud
eSAT Journals
 
IRJET- Comparative Study of Efficacy of Big Data Analysis and Deep Learni...
IRJET-  	  Comparative Study of Efficacy of Big Data Analysis and Deep Learni...IRJET-  	  Comparative Study of Efficacy of Big Data Analysis and Deep Learni...
IRJET- Comparative Study of Efficacy of Big Data Analysis and Deep Learni...
IRJET Journal
 
Additive gaussian noise based data perturbation in multi level trust privacy ...
Additive gaussian noise based data perturbation in multi level trust privacy ...Additive gaussian noise based data perturbation in multi level trust privacy ...
Additive gaussian noise based data perturbation in multi level trust privacy ...
IJDKP
 
A hybrid cloud approach for secure authorized
A hybrid cloud approach for secure authorizedA hybrid cloud approach for secure authorized
A hybrid cloud approach for secure authorized
Ninad Samel
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
Study on potential capabilities of a nodb system
Study on potential capabilities of a nodb systemStudy on potential capabilities of a nodb system
Study on potential capabilities of a nodb system
ijitjournal
 
U0 vqmtq3m tc=
U0 vqmtq3m tc=U0 vqmtq3m tc=
U0 vqmtq3m tc=
International Journal of Science and Research (IJSR)
 
LSTM deep learning method for network intrusion detection system
LSTM deep learning method for network intrusion  detection system LSTM deep learning method for network intrusion  detection system
LSTM deep learning method for network intrusion detection system
IJECEIAES
 
B1803031217
B1803031217B1803031217
B1803031217
IOSR Journals
 
IRJET- A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET-  	  A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...IRJET-  	  A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET- A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET Journal
 
Deep learning presentation
Deep learning presentationDeep learning presentation
Deep learning presentation
Tunde Ajose-Ismail
 
Protected Data Collection In WSN by Filtering Attackers Influence (Published ...
Protected Data Collection In WSN by Filtering Attackers Influence (Published ...Protected Data Collection In WSN by Filtering Attackers Influence (Published ...
Protected Data Collection In WSN by Filtering Attackers Influence (Published ...
sangasandeep
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
neirew J
 
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
ijcsit
 
Using BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined NetworkingUsing BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined Networking
IJCSIS Research Publications
 
An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional Data
IJSTA
 
Anonymization of data using mapreduce on cloud
Anonymization of data using mapreduce on cloudAnonymization of data using mapreduce on cloud
Anonymization of data using mapreduce on cloud
eSAT Journals
 
IRJET- Comparative Study of Efficacy of Big Data Analysis and Deep Learni...
IRJET-  	  Comparative Study of Efficacy of Big Data Analysis and Deep Learni...IRJET-  	  Comparative Study of Efficacy of Big Data Analysis and Deep Learni...
IRJET- Comparative Study of Efficacy of Big Data Analysis and Deep Learni...
IRJET Journal
 
Additive gaussian noise based data perturbation in multi level trust privacy ...
Additive gaussian noise based data perturbation in multi level trust privacy ...Additive gaussian noise based data perturbation in multi level trust privacy ...
Additive gaussian noise based data perturbation in multi level trust privacy ...
IJDKP
 
A hybrid cloud approach for secure authorized
A hybrid cloud approach for secure authorizedA hybrid cloud approach for secure authorized
A hybrid cloud approach for secure authorized
Ninad Samel
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
Study on potential capabilities of a nodb system
Study on potential capabilities of a nodb systemStudy on potential capabilities of a nodb system
Study on potential capabilities of a nodb system
ijitjournal
 
LSTM deep learning method for network intrusion detection system
LSTM deep learning method for network intrusion  detection system LSTM deep learning method for network intrusion  detection system
LSTM deep learning method for network intrusion detection system
IJECEIAES
 
IRJET- A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET-  	  A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...IRJET-  	  A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET- A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET Journal
 
Protected Data Collection In WSN by Filtering Attackers Influence (Published ...
Protected Data Collection In WSN by Filtering Attackers Influence (Published ...Protected Data Collection In WSN by Filtering Attackers Influence (Published ...
Protected Data Collection In WSN by Filtering Attackers Influence (Published ...
sangasandeep
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
neirew J
 

Similar to Information Upload and retrieval using SP Theory of Intelligence (20)

Anomalous symmetry succession for seek out
Anomalous symmetry succession for seek outAnomalous symmetry succession for seek out
Anomalous symmetry succession for seek out
iaemedu
 
An Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud StorageAn Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud Storage
IJMER
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
inventionjournals
 
Activity Context Modeling in Context-Aware
Activity Context Modeling in Context-AwareActivity Context Modeling in Context-Aware
Activity Context Modeling in Context-Aware
Editor IJCATR
 
Paper id 252014139
Paper id 252014139Paper id 252014139
Paper id 252014139
IJRAT
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
ijtsrd
 
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
samueljackson3773
 
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
ijdpsjournal
 
50120130405014 2-3
50120130405014 2-350120130405014 2-3
50120130405014 2-3
IAEME Publication
 
iaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storageiaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storage
Iaetsd Iaetsd
 
Big Data Technology Accelerate Genomics Precision Medicine
Big Data Technology Accelerate Genomics Precision MedicineBig Data Technology Accelerate Genomics Precision Medicine
Big Data Technology Accelerate Genomics Precision Medicine
cscpconf
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
dbpublications
 
E018142329
E018142329E018142329
E018142329
IOSR Journals
 
A Strategy for Improving the Performance of Small Files in Openstack Swift
 A Strategy for Improving the Performance of Small Files in Openstack Swift  A Strategy for Improving the Performance of Small Files in Openstack Swift
A Strategy for Improving the Performance of Small Files in Openstack Swift
Editor IJCATR
 
Privacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storagePrivacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storage
dbpublications
 
Big Data in Bioinformatics & the Era of Cloud Computing
Big Data in Bioinformatics & the Era of Cloud ComputingBig Data in Bioinformatics & the Era of Cloud Computing
Big Data in Bioinformatics & the Era of Cloud Computing
IOSR Journals
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
[IJET V2I3P7] Authors: Muthe Sandhya, Shitole Sarika, Sinha Anukriti, Aghav S...
[IJET V2I3P7] Authors: Muthe Sandhya, Shitole Sarika, Sinha Anukriti, Aghav S...[IJET V2I3P7] Authors: Muthe Sandhya, Shitole Sarika, Sinha Anukriti, Aghav S...
[IJET V2I3P7] Authors: Muthe Sandhya, Shitole Sarika, Sinha Anukriti, Aghav S...
IJET - International Journal of Engineering and Techniques
 
Information Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis ApproachInformation Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis Approach
AIRCC Publishing Corporation
 
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACHINFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
ijcsit
 
Anomalous symmetry succession for seek out
Anomalous symmetry succession for seek outAnomalous symmetry succession for seek out
Anomalous symmetry succession for seek out
iaemedu
 
An Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud StorageAn Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud Storage
IJMER
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
inventionjournals
 
Activity Context Modeling in Context-Aware
Activity Context Modeling in Context-AwareActivity Context Modeling in Context-Aware
Activity Context Modeling in Context-Aware
Editor IJCATR
 
Paper id 252014139
Paper id 252014139Paper id 252014139
Paper id 252014139
IJRAT
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
ijtsrd
 
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
samueljackson3773
 
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
ijdpsjournal
 
iaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storageiaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storage
Iaetsd Iaetsd
 
Big Data Technology Accelerate Genomics Precision Medicine
Big Data Technology Accelerate Genomics Precision MedicineBig Data Technology Accelerate Genomics Precision Medicine
Big Data Technology Accelerate Genomics Precision Medicine
cscpconf
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
dbpublications
 
A Strategy for Improving the Performance of Small Files in Openstack Swift
 A Strategy for Improving the Performance of Small Files in Openstack Swift  A Strategy for Improving the Performance of Small Files in Openstack Swift
A Strategy for Improving the Performance of Small Files in Openstack Swift
Editor IJCATR
 
Privacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storagePrivacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storage
dbpublications
 
Big Data in Bioinformatics & the Era of Cloud Computing
Big Data in Bioinformatics & the Era of Cloud ComputingBig Data in Bioinformatics & the Era of Cloud Computing
Big Data in Bioinformatics & the Era of Cloud Computing
IOSR Journals
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
Information Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis ApproachInformation Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis Approach
AIRCC Publishing Corporation
 
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACHINFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
ijcsit
 
Ad

Recently uploaded (20)

ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdfML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
rameshwarchintamani
 
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
ijflsjournal087
 
Applications of Centroid in Structural Engineering
Applications of Centroid in Structural EngineeringApplications of Centroid in Structural Engineering
Applications of Centroid in Structural Engineering
suvrojyotihalder2006
 
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
ajayrm685
 
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
PawachMetharattanara
 
acid base ppt and their specific application in food
acid base ppt and their specific application in foodacid base ppt and their specific application in food
acid base ppt and their specific application in food
Fatehatun Noor
 
Nanometer Metal-Organic-Framework Literature Comparison
Nanometer Metal-Organic-Framework  Literature ComparisonNanometer Metal-Organic-Framework  Literature Comparison
Nanometer Metal-Organic-Framework Literature Comparison
Chris Harding
 
Design of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdfDesign of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdf
Kamel Farid
 
JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...
JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...
JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...
Reflections on Morality, Philosophy, and History
 
twin tower attack 2001 new york city
twin  tower  attack  2001 new  york citytwin  tower  attack  2001 new  york city
twin tower attack 2001 new york city
harishreemavs
 
Construction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil EngineeringConstruction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil Engineering
Lavish Kashyap
 
David Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry - Specializes In AWS, Microservices And Python.pdfDavid Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry
 
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdfSmart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
PawachMetharattanara
 
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjjseninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
AjijahamadKhaji
 
hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .
NABLAS株式会社
 
Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...
Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...
Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...
Journal of Soft Computing in Civil Engineering
 
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdfLittle Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
gori42199
 
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
Guru Nanak Technical Institutions
 
Machine Learning basics POWERPOINT PRESENETATION
Machine Learning basics POWERPOINT PRESENETATIONMachine Learning basics POWERPOINT PRESENETATION
Machine Learning basics POWERPOINT PRESENETATION
DarrinBright1
 
SICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introductionSICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introduction
fabienklr
 
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdfML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
rameshwarchintamani
 
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
ijflsjournal087
 
Applications of Centroid in Structural Engineering
Applications of Centroid in Structural EngineeringApplications of Centroid in Structural Engineering
Applications of Centroid in Structural Engineering
suvrojyotihalder2006
 
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
ajayrm685
 
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
PawachMetharattanara
 
acid base ppt and their specific application in food
acid base ppt and their specific application in foodacid base ppt and their specific application in food
acid base ppt and their specific application in food
Fatehatun Noor
 
Nanometer Metal-Organic-Framework Literature Comparison
Nanometer Metal-Organic-Framework  Literature ComparisonNanometer Metal-Organic-Framework  Literature Comparison
Nanometer Metal-Organic-Framework Literature Comparison
Chris Harding
 
Design of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdfDesign of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdf
Kamel Farid
 
twin tower attack 2001 new york city
twin  tower  attack  2001 new  york citytwin  tower  attack  2001 new  york city
twin tower attack 2001 new york city
harishreemavs
 
Construction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil EngineeringConstruction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil Engineering
Lavish Kashyap
 
David Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry - Specializes In AWS, Microservices And Python.pdfDavid Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry
 
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdfSmart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
PawachMetharattanara
 
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjjseninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
AjijahamadKhaji
 
hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .
NABLAS株式会社
 
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdfLittle Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
gori42199
 
Machine Learning basics POWERPOINT PRESENETATION
Machine Learning basics POWERPOINT PRESENETATIONMachine Learning basics POWERPOINT PRESENETATION
Machine Learning basics POWERPOINT PRESENETATION
DarrinBright1
 
SICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introductionSICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introduction
fabienklr
 
Ad

Information Upload and retrieval using SP Theory of Intelligence

  • 1. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-5, May- 2016] Infogain Publication (Infogainpublication.com) ISSN : 2454-1311 www.ijaems.com Page | 274 Information Upload and retrieval using SP Theory of Intelligence Supriya P, Koushik S Dept of ISE, MS Ramaiah Institute of Technology, Bangalore, India Abstract— In today’s technology Cloud computing has become an important aspect and storing of data on cloud is of high importance as the need for virtual space to store massive amount of data has grown during the years. However time taken for uploading and downloading is limited by processing time and thus need arises to solve this issue to handle large data and their processing. Another common problem is de duplication. With the cloud services growing at a rapid rate it is also associated by increasing large volumes of data being stored on remote servers of cloud. But most of the remote stored files are duplicated because of uploading the same file by different users at different locations. A recent survey by EMC says about 75% of the digital data present on cloud are duplicate copies. To overcome these two problems in this paper we are using SP theory of intelligence using lossless compression of information, which makes the big data smaller and thus reduces the problems in storage and management of large amounts of data. Keywords— Cloud Computing, Big data processing, Data De-duplication, SP theory of intelligence, Lossless Compression. I. INTRODUCTION The SP theory (Simplicity and Power theory) design is to simplify and interface concepts across artificial intelligence, conventional computing and human perception and cognizance, and understands compression of information via the matching and unification of patterns. SP Theory is accomplished as a hypothetical system corresponding to brain which can receive new information and stores it by relating it to already available Old information. The SP theory of intelligence amalgamates visionary clarity with explanatory and description power. In the SP machine there is a capability for simplifying the computation and also saves time, cost and effort involved in the development of many applications. SP is short for Simplicity and Power, as it may be seen as a process of reducing informational redundancy and thus increasing its simplicity while retaining as much as possible of its non-redundant expressive power. Majority of the computing tasks in today’s situation involves data that have been collected and stored in databases. The data make a stationary target. But, increasingly vital important insights can be gained from analyzing information that’s on the move. This approach is called streams analytics [1]. Rather than placing data in a database first, the computer analyses it as and when it comes from a wide amount of sources, continuously filtering its understanding of the data as and when conditions change. This is the way how human process their information. Although, in its unsupervised learning, the SP system processes information by dividing them into batches, and thus lends itself to an incremental approach. The SP system is designed to incorporate new information to a constantly growing body of compressed old information. Massive amount of large data sets introduces problems of data management [2]. In order to reduce required amount of storage it is necessary to reduce the time for computation and also represent data in desired way. 
One major roadblock to using cloud services for processing large data is the problem of transmitting the data sets over a network. Maintaining communications network is turning out to be very expensive and marginally profitable. In order to minimize these network charges system designers have figured out a way to minimize the energy used for processing data. The SP system advocates the efficient transmission of data by dividing the information into smaller parts. SP theory in managing massive data[3]- Large scale data sets introduce many problems for data management. a. Volume: Size of the large scale data sets can be reduced by compressing the information into chunks and by identifying the duplicate chunks, storage of identical chunks can be eliminated and hence efficiency in storing can be achieved. b. Variety: Each format of information requires different kind of processing. Text files with different formats .txt, .pdf,.doc, each one need to be analyzed differently. SP System provides a universal framework for processing of diverse formats. c. Velocity: Instead of simply placing the data in the database, it requires data to be analyzed first and understand its content first. This way transmission time taken to analyze the moving data can be minimized. II. LITERATURE SURVEY Wolff et.al [1] explained how the SP theory of intelligence and its applicability in the SP machine may be applied to the processing and management of big data.
  • 2. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-5, May- 2016] Infogain Publication (Infogainpublication.com) ISSN : 2454-1311 www.ijaems.com Page | 275 J Gerard Wolff et.al [3] this article is an overview of the SP theory of intelligence is designed to simplify and interface concepts across artificial intelligence, conventional computing and human perception and cognizance, with information compression as a theme. It is understood as a human brain that receives New data and stores it in by compressing it as Old information; and it is envisioned in the form of a computer model, a first version of the SP machine. The matching and unification of data patterns and the concept of multiple alignments are the ideas behind the theory. J Gerard Wolff et.al [5] provides confirmation for the idea that much of artificial intelligence, human perception and cognizance, conventional computing, may be understood as compression of information via the matching and indication of patterns. This is the basis for the SP theory of intelligence, outlined in the paper and fully described elsewhere. Robert Escriva et.al [12] this paper presents HyperDex which is understood as a distributed key-value cache that provides a exclusive search primitive that empowers queries on secondary attributes. The key concept of HyperDex is the idea of hyperspace hashing in which objects having multiple attributes are located onto a multidimensional hyperspace. This scaling leads to productive implementations for searches of fractionally-specified secondary attribute and range queries and also for retrieval by primary key. J Gerard Wolff .et.al [13] describes existing and expected benefits of the SP theory of intelligence, and some potential applications. The theory is designed to simplify and interface ideas across artificial intelligence, conventional computing, and human perception and cognizance, with information compression as a theme. It incorporates simplicity of both explanatory and descriptive power in numerous areas of computation and cognizance. Kruus et al. [20] presents a work similar to ours. They propose a chunking algorithm involving two stages that re-chunks transitional and non-duplicated big CDC chunks into small CDC chunks. The significance of their work is to reduce the number of chunks while attaining as the same duplicate elimination ratio as a baseline CDC algorithm. III. PROPOSED SYSTEM Figure1 represents the system architecture of proposed system which is concerned with establishing basic structural framework for a system. When a file is uploaded by the user say File X, text file is divided into blocks for each block a hash code is generated. Each time when a new file say File Y is uploaded it compares with hash code for duplication and then writes the file for cloud storage. A. Information compression and Block Creation The main aim of this project is to overcome the problems in big data using the SP Theory of Intelligence. In order to achieve this goal, big data is subjected to compression techniques. Compression of information is achieved by pattern matching. Using such a system leads to the improvement in the processing of big data. The SP Theory provides pattern recognition, information storage, retrieval and information compression. Fig.1: Architecture of proposed system Figure 2 represents the block diagram of our proposed sp theory of intelligence system. Our proposed system has two modules user and admin. 
In the first stage, the user and the admin log into the system using their login details. In the second stage, the user uploads a text file for processing, and the block-creation process divides the text into blocks of fixed or variable size. In the last stage, the de-duplication process calculates a hash function for each block, compares the result with the already stored index to detect duplicates, and then updates the index and stores the data.
The process of block creation, or chunking, divides the data stream into smaller, non-overlapping blocks. There are different approaches to block creation: static chunking, content-defined chunking and file-based chunking. Our system uses content-defined chunking, where chunk boundaries are determined by the content itself through the calculation of one fingerprint for each substring of length w, i.e., one fingerprint per sliding window. The processing overhead of fingerprinting depends on the window length: a small window gives good performance but poor chunking properties, while a large window gives good chunking properties at higher cost. A sketch of one such chunker follows.
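The paper does not specify the fingerprint function or the boundary rule, so the following sketch should be read as one illustrative realisation of content-defined chunking: a Rabin-style rolling polynomial fingerprint over a window of w bytes (here window=48), with a boundary cut wherever the low bits of the fingerprint are zero and the chunk length lies between a chosen minimum and maximum. All parameter values and names (cdc_chunks, mask, min_size, max_size) are assumptions made for illustration.

    import hashlib

    def cdc_chunks(data, window=48, mask=0x1FFF, min_size=2048, max_size=65536,
                   base=257, mod=(1 << 61) - 1):
        # Content-defined chunking: keep a rolling polynomial fingerprint of the
        # last `window` bytes and cut a chunk boundary wherever the masked
        # fingerprint is zero, subject to minimum and maximum chunk sizes.
        # Because boundaries depend only on local content, an insertion near the
        # start of a file shifts only nearby boundaries, not every later block.
        pow_w = pow(base, window - 1, mod)        # weight of the byte leaving the window
        chunks, start, fp = [], 0, 0
        for i, byte in enumerate(data):
            if i >= window:                       # slide the window: drop the oldest byte
                fp = (fp - data[i - window] * pow_w) % mod
            fp = (fp * base + byte) % mod         # bring the new byte into the fingerprint
            size = i - start + 1
            if (size >= min_size and (fp & mask) == 0) or size >= max_size:
                chunks.append(data[start:i + 1])  # content decided this boundary
                start = i + 1
        if start < len(data):
            chunks.append(data[start:])           # final partial chunk
        return chunks

    blob = b"The SP system compresses New information against Old information. " * 3000
    parts = cdc_chunks(blob)
    unique_hashes = {hashlib.sha256(p).hexdigest() for p in parts}
    print(len(parts), "chunks,", len(unique_hashes), "unique")

The variable-size chunks produced this way are then hashed and checked against the stored index in the same manner as the block-level sketch shown earlier.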