International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-5, May- 2016]
Infogain Publication (Infogainpublication.com) ISSN : 2454-1311
Information Upload and retrieval using SP Theory
of Intelligence
Supriya P, Koushik S
Dept of ISE, MS Ramaiah Institute of Technology, Bangalore, India
Abstract— Cloud computing has become an important aspect of today's technology, and storing data on the cloud is of high importance because the need for virtual space to hold massive amounts of data has grown over the years. However, upload and download are limited by processing time, so there is a need to handle large data sets and their processing efficiently. Another common problem is de-duplication: as cloud services grow at a rapid rate, increasingly large volumes of data are stored on remote cloud servers, and many of the remotely stored files are duplicates, because the same file is uploaded by different users at different locations. A recent survey by EMC estimates that about 75% of the digital data on the cloud consists of duplicate copies. To overcome these two problems, in this paper we use the SP theory of intelligence with lossless compression of information, which makes big data smaller and thus reduces the problems of storing and managing large amounts of data.
Keywords— Cloud Computing, Big data processing, Data
De-duplication, SP theory of intelligence, Lossless
Compression.
I. INTRODUCTION
The SP theory (Simplicity and Power theory) is designed to simplify and integrate concepts across artificial intelligence, conventional computing, and human perception and cognition, and it interprets all of these as compression of information via the matching and unification of patterns. The SP theory is conceived as a hypothetical brain-like system that can receive New information and store it by relating it to already available Old information.
The SP theory of intelligence combines conceptual simplicity with explanatory and descriptive power. The SP machine offers the potential to simplify computation and to save the time, cost and effort involved in developing many applications.
SP is short for Simplicity and Power, because compression of information may be seen as a process of reducing informational redundancy, thus increasing simplicity, while retaining as much as possible of its non-redundant expressive power.
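As a rough illustration of this idea, the Python sketch below (a toy simplification, not the SP computer model itself, which works with multiple alignments) stores each repeated pattern once as Old information and replaces later occurrences by references to the stored copy:

    # Toy illustration of compression by matching and unification of patterns.
    # This is a simplification for exposition, not the SP computer model.
    def unify(new_patterns):
        old = []          # Old information: each distinct pattern stored once
        encoding = []     # compressed stream: references into 'old'
        for pattern in new_patterns:
            if pattern in old:            # matching: the pattern is already known
                index = old.index(pattern)
            else:                         # unification: merge it into Old information
                old.append(pattern)
                index = len(old) - 1
            encoding.append(index)
        return old, encoding

    # Example: five input patterns are unified into three stored patterns.
    old, encoding = unify(["the cat", "sat on", "the cat", "the mat", "sat on"])
    print(old)        # ['the cat', 'sat on', 'the mat']
    print(encoding)   # [0, 1, 0, 2, 1]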
The majority of computing tasks today involve data that have been collected and stored in databases; such data make a stationary target. Increasingly, however, vital insights can be gained from analyzing information that is on the move. This approach is called streams analytics [1]. Rather than placing the data in a database first, the computer analyzes it as it arrives from a wide range of sources, continuously refining its understanding of the data as conditions change. This is much the way humans process information. Although, in its unsupervised learning, the SP system processes information in batches, it lends itself to an incremental approach: the SP system is designed to incorporate New information into a constantly growing body of compressed Old information.
Massive data sets introduce problems of data management [2]. In order to reduce the amount of storage required, it is necessary to reduce computation time and to represent the data in a suitable form.
One major roadblock to using cloud services for processing large data sets is the problem of transmitting the data over a network. Maintaining a communications network is very expensive and only marginally profitable, so to minimize these network charges system designers have looked for ways to minimize the energy used for processing data. The SP system supports the efficient transmission of data by dividing the information into smaller parts.
SP theory in managing massive data [3]: large-scale data sets introduce many problems for data management.
a. Volume: The size of large-scale data sets can be reduced by compressing the information into chunks; by identifying duplicate chunks, the storage of identical chunks can be eliminated and storage efficiency achieved (a rough worked example follows this list).
b. Variety: Each format of information requires a different kind of processing. Text files in different formats (.txt, .pdf, .doc) each need to be analyzed differently. The SP system provides a universal framework for processing these diverse formats.
c. Velocity: Instead of simply placing the data in a database, the data must first be analyzed and its content understood. In this way the time taken to analyze moving data can be minimized.
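As a rough worked example of the Volume point (the 100 GB figure is hypothetical, and the small overhead of the chunk index is ignored): if a fraction d of the chunks in an upload are duplicates of chunks already stored, the volume actually written is approximately (1 - d) x V for a raw volume V. With d = 0.75, as in the EMC estimate cited in the abstract, a 100 GB upload would require only about 25 GB of new chunk storage.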
II. LITERATURE SURVEY
Wolff et al. [1] explain how the SP theory of intelligence, and its realization in the SP machine, may be applied to the processing and management of big data.
J. Gerard Wolff et al. [3] give an overview of the SP theory of intelligence, which is designed to simplify and integrate concepts across artificial intelligence, conventional computing, and human perception and cognition, with information compression as a unifying theme. The theory is conceived as a brain-like system that receives New data and stores it, in compressed form, as Old information, and it is realized in the form of a computer model, a first version of the SP machine. The matching and unification of patterns and the concept of multiple alignment are the central ideas behind the theory.
J. Gerard Wolff et al. [5] provide evidence for the idea that much of artificial intelligence, human perception and cognition, and conventional computing may be understood as compression of information via the matching and unification of patterns. This is the basis for the SP theory of intelligence, outlined in the paper and fully described elsewhere.
Robert Escriva et al. [12] present HyperDex, a distributed key-value store that provides a unique search primitive enabling queries on secondary attributes. The key concept of HyperDex is hyperspace hashing, in which objects with multiple attributes are mapped onto a multidimensional hyperspace. This mapping leads to efficient implementations of partially-specified secondary-attribute searches and range queries, as well as retrieval by primary key.
J. Gerard Wolff et al. [13] describe existing and expected benefits of the SP theory of intelligence and some of its potential applications. The theory is designed to simplify and integrate ideas across artificial intelligence, conventional computing, and human perception and cognition, with information compression as a theme. It combines simplicity with explanatory and descriptive power in numerous areas of computation and cognition.
Kruus et al. [20] present work similar to ours. They propose a two-stage chunking algorithm that re-chunks transitional and non-duplicated big CDC chunks into smaller CDC chunks. The significance of their work is to reduce the number of chunks while attaining the same duplicate-elimination ratio as a baseline CDC algorithm.
III. PROPOSED SYSTEM
Figure 1 shows the system architecture of the proposed system, which establishes the basic structural framework of the system. When a user uploads a file, say File X, the text file is divided into blocks and a hash code is generated for each block. Each time a new file, say File Y, is uploaded, its block hash codes are compared against the stored hash codes to detect duplication before the file is written to cloud storage.
A. Information Compression and Block Creation
The main aim of this work is to overcome the problems of big data using the SP Theory of Intelligence. To achieve this goal, big data is subjected to compression techniques. Compression of information is achieved by pattern matching, and using such a system improves the processing of big data. The SP Theory provides pattern recognition, information storage and retrieval, and information compression.
Fig.1: Architecture of proposed system
Figure 2 shows the block diagram of the proposed SP-theory-of-intelligence system. The system has two modules, user and admin. In the first stage, the user and admin log into the system with their login credentials. In the second stage, the user uploads a text file for processing, and the block-creation process divides the text into blocks of fixed or variable size. In the last stage, the de-duplication process computes a hash value for each block, compares the result with the already stored index to detect duplicates, then updates the index and stores the data.
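A minimal sketch of this last stage is given below. It assumes fixed-size blocks, a SHA-256 hash and an in-memory index; the paper does not prescribe a particular block size or hash function, and in the real system the index would be held alongside the cloud store rather than in memory:

    import hashlib

    BLOCK_SIZE = 4096   # assumed fixed block size for this sketch
    index = {}          # hash -> block, standing in for the stored index

    def split_into_blocks(text, size=BLOCK_SIZE):
        # Block creation: divide the text into fixed-size, non-overlapping blocks.
        return [text[i:i + size] for i in range(0, len(text), size)]

    def deduplicate(text):
        stored, duplicates = 0, 0
        for block in split_into_blocks(text):
            digest = hashlib.sha256(block.encode("utf-8")).hexdigest()
            if digest in index:       # hash already in the index: duplicate block
                duplicates += 1
            else:                     # new block: update the index and store the data
                index[digest] = block
                stored += 1
        return stored, duplicates

    # Uploading the same content twice stores its blocks only once.
    print(deduplicate("some text " * 1000))   # first upload: all blocks stored
    print(deduplicate("some text " * 1000))   # second upload: all blocks are duplicates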
The process of block creation, or chunking, divides the data stream into smaller, non-overlapping blocks. There are different approaches to block creation: static chunking, content-defined chunking and file-based chunking. Our system uses content-defined chunking, in which chunk boundaries are derived from the content itself by computing one fingerprint for each substring of length w (i.e., one fingerprint per sliding window). The processing overhead of fingerprinting depends on the window length: a small window gives good performance but poor chunking properties, while a large window gives good chunking properties at a higher cost.
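A simplified sketch of content-defined chunking along these lines is shown below, using a polynomial rolling fingerprint over a window of length w and a hypothetical boundary condition (fingerprint divisible by a chosen divisor). Production systems typically use Rabin fingerprints and impose minimum and maximum chunk sizes, but the structure is the same:

    # Simplified content-defined chunking: one fingerprint per window of length w,
    # with a chunk boundary declared whenever the fingerprint meets a condition.
    # The constants below are illustrative, not values taken from the paper.
    W = 16            # window length w
    D = 64            # boundary divisor: expected average chunk size of roughly D bytes
    BASE = 257
    MOD = 1 << 31

    def cdc_chunks(data: bytes, w: int = W, divisor: int = D):
        chunks, start, fp = [], 0, 0
        power = pow(BASE, w - 1, MOD)      # weight of the byte that leaves the window
        for i, byte in enumerate(data):
            if i >= w:
                fp = (fp - data[i - w] * power) % MOD   # drop the oldest byte
            fp = (fp * BASE + byte) % MOD               # add the newest byte
            if i + 1 >= start + w and fp % divisor == 0:
                chunks.append(data[start:i + 1])        # content-defined boundary
                start = i + 1
        if start < len(data):
            chunks.append(data[start:])                 # final partial chunk
        return chunks

    # Identical content yields identical chunks even after insertions elsewhere,
    # which is what makes content-defined chunking well suited to de-duplication.
    print([len(c) for c in cdc_chunks(b"some sample text, repeated. " * 50)])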
Ad

More Related Content

What's hot (16)

A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
ijcsit
 
Using BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined NetworkingUsing BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined Networking
IJCSIS Research Publications
 
An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional Data
IJSTA
 
Anonymization of data using mapreduce on cloud
Anonymization of data using mapreduce on cloudAnonymization of data using mapreduce on cloud
Anonymization of data using mapreduce on cloud
eSAT Journals
 
IRJET- Comparative Study of Efficacy of Big Data Analysis and Deep Learni...
IRJET-  	  Comparative Study of Efficacy of Big Data Analysis and Deep Learni...IRJET-  	  Comparative Study of Efficacy of Big Data Analysis and Deep Learni...
IRJET- Comparative Study of Efficacy of Big Data Analysis and Deep Learni...
IRJET Journal
 
Additive gaussian noise based data perturbation in multi level trust privacy ...
Additive gaussian noise based data perturbation in multi level trust privacy ...Additive gaussian noise based data perturbation in multi level trust privacy ...
Additive gaussian noise based data perturbation in multi level trust privacy ...
IJDKP
 
A hybrid cloud approach for secure authorized
A hybrid cloud approach for secure authorizedA hybrid cloud approach for secure authorized
A hybrid cloud approach for secure authorized
Ninad Samel
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
Study on potential capabilities of a nodb system
Study on potential capabilities of a nodb systemStudy on potential capabilities of a nodb system
Study on potential capabilities of a nodb system
ijitjournal
 
U0 vqmtq3m tc=
U0 vqmtq3m tc=U0 vqmtq3m tc=
U0 vqmtq3m tc=
International Journal of Science and Research (IJSR)
 
LSTM deep learning method for network intrusion detection system
LSTM deep learning method for network intrusion  detection system LSTM deep learning method for network intrusion  detection system
LSTM deep learning method for network intrusion detection system
IJECEIAES
 
B1803031217
B1803031217B1803031217
B1803031217
IOSR Journals
 
IRJET- A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET-  	  A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...IRJET-  	  A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET- A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET Journal
 
Deep learning presentation
Deep learning presentationDeep learning presentation
Deep learning presentation
Tunde Ajose-Ismail
 
Protected Data Collection In WSN by Filtering Attackers Influence (Published ...
Protected Data Collection In WSN by Filtering Attackers Influence (Published ...Protected Data Collection In WSN by Filtering Attackers Influence (Published ...
Protected Data Collection In WSN by Filtering Attackers Influence (Published ...
sangasandeep
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
neirew J
 
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
A COMPARISON BETWEEN PARALLEL AND SEGMENTATION METHODS USED FOR IMAGE ENCRYPT...
ijcsit
 
Using BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined NetworkingUsing BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined Networking
IJCSIS Research Publications
 
An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional Data
IJSTA
 
Anonymization of data using mapreduce on cloud
Anonymization of data using mapreduce on cloudAnonymization of data using mapreduce on cloud
Anonymization of data using mapreduce on cloud
eSAT Journals
 
IRJET- Comparative Study of Efficacy of Big Data Analysis and Deep Learni...
IRJET-  	  Comparative Study of Efficacy of Big Data Analysis and Deep Learni...IRJET-  	  Comparative Study of Efficacy of Big Data Analysis and Deep Learni...
IRJET- Comparative Study of Efficacy of Big Data Analysis and Deep Learni...
IRJET Journal
 
Additive gaussian noise based data perturbation in multi level trust privacy ...
Additive gaussian noise based data perturbation in multi level trust privacy ...Additive gaussian noise based data perturbation in multi level trust privacy ...
Additive gaussian noise based data perturbation in multi level trust privacy ...
IJDKP
 
A hybrid cloud approach for secure authorized
A hybrid cloud approach for secure authorizedA hybrid cloud approach for secure authorized
A hybrid cloud approach for secure authorized
Ninad Samel
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
Study on potential capabilities of a nodb system
Study on potential capabilities of a nodb systemStudy on potential capabilities of a nodb system
Study on potential capabilities of a nodb system
ijitjournal
 
LSTM deep learning method for network intrusion detection system
LSTM deep learning method for network intrusion  detection system LSTM deep learning method for network intrusion  detection system
LSTM deep learning method for network intrusion detection system
IJECEIAES
 
IRJET- A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET-  	  A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...IRJET-  	  A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET- A Novel High Capacity Reversible Data Hiding in Encrypted Domain u...
IRJET Journal
 
Protected Data Collection In WSN by Filtering Attackers Influence (Published ...
Protected Data Collection In WSN by Filtering Attackers Influence (Published ...Protected Data Collection In WSN by Filtering Attackers Influence (Published ...
Protected Data Collection In WSN by Filtering Attackers Influence (Published ...
sangasandeep
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
neirew J
 

Similar to Information Upload and retrieval using SP Theory of Intelligence (20)

Anomalous symmetry succession for seek out
Anomalous symmetry succession for seek outAnomalous symmetry succession for seek out
Anomalous symmetry succession for seek out
iaemedu
 
An Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud StorageAn Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud Storage
IJMER
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
inventionjournals
 
Activity Context Modeling in Context-Aware
Activity Context Modeling in Context-AwareActivity Context Modeling in Context-Aware
Activity Context Modeling in Context-Aware
Editor IJCATR
 
Paper id 252014139
Paper id 252014139Paper id 252014139
Paper id 252014139
IJRAT
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
ijtsrd
 
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
samueljackson3773
 
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
ijdpsjournal
 
50120130405014 2-3
50120130405014 2-350120130405014 2-3
50120130405014 2-3
IAEME Publication
 
iaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storageiaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storage
Iaetsd Iaetsd
 
Big Data Technology Accelerate Genomics Precision Medicine
Big Data Technology Accelerate Genomics Precision MedicineBig Data Technology Accelerate Genomics Precision Medicine
Big Data Technology Accelerate Genomics Precision Medicine
cscpconf
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
dbpublications
 
E018142329
E018142329E018142329
E018142329
IOSR Journals
 
A Strategy for Improving the Performance of Small Files in Openstack Swift
 A Strategy for Improving the Performance of Small Files in Openstack Swift  A Strategy for Improving the Performance of Small Files in Openstack Swift
A Strategy for Improving the Performance of Small Files in Openstack Swift
Editor IJCATR
 
Privacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storagePrivacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storage
dbpublications
 
Big Data in Bioinformatics & the Era of Cloud Computing
Big Data in Bioinformatics & the Era of Cloud ComputingBig Data in Bioinformatics & the Era of Cloud Computing
Big Data in Bioinformatics & the Era of Cloud Computing
IOSR Journals
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
[IJET V2I3P7] Authors: Muthe Sandhya, Shitole Sarika, Sinha Anukriti, Aghav S...
[IJET V2I3P7] Authors: Muthe Sandhya, Shitole Sarika, Sinha Anukriti, Aghav S...[IJET V2I3P7] Authors: Muthe Sandhya, Shitole Sarika, Sinha Anukriti, Aghav S...
[IJET V2I3P7] Authors: Muthe Sandhya, Shitole Sarika, Sinha Anukriti, Aghav S...
IJET - International Journal of Engineering and Techniques
 
Information Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis ApproachInformation Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis Approach
AIRCC Publishing Corporation
 
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACHINFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
ijcsit
 
Anomalous symmetry succession for seek out
Anomalous symmetry succession for seek outAnomalous symmetry succession for seek out
Anomalous symmetry succession for seek out
iaemedu
 
An Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud StorageAn Efficient PDP Scheme for Distributed Cloud Storage
An Efficient PDP Scheme for Distributed Cloud Storage
IJMER
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
inventionjournals
 
Activity Context Modeling in Context-Aware
Activity Context Modeling in Context-AwareActivity Context Modeling in Context-Aware
Activity Context Modeling in Context-Aware
Editor IJCATR
 
Paper id 252014139
Paper id 252014139Paper id 252014139
Paper id 252014139
IJRAT
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
ijtsrd
 
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
samueljackson3773
 
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
ijdpsjournal
 
iaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storageiaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storage
Iaetsd Iaetsd
 
Big Data Technology Accelerate Genomics Precision Medicine
Big Data Technology Accelerate Genomics Precision MedicineBig Data Technology Accelerate Genomics Precision Medicine
Big Data Technology Accelerate Genomics Precision Medicine
cscpconf
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
dbpublications
 
A Strategy for Improving the Performance of Small Files in Openstack Swift
 A Strategy for Improving the Performance of Small Files in Openstack Swift  A Strategy for Improving the Performance of Small Files in Openstack Swift
A Strategy for Improving the Performance of Small Files in Openstack Swift
Editor IJCATR
 
Privacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storagePrivacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storage
dbpublications
 
Big Data in Bioinformatics & the Era of Cloud Computing
Big Data in Bioinformatics & the Era of Cloud ComputingBig Data in Bioinformatics & the Era of Cloud Computing
Big Data in Bioinformatics & the Era of Cloud Computing
IOSR Journals
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET Journal
 
Information Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis ApproachInformation Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis Approach
AIRCC Publishing Corporation
 
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACHINFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
INFORMATION RETRIEVAL BASED ON CLUSTER ANALYSIS APPROACH
ijcsit
 
Ad

Recently uploaded (20)

ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdfML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
rameshwarchintamani
 
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
ijflsjournal087
 
Applications of Centroid in Structural Engineering
Applications of Centroid in Structural EngineeringApplications of Centroid in Structural Engineering
Applications of Centroid in Structural Engineering
suvrojyotihalder2006
 
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
ajayrm685
 
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
PawachMetharattanara
 
acid base ppt and their specific application in food
acid base ppt and their specific application in foodacid base ppt and their specific application in food
acid base ppt and their specific application in food
Fatehatun Noor
 
Nanometer Metal-Organic-Framework Literature Comparison
Nanometer Metal-Organic-Framework  Literature ComparisonNanometer Metal-Organic-Framework  Literature Comparison
Nanometer Metal-Organic-Framework Literature Comparison
Chris Harding
 
Design of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdfDesign of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdf
Kamel Farid
 
JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...
JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...
JRR Tolkien’s Lord of the Rings: Was It Influenced by Nordic Mythology, Homer...
Reflections on Morality, Philosophy, and History
 
twin tower attack 2001 new york city
twin  tower  attack  2001 new  york citytwin  tower  attack  2001 new  york city
twin tower attack 2001 new york city
harishreemavs
 
Construction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil EngineeringConstruction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil Engineering
Lavish Kashyap
 
David Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry - Specializes In AWS, Microservices And Python.pdfDavid Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry
 
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdfSmart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
PawachMetharattanara
 
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjjseninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
AjijahamadKhaji
 
hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .
NABLAS株式会社
 
Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...
Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...
Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...
Journal of Soft Computing in Civil Engineering
 
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdfLittle Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
gori42199
 
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
22PCOAM16 ML Unit 3 Full notes PDF & QB.pdf
Guru Nanak Technical Institutions
 
Machine Learning basics POWERPOINT PRESENETATION
Machine Learning basics POWERPOINT PRESENETATIONMachine Learning basics POWERPOINT PRESENETATION
Machine Learning basics POWERPOINT PRESENETATION
DarrinBright1
 
SICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introductionSICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introduction
fabienklr
 
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdfML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
rameshwarchintamani
 
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
6th International Conference on Big Data, Machine Learning and IoT (BMLI 2025)
ijflsjournal087
 
Applications of Centroid in Structural Engineering
Applications of Centroid in Structural EngineeringApplications of Centroid in Structural Engineering
Applications of Centroid in Structural Engineering
suvrojyotihalder2006
 
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
sss1.pptxsss1.pptxsss1.pptxsss1.pptxsss1.pptx
ajayrm685
 
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
PawachMetharattanara
 
acid base ppt and their specific application in food
acid base ppt and their specific application in foodacid base ppt and their specific application in food
acid base ppt and their specific application in food
Fatehatun Noor
 
Nanometer Metal-Organic-Framework Literature Comparison
Nanometer Metal-Organic-Framework  Literature ComparisonNanometer Metal-Organic-Framework  Literature Comparison
Nanometer Metal-Organic-Framework Literature Comparison
Chris Harding
 
Design of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdfDesign of Variable Depth Single-Span Post.pdf
Design of Variable Depth Single-Span Post.pdf
Kamel Farid
 
twin tower attack 2001 new york city
twin  tower  attack  2001 new  york citytwin  tower  attack  2001 new  york city
twin tower attack 2001 new york city
harishreemavs
 
Construction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil EngineeringConstruction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil Engineering
Lavish Kashyap
 
David Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry - Specializes In AWS, Microservices And Python.pdfDavid Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry - Specializes In AWS, Microservices And Python.pdf
David Boutry
 
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdfSmart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
PawachMetharattanara
 
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjjseninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
AjijahamadKhaji
 
hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .
NABLAS株式会社
 
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdfLittle Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
Little Known Ways To 3 Best sites to Buy Linkedin Accounts.pdf
gori42199
 
Machine Learning basics POWERPOINT PRESENETATION
Machine Learning basics POWERPOINT PRESENETATIONMachine Learning basics POWERPOINT PRESENETATION
Machine Learning basics POWERPOINT PRESENETATION
DarrinBright1
 
SICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introductionSICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introduction
fabienklr
 
Ad

Information Upload and retrieval using SP Theory of Intelligence

  • 1. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-5, May- 2016] Infogain Publication (Infogainpublication.com) ISSN : 2454-1311 www.ijaems.com Page | 274 Information Upload and retrieval using SP Theory of Intelligence Supriya P, Koushik S Dept of ISE, MS Ramaiah Institute of Technology, Bangalore, India Abstract— In today’s technology Cloud computing has become an important aspect and storing of data on cloud is of high importance as the need for virtual space to store massive amount of data has grown during the years. However time taken for uploading and downloading is limited by processing time and thus need arises to solve this issue to handle large data and their processing. Another common problem is de duplication. With the cloud services growing at a rapid rate it is also associated by increasing large volumes of data being stored on remote servers of cloud. But most of the remote stored files are duplicated because of uploading the same file by different users at different locations. A recent survey by EMC says about 75% of the digital data present on cloud are duplicate copies. To overcome these two problems in this paper we are using SP theory of intelligence using lossless compression of information, which makes the big data smaller and thus reduces the problems in storage and management of large amounts of data. Keywords— Cloud Computing, Big data processing, Data De-duplication, SP theory of intelligence, Lossless Compression. I. INTRODUCTION The SP theory (Simplicity and Power theory) design is to simplify and interface concepts across artificial intelligence, conventional computing and human perception and cognizance, and understands compression of information via the matching and unification of patterns. SP Theory is accomplished as a hypothetical system corresponding to brain which can receive new information and stores it by relating it to already available Old information. The SP theory of intelligence amalgamates visionary clarity with explanatory and description power. In the SP machine there is a capability for simplifying the computation and also saves time, cost and effort involved in the development of many applications. SP is short for Simplicity and Power, as it may be seen as a process of reducing informational redundancy and thus increasing its simplicity while retaining as much as possible of its non-redundant expressive power. Majority of the computing tasks in today’s situation involves data that have been collected and stored in databases. The data make a stationary target. But, increasingly vital important insights can be gained from analyzing information that’s on the move. This approach is called streams analytics [1]. Rather than placing data in a database first, the computer analyses it as and when it comes from a wide amount of sources, continuously filtering its understanding of the data as and when conditions change. This is the way how human process their information. Although, in its unsupervised learning, the SP system processes information by dividing them into batches, and thus lends itself to an incremental approach. The SP system is designed to incorporate new information to a constantly growing body of compressed old information. Massive amount of large data sets introduces problems of data management [2]. In order to reduce required amount of storage it is necessary to reduce the time for computation and also represent data in desired way. 
One major roadblock to using cloud services for processing large data is the problem of transmitting the data sets over a network. Maintaining communications network is turning out to be very expensive and marginally profitable. In order to minimize these network charges system designers have figured out a way to minimize the energy used for processing data. The SP system advocates the efficient transmission of data by dividing the information into smaller parts. SP theory in managing massive data[3]- Large scale data sets introduce many problems for data management. a. Volume: Size of the large scale data sets can be reduced by compressing the information into chunks and by identifying the duplicate chunks, storage of identical chunks can be eliminated and hence efficiency in storing can be achieved. b. Variety: Each format of information requires different kind of processing. Text files with different formats .txt, .pdf,.doc, each one need to be analyzed differently. SP System provides a universal framework for processing of diverse formats. c. Velocity: Instead of simply placing the data in the database, it requires data to be analyzed first and understand its content first. This way transmission time taken to analyze the moving data can be minimized. II. LITERATURE SURVEY Wolff et.al [1] explained how the SP theory of intelligence and its applicability in the SP machine may be applied to the processing and management of big data.
  • 2. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-2, Issue-5, May- 2016] Infogain Publication (Infogainpublication.com) ISSN : 2454-1311 www.ijaems.com Page | 275 J Gerard Wolff et.al [3] this article is an overview of the SP theory of intelligence is designed to simplify and interface concepts across artificial intelligence, conventional computing and human perception and cognizance, with information compression as a theme. It is understood as a human brain that receives New data and stores it in by compressing it as Old information; and it is envisioned in the form of a computer model, a first version of the SP machine. The matching and unification of data patterns and the concept of multiple alignments are the ideas behind the theory. J Gerard Wolff et.al [5] provides confirmation for the idea that much of artificial intelligence, human perception and cognizance, conventional computing, may be understood as compression of information via the matching and indication of patterns. This is the basis for the SP theory of intelligence, outlined in the paper and fully described elsewhere. Robert Escriva et.al [12] this paper presents HyperDex which is understood as a distributed key-value cache that provides a exclusive search primitive that empowers queries on secondary attributes. The key concept of HyperDex is the idea of hyperspace hashing in which objects having multiple attributes are located onto a multidimensional hyperspace. This scaling leads to productive implementations for searches of fractionally-specified secondary attribute and range queries and also for retrieval by primary key. J Gerard Wolff .et.al [13] describes existing and expected benefits of the SP theory of intelligence, and some potential applications. The theory is designed to simplify and interface ideas across artificial intelligence, conventional computing, and human perception and cognizance, with information compression as a theme. It incorporates simplicity of both explanatory and descriptive power in numerous areas of computation and cognizance. Kruus et al. [20] presents a work similar to ours. They propose a chunking algorithm involving two stages that re-chunks transitional and non-duplicated big CDC chunks into small CDC chunks. The significance of their work is to reduce the number of chunks while attaining as the same duplicate elimination ratio as a baseline CDC algorithm. III. PROPOSED SYSTEM Figure1 represents the system architecture of proposed system which is concerned with establishing basic structural framework for a system. When a file is uploaded by the user say File X, text file is divided into blocks for each block a hash code is generated. Each time when a new file say File Y is uploaded it compares with hash code for duplication and then writes the file for cloud storage. A. Information compression and Block Creation The main aim of this project is to overcome the problems in big data using the SP Theory of Intelligence. In order to achieve this goal, big data is subjected to compression techniques. Compression of information is achieved by pattern matching. Using such a system leads to the improvement in the processing of big data. The SP Theory provides pattern recognition, information storage, retrieval and information compression. Fig.1: Architecture of proposed system Figure 2 represents the block diagram of our proposed sp theory of intelligence system. Our proposed system has two modules user and admin. 
In the first stage, the user and the admin log into the system using their login details. In the second stage, the user uploads a text file for processing, and the block-creation process divides the text into blocks of fixed or variable size. In the last stage, the de-duplication process calculates a hash function for each block, compares the result with the already stored index to detect duplicates, and then updates the index and stores the data.
The process of block creation, or chunking, divides the data stream into smaller, non-overlapping blocks. There are different approaches to block creation: static chunking, content-defined chunking and file-based chunking. Our system uses content-defined chunking, where chunk boundaries are determined by the content itself through the calculation of one fingerprint for each substring of length w, i.e., one fingerprint per sliding window. The processing overhead of fingerprinting depends on the window length: a small window gives good performance but poor chunking properties, while a large window gives good chunking properties at higher cost. A sketch of one such chunker follows.
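The paper does not specify the fingerprint function or the boundary rule, so the following sketch should be read as one illustrative realisation of content-defined chunking: a Rabin-style rolling polynomial fingerprint over a window of w bytes (here window=48), with a boundary cut wherever the low bits of the fingerprint are zero and the chunk length lies between a chosen minimum and maximum. All parameter values and names (cdc_chunks, mask, min_size, max_size) are assumptions made for illustration.

    import hashlib

    def cdc_chunks(data, window=48, mask=0x1FFF, min_size=2048, max_size=65536,
                   base=257, mod=(1 << 61) - 1):
        # Content-defined chunking: keep a rolling polynomial fingerprint of the
        # last `window` bytes and cut a chunk boundary wherever the masked
        # fingerprint is zero, subject to minimum and maximum chunk sizes.
        # Because boundaries depend only on local content, an insertion near the
        # start of a file shifts only nearby boundaries, not every later block.
        pow_w = pow(base, window - 1, mod)        # weight of the byte leaving the window
        chunks, start, fp = [], 0, 0
        for i, byte in enumerate(data):
            if i >= window:                       # slide the window: drop the oldest byte
                fp = (fp - data[i - window] * pow_w) % mod
            fp = (fp * base + byte) % mod         # bring the new byte into the fingerprint
            size = i - start + 1
            if (size >= min_size and (fp & mask) == 0) or size >= max_size:
                chunks.append(data[start:i + 1])  # content decided this boundary
                start = i + 1
        if start < len(data):
            chunks.append(data[start:])           # final partial chunk
        return chunks

    blob = b"The SP system compresses New information against Old information. " * 3000
    parts = cdc_chunks(blob)
    unique_hashes = {hashlib.sha256(p).hexdigest() for p in parts}
    print(len(parts), "chunks,", len(unique_hashes), "unique")

The variable-size chunks produced this way are then hashed and checked against the stored index in the same manner as the block-level sketch shown earlier.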