International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 366
Text Segmentation for Online Subjective Examination Using Machine
Learning
Shahid Khan1, Rakshanda Chavan2, Diksha Singh3, Tina Sajwan4
1,2,3,4 Modern Education Society’s College of Engineering, Pune-411001
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - This paper focuses on text segmentation for
natural language using the k-Nearest Neighbour (K-NN)
classifier, a type of instance-based (lazy) learning in
which the function is only approximated locally and all
computation is deferred until classification. Text
segmentation divides written text into meaningful units;
the term covers both the mental process used by humans when
reading text and the artificial processes implemented in
computers, which are the subject of natural language
processing. K-NN is modified to use a similarity measure
that considers the similarity among attributes (features)
as well as among feature values, and this version is
applied to the text segmentation task. The goal of this
paper is to implement natural language processing through
text segmentation for online subjective examinations.
Key Words: K-NN, text segmentation, feature similarity,
NLP
1. INTRODUCTION
Text segmentation is defined as the process of
automatically segmenting a long text into parts based on
its topic or content. Information retrieval (IR) systems
tend to retrieve long texts that cover more than one topic
as highly relevant to a given query, so long texts need to
be segmented topic by topic. The task is to partition the
text into sentences or paragraphs and to judge whether a
topic boundary lies between two adjacent sentences or
paragraphs. The text is given as the input and segmented
into paragraphs, a list of pairs of adjacent paragraphs is
generated, and each pair is judged as to whether a topic
boundary should be placed between them. The task is thus
interpreted as a binary classification in which each
sentence or paragraph pair is classified as either
separation (a transition to a different topic) or
non-separation (a continuation of the same topic).
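The mapping from a text to a list of adjacent paragraph
pairs can be sketched as follows. This is only a minimal
illustration: the function names, the blank-line paragraph
rule and the example labels are assumptions made here, not
part of the described system.

```python
import re


def split_paragraphs(text):
    """Split raw text into paragraphs on blank lines, dropping empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def adjacent_pairs(units):
    """Return every pair of neighbouring units: (u0, u1), (u1, u2), ..."""
    return list(zip(units, units[1:]))


text = "Cats purr.\n\nCats sleep a lot.\n\nStocks fell today."
pairs = adjacent_pairs(split_paragraphs(text))

# Hypothetical labels for illustration only:
# 1 = separation (topic boundary), 0 = non-separation (same topic).
labels = [0, 1]
for (left, right), label in zip(pairs, labels):
    print(label, "|", left, "||", right)
```

Each pair together with its label forms one training example
for the binary classifier; a text with n paragraphs yields
n - 1 such pairs.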
Encoding texts into numerical vectors and computing their
similarities based only on attribute values causes two
problems. The first is huge dimensionality, which makes
processing each numerical vector representing a document
very costly in terms of time and system resources; in
addition, the number of training examples required to
avoid overfitting grows in proportion to the dimension. The
second problem is sparse distribution, where most elements
of each numerical vector are zero.
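Both problems can be seen even on a toy corpus. The sketch
below assumes a simple word-count (bag-of-words) encoding
with whitespace tokenization; the helper names and the three
example documents are invented for illustration.

```python
def build_vocabulary(docs):
    """Collect the sorted set of distinct lower-cased words in the corpus."""
    return sorted({word for doc in docs for word in doc.lower().split()})


def encode(doc, vocab):
    """Encode a document as a vector of word counts, one entry per vocabulary term."""
    words = doc.lower().split()
    return [words.count(term) for term in vocab]


def sparsity(vector):
    """Return the fraction of elements in the vector that are zero."""
    return sum(1 for value in vector if value == 0) / len(vector)


docs = ["the cat sat", "the dog ran", "stocks fell sharply today"]
vocab = build_vocabulary(docs)
vector = encode(docs[0], vocab)

print(len(vocab))        # dimension grows with the number of distinct words
print(vector)
print(sparsity(vector))  # share of zero entries in one document vector
```

The dimension equals the vocabulary size, so it grows with
the corpus, while each document uses only a few of the
words and leaves the remaining entries at zero.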
We now state what we propose in this research. We assume
that words are given as the features of numerical vectors
when encoding texts, and that they have semantic relations
with one another. Based on this assumption, we define a
similarity measure between feature vectors that considers
both the features and the feature values. We modify KNN
into a version that uses both the feature similarity and
the feature value similarity, and apply it to the
classification task mapped from text segmentation. As
benefits, we expect more tolerance to sparse distributions
and the potential avoidance of huge dimensionality.
Implementing the above ideas is expected to bring the
following benefits. The dimensionality of the numerical
vectors that encode texts may potentially be cut down. The
information loss in computing the similarity between texts
may be reduced by reflecting the similarities among the
features. By representing texts in an alternative form to
plain numerical vectors, the two main problems of that
encoding may be avoided. The proposed approach also becomes
less sensitive to the sparse distribution of numerical
vectors, because similarity among features is captured as
well as similarity among feature values.
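One way to combine feature similarity with feature value
similarity is a soft-cosine style measure, sketched below.
The exact formula used in this work is not given in the
text, so this is a substitute technique chosen for
illustration; the feature similarity matrix S and its
values are assumptions.

```python
import math


def soft_dot(x, y, S):
    """Sum S[i][j] * x[i] * y[j] over all feature pairs (i, j)."""
    return sum(S[i][j] * x[i] * y[j]
               for i in range(len(x))
               for j in range(len(y)))


def feature_aware_similarity(x, y, S):
    """Soft-cosine similarity: cosine generalised by a feature similarity matrix S."""
    denom = math.sqrt(soft_dot(x, x, S)) * math.sqrt(soft_dot(y, y, S))
    return soft_dot(x, y, S) / denom if denom else 0.0


# Hypothetical features: ["car", "automobile", "banana"].
# S[i][j] is an assumed semantic similarity between feature i and feature j.
S = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

x = [1, 0, 0]  # text mentioning only "car"
y = [0, 1, 0]  # text mentioning only "automobile"

print(feature_aware_similarity(x, y, identity))  # plain cosine: no shared feature
print(feature_aware_similarity(x, y, S))         # related features are credited
```

With the identity matrix the measure reduces to ordinary
cosine similarity, so two sparse vectors with no common
non-zero feature score zero; with S, related features still
contribute, which is the tolerance to sparsity argued for
above.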
2. RELATED WORK
We survey previous cases of encoding texts into structured
forms so that machine learning algorithms can be used for
text mining tasks. Three main problems, huge
dimensionality, sparse distribution, and poor transparency,
are inherent in encoding texts into numerical vectors.
Previous works have proposed various schemes of
pre-processing texts in order to solve these problems.
Paper [1] states that text segmentation refers to the
process of segmenting an article into several parts based
on its content. Because information retrieval systems tend
to retrieve a long text most frequently by overestimating
its relevance to a query, such a text needs to be segmented
into several parts in order to avoid this
problem. In this task, the text is given as the input and
segmented into paragraphs, a list of pairs of adjacent
paragraphs is generated, and each pair is judged as to
whether a topic boundary should be placed between them.
According to paper [2], the task of text segmentation is to
partition the text into sentences or paragraphs and to
judge whether a topic boundary lies between two adjacent
sentences or paragraphs. The task may be interpreted as a
binary classification in which each sentence or paragraph
pair is classified as either a transition to a different
topic or a continuation of the same topic. Segmentation of
speech texts into sentences or paragraphs may also be
considered, but is left to future research. In text
categorization, the sample texts may span various domains,
whereas in text segmentation the sample paragraphs should
lie within one domain. Therefore, although text
segmentation is a classification task, it should be
distinguished from topic-based text categorization.
In paper [3], the application of back propagation to the
judgment of keywords is validated in a restricted setting.
The application of back propagation to the judgment of
keywords may be defined in various ways. Information
systems dealing with documents, such as Knowledge
Management (KM), Information Retrieval (IR) and Digital
Library (DL) systems, require the storage of documents
together with structured data, called the document
surrogate, associated with them. Documents are written in
natural language and cannot be processed directly by
computers. A typical document surrogate, which is converted
from the natural language document by computer, contains
the indices of the document and includes the main words
reflecting its contents. Indexing is the process of
converting a document into a list of the words included in
it. The paper proposes applying back propagation and
considering more factors in addition to TF (Term Frequency)
and IDF (Inverse Document Frequency).
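For reference, the TF and IDF factors mentioned above can be
computed as in the following sketch. It assumes the common
definitions TF = count / document length and
IDF = log(N / document frequency) with whitespace
tokenization; the cited paper may use a different variant,
and the example documents are invented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dictionary per document using TF * IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["neural network model",
        "neural text categorizer",
        "keyword extraction model"]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

A word that occurs in fewer documents receives a higher
weight than one that occurs in many, and a word present in
every document receives a weight of zero.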
Paper [4] states that text categorization is the process of
assigning one or more of a set of predefined categories to
each document. The task belongs to pattern classification,
where texts or documents are given as patterns. Note that
most information in any system is given in textual rather
than numerical form. Techniques of text categorization are
necessary for managing this kind of information
efficiently, and text categorization has become a very
interesting research topic in both the academic and
industrial worlds. In this version of the proposed text
categorization system, the number of table entries is
fixed; it is called the static index based approach.
However, the optimal number of entries depends strongly on
the given document or corpus. The size of each table should
be optimized in terms of two factors: reliability and
efficiency.
Paper [5] observes that the automated categorization (or
classification) of texts into predefined categories has
witnessed a booming interest in the last 10 years, due to
the increased availability of documents in digital form and
the ensuing need to organize them. It is important to bear
in mind that comparisons between text categorization (TC)
methods are not absolute statements (if there may be any)
on their comparative effectiveness. One reason is that a
particular application context may exhibit very different
characteristics from those found in Reuters, and different
classifiers may respond differently to these
characteristics. An experimental study by Joachims [1998]
involving support vector machines, k-NN, decision trees,
Rocchio, and Naive Bayes showed all these classifiers to
have similar effectiveness on categories with 300 positive
training examples each. This experiment involved both the
methods that scored best (support vector machines, k-NN)
and those that scored worst (Rocchio and Naive Bayes). The
most popular approach to TC, at least in the operational
(i.e., real-world applications) community, was knowledge
engineering (KE).
In paper [6], the authors state that text clustering refers
to the process of segmenting a group of documents into
subgroups, each of which contains documents with similar
content. A collection of documents is given as the input of
the task, and several smaller groups of content-based
similar documents are generated as its output. Although
there are many heuristic approaches to the task,
unsupervised learning algorithms have been used as
state-of-the-art approaches. Encoding documents into
numerical vectors in order to use traditional unsupervised
learning algorithms for text clustering causes two main
problems. The first problem is huge dimensionality:
documents must be encoded into very high-dimensional
numerical vectors to prevent information loss. In the
previous literature, documents are generally encoded into
numerical vectors of at least several hundred dimensions.
This makes processing each numerical vector representing a
document very expensive in terms of time and system
resources. Furthermore, the number of training examples
required to avoid overfitting grows in proportion to the
dimension. The second problem is sparse distribution, where
most elements of each numerical vector are zero; in other
words, more than 90% of the elements of each numerical
vector are zero. This phenomenon degrades the
discrimination among numerical vectors and causes poor
performance in text categorization or text clustering. To
improve performance on both tasks, the two problems should
be solved.
3. PROPOSED SYSTEM
KNN Classifier:
This section describes the KNN classifier, the algorithm
used for text segmentation. It keeps a record
of all previous cases, against which each unknown case is
classified. It is a type of supervised learning: the
unknown case is classified by the majority vote of its K
nearest neighbours. It is one of the simplest machine
learning algorithms used for classification. It considers
the similarity between the attributes of the answers
written by the user, and then computes the similarities
between the features of the answer and those of the
specimen answers. In this research, we encode sentence
pairs or paragraph pairs into string vectors, and apply the
string vector based version of KNN to the classification
task mapped from text segmentation.
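The majority-vote step of KNN can be sketched as below. For
simplicity the sketch uses small numerical vectors with
cosine similarity rather than the string vectors described
above; the training examples, labels and function names are
invented for illustration.

```python
import math
from collections import Counter


def cosine(x, y):
    """Cosine similarity between two equal-length numerical vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def knn_classify(query, examples, k=3, similarity=cosine):
    """Label the query by majority vote among its k most similar examples."""
    ranked = sorted(examples,
                    key=lambda example: similarity(query, example[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical stored cases: (vector for a paragraph pair, label).
training = [
    ([1.0, 0.0, 0.0], "continuation"),
    ([0.9, 0.1, 0.0], "continuation"),
    ([0.0, 1.0, 0.0], "separation"),
    ([0.0, 0.9, 0.2], "separation"),
    ([0.1, 1.0, 0.0], "separation"),
]

print(knn_classify([1.0, 0.1, 0.0], training, k=3))
print(knn_classify([0.0, 1.0, 0.1], training, k=3))
```

Because the similarity function is passed as a parameter, a
feature-aware or string vector based similarity could be
substituted without changing the voting logic. No model is
trained in advance; all computation happens at
classification time, which is why KNN is called a lazy
learner.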
NLP:
This section is concerned with Natural Language Processing
(NLP), a field of Artificial Intelligence (AI). It deals
with the interaction between computers and the natural
language used by humans. NLP is helpful in solving many
problems, such as machine translation and text
segmentation.
Text Segmentation:
This section is concerned with text segmentation, the
process of dividing written text into smaller parts. The
term applies both to the mental processes used by humans
when reading text, and to artificial processes implemented
in computers, which are the subject of natural language
processing. It helps computers to process text written by
humans and is a precursor to further natural language
processing. Text segmentation also recognizes the
boundaries between words.
Data Store:
This section describes the role of the data store in the
process. A data store is a repository for storing
collections of data, such as a database. It is basically a
connection to the repository of data, whether the data is
stored in a single database or in one or more files. The
data store can be used to read data, to store data exported
from results, or both. The data collected from the users is
kept in the data store; it is processed and then stored
back so that users can retrieve their processed data
whenever they want. Hence the data store plays a major role
in the entire process. The data need not be arranged in a
relational format in order to be stored in the data store.
Fig -1: Architecture Diagram
4. CONCLUSIONS
A web-based examination system has been developed. This
paper describes the principle of the system, presents the
main functions of the system, analyzes the auto-generating
test paper algorithm, and discusses the security of the
system. With the help of the algorithm, online subjective
exams can be conducted anywhere.
The system saves time because it allows a number of
students to take the exam at the same time and displays the
results as soon as the test is over, so there is no need to
wait for the result, which is generated automatically by
the server. Staff have the privilege to create, modify and
delete test papers and their questions. A student can
register, log in and take the test with a specific ID, and
can see the results as well.
ACKNOWLEDGEMENT
We thank our guide Prof. A. D. Dhawale for his guidance
and support.
REFERENCES
1) Taeho Jo, “Using K Nearest Neighbors for Text
Segmentation with Feature Similarity”,
International Conference on Communication,
Control, Computing and Electronics Engineering
(ICCCCEE), 2017.
2) Taeho Jo, “Content based Segmentation of Texts
using Table based KNN”, IKE, 2017.
3) Taeho Jo, Malrey Lee, and Thomas M Gatton,
“Keyword Extraction from Documents Using a
Neural Network Model”, IEEE, 2016.
4) T. Jo, “NTC (Neural Text Categorizer): Neural
Network for Text Categorization”, pp83-96,
International Journal of Information Studies, Vol 2,
No 2, 2010.
5) T. Jo, “Normalized Table Matching Algorithm as
Approach to Text Categorization”, pp839-849, Soft
Computing, Vol 19, No 4, 2015.
6) T. Jo, “Single Pass Algorithm for Text Clustering by
Encoding Documents into Tables”, pp1749-1757,
Journal of Korea Multimedia Society, Vol 11, No 12,
2008.
7) T. Jo and D. Cho, “Index based Approach for Text
Categorization”, International Journal of
Mathematics and Computers in Simulation, Vol 2,
No 1, 2008.
8) H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini,
and C. Watkins, “Text Classification with String
Kernels”, pp419-444, Journal of Machine Learning
Research, Vol 2, No 2, 2002.
9) F. Sebastiani, “Machine Learning in Automated Text
Categorization”, pp1-47, ACM Computing Surveys,
Vol 34, No 1, 2002.
10) T. Jo, “Representation of Texts into String Vectors
for Text Categorization”, pp110-127, Journal of
Computing Science and Engineering, Vol 4, No 2,
2010.

The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ..."Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
Breast Cancer Detection using Computer VisionBreast Cancer Detection using Computer Vision
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the HeliosphereAnalysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the HeliosphereAnalysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
FIR filter-based Sample Rate Convertors and its use in NR PRACH
FIR filter-based Sample Rate Convertors and its use in NR PRACHFIR filter-based Sample Rate Convertors and its use in NR PRACH
FIR filter-based Sample Rate Convertors and its use in NR PRACH
IRJET Journal
 
Kiona – A Smart Society Automation Project
Kiona – A Smart Society Automation ProjectKiona – A Smart Society Automation Project
Kiona – A Smart Society Automation Project
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
Invest in Innovation: Empowering Ideas through Blockchain Based CrowdfundingInvest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUBSPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...
AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...
AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...
IRJET Journal
 
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
BRAIN TUMOUR DETECTION AND CLASSIFICATION
BRAIN TUMOUR DETECTION AND CLASSIFICATIONBRAIN TUMOUR DETECTION AND CLASSIFICATION
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ..."Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
Breast Cancer Detection using Computer VisionBreast Cancer Detection using Computer Vision
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the HeliosphereAnalysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the HeliosphereAnalysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
FIR filter-based Sample Rate Convertors and its use in NR PRACH
FIR filter-based Sample Rate Convertors and its use in NR PRACHFIR filter-based Sample Rate Convertors and its use in NR PRACH
FIR filter-based Sample Rate Convertors and its use in NR PRACH
IRJET Journal
 
Kiona – A Smart Society Automation Project
Kiona – A Smart Society Automation ProjectKiona – A Smart Society Automation Project
Kiona – A Smart Society Automation Project
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
Invest in Innovation: Empowering Ideas through Blockchain Based CrowdfundingInvest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUBSPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...
AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...
AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...
IRJET Journal
 

Recently uploaded (20)

2.3 Genetically Modified Organisms (1).ppt
2.3 Genetically Modified Organisms (1).ppt2.3 Genetically Modified Organisms (1).ppt
2.3 Genetically Modified Organisms (1).ppt
rakshaiya16
 
SICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introductionSICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introduction
fabienklr
 
hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .
NABLAS株式会社
 
Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025
Antonin Danalet
 
introduction technology technology tec.pptx
introduction technology technology tec.pptxintroduction technology technology tec.pptx
introduction technology technology tec.pptx
Iftikhar70
 
Personal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.pptPersonal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.ppt
ganjangbegu579
 
Design Optimization of Reinforced Concrete Waffle Slab Using Genetic Algorithm
Design Optimization of Reinforced Concrete Waffle Slab Using Genetic AlgorithmDesign Optimization of Reinforced Concrete Waffle Slab Using Genetic Algorithm
Design Optimization of Reinforced Concrete Waffle Slab Using Genetic Algorithm
Journal of Soft Computing in Civil Engineering
 
Applications of Centroid in Structural Engineering
Applications of Centroid in Structural EngineeringApplications of Centroid in Structural Engineering
Applications of Centroid in Structural Engineering
suvrojyotihalder2006
 
Nanometer Metal-Organic-Framework Literature Comparison
Nanometer Metal-Organic-Framework  Literature ComparisonNanometer Metal-Organic-Framework  Literature Comparison
Nanometer Metal-Organic-Framework Literature Comparison
Chris Harding
 
Uses of drones in civil construction.pdf
Uses of drones in civil construction.pdfUses of drones in civil construction.pdf
Uses of drones in civil construction.pdf
surajsen1729
 
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdfSmart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
PawachMetharattanara
 
ML_Unit_V_RDC_ASSOCIATION AND DIMENSIONALITY REDUCTION.pdf
ML_Unit_V_RDC_ASSOCIATION AND DIMENSIONALITY REDUCTION.pdfML_Unit_V_RDC_ASSOCIATION AND DIMENSIONALITY REDUCTION.pdf
ML_Unit_V_RDC_ASSOCIATION AND DIMENSIONALITY REDUCTION.pdf
rameshwarchintamani
 
Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...
Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...
Modelling of Concrete Compressive Strength Admixed with GGBFS Using Gene Expr...
Journal of Soft Computing in Civil Engineering
 
Machine foundation notes for civil engineering students
Machine foundation notes for civil engineering studentsMachine foundation notes for civil engineering students
Machine foundation notes for civil engineering students
DYPCET
 
Frontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend EngineersFrontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend Engineers
Michael Hertzberg
 
Control Methods of Noise Pollutions.pptx
Control Methods of Noise Pollutions.pptxControl Methods of Noise Pollutions.pptx
Control Methods of Noise Pollutions.pptx
vvsasane
 
Evonik Overview Visiomer Specialty Methacrylates.pdf
Evonik Overview Visiomer Specialty Methacrylates.pdfEvonik Overview Visiomer Specialty Methacrylates.pdf
Evonik Overview Visiomer Specialty Methacrylates.pdf
szhang13
 
Prediction of Flexural Strength of Concrete Produced by Using Pozzolanic Mate...
Prediction of Flexural Strength of Concrete Produced by Using Pozzolanic Mate...Prediction of Flexural Strength of Concrete Produced by Using Pozzolanic Mate...
Prediction of Flexural Strength of Concrete Produced by Using Pozzolanic Mate...
Journal of Soft Computing in Civil Engineering
 
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjjseninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
AjijahamadKhaji
 
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdfML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
rameshwarchintamani
 
2.3 Genetically Modified Organisms (1).ppt
2.3 Genetically Modified Organisms (1).ppt2.3 Genetically Modified Organisms (1).ppt
2.3 Genetically Modified Organisms (1).ppt
rakshaiya16
 
SICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introductionSICPA: Fabien Keller - background introduction
SICPA: Fabien Keller - background introduction
fabienklr
 
hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .hypermedia_system_revisit_roy_fielding .
hypermedia_system_revisit_roy_fielding .
NABLAS株式会社
 
Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025
Antonin Danalet
 
introduction technology technology tec.pptx
introduction technology technology tec.pptxintroduction technology technology tec.pptx
introduction technology technology tec.pptx
Iftikhar70
 
Personal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.pptPersonal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.ppt
ganjangbegu579
 
Applications of Centroid in Structural Engineering
Applications of Centroid in Structural EngineeringApplications of Centroid in Structural Engineering
Applications of Centroid in Structural Engineering
suvrojyotihalder2006
 
Nanometer Metal-Organic-Framework Literature Comparison
Nanometer Metal-Organic-Framework  Literature ComparisonNanometer Metal-Organic-Framework  Literature Comparison
Nanometer Metal-Organic-Framework Literature Comparison
Chris Harding
 
Uses of drones in civil construction.pdf
Uses of drones in civil construction.pdfUses of drones in civil construction.pdf
Uses of drones in civil construction.pdf
surajsen1729
 
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdfSmart City is the Future EN - 2024 Thailand Modify V1.0.pdf
Smart City is the Future EN - 2024 Thailand Modify V1.0.pdf
PawachMetharattanara
 
ML_Unit_V_RDC_ASSOCIATION AND DIMENSIONALITY REDUCTION.pdf
ML_Unit_V_RDC_ASSOCIATION AND DIMENSIONALITY REDUCTION.pdfML_Unit_V_RDC_ASSOCIATION AND DIMENSIONALITY REDUCTION.pdf
ML_Unit_V_RDC_ASSOCIATION AND DIMENSIONALITY REDUCTION.pdf
rameshwarchintamani
 
Machine foundation notes for civil engineering students
Machine foundation notes for civil engineering studentsMachine foundation notes for civil engineering students
Machine foundation notes for civil engineering students
DYPCET
 
Frontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend EngineersFrontend Architecture Diagram/Guide For Frontend Engineers
Frontend Architecture Diagram/Guide For Frontend Engineers
Michael Hertzberg
 
Control Methods of Noise Pollutions.pptx
Control Methods of Noise Pollutions.pptxControl Methods of Noise Pollutions.pptx
Control Methods of Noise Pollutions.pptx
vvsasane
 
Evonik Overview Visiomer Specialty Methacrylates.pdf
Evonik Overview Visiomer Specialty Methacrylates.pdfEvonik Overview Visiomer Specialty Methacrylates.pdf
Evonik Overview Visiomer Specialty Methacrylates.pdf
szhang13
 
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjjseninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
seninarppt.pptx1bhjiikjhggghjykoirgjuyhhhjj
AjijahamadKhaji
 
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdfML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
ML_Unit_VI_DEEP LEARNING_Introduction to ANN.pdf
rameshwarchintamani
 

Text Segmentation for Online Subjective Examination using Machine Learning

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 366

Text Segmentation for Online Subjective Examination Using Machine Learning

Shahid Khan1, Rakshanda Chavan2, Diksha Singh3, Tina Sajwan4
1,2,3,4 Modern Education Society’s College of Engineering, Pune-411001

Abstract - This paper focuses on text segmentation for natural language using the k-Nearest Neighbour (K-NN) classifier, a type of instance-based learning, or lazy learning, in which the function is only approximated locally and all computation is deferred until classification. Text segmentation divides written text into meaningful units; the term covers both the process used by humans when reading text and the artificial processes implemented in computers, which are the subject of natural language processing. A similarity measure among attributes is computed and used to determine the similarity between feature vectors; K-NN is then modified based on this similarity measure, and the modified version is applied to the text segmentation task. The goal of this paper is to implement natural language processing using text segmentation and to obtain the benefits it provides.

Key Words: K-NN, text segmentation, feature similarity, NLP

1. INTRODUCTION

Text segmentation is defined as the process of automatically segmenting a large text into several parts based on its topic or content. Information retrieval (IR) systems tend to retrieve long texts that contain more than one topic as highly relevant to the given query, so long texts need to be segmented into partitions topic by topic.
The task of text segmentation is to partition the text into sentences or paragraphs and to judge whether a topic boundary should be placed between two adjacent sentences or paragraphs. In this task, the text is given as the input and segmented into paragraphs, a list of pairs of adjacent paragraphs is generated, and for each pair we judge whether to place a topic boundary between them. The task is thus interpreted as a binary classification in which each sentence or paragraph pair is classified as either separation (a transition to a different topic) or non-separation (a continuation of the same topic).

Some issues are caused by encoding texts into numerical vectors and computing their similarities based only on attribute values. The first problem is huge dimensionality, which makes processing each numerical vector representing a document very costly in terms of time and system resources. Many more training examples are also required, in proportion to the dimension, in order to avoid overfitting. The second problem is sparse distribution, where zero values dominate each numerical vector.

The agenda of this research is as follows. We assume that words are given as the features of numerical vectors when encoding texts, and that they have semantic relations with one another. Based on this assumption, we define a similarity measure between feature vectors that considers both the features and the feature values. We modify KNN into a version that uses both the feature similarity and the feature value similarity, and apply it to the classification task mapped from text segmentation. As benefits, we expect greater tolerance to sparse distributions and the potential avoidance of huge dimensionality.
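
The mapping from segmentation to binary classification, and the idea of a KNN that uses feature similarity as well as feature value similarity, can be sketched in Python. This is a minimal illustration and not the authors' implementation: the function names, the labels "continuation" and "boundary", the toy vectors, and the exact form of the similarity (a cosine-style score weighted by a feature-to-feature similarity matrix `feature_sim`) are all assumptions made for the example.

```python
import math
from collections import Counter


def adjacent_pairs(paragraphs):
    """Return every pair of neighbouring paragraphs, in reading order."""
    return list(zip(paragraphs, paragraphs[1:]))


def similarity(x, y, feature_sim):
    """Similarity that uses feature values and feature-to-feature similarity.

    feature_sim[i][j] is an assumed similarity between feature i and feature j
    (1 on the diagonal). With the identity matrix this is plain cosine
    similarity. The score is not rescaled, so it may exceed 1.
    """
    num = sum(
        feature_sim[i][j] * x[i] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    return num / norm if norm else 0.0


def knn_classify(query, training, feature_sim, k=3):
    """Label a vector by majority vote among its k most similar examples."""
    ranked = sorted(
        training,
        key=lambda item: similarity(query, item[0], feature_sim),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: three features, where features 0 and 1 are semantically related.
feature_sim = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Each vector stands in for one encoded paragraph pair (hypothetical encoding).
training = [
    ([1, 0, 0], "continuation"),
    ([0, 1, 0], "continuation"),
    ([0, 0, 1], "boundary"),
]
label = knn_classify([1, 0, 0], training, feature_sim, k=2)
```

With the identity matrix, the vectors `[1, 0, 0]` and `[0, 1, 0]` have similarity 0 because they share no non-zero feature. With `feature_sim` they score 0.8, which is the sense in which capturing similarity among features can make the measure less sensitive to sparse vectors.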
Several benefits are expected from this research. We may potentially reduce the dimensionality in encoding texts into numerical vectors. The information loss in computing the similarity between texts may be reduced by reflecting the similarities among the features. By representing texts in a form alternative to plain numerical vectors, we may escape the two main problems of that encoding. The proposed approach becomes less sensitive to the sparse distribution of numerical vectors, because similarity is captured among features as well as among feature values.

2. RELATED WORK

Let us survey previous cases of encoding texts into structured forms so that machine learning algorithms can be used for text mining tasks. Three main problems, huge dimensionality, sparse distribution, and poor transparency, are inherent in encoding texts into numerical vectors. In previous works, various schemes for pre-processing texts have been proposed in order to solve these problems.

In paper [1], text segmentation refers to the process of segmenting an article into several parts based on its content. Because in information retrieval systems a long text tends to be retrieved most frequently owing to overestimation of its relevance to a query, it needs to be segmented into several parts in order to avoid the problem. In this task, the text is given as the input and segmented into paragraphs, a list of pairs of adjacent paragraphs is generated, and for each pair it is judged whether to place a topic boundary between them.

In paper [2], the task of text segmentation is to partition the text into sentences or paragraphs and to judge whether a topic boundary is placed between two adjacent sentences or paragraphs. The task may be interpreted as a binary classification in which each sentence or paragraph pair is classified as a transition to a different topic or a continuation of the same topic. Segmentation of speech texts into sentences or paragraphs may be considered, but is left to future research. In text categorization, the sample texts may span various domains, whereas in text segmentation the sample paragraphs should lie within one domain. Therefore, although text segmentation is a classification task, it should be distinguished from topic-based text categorization. Text segmentation is mapped into a binary classification.

In paper [3], the application of back propagation to the judgment of keywords is validated in a restricted setting. How back propagation is applied to the judgment of keywords may be defined in various ways. Information systems dealing with documents, such as Knowledge Management (KM), Information Retrieval (IR) and Digital Library (DL) systems, require the storage of documents together with structured data associated with them, called document surrogates. Documents are written in natural language and cannot be processed directly by computers.
A typical document surrogate, which is converted from the natural language document by computer, contains the indices of the document and includes the main words reflecting its contents. Indexing is the process of converting a document into a list of the words included in it. The paper proposed applying back propagation and considering more factors in addition to TF (Term Frequency) and IDF (Inverse Document Frequency).

Paper [4] states that text categorization is the process of assigning one or more of a set of predefined categories to each document. The task belongs to pattern classification, where texts or documents are given as patterns. Note that in almost any system, information is given predominantly in textual rather than numerical formats. To manage information in textual format efficiently, text categorization techniques are necessary, and text categorization has become a very interesting research topic in both the academic and industrial worlds. In this version of the proposed text categorization system, the number of table entries is fixed; the approach is called the static index based approach. However, the optimal number of entries depends heavily on the given document or corpus. The size of each table should be optimized in terms of two factors: reliability and efficiency.

In paper [5], the authors observe that the automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. It is important to bear in mind that the considerations above are not absolute statements (if there may be any) on the comparative effectiveness of these TC methods.
One of the reasons is that a particular applicative context may exhibit very different characteristics from the ones to be found in Reuters, and different classifiers may respond differently to these characteristics. An experimental study by Joachims [1998] involving support vector machines, k-NN, decision trees, Rocchio, and Naive Bayes showed all these classifiers to have similar effectiveness on categories with 300 positive training examples each. This experiment involved both the methods that have scored best (support vector machines, k-NN) and those that have scored worst (Rocchio and Naive Bayes). The most popular approach to TC, at least in the operational (i.e., real-world applications) community, was knowledge engineering (KE).

In paper [6], the authors state that text clustering refers to the process of segmenting a group of documents into subgroups, each of which contains documents with similar content. A collection of documents is given as the input of the task, and several smaller groups of content-based similar documents are generated as its output. Although there are many heuristic approaches to the task, unsupervised learning algorithms have been used as state-of-the-art approaches to it. Encoding documents into numerical vectors in order to use traditional unsupervised learning algorithms for text clustering causes two main problems. The first problem is huge dimensionality: documents must be encoded into very high-dimensional numerical vectors to prevent information loss. In general, documents have been encoded into numerical vectors of at least several hundred dimensions in the previous literature. This makes processing each numerical vector representing a document very expensive in terms of time and system resources.
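
To make the dimensionality problem concrete, and the sparse distribution already noted in the introduction, the following Python sketch encodes a few short documents as word-count vectors. It is only an illustration under assumptions: the three sample sentences, the whitespace tokenisation, and the helper names `bag_of_words` and `zero_fraction` are hypothetical and do not come from the paper.

```python
from collections import Counter


def bag_of_words(documents):
    """Encode documents as count vectors over the shared vocabulary."""
    vocab = sorted({word for doc in documents for word in doc.lower().split()})
    vectors = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def zero_fraction(vector):
    """Fraction of entries in a vector that are zero."""
    return vector.count(0) / len(vector)


# Hypothetical sample documents with no words in common.
docs = [
    "the exam answer covers sorting algorithms",
    "binary search trees store ordered keys",
    "heat exchangers transfer thermal energy",
]
vocab, vectors = bag_of_words(docs)
dimension = len(vocab)
sparsity = [zero_fraction(v) for v in vectors]
```

Even for these three sentences the vectors have 17 dimensions, one per distinct word, and roughly two thirds or more of each vector is zero. Both figures grow with the size of the corpus, since every new word adds a dimension that is zero for most documents.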
Furthermore, much more training examples are required proportionally to the dimension for avoiding overfitting.The second problem is sparse distribution where each numerical vector has zero values dominantly. In other words, more than 90 degree 0 of its elements are zero values in each numerical vector. This phenomenon degrades the discrimination among numerical vectors. This causes poor performance of text categorization or text clustering. In order to improve performance of both tasks, the two problems should be solved. 3. PROPOSED SYSTEM KNN Classifier: This section tells about the KNN classifier which is an algorithm used for text segmentation. It keeps the record
of all the previous cases and uses them to classify a new, unknown case. It is a type of supervised learning. The unknown case is classified by the majority vote of its K nearest neighbours. It is a machine learning algorithm and one of the simplest algorithms used for classification. It considers the similarity among the attributes of the answer written by the user and then computes the similarity between the features of that answer and those of the specimen answers. In this research, we encode sentence pairs or paragraph pairs into string vectors, and apply the string-vector-based version of KNN to the classification task mapped from text segmentation.
NLP: This section is concerned with Natural Language Processing (NLP), which is a field of Artificial Intelligence (AI). It deals with the interaction between computers and the natural language used by humans. NLP is helpful in solving many problems, such as machine translation and text segmentation.
Text Segmentation: This section is concerned with text segmentation, the process of dividing written text into smaller parts. The term applies both to mental processes used by humans when reading text, and to artificial processes implemented in computers, which are the subject of natural language processing. It assists computers in carrying out such artificial processes. It is a precursor to Natural Language Processing. Text segmentation recognizes the boundaries between words.
Data Store: This section describes the role of the data store in the process. A data store is a repository for storing collections of data, such as a database.
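To make the majority-vote step of the KNN classifier described in this section more concrete, here is a minimal sketch. It is not the string-vector-based version used in this paper: as a simplifying assumption it compares texts by cosine similarity over word counts. The labelled examples, the labels, and the function names are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two word-count Counters (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, labelled_examples, k=3):
    """Label a text by the majority vote of its k most similar stored examples."""
    query = Counter(text.lower().split())
    scored = [
        (cosine_similarity(query, Counter(example.lower().split())), label)
        for example, label in labelled_examples
    ]
    # Most similar examples first; only the top k take part in the vote.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical stored cases ("previous cases") with their known labels.
examples = [
    ("tcp packets are routed over the network", "networking"),
    ("the router forwards packets", "networking"),
    ("network switches forward frames", "networking"),
    ("sql tables store rows", "database"),
    ("the database index speeds up queries", "database"),
    ("rows are stored in database tables", "database"),
]

print(knn_classify("packets travel across the network", examples))
print(knn_classify("queries read rows from database tables", examples))
```

No training step takes place here. All work happens at classification time, which is why KNN is called a lazy learner, and the choice of similarity measure is the part that a string-vector-based variant would replace.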
A data store is basically a connection to the repository of data, whether the data is stored in a single database or in one or more different files. The data store can be used to obtain data, to store data exported from results, or both. The data collected from the users is stored in the data store. During processing, the stored data is processed and written back into the data store so that users can retrieve their processed data whenever they want. Hence the data store plays a major role in the entire process. Data need not be arranged in a relational format to be stored in the data store.
Fig -1: Architecture Diagram
4. CONCLUSIONS
An examination system is developed based on the web. This paper describes the principle of the system, presents the main functions of the system, analyzes the auto-generating test paper algorithm, and discusses the security of the system. With the help of the algorithm, online subjective exams can be conducted anywhere. It saves time, as it allows a number of students to take the exam at the same time and displays the results as soon as the test is over, so there is no need to wait for the result. The result is generated automatically by the server. Staff have the privilege to create, modify and delete test papers and their questions. A student can register, log in and take the test with a specific ID, and can see the results as well.
ACKNOWLEDGEMENT
We thank our guide Prof. A. D. Dhawale for his guidance and support.
REFERENCES
1) Taeho Jo, “Using K Nearest Neighbors for Text Segmentation with Feature Similarity”, International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE), 2017.
2) Taeho Jo, “Content based Segmentation of Texts using Table based KNN”, IKE, 2017.
3) Taeho Jo, Malrey Lee, and Thomas M Gatton, “Keyword Extraction from Documents Using a Neural Network Model”, IEEE, 2016.
4) T. Jo, “NTC (Neural Text Categorizer): Neural Network for Text Categorization”, pp83-96, International Journal of Information Studies, Vol 2, No 2, 2010.
5) T. Jo, “Normalized Table Matching Algorithm as Approach to Text Categorization”, pp839-849, Soft Computing, Vol 19, No 4, 2015.
6) T. Jo, “Single Pass Algorithm for Text Clustering by Encoding Documents into Tables”, pp1749-1757, Journal of Korea Multimedia Society, Vol 11, No 12, 2008.
7) T. Jo and D. Cho, “Index based Approach for Text Categorization”, International Journal of Mathematics and Computers in Simulation, Vol 2, No 1, 2008.
8) H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins, “Text Classification with String Kernels”, pp419-444, Journal of Machine Learning Research, Vol 2, No 2, 2002.
9) F. Sebastiani, “Machine Learning in Automated Text Categorization”, pp1-47, ACM Computing Surveys, Vol 34, No 1, 2002.
10) T. Jo, “Representation of Texts into String Vectors for Text Categorization”, pp110-127, Journal of Computing Science and Engineering, Vol 4, No 2, 2010.