Text Semantic Similarity - MachineLearning

Machine Learning project by Semester VI students (Group 3) at the School of Engineering and Applied Science, Ahmedabad University.

About

Machine Learning has rapidly found its place in the technological world over the past few years. One of its applications is plagiarism checking, which is itself an application of Text Semantic Similarity. Text Semantic Similarity is a measure of the degree of semantic equivalence between two pieces of text.

How do we know whether a document that we are reading is original? Are students copying content and ideas from other sources, or did they produce the work themselves?
In this project, we build one or more algorithms and analyse which of them are suitable for plagiarism-checking software, applying Machine Learning concepts we have already studied.

Team

1) Aneri Sheth - 1401072

2) Himanshu Budhia - 1401039

3) Raj Shah - 1401050

4) Twinkle Vaghela - 1401106

NATURAL LANGUAGE PROCESSING

Natural Language Processing (NLP) is a wide domain covering concepts from Computer Science, Artificial Intelligence and Machine Learning. It is used to analyze text or how humans speak. One of the applications of NLP is Semantic Analysis (understanding the meaning of text).

CORPUS-BASED APPROACH

This approach uses semantically annotated corpora to train Machine Learning algorithms to decide which word to use in which context. Corpus-based methods are supervised learning approaches, in which the algorithms are trained on the annotated data. The corpus and lexical resource used is WordNet.
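
To give a feel for how a lexical resource organised as an is-a hierarchy can yield a word-level similarity score, here is a minimal sketch of path similarity, 1 / (1 + shortest path length), a measure commonly used with WordNet. The tiny HYPERNYM table and the function names are hypothetical and written by hand for illustration; they are not WordNet data, and this is not necessarily the measure used in this project.

```python
# Hypothetical miniature is-a hierarchy (child -> parent).
# A hand-written stand-in for a real lexical resource such as WordNet.
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
    "oak": "tree",
    "tree": "plant",
}


def ancestors(word):
    """Return {node: distance from word} for word and every node above it."""
    distances = {}
    node, depth = word, 0
    while node is not None:
        distances[node] = depth
        node = HYPERNYM.get(node)  # None once we reach a root
        depth += 1
    return distances


def path_similarity(word1, word2):
    """1 / (1 + length of the shortest is-a path); 0.0 if the words are unconnected."""
    up1, up2 = ancestors(word1), ancestors(word2)
    common = set(up1) & set(up2)
    if not common:
        return 0.0
    shortest = min(up1[node] + up2[node] for node in common)
    return 1.0 / (1.0 + shortest)


print(path_similarity("dog", "dog"))
print(path_similarity("dog", "canine"))
print(path_similarity("dog", "cat"))
print(path_similarity("dog", "oak"))
```

Identical words score 1.0, and the score falls as the words sit further apart in the hierarchy. Words with no shared ancestor in this toy table score 0.0.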


Sentence 1 - A cemetery is a place where dead people’s bodies or their ashes are buried.
Sentence 2 - A graveyard is an area of land, sometimes near a church, where dead people are buried.
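
The two sentences above share few exact words, yet mean nearly the same thing. The sketch below illustrates why synonym knowledge matters: it compares the content words of the two sentences with a Jaccard index, first on the raw words and then after mapping synonyms to a common representative. The STOPWORDS set and the one-entry SYNONYMS table are hypothetical, hand-written stand-ins for a real lexical resource, and this is a simple baseline rather than the project's actual algorithm.

```python
import re

# Hypothetical stopword list and synonym table, written by hand for this example.
STOPWORDS = {"a", "an", "is", "are", "or", "of", "the", "their",
             "where", "sometimes", "near", "s"}
SYNONYMS = {"graveyard": "cemetery"}  # word -> canonical representative


def content_words(sentence, synonyms=None):
    """Lowercase, keep alphabetic tokens, drop stopwords, optionally map synonyms."""
    synonyms = synonyms or {}
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {synonyms.get(token, token) for token in tokens if token not in STOPWORDS}


def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


s1 = "A cemetery is a place where dead people’s bodies or their ashes are buried."
s2 = "A graveyard is an area of land, sometimes near a church, where dead people are buried."

lexical = jaccard(content_words(s1), content_words(s2))
with_synonyms = jaccard(content_words(s1, SYNONYMS), content_words(s2, SYNONYMS))

print(f"exact word overlap only: {lexical:.3f}")
print(f"with synonym table:      {with_synonyms:.3f}")
```

With only exact matches, the sentences share "dead", "people" and "buried". Treating "graveyard" and "cemetery" as the same word adds a fourth shared term and raises the score, which is the kind of gain a resource like WordNet is meant to provide at scale.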

Results

Output

Discussion and Future Work