International Journal on Natural Language Computing (IJNLC) Vol.10, No.5, October 2021
DOI: 10.5121/ijnlc.2021.10502
ANALYZING ARCHITECTURES FOR NEURAL
MACHINE TRANSLATION USING LOW
COMPUTATIONAL RESOURCES
Aditya Mandke, Onkar Litake, and Dipali Kadam
Department of Computer Engineering, SCTR’s Pune Institute of Computer Technology,
Pune, Maharashtra, India
ABSTRACT
With the recent developments in the field of Natural Language Processing, there has been a rise in the use
of different architectures for Neural Machine Translation. Transformer architectures are used to achieve
state-of-the-art accuracy, but they are very computationally expensive to train. Not everyone has access to
setups consisting of high-end GPUs and other resources. We train our models on low computational
resources and investigate the results. As expected, transformers outperformed the other architectures, but
there were some surprising results. Transformers consisting of more encoders and decoders took more
time to train but achieved lower BLEU scores. The LSTM performed well in our experiments and took
considerably less time to train than the transformers, making it suitable for situations with time constraints.
KEYWORDS
Machine Translation, Indic Languages, Natural Language Processing
1. INTRODUCTION
Machine Translation (MT) can be used to automatically convert a sentence in one (source)
language into its equivalent translation in another language (target) using Natural Language
Processing. Earlier, rule-based MT was the norm, with heuristic learning of phrase translations
from word-based alignments and lexical weighting of phrase translations [1].
Later, seq2seq networks [2, 3, 4] came to dominate the field of Machine Translation, replacing
statistical phrase-based methods and leading to corpus-based MT systems. The encoder-decoder
paradigm rose to prominence with Neural Machine Translation (NMT) [5], in which both the
encoder and the decoder are implemented as Recurrent Neural Networks (RNNs): a variable-length
source sentence is encoded into a fixed-length vector, which is then decoded to generate the
target sentence. Long Short-Term Memory (LSTM) units [6, 7] were utilized to improve
the translation quality of the target sentence. The RNN-based NMT approach quickly became the
norm and was utilized in industrial applications [8, 9, 10].
Later, convolutional encoders and decoders [11] were introduced, which allow the source sentence
to be encoded in parallel, unlike recurrent networks whose computation is constrained by temporal
dependencies. Conv2Seq architectures [12] exploited this to fully parallelize training and
optimization over GPUs and TPUs, leading to faster training.
Most recently, by utilizing self-attention and feed-forward connections, Transformers [13] further
advanced the field of translation by increasing efficiency and reducing the time to convergence.
One downside of a Transformer-based architecture is that it requires more training data for
convergence than an RNN-based model.
seq2seq networks can also be used to train multi-way, multilingual neural machine translation
(MNMT) models [14, 15]. This approach enables a single translation model to translate between
multiple languages, which is made possible by having a single attention mechanism that is shared
across all language pairs.
[16] describes how advances in machine translation have led to a steep increase in performance on
tasks for many high-resource languages [17]. Since translation is a data-hungry task, it works
well for high-resource languages [18]. In comparison with their European counterparts, the
languages of the Indian subcontinent (Indic Languages) do not have large-scale sentence-aligned
corpora.
In comparison with high-resource languages, there have been very few efforts to benchmark
NMT models on low-resource Indic Languages. In our approach, we evaluate a variety of
popular translation architectures on the Indic language Marathi. Marathi ranks 15th in the world
by number of speakers, with about 95 million speakers. For the dataset, we use a subset of the
Samanantar corpus, which has nearly 3.3 million English-Marathi parallel sentences. We observe
that transformers outperform both RNN-based and Conv2Seq-based architectures in terms of
accuracy, but take much longer to train per epoch.
In most research work, large models requiring heavy computational resources are trained for long
durations. For example, training IndicTrans on the Samanantar dataset took 38 hours on 4 A100
GPUs. Such a computationally powerful setup is often not available. We therefore train models
with low computational resources and investigate their behaviour, using an NVIDIA TITAN X GPU
to train our machine translation models. We also compare BLEU scores against the time required to
train each model to determine which architecture is suitable for different scenarios.
The rest of the paper is structured as follows. Section 2 surveys related work in the field of
Neural Machine Translation. Section 3 gives the system description, which includes the dataset
we used, the preprocessing steps, and our training procedure. Section 4 describes the different
architectures we trained. Section 5 presents our results, and Section 6 states the conclusions
drawn from them. Section 7 outlines future work.
2. RELATED WORK
[5] in 2014 conjectured that the use of a fixed-length vector is a performance bottleneck in the
basic encoder-decoder architecture. They proposed a model that automatically (soft-)searches for
the portions of a source sentence that are useful for predicting a target word, without having to
explicitly form these portions as hard segments.
[9] in 2016 introduced a deep LSTM network with 8 encoder and decoder layers having attention
and residual connections. In the attention mechanism, the bottom layer of the decoder was
connected to the top layer of the encoder to improve parallelism and reduce training time.
Wordpieces were introduced for both input and output, which helped reduce the average
translation error by 60%.
[19] in 2018 identified several key modeling and training techniques and applied them to
RNN architectures, resulting in a new RNMT+ model that outperformed three fundamental
architectures on the benchmark WMT'14 English-to-French and English-to-German tasks. A new
hybrid architecture was also introduced for Neural Machine Translation.
[20] looks into the Marathi-English translation task, considering five architectures, namely
Google, wmt-en-de, iwslt-de-en, wmt-en-de-big-t2t, and vaswani-wmt-en-de-big. The corpora
considered were the Tatoeba, Wikimedia, and Bible datasets, together with data gathered through
web scraping, amounting to about 3 million sentences.
3. SYSTEM DESCRIPTION
3.1. Dataset
For supervised Neural Machine Translation, we generally need a large parallel corpus between the
two languages, drawn from a variety of domains. The Samanantar dataset [21] has nearly 3.3
million English-Marathi parallel sentences mined from a variety of sources such as JW300 [22],
cvit-pib [23], wikimatrix [24], kde4, pmindia v1 [25], gnome, bible-uedin [26], ubuntu, ted2020
[27], mozilla-l10n, Tatoeba, tico19 [28], ELRC_2922, IndicParCorp, Wikipedia, PIB_Archives,
Nouns_Dictionary, NPTEL, Timesofindia, and Khan_academy. It contains a total of 3,288,874
sentence pairs, of which the first 1,000,000 were chosen.
Table 1. Composition of the Samanantar dataset by source
Existing dataset Corpora size (in thousands)
JW300 289
cvit-pib 114
wikimatrix 124
KDE4 12
PMIndia V1 29
GNOME 26
bible-uedin 60
Ubuntu 26
TED2020 22
Mozilla-l10n 15
Tatoeba 53
tico19 < 1
IndicParCorp 2600
Wikipedia 24
PIB 74
PIB_archives 29
Nouns_dictionary 57
NPTEL 15
Timesofindia 25
Khan_academy < 1
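
Selecting the first 1,000,000 sentence pairs, as described before Table 1, is a simple head-of-file operation. The following is a minimal Python sketch; the file names are hypothetical placeholders, not paths used in the paper.

from itertools import islice

N = 1_000_000  # number of parallel sentence pairs kept from the corpus

# Hypothetical file names for the English and Marathi sides of the corpus.
with open("train.en", encoding="utf-8") as src, open("train.mr", encoding="utf-8") as tgt, \
     open("subset.en", "w", encoding="utf-8") as src_out, \
     open("subset.mr", "w", encoding="utf-8") as tgt_out:
    for en_line, mr_line in islice(zip(src, tgt), N):
        src_out.write(en_line)
        tgt_out.write(mr_line)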
3.2. Preprocessing
For initial preprocessing, we used the scripts available along with IndicTrans [21]. To handle the
problems of OOV (out-of-vocabulary) and rare words, subword segmentation techniques were
applied [29]. Even if a word is unknown, the model can still translate it accurately by treating it
as a sequence of subword units. Multiple techniques can be used for this, such as Byte Pair
Encoding, the Unigram Language Model, Subword Sampling, and BPE-dropout. In our experiments,
we use Byte Pair Encoding.
Byte Pair Encoding (BPE) is a word segmentation technique adapted from a data compression
algorithm: words are first split into sequences of characters, and the most frequent pair of adjacent
symbols is then iteratively merged into a single new symbol. BPE can represent an open vocabulary
through a fixed-size vocabulary of variable-length character sequences, which makes it a very
useful word segmentation method for neural network models.
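
To make the merge procedure concrete, the following is a minimal sketch of BPE vocabulary learning in plain Python, in the spirit of [29]; the toy word frequencies and the number of merges are illustrative, not taken from our setup.

from collections import Counter

def learn_bpe(word_counts, num_merges):
    """Learn BPE merge operations from a word-frequency dictionary."""
    # Represent each word as a tuple of symbols, with an end-of-word marker.
    vocab = {tuple(w) + ("</w>",): f for w, f in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        # Merge every occurrence of the best pair into a single new symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges

# Toy corpus (word -> frequency); 10 merges for demonstration.
print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 10))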
3.3. Training and Evaluation
We use fairseq [30] for this purpose. Before feeding data into the model, we binarized it for faster
loading. The model was trained on the NVIDIA TITAN X GPU. To mimic training on a larger
number of GPUs, we used the update-freq parameter. Adam [31] was used as the optimizer, with its
betas set to 0.9 and 0.98. The loss function was cross-entropy. For regularization, label
smoothing [32] of 0.1 and a dropout [33] of 0.3 were applied. A weight decay of 10⁻⁴ and an
inverse_sqrt learning rate scheduler were used. For evaluating translation accuracy with
BLEU [34], the sacrebleu library was used.
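
As an illustration, the following is a minimal sketch of a fairseq-train invocation consistent with the training hyperparameters stated above, launched from Python via subprocess. The data and checkpoint paths, learning rate, warmup schedule, and batch size are assumptions, as the paper does not report them.

import subprocess

# Hypothetical paths: binarized data directory and checkpoint directory.
DATA_BIN = "data-bin/en-mr"
SAVE_DIR = "checkpoints/transformer_iwslt_de_en"

subprocess.run([
    "fairseq-train", DATA_BIN,
    "--arch", "transformer_iwslt_de_en",          # one of the architectures from Section 4
    "--optimizer", "adam", "--adam-betas", "(0.9, 0.98)",
    "--criterion", "label_smoothed_cross_entropy", "--label-smoothing", "0.1",
    "--dropout", "0.3",
    "--weight-decay", "0.0001",
    "--lr-scheduler", "inverse_sqrt",
    "--lr", "5e-4", "--warmup-updates", "4000",   # assumed values, not reported in the paper
    "--max-tokens", "4096",                       # assumed batch size in tokens
    "--update-freq", "4",                         # simulates training on more GPUs
    "--save-dir", SAVE_DIR,
], check=True)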
For evaluation, we used the BLEU metric as made available in the sacrebleu library; for a fair
comparison, all models were evaluated at their 25th iteration. Since evaluation was performed on
preprocessed text, the byte pair encoding first had to be removed. A beam search of size 5 was
used for generating the translations.
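
A corresponding generation step with a beam size of 5 and BPE segmentation removed before scoring could look like the sketch below; the checkpoint path and data split are again assumptions.

import subprocess

# Hypothetical checkpoint and data paths, for illustration only.
DATA_BIN = "data-bin/en-mr"
CHECKPOINT = "checkpoints/transformer_iwslt_de_en/checkpoint_best.pt"

subprocess.run([
    "fairseq-generate", DATA_BIN,
    "--path", CHECKPOINT,
    "--beam", "5",          # beam search of size 5, as described above
    "--remove-bpe",         # strip BPE segmentation before scoring
    "--gen-subset", "test",
], check=True)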
For evaluating a translation, aspects such as fluency, adequacy, Human-mediated Translation Edit
Rate, and fidelity need to be considered. Human evaluation, the benchmark method, is time-
consuming and expensive. The quick, inexpensive, and language-independent BLEU [34] metric was
therefore proposed. It is a score that ranges from 0 to 1, where a score of 1 means that the
candidate machine translation is identical to one of the reference translations [35].
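
For illustration, a corpus-level BLEU score can be computed with the sacrebleu library as sketched below; the hypothesis and reference sentences are invented examples, not drawn from our test set.

import sacrebleu

# Invented system outputs and reference translations, for illustration only.
hypotheses = ["He is a businessman .", "It is very hot today ."]
references = [["He is a businessman .", "The weather is very hot today ."]]

# corpus_bleu takes the list of hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # sacrebleu reports BLEU scaled to the 0-100 range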
4. ARCHITECTURES USED
We trained models on our dataset using a total of 5 different architectures, described below.
4.1. fconv_iwslt_de_en
This architecture is based entirely on convolutional networks. Compared with recurrent models,
computations over all elements can be fully parallelized during training to better exploit the GPU
hardware, and optimization is easier since the number of non-linearities is fixed and independent
of the input length [11, 12].
4.2. lstm
It is a vanilla RNN architecture that uses LSTMs as its encoder and decoder. LSTMs were
designed to overcome the long-term dependency problem faced by RNNs and to mitigate the
short-term memory problem using mechanisms called gates. In our experiments, the encoder
embedding dimension of the LSTM is 256 and the decoder embedding dimension is 512 [6, 7].
4.3. lstm_wiseman_iwslt_de_en
It is an LSTM with smaller embedding dimensions. Also known as a tiny LSTM, it has an encoder
embedding dimension of 256 and a decoder embedding dimension of 256.
4.4. transformer_iwslt_de_en
The Transformer was the first model to convert input sequences to output sequences by relying
entirely on self-attention to compute representations of its input and output, without using
sequence-aligned RNNs or convolutions. This variant has 6 encoder and 6 decoder layers with an
embedding dimension of 512, and uses self-attention with 8 attention heads in both the encoder
and the decoder [13].
4.5. transformer_wmt_en_de_big_t2t
This is also a transformer model, like the one described in Section 4.4, but larger: it has 16
attention heads in both the encoder and the decoder, and the encoder and decoder embedding
dimensions are 1024. It also utilizes attention dropout and activation dropout.
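
All five models correspond to named fairseq architectures and are selected through the --arch flag of the training command sketched in Section 3.3. A simple loop over the architecture names, with hypothetical paths and the same (partly assumed) hyperparameters, would reproduce the comparison:

import subprocess

ARCHITECTURES = [
    "fconv_iwslt_de_en",
    "lstm",
    "lstm_wiseman_iwslt_de_en",
    "transformer_iwslt_de_en",
    "transformer_wmt_en_de_big_t2t",
]

# Shared flags from the Section 3.3 sketch (values beyond those stated in the paper are assumptions).
COMMON_FLAGS = [
    "--optimizer", "adam", "--adam-betas", "(0.9, 0.98)",
    "--criterion", "label_smoothed_cross_entropy", "--label-smoothing", "0.1",
    "--dropout", "0.3", "--weight-decay", "0.0001",
    "--lr-scheduler", "inverse_sqrt", "--lr", "5e-4", "--warmup-updates", "4000",
    "--max-tokens", "4096", "--update-freq", "4",
]

for arch in ARCHITECTURES:
    # Each run writes its checkpoints to its own (hypothetical) directory.
    subprocess.run(
        ["fairseq-train", "data-bin/en-mr", "--arch", arch,
         "--save-dir", f"checkpoints/{arch}"] + COMMON_FLAGS,
        check=True,
    )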
5. RESULTS
Table 2. Best BLEU scores & Training time in hours
Architecture Best BLEU Time (in hours)
transformer_iwslt_de_en 22.98 11
transformer_wmt_en_de_big_t2t 21.17 16.5
lstm 17.24 6.25
fconv_iwslt_de_en 15.41 5.5
lstm_wiseman_iwslt_de_en 14.03 4
Surprisingly, transformer_iwslt_de_en outperformed transformer_wmt_en_de_big_t2t, even though
transformer_iwslt_de_en has fewer encoders, decoders, and attention heads than
transformer_wmt_en_de_big_t2t. Being the larger model, transformer_wmt_en_de_big_t2t also took
significantly more time to train. This shows that transformer_iwslt_de_en is the better model to
train in the given circumstances for machine translation, as it achieves a higher BLEU score and
takes less time to train. Another noticeable result is the performance of the LSTM. Although its
BLEU score of 17.24 is lower than that of the transformer models, the training time required is
significantly lower. Under time constraints, the LSTM model can be used for machine translation
tasks, as it gives a reasonably high BLEU score in much less time.
Figure 1. Best BLEU scores on the validation set against training steps

Figure 1 shows the best BLEU score per architecture over time. It can be observed that the
optimization trend is similar for all architectures, with the rate of improvement beginning to
decrease at around 8000 steps. We used Weights & Biases [36] for experiment tracking and
visualizations to develop insights for this paper.
The following are a few example sentences and their translations, as inferred by the
transformer_iwslt_de_en model:
Table 3. Translation Results
Marathi (source language) | English (target language)
तो एक व्यापारी आहे. | He is a businessman.
आज खूप गरम आहे. | It is very hot today.
नदी ओसंडून वाहत आहे. | The river is overflowing.
शरद पवार हे देशातील सर्वात प्रभावी राजकारण्यांपैकी एक आहेत. | Sharad Pawar is one of the most influential leaders of the country.
6. CONCLUSION
The study of different architectures for machine translation helps us to understand which model
can perform best with low computational resources. In this work, we looked into a multitude of
neural network architectures, such as LSTMs, Conv2Seq networks, and Transformers. For our
task, Transformers gave the best performance with respect to BLEU compared with the remaining
architectures, but this comes at a large computational cost: Transformers required nearly 2-3x the
time per epoch of the other networks.

It was observed that even though transformer_wmt_en_de_big_t2t is larger than
transformer_iwslt_de_en, it was not able to give better accuracy. LSTMs showed good performance
in less time compared with Transformers. Depending on the scenario, such as the availability of
computational resources and the time available for training, different architectures can be chosen.
Even with low computational resources and a comparatively small corpus, good results can be
achieved using the Transformer architecture.
7. FUTURE WORK
For this work, we trained the models on a subset of the dataset because of computational
limitations. In the future, we plan to train the models on the whole dataset instead of a subset.
We plan to train these models on a setup with high computational power to investigate how the
BLEU scores and training times change compared with low computational resources. We also plan
to train the models in parallel across multiple GPUs to check whether this causes any change in
the BLEU score.
In addition, we will look into replicating these experiments with different languages (for
example English, Hindi, Mandarin, etc.), while keeping the low-resource setting, to see whether
the observed results remain the same. In this experiment, we used Byte Pair Encoding during
preprocessing to treat each word as a sequence of subword units. We plan to try other algorithms,
such as the Unigram Language Model, Subword Sampling, and BPE-dropout, and see whether the
outcomes observed in this experiment persist.
ACKNOWLEDGEMENTS
We would like to thank the SCTR’s PICT Data Science Lab for letting us use the resources for an
extended time. We would also like to thank Prof. Mukta Takalikar and Dr. Prahlad Kulkarni for
their continuous help and motivation.
Also, we thank the anonymous reviewers for their insightful comments.
REFERENCES
[1] Koehn, P. (2009). Statistical machine translation. Cambridge University Press.
[2] Kalchbrenner, N., & Blunsom, P. (2013, October). Recurrent continuous translation models. In
Proceedings of the 2013 conference on empirical methods in natural language processing (pp. 1700-
1709).
[3] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks.
In Advances in neural information processing systems (pp. 3104-3112).
[4] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y.
(2014). Learning phrase representations using RNN encoder-decoder for statistical machine
translation. arXiv preprint arXiv:1406.1078.
[5] Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align
and translate. arXiv preprint arXiv:1409.0473.
[6] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-
1780.
[7] Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen. Diploma, Technische
Universität München, 91(1).
[8] Zhou, J., Cao, Y., Wang, X., Li, P., & Xu, W. (2016). Deep recurrent models with fast-forward
connections for neural machine translation. Transactions of the Association for Computational
Linguistics, 4, 371-383.
[9] Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., ... & Dean, J. (2016).
Google's neural machine translation system: Bridging the gap between human and machine
translation. arXiv preprint arXiv:1609.08144.
[10] Crego, J., Kim, J., Klein, G., Rebollo, A., Yang, K., Senellart, J., ... & Zoldan, P. (2016). Systran's
pure neural machine translation systems. arXiv preprint arXiv:1610.05540.
[11] Gehring, J., Auli, M., Grangier, D., & Dauphin, Y. N. (2016). A convolutional encoder model for
neural machine translation. arXiv preprint arXiv:1611.02344.
[12] Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017). Convolutional sequence to
sequence learning. CoRR abs/1705.03122.
[13] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.
(2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-
6008).
[14] Firat, O., Cho, K., & Bengio, Y. (2016). Multi-way, multilingual neural machine translation with a
shared attention mechanism. arXiv preprint arXiv:1601.01073.
[15] Freitag, M., & Firat, O. (2020). Complete multilingual neural machine translation. arXiv preprint
arXiv:2010.10239.
[16] Philip, J., Siripragada, S., Namboodiri, V. P., & Jawahar, C. V. (2021). Revisiting Low Resource
Status of Indian Languages in Machine Translation. In 8th ACM IKDD CODS and 26th COMAD
(pp. 178-187).
[17] Edunov, S., Ott, M., Auli, M., & Grangier, D. (2018). Understanding back-translation at scale. arXiv
preprint arXiv:1808.09381.
[18] Koehn, P., & Knowles, R. (2017). Six challenges for neural machine translation. arXiv preprint
arXiv:1706.03872.
[19] Chen, M. X., Firat, O., Bapna, A., Johnson, M., Macherey, W., Foster, G., ... & Hughes, M. (2018).
The best of both worlds: Combining recent advances in neural machine translation. arXiv preprint
arXiv:1804.09849.
[20] Jadhav, S. A. (2020). Marathi To English Neural Machine Translation with Near Perfect Corpus and
Transformers. arXiv preprint arXiv:2002.11643.
[21] Ramesh, G., Doddapaneni, S., Bheemaraj, A., Jobanputra, M., AK, R., Sharma, A., ... & Khapra, M.
S. (2021). Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic
Languages. arXiv preprint arXiv:2104.05596.
[22] Agić, Ž., & Vulić, I. (2019). JW300: A wide-coverage parallel corpus for low-resource languages. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.
[23] Philip, J., Siripragada, S., Namboodiri, V. P., & Jawahar, C. V. (2021). Revisiting Low Resource
Status of Indian Languages in Machine Translation. In 8th ACM IKDD CODS and 26th COMAD
(pp. 178-187).
[24] Schwenk, H., Chaudhary, V., Sun, S., Gong, H., & Guzmán, F. (2019). Wikimatrix: Mining 135m
parallel sentences in 1620 language pairs from wikipedia. arXiv preprint arXiv:1907.05791.
[25] Haddow, B., & Kirefu, F. (2020). PMIndia--A Collection of Parallel Corpora of Languages of India.
arXiv preprint arXiv:2001.09907.
[26] Christodouloupoulos, C., & Steedman, M. (2015). A massively parallel corpus: the bible in 100
languages. Language resources and evaluation, 49(2), 375-395.
[27] Lison, P., & Tiedemann, J. (2016). Opensubtitles2016: Extracting large parallel corpora from movie
and tv subtitles.
[28] Anastasopoulos, A., Cattelan, A., Dou, Z. Y., Federico, M., Federman, C., Genzel, D., ... & Tur, S.
(2020). Tico-19: the translation initiative for covid-19. arXiv preprint arXiv:2007.01788.
[29] Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with
subword units. arXiv preprint arXiv:1508.07909.
[30] Ng, N., Yee, K., Baevski, A., Ott, M., Auli, M., & Edunov, S. (2019). Facebook FAIR's WMT19
News Translation Task Submission. arXiv preprint arXiv:1907.06616.
[31] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
[32] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception
architecture for computer vision. In Proceedings of the IEEE conference on computer vision and
pattern recognition (pp. 2818-2826).
[33] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a
simple way to prevent neural networks from overfitting. The journal of machine learning research,
15(1), 1929-1958.
[34] Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002, July). Bleu: a method for automatic
evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for
Computational Linguistics (pp. 311-318).
[35] Mandke, A. (2021). Understanding BLEU. ekdnam.substack.com.
[36] Biewald, L. (2020). Experiment tracking with Weights and Biases. Software available from wandb.com.
Ad

More Related Content

What's hot (15)

An Efficient Search Engine for Searching Desired File
An Efficient Search Engine for Searching Desired FileAn Efficient Search Engine for Searching Desired File
An Efficient Search Engine for Searching Desired File
IDES Editor
 
Pretext Knowledge Grids on Unstructured Data for Facilitating Online Education
Pretext Knowledge Grids on Unstructured Data for Facilitating  Online EducationPretext Knowledge Grids on Unstructured Data for Facilitating  Online Education
Pretext Knowledge Grids on Unstructured Data for Facilitating Online Education
IOSR Journals
 
A probabilistic approach to string transformation
A probabilistic approach to string transformationA probabilistic approach to string transformation
A probabilistic approach to string transformation
IEEEFINALYEARPROJECTS
 
Using translation memory_to_speed_up_tra
Using translation memory_to_speed_up_traUsing translation memory_to_speed_up_tra
Using translation memory_to_speed_up_tra
CamillaTonanzi
 
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
cseij
 
IRJET- Text Extraction from Text Based Image using Android
IRJET- Text Extraction from Text Based Image using AndroidIRJET- Text Extraction from Text Based Image using Android
IRJET- Text Extraction from Text Based Image using Android
IRJET Journal
 
A Mathematical Approach to Ontology Authoring and Documentation
A Mathematical Approach to Ontology Authoring and DocumentationA Mathematical Approach to Ontology Authoring and Documentation
A Mathematical Approach to Ontology Authoring and Documentation
Christoph Lange
 
Implementation of Urdu Probabilistic Parser
Implementation of Urdu Probabilistic ParserImplementation of Urdu Probabilistic Parser
Implementation of Urdu Probabilistic Parser
Waqas Tariq
 
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEMA LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
cscpconf
 
A language independent approach to develop urduir system
A language independent approach to develop urduir systemA language independent approach to develop urduir system
A language independent approach to develop urduir system
csandit
 
IEEE 2014 JAVA DATA MINING PROJECTS A probabilistic approach to string transf...
IEEE 2014 JAVA DATA MINING PROJECTS A probabilistic approach to string transf...IEEE 2014 JAVA DATA MINING PROJECTS A probabilistic approach to string transf...
IEEE 2014 JAVA DATA MINING PROJECTS A probabilistic approach to string transf...
IEEEFINALYEARSTUDENTPROJECTS
 
Effective Data Retrieval System with Bloom in a Unstructured p2p Network
Effective Data Retrieval System with Bloom in a Unstructured p2p NetworkEffective Data Retrieval System with Bloom in a Unstructured p2p Network
Effective Data Retrieval System with Bloom in a Unstructured p2p Network
Uvaraj Shan
 
IRJET - A Review on Chatbot Design and Implementation Techniques
IRJET -  	  A Review on Chatbot Design and Implementation TechniquesIRJET -  	  A Review on Chatbot Design and Implementation Techniques
IRJET - A Review on Chatbot Design and Implementation Techniques
IRJET Journal
 
Text mining open source tokenization
Text mining open source tokenizationText mining open source tokenization
Text mining open source tokenization
aciijournal
 
Full-Text Retrieval in Unstructured P2P Networks using Bloom Cast Efficiently
Full-Text Retrieval in Unstructured P2P Networks using Bloom Cast EfficientlyFull-Text Retrieval in Unstructured P2P Networks using Bloom Cast Efficiently
Full-Text Retrieval in Unstructured P2P Networks using Bloom Cast Efficiently
ijsrd.com
 
An Efficient Search Engine for Searching Desired File
An Efficient Search Engine for Searching Desired FileAn Efficient Search Engine for Searching Desired File
An Efficient Search Engine for Searching Desired File
IDES Editor
 
Pretext Knowledge Grids on Unstructured Data for Facilitating Online Education
Pretext Knowledge Grids on Unstructured Data for Facilitating  Online EducationPretext Knowledge Grids on Unstructured Data for Facilitating  Online Education
Pretext Knowledge Grids on Unstructured Data for Facilitating Online Education
IOSR Journals
 
A probabilistic approach to string transformation
A probabilistic approach to string transformationA probabilistic approach to string transformation
A probabilistic approach to string transformation
IEEEFINALYEARPROJECTS
 
Using translation memory_to_speed_up_tra
Using translation memory_to_speed_up_traUsing translation memory_to_speed_up_tra
Using translation memory_to_speed_up_tra
CamillaTonanzi
 
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
USING TF-ISF WITH LOCAL CONTEXT TO GENERATE AN OWL DOCUMENT REPRESENTATION FO...
cseij
 
IRJET- Text Extraction from Text Based Image using Android
IRJET- Text Extraction from Text Based Image using AndroidIRJET- Text Extraction from Text Based Image using Android
IRJET- Text Extraction from Text Based Image using Android
IRJET Journal
 
A Mathematical Approach to Ontology Authoring and Documentation
A Mathematical Approach to Ontology Authoring and DocumentationA Mathematical Approach to Ontology Authoring and Documentation
A Mathematical Approach to Ontology Authoring and Documentation
Christoph Lange
 
Implementation of Urdu Probabilistic Parser
Implementation of Urdu Probabilistic ParserImplementation of Urdu Probabilistic Parser
Implementation of Urdu Probabilistic Parser
Waqas Tariq
 
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEMA LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
cscpconf
 
A language independent approach to develop urduir system
A language independent approach to develop urduir systemA language independent approach to develop urduir system
A language independent approach to develop urduir system
csandit
 
IEEE 2014 JAVA DATA MINING PROJECTS A probabilistic approach to string transf...
IEEE 2014 JAVA DATA MINING PROJECTS A probabilistic approach to string transf...IEEE 2014 JAVA DATA MINING PROJECTS A probabilistic approach to string transf...
IEEE 2014 JAVA DATA MINING PROJECTS A probabilistic approach to string transf...
IEEEFINALYEARSTUDENTPROJECTS
 
Effective Data Retrieval System with Bloom in a Unstructured p2p Network
Effective Data Retrieval System with Bloom in a Unstructured p2p NetworkEffective Data Retrieval System with Bloom in a Unstructured p2p Network
Effective Data Retrieval System with Bloom in a Unstructured p2p Network
Uvaraj Shan
 
IRJET - A Review on Chatbot Design and Implementation Techniques
IRJET -  	  A Review on Chatbot Design and Implementation TechniquesIRJET -  	  A Review on Chatbot Design and Implementation Techniques
IRJET - A Review on Chatbot Design and Implementation Techniques
IRJET Journal
 
Text mining open source tokenization
Text mining open source tokenizationText mining open source tokenization
Text mining open source tokenization
aciijournal
 
Full-Text Retrieval in Unstructured P2P Networks using Bloom Cast Efficiently
Full-Text Retrieval in Unstructured P2P Networks using Bloom Cast EfficientlyFull-Text Retrieval in Unstructured P2P Networks using Bloom Cast Efficiently
Full-Text Retrieval in Unstructured P2P Networks using Bloom Cast Efficiently
ijsrd.com
 

Similar to ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONAL RESOURCES (20)

Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
IRJET Journal
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
Natasha Latysheva
 
Machine_translation_for_low_resource_Indian_Languages_thesis_report
Machine_translation_for_low_resource_Indian_Languages_thesis_reportMachine_translation_for_low_resource_Indian_Languages_thesis_report
Machine_translation_for_low_resource_Indian_Languages_thesis_report
Trushita Redij
 
Building streaming pipelines for neural machine translation
Building streaming pipelines for neural machine translationBuilding streaming pipelines for neural machine translation
Building streaming pipelines for neural machine translation
Suneel Marthi
 
Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - ...
Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - ...Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - ...
Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - ...
Codemotion
 
Language Translator.pptx
Language Translator.pptxLanguage Translator.pptx
Language Translator.pptx
MRABC9
 
Speech To Speech Translation
Speech To Speech TranslationSpeech To Speech Translation
Speech To Speech Translation
IRJET Journal
 
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISHA NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
IRJET Journal
 
IRJET- Visual Information Narrator using Neural Network
IRJET- Visual Information Narrator using Neural NetworkIRJET- Visual Information Narrator using Neural Network
IRJET- Visual Information Narrator using Neural Network
IRJET Journal
 
IRJET - Speech to Speech Translation using Encoder Decoder Architecture
IRJET -  	  Speech to Speech Translation using Encoder Decoder ArchitectureIRJET -  	  Speech to Speech Translation using Encoder Decoder Architecture
IRJET - Speech to Speech Translation using Encoder Decoder Architecture
IRJET Journal
 
IRJET- Applications of Artificial Intelligence in Neural Machine Translation
IRJET- Applications of Artificial Intelligence in Neural Machine TranslationIRJET- Applications of Artificial Intelligence in Neural Machine Translation
IRJET- Applications of Artificial Intelligence in Neural Machine Translation
IRJET Journal
 
Machine Translation Introduction
Machine Translation IntroductionMachine Translation Introduction
Machine Translation Introduction
nlab_utokyo
 
Breaking the language barrier: how do we quickly add multilanguage support in...
Breaking the language barrier: how do we quickly add multilanguage support in...Breaking the language barrier: how do we quickly add multilanguage support in...
Breaking the language barrier: how do we quickly add multilanguage support in...
Jaya Mathew
 
From recurrent neural network techniques to pre-trained models: emphasis on t...
From recurrent neural network techniques to pre-trained models: emphasis on t...From recurrent neural network techniques to pre-trained models: emphasis on t...
From recurrent neural network techniques to pre-trained models: emphasis on t...
IAESIJAI
 
A Review On Chatbot Design And Implementation Techniques
A Review On Chatbot Design And Implementation TechniquesA Review On Chatbot Design And Implementation Techniques
A Review On Chatbot Design And Implementation Techniques
Joe Osborn
 
Multi-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer CalixtoMulti-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer Calixto
Sebastian Ruder
 
NEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ART
NEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ARTNEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ART
NEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ART
kevig
 
NEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ART
NEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ARTNEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ART
NEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ART
kevig
 
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...
eraser Juan José Calderón
 
Attentive_fine-tuning_of_Transformers_for_Translat.pdf
Attentive_fine-tuning_of_Transformers_for_Translat.pdfAttentive_fine-tuning_of_Transformers_for_Translat.pdf
Attentive_fine-tuning_of_Transformers_for_Translat.pdf
SatishBhalshankar
 
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
IRJET Journal
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
Natasha Latysheva
 
Machine_translation_for_low_resource_Indian_Languages_thesis_report
Machine_translation_for_low_resource_Indian_Languages_thesis_reportMachine_translation_for_low_resource_Indian_Languages_thesis_report
Machine_translation_for_low_resource_Indian_Languages_thesis_report
Trushita Redij
 
Building streaming pipelines for neural machine translation
Building streaming pipelines for neural machine translationBuilding streaming pipelines for neural machine translation
Building streaming pipelines for neural machine translation
Suneel Marthi
 
Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - ...
Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - ...Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - ...
Deep Learning for Machine Translation: a paradigm shift - Alberto Massidda - ...
Codemotion
 
Language Translator.pptx
Language Translator.pptxLanguage Translator.pptx
Language Translator.pptx
MRABC9
 
Speech To Speech Translation
Speech To Speech TranslationSpeech To Speech Translation
Speech To Speech Translation
IRJET Journal
 
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISHA NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
IRJET Journal
 
IRJET- Visual Information Narrator using Neural Network
IRJET- Visual Information Narrator using Neural NetworkIRJET- Visual Information Narrator using Neural Network
IRJET- Visual Information Narrator using Neural Network
IRJET Journal
 
IRJET - Speech to Speech Translation using Encoder Decoder Architecture
IRJET -  	  Speech to Speech Translation using Encoder Decoder ArchitectureIRJET -  	  Speech to Speech Translation using Encoder Decoder Architecture
IRJET - Speech to Speech Translation using Encoder Decoder Architecture
IRJET Journal
 
IRJET- Applications of Artificial Intelligence in Neural Machine Translation
IRJET- Applications of Artificial Intelligence in Neural Machine TranslationIRJET- Applications of Artificial Intelligence in Neural Machine Translation
IRJET- Applications of Artificial Intelligence in Neural Machine Translation
IRJET Journal
 
Machine Translation Introduction
Machine Translation IntroductionMachine Translation Introduction
Machine Translation Introduction
nlab_utokyo
 
Breaking the language barrier: how do we quickly add multilanguage support in...
Breaking the language barrier: how do we quickly add multilanguage support in...Breaking the language barrier: how do we quickly add multilanguage support in...
Breaking the language barrier: how do we quickly add multilanguage support in...
Jaya Mathew
 
From recurrent neural network techniques to pre-trained models: emphasis on t...
From recurrent neural network techniques to pre-trained models: emphasis on t...From recurrent neural network techniques to pre-trained models: emphasis on t...
From recurrent neural network techniques to pre-trained models: emphasis on t...
IAESIJAI
 
A Review On Chatbot Design And Implementation Techniques
A Review On Chatbot Design And Implementation TechniquesA Review On Chatbot Design And Implementation Techniques
A Review On Chatbot Design And Implementation Techniques
Joe Osborn
 
Multi-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer CalixtoMulti-modal Neural Machine Translation - Iacer Calixto
Multi-modal Neural Machine Translation - Iacer Calixto
Sebastian Ruder
 
NEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ART
NEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ARTNEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ART
NEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ART
kevig
 
NEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ART
NEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ARTNEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ART
NEURAL AND STATISTICAL MACHINE TRANSLATION: CONFRONTING THE STATE OF THE ART
kevig
 
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot T...
eraser Juan José Calderón
 
Attentive_fine-tuning_of_Transformers_for_Translat.pdf
Attentive_fine-tuning_of_Transformers_for_Translat.pdfAttentive_fine-tuning_of_Transformers_for_Translat.pdf
Attentive_fine-tuning_of_Transformers_for_Translat.pdf
SatishBhalshankar
 
Ad

Recently uploaded (20)

698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
speedcomcyber25
 
Electrical and Electronics Engineering: An International Journal (ELELIJ)
Electrical and Electronics Engineering: An International Journal (ELELIJ)Electrical and Electronics Engineering: An International Journal (ELELIJ)
Electrical and Electronics Engineering: An International Journal (ELELIJ)
elelijjournal653
 
May 2025 - Top 10 Read Articles in Network Security and Its Applications
May 2025 - Top 10 Read Articles in Network Security and Its ApplicationsMay 2025 - Top 10 Read Articles in Network Security and Its Applications
May 2025 - Top 10 Read Articles in Network Security and Its Applications
IJNSA Journal
 
EHSS Orientation 2023 - Copy.Orientation
EHSS Orientation 2023 - Copy.OrientationEHSS Orientation 2023 - Copy.Orientation
EHSS Orientation 2023 - Copy.Orientation
GulfamShahzad11
 
Unleashing the Power of Salesforce Flows &amp_ Slack Integration!.pptx
Unleashing the Power of Salesforce Flows &amp_ Slack Integration!.pptxUnleashing the Power of Salesforce Flows &amp_ Slack Integration!.pptx
Unleashing the Power of Salesforce Flows &amp_ Slack Integration!.pptx
SanjeetMishra29
 
4 Renewable-Energy-Chemistry-ppt-PP.pptx
4 Renewable-Energy-Chemistry-ppt-PP.pptx4 Renewable-Energy-Chemistry-ppt-PP.pptx
4 Renewable-Energy-Chemistry-ppt-PP.pptx
maairapayongayong
 
Relationship of inheritance in oopm.pptx
Relationship of inheritance in oopm.pptxRelationship of inheritance in oopm.pptx
Relationship of inheritance in oopm.pptx
ayush626953
 
ldr darkness sensor circuit.pptx for engineers
ldr darkness sensor circuit.pptx for engineersldr darkness sensor circuit.pptx for engineers
ldr darkness sensor circuit.pptx for engineers
PravalikaChidurala
 
Full document for AI powered resume Analyzer
Full document for AI powered resume AnalyzerFull document for AI powered resume Analyzer
Full document for AI powered resume Analyzer
4213SWARNABCSE
 
IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...
IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...
IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...
ssuserd9338b
 
Environment .................................
Environment .................................Environment .................................
Environment .................................
shadyozq9
 
digital computing plotform synopsis.pptx
digital computing plotform synopsis.pptxdigital computing plotform synopsis.pptx
digital computing plotform synopsis.pptx
ssuser2b4c6e1
 
Ceramic Multichannel Membrane Structure with Tunable Properties by Sol-Gel Me...
Ceramic Multichannel Membrane Structure with Tunable Properties by Sol-Gel Me...Ceramic Multichannel Membrane Structure with Tunable Properties by Sol-Gel Me...
Ceramic Multichannel Membrane Structure with Tunable Properties by Sol-Gel Me...
DanyalNaseer3
 
Observability and Instrumentation via OpenTelemetry.pptx
Observability and Instrumentation via OpenTelemetry.pptxObservability and Instrumentation via OpenTelemetry.pptx
Observability and Instrumentation via OpenTelemetry.pptx
grahnkarljohan
 
DeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFT
DeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFTDeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFT
DeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFT
Kyohei Ito
 
An Explicit Formulation for Estimation of Structural Number (SN) of Flexible ...
An Explicit Formulation for Estimation of Structural Number (SN) of Flexible ...An Explicit Formulation for Estimation of Structural Number (SN) of Flexible ...
An Explicit Formulation for Estimation of Structural Number (SN) of Flexible ...
Journal of Soft Computing in Civil Engineering
 
ESP32 Air Mouse using Bluetooth and MPU6050
ESP32 Air Mouse using Bluetooth and MPU6050ESP32 Air Mouse using Bluetooth and MPU6050
ESP32 Air Mouse using Bluetooth and MPU6050
CircuitDigest
 
Introduction to Additive Manufacturing(3D printing)
Introduction to Additive Manufacturing(3D printing)Introduction to Additive Manufacturing(3D printing)
Introduction to Additive Manufacturing(3D printing)
vijimech408
 
PYTHON--QUIZ-1_20250422_002514_0000.pptx
PYTHON--QUIZ-1_20250422_002514_0000.pptxPYTHON--QUIZ-1_20250422_002514_0000.pptx
PYTHON--QUIZ-1_20250422_002514_0000.pptx
rmvigram
 
VISHAL KUMAR SINGH Latest Resume with updated details
VISHAL KUMAR SINGH Latest Resume with updated detailsVISHAL KUMAR SINGH Latest Resume with updated details
VISHAL KUMAR SINGH Latest Resume with updated details
Vishal Kumar Singh
 
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
speedcomcyber25
 
Electrical and Electronics Engineering: An International Journal (ELELIJ)
Electrical and Electronics Engineering: An International Journal (ELELIJ)Electrical and Electronics Engineering: An International Journal (ELELIJ)
Electrical and Electronics Engineering: An International Journal (ELELIJ)
elelijjournal653
 
May 2025 - Top 10 Read Articles in Network Security and Its Applications
May 2025 - Top 10 Read Articles in Network Security and Its ApplicationsMay 2025 - Top 10 Read Articles in Network Security and Its Applications
May 2025 - Top 10 Read Articles in Network Security and Its Applications
IJNSA Journal
 
EHSS Orientation 2023 - Copy.Orientation
EHSS Orientation 2023 - Copy.OrientationEHSS Orientation 2023 - Copy.Orientation
EHSS Orientation 2023 - Copy.Orientation
GulfamShahzad11
 
Unleashing the Power of Salesforce Flows &amp_ Slack Integration!.pptx
Unleashing the Power of Salesforce Flows &amp_ Slack Integration!.pptxUnleashing the Power of Salesforce Flows &amp_ Slack Integration!.pptx
Unleashing the Power of Salesforce Flows &amp_ Slack Integration!.pptx
SanjeetMishra29
 
4 Renewable-Energy-Chemistry-ppt-PP.pptx
4 Renewable-Energy-Chemistry-ppt-PP.pptx4 Renewable-Energy-Chemistry-ppt-PP.pptx
4 Renewable-Energy-Chemistry-ppt-PP.pptx
maairapayongayong
 
Relationship of inheritance in oopm.pptx
Relationship of inheritance in oopm.pptxRelationship of inheritance in oopm.pptx
Relationship of inheritance in oopm.pptx
ayush626953
 
ldr darkness sensor circuit.pptx for engineers
ldr darkness sensor circuit.pptx for engineersldr darkness sensor circuit.pptx for engineers
ldr darkness sensor circuit.pptx for engineers
PravalikaChidurala
 
Full document for AI powered resume Analyzer
Full document for AI powered resume AnalyzerFull document for AI powered resume Analyzer
Full document for AI powered resume Analyzer
4213SWARNABCSE
 
IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...
IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...
IPC-7711D-7721D_ EN 2023 TOC Rework, Modification and Repair of Electronic As...
ssuserd9338b
 
Environment .................................
Environment .................................Environment .................................
Environment .................................
shadyozq9
 
digital computing plotform synopsis.pptx
digital computing plotform synopsis.pptxdigital computing plotform synopsis.pptx
digital computing plotform synopsis.pptx
ssuser2b4c6e1
 
Ceramic Multichannel Membrane Structure with Tunable Properties by Sol-Gel Me...
Ceramic Multichannel Membrane Structure with Tunable Properties by Sol-Gel Me...Ceramic Multichannel Membrane Structure with Tunable Properties by Sol-Gel Me...
Ceramic Multichannel Membrane Structure with Tunable Properties by Sol-Gel Me...
DanyalNaseer3
 
Observability and Instrumentation via OpenTelemetry.pptx
Observability and Instrumentation via OpenTelemetry.pptxObservability and Instrumentation via OpenTelemetry.pptx
Observability and Instrumentation via OpenTelemetry.pptx
grahnkarljohan
 
DeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFT
DeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFTDeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFT
DeFAIMint | 🤖Mint to DeFAI. Vibe Trading as NFT
Kyohei Ito
 
ESP32 Air Mouse using Bluetooth and MPU6050
ESP32 Air Mouse using Bluetooth and MPU6050ESP32 Air Mouse using Bluetooth and MPU6050
ESP32 Air Mouse using Bluetooth and MPU6050
CircuitDigest
 
Introduction to Additive Manufacturing(3D printing)
Introduction to Additive Manufacturing(3D printing)Introduction to Additive Manufacturing(3D printing)
Introduction to Additive Manufacturing(3D printing)
vijimech408
 
PYTHON--QUIZ-1_20250422_002514_0000.pptx
PYTHON--QUIZ-1_20250422_002514_0000.pptxPYTHON--QUIZ-1_20250422_002514_0000.pptx
PYTHON--QUIZ-1_20250422_002514_0000.pptx
rmvigram
 
VISHAL KUMAR SINGH Latest Resume with updated details
VISHAL KUMAR SINGH Latest Resume with updated detailsVISHAL KUMAR SINGH Latest Resume with updated details
VISHAL KUMAR SINGH Latest Resume with updated details
Vishal Kumar Singh
 
Ad

ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONAL RESOURCES

  • 1. International Journal on Natural Language Computing (IJNLC) Vol.10, No.5, October 2021 DOI: 10.5121/ijnlc.2021.10502 9 ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONAL RESOURCES Aditya Mandke, Onkar Litake, and Dipali Kadam Department of Computer Engineering, SCTR’s Pune Institute of Computer Technology, Pune, Maharashtra, India ABSTRACT With the recent developments in the field of Natural Language Processing, there has been a rise in the use of different architectures for Neural Machine Translation. Transformer architectures are used to achieve state-of-the-art accuracy, but they are very computationally expensive to train. Everyone cannot have such setups consisting of high-end GPUs and other resources. We train our models on low computational resources and investigate the results. As expected, transformers outperformed other architectures, but there were some surprising results. Transformers consisting of more encoders and decoders took more time to train but had fewer BLEU scores. LSTM performed well in the experiment and took comparatively less time to train than transformers, making it suitable to use in situations having time constraints. KEYWORDS Machine Translation, Indic Languages, Natural Language Processing 1. INTRODUCTION Machine Translation (MT) can be used to automatically convert a sentence in one (source) language into its equivalent translation in another language (target) using Natural Language Processing. Earlier, rule-based MT was the norm, with heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations [1]. Later, seq2seq networks [2,3,4] have dominated the field of Machine Translation by replacing statistical phrase-based methods. This led to a corpus-based MT system. The encoder-decoder paradigm has risen with Neural Machine Translation (NMT) [5]. For this purpose, both the encoder and decoder were implemented as Recurrent Neural Networks. This helps to encode a variable-length source sentence into a fixed-length vector and the same is decoded to generate the target sentence. For this, Long Short Term Memory (LSTM) units [6,7] were utilized to improve the translation quality of the target sentence. The RNN based NMT approach quickly became the norm and was utilized in industrial applications [8, 9, 10]. Later, convolutional encoders and decoders [11] were introduced, which allows encoding the source sentence simultaneously compared to recurrent networks for which computation is constrained by temporal dependencies. Conv2Seq architectures [12] utilized this to fully parallelize the training and optimization process over GPUs and TPUs, leading to faster training speed. Most recently, by utilizing self-attention and feed-forward connections, Transformers [13] further advanced the field of Translation by increasing the efficiency and reducing the speed of
  • 2. International Journal on Natural Language Computing (IJNLC) Vol.10, No.5, October 2021 10 convergence. One of the downsides of using a Transformers-based architecture is that it requires larger training data for convergence than an RNN-based model. seq2seq networks can also be used to train multi-way, multilingual neural machine translation (MNMT) models [14, 15]. This approach enables a single translation model to translate between multiple languages, which is made possible by having a single attention mechanism that is shared across all language pairs. [16] describes that advances in machine translation have led to a steep increase in performance on tasks for many high resource languages [17] Since translation is a resource-hungry task, it works well in the case of high-resource languages [18]. In comparison to their European counterparts, languages of the Indian subcontinent (Indic Languages) don't have large-scale sentence-aligned corpora. In comparison with high-resource languages, there have been very few efforts to benchmark NMT models on low-resource Indic Languages. In our approach, we try to evaluate a variety of popular architectures used in the field of Translation on the Indic language "Marathi". Marathi ranks 15th in the world, with about 95 million speakers. For the dataset, we use a subset of Samantar corpora which has nearly 3.3 million English-Marathi parallel sentences. We observe that transformers outperform both RNN-based, and Conv2Seq-based architectures in terms of accuracy, but take a much larger time to train per epoch. For most of the research work, large models requiring heavy computational resources are trained for a long duration of time. For example, 38 hours of training on 4 A100 GPUs were used for training of IndicTrans on the Samantar dataset. Most of the time it’s not possible to have such a computationally high setup. We are training models with low computational resources and investigating their work. We are using the NVIDIA TITAN X GPU for training our machine translation models. We also do a comparison between the BLEU scores and the time required to train the model to find which architecture is suitable for different scenarios. The rest of the paper is structured as follows. Section 2 surveys the related work in the field of Neural Machine Translation. Section 3 indicates the System description which includes the dataset we used, the preprocessing tasks, and our training procedure. Section 4 displays the different architectures we used for training the model. Section 5 displays our results and the subsequent section 6 states the conclusions drawn from the results. Section 7 states the future work. 2. RELATED WORK [5] in 2014 conjectured that the usage of a fixed-length vector is a performance limitation in this basic encoder-decoder architecture. They proposed a model for automatically (soft-)searching for portions of a source sentence that are useful for predicting a target word without having to explicitly create these parts as a hard segment. [9] in 2016 introduced a deep LSTM network with 8 encoder and decoder layers having attention and residual connections. In the attention mechanism, the bottom layer of the decoder was connected to the top layer of the encoder to improve parallelism and reduce training time. Wordpieces for both input and output were introduced. It helped to reduce the average error in translation by 60%.
[19] in 2018 identified several key modeling and training techniques and applied them to RNN architectures, resulting in a new RNMT+ model that outperformed three fundamental architectures on the benchmark WMT'14 English-to-French and English-to-German tasks. A new hybrid architecture was also introduced for Neural Machine Translation.

[20] looks into the Marathi-English translation task by considering five architectures, namely Google, wmt-en-de, iwslt-de-en, wmt-en-de-big-t2t, and vaswani-wmt-en-de-big. The corpora considered were the Tatoeba, Wikimedia, and Bible datasets, along with data gathered through web scraping, leading to about 3 million sentences.

3. SYSTEM DESCRIPTION

3.1. Dataset

For supervised Neural Machine Translation, we generally need a large parallel corpus between the two languages drawn from a variety of domains. The Samanantar dataset [21] has nearly 3.3 million English-Marathi parallel sentences mined from a variety of sources such as JW300 [22], cvit-pib [23], wikimatrix [24], KDE4, PMIndia v1 [25], GNOME, bible-uedin [26], Ubuntu, TED2020 [27], Mozilla-l10n, Tatoeba, tico19 [28], ELRC_2922, IndicParCorp, Wikipedia, PIB_Archives, Nouns_Dictionary, NPTEL, Timesofindia, and Khan_academy. It contains a total of 3,288,874 sentences, of which the first 1,000,000 were chosen.

Table 1. Samanantar dataset aggregation description

Existing dataset     Corpus size (in thousands)
JW300                289
cvit-pib             114
wikimatrix           124
KDE4                 12
PMIndia V1           29
GNOME                26
bible-uedin          60
Ubuntu               26
TED2020              22
Mozilla-l10n         15
Tatoeba              53
tico19               < 1
IndicParCorp         2600
Wikipedia            24
PIB                  74
PIB_archives         29
Nouns_dictionary     57
NPTEL                15
Timesofindia         25
Khan_academy         < 1
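As a rough illustration of this subset selection, the sketch below keeps the first 1,000,000 aligned sentence pairs. The file names (train.en / train.mr) are hypothetical, assuming the corpus is stored as two aligned files with one sentence per line per language.

# Sketch: keep the first 1,000,000 aligned English-Marathi sentence pairs.
N = 1_000_000
with open("train.en", encoding="utf-8") as en_in, \
     open("train.mr", encoding="utf-8") as mr_in, \
     open("subset.en", "w", encoding="utf-8") as en_out, \
     open("subset.mr", "w", encoding="utf-8") as mr_out:
    for i, (en_line, mr_line) in enumerate(zip(en_in, mr_in)):
        if i >= N:
            break
        en_out.write(en_line)
        mr_out.write(mr_line)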
3.2. Preprocessing

For initial preprocessing, we used the scripts available along with indicTrans [21]. To handle the problems of OOV (out-of-vocabulary) and rare words, subword segmentation techniques were applied [29]. Even if an unknown word is encountered, the model can still translate it accurately by treating it as a sequence of subword units. For this, multiple techniques such as Byte Pair Encoding, the Unigram Language Model, Subword Sampling, and BPE-dropout can be used. In our experiments, we use Byte Pair Encoding.

Byte Pair Encoding is a word segmentation technique based on a compression algorithm: words are split into sequences of characters, and the most frequent adjacent character pair is iteratively merged into a single symbol. BPE can represent an open vocabulary through a fixed-size vocabulary of variable-length character sequences, which makes it a very useful word segmentation method for neural network models.

3.3. Training and Evaluation

We use fairseq [30] for training and evaluation. Before feeding data into the model, we binarized it for faster loading. The models were trained on an NVIDIA TITAN X GPU. To mimic training on a larger number of GPUs, we used the update-freq parameter. Adam [31] was used as the optimizer, with the Adam betas set to 0.9 and 0.98. The loss function used was cross-entropy. For regularization, label smoothing [32] of 0.1 and a dropout [33] of 0.3 were applied. A weight decay of 10⁻⁴ was used, and the learning rate scheduler was inverse_sqrt.

Translation quality was evaluated with BLEU [34] as implemented in the sacrebleu library; to keep the comparison fair, each model was evaluated at its 25th training iteration. Since evaluation was performed on BPE-processed text, the byte pair encoding markers had to be removed before scoring. Beam search with a beam size of 5 was used for generating the translations.

For evaluating a translation, aspects such as fluency, adequacy, human-mediated translation edit rate, and fidelity need to be considered. Human evaluation, the benchmark method, is time-consuming and expensive. The BLEU [34] metric was therefore proposed as a quick, inexpensive, and language-independent alternative. It is a score that ranges from 0 to 1 (commonly reported on a 0-100 scale), where a score of 1 means that the candidate machine translation is identical to one of the reference translations [35].
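To make the merge step described in Section 3.2 concrete, the following minimal Python sketch (a simplified illustration, not the exact subword-nmt implementation used by the indicTrans preprocessing scripts) repeatedly merges the most frequent adjacent symbol pair in a toy vocabulary:

import re
from collections import Counter

def count_pairs(vocab):
    # vocab maps a word, written as space-separated symbols, to its frequency
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs

def merge_pair(pair, vocab):
    # merge every occurrence of the symbol pair into a single new symbol
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Toy vocabulary: characters separated by spaces, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(10):  # the number of merges is the main BPE hyperparameter
    pairs = count_pairs(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
print(vocab)  # frequent sequences such as "est</w>" end up as single symbols

The training and evaluation setup described in Section 3.3 can likewise be sketched as a fairseq invocation followed by sacrebleu scoring. Only the optimizer, regularization, and scheduler settings below are taken from the text; the data path, learning rate, warmup steps, batch size (max-tokens), and update frequency are illustrative assumptions, as is the choice of architecture (any identifier from Section 4 can be substituted).

import subprocess
import sacrebleu

# Assumes the parallel data has already been binarized into "data-bin"
# with fairseq-preprocess.
subprocess.run([
    "fairseq-train", "data-bin",
    "--arch", "transformer_iwslt_de_en",
    "--optimizer", "adam", "--adam-betas", "(0.9, 0.98)",
    "--criterion", "label_smoothed_cross_entropy", "--label-smoothing", "0.1",
    "--dropout", "0.3", "--weight-decay", "0.0001",
    "--lr", "5e-4", "--lr-scheduler", "inverse_sqrt", "--warmup-updates", "4000",
    "--max-tokens", "4096",
    "--update-freq", "8",   # mimics training on a larger number of GPUs
], check=True)

# Hypotheses are produced with fairseq-generate (beam size 5, --remove-bpe to
# strip the subword markers) and then scored against the references:
hypotheses = ["It is very hot today."]   # toy stand-ins for the generated output
references = ["It is very hot today."]   # corresponding reference translations
print(sacrebleu.corpus_bleu(hypotheses, [references]).score)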
4. ARCHITECTURES USED

We trained models with a total of 5 different architectures, as follows.

4.1. fconv_iwslt_de_en

This architecture is based entirely on convolutional networks. Compared to recurrent models, computations over all elements can be fully parallelized during training to better exploit the GPU hardware, and optimization is easier since the number of non-linearities is fixed and independent of the input length [11, 12].

4.2. lstm

This is a vanilla RNN architecture that uses LSTMs as its encoder and decoder. LSTMs were designed to overcome the long-term dependency problem faced by RNNs and to mitigate the loss of short-term memory using mechanisms called gates. In our experiments, the encoder embedding dimension of the LSTM is 256, and the decoder embedding dimension is 512 [6, 7].

4.3. lstm_wiseman_iwslt_de_en

This is an LSTM-based model with smaller embedding dimensions. Also known as a tiny LSTM, it has an encoder embedding dimension of 256 and a decoder embedding dimension of 256.

4.4. transformer_iwslt_de_en

The Transformer was the first model to convert input sequences to output sequences relying entirely on self-attention to compute representations of its input and output, without using sequence-aligned RNNs or convolution. This variant has 6 encoder and decoder layers with an embedding dimension of 512 and uses self-attention with 8 encoder and decoder attention heads [13].

4.5. transformer_wmt_en_de_big_t2t

This is also a transformer model like the one described in Section 4.4, but larger: it has 16 encoder and decoder attention heads, and the encoders and decoders have an embedding dimension of 1024. It also uses attention dropout and activation dropout.

5. RESULTS

Table 2. Best BLEU scores and training time (in hours)

Architecture                     Best BLEU    Time (in hours)
transformer_iwslt_de_en          22.98        11
transformer_wmt_en_de_big_t2t    21.17        16.5
lstm                             17.24        6.25
fconv_iwslt_de_en                15.41        5.5
lstm_wiseman_iwslt_de_en         14.03        4

Surprisingly, transformer_iwslt_de_en outperformed transformer_wmt_en_de_big_t2t, even though it is the smaller of the two models, with fewer attention heads and a smaller embedding dimension. Being the larger model, transformer_wmt_en_de_big_t2t also took significantly more time to train. This shows that transformer_iwslt_de_en is the better model to train in the given circumstances for machine translation, as it achieves a higher BLEU score and takes less time to train.

One noticeable result is the performance of the LSTM. Although its BLEU score of 17.24 is lower than that of transformer_wmt_en_de_big_t2t, the training time required is significantly lower. Under time constraints, the LSTM model can therefore be used for machine translation tasks, as it gives a reasonably high BLEU score in less time.
Figure 1. Best BLEU scores on the validation set against training steps

Figure 1 shows the best BLEU score per architecture over training steps. The optimization trend is similar across architectures, with the rate of improvement beginning to flatten at around 8000 steps. We used Weights & Biases [36] for experiment tracking and visualization to develop insights for this paper.

The following are a few example sentences and their translations inferred from the transformer_iwslt_de_en model:

Table 3. Translation results

Marathi (source language)                                     English (target language)
तो एक व्यापारी आहे.                                              He is a businessman.
आज खूप गरम आहे.                                                It is very hot today.
नदी ओसंडून वाहत आहे.                                             The river is overflowing.
शरद पवार हे देशातील सर्वात प्रभावी राजकारण्यांपैकी एक आहेत.            Sharad Pawar is one of the most influential leaders of the country.

6. CONCLUSION

The study of different architectures for machine translation helps us to understand which model performs best with low computational resources. In this work, we looked into a multitude of neural network architectures, namely LSTMs, Conv2Seq networks, and Transformers. For our task, Transformers gave the best performance with respect to BLEU compared to the remaining architectures, but this comes at a large computational cost: Transformers require nearly 2-3x the time per epoch compared with the other networks. It was also observed that even though transformer_wmt_en_de_big_t2t is larger than transformer_iwslt_de_en, it was not able to give better accuracy. LSTMs were able to show good performance in less time compared with Transformers. Depending upon the scenario, such as the availability of computational resources and the time available to train a model, different architectures can be chosen. Even with low computational resources and a comparatively smaller corpus, good results can be achieved using the Transformer architecture.
7. FUTURE WORK

For this work, we trained the models on a subset of the dataset owing to computational limitations. In the future, we plan to train the models on the whole dataset instead of the subset. We plan to train these models on a setup with high computational power to investigate the change in BLEU score and required time compared with low computational resources. We also plan to train the models in parallel to see whether this causes any change in the BLEU score. In addition, we will look into replicating these experiments with different languages (for example English, Hindi, Mandarin, etc.), while replicating the low-resource setting, and see whether the observed results remain the same. In this experiment, we used Byte Pair Encoding for preprocessing, treating a word as a sequence of subword units; we plan to use other algorithms such as the Unigram Language Model, Subword Sampling, and BPE-dropout and see whether the outcomes observed in this experiment persist.

ACKNOWLEDGEMENTS

We would like to thank the SCTR's PICT Data Science Lab for letting us use the resources for an extended time. We would also like to thank Prof. Mukta Takalikar and Dr. Prahlad Kulkarni for their continuous help and motivation. We also thank the anonymous reviewers for their insightful comments.

REFERENCES

[1] Koehn, P. (2009). Statistical machine translation. Cambridge University Press.
[2] Kalchbrenner, N., & Blunsom, P. (2013, October). Recurrent continuous translation models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (pp. 1700-1709).
[3] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (pp. 3104-3112).
[4] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
[5] Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
[6] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
[7] Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Technische Universität München, 91(1).
[8] Zhou, J., Cao, Y., Wang, X., Li, P., & Xu, W. (2016). Deep recurrent models with fast-forward connections for neural machine translation. Transactions of the Association for Computational Linguistics, 4, 371-383.
[9] Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., ... & Dean, J. (2016). Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
[10] Crego, J., Kim, J., Klein, G., Rebollo, A., Yang, K., Senellart, J., ... & Zoldan, P. (2016). Systran's pure neural machine translation systems. arXiv preprint arXiv:1610.05540.
[11] Gehring, J., Auli, M., Grangier, D., & Dauphin, Y. N. (2016). A convolutional encoder model for neural machine translation. arXiv preprint arXiv:1611.02344.
[12] Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017). Convolutional sequence to sequence learning. CoRR, abs/1705.03122.
[13] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
[14] Firat, O., Cho, K., & Bengio, Y. (2016). Multi-way, multilingual neural machine translation with a shared attention mechanism. arXiv preprint arXiv:1601.01073.
[15] Freitag, M., & Firat, O. (2020). Complete multilingual neural machine translation. arXiv preprint arXiv:2010.10239.
[16] Philip, J., Siripragada, S., Namboodiri, V. P., & Jawahar, C. V. (2021). Revisiting low resource status of Indian languages in machine translation. In 8th ACM IKDD CODS and 26th COMAD (pp. 178-187).
[17] Edunov, S., Ott, M., Auli, M., & Grangier, D. (2018). Understanding back-translation at scale. arXiv preprint arXiv:1808.09381.
[18] Koehn, P., & Knowles, R. (2017). Six challenges for neural machine translation. arXiv preprint arXiv:1706.03872.
[19] Chen, M. X., Firat, O., Bapna, A., Johnson, M., Macherey, W., Foster, G., ... & Hughes, M. (2018). The best of both worlds: Combining recent advances in neural machine translation. arXiv preprint arXiv:1804.09849.
[20] Jadhav, S. A. (2020). Marathi to English neural machine translation with near perfect corpus and transformers. arXiv preprint arXiv:2002.11643.
[21] Ramesh, G., Doddapaneni, S., Bheemaraj, A., Jobanputra, M., AK, R., Sharma, A., ... & Khapra, M. S. (2021). Samanantar: The largest publicly available parallel corpora collection for 11 Indic languages. arXiv preprint arXiv:2104.05596.
[22] Agic, Ž., & Vulic, I. (2020). JW300: A wide-coverage parallel corpus for low-resource languages.
[23] Philip, J., Siripragada, S., Namboodiri, V. P., & Jawahar, C. V. (2021). Revisiting low resource status of Indian languages in machine translation. In 8th ACM IKDD CODS and 26th COMAD (pp. 178-187).
[24] Schwenk, H., Chaudhary, V., Sun, S., Gong, H., & Guzmán, F. (2019). WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia. arXiv preprint arXiv:1907.05791.
[25] Haddow, B., & Kirefu, F. (2020). PMIndia -- A collection of parallel corpora of languages of India. arXiv preprint arXiv:2001.09907.
[26] Christodouloupoulos, C., & Steedman, M. (2015). A massively parallel corpus: the Bible in 100 languages. Language Resources and Evaluation, 49(2), 375-395.
[27] Lison, P., & Tiedemann, J. (2016). OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles.
[28] Anastasopoulos, A., Cattelan, A., Dou, Z. Y., Federico, M., Federman, C., Genzel, D., ... & Tur, S. (2020). TICO-19: the translation initiative for COVID-19. arXiv preprint arXiv:2007.01788.
[29] Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.
[30] Ng, N., Yee, K., Baevski, A., Ott, M., Auli, M., & Edunov, S. (2019). Facebook FAIR's WMT19 news translation task submission. arXiv preprint arXiv:1907.06616.
[31] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[32] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2818-2826).
[33] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
[34] Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002, July). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311-318).
[35] Aditya Mandke (2021). Understanding BLEU. ekdnam.substack.com.
[36] Lukas Biewald (2020). Experiment Tracking with Weights and Biases.