Publications

For the latest updates, check my Google Scholar profile.

    Indian Language Models and Resources

  1. Aman Kumar, Himani Shrotriya, Prachi Sahu, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Mitesh M. Khapra, Pratyush Kumar. IndicNLG Suite: Multilingual Datasets for Diverse NLG Tasks in Indic Languages. arXiv preprint 2203.05437. 2022. [pdf]
  2. Raj Dabre, Himani Shrotriya, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M. Khapra, Pratyush Kumar. IndicBART: A Pre-trained Model for Natural Language Generation of Indic Languages. Findings of the ACL (ACL-Findings 2022). 2022. [pdf]
  3. Tahir Javed, Sumanth Doddapaneni, Abhigyan Raman, Kaushal Santosh Bhogale, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra. Towards Building ASR Systems for the Next Billion Users. Conference of the Association for the Advancement of Artificial Intelligence (AAAI 2022). 2022. [pdf]
  4. Gowtham Ramesh, Sumanth Doddapaneni, Aravinth Bheemaraj, Mayank Jobanputra, Raghavan AK, Ajitesh Sharma, Sujit Sahoo, Harshita Diddee, Mahalakshmi J, Divyanshu Kakwani, Navneet Kumar, Aswin Pradeep, Srihari Nagaraj, Kumar Deepak, Vivek Raghavan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Shantadevi Khapra. Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages. Transactions of the Association for Computational Linguistics (TACL 2022). 2022. [pdf]
  5. Rahul Aralikatte, Miryam de Lhoneux, Anoop Kunchukuttan, Anders Søgaard. Itihasa: A large-scale corpus for Sanskrit to English translation. Workshop on Asian Translation (WAT 2021). 2021. [pdf]
  6. Divyanshu Kakwani, Anoop Kunchukuttan, Satish Golla, Gokul N.C., Avik Bhattacharyya, Mitesh M. Khapra, Pratyush Kumar. IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages. Findings of EMNLP (EMNLP-Findings 2020). 2020. [pdf]
  7. Anoop Kunchukuttan, Divyanshu Kakwani, Satish Golla, Gokul N.C., Avik Bhattacharyya, Mitesh M. Khapra, Pratyush Kumar. AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages. eprint arXiv:2005.00085. [pdf] [arXiv page]
    Multilingual Learning

  9. Raj Dabre, Chenhui Chu, Anoop Kunchukuttan. A Survey of Multilingual Neural Machine Translation. ACM Computing Surveys (ACM-CSUR 2020). 2020. [pdf]
  10. Sumanth Doddapaneni, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra. A Primer on Pretrained Multilingual Language Models. arXiv preprint 2107.00676. 2021. [pdf]
  11. Anoop Kunchukuttan. An Empirical Investigation of Multi-bridge Multilingual NMT models. arXiv preprint 2110.07304. 2021. [pdf]
  12. Vikrant Goyal, Anoop Kunchukuttan, Rahul Kejriwal, Siddharth Jain, Amit Bhagwat. Contact Relatedness can help improve multilingual NMT: Microsoft STCI-MT @ WMT20. Conference on Machine Translation (WMT 2020). 2020. [pdf]
  13. Rudra Murthy V, Anoop Kunchukuttan, Pushpak Bhattacharyya. Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages. Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL 2019) . 2019. [pdf] [arXiv page]
  14. Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, Bamdev Mishra. Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach. Transactions of the Association for Computational Linguistics (TACL). 2019. [pdf] [arXiv page] [video]
  15. Anoop Kunchukuttan, Mitesh Khapra, Gurneet Singh, Pushpak Bhattacharyya. Leveraging Orthographic Similarity for Multilingual Neural Transliteration. Transactions of the Association for Computational Linguistics (TACL). 2018. [pdf]
  16. Rudramurthy V, Anoop Kunchukuttan, Pushpak Bhattacharyya. Judicious Selection of Training Data in Assisting Language for Multilingual Neural NER. Conference of the Association for Computational Linguistics (ACL 2018). 2018. [pdf]
  17. Raj Dabre, Anoop Kunchukuttan, Atsushi Fujita, Eiichiro Sumita. NICT's Participation in WAT 2018: Approaches Using Multilingualism and Recurrently Stacked Layers. 5th Workshop on Asian Translation (WAT 2018). 2018. [pdf]
  18. Tamali Banerjee, Anoop Kunchukuttan, Pushpak Bhattacharyya. Multilingual Indian Language Translation System at WAT 2018: Many-to-one Phrase-based SMT. 5th Workshop on Asian Translation (WAT 2018). 2018. [pdf]
  19. Anoop Kunchukuttan, Maulik Shah, Pradyot Prakash, Pushpak Bhattacharyya. Utilizing Lexical Similarity between Related, Low-resource Languages for Pivot-based SMT. International Joint Conference on Natural Language Processing (IJCNLP 2017). 2017. [pdf] [arXiv page]
  20. Rohit More, Anoop Kunchukuttan, Raj Dabre, Pushpak Bhattacharyya. Augmenting Pivot based SMT with word segmentation. International Conference on Natural Language Processing (ICON 2015). 2015. [pdf]
    Machine Translation

    (Many MT papers can be found under the 'Multilingual Learning' and 'Indian Language Models and Resources' sections.)
  22. Anoop Kunchukuttan, Pushpak Bhattacharyya. Utilizing Language Relatedness to improve SMT: A Case Study on Languages of the Indian Subcontinent. eprint arXiv:2003.08925. 2020. [pdf] [arXiv page]
  23. Anoop Kunchukuttan, Pratik Mehta, Pushpak Bhattacharyya. The IIT Bombay English-Hindi Parallel Corpus. Language Resources and Evaluation Conference (LREC 2018). 2018. [pdf]
  24. Toshiaki Nakazawa, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Win Pa Pa, Isao Goto, Hideya Mino, Katsuhito Sudoh, Sadao Kurohashi. Overview of the 5th Workshop on Asian Translation. 5th Workshop on Asian Translation (WAT 2018). 2018. [pdf]
  25. Sandhya Singh, Ritesh Panjwani, Anoop Kunchukuttan, Pushpak Bhattacharyya. Comparing Recurrent and Convolutional Architectures for English-Hindi Neural Machine Translation. 4th Workshop on Asian Translation (WAT 2017, co-located with IJCNLP). 2017. [pdf]
  26. Anoop Kunchukuttan, Pushpak Bhattacharyya. Learning variable length units for SMT between related languages via Byte Pair Encoding. 1st Workshop on Subword and Character level models in NLP (SCLeM 2017, co-located with EMNLP 2017). 2017. [pdf] [arXiv page]
  27. Anoop Kunchukuttan, Pushpak Bhattacharyya. Faster decoding for subword level Phrase-based SMT between related languages. Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3 2016, co-located with COLING 2016). 2016. [pdf] [arXiv page]
  28. Sandhya Singh, Anoop Kunchukuttan, Pushpak Bhattacharyya. Integrating Neural Probabilistic Language Models with SMT for English-Indonesian Translation. 3rd Workshop on Asian Translation (WAT 2016, co-located with COLING 2016). 2016. [pdf]
  29. Anoop Kunchukuttan, Pushpak Bhattacharyya. Orthographic Syllable as basic unit for SMT between Related Languages. Conference on Empirical Methods in Natural Language Processing (EMNLP 2016). 2016. [pdf]
  30. Pratik Mehta, Anoop Kunchukuttan, Pushpak Bhattacharyya. Investigating the potential of postordering SMT output to improve translation quality. International Conference on Natural Language Processing (ICON 2015). 2015. [pdf]
  31. Rajen Chatterjee, Anoop Kunchukuttan, Pushpak Bhattacharyya. Supertag Based Pre-ordering in Machine Translation. International Conference on Natural Language Processing (ICON 2014). 2014. [pdf]
  32. Anoop Kunchukuttan, Ratish Puduppully, Rajen Chatterjee, Abhijit Mishra, Pushpak Bhattacharyya. The IIT Bombay SMT System for ICON 2014 Tools Contest. NLP Tools Contest at ICON 2014 (ICON 2014). 2014. [pdf] [pdf]
  33. Piyush Dungarwal, Rajen Chatterjee, Abhijit Mishra, Anoop Kunchukuttan, Ritesh Shah, Pushpak Bhattacharyya. The IIT Bombay Hindi-English Translation System at WMT 2014. Workshop on Machine Translation (WMT 2014). 2014. [pdf]
  34. Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee, Ritesh Shah, Pushpak Bhattacharyya. Shata-Anuvadak: Tackling Multiway Translation of Indian Languages. Language Resources and Evaluation Conference (LREC 2014). 2014. [pdf]
  35. Anoop Kunchukuttan, Pushpak Bhattacharyya. Partially modelling word reordering as a sequence labelling problem. First Workshop on Reordering for Statistical Machine Translation (RMT 2012, co-located with COLING). 2012. [pdf]
  36. Anoop Kunchukuttan. The Reordering Problem in Statistical Machine Translation. Ph.D. Seminar Report. 2012. [pdf]
    Representation Learning

  38. Pratik Jawanpuria, Satya Dev N T V, Anoop Kunchukuttan, Bamdev Mishra. Learning Geometric Word Meta-Embeddings. Proceedings of the 5th Workshop on Representation Learning for NLP. 2020. [pdf] [video]
    Deep Learning

  40. Mayank Meghwanshi, Pratik Jawanpuria, Anoop Kunchukuttan, Hiroyuki Kasai, Bamdev Mishra. McTorch, a manifold optimization library for deep learning. The ACM India Joint International Conference on Data Science and Management of Data (CODS-COMAD 2019). 2019. [arXiv page]
  41. Mayank Meghwanshi, Pratik Jawanpuria, Anoop Kunchukuttan, Hiroyuki Kasai, Bamdev Mishra. McTorch, a manifold optimization library for deep learning. Workshop on Machine Learning Open Source Software (MLOSS 2018, co-located with NIPS). 2018. [arXiv page]
    Transliteration

  43. Anoop Kunchukuttan, Siddharth Jain, Rahul Kejriwal. A Large-scale Evaluation of Neural Machine Transliteration for Indic Languages. Conference of the European Chapter of the ACL (EACL 2021). 2021. [pdf]
  44. Anoop Kunchukuttan, Pushpak Bhattacharyya, Mitesh Khapra. Substring-based unsupervised transliteration with phonetic and contextual knowledge. SIGNLL Conference on Computational Natural Language Learning (CoNLL 2016). 2016. [pdf]
  45. Anoop Kunchukuttan, Pushpak Bhattacharyya. Data representation methods and use of mined corpora for Indian language transliteration. Shared Task at Named Entities Workshop, ACL 2015 (NEWS 2015). 2015. [pdf]
  46. Anoop Kunchukuttan, Ratish Puduppully, Pushpak Bhattacharyya. Brahmi-Net: A transliteration and script conversion system for languages of the Indian subcontinent. Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies: System Demonstrations (NAACL 2015). 2015. [pdf]
  47. Mitesh M. Khapra, Ananthakrishnan Ramanathan, Anoop Kunchukuttan, Karthik Visweswariah, Pushpak Bhattacharyya. When Transliteration Met Crowdsourcing: An Empirical Study of Transliteration via Crowdsourcing using Efficient, Non-redundant and Fair Quality Control. Language Resources and Evaluation Conference (LREC 2014). 2014. [pdf]
    Grammar Correction

  49. Anoop Kunchukuttan, Pushpak Bhattacharyya. Addressing Class Imbalance in Grammatical Error Detection with Evaluation Metric Optimization. International Conference on Natural Language Processing (ICON 2015). 2015. [pdf]
  50. Anoop Kunchukuttan, Sriram Chaudhury, Pushpak Bhattacharyya. Tuning a Grammar Correction System for Increased Precision. Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL 2014). 2014. [pdf]
  51. Anoop Kunchukuttan, Ritesh Shah, Pushpak Bhattacharyya. IITB System for CoNLL 2013 Shared Task: A Hybrid Approach to Grammatical Error Correction. Proceedings of the Seventeenth Conference on Computational Natural Language Learning (CoNLL 2013). 2013. [pdf]
    Natural Language Generation

  53. Aditya Joshi, Anoop Kunchukuttan, Pushpak Bhattacharyya, Mark J Carman. SarcasmBot: An open-source sarcasm-generation module for chatbots. WISDOM at KDD. 2015. [pdf]
    Crowdsourcing

  55. Anoop Kunchukuttan, Rajen Chatterjee, Shourya Roy, Abhijit Mishra, Pushpak Bhattacharyya. TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain. Proceedings of the Association for Computational Linguistics (ACL 2013). 2013. [pdf]
  56. Anoop Kunchukuttan, Shourya Roy, Pratik Patel, Somya Gupta, Kushal Ladha, Mitesh Khapra, Pushpak Bhattacharyya. Experiences in Resource Generation for Machine Translation through Crowdsourcing. Language Resources and Evaluation Conference (LREC 2012). 2012. [pdf]
  57. Anoop Kunchukuttan, Shourya Roy, Pratik Patel, Somya Gupta, Kushal Ladha, Mitesh Khapra, Pushpak Bhattacharyya. Experiences in Resource Generation for Machine Translation through Crowdsourcing. (CrowdConf 2011). 2011. [pdf]
    Multiword Expressions

  59. Anoop Kunchukuttan, Munish Munia, Pushpak Bhattacharyya. Multiword Expressions in the CLIA project. Vishwabharat. Jan-June 2012. [pdf]
  60. Anoop Kunchukuttan, Om P. Damani. A System for Compound Noun Multiword Expression Extraction for Hindi. 6th International Conference on Natural Language Processing (ICON 2008). 2008. [pdf]

    Patents Applied

  1. Crowdsourcing translation services. A Kunchukuttan, S Roy, M Khapra, N Cancedda, P Bhattacharyya. US Patent App. 13/592,736