I’m planning to start a new monthly summary post, detailing some of the most interesting things on Machine Learning I’ve found in the last month.
Welcome to the first edition of ML News Monthly – Oct 2020!!
Here are the key happenings this month in the Machine Learning field that I think are worth knowing about. 🕸
1. A many-to-many multilingual model that translates between any pair of 100 languages without relying on English data
Facebook AI is introducing M2M-100, the first multilingual machine translation (MMT) model that translates between any pair of 100 languages without relying on English data. When translating, say, Chinese to French, previous best multilingual models trained on Chinese-to-English and English-to-French data, because English training data is the most widely available. M2M-100 trains directly on Chinese-to-French data to better preserve meaning.
2. Introducing spaCy v3.0 nightly
spaCy is releasing its new version, spaCy v3.0, with a bunch of new features. It includes new transformer-based pipelines that bring spaCy’s accuracy right up to the current state of the art, and a new workflow system to help you take projects from prototype to production.
3. Python is Slowly Losing its Charm 🐍
Although Python dominates the fields of Data Science and Machine Learning, and, to some extent, Scientific and Mathematical computing, it does have its share of disadvantages when compared to newer languages like Julia, Swift and Java.
4. Biggest challenge in making ML work in the real world with Richard Socher
This episode is part of the machine learning podcast by Weights & Biases.
In it, Richard Socher, former Chief Scientist at Salesforce, talks about The AI Economist, language modeling for protein generation, and the biggest challenges in making ML work in the real world.
The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies – https://arxiv.org/abs/2004.13332
ProGen: Language Modeling for Protein Generation – blog.einstein.ai/progen/
5. Break into NLP by deeplearning.ai
A panel of experts in the NLP field talk about their current projects and the future of NLP, and offer career advice for ML practitioners and non-MLEs hoping to break into NLP.
6. Advancing NLP with Efficient Projection-Based Model Architectures – Google Blog
Building on the success of PRADO, Google developed an improved NLP model called pQRNN. The model is composed of three building blocks: a projection operator that converts tokens in the text into a sequence of ternary vectors, a dense bottleneck layer, and a stack of QRNN encoders.
pQRNN achieves near-BERT-level performance with far fewer model parameters.
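To make the projection idea concrete, here is a minimal pure-Python sketch of a hashing-based ternary projection. This is only an illustration of the concept, not Google’s actual pQRNN operator: the hash function, vector length, and whitespace tokenization below are all my own simplifying assumptions.

```python
import hashlib


def ternary_projection(token: str, dim: int = 16) -> list:
    """Map a token to a fixed-length vector of values in {-1, 0, 1}
    using a hash of the token (an illustrative stand-in for a
    vocabulary-free projection operator)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Each byte yields one ternary value: byte % 3 gives {0, 1, 2},
    # shifted down to {-1, 0, 1}.
    return [(digest[i % len(digest)] % 3) - 1 for i in range(dim)]


def project_sentence(text: str, dim: int = 16) -> list:
    """Convert whitespace-split tokens into a sequence of ternary vectors."""
    return [ternary_projection(tok, dim) for tok in text.split()]
```

Because the projection is a deterministic function of the token, the model needs no embedding table — identical tokens always map to identical vectors, which is part of how pQRNN keeps its parameter count so small.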
7. Advanced Natural Language Processing with Apache Spark NLP
This hands-on tutorial by David Talby, CTO of John Snow Labs, uses the open-source Spark NLP library to explore advanced NLP in Python.
8. Text Classification with NLP: Tf-Idf vs Word2Vec vs BERT
This post explains three different strategies for multiclass text classification: the old-fashioned Bag-of-Words (with TF-IDF), the famous word embeddings (with Word2Vec), and cutting-edge language models (with BERT).
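As a reminder of what the “old-fashioned” baseline actually computes, here is a minimal pure-Python TF-IDF sketch. In practice you would use scikit-learn’s TfidfVectorizer; the whitespace tokenization and unsmoothed IDF below are simplifications for illustration.

```python
import math
from collections import Counter


def tfidf(docs: list) -> list:
    """Compute a TF-IDF weight for every term in every document.

    Returns one {term: weight} dict per input document.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))
    weighted = []
    for doc in tokenized:
        tf = Counter(doc)
        weighted.append({
            # term frequency (normalized) * inverse document frequency
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weighted
```

A term that appears in every document gets an IDF of log(1) = 0, so ubiquitous words like “the” contribute nothing — exactly the property that made TF-IDF a strong bag-of-words baseline before embeddings.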
9. Indic BERT
IndicBERT is a multilingual ALBERT model that exclusively covers 12 major Indian languages. It is pre-trained on a novel corpus of around 9 billion tokens and evaluated on a set of diverse tasks. IndicBERT has around 10x fewer parameters than other popular publicly available multilingual models, while achieving performance on par with or better than those models.
10. Spotify Open-Sources Klio, An AI Framework For Next Generation Audio Algorithms
Klio is an ecosystem that allows developers to process audio files, or any other binary files, at any scale. Built by Spotify, Klio runs the company’s large-scale audio intelligence systems and is used by teams of engineers and audio researchers to develop and deploy next-generation audio algorithms.
Klio is built upon Apache Beam, and the jobs are opinionated data pipelines in Python. Tuned for audio and binary file processing, the goal of this framework is to create smarter data pipelines for audio.
That’s it!
Let me know if I missed anything or if there’s anything you think should be included in a future post.