ML News Monthly – Feb 2021

Welcome to the fifth edition of ML News Monthly – Feb 2021!!

Here are the key happenings this month in the Machine Learning field that I think are worth knowing about. 🕸


1) Bollywood Movies Still Connect Beauty with Fair Skin, Reveals AI-Based Study

https://beebom.com/bollywood-movies-still-connect-beauty-with-fair-skin-reveals-ai-study/

2) New deep learning models require fewer neurons

https://www.csail.mit.edu/news/new-deep-learning-models-require-fewer-neurons

3) Startup says A.I. helped it find treatment for rare lung disease in record time

Hong Kong–based biotechnology company Insilico Medicine, which uses A.I. tools to help it find potential new therapies, announced Tuesday that it has brought a drug candidate from an initial scientific hunch to the cusp of human clinical trials in less than 18 months, a time span the company says may be a new record for a process that often takes more than four years.

https://fortune-com.cdn.ampproject.org/c/s/fortune.com/2021/02/24/insilico-a-i-rare-lung-disease-ipf-record-time/amp/

4) USA Data Science Job Market Shrinking as Data Engineering Grows Exponentially, New Study by Interview Query

https://finance.yahoo.com/news/data-science-job-market-shrinking-122300456.html

5) New Contextual Calibration Method Boosts GPT-3 Accuracy Up to 30%

https://medium.com/syncedreview/new-contextual-calibration-method-boosts-gpt-3-accuracy-up-to-30-cf1cb6931af1

6) India’s 40 Under 40 Data Scientists

This award brings together the brightest leaders in the Data Science field in India and celebrates their achievements.

7) Digital Owl emerges from stealth with AI that analyzes and summarizes medical records

https://venturebeat-com.cdn.ampproject.org/c/s/venturebeat.com/2021/02/08/digital-owl-emerges-from-stealth-with-ai-that-analyzes-and-summarizes-medical-records/amp/

8) Korea adopting Israeli Technology Breakthrough for Learning English

MagniLearn, a leading Israeli ed-tech company that is transforming English learning with purely personalised technology, announced a partnership today with Korea’s “The Education Company”, a leading network of schools with over 5,000 students throughout Korea, with Kim Venturous as a local strategic partner.

https://www.prnewswire.com/news-releases/korea-adopting-israeli-technology-breakthrough-for-learning-english-301222342.html

9) Google BERT vs SMITH: How They Work & Work Together

Earlier, on ‘Search Engine Journal’, the author Roger Montti covered the Google research paper on a new Natural Language Processing algorithm named SMITH.

The conclusion? That SMITH outperforms BERT for long documents.

https://www-searchenginejournal-com.cdn.ampproject.org/c/s/www.searchenginejournal.com/smith-bert-smith-and-bert-in-search/394923/amp/

10) Transformers Scale to Long Sequences With Linear Complexity Via Nyström-Based Self-Attention Approximation

https://medium.com/@Synced/transformers-scale-to-long-sequences-with-linear-complexity-via-nyström-based-self-attention-c67c851ddc8a

11) Snap partners with ShareChat’s Moj to roll out Camera Kit

Snap has partnered with ShareChat’s Moj app to integrate its Camera Kit into the Indian app as the American social giant looks to accelerate its growth in the world’s second-largest internet market.

https://techcrunch-com.cdn.ampproject.org/c/s/techcrunch.com/2021/02/10/snap-partners-with-sharechats-moj-to-roll-out-camera-kit/amp/

12) Why ML in Production is (still) Broken and Ways we Can Fix it

https://hackernoon.com/why-ml-in-production-is-still-broken-and-ways-we-can-fix-it-e33k32jc

13) 4 PyTorch Lightning Community NLP Examples To Inspire Your Next Project!

https://medium.com/pytorch-lightning/4-pytorch-lightning-community-nlp-examples-to-inspire-your-next-project-afd09297601d

14) 3 PyTorch Lightning Winning Community Kernels to Inspire your Next Kaggle Victory

https://medium.com/pytorch-lightning/3-pytorch-lightning-winning-community-kernels-to-inspire-your-next-kaggle-victory-ea355456229a

15) Retrieval Augmented Generation with Huggingface Transformers and Ray

https://medium.com/distributed-computing-with-ray/retrieval-augmented-generation-with-huggingface-transformers-and-ray-b09b56161b1e

16) ELLIS NLP kick-off workshop

ELLIS (European Laboratory for Learning and Intelligent Systems, https://ellis.eu) is a European grassroots initiative in AI and ML with a focus on scientific excellence, innovation, and societal impact. The newly established ELLIS NLP program, which is led by Iryna Gurevych, André Martins, and Ivan Titov, includes NLP fellows and scholars from 15 European institutions (https://ellis.eu/programs/natural-language-processing).

17) Exploring hyperparameter meta-loss landscapes with Jax

This post walks through an example showing how extraordinarily complex meta-loss landscapes can emerge from a relatively simple setting and how, as a result, the gradients of these loss landscapes become a lot less useful. This is done using a relatively new machine learning library: JAX.

http://lukemetz.com/exploring-hyperparameter-meta-loss-landscapes-with-jax/
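To get a feel for the idea, here is a minimal sketch in plain Python (not JAX, and not the example from the post). It assumes a toy setup chosen purely for illustration: the inner problem is gradient descent on a 1-D quadratic, and the "meta-loss" is the inner loss after a fixed number of steps, viewed as a function of the learning rate. The function names and default values are hypothetical.

```python
def final_loss(learning_rate, steps=20, curvature=1.0, x0=1.0):
    """Meta-loss: the inner loss 0.5 * curvature * x**2 after `steps` of gradient descent."""
    x = x0
    for _ in range(steps):
        grad = curvature * x          # derivative of 0.5 * curvature * x**2
        x = x - learning_rate * grad  # one inner gradient-descent step
    return 0.5 * curvature * x * x


def meta_gradient(learning_rate, eps=1e-6, **kwargs):
    """Central finite-difference estimate of d(meta-loss) / d(learning_rate)."""
    hi = final_loss(learning_rate + eps, **kwargs)
    lo = final_loss(learning_rate - eps, **kwargs)
    return (hi - lo) / (2 * eps)


if __name__ == "__main__":
    # Learning rates on both sides of the stability threshold 2 / curvature.
    for lr in (0.5, 1.5, 2.5, 3.0):
        print(f"lr={lr:.1f}  meta-loss={final_loss(lr):.3e}  "
              f"meta-grad={meta_gradient(lr):.3e}")
```

Even in this tiny setting, the meta-loss is close to zero at a learning rate of 0.5 and in the millions at 2.5, and the meta-gradient spans many orders of magnitude across that range. That is one simple way gradients of a meta-loss can become hard to use; the post itself explores richer cases.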

18) Evolving Neural Networks in JAX

https://roberttlange.github.io/posts/2021/02/cma-es-jax/

19) Parallelizing neural networks on one GPU with JAX

http://willwhitney.com/parallel-training-jax.html

20) NLP for India – a relentless pursuit in innovation and creativity

https://indiaai.gov.in/article/nlp-for-india-a-relentless-pursuit-in-innovation-and-creativity

21) India Budget 2021: Finance Minister allocates Rs 50,000 crore for National Research Foundation

https://www-moneycontrol-com.cdn.ampproject.org/c/s/www.moneycontrol.com/news/business/budget-2021-sitharaman-allocates-rs-50000-crore-for-national-research-foundation-6407411.html/amp

22) China’s ed tech unicorns prove that remote learning can work

https://www.wired.co.uk/article/kai-fu-lee-china-ed-tech

23) Spotify patents tech to recommend songs based on users’ speech, emotion

The music-streaming company Spotify has been granted a patent for technology that aims to interpret users’ speech and background noise to better curate the music it serves up.

https://www.axios.com/spotify-patent-users-speech-recommend-music-6c5ce99d-ca0f-4457-9b87-9d27fcc35527.html

FAANG / GAFAM / FANGAM / BATX


24) Google – Introducing Model Search: An Open Source Platform for Finding Optimal ML Models

https://ai.googleblog.com/2021/02/introducing-model-search-open-source.html?m=1

25) Facebook AI’s Multitask & Multimodal Unified Transformer: A Step Toward General-Purpose Intelligent Agents

https://medium.com/syncedreview/facebook-ais-multitask-multimodal-unified-transformer-a-step-toward-general-purpose-98db2c858603

26) Speller100: Zero-shot spelling correction at scale for 100-plus languages

Microsoft has recently launched large-scale multilingual spelling correction models worldwide, with high precision and high recall in 100-plus languages. These models, a technology they collectively call Speller100, are currently helping to improve search results for these languages in Bing.

https://www.microsoft.com/en-us/research/blog/speller100-zero-shot-spelling-correction-at-scale-for-100-plus-languages/

27) Improving Mobile App Accessibility with Icon Detection

https://ai.googleblog.com/2021/01/improving-mobile-app-accessibility-with.html?m=1

28) Azure Quantum is now in Public Preview

Azure Quantum, the world’s first full-stack, public cloud ecosystem for quantum solutions, is now open for business. Developers, researchers, systems integrators, and customers can use it to learn and build solutions based on the latest innovations—using familiar tools in the public cloud.

https://cloudblogs.microsoft.com/quantum/2021/02/01/azure-quantum-preview/

29) Google’s Voice AI accelerator launches 12 startups

https://venturebeat-com.cdn.ampproject.org/c/s/venturebeat.com/2021/02/22/googles-voice-ai-accelerator-launches-12-startups/amp/

30) Facebook’s Continual Transfer Learning Benchmark

https://github.com/facebookresearch/CTrLBenchmark

Papers


31) Wav2Vec 2

Authors at Facebook & Hugging Face show for the first time that learning powerful representations from speech audio alone, followed by fine-tuning on transcribed speech, can outperform the best semi-supervised methods while being conceptually simpler.

https://huggingface.co/facebook/wav2vec2-base-960h

https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/

32) The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

The authors introduce GEM, a living benchmark for Natural Language Generation (NLG), its Evaluation, and Metrics. GEM provides an environment in which models can easily be applied to a wide set of corpora and evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models.

https://arxiv.org/abs/2102.01672

33) ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision

In this paper, the authors present a minimal vision-and-language pre-training (VLP) model, Vision-and-Language Transformer (ViLT), monolithic in the sense that the processing of visual inputs is drastically simplified to the same convolution-free manner in which textual inputs are processed. They show that ViLT is up to 60 times faster than previous VLP models, yet with competitive or better downstream task performance.

https://arxiv.org/abs/2102.03334

34) Aspect-Sentiment Embeddings for Company Profiling and Employee Opinion Mining

With the multitude of companies and organizations that abound today, ranking them and choosing one out of the many is a difficult and cumbersome task.

The authors aim to overcome this problem by generating aspect-sentiment-based embeddings for companies from reliable employee reviews. They created a comprehensive dataset of company reviews from the well-known website Glassdoor.com and employed a novel ensemble approach to perform aspect-level sentiment analysis.

https://arxiv.org/abs/1902.08342v1

35) Speech Recognition by Simply Fine-tuning BERT

Authors propose a simple method for automatic speech recognition (ASR) by fine-tuning BERT, which is a language model (LM) trained on large-scale unlabeled text data and can generate rich contextual representations. The assumption is that given a history context sequence, a powerful LM can narrow the range of possible choices and the speech signal can be used as a simple clue.

https://arxiv.org/abs/2102.00291
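The intuition in that summary (a language model narrows the choices, and the audio acts as a clue) can be sketched with made-up numbers. This is only an illustration of the intuition, not the paper's architecture, which fine-tunes BERT. The function name, the multiply-the-scores fusion rule, and all probabilities below are assumptions invented for the example.

```python
def pick_next_word(lm_probs, acoustic_scores, top_k=3):
    """Shortlist the language model's top_k words, then let the acoustic clue decide."""
    # Step 1: the language model narrows the range of possible choices.
    shortlist = sorted(lm_probs, key=lm_probs.get, reverse=True)[:top_k]
    # Step 2: combine both sources by multiplying scores (an assumed fusion rule).
    return max(shortlist, key=lambda w: lm_probs[w] * acoustic_scores.get(w, 0.0))


if __name__ == "__main__":
    # Hypothetical scores for the word following "I would like a cup of".
    lm_probs = {"tea": 0.45, "coffee": 0.40, "water": 0.10, "key": 0.05}
    acoustic_scores = {"tea": 0.20, "key": 0.70, "coffee": 0.05, "water": 0.05}
    print("acoustic only:", max(acoustic_scores, key=acoustic_scores.get))
    print("LM + acoustic:", pick_next_word(lm_probs, acoustic_scores))
```

With these invented numbers, the audio alone prefers the similar-sounding "key", while the shortlist from the language model steers the choice to "tea".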

36) Learning the language of viral evolution and escape

https://science.sciencemag.org/content/371/6526/284/tab-pdf

37) Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models

https://arxiv.org/abs/2102.02503

Courses / Resources


38) Hugging Face on PyTorch / XLA TPUs: Faster and cheaper training

https://huggingface.co/blog/pytorch-xla

39) A Complete Machine Learning Project From Scratch: Setting Up

https://www.mihaileric.com/posts/setting-up-a-machine-learning-project/

40) Keeping Up with PyTorch Lightning and Hydra — 2nd Edition

https://towardsdatascience.com/keeping-up-with-pytorch-lightning-and-hydra-2nd-edition-34f88e9d5c90

41) How Positional Embeddings work in Self-Attention (code in Pytorch)

https://theaisummer.com/positional-embeddings/

42) Papers with Code – PyTorch Image Models

https://paperswithcode.com/lib/timm

43) Python Outlier Detection (PyOD)

PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data.

https://github.com/yzhao062/pyod
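For readers new to outlier detection, here is a minimal hand-rolled sketch of one classic idea: score each point by its distance to its k-th nearest neighbour, so isolated points get high scores. It uses only the Python standard library and is not PyOD's API; the function name and the sample data are hypothetical.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour (larger = more outlying)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


if __name__ == "__main__":
    # Four points near the origin and one far away (made-up data).
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.1), (5.0, 5.0)]
    scores = knn_outlier_scores(data, k=2)
    most_outlying = max(range(len(data)), key=scores.__getitem__)
    print("most outlying point:", data[most_outlying])
```

This brute-force version compares every pair of points, so it only suits small datasets; a toolkit like PyOD is the better choice for real multivariate data.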

44) CS329s Lecture 3: Data engineering (Chip Huyen Notes)

https://docs.google.com/document/d/1b9iuZiDEGVLHyMmnf6w2y1aN6yWQhAyqk3GHlpI9q6M/mobilebasic

45) Multilingual and code-switching ASR challenges for low resource Indian languages

https://navana-tech.github.io/IS21SS-indicASRchallenge/

46) Pororo: A Deep Learning based Multilingual Natural Language Processing Library

https://github.com/kakaobrain/pororo

47) tez: train pytorch models fasterrrrr

https://github.com/abhishekkrthakur/tez/

48) Question Generation using 🤗transformers

https://github.com/patil-suraj/question_generation

https://github.com/ramsrigouthamg/Questgen.ai

https://www.udemy.com/course/question-generation-using-natural-language-processing/

49) Jina AI

Jina is a deep learning-powered search framework for building cross-/multi-modal search systems (e.g. text, images, video, audio) on the cloud.

https://github.com/jina-ai/jina

50) 6 Things in SaaS That Are Only Obvious At Scale

51) Startup Freshworks Hits $300 Million in Sales With IPO Looming

https://www.bloomberg.com/news/articles/2021-02-15/startup-freshworks-hits-300-million-in-sales-with-ipo-looming

52) BudgetML: Deploy ML models on a budget

https://github.com/ebhy/budgetml

53) Python Concurrency: The Tricky Bits

https://python.hamel.dev/concurrency/

54) How to use RAPIDS on Amazon SageMaker

https://datamuni.com/@atsunorifujita/how-to-use-rapids-on-amazon-sagemaker

55) Hugging Face Transformers Package – What Is It and How To Use It

https://www.kdnuggets.com/2021/02/hugging-face-transformer-basics.html#.YCvsieJ-8Fs.linkedin


That’s it!

Let me know if I missed anything or if there’s anything you think should be included in a future post.