The increasing convergence of computer vision and NLP

Image from DETR paper

The Transformer architecture has achieved state-of-the-art results in many NLP (Natural Language Processing) tasks. One of the main breakthroughs with the Transformer model is arguably the powerful GPT-3, released in the middle of the year, whose paper was awarded Best Paper at NeurIPS 2020.

In computer vision, CNNs have been the dominant models since 2012. Computer vision and NLP are now increasingly converging on a much more efficient class of architectures.

Using Transformers for vision tasks has become a new research direction, aimed at reducing architectural complexity and exploring scalability and training efficiency.

The following are…


Should you use it for your project?

OpenAI released the pricing plan for accessing GPT-3 services a week ago. To better understand what these pricing tiers mean, what an application's monthly cost would be, and what the implications are for business decisions, I will share some of my findings.

The Plans:

The pricing is organized into four tiers: Explore, Create, Build, and Scale. On the “Create” plan, you pay $100 per month for 2M tokens, and 8 cents per additional 1K tokens. …
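To make the "Create" numbers concrete, here is a minimal sketch of the monthly bill as a function of token usage. The figures ($100 base, 2M included tokens, 8 cents per extra 1K) come from the plan described above; the function name and the assumption that overage is billed pro rata per token, rather than rounded up to whole 1K blocks, are mine and only illustrative.

```python
# Figures quoted for the "Create" plan.
BASE_FEE_USD = 100          # flat monthly fee
INCLUDED_TOKENS = 2_000_000 # tokens covered by the flat fee
OVERAGE_CENTS_PER_1K = 8    # price of each additional 1K tokens

def create_plan_monthly_cost(tokens_used: int) -> float:
    """Estimate the monthly cost in USD for a given token usage.

    Assumption (not confirmed by the pricing page): overage is charged
    proportionally per token instead of per started block of 1K tokens.
    """
    overage_tokens = max(0, tokens_used - INCLUDED_TOKENS)
    overage_cents = overage_tokens * OVERAGE_CENTS_PER_1K / 1000
    return BASE_FEE_USD + overage_cents / 100

if __name__ == "__main__":
    for usage in (1_000_000, 2_000_000, 3_000_000):
        print(usage, create_plan_monthly_cost(usage))
```

Under these assumptions, an application that stays within 2M tokens pays the flat $100, while one that uses 3M tokens pays for 1,000 extra blocks of 1K tokens on top of the base fee.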


A low-code, no-ML setup!

Photo by Felix Mooneeram on Unsplash

Recommendation systems drive many of the products and services we interact with every day. For example, 40% of app installs on Google Play and 60% of watch time on YouTube come from recommendations, not to mention TikTok’s well-known recommendation system.

Traditionally, building an ML-based recommendation system requires several components and stages, such as candidate generation, scoring, and ranking. For candidate generation, common approaches such as content-based filtering, collaborative filtering, and Deep Neural Networks (DNNs) are well covered in many materials.
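As a rough illustration of those three stages, the toy sketch below generates candidates by tag overlap (a very loose stand-in for content-based filtering), scores them, and ranks the top results. The catalog, item names, and function names are all hypothetical, and a real system would replace each stage with a learned model.

```python
# Hypothetical catalog: item -> set of content tags (illustrative data only).
CATALOG = {
    "space_doc": {"science", "space"},
    "mars_film": {"space", "drama"},
    "cooking_show": {"food"},
    "physics_talk": {"science"},
}

def generate_candidates(watched, catalog):
    """Candidate generation: unseen items sharing at least one tag with history."""
    liked_tags = set().union(*(catalog[item] for item in watched))
    return [
        item for item in catalog
        if item not in watched and catalog[item] & liked_tags
    ]

def score(item, watched, catalog):
    """Scoring: total number of tags shared with each watched item."""
    return sum(len(catalog[item] & catalog[w]) for w in watched)

def recommend(watched, catalog, k=2):
    """Ranking: sort candidates by score (highest first) and keep the top k."""
    candidates = generate_candidates(watched, catalog)
    ranked = sorted(
        candidates,
        key=lambda item: score(item, watched, catalog),
        reverse=True,
    )
    return ranked[:k]

if __name__ == "__main__":
    print(recommend(["space_doc"], CATALOG))
```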

Now, as part of our GPT-3 exploration, we demonstrate a low-code, no-ML approach (yes…


From API to AI as a Service

In the previous article <GPT3 The Dream Machine in Real World>, I covered some highlights: what GPT-3 is, why GPT-3 and OpenAI’s API are a big deal, and what the possible use cases are, with some inspiring examples. I also touched on why it creates a paradigm shift for the future of AI product development and for society. Several people followed up on that topic, so I will share some of my thoughts and expand on it further.

The paradigm shift

The paradigm shift can be viewed from three different angles on the changes made possible by the debut of the OpenAI API or the era…


A paradigm shift for AI products

As Sam’s tweets point out, there is a lot of hype in the tech community about GPT-3, released by OpenAI in June 2020, but it is still powerful and impressive when you interact with it. GPT-3 is the largest language model ever trained, and it has achieved good results on several NLP tasks such as language generation and language translation, with huge potential for many other creative and functional tasks.

Here we will go over a couple of highlights to get a clearer view of what the model can and cannot do, and how to use it to empower various applications…


Introducing a new feature from Mosaic: the news. It enables you to digest and engage with the news you care about in a brand new way.

With Mosaic, Alexa will list the most noteworthy headlines accompanied by brief, concise descriptions. With just a few simple words, the full-length article you want to read later will be saved in your Facebook Messenger.

Why did we build it?

Context changes, experience stays

Voice interfaces are rising fast! Amazon’s Alexa is proof, with over seven million devices in households and more than 10,000 voice-enabled skills available. Bringing convenience into the home for consuming…


New workflows

We are introducing six new room-based workflows. You can easily group all your smart devices and services by location and trigger them all with one simple command on the platform you like.

On the Alexa platform, you can say “Alexa, ask Mosaic to turn on kitchen”

Your Hue lights turn on, Nest warms up your home, today’s weather forecast is read to you, then you hear your Tesla’s battery level and mileage range report, and finally a brief summary of your Google Calendar events.

You can now turn off any workflow on Facebook Messenger, Slack or…


Meet Mosaic

Mosaic is a chatbot for your connected devices. It allows you to connect your smart devices and digital services together in one place. Using the Echo, Facebook Messenger, Slack, and SMS, Mosaic makes interacting with your devices as easy as chatting with a friend.

I still remember the moment when I first unboxed the Echo and started playing around with several of its features; I was totally amazed by the experience. …

Cheng He

AI evangelist, engineer, entrepreneur, YC alum, clubhouse @satorii
