New Language Model Outperforms GPT-3 with Far Fewer Parameters

GPT-3 has been turning heads with the performance of its extraordinary 175 billion parameters, so much so that Microsoft partnered with OpenAI for an exclusive license to it. But in a recent paper published on arXiv, Timo Schick and Hinrich Schütze of the Center for Information and Language Processing (Ludwig Maximilian University, Munich) showed that comparable performance can be achieved by a language model using only a tiny fraction of those parameters.

According to the paper, titled It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners, their language model outperforms GPT-3 on the SuperGLUE benchmark with only 223 million parameters! They achieved this feat by converting textual inputs into cloze questions that contain some form of task description, combined with gradient-based optimization.
No, I didn't spell it wrong. It is "cloze question".
Cloze: adjective. pertaining to or being a procedure used to measure comprehension or text difficulty, in which a person is called upon to supply elements that have been systematically deleted from a text.

Cloze questions, or embedded-answer questions, consist of a passage of text with answers embedded within it; these can be multiple-choice, short-answer, or numerical answers. Here's an example of a cloze question with a short-answer response, from the University of Massachusetts Amherst website:
"Bigfoot, also known as Sasquatch, is the name of a phenomenon which has polarized people around the world, being either the product of vivid imagination or a creature that has somehow avoided close observation or capture by man."
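In code terms, building a cloze question just means embedding a blank inside a natural-sounding passage for the model to fill. The sketch below is purely illustrative (the pattern text and function name are my own, not from the paper):

```python
# A minimal sketch of turning a task input into a cloze question.
# The pattern wording ("All in all, the movie was ...") is a
# hypothetical example, not taken from the paper.
MASK = "___"  # placeholder the language model must fill in

def to_cloze(review: str) -> str:
    """Embed a review in a cloze pattern with a blank to complete."""
    return f'{review} All in all, the movie was {MASK}.'

print(to_cloze("Best film I have seen in years!"))
# Best film I have seen in years! All in all, the movie was ___.
```

The phrasing around the blank doubles as an implicit task description: a model completing "the movie was ___" is effectively doing sentiment classification.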
They use a technique called PET, or pattern-exploiting training, an alternative to GPT-3's priming. And although this additionally requires unlabeled data, that does not pose much of a problem: for many real-world applications, unlabeled training data is far easier to obtain than its counterpart, labeled training data.
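The core idea in PET is the pattern-verbalizer pair: a pattern rewrites the input as a cloze question, and a verbalizer maps each label to a vocabulary token; the label whose token the masked language model prefers at the blank wins. Here is a hedged toy sketch of that scheme, where `toy_mlm_score` is a crude keyword-overlap stand-in for a real masked language model (all names and the scorer are illustrative, not the authors' code):

```python
# Toy sketch of PET's pattern-verbalizer idea for sentiment classification.
PATTERN = "{text} It was [MASK]."                           # input -> cloze question
VERBALIZER = {"positive": "great", "negative": "terrible"}  # label -> single token

def toy_mlm_score(cloze: str, token: str) -> float:
    """Stand-in for a real masked LM: scores a token by keyword overlap."""
    positive_cues = {"good", "great", "loved", "best"}
    negative_cues = {"bad", "awful", "hated", "worst"}
    words = set(cloze.lower().replace("!", "").replace(".", "").split())
    cues = positive_cues if token == "great" else negative_cues
    return len(words & cues)

def classify(text: str) -> str:
    """Pick the label whose verbalized token scores highest at the blank."""
    cloze = PATTERN.format(text=text)
    return max(VERBALIZER, key=lambda label: toy_mlm_score(cloze, VERBALIZER[label]))

print(classify("Best film I have seen in years!"))  # -> positive
```

In real PET, the scorer is a pretrained masked language model (the paper uses ALBERT), and the gradient-based optimization mentioned above fine-tunes that model on the small labeled set, with the unlabeled data used to distill knowledge across multiple patterns.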

Is GPT-3 dying?
No. This model is not here to compete with GPT-3. It has its own limitations: PET only works when the answers to be predicted by the language model correspond to a single token in its vocabulary, and it cannot outperform GPT-3 on every task.
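The single-token constraint is easy to see concretely. In this illustrative sketch (a toy whitespace "tokenizer" and made-up vocabulary, not the model's real one), a one-word answer can serve as a verbalization while a multi-word answer cannot:

```python
# Illustrative only: a verbalizer must map each label to exactly one
# token in the model's vocabulary. A toy whitespace tokenizer and a
# made-up vocabulary show why a multi-token answer breaks the scheme.
VOCAB = {"great", "terrible", "science", "fiction"}

def is_valid_verbalization(answer: str) -> bool:
    """Check the single-token constraint against the toy vocabulary."""
    tokens = answer.split()  # stand-in for the model's real tokenizer
    return len(tokens) == 1 and tokens[0] in VOCAB

print(is_valid_verbalization("great"))            # single token: usable
print(is_valid_verbalization("science fiction"))  # two tokens: not usable
```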

However, this language model is quite an achievement in itself, since GPT-3-like results can now be achieved on certain tasks without spending millions of dollars on expensive state-of-the-art hardware, which is a boon especially for independent researchers.

If you are interested in more such articles, give us a Like on Facebook.

[Picture Courtesy: It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners]