Will programmers become obsolete in the future because of this Jupyter extension?

Ever since GPT-3 arrived on the scene, there has been talk about whether programmers will become obsolete in the future. This might seem like a distant prospect, but Kartik Godawat and Deepak Rawat have already turned it into something of a reality.

Inspired by GPT-3, Kartik Godawat and Deepak Rawat have created a Jupyter extension that converts plain-English queries into the relevant Python code. The extension, named Text2Code, is a supervised model that works on a predefined training pipeline. The model breaks the problem down into the following components:

  • Collecting training data: First, the authors wrote some general English commands, and then generated variations of them using an elementary generator. This dataset is used to simulate end-user queries.
  • Intent matching: This asks a simple question: what does the user want? The authors use the Universal Sentence Encoder (similar in spirit to word2vec, but for whole sentences) to embed the user query and compute its cosine similarity with the predefined intent queries from the generated dataset.
  • NER (Named Entity Recognition): In this layer, the model identifies the variables (entities) in the sentence. The authors initially explored HuggingFace models, but they ended up using spaCy to train a custom model. The reason for this decision was that the HuggingFace model architectures are transformer-based and somewhat heavy compared to spaCy.
  • Fill template: Here, the model inserts the extracted entities into a fixed template to generate code.
  • Integrating with Jupyter: Finally, in this layer everything is wrapped in a single Python package that can be installed via pip. The authors created a frontend and a server extension, which are loaded when the Jupyter notebook is opened. The frontend sends the query to the server to fetch the generated template code, inserts it into the cell, and finally executes it.
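
To make the pipeline above more concrete, here is a minimal, self-contained sketch of the first four stages. It is not the authors' code. To keep it runnable without any downloads, a simple bag-of-words vector stands in for the Universal Sentence Encoder and a few regular expressions stand in for the custom spaCy NER model. The intents, seed phrases, sample values, and function names are all hypothetical and exist only for illustration.

```python
import math
import random
import re
from collections import Counter

# Hypothetical intents: seed phrasings with placeholders, plus a code template.
INTENTS = {
    "load_csv": {
        "seeds": ["load {fname} into a dataframe {varname}",
                  "read the file {fname} as {varname}"],
        "template": "{varname} = pd.read_csv('{fname}')",
    },
    "show_head": {
        "seeds": ["show the first {num} rows of {varname}",
                  "display top {num} rows from {varname}"],
        "template": "{varname}.head({num})",
    },
}

# Hypothetical placeholder values used to generate query variations.
SAMPLE_VALUES = {"fname": ["data.csv", "sales.csv"],
                 "varname": ["df", "table"],
                 "num": ["3", "10"]}


def generate_examples(seed=0, per_seed=3):
    """Stage 1: expand each seed phrase into several simulated user queries."""
    rng = random.Random(seed)
    examples = []
    for intent, spec in INTENTS.items():
        for phrase in spec["seeds"]:
            for _ in range(per_seed):
                values = {k: rng.choice(v) for k, v in SAMPLE_VALUES.items()}
                examples.append((phrase.format(**values), intent))
    return examples


def embed(text):
    """Stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9_.]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def match_intent(query, examples):
    """Stage 2: pick the intent of the most similar generated example."""
    query_vec = embed(query)
    _, intent = max(examples, key=lambda ex: cosine(query_vec, embed(ex[0])))
    return intent


def extract_entities(query):
    """Stage 3: crude regex stand-in for a trained NER model."""
    entities = {}
    fname = re.search(r"\S+\.csv\b", query)
    if fname:
        entities["fname"] = fname.group()
        query = query.replace(fname.group(), " ")
    num = re.search(r"\b\d+\b", query)
    if num:
        entities["num"] = num.group()
    words = re.findall(r"[A-Za-z_]\w*", query)
    if words:
        # Assumption: the variable name is the last word of the query.
        entities["varname"] = words[-1]
    return entities


EXAMPLES = generate_examples()


def text_to_code(query):
    """Stage 4: fill the matched intent's template with the entities."""
    intent = match_intent(query, EXAMPLES)
    return INTENTS[intent]["template"].format(**extract_entities(query))


print(text_to_code("load train.csv into a dataframe df"))  # df = pd.read_csv('train.csv')
print(text_to_code("show the first 5 rows of df"))         # df.head(5)
```

In a setup like the one the article describes, the server extension would run something like `text_to_code` on the query sent by the frontend and return the generated string for insertion into the notebook cell.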

    Here's a quick demo showing the capabilities of this model. It was prepared using the Chai Time Data Science dataset from Kaggle by Sanyam Bhutani:


    Do programmers need to feel worried?
    Going by the video, it might look that way, but not yet. The model has a lot of room for improvement. It can generate code only in Python. It supports only Ubuntu and macOS; the authors are still working on adding Windows to the list. They also need to add support for more code, improve intent detection and NER, explore sentence paraphrasing to generate higher-quality training data, and gather real-world variable and library names instead of generating them randomly, as is done now.

    However, Kartik Godawat and Deepak Rawat think that, with enough data, they will be able to train a language model to generate code directly from English, as GPT-3 does, instead of relying on separate stages in the pipeline. To that end, they plan to run a survey to collect linguistic data. The code is not production-ready, but it is good enough for anyone to modify and use on their own. You can give it a try too: the GitHub repository explains how to install Text2Code locally, and our Facebook page is where you can show us some social media love.