Open Source Guides for Prompting AI Large Language Models

With AI prompt engineering one of the hottest disciplines in IT right now, we take a look at some of the top open source "best practices" type of guides on GitHub.

Speaking to that "hotness," companies have recently posted help-wanted ads with salary ranges topping out at $335,000 per year.

That new role has appeared largely thanks to the advent of cutting-edge generative AI systems based on the GPT series of large language models (LLMs) for machine learning created by Microsoft partner OpenAI. They include:

  • GPT-4, the newest large language model (LLM) from OpenAI, now powering Microsoft's "new Bing" search experience
  • ChatGPT, an advanced chatbot using natural language processing, or NLP
  • Codex, a GPT-3 descendant optimized for coding by being trained on both NLP and billions of lines of code
  • DALL-E 2, an AI system that can create realistic images and art from an NLP description
  • Lately, however, Google has joined the corporate AI race with an LLM-powered search bot of its own, called Bard. It is powered by Google's research LLM described as a lightweight and optimized version of LaMDA, which was announced in May 2021 when Google called its homegrown solution "our breakthrough conversation technology." Like Microsoft, Google is infusing advanced AI into other offerings, like office products.

    Whatever your LLM of choice, the GitHub-hosted prompting guides can help organizations better utilize advanced AI systems, especially generative AI constructs like ChatGPT, a sentient-sounding chat bot that has caught on like wildfire. Following is a roundup of some of the top guides on the open source-focused GitHub platform.

    Microsoft: Prompt Engineering
    What better place to start than Microsoft, the corporate leader in the AI space thanks to more than $10 billion in investments to OpenAI already enacted or planned?

    Microsoft has a Prompt Engineering repo that hosts articles to help users leverage OpenAI's Codex models for generating and manipulating code.

    The main article is "How to get Codex to produce the code you want!"

    It explains that the best way to learn how to use OpenAI models is to try them out in the OpenAI Playground, which requires an OpenAI account that will provide a certain amount of free credits before you have to start paying.

    Microsoft also explains the concepts of zero-shot, one-shot and few-shot learning. They refer to the number of examples included in a prompt, so zero-shot prompts have no examples, one-shot prompts include one example, and so on.

    [Click on image for larger view.] A Diagram Capturing the General Pattern for Few-Shot Learning (source: Microsoft).

    The article goes on to list best practices including:

    • Tell It: Guide the Model with a High Level Task Description
    • Show It: Guide the Model with Examples
    • Describe It: Guide the Model with High Level Contextual Information
    • Remind It: Guide the Model with Conversational History

    Putting all that together can result in a prompt like:

    1. High level task description: Tell the model to use a helpful tone when outputting natural language
    2. High level context: Describe background information like API hints and database schema to help the model understand the task
    3. Examples: Show the model examples of what you want
    4. User input: Remind the model what the user has said before

    The Microsoft repo also provides links to examples to see prompt engineering in practice.

    OpenAI Cookbook
    Straight from the AI horse's mouth, this repo shares example code for accomplishing common tasks with the OpenAI API, which again requires an OpenAI account and associated API key.

    [Click on image for larger view.] OpenAI Playground (source: OpenAI).

    The repo provides a large selection of examples and articles. Among the latter are "How to format inputs to ChatGPT models. It explains a chat API requires two inputs:

  • model: the name of the model you want to use (e.g., gpt-3.5-turbo, gpt-4, gpt-4-0314)
  • messages: a list of message objects, where each object has two required fields:
    • role: the role of the messenger (either system, user, or assistant)
    • content: the content of the message (e.g., Write me a beautiful poem)
  • Another article that provides prompting advice is How to work with large language models.

    It explains that among all the inputs to a LLM, by far the most influential is the text prompt, going on to explain how they can be prompted to produce output in a few ways:

    • Instruction: Tell the model what you want
    • Completion: Induce the model to complete the beginning of what you want
    • Demonstration: Show the model what you want, with either:
      • A few examples in the prompt
      • Many hundreds or thousands of examples in a fine-tuning training dataset

    The article then provides examples of each method.

    OpenAI's GitHub repo also includes many other resources, including guidance on API usage, embeddings, Microsoft's alternative API, Azure OpenAI and much more.

    Other Players
    Along with AI alpha players Microsoft and OpenAI, several other GitHub repos have been created to provide prompting guidance. They include:

    • Prompt Engineering Guide: Coming from DAIR.AI (Democratizing Artificial Intelligence Research, Education, and Technologies), this repo boasts some 19,600 stars.

      In addition to other resources, it has guides ranging from basic prompting to advanced prompting, adversarial prompting and more. However, the repo says they are now outdated and instead points to the Prompt Engineering Guide site. It features sections on the basics of prompting, prompt elements, general tips for designing prompts and much more.

    • Awesome Prompt Engineering: Coming from PromptsLab, this repo contains hand-curated resources for prompt engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM and so on. It promises that a Prompt Engineering Course is coming soon.

      Extant resources include papers, tools & code, APIs, datasets, models and much more.

    • Prompt In-Context Learning: Coming from EgoAlpha Lab, this is described as an open-source engineering guide for prompt-in-context-learning.

      It offers up news, papers, a playground and sections on prompt engineering ("Prompt techniques for leveraging large language models") and ChatGPT prompt ("Prompt examples that can be applied in our work and daily lives.").

    • Awesome AI image synthesis: As its name suggests, this targets the image-generating AI camp, providing "A list of awesome tools, ideas, prompt engineering tools, colabs, models, and helpers for the prompt designer playing with aiArt and image synthesis." It covers the aforementioned DALL-E 2 model from OpenAI along with MidJourney and StableDiffusion.

      The repo's prompt engineering section includes tools for the prompt engineer, artist studies, browser extensions, tips & tricks and inspiration tools.

      It also has sections on text-to-image models, post-processing tools, communities, theory & learning and more.

    With some companies dangling jobs that pay up to $335,000 per year, more prompt engineering guidance is coming online all the time on GitHub and the internet, so would-be prompters can start with the above and keep checking for new guidance as the discipline -- and entire LLM space -- evolves.

    About the Author

    David Ramel is an editor and writer for Converge360.


    Subscribe on YouTube