Microsoft Details How Enterprises Can Use ChatGPT with Up-to-Date, Non-Public Corporate Data

Microsoft just detailed how enterprises can use ChatGPT to generate responses based on their own data.

The sentient-sounding ChatGPT chatbot is based on advanced large language models created by Microsoft partner OpenAI, but it has a September 2021 knowledge cutoff, so its responses can't incorporate any information beyond that date.

That limits its usefulness for enterprises, which would likely prefer responses based on much more current data, especially information stored on their own internal systems.

To that end, Microsoft yesterday (March 9) explained how that can be done using newly available services.

The solution involves the brand-new ChatGPT tech infused into the company's Azure OpenAI Service, along with another Microsoft cloud offering, Azure Cognitive Search.

Microsoft said its solution "integrates the enterprise-grade characteristics of Azure, the ability of Cognitive Search to index, understand and retrieve the right pieces of your own data across large knowledge bases, and ChatGPT's impressive capability for interacting in natural language to answer questions or take turns in a conversation."

The New Approach (source: Microsoft).

Note, however, that the ChatGPT functionality in Azure OpenAI Service is in preview, and that the cloud service is currently available only to vetted Microsoft-managed customers and partners who have been granted access.

For organizations that have been granted access, Microsoft said there are multiple techniques for getting ChatGPT to respond with information drawn from corporate data.

One option is simply to inject the information into the prompt, but the amount of data a prompt can hold is limited, while enterprises likely want responses based on gigabytes of data.
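To see why prompt injection alone doesn't scale, here is a minimal sketch that packs whole documents into a prompt until a token budget is used up. It assumes the rough rule of thumb of about four characters per token for English text instead of a real tokenizer, and the `stuff_prompt` name and budget values are hypothetical; whatever doesn't fit simply never reaches the model.

```python
# Rough heuristic: ~4 characters per token for English text (an assumption;
# real counts depend on the model's tokenizer).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length (hypothetical heuristic)."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def stuff_prompt(question: str, documents: list[str], max_tokens: int) -> tuple[str, int]:
    """Inject as many whole documents as fit into a single prompt.

    Returns the assembled prompt and the number of documents that were dropped.
    """
    header = "Answer the question using only the facts below.\n\n"
    footer = f"\nQuestion: {question}\nAnswer:"
    used = estimate_tokens(header) + estimate_tokens(footer)
    included = []
    for doc in documents:
        cost = estimate_tokens(doc)
        if used + cost > max_tokens:
            break  # budget exhausted: the remaining documents are left out
        included.append(doc)
        used += cost
    prompt = header + "\n".join(included) + footer
    return prompt, len(documents) - len(included)


if __name__ == "__main__":
    # 100 hypothetical policy documents of roughly 275 estimated tokens each.
    docs = ["Policy document text. " * 50 for _ in range(100)]
    prompt, dropped = stuff_prompt("What is our travel policy?", docs, max_tokens=4096)
    print(f"prompt ~{estimate_tokens(prompt)} tokens, {dropped} of {len(docs)} documents dropped")
```

Even with a generous budget, most of a modest document collection gets cut, which is the gap an external search index is meant to fill.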

"The alternative is to keep all the data in an external knowledge base that can retrieve pieces quickly and with good relevance, exactly what Cognitive Search was designed for," Microsoft said. The post goes on to explain in detail how that is done.
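The retrieve-then-answer pattern Microsoft describes can be sketched in a few lines. This toy version stands in for Azure Cognitive Search with a simple keyword-overlap score over an in-memory dictionary; the scoring rule, the `retrieve` and `build_prompt` names and the sample passages are all hypothetical, and a real deployment would query a search index instead. Only the top-ranked passages go into the prompt, so its size stays bounded no matter how large the knowledge base grows.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, passage: str) -> int:
    """Count query-term occurrences in a passage (toy stand-in for real search ranking)."""
    passage_counts = Counter(tokenize(passage))
    return sum(passage_counts[term] for term in set(tokenize(query)))


def retrieve(query: str, knowledge_base: dict[str, str], top_k: int = 2) -> list[tuple[str, str]]:
    """Return up to top_k (source_name, passage) pairs that share at least one term with the query."""
    ranked = sorted(knowledge_base.items(), key=lambda item: score(query, item[1]), reverse=True)
    return [(name, text) for name, text in ranked[:top_k] if score(query, text) > 0]


def build_prompt(question: str, sources: list[tuple[str, str]]) -> str:
    """Ground the model: include only retrieved passages, each tagged with its source name."""
    facts = "\n".join(f"{name}: {text}" for name, text in sources)
    return (
        "Answer using only the sources below and cite the source name in square brackets.\n"
        f"Sources:\n{facts}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical knowledge base: source name -> passage.
kb = {
    "benefits.pdf": "Employees get 20 days of paid vacation per year.",
    "travel.pdf": "Book travel through the internal portal at least two weeks ahead.",
    "security.pdf": "Badges must be worn at all times on campus.",
}
question = "How many vacation days do employees get?"
print(build_prompt(question, retrieve(question, kb)))
```

Tagging each passage with its source name is what later makes citations possible, since the model can only cite what it was shown.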

The company, however, noted that the large language models (LLMs), prompts and orchestration involved can produce errors, so responses should include enough information for enterprise users to validate them. Microsoft described three experiences as a starting point for doing that:

  • Citations: Each statement in the response includes a citation with a link to the source content. You can see the citations in context (the superscript numbers) as well as the links at the bottom. When you click on one, we display the original content so the user can inspect it.
  • Supporting content: Each response or chat bubble generated by ChatGPT has an option (notebook icon) for displaying all the original content that was fed into the prompt as facts.
  • Orchestration process: Also present in each response or chat bubble, we include an option (lightbulb icon) to see the entire interaction process, including intermediate results and generated prompts.
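One way to approximate the citation experience is to instruct the model to cite source names in square brackets and then post-process its answer. The sketch below assumes that bracket convention (an assumption, not necessarily the exact format Microsoft's sample uses), uses `^1`-style text as a stand-in for superscript numbers, and collects a numbered list of links for the bottom of the response; the `render_citations` name and the intranet URL are hypothetical.

```python
import re


def render_citations(answer: str, base_url: str) -> tuple[str, list[str]]:
    """Replace [source] citations with numbered markers and collect the source links.

    Each distinct source gets one number, in order of first appearance, so repeated
    citations of the same document share a marker.
    """
    numbers: dict[str, int] = {}

    def to_marker(match: re.Match) -> str:
        source = match.group(1)
        if source not in numbers:
            numbers[source] = len(numbers) + 1
        return f"^{numbers[source]}"

    text = re.sub(r"\[([^\[\]]+)\]", to_marker, answer)
    links = [f"{n}. {base_url}/{source}" for source, n in numbers.items()]
    return text, links


# Hypothetical model answer that follows the bracket-citation convention.
answer = (
    "Employees get 20 days of paid vacation [benefits.pdf]. "
    "Travel must be booked two weeks ahead [travel.pdf]. "
    "Unused days roll over [benefits.pdf]."
)
text, links = render_citations(answer, "https://intranet.example.com/docs")
print(text)
print("\n".join(links))
```

In a UI, each link would open the original passage so the user can inspect it, as Microsoft's first bullet describes.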

Effective prompt design has emerged as a new tech specialty, and in the blog post Microsoft listed some starting points for prompt and query generation, with references to the literature for those interested in more detail.
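A common query-generation step in pipelines like this is having the model rewrite a follow-up question (such as "Do they roll over?") into a standalone search query before anything is searched. The sketch below only builds that query-generation prompt; its wording, the single few-shot example, the `build_query_prompt` name and the `max_turns` cutoff are illustrative assumptions, not Microsoft's actual prompt, and the model call itself is left out.

```python
def build_query_prompt(history: list[tuple[str, str]], question: str, max_turns: int = 3) -> str:
    """Build a prompt asking the model to turn a follow-up question into a standalone search query.

    Only the last max_turns (user, assistant) exchanges are kept, so long
    conversations don't crowd out the instructions.
    """
    instructions = (
        "Below is a conversation and a new question. Generate a short search query "
        "for the knowledge base that captures what the user is asking. "
        "Do not include source names or quotation marks in the query.\n\n"
    )
    # One worked (few-shot) example showing the expected output shape.
    example = (
        "Example:\nUser: What does the dental plan cover?\nAssistant: Cleanings and fillings.\n"
        "Question: Is it the same for contractors?\nSearch query: dental plan coverage contractors\n\n"
    )
    recent = history[-max_turns:] if max_turns > 0 else []
    transcript = "".join(f"User: {user}\nAssistant: {bot}\n" for user, bot in recent)
    return instructions + example + transcript + f"Question: {question}\nSearch query:"


history = [("How many vacation days do I get?", "20 days per year [benefits.pdf].")]
print(build_query_prompt(history, "Do they roll over?"))
```

The model's completion (for example, something like "vacation days roll over") would then be sent to the search index, and the retrieved passages fed into a second, answer-generating prompt.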

Because access to Azure OpenAI Service is limited, many organizations can't use the method above. So, keeping in mind ChatGPT's September 2021 knowledge cutoff, we asked ChatGPT itself, "how can organizations use ChatGPT with responses based on the organization's own data?" Its response below can therefore be considered the "old" way of doing things, though newer techniques may have been published after September 2021.

Organizations can use ChatGPT to provide intelligent and personalized responses to their customers by training the model on their own data. Here's how:

Collect relevant data: Gather data from various sources such as customer feedback, support tickets, chat logs, and social media interactions. The data should cover a wide range of topics and scenarios that are relevant to your business.

Clean and prepare the data: The data needs to be cleaned and organized in a way that ChatGPT can understand. This involves removing any irrelevant information, correcting spelling and grammar mistakes, and labeling the data with appropriate categories or tags.

Fine-tune the ChatGPT model: Once the data is ready, the organization can use it to fine-tune the ChatGPT model. This involves training the model on the organization's data, adjusting the hyperparameters, and testing the performance of the model.

Integrate ChatGPT into the organization's systems: Once the model has been fine-tuned, it can be integrated into the organization's systems such as the website, chatbot, or customer support portal. Customers can then interact with ChatGPT and receive personalized responses based on the organization's own data.

Continuously monitor and improve the model: The organization should continuously monitor the performance of the ChatGPT model and make improvements based on customer feedback and new data. This will ensure that the responses provided by ChatGPT are accurate, relevant, and helpful to the customers.
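The "clean and prepare the data" step in that answer can be sketched as converting support records into training examples. The snippet assumes the prompt/completion JSON Lines layout that OpenAI's fine-tuning endpoint accepted for GPT-3 base models at the time, roughly following the separator-and-stop-sequence conventions recommended for that format; the input field names (`question`, `resolution`), the specific separator and stop strings, the cleanup rules and the sample tickets (using Microsoft's fictional Contoso domain) are hypothetical choices for illustration.

```python
import json
import re

SEPARATOR = "\n\n###\n\n"  # marks where each prompt ends (hypothetical choice)
STOP = " END"              # marks where each completion ends (hypothetical choice)


def clean(text: str) -> str:
    """Mask email addresses and collapse runs of whitespace (example cleanup rules)."""
    text = re.sub(r"\S+@\S+", "[email]", text)
    return " ".join(text.split())


def to_jsonl(tickets: list[dict]) -> str:
    """Turn resolved tickets into prompt/completion lines, skipping incomplete records."""
    lines = []
    for ticket in tickets:
        question = clean(ticket.get("question", ""))
        answer = clean(ticket.get("resolution", ""))
        if not question or not answer:
            continue  # a record without both halves can't form a training pair
        record = {"prompt": question + SEPARATOR, "completion": " " + answer + STOP}
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Hypothetical support tickets; the second has no resolution and is dropped.
tickets = [
    {"question": "How do I  reset my VPN token?", "resolution": "Email helpdesk@contoso.com for a new token."},
    {"question": "Printer on floor 3 is jammed", "resolution": ""},
]
print(to_jsonl(tickets))
```

Masking personal data such as email addresses before training is a small example of the "removing irrelevant information" step; real pipelines would apply much broader rules.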

Microsoft's new AI-powered Bing search experience has access to much more current data, so we asked it the same question. It dutifully responded with the new Azure OpenAI Service/Azure Cognitive Search approach, complete with references for the data backing its response:

ChatGPT is a chatbot developed by OpenAI and launched in November 2022(1).

It is built on top of OpenAI's GPT-3 family of large language models and has been fine-tuned using both supervised and reinforcement learning techniques(2).

Organizations can use ChatGPT with responses based on their own data by integrating it with Azure Cognitive Search(3) [this is yesterday's post mentioned above], which can index, understand and retrieve relevant pieces of data across large knowledge bases. This way, ChatGPT can interact in natural language to answer questions or take turns in a conversation using the organization's data as a source.

However, ChatGPT may not always be accurate or up-to-date since it only uses data through 2021(4) and does not incorporate real-time information for some topics(5).

Organizations should also be aware of the potential data security risks of using ChatGPT outside of their network(6).

For the vetted Microsoft-managed customers and partners who can use the new method, the company has set up a GitHub repo, "ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search," that provides examples and other information to get started.

About the Author

David Ramel is an editor and writer for Converge360.

