Key Takeaways
- The use of MS Office automation and the ChatGPT API can greatly enhance the editing process by automating the extraction and incorporation of editorial comments, as well as leveraging AI-based suggestions for improving the text.
- We can build a tool in pure C++ that automates the editing workflow, while scanning and extracting editorial comments from a Word file, storing them in a database, and using ChatGPT to generate targeted questions for improving the text.
- Enumerating comments allows us to not only retrieve the comment text but also the associated text segment, providing the necessary context for understanding the purpose of each comment.
- When working with web APIs, we rely on a versatile code structure that enables us to send requests and handle responses using the JSON data format. To facilitate this process, we use libcurl, a robust library widely used for transferring data over networks.
- Several building blocks are used to accomplish the task: ChatGPT API, a generic function, OLE automation, as well as several other components.
While using ChatGPT through a web interface is one thing, creating your own autonomous AI tool that interfaces with ChatGPT via its API is a different story altogether, especially when you aim to maintain complete control over the interaction with the user. At the same time, as strong proponents of C++, we believe that a GPT tool in C++ will ease the pain of dealing with the daunting task of editing (endless) editorial comments.
General idea
We aim to explore the realm of MS Office automation and leverage the ChatGPT API to enhance the editing process. We envision a sophisticated tool that seamlessly integrates C++ with the ChatGPT API, providing a new way to interact with editorial comments in Word documents.
Traditional document editing involves manually reviewing content and adding comments to specific sections. In our case, as we worked on our C++ book, we encountered over 100 editorial comments each time, most of which related to the publisher’s style guide and annotations. It would have been helpful to have a way to store these comments and associated text in a database, not to mention the potential for AI-based editing. That's precisely what our software accomplishes: by automating this process, we can expedite the editing workflow. While this tool serves as a proof of concept (POC) and is not recommended for writing and editing entire books, it still presents an exciting exercise in automation and is certainly worth trying.
How it’s done
The workflow begins with our software scanning the Word file, meticulously examining each editorial comment embedded within the document using the Office Automation API.
Once all comments have been enumerated, our tool extracts them along with the associated text segments and stores them in a sqlite3 database. Based on this, it prepares targeted questions for ChatGPT revolving around how to improve or fix a particular section of text. By leveraging the ChatGPT API, we can tap into the language model's vast knowledge and linguistic prowess to obtain expert suggestions and recommendations.
Upon receiving a response from ChatGPT, our tool dynamically incorporates the suggested edits into the associated text segments, seamlessly enhancing the content based on the model's insights.
This automated editing process significantly reduces manual effort and accelerates overall document refinement. Our tool even tracks the changes and remembers to turn "Track Changes" off when done.
Programming-wise, there are several building blocks in our project, and some of them can be expanded or replaced to serve different purposes. Let’s call our code Proof of Concept.
The building blocks
Here are the players involved in the process - our building blocks:
ChatGPT API
Our tool interfaces and interacts with ChatGPT by utilizing various parameters and approaches. We prepare payloads to be sent to the API and parse the responses we get back. To use our tool, you must obtain an API key and add it to our code in place of "<Your-API-key>".
When using the ChatGPT API, there are several things to take into consideration.
Our Generic Function
For the purpose of this article, we created a generic function. The function builds each request from a set of adjustable attributes and parameters, in the following form:
data =
{
{"messages", json::array({{ {"role", "user"}, {"content", entire_converstaion_string} }})},
{"model", model},
{"temperature", temperature},
{"max_tokens", max_tokens},
{"n", n},
{"stop", stop}
};
Let’s go over some issues and requirements along with these attributes:
- "messages" - defines the conversation history between the user and the model. Each message in the conversation consists of two properties: "role" (which can be "system", "user", or "assistant") and "content" (the actual text of the message). For the purpose of this article, we used "user".
- "model" - allows you to specify which version of the ChatGPT model you want to use. For the purpose of this article, we used "gpt-3.5-turbo".
- "temperature" - controls how random the generated text is. A low temperature makes the output more focused and predictable, while a high temperature makes it more varied. This can be useful in situations where the goal is to generate text that follows a given input but with some level of variation or "creativity."
- "max_tokens" - is the maximum number of tokens the model may generate in its response. The number of tokens processed depends on the length of the input and output text:
  - 1-2 sentences ~= 30 tokens
  - 1 paragraph ~= 100 tokens
  - 1,500 words ~= 2048 tokens
  As a user of the ChatGPT API, you will be charged for the tokens you consume.
  Model | Price for 1000 tokens (prompt) | Price for 1000 tokens (completion)
  ChatGPT | $0.0020 | $0.0020
- "n" - controls how many responses the model should provide; it is set to one, a single response, by default.
- "stop" - indicates the string that should trigger the model to stop generating its response. In our tool it is set to a newline by default. This means that when the model produces a new line in its output, it stops generating at that point.
Our Prompt
We always like to say that the significance of well-structured prompts cannot be overstated. A carefully constructed prompt acts as a guiding blueprint, influencing the quality of the generated output. In this article, we will delve into the components of an effective prompt and offer practical examples and guidelines to help C++ students maximize the potential of ChatGPT API in their projects.
Here is an example:
// Setting up a prompt for GPT request
wstring prompt{ L"I will send you some text, and an associated comment that tells what changes need to be made in the text. Please start your response with 'Changed text: ' followed by the actual updated text. Here is the original text: '" };
prompt += rangeText;
prompt += L"'. And here is the associated comment suggesting the change: '";
prompt += commentText;
prompt += L"'. Please do not respond with anything else, only include the changed text and nothing else. If you do not have the correct answer or don't know what to say, respond with these exact words: 'I do not understand'.";
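Because the prompt asks the model to begin its answer with 'Changed text: ', the reply can be checked before anything is written back to the document. The following is a minimal sketch of such a check; extract_changed_text is a hypothetical helper of ours, not a function from the tool's source code.

```cpp
#include <string>

// Hypothetical helper: if the reply follows the format requested in the
// prompt, stores the text after the prefix in 'changed' and returns true.
// Any other reply, including 'I do not understand', returns false.
bool extract_changed_text(const std::wstring& reply, std::wstring& changed)
{
    static const std::wstring prefix{ L"Changed text: " };

    // compare() clamps the length, so replies shorter than the prefix are safe.
    if (reply.compare(0, prefix.size(), prefix) != 0)
        return false;

    changed = reply.substr(prefix.size());
    return true;
}
```

A caller would replace the text segment only when the function returns true, and leave the original text untouched otherwise.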
When you compose a prompt, it is best to create a template containing the constant parts of the requests you will use throughout the program and then change the variable parts based on the immediate need. Here are some key building blocks for a good prompt:
Context:
Context serves as the groundwork for the prompt, offering crucial background information. It enables the Language Model to grasp the task's essence. Whether it entails a concise problem description or a summary of pertinent details, providing context is pivotal.
Example:
"You are a software developer working on a mobile app for a food delivery service. The app aims to provide a seamless experience for users to order food from local restaurants. As part of the development process, you need assistance generating engaging and informative content about the app's features."
Task:
The task defines the precise goal or objective of the prompt. It should be clear, concise, and focus on the specific information or action expected from the ChatGPT model.
Example:
"Compose a short paragraph that highlights the app's key features and showcases how they enhance the food delivery experience for customers."
Constraints:
Constraints set boundaries or limitations for the prompt. They may encompass specific requirements, restrictions on response length or complexity, or any other pertinent constraints. By defining constraints, you can guide the generated output toward the desired outcome.
Example:
"The response should be concise, with a maximum word count of 150 words. Focus on the most prominent features that differentiate the app from competitors and make it user-friendly."
Additional Instructions:
In this section, you have the opportunity to provide supplementary context or specify the desired output format. This can include details regarding the expected input format or requesting the output in a specific format, such as Markdown or JSON.
Example:
"Please format the response as a JSON object, containing key-value pairs for each feature description. Each key should represent a feature, and its corresponding value should provide a brief description highlighting its benefits."
By understanding and implementing these fundamental components, C++ developers can master the art of constructing effective prompts for optimal utilization of the ChatGPT API in their projects. Thoughtfully incorporating context, defining clear tasks, setting constraints, and providing additional instructions will enable developers to achieve precise and high-quality results.
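The four building blocks can be kept as separate strings and joined only when a request is made, so that the constant parts stay in one place. Below is a minimal sketch of that idea; PromptParts and compose_prompt are names we made up for illustration.

```cpp
#include <initializer_list>
#include <string>

// Hypothetical container for the four building blocks described above.
struct PromptParts
{
    std::string context;
    std::string task;
    std::string constraints;
    std::string instructions;
};

// Joins the non-empty parts into a single prompt, separated by blank lines.
std::string compose_prompt(const PromptParts& parts)
{
    std::string prompt;
    for (const std::string* part : { &parts.context, &parts.task,
                                     &parts.constraints, &parts.instructions })
    {
        if (part->empty())
            continue;               // skip building blocks that were left out
        if (!prompt.empty())
            prompt += "\n\n";       // blank line between building blocks
        prompt += *part;
    }
    return prompt;
}
```

With this layout, the context and constraints can be written once, while the task changes from one request to the next.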
Continuous chat
In most cases, we would like to be able to continue a conversation from where we left off last time. The ChatGPT API does not do that for us: each request is handled independently, so the model only knows what the current request contains. Here is what happens when each question is sent by itself:
➢ What is the capital of France?
Request payload: '{"messages":[{"content":"what is the capital of france?","role":"user"}],"model":"gpt-3.5-turbo"}'
Callback JSON: '{"id":"chatcmpl-7AlP3bJX2T7ibomyderKHwT7fQkcN","object":"chat.completion","created":1682799853,"model":"gpt-3.5-turbo-0301","usage":{"prompt_tokens":15,"completion_tokens":7,"total_tokens":22},"choices":[{"message":{"role":"assistant","content":"The capital of France is Paris."},"finish_reason":"stop","index":0}]}'
Your AI friend responds:
➢ The capital of France is Paris.
Then comes a follow-up question:
➢ How big is it?
Request payload: '{"messages":[{"content":"How big is it?","role":"user"}],"model":"gpt-3.5-turbo"}'
Callback JSON: '{"id":"chatcmpl-7AlPAabscfyDrAV2wTjep0ziiseEB","object":"chat.completion","created":1682799860,"model":"gpt-3.5-turbo-0301","usage":{"prompt_tokens":13,"completion_tokens":20,"total_tokens":33},"choices":[{"message":{"role":"assistant","content":"I apologize, but I need more context to accurately answer your question. What are you referring to?"},"finish_reason":"stop","index":0}]}'
➢ I apologize, but I need more context to accurately answer your question. What are you referring to?
To fix that, we need to maintain a continuous chat, but how do we do that? The way to do it is to send the entire conversation so far along with every new request. We keep it in a string:
string entire_converstaion_string;
We also define:
using Conversation = vector<SingleExchange>;
where SingleExchange is defined as
using SingleExchange = pair<string, string>;
In our source code, you can see how we maintain our Conversation object up to a fixed length (as, clearly, we can’t store endless conversations). This fixed length is set here:
int conversation_exchange_limit{ 100 };
As already mentioned, our prompt plays a key role in the efficiency of the request, and when it comes to continuous chats, we may want to use a different prompt:
string prompt_start{
"You are a chatbot. I want to have a conversation "
"with you where you can remember the context between multiple requests. To do "
"that, I will send all previous request prompts and your corresponding "
"responses, please use those as context. All the previous request prompts and "
"the current will have the 'request: ' before it, and all your corresponding "
"responses will have 'response: ' before it. Your job is to respond to only the "
"latest request. Do not add anything by yourself, just respond to the latest "
"request. Starting now\n\n" };
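Putting these pieces together, the sketch below shows one way to keep the conversation within a fixed length and to build the string sent with each request, using the 'request: ' and 'response: ' prefixes the prompt describes. The two function names are our own, and the actual source code may organize this differently.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using SingleExchange = std::pair<std::string, std::string>; // request, response
using Conversation = std::vector<SingleExchange>;

// Appends one exchange and drops the oldest ones once the limit is exceeded.
void add_exchange(Conversation& conversation, const std::string& request,
                  const std::string& response, std::size_t limit)
{
    conversation.emplace_back(request, response);
    if (conversation.size() > limit)
    {
        const auto excess =
            static_cast<Conversation::difference_type>(conversation.size() - limit);
        conversation.erase(conversation.begin(), conversation.begin() + excess);
    }
}

// Builds the text sent to the model: the instructions, every stored
// exchange in order, and finally the new request.
std::string build_conversation_string(const std::string& prompt_start,
                                      const Conversation& conversation,
                                      const std::string& new_request)
{
    std::string result{ prompt_start };
    for (const SingleExchange& exchange : conversation)
    {
        result += "request: " + exchange.first + "\n";
        result += "response: " + exchange.second + "\n";
    }
    result += "request: " + new_request + "\n";
    return result;
}
```

Keep in mind that the whole string counts as prompt tokens on every request, so a longer history makes each request more expensive.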
Multi-part response
When you ask your AI friend:
➢ Write me a C++ code that counts from 1 to 10
You may get just that:
➢ Sure, here's the C++ code to count from 1 to 10:
Without any source code.
Here is why: the stop parameter sent to the API tells the model at what point it should stop generating output. In our tool, a newline is the default when nothing else is specified, which means that the model stops generating as soon as it produces its first newline.
But if you set the "stop" parameter to an empty string, you will get the full response, including the source code.
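The effect of a stop sequence can be illustrated locally, without calling the API. The sketch below is only a simulation that we wrote for this purpose: apply_stop is a hypothetical function that cuts a text at the first occurrence of the stop string, and treats an empty stop string as "no stop sequence".

```cpp
#include <string>

// Local illustration only: mimics how a stop sequence truncates the output.
// The stop sequence itself is not included in the returned text.
std::string apply_stop(const std::string& generated, const std::string& stop)
{
    if (stop.empty())
        return generated;           // no stop sequence: keep everything

    const std::string::size_type pos = generated.find(stop);
    return pos == std::string::npos ? generated : generated.substr(0, pos);
}
```

With a newline as the stop string, a reply that starts with an introductory sentence followed by code is reduced to that first sentence, which matches the behavior described above.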
About OLE Automation
OLE Automation is a COM-based technology from Microsoft that lets one application control another through the objects it exposes. In our implementation, we use it directly, bypassing MFC (Microsoft Foundation Classes). To access various elements of MS Word, such as documents, the active document, comments, etc., we obtain an IDispatch COM interface for each object we need to interact with.
Office Automation
Our tool automates various tasks and features within MS Word. It can read comments, find associated text, turn on/off "Track Changes," work in the background, replace text, add comments, save the result, and close the document. Here is a description of the functions we use:
OLEMethod(): A helper function that invokes a method on an IDispatch interface, handling method invocations and returning HRESULT values indicating errors.
Initialize(): A function that initializes the OfficeAutomation class by creating an instance of the Word application and setting its visibility. It initializes the COM library, retrieves the CLSID for the Word application, creates an instance of the application, and sets its visibility.
OfficeAutomation(): The constructor of the OfficeAutomation class. It initializes member variables and calls the Initialize function with false to create a non-visible Word application instance.
~OfficeAutomation(): The destructor of the OfficeAutomation class. It does nothing in this implementation.
SetVisible(): A function that sets the visibility of the active document. It takes a boolean parameter to determine whether the document should be visible or not. It uses the OLEMethod function to set the visibility property of the Word application.
OpenDocument(): A function that opens a Word document and sets its visibility. It takes a path to the document and a boolean parameter for visibility. It initializes the class if necessary, retrieves the Documents interface, opens the specified document, and sets its visibility.
CloseActiveDocument(): A function that closes the active document. It saves the document and then closes it. It uses the OLEMethod function to call the appropriate methods.
ToggleTrackChanges(): A function that toggles the "Track Revisions" feature of the active document. It gets the current status of the feature and toggles it if necessary. It uses the OLEMethod function to access and modify the "TrackRevisions" property.
FindCommentsAndReply(): A function that finds all comments in the active document, sends a request to the ChatGPT API for suggestions, and updates the associated text of each comment based on the API response. It iterates through each comment, retrieves the associated text range, sends a prompt to the ChatGPT API with the text and comment as context, receives the API response, and updates the text range with the suggested changes.
CountDocuments(): A function that returns the number of open documents in the Word application associated with the OfficeAutomation class. It retrieves the Documents interface and returns the count.
Handling comments
When developing a mechanism that will go over comments, we need to be able to enumerate all comments and distinguish between resolved ones and non-resolved ones.
That is done the following way:
bool IsCommentResolved(IDispatch* pComment)
{
// Check if the comment is resolved
VARIANT isResolved;
VariantInit(&isResolved);
HRESULT hr = OLEMethod(DISPATCH_PROPERTYGET, &isResolved, pComment, (LPOLESTR)L"Done", 0);
if (FAILED(hr))
{
ShowError(hr);
return false;
}
bool resolved = (isResolved.vt == VT_BOOL && isResolved.boolVal == VARIANT_TRUE);
return resolved;
}
As you can see, using OLEMethod() along with DISPATCH_PROPERTYGET allows us to check the property named "Done", which indicates resolved comments.
Enumerating comments
Next, we can enumerate all comments in the document, and perhaps print the "Resolved" status for each of them.
Before we start, we want to enumerate not just the comments but also the text associated with them. The reason lies in the purpose of commenting itself. The author of a document composes and edits the document. The editor marks a segment, which can be a paragraph, sentence, or even a word, and adds a comment. When we read a comment, we need the context of that comment, and the context would be that marked segment.
So when we enumerate all comments, we do not just print the comment’s text but also the text associated with it (our segment).
When we start going over all comments, we need to declare and initialize two pointers:
pComments – points to the document’s comments.
pRange – points to the document’s content (the segment that holds the text associated with the comment).
Each of them is initialized:
{
VARIANT result;
VariantInit(&result);
m_hr = OLEMethod(DISPATCH_PROPERTYGET, &result, m_pActiveDocument, (LPOLESTR)L"Comments", 0);
if (FAILED(m_hr))
{
ShowError(m_hr);
return m_hr;
}
pComments = result.pdispVal;
}
{
VARIANT result;
VariantInit(&result);
m_hr = OLEMethod(DISPATCH_PROPERTYGET, &result, m_pActiveDocument, (LPOLESTR)L"Content", 0);
if (FAILED(m_hr))
{
ShowError(m_hr);
return m_hr;
}
pRange = result.pdispVal;
}
Then we can start our loop to iterate through all comments in the document.
You can see how that’s done in our source code, but generally speaking, we start with the comment, go to the associated text, and check if the comment is resolved. Then we can either print it to a report, add it to a database, or send it to the ChatGPT API.
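The decision made inside that loop can be shown without any COM calls. In the sketch below, CommentInfo is a hypothetical plain structure standing in for the values read from Word (the comment text, the associated segment, and the "Done" property), and collect_unresolved keeps only the comments that still need attention. It is a platform-neutral illustration, not the code from our tool.

```cpp
#include <string>
#include <vector>

// Plain stand-in for what the automation calls return for one comment.
struct CommentInfo
{
    std::wstring commentText; // the editor's comment
    std::wstring rangeText;   // the marked segment the comment refers to
    bool resolved;            // value of the comment's "Done" property
};

// Returns the comments that are not resolved yet, preserving their order.
std::vector<CommentInfo> collect_unresolved(const std::vector<CommentInfo>& all)
{
    std::vector<CommentInfo> pending;
    for (const CommentInfo& comment : all)
    {
        if (!comment.resolved)
            pending.push_back(comment);
    }
    return pending;
}
```

Each entry in the returned list carries both the comment and its context, which is what a report, a database row, or a prompt would need.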
General Code for API Interfacing
To interface with any API over the web, we employ general code that facilitates sending requests and parsing responses using the JSON data format. In this process, we utilize libcurl, the library behind the curl command-line tool, which is widely used for transferring data across the network. It has extensive applications across different domains, including automobiles, televisions, routers, printers, audio equipment, mobile devices, set-top boxes, and media players. It serves as the internet transfer engine for numerous software applications, with billions of installations.
If you check our source code, you can see how libcurl is used.
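Before anything is handed to libcurl, a request needs three things: a URL, a set of headers, and a JSON body. The sketch below assembles them using only the standard library. HttpRequest, json_escape, and make_chat_request are hypothetical names of ours, and the hand-written JSON is only for illustration, since a JSON library handles more cases than this. The actual transfer (for instance via libcurl's CURLOPT_URL, CURLOPT_HTTPHEADER, and CURLOPT_POSTFIELDS options) is left out.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Escapes a string so it can be placed inside a JSON string literal.
std::string json_escape(const std::string& text)
{
    std::string out;
    for (unsigned char ch : text)
    {
        switch (ch)
        {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (ch < 0x20)
            {
                // Remaining control characters use the \u00XX form.
                char buf[8];
                std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(ch));
                out += buf;
            }
            else
            {
                out += static_cast<char>(ch);
            }
        }
    }
    return out;
}

// Hypothetical plain description of one HTTP POST request.
struct HttpRequest
{
    std::string url;
    std::vector<std::string> headers;
    std::string body;
};

// Assembles a minimal chat request with a single "user" message.
HttpRequest make_chat_request(const std::string& api_key,
                              const std::string& model,
                              const std::string& content)
{
    HttpRequest request;
    request.url = "https://api.openai.com/v1/chat/completions";
    request.headers = { "Content-Type: application/json",
                        "Authorization: Bearer " + api_key };
    request.body = "{\"messages\":[{\"content\":\"" + json_escape(content)
                 + "\",\"role\":\"user\"}],\"model\":\"" + json_escape(model) + "\"}";
    return request;
}
```

Escaping matters here because the text segments taken from a Word document can easily contain quotes and line breaks, which would otherwise produce an invalid payload.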
To sum up
By utilizing the power of MS Office automation and integrating it with the ChatGPT API, we empower editors and writers to streamline their workflow, saving valuable time and improving the quality of their work. The synergy between C++ and the ChatGPT API facilitates smooth and efficient interaction, enabling our tool to provide intelligent and context-aware recommendations for each editorial comment.
As a result, our small MS Office automation POC tool, powered by the ChatGPT API and C++, revolutionizes the editing process. By automating the extraction of editorial comments, interacting with ChatGPT to seek expert guidance, and seamlessly integrating the suggested edits, we empower users to enhance the quality and efficiency of their work in Word documents. This powerful combination of technologies opens new possibilities for efficient document editing and represents a significant leap forward in the field of MS Office automation.