OpenAI has become the latest firm in the AI space to offer multimodal inputs in its chatbot. With Google and Anthropic mounting stiff competition through advanced models such as Gemini and Claude 2, OpenAI is looking to stay ahead with its latest string of updates, which give ChatGPT the ability to listen to voice prompts and accept images from users. The updates promise a more holistic interaction between user and chatbot and should make prompting easier. They apply to both the GPT-3.5 and GPT-4 language models. The initial rollout, however, is limited to ChatGPT Plus and Enterprise customers, after which OpenAI intends to extend the features to free users worldwide.

The launch of Dall-E 3 and its integration with ChatGPT also come at an interesting time, as OpenAI finally takes its flagship offerings multimodal. With the firm looking to woo more corporate and enterprise customers, the ability to upload images and prompt the chatbot by voice makes it more versatile and easier to use. Since ChatGPT already has iOS and Android apps on the market, the voice feature is currently offered as an opt-in setting on mobile devices. As the rollout reaches a broader audience, users will gain a new way of interacting with AI. The following sections take a closer look at ChatGPT's new multimodal capabilities.

Behind ChatGPT’s Voice and Image Updates

The voice and image capabilities will essentially transform the nature of prompts.

ChatGPT's voice feature not only lets users speak their prompts but also enables the chatbot to respond with audio. Users can currently choose from five human-like synthetic voices that read responses aloud. This could go a long way toward highly capable AI voice assistants and assistive AI technologies. The voice feature relies on two models: Whisper, OpenAI's speech-recognition model, converts spoken input into text, while a separate OpenAI text-to-speech model generates the spoken responses for ChatGPT's read-aloud feature. Judging by a demonstration OpenAI gave shortly before the release, users may eventually be able to create and experiment with their own voices in the app. Together, these capabilities let OpenAI turn its flagship models into an application with broad consumer appeal.

As for the image feature, users can take a photo and upload it to the chat interface. ChatGPT then analyzes the image and carries out a task based on the user's prompt. On touchscreen devices, users can even use the drawing tool to circle or highlight the parts of an image they want ChatGPT to focus on. Because the chatbot is still prone to pitfalls such as bias and hallucination, OpenAI has restricted the image analyzer so that it does not make remarks about individuals in an image. The feature might eventually help enhance Dall-E further, which faces stiff competition from rivals such as Midjourney and Stable Diffusion.

Why These Updates Change the Nature of ChatGPT Prompting and More

ChatGPT aims to enhance user engagement with its latest updates.

Ever since chatbots grew popular, many users and tech professionals have realized that carefully engineered prompts often produce better responses. Visual and audio prompts give both enthusiasts and lay users an opportunity to experiment more with their favorite AI platform. The updates are expected to roll out in stages over the coming weeks to users around the world. While image input was already teased with the launch of GPT-4 earlier this year, the voice feature is new and should make the chatbot more accessible and inclusive. However, it is not yet clear how many languages ChatGPT will recognize or speak. The voice and image features also carry a fair degree of risk, since bad actors might find new ways to jailbreak the chatbot through them: voice inputs leave more room for complex prompts, while images could be used to manipulate the underlying LLM.

Regardless, OpenAI has been strengthening its AI safety credentials in recent months, building increasingly robust safeguards against prompts and uses that violate company policy. OpenAI has also partnered with other major names in consumer tech for this launch. Spotify, alongside OpenAI, announced a new feature that lets podcasters translate their episodes from English into other languages such as Spanish, German, and French. The feature reportedly uses OpenAI's voice-generation technology. More interestingly, the translated episodes keep the podcaster's own voice rather than a generic synthetic one, making for a more natural listening experience. These updates are already reshaping not only OpenAI's own chat interface but also products beyond it, through partnerships with well-known brands.

Outlook for the All-New ChatGPT

ChatGPT's updates could give it an edge in the market.

ChatGPT's latest updates make it a considerably more versatile AI interface. Users can now try new combinations of prompts and other inputs to get the results they want. Although OpenAI released the update only after considerable testing, users should still watch out for bugs and potential malfunctions. The updates are positioned to keep users engaged at a time when some feel ChatGPT's quality has been dipping, and they reflect OpenAI's push to build useful consumer-facing features while helping enterprise partners create stable AI applications. The coming weeks are likely to bring higher ChatGPT usage and renewed rivalry among AI firms competing for the top spot in a highly competitive market.

FAQs

1. Are the voice and image features in the new ChatGPT update free?

The new features will initially be available only to paying ChatGPT Plus subscribers and Enterprise customers. However, OpenAI has said that these features will also reach free users in the coming months.

2. Can one speak to ChatGPT through mobile devices?

Yes. Voice conversations are initially available only in the ChatGPT apps for Android and iOS, to users who opt in through the app's settings. Image input, by contrast, is available on all platforms, including desktop.

3. Can ChatGPT generate images?

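Yes, ChatGPT can now generate images following its integration with Dall-E 3. Users no longer need to craft detailed image prompts themselves, since ChatGPT turns a simple request into a detailed prompt for Dall-E 3.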