DALL-E 3, OpenAI’s text-to-image model, is now accessible via an API, following its initial integration with ChatGPT and Bing Chat. The API retains built-in moderation features designed to guard against misuse. It offers several format and quality options, with resolutions from 1024×1024 to 1792×1024 and pricing starting at $0.04 per generated image. Compared to its predecessor, however, the DALL-E 3 API has some limitations. Unlike DALL-E 2, it can't be used to modify existing images or create variations. In addition, generation requests sent to DALL-E 3 are automatically rewritten "for safety reasons" and to add more detail, which can make the output match the original prompt less precisely.
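To make the options above concrete, here is a minimal Python sketch that assembles a request body for an image generation call and estimates cost at the starting price. The helper names (`build_image_request`, `estimate_image_cost`) are hypothetical, and the field names (`model`, `prompt`, `size`, `quality`, `n`) and the portrait size `1024x1792` reflect our understanding of OpenAI's image endpoint rather than anything stated in the announcement, so check them against the current API reference before relying on them.

```python
# Sizes mentioned in the article, plus the portrait counterpart (an assumption).
ALLOWED_SIZES = {"1024x1024", "1792x1024", "1024x1792"}

# Starting price per generated image, as quoted in the article (USD).
STARTING_PRICE_PER_IMAGE = 0.04


def build_image_request(prompt: str, size: str = "1024x1024",
                        quality: str = "standard") -> dict:
    """Return a JSON-serializable request body (hypothetical helper).

    Field names are assumptions about the image generation endpoint.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    return {
        "model": "dall-e-3",
        "prompt": prompt,
        "size": size,
        "quality": quality,
        "n": 1,  # one image per request in this sketch
    }


def estimate_image_cost(num_images: int,
                        price_per_image: float = STARTING_PRICE_PER_IMAGE) -> float:
    """Lower-bound cost estimate: larger or higher-quality images may cost more."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images * price_per_image


if __name__ == "__main__":
    body = build_image_request("a lighthouse at dusk, watercolor", size="1792x1024")
    print(body)
    print(f"25 images cost at least ${estimate_image_cost(25):.2f}")
```

Because prompts are rewritten automatically, an application that needs to know what was actually rendered should, as far as we are aware, read the revised prompt returned alongside each image rather than assume the submitted text was used verbatim.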
OpenAI has also introduced an Audio API, providing access to six preset voices (Alloy, Echo, Fable, Onyx, Nova, and Shimmer) along with two generative AI model variants. The API, available today, is priced starting at $0.015 per 1,000 input characters. OpenAI's CEO, Sam Altman, emphasized the naturalness of the generated audio, noting its potential to enhance app interactions, accessibility, language learning, and voice assistance. OpenAI doesn't offer direct control over the emotional affect of the generated audio, though certain features of the text, such as capitalization and grammar, may influence the voices' tone.
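The per-character pricing is easy to reason about with a short calculation. The Python sketch below estimates cost at the quoted starting rate and assembles a speech request body. The helper names are hypothetical, and the field names (`model`, `voice`, `input`), the lowercase voice identifiers, and the model name `tts-1` are assumptions about the speech endpoint that should be verified against OpenAI's documentation.

```python
# The six preset voices named in the article, lowercased (assumed identifier form).
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")

# Starting price quoted in the article: USD per 1,000 input characters.
STARTING_PRICE_PER_1K_CHARS = 0.015


def estimate_speech_cost(text: str,
                         price_per_1k: float = STARTING_PRICE_PER_1K_CHARS) -> float:
    """Estimate cost from the number of input characters (hypothetical helper)."""
    return len(text) / 1000 * price_per_1k


def build_speech_request(text: str, voice: str = "alloy",
                         model: str = "tts-1") -> dict:
    """Return a JSON-serializable request body; field names are assumptions."""
    voice = voice.lower()
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if not text:
        raise ValueError("text must not be empty")
    return {"model": model, "voice": voice, "input": text}


if __name__ == "__main__":
    script = "Welcome back! Your order has shipped."
    print(build_speech_request(script, voice="Nova"))
    print(f"Estimated cost: ${estimate_speech_cost(script):.6f}")
```

At this rate, a 2,000-character script costs about $0.03, so text length is the main driver of cost in this sketch.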
OpenAI also announced the latest version of its open-source automatic speech recognition model, Whisper large-v3. The update offers improved performance across a range of languages and is available on GitHub under a permissive license.
These API releases reflect OpenAI's commitment to expanding the accessibility and utility of advanced AI technologies. As the technology landscape continues to evolve, OpenAI is at the forefront of empowering developers and creators with innovative tools.