An intriguing new product to emerge from this year’s Microsoft Ignite conference is a tool that lets users create a lifelike avatar of a person and animate it to say things the actual person may never have said.
Known as Azure AI Speech text-to-speech avatar, the tool lets users generate videos of an avatar speaking by writing a script and uploading photographs of the person the avatar should resemble. It is now available in public preview. A text-to-speech model, either prebuilt or trained on the user’s own voice, reads the script aloud while Microsoft’s technology drives the animation.
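Under the hood, a workflow like this is typically driven by a REST request that pairs a text script with avatar and voice settings. The sketch below assembles an illustrative request body for such a synthesis job; the field names, avatar identifiers, and voice name are assumptions for illustration, not Microsoft’s documented API.

```python
import json


def build_avatar_request(script_text,
                         voice="en-US-JennyNeural",
                         avatar_character="lisa",
                         avatar_style="casual-sitting"):
    """Assemble an illustrative JSON body for a text-to-speech avatar
    synthesis job. All field names here are hypothetical placeholders."""
    return {
        "inputKind": "PlainText",
        "inputs": [{"content": script_text}],
        "synthesisConfig": {"voice": voice},
        "avatarConfig": {
            "talkingAvatarCharacter": avatar_character,  # which avatar to render
            "talkingAvatarStyle": avatar_style,          # pose/framing preset
            "videoFormat": "mp4",
        },
    }


body = build_avatar_request("Welcome to our product introduction.")
print(json.dumps(body, indent=2))
```

In practice, a client would POST a body like this to the service and poll for the finished video; the point here is simply that the entire input is a text script plus configuration, which is what makes the feature so easy to misuse.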
As Microsoft put it in a blog post, “With text-to-speech avatars, users can more efficiently create videos… to build training videos, product introductions, customer testimonials, and so on, simply with text input,” and “conversational agents, virtual assistants, chatbots, and more can be built with the avatar.”
Avatars can speak multiple languages. In chatbot scenarios, they can also draw on AI models such as OpenAI’s GPT-3.5 to respond to customers’ off-script questions.
Microsoft acknowledges that this kind of technology could be abused in plenty of ways. (Pro-China social media accounts and propaganda operations in Venezuela have used comparable avatar-generating technology from the AI company Synthesia to fabricate news segments.) Personalized avatars are a “limited access” feature, available by registration only and “only for certain use cases,” according to Microsoft. At launch, most Azure customers can access only prebuilt, not personalized, avatars.
Still, the feature raises some awkward ethical questions. AI-generated digital likenesses were one of the central issues behind the recent SAG-AFTRA strike, which ultimately ended with studios agreeing to compensate performers for the use of their AI-generated likenesses. But what about Microsoft’s customers?
I asked Microsoft for its stance on businesses using actors’ likenesses without fair compensation, or even notification, of the performers. The company didn’t respond, nor would it say whether it will require businesses to label avatars as AI-generated creations, as YouTube and a growing number of other platforms now do.
Personal voice
Microsoft appears to be placing tighter restrictions on personal voice, a related generative AI technology also unveiled at Ignite. Personal voice, a new capability within Microsoft’s custom neural voice service, can replicate a user’s voice in a matter of seconds when given a one-minute speech sample as an audio prompt. Microsoft pitches it as a way to build customized voice assistants, dub video content into other languages, and generate bespoke narration for podcasts, audiobooks, and stories.
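Voice-cloning services of this kind are usually invoked through SSML, where the request references a previously enrolled (and consented) speaker profile rather than shipping the raw audio sample each time. The sketch below builds such an SSML document; the element names, base voice, and `speakerProfileId` scheme are illustrative assumptions, not a confirmed Microsoft schema.

```python
def build_personal_voice_ssml(text, speaker_profile_id,
                              base_voice="DragonLatestNeural",
                              lang="en-US"):
    """Build an illustrative SSML request for personalized voice synthesis.

    speaker_profile_id is a hypothetical handle to a voice profile that was
    enrolled from a one-minute sample with the speaker's recorded consent.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="http://www.w3.org/2001/mstts" '
        f'xml:lang="{lang}">'
        f'<voice name="{base_voice}">'
        # hypothetical element tying the request to the enrolled profile
        f'<mstts:ttsembedding speakerProfileId="{speaker_profile_id}">'
        f'{text}'
        '</mstts:ttsembedding></voice></speak>'
    )


ssml = build_personal_voice_ssml("Welcome to the show.", "profile-1234")
print(ssml)
```

Routing synthesis through a profile ID, rather than raw audio, is what lets the provider gate access: a profile only exists if the enrollment step (including the consent recording Microsoft requires) has been completed.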
To head off legal issues, Microsoft requires users to give “explicit consent” in the form of a recorded statement before they can use personal voice synthesis. Customers must also agree to use a personal voice only in applications “where the voice does not read user-generated or open-ended content.” Access to the feature is likewise gated behind a registration form.
As Microsoft states in a blog post, “voice model usage must remain within an application, and output must not be publishable or shareable from the application,” and “customers who meet the requirements for limited access have complete control over the creation, distribution, and use of voice models and their output [in the context of] dubbing for movies, TV, videos, and audio for entertainment purposes only.”
Microsoft declined to comment when TechCrunch asked whether performers who contribute their voices might be compensated, or whether it plans any watermarking technology that would make AI-generated voices easier to identify.