Say what you will? Your favorite speech-to-text app may be a privacy risk

ESET Expert
Jan 3, 2024

Typing with your voice? It should go without saying that you need to take some precautions and avoid spilling your secrets.

Software that swiftly and effortlessly converts spoken words into written text has been a boon for many of us. Its capabilities come in handy in various situations; for example, they can save us from the burden of typing our messages in chat apps, facilitate note-taking during meetings and interviews, and assist people with disabilities.

On the other hand, the proliferation of AI-powered audio-to-text transcription software continues to raise security and privacy concerns – and with good reason. In this article, we’ll look at some key security considerations associated with these apps and recommend simple steps for mitigating potential risks.

Risks associated with apps for transcribing audio

Privacy

There are a number of dedicated applications and bots that offer automated audio-to-text transcription. Indeed, at least some of this kind of functionality is also baked into many devices and their operating systems, as well as into popular chat and video conferencing apps.

The features, which rely on speech recognition and machine learning algorithms, can be provided either by the company behind the app or, especially where efficiency and speed are of the essence, by a third-party service. The latter option in particular, however, raises a slew of questions regarding data privacy.

Will the audio be used to improve the algorithm? Will it be stored on servers, either in-house or third-party, during the processing of the content? How is the transmission of this information secured, especially in cases where the audio processing is outsourced?

Meanwhile, manual transcription, which is performed by humans, clearly isn’t without its privacy risks either. This is particularly the case if the people transcribing the audio learn about people’s confidential information and/or if such information is shared with third-party contractors without users’ consent. For example, Facebook (now Meta) faced controversy in 2019 for paying hundreds of contractors to transcribe audio messages from the voice chats of some users on Messenger.

Data collection and storage

Many apps of all kinds request permissions to access various device or user information, such as location, contacts and chats in messaging apps – regardless of whether they need such permissions for their functionality. The collection of this information poses a risk if it is misused, shared with third parties without the user’s informed consent, or not properly secured on the servers of the company storing it.

Audio transcription apps, for example, tend to collect audio files that often capture the spoken words of not just one person, but possibly also those of their relatives, friends and colleagues. In the end, this may leave all of them vulnerable to cyberattacks or privacy breaches.
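
To make the point about permissions concrete, here is a minimal Python sketch that flags requested permissions beyond what a transcription feature plausibly needs. The permission names follow Android's naming, but the "expected" set and the helper name `unexpected_permissions` are assumptions made for illustration, not an authoritative baseline.

```python
# Assumption: what a speech-to-text app plausibly needs (illustrative only).
EXPECTED_PERMISSIONS = {"RECORD_AUDIO", "INTERNET"}


def unexpected_permissions(requested):
    """Return requested permissions outside the expected set, sorted for stable output."""
    return sorted(set(requested) - EXPECTED_PERMISSIONS)


# Hypothetical list of permissions shown on an app's store page.
requested = ["RECORD_AUDIO", "INTERNET", "READ_CONTACTS", "ACCESS_FINE_LOCATION"]
print(unexpected_permissions(requested))
```

Anything the function returns is not proof of wrongdoing, only a prompt to ask why a transcription app would need it.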

Malicious apps

If you’re in the market for speech-to-text software, you also need to watch out for fraudulent applications or chatbots. Cybercriminals, too, follow the latest trends, and given how popular this software has become, they could launch fake apps as a lure to compromise victims with malware.

These malicious apps may be copycats of legitimate applications, making it difficult for users to separate the wheat from the chaff. The bogus apps can be very successful in their malevolent mission if you don’t check the app’s legitimacy or who is behind it, let alone examine its privacy policy.

Cybercriminals have been spotted rolling out impostors for popular utility programs such as file converters and readers, video editors and keyboard apps. In fact, we have seen malicious apps that claimed to offer a range of functionalities, from PDF and QR code readers to language translators and image editors.

Information theft

Stolen audio and text can be weaponized for cyberattacks, including those involving audio deepfakes that can then be leveraged for social engineering attacks or the distribution of fake news.

The process would generally involve two steps: training the machine learning model and using the model itself.

In the first step, the model uses audio signal processing and natural language processing techniques to learn how words are pronounced and how sentences are structured. Once the model is trained with enough data, it would be able to generate text from an audio file.

An attacker could use the model to manipulate stolen audio and make victims say things they never said, with the aim of blackmailing or extorting them, or of impersonating them in order to trick their employers or relatives. They could also pose as a public figure to generate fake news stories.

Staying safe

Use trusted platforms

Use verified service providers that adhere to regulations such as GDPR and industry best practices, and source your apps from official mobile app stores. In other words, steer clear of unknown or unverified sources, which may expose you to malicious impostors.

Read the fine print

Examine the privacy policies of service providers, paying particular attention to sections about whether your speech data is stored and shared with third parties, who has access to it, and whether it is encrypted during transmission and in storage. Enquire about their data retention policies, as well as about whether any of your information is deleted on request. Ideally, you wouldn’t use services that collect such data or where the data is not anonymized.

Avoid sharing sensitive information

Refrain from sharing confidential or sensitive details, notably things like passwords or financial information, through speech-to-text software.
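
If a transcript already exists and is about to be shared further, a rough redaction pass can catch some obvious slips. The Python sketch below is illustrative only: the two patterns and the `redact` helper are assumptions, they will miss many kinds of sensitive data, and they do nothing about audio that has already been sent to a service.

```python
import re

# Illustrative patterns only; real detection of sensitive data is much harder.
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_LIKE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# A simple email shape: local part, "@", then a dotted domain.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text):
    """Replace card-like digit runs and email addresses with placeholders."""
    text = CARD_LIKE.sub("[REDACTED NUMBER]", text)
    return EMAIL.sub("[REDACTED EMAIL]", text)


print(redact("My card is 4111 1111 1111 1111 and mail jane.doe@example.com."))
```

Treat this as a last-resort check rather than a substitute for not dictating secrets in the first place.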

Update

Keep all your software up to date with the latest security updates and patches to avoid falling victim to attacks exploiting vulnerabilities in the software. To further boost your protection, use reputable multi-layered security software.
