Microsoft data leak: Microsoft AI researchers accidentally leaked dozens of terabytes of data, including private keys and passwords, while sharing a bucket of open-source training data on GitHub. What’s worrisome is that the data had been exposed since July 2020 and the company fixed the issue only recently.
According to a report by the security research firm Wiz, Microsoft’s AI research team, while publishing a bucket of open-source training data on GitHub, accidentally exposed 38 terabytes of private data, including secrets, private keys, passwords, over 30,000 internal Microsoft Teams messages and disk backups of two employees’ workstations.
Wiz researchers found a GitHub repository belonging to Microsoft’s AI research division. The repository had been created to provide open-source code and AI models for image recognition. Anyone who wanted to use the models was instructed to download them from an Azure Storage URL. That URL included something called a ‘Shared Access Signature (SAS) token’, a signed URL that grants access to data in Azure Storage. For the uninitiated, the access level of a SAS token can be customised by the user, with permissions ranging from read-only to full control. The expiry time is also completely customisable, allowing the user to create access tokens that effectively never expire.
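To make the permission and expiry settings concrete, here is a minimal Python sketch of how a SAS URL could be checked before it is shared. It relies only on the standard SAS query parameters (`sp` for permissions, `se` for expiry, `ss` and `srt` for account-level tokens). The function name `audit_sas_url`, the seven-day limit and the example URLs are illustrative assumptions, not something taken from the Wiz or Microsoft reports.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse, parse_qs

# Permission letters that let the holder change or remove data:
# w = write, d = delete, a = add, c = create.
WRITE_LIKE = set("wdac")


def parse_expiry(value):
    """Parse two common `se` formats into a UTC datetime (assumed subset)."""
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognised expiry format: {value!r}")


def audit_sas_url(url, now, max_lifetime=timedelta(days=7)):
    """Return a list of human-readable warnings for a SAS URL.

    Hypothetical helper: the checks and the default lifetime limit are
    illustrative choices, not an official Azure policy.
    """
    params = parse_qs(urlparse(url).query)
    findings = []

    # Anything beyond read (r) and list (l) lets the holder modify data.
    permissions = params.get("sp", [""])[0]
    risky = sorted(set(permissions) & WRITE_LIKE)
    if risky:
        findings.append("grants more than read/list: " + "".join(risky))

    # `ss` and `srt` appear only on account-level SAS tokens.
    if "ss" in params or "srt" in params:
        findings.append("account-level token, not scoped to one container or blob")

    # A missing `se` may mean the expiry lives in a stored access policy.
    expiry = params.get("se", [None])[0]
    if expiry is None:
        findings.append("no expiry (se) parameter in the URL")
    elif parse_expiry(expiry) - now > max_lifetime:
        findings.append(f"expires more than {max_lifetime.days} days from now")

    return findings


if __name__ == "__main__":
    # Made-up URL for illustration; the signature is a placeholder.
    url = ("https://example.blob.core.windows.net/models"
           "?sv=2020-08-04&ss=b&srt=sco&sp=rwdlac"
           "&se=2051-10-06T00:00:00Z&sig=REDACTED")
    for finding in audit_sas_url(url, now=datetime.now(timezone.utc)):
        print("WARNING:", finding)
```

A URL carrying only `sp=rl`, a container scope and an expiry a few days away would produce no warnings under these assumed rules.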
According to the Wiz researchers, the Azure Storage URL not only gave developers access to the open-source models, but also granted them permissions on the entire storage account. “In addition to the overly permissive access scope, the token was also misconfigured to allow ‘full control’ permissions instead of read-only. Meaning, not only could an attacker view all the files in the storage account, but they could delete and overwrite existing files as well,” Wiz wrote in a blog post.
“Our scan shows that this account contained 38TB of additional data — including Microsoft employees’ personal computer backups. The backups contained sensitive personal data, including passwords to Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from 359 Microsoft employees,” the company added.
The silver lining is that the Azure storage account itself was private and was not directly exposed to the public; it was reachable only through the overly permissive SAS URL. Wiz researchers reported the issue to Microsoft in June 2023, and it was resolved within two days.
Microsoft confirmed the issue in a post on its MSRC blog and said that customers do not need to take any action. “No customer data was exposed, and no other internal services were put at risk because of this issue. No customer action is required in response to this issue,” Microsoft wrote.
Author Name | Shweta Ganjoo