Windows Recall AI snapshots are stored in an unencrypted SQLite database in AppData. Someone on GitHub already has a Python script to extract them, and others have a tool to forcibly install Recall on unsupported devices. :marseyscream:

This tool extracts and displays data from the Recall feature in Windows 11, providing an easy way to access information about your PC's activity snapshots.

On May 20th, 2024, Microsoft announced its new Copilot+ PCs running on ARM architecture.

With this, they also announced Windows Copilot+ Recall, which will be released on June 18th, 2024.

Retrace your steps with Recall Search across time to find the content you need. Then, re-engage with it. With Recall, you have an explorable timeline of your PC's past. Just describe how you remember it and Recall will retrieve the moment you saw it. Any photo, link, or message can be a fresh point to continue from. As you use your PC, Recall takes snapshots of your screen. Snapshots are taken every five seconds while content on the screen is different from the previous snapshot. Your snapshots are then locally stored and locally analyzed on your PC. Recall's analysis allows you to search for content, including both images and text, using natural language. Trying to remember the name of the Korean restaurant your friend Alice mentioned? Just ask Recall and it retrieves both text and visual matches for your search, automatically sorted by how closely the results match your search. Recall can even take you back to the exact location of the item you saw.
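
The capture rule quoted above (a new snapshot only while the screen differs from the previous one) can be illustrated with a tiny Python sketch. This is only an illustration of that rule, not Microsoft's implementation: the function name `changed_frames` is made up, and frames are stood in for by plain byte strings.

```python
import hashlib


def changed_frames(frames):
    """Yield only the frames whose bytes differ from the previous frame."""
    previous = None
    for frame in frames:
        digest = hashlib.sha256(frame).digest()
        if digest != previous:
            yield frame
        previous = digest


# Four "screens" sampled in a row; the repeated one is skipped.
print(list(changed_frames([b"desktop", b"desktop", b"browser", b"desktop"])))
# prints: [b'desktop', b'browser', b'desktop']
```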

Requirements

To run or use this feature, you need one of the new Copilot+ PCs running on ARM. Some of them can be found here

How can I play with it if it's not released yet?

Some smart folks released AmperageKit, which shows how you can either emulate such an ARM machine locally or spin one up on Azure. I opted for the latter.

Technical Details

Earlier this month, Microsoft's CEO emailed all their staff saying “If you're faced with the tradeoff between security and another priority, your answer is clear: Do security.”

So, do they? Not quite. Windows Recall stores everything locally in an unencrypted SQLite database, and the screenshots are simply saved in a folder on your PC. Here's where you can find them:

C:\Users\$USER\AppData\Local\CoreAIPlatform.00\UKP{GUID}

The images are all stored in the following subfolder

.\ImageStore\

The database, ukg.db, is relatively straightforward in its structure, but it holds a wealth of information.
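
A minimal Python sketch of locating these two artifacts. The helper name `find_recall_artifacts` is hypothetical, and the search root is passed in rather than hard-coded. It assumes only that a file named ukg.db and a folder named ImageStore sit somewhere below that root.

```python
from pathlib import Path


def find_recall_artifacts(root):
    """Return (db_path, image_dir) found below root; None for anything missing.

    Hypothetical helper: it relies only on the names mentioned above
    (ukg.db and ImageStore), not on any particular GUID folder name.
    """
    root = Path(root)
    # rglob walks the whole tree, so the GUID folder does not need to be known.
    db_path = next((p for p in root.rglob("ukg.db") if p.is_file()), None)
    image_dir = next((p for p in root.rglob("ImageStore") if p.is_dir()), None)
    return db_path, image_dir
```

On a machine with Recall, the root would presumably be something like `Path.home() / "AppData" / "Local" / "CoreAIPlatform.00"`, following the path given above.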

So what does the tool do?

TotalRecall copies the databases and screenshots and then parses the database for potentially interesting artifacts. You can define dates to limit the extraction as well as search for strings (that were extracted via Recall OCR) of interest. There is no rocket science behind all this. It's very basic SQLite parsing.

$ totalrecall.py -h

usage: totalrecall.py [-h] [--from_date FROM_DATE] [--to_date TO_DATE] [--search SEARCH]

Extract and display Windows Recall data.

options:

-h, --help show this help message and exit

--from_date FROM_DATE The start date in YYYY-MM-DD format.

--to_date TO_DATE The end date in YYYY-MM-DD format.

--search SEARCH Search term for text recognition data.
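
For reference, an `argparse` setup along these lines would produce a help screen like the one above. This is a sketch reconstructed from the help text, not the tool's actual source. The function name `build_parser` is an assumption.

```python
import argparse


def build_parser():
    """Build a parser with the three options shown in the help output."""
    parser = argparse.ArgumentParser(
        prog="totalrecall.py",
        description="Extract and display Windows Recall data.",
    )
    parser.add_argument("--from_date", help="The start date in YYYY-MM-DD format.")
    parser.add_argument("--to_date", help="The end date in YYYY-MM-DD format.")
    parser.add_argument("--search", help="Search term for text recognition data.")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--search", "password", "--from_date", "2024-06-04"])
print(args.search, args.from_date, args.to_date)
# prints: password 2024-06-04 None
```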

Example Output

$ totalrecall.py --search password --from_date 2024-06-04 --to_date 2024-06-04

📁 Recall folder found: C:\Users\alex\AppData\Local\CoreAIPlatform.00\UKP{D87DDB65-90BE-4399-BB1B-5BEB0B1D12CB}

🟢 Windows Recall feature found. Do you want to proceed with the extraction? (yes/no): yes

📂 Creating extraction folder: C:\Users\alex\Downloads\TotalRecall\2024-06-04-13-49_Recall_Extraction

🪟 Captured Windows: 133

📸 Images Taken: 36

🔍 Search results for 'password': 22

📄 Summary of the extraction is available in the file:

C:\Users\alex\Downloads\TotalRecall\2024-06-04-13-49_Recall_Extraction\TotalRecall.txt

📂 Full extraction folder path:

C:\Users\alex\Downloads\TotalRecall\2024-06-04-13-49_Recall_Extraction

How TotalRecall Works

Data Extraction:

TotalRecall copies the ukg.db database and the ImageStore folder to a specified extraction folder. This ensures the original data remains intact while you explore the extracted data.
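
The copy step could look roughly like this in Python. It is a sketch, not the tool's code: `copy_recall_data` is a made-up name, and the folder naming simply mimics the example output above.

```python
import shutil
from datetime import datetime
from pathlib import Path


def copy_recall_data(db_path, image_dir, dest_root):
    """Copy the database and image folder into a new timestamped folder."""
    stamp = datetime.now().strftime("%Y-%m-%d-%H-%M")
    target = Path(dest_root) / f"{stamp}_Recall_Extraction"
    target.mkdir(parents=True, exist_ok=True)
    # The sources are only read, so the original data stays untouched.
    shutil.copy2(db_path, target / "ukg.db")
    shutil.copytree(image_dir, target / "ImageStore", dirs_exist_ok=True)
    return target
```

All later steps then work on the copy inside the returned folder.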

Database Parsing:

It parses the SQLite database to extract potentially interesting artifacts, such as window titles, timestamps, and image tokens. The tool looks for entries that match the criteria you specify (e.g., date range, search terms).
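
Since the table and column names are not listed here, a safe first step is to ask the copied database what it contains. The sketch below uses only Python's standard `sqlite3` module. The function name `describe_database` is an assumption.

```python
import sqlite3
from pathlib import Path


def describe_database(db_path):
    """Map each table name in the database to its list of column names."""
    # mode=ro opens the file read-only, so exploring cannot modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        return {
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()
```

Printing the result for the copied ukg.db shows which tables hold the window titles, timestamps, and image tokens mentioned above.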

Screenshot Management:

TotalRecall renames the image files in the ImageStore folder with a .jpg extension if they don't already have one. This makes it easier to view and manage the screenshots.
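
A rough sketch of that renaming step, meant to run on the copied ImageStore folder rather than the original. The name `add_jpg_extension` is hypothetical.

```python
from pathlib import Path


def add_jpg_extension(image_dir):
    """Append .jpg to every file in image_dir that does not already end in .jpg."""
    renamed = []
    # sorted() materializes the listing first, so renaming is safe mid-loop.
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file() and path.suffix.lower() != ".jpg":
            new_path = path.with_name(path.name + ".jpg")
            path.rename(new_path)
            renamed.append(new_path)
    return renamed
```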

Search Functionality:

You can search for specific terms within the database, leveraging the Optical Character Recognition (OCR) capabilities of Windows Recall. This means you can find text that appeared on your screen, even if it was within an image.
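
Because the OCR text sits in the database as plain text, the search boils down to a `LIKE` query. The sketch below runs against a throwaway in-memory table. The table name `ocr_text` and column name `content` are placeholders, not the real Recall schema.

```python
import sqlite3


def search_text(conn, term, table="ocr_text", column="content"):
    """Return the stored texts that contain term (case-insensitive for ASCII)."""
    # Identifiers cannot be bound as parameters, so only pass trusted table
    # and column names. The search term is bound; % and _ in it act as wildcards.
    query = f'SELECT "{column}" FROM "{table}" WHERE "{column}" LIKE ? ORDER BY rowid'
    return [row[0] for row in conn.execute(query, (f"%{term}%",))]


# Demo data standing in for text recognized on screen.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ocr_text (content TEXT)")
conn.executemany("INSERT INTO ocr_text VALUES (?)",
                 [("Enter your Password",), ("Weather forecast",)])
print(search_text(conn, "password"))
# prints: ['Enter your Password']
```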

Output Generation:

The tool generates a summary of the extracted data, including counts of captured windows and images taken. It also creates a detailed report in a text file, listing all the captured data and search results.
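
A sketch of how such a report could be written. The function name `write_summary` and the report layout are assumptions. Only the three count lines follow the example output above.

```python
from pathlib import Path


def write_summary(report_path, window_titles, image_count, search_term=None, hits=()):
    """Write a plain-text report and return the summary (count) lines."""
    summary = [
        f"Captured Windows: {len(window_titles)}",
        f"Images Taken: {image_count}",
    ]
    if search_term is not None:
        summary.append(f"Search results for '{search_term}': {len(hits)}")
    # The detailed part lists every window title, then every search hit.
    body = summary + ["", "Captured windows:", *window_titles]
    if search_term is not None:
        body += ["", f"Search results for '{search_term}':", *hits]
    Path(report_path).write_text("\n".join(body) + "\n", encoding="utf-8")
    return summary
```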

Key Features

Date Filtering:

Specify start and end dates to limit the extraction to a particular time frame.

Text Search:

Search for specific text within the captured data, making it easy to find relevant information.

Comprehensive Reports:

Generate detailed reports summarizing the captured windows, images, and search results, all stored in a TotalRecall.txt file for easy reference.
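
The date filter can be sketched as below. It assumes the database stores capture times as Unix timestamps in milliseconds, which is an assumption not stated here. The helper names `date_range_to_ms` and `in_range` are made up.

```python
from datetime import datetime, timedelta


def date_range_to_ms(from_date=None, to_date=None):
    """Convert YYYY-MM-DD bounds to (start_ms, end_ms); None means unbounded.

    The end bound is exclusive and falls on the midnight after to_date, so
    using the same day for both bounds covers that whole (local) day.
    """
    start = end = None
    if from_date:
        start = int(datetime.strptime(from_date, "%Y-%m-%d").timestamp() * 1000)
    if to_date:
        day_after = datetime.strptime(to_date, "%Y-%m-%d") + timedelta(days=1)
        end = int(day_after.timestamp() * 1000)
    return start, end


def in_range(ts_ms, start, end):
    """True if ts_ms lies in [start, end), treating None as no limit."""
    return (start is None or ts_ms >= start) and (end is None or ts_ms < end)
```

With `--from_date 2024-06-04 --to_date 2024-06-04`, as in the example run, only captures from June 4th would pass `in_range`.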

TotalRecall provides a straightforward way to explore the data collected by Windows Recall. It's no rocket science whatsoever.

FAQ

Kevin Beaumont (@GossiTheDog) wrote a very good article about the Recall disaster as well with a spot-on FAQ that I will blatantly steal with his permission.

Q. The data is processed entirely locally on your laptop, right?

A. Yes! They made some smart decisions here, there's a whole subsystem of Azure AI etc code that processes on the edge.

Q. Cool, so hackers and malware can't access it, right?

A. No, they can.

Q. But it's encrypted.

A. When you're logged into a PC and run software, things are decrypted for you. Encryption at rest only helps if somebody comes to your house and physically steals your laptop — that isn't what criminal hackers do.

For example, InfoStealer trojans, which automatically steal usernames and passwords, have been a major problem for well over a decade. Now these can just be easily modified to support Recall.

Q. But the BBC said data cannot be accessed remotely by hackers.

A. They were quoting Microsoft, but this is wrong. Data can be accessed remotely.

https://i.rdrama.net/images/1717692350698905.webp

This is what the journ*list was told for some reason:

https://i.rdrama.net/images/17176923506064596.webp

Q. Microsoft say only that user can access the data.

A. This isn't true, I can demonstrate another user account on the same device accessing the database.

Q. So how does it work?

A. Every few seconds, screenshots are taken. These are automatically OCR'd by Azure AI, running on your device, and written into an SQLite database in the user's folder.

This database file has a record of everything you've ever viewed on your PC in plain text. OCR is a process of looking at an image and extracting the letters.

Q. What does the database look like?

A:

Q. How do you obtain the database files?

A. They're just files in AppData, in the new CoreAIPlatform folder.

Q. But it's highly encrypted and nobody can access them, right?!

A. Here's a few second video of two Microsoft engineers accessing the folder:

Q. …But, normal users don't run as admins!

A. According to Microsoft's own website, in their Recall rollout page, they do:

https://i.rdrama.net/images/17176923505249557.webp

In fact, you don't even need to be an admin to read the database — more on that in a later blog.

Q. But a UAC prompt appeared in that video, that's a security boundary.

A. According to Microsoft's own website (and MSRC), UAC is not a security boundary:

https://i.rdrama.net/images/1717692350403281.webp

Q. So… where is the security here?

A. They have tried to do a bunch of things but none of it actually works properly in the real world due to gaps you can drive a plane through.

Q. Does it automatically not screenshot and OCR things like financial information?

A. No:

https://i.rdrama.net/images/17176923503231003.webp

Q. How large is the database?

A. It compresses well; several days of work is around 90 KB. You can exfiltrate several months of documents and key presses in the space of a few seconds with an average broadband connection.

Q. How fast is search?

A. On device, really fast.

Q. Have you exfiltrated your own Recall database?

A. Yes. I have automated exfiltration, and made a website where you can upload a database and instantly search it.

I am deliberately holding back technical details until Microsoft ship the feature as I want to give them time to do something.

I actually have a whole bunch of things to show and think the wider cyber community will have so much fun with this when generally available.. but I also think that's really sad, as real world harm will ensue.

Q. What kind of things are in the database?

A. Everything a user has ever seen, ordered by application. Every bit of text the user has seen, with some minor exceptions (e.g. Microsoft Edge InPrivate mode is excluded, but Google Chrome isn't).

Every user interaction, e.g. minimizing a window. There is an API for user activity, and third party apps can plug in to enrich data and also view stored data.

It also stores all websites you visit, even if third party.

Q. If I delete an email/WhatsApp/Signal/Teams message, is it deleted from Recall?

A. No, it stays in the database indefinitely.

Q. Are auto deleting messages in messaging apps removed from Recall?

A. No, they're scraped by Recall and available.

Q. But if a hacker gains access to run code on your PC, it's already game over!

A. If you run something like an info stealer, at present they will automatically scrape things like credential stores. At scale, hackers scrape rather than touch every victim (because there are so many) and resell them in online marketplaces.

Recall enables threat actors to automate scraping everything you've ever looked at within seconds.

While testing this with an off-the-shelf infostealer, I used Microsoft Defender for Endpoint, which detected the infostealer, but by the time the automated remediation kicked in (which took over ten minutes) my Recall data was already long gone.

Q. Does this enable mass data breaches of websites?

A. Yes. The next time you see a major data breach where customer data is clearly visible in the breach, you're going to presume the company that processes the data is at fault, right?

But if people have used a Windows device with Recall to access the service/app/whatever, hackers can see everything and assemble data dumps without the company who runs the service even being aware. The data is already consistently structured in the Recall database for attackers.

So prepare for AI powered super breaches. Currently credential marketplaces exist where you can buy stolen passwords — soon, you will be able to buy stolen customer data from insurance companies etc as the entire code to do this has been preinstalled and enabled on Windows by Microsoft.

Q. Did Microsoft mislead the BBC about the security of Copilot?

A. Yes.

Q. Have Microsoft misled customers about the security of Copilot?

A. Yes. For example, they describe it as an optional experience — but it is enabled by default and people can optionally disable it. That's wordsmithing.

Microsoft's CEO referred to “screenshots” in an interview about the product, but the product itself only refers to “snapshots” — a snapshot is actually a screenshot. It's again wordsmithing for whatever reason. Microsoft just need to be super clear about what this is, so customers can make an informed choice.

Q. Recall only applies to 1 hardware device!

A. That isn't true. There are currently 10 Copilot+ devices available to order right now from every major manufacturer:

https://www.microsoft.com/en-gb/windows/copilot-plus-pcs#shop

Additionally, Microsoft's website says they are working on support for AMD and Intel chipsets. Recall is coming to Windows 11.

Q. How do I disable Recall?

A. In initial device setup for compatible Copilot+ devices out of the box, you have to click through options to disable Recall.

In enterprise, you have to turn off Recall as it is enabled by default:

https://i.rdrama.net/images/17176923501077821.webp

WindowsAI Policy CSP - Windows Client Management

Learn more about the WindowsAI Area in Policy CSP

learn.microsoft.com

The Group Policy object for this has apparently been renamed (the MS documentation is incorrect):

Q. What are the privacy implications? Isn't this against GDPR?

A. I am not a privacy person or a legal person.

I will say that privacy people I've talked to are extremely worried about the impacts on households in domestic abuse situations and such.

Obviously, from a corporate point of view organisations should absolutely consider the risk of processing customer data like this — Microsoft won't be held responsible as the data processor, as it is done at the edge on your devices — you are responsible here.

Q. Are Microsoft a big, evil company?

A. No, that's insanely reductive. They're super smart people, and sometimes super smart people make mistakes. What matters is what they do with knowledge of mistakes.

Q. Aren't you the former employee who hates Microsoft?

A. No. I just wrote a blog this month praising them:

Breaking down Microsoft's pivot to placing cybersecurity as a top priority

My thoughts on Microsoft's last chance saloon moment on security

doublepulsar.com

Q. Is this really as harmful as you think?

A. Go to your parents' house, your grandparents' house etc. and look at their Windows PC, look at the software installed in the past year, and try to use the device. Run some antivirus scans. There's no way this implementation doesn't end in tears. There's a reason there's a trillion dollar security industry, and that most problems revolve around malware and endpoints.

Q. What should Microsoft do?

A. In my opinion — they should recall Recall and rework it to be the feature it deserves to be, delivered at a later date. They also need to review the internal decision making that led to this situation, as this kind of thing should not happen.

https://github.com/xaitax/TotalRecall

https://github.com/thebookisclosed/AmperageKit

!chuds :marseywindows: :marseycry:

What is the purpose of this? Allowing employers to spy on their employees? Allowing Microsoft to harvest data/behaviors from users?

I'm gonna play devil's advocate for a minute.

If this could be made secure, which it can't, but let's say secure ENOUGH somehow, I would want it.

At the moment, I have screenshots, notes, browser history and bookmarks, highlights in pdfs, and memorised file locations. Add my phone in and I've got highlights in books as well.

It would be sweet to have all of that in the same place and searchable, without me having to remember to specifically bookmark/highlight/screenshot/note it, and where to look for it later.

Imagine being so frickibg stupid that you cant just put the stuff you want to fibd again in a .txt on your goddarn desktop. Darn you dumb BIPOC

I don't always know what I'm going to want again until later :marseyretard2:

This is why you take detailed notes on Google Docs like a neurodivergent EVERY TIME

Note: Highlighted text, page x, book x, date, reason, keywords, filepath/physical location

I refuse to let AI strip me of my neurodivergent powers. :#marseyautism:

If Microsoft wanted to build a productivity product to help you better organize your learning or work they could just make it an app and give you control over what you want to be monitored/processed so that it could focus specifically on the files/sites/content you need.

I find built-in, OS-level, always-on monitoring way more sinister. I feel like it would be useful for employers to track the behaviors of employees. Since there is a huge pushback against webcam monitoring, this might be a way for them to get around it by saying "we just monitor your activity on your work device" and achieving the same result. I can also see Microsoft being interested in how their customers use their products for development/marketing purposes or to sell data to other application developers or advertisers. Since other apps obviously aren't allowed to build this monitoring into their own apps, Microsoft could make a lot of money off this.

Straight up there are applications that do this like AnyTXT Searcher.

Just add image ocr and routine screenshots and you've got something just as good.

Why would I homebrew all of that shit when moicrosoft are making a better version tho?

which it can't

easily possible

Possible maybe but I dunno about easily. I guess you're thinking of keeping it on-disk and encrypted at rest until in use.

I'm an r-slur but.

It still needs to be encrypted. I guess this can be done using some asymmetric cryptography public key shit - anyway, something where it's encrypted automatically without needing the user's key or any password input.

Decrypting it for use is the trickier bit. If that happens automatically, then the decryption key needs to be also stored locally.

If it's stored on-disk, that's not secure. If it requires a password, then it's also not secure (due to government forcing you to divulge your password, keyloggers and rubber hose beatings), and it will turn off most convenience-loving r-slurs. And if it requires a hardware security module, that's secure but you can't recover it if you lose your device or some other shit happens.

That's all without considering just waiting for it to be decrypted and then swiping the unencrypted version.

Still, some of this is ‘secure enough' as I said. I know a lot of it is redundant (physical access is total access and so on) anyway. It is a major security risk by nature though, kind of no matter how you try to secure it

think we had different definitions of secure enough in mind. sure government beating you up basically impossible to secure

Not one single person is gonna read all that

finding that one image you half remember seeing 2.5 years ago (good reason to store 2tb of screenshots)
