DigitalForensick

DigitalForensick@lemmy.world · 5 hours ago

For anyone looking into doing some OSINT work, this is an epic file EFTA00809187

It contains lists of ALL known JE emails, usernames, websites, social media accounts, etc. from that time

DigitalForensick@lemmy.world · 5 hours ago

Nice work man! I also discovered something yesterday that I think is worth pointing out.

DUPLICATE FILES: Within the datasets, there are often emails, doc scans, etc. that are duplicate entries. (I’m not talking about multi-torrent stitching, but actual duplicate documents within the raw dataset.) **These duplicates must be preserved.** When looking at two copies of the same duplicate file, I found that sometimes the redactions are in different places! This can be used to extract more info later down the road.

DigitalForensick@lemmy.world · 5 hours ago

nothing, but even the archived pages aren’t 100% because some of the files were “faked” in the paginated file lists on the DOJ site. It does work well enough though. I did this to recover all the court records and FOIA files

DigitalForensick@lemmy.world · 1 day ago

What does this contain? anything new?

DigitalForensick@lemmy.world · 1 day ago

I’ve been thinking a lot about this whole thing. I don’t want to be worried or fearful here - we have done nothing wrong! Anything we have archived was provided to us directly by them in the first place. There are whispers all over the internet, random torrents being passed around, conspiracies, etc., but what are we actually doing other than freaking ourselves out (myself at least) and going viral with an endless stream of “OMG LOOK AT THIS FILE” videos/posts.

I vote to remove any of the ‘concerning’ files and backfill with blank placeholder PDFs with justification, then collect everything we have so far, create file hashes, and put out a clean + stable archive of everything we have so far: a safe, indexed archive. We wipe away any concerns and can proceed methodically through the blood trail of documents, resulting in an obvious and accessible collection of evidence. From there we can actually start organizing to create a tool that can be used to crowdsource tagging, timestamping, and parsing the data. I’m a developer and am happy to offer my skillset.
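For the “create file hashes” step, here is a minimal sketch of what a manifest builder could look like. It is only an illustration, not an agreed-upon tool: the function names (`sha256_of`, `build_manifest`, `write_manifest`) and the choice of SHA-256 are my own assumptions, and it uses nothing beyond the Python standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def write_manifest(manifest: dict[str, str], out_file: Path) -> None:
    """Write 'hash  path' lines, the layout that `sha256sum -c` reads."""
    lines = [f"{digest}  {name}" for name, digest in manifest.items()]
    out_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Write the manifest somewhere outside the archive folder so it does not end up hashing itself on the next run. Anyone holding a copy can then compare their manifest against the published one to confirm they have the same bytes.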

Taking a step back - It’s fun to do the “digital sleuth” thing for a while, but then what? We have the files…(mostly)… Great. We all have our own lives, jobs, and families, and taking actual time to dig into this and produce a real solution that can actually make a difference is a pretty big ask. That said, this feels like a moment where we finally can make an actual difference and I think it’s worth committing to. If any of you are interested in helping beyond archival, please lmk.

I just downloaded matrix, but I’m new to this, so I’m not sure how that all works. Happy to link up via discord, matrix, email, or whatever.

DigitalForensick@lemmy.world · edit-2 2 days ago

this dude on pastebin posted his filetree in his epstein ubuntu env - i have a high confidence in whatever lives in his DataSet9Complete.zip file haha

DigitalForensick@lemmy.world · 2 days ago

same

DigitalForensick@lemmy.world · 2 days ago

Have a scraper running on web.archive.org pulling all previously posted Court-Records and FOIA (docs,audio,etc.) from Jan 30th

DigitalForensick@lemmy.world · 2 days ago

Holy shit

The entire Court Records and FOIA page is completely gone too! Fuckers!

DigitalForensick@lemmy.world · 2 days ago

Does anyone have the OTHER data sets from before? I’ve been lasered in on the DS1-DS12 but haven’t looked at the other documents at all

DigitalForensick@lemmy.world · 2 days ago

this is ridiculous. Good thing we got in when we did!

DigitalForensick@lemmy.world · 2 days ago

While I feel hopeful that we will be able to reconstruct the archive and create some sort of baseline that can be put back out there, I also can’t stop thinking about the “and then what” aspect here. We’ve seen our elected officials do nothing with this info over and over again and I’m worried this is going to repeat itself.

I’m fully open to input on this, but I think having a group path forward is useful here. These are the things I believe we can do to move the needle.

Right Now:

Create a clean Data Archive for each of the known datasets (01-12). Something that is actually organized and accessible.
Create a working Archive Directory containing an “itemized” reference list (SQL DB?) of the full Data Archive, with each document listed as a row with certain metadata. Imagining a GitHub repo that we can all contribute to as we work.
– File number
– Dir. location
– File type (image, legal record, flight log, email, video, etc.)
– File status (Redacted bool, Missing bool, Flagged bool)
Infill any MISSING records where possible.
Extract images from the .pdf format and break out the “Multi-File” PDFs, renaming images/docs by file number. (I made a quick script that does this reliably well.)
Determine which files were left as CSAM and “redact” them ourselves, removing any liability on our part.
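
To make the Archive Directory idea a bit more concrete, here is a rough sketch of the “one row per document” table using SQLite from the Python standard library. Treat it as a starting point for discussion only: the table name, column names, and helper functions are all assumptions on my part, and the file numbers in the usage note are made-up placeholders.

```python
import sqlite3

# Columns mirror the metadata listed above; all names are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    file_number  TEXT PRIMARY KEY,
    dir_location TEXT NOT NULL,
    file_type    TEXT NOT NULL,
    redacted     INTEGER NOT NULL DEFAULT 0 CHECK (redacted IN (0, 1)),
    missing      INTEGER NOT NULL DEFAULT 0 CHECK (missing IN (0, 1)),
    flagged      INTEGER NOT NULL DEFAULT 0 CHECK (flagged IN (0, 1))
)
"""


def open_directory(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the directory database and ensure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def upsert_document(conn, file_number, dir_location, file_type,
                    redacted=False, missing=False, flagged=False):
    """Insert a document row, replacing any existing row with the same number."""
    conn.execute(
        "INSERT OR REPLACE INTO documents "
        "(file_number, dir_location, file_type, redacted, missing, flagged) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (file_number, dir_location, file_type,
         int(redacted), int(missing), int(flagged)),
    )
    conn.commit()


def missing_files(conn) -> list[str]:
    """Return the file numbers still marked as missing, in sorted order."""
    rows = conn.execute(
        "SELECT file_number FROM documents WHERE missing = 1 ORDER BY file_number"
    )
    return [row[0] for row in rows]
```

Usage would be something like `conn = open_directory("archive.db")`, then `upsert_document(conn, "DOC-0001", "set01/", "email", missing=True)` for each entry, with `missing_files(conn)` giving the running to-do list for infill. A single SQLite file is easy to commit to a shared repo, though a plain CSV might merge more cleanly if many people edit at once.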

What’s Next: Once we have the Archive and Archive Directory. We can begin safely and confidently walking through the Directory as a group effort and fill in as many files/blanks as possible.

Identify and dedact all documents with garbage redactions, (remember the copy/paste DOJ blunders from December) & Identify poorly positioned redaction bars to uncover obfuscated names
LABELING! If we could start adding labels to each document in the form of tags that contain individuals, emails, locations, businesses - This would make it MUCH easier for people to “connect the dots”
Event Timeline… This will be hard, but if we can apply a timeline ID to each document, we can put the archive in order of events
Create some method for visualizing the timeline, searching, or making connection with labels.

We may not be detectives, legislators, or law men, but we are sleuth nerds, and the best thing we can do is get this data in a place that can allow others to push for justice and put an end to this crap once and for all. It’s lofty, I know, but enough is enough. …Thoughts?

DigitalForensick@lemmy.world · 2 days ago

I’m not sure of the exact files that were reported by the NYT, but there certainly were some concerning images in the initial Jan 30 release, and it was certainly more than the reported 40. I saw others as well but I don’t remember what the file numbers were.

spoiler

[246249_247010]

From my own observation timeline on the images in question: Jan 30: Images were accessible through DOJ directly. File numbers wereskipped in the list, but were manually reachable through URL. All these photos were fully unredacted (uncensored). **Feb 1: ** Images were NOT accessible through DOJ anymore, returns “Page not found”. However images were (and still are) snapshotted via web.archive.org. Feb 2: Downloading the 87GB Set 9 appeared contain these images as well, meaning we likely all have them on our computers. yikes

These files were scrubbed from the DOJ website, along with many others.

I found many of the scrubbed files by parsing through the lists and finding large gaps in file numbers, where the preceding file did not contain multiple images/documents in one pdf. There are also tons of internal memos in the dataset that precede file groups and talk about the content ahead. These memos surrounded files that seemed like they were meant to be redacted, so its worth poking around. I didn’t go nuts, but things I found around these that interesting and were also removed:

[EFTA00276493]: internal memo referring to Clinton photographed with “nude Gretchen”.
[EFTA00273790-EFTA276487]: (removed) looks like aerial Lidar scans of the full estate?
[EFTA00276220]: (removed) panoramic infrared / X-ray scan of a room

DigitalForensick@lemmy.world · 3 days ago

Hey that makes sense to me man.

I think there will be plenty of falling chips in the coming weeks. Once the data is aggregated and truly accessible and searchable… someone is going to make some AI something that can connect the dots faster than the justice system - because my god is it slow as molasses.

I’m so tired of waiting around.

DigitalForensick@lemmy.world · 3 days ago

This seems like a valid plan - although I’m not that confident in the ‘purge’. It might be good to redact those images ourselves and then nobody is pressed to store them. Better to have a confidently safe dataset that can be passed around safely.

Also, it looks like they went back and repaired the shitty text redactions on docs that were released late 2025, from what I can tell. I ran a script that auto-detects and removes “fake” redactions and it’s not getting any hits anymore, even on files that it flagged in the past. They are definitely trying to cover their tracks by the day.

DigitalForensick@lemmy.world · 3 days ago

Without a timestamp on the photo it’s impossible to be 100% sure, but it was obvious enough for me to ask the question. :/ It seems like it was a mistake on their part because everything else has heavily redacted nudity. You can also see references in the internal memo docs preceding the content.

DigitalForensick@lemmy.world · 3 days ago

wondering the same thing myself. Not sure about the latest DS9 dump, but I’ve definitely seen some of the other leaks that included some CSAM. crazy that DOJ let that out the door. :/

DigitalForensick@lemmy.world · 3 days ago

So what’s the consensus on what to do about all the fully uncensored CSAM the DOJ released on the 30th? Much of it has been removed as of today, but that shit is still fully up on archive.org… 🙄…Not Great…