From the article, it seems this isn’t even stylometry, but extraction of profile features from a large amount of text. So, for example, if I have my full, true profile somewhere where I never mention something like BDSM, and in another place I keep a blog specifically about BDSM but intentionally (and, let’s assume, effectively) omit or change every single detail about myself there, then, in theory, this particular technique should fail.
But yes, nothing prevents people from using LLMs in the same way for stylometry (and I’m 101% sure that those who are interested in that are already doing so). And yes, a local “rewriter” LLM would help to some extent, but I think there has been other research somewhere showing that LLM-produced text makes it possible, if not to completely recover the original prompt, then at least to kind of fingerprint it, so… I wouldn’t fully trust that method either :)
It mentions style as being among the data points used, along with personal details. If your hidden account is used for things like whistleblowing or niche erotica, you may not mention telltale biographical details very often, but you can’t help writing the way you write, with numerous unconscious choices between alternative ways of phrasing things, and those will be the bulk of what it has to work with.
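To make the "unconscious choices" point concrete, here is a minimal sketch of classic function-word stylometry. It is not the method from the article; the word list, `style_profile`, and `cosine_similarity` are all made up for illustration, and real systems use far richer features.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny list of function words; real stylometry uses hundreds.
FUNCTION_WORDS = ["the", "a", "of", "and", "but", "though", "while",
                  "which", "that", "so", "also", "perhaps", "would", "could"]

def style_profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_similarity(a, b):
    """Cosine similarity of two profiles; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Comparing `style_profile(public_posts)` with `style_profile(hidden_posts)` gives a crude similarity score that doesn't depend on any biographical detail being mentioned, which is why scrubbing personal facts alone doesn't help against this kind of comparison.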
Of course, that doesn’t mean you couldn’t slip up, so if you don’t want your posts traced back to you, also look out for any details you’re leaking and file the serial numbers off them (and perhaps rig up a way of delaying your posts so that they go out outside of your waking hours).
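A rough sketch of the delay idea, taken literally: pick a random publish time inside an assumed sleep window instead of posting the moment you finish writing. The function name, the 01:00 to 06:00 window, and the parameters are all hypothetical, and it assumes the window doesn't cross midnight.

```python
import random
from datetime import datetime, timedelta

def pick_publish_time(written_at, sleep_start_hour=1, sleep_end_hour=6, rng=random):
    """Return the next random moment inside the sleep window after written_at.

    Assumes sleep_start_hour < sleep_end_hour (window does not cross midnight).
    """
    window_minutes = (sleep_end_hour - sleep_start_hour) * 60
    offset = rng.randrange(window_minutes)  # random minute within the window
    window_start = written_at.replace(hour=sleep_start_hour, minute=0,
                                      second=0, microsecond=0)
    candidate = window_start + timedelta(minutes=offset)
    if candidate <= written_at:
        # Today's slot has already passed; use tomorrow's window instead.
        candidate += timedelta(days=1)
    return candidate
```

Something like `pick_publish_time(datetime.now())` would give the time to hand to whatever scheduler you use. Note that always posting in the same narrow window is itself a pattern, so widening or varying the window is probably wise.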