I know there are other plausible reasons, but thought I’d use this juicy title.
What does everyone think? As someone who works outside of tech I’m curious to hear the collective thoughts of the tech minds on Lemmy.
Moving too dumb. Something caused Microsoft to ban OpenAI for its employees last week, probably a massive security blunder that we hopefully get to find out about eventually.
I think he was probably lying about where he got all the data used to train the model. I’m guessing training a model on tons of copyrighted material and stolen user data won’t be legal in the near future.
I have been saying for a while the compute cost and copyright lawsuits are going to be a real bubble burst.
Yeah, I’ve done a tiny bit of AI stuff for what I do (biology) and I think it’s very sus that they can build such a strong model out of data which costs lots of money. The reason the algos in my field of biology are so strong is because the NCBI has the genomes of everything that’s been sequenced FOR FREE, because obviously you don’t want people patenting genomes and it should all be free for science, etc.
Which raises the question: how does a startup that started out as a non-profit get that much user data and keep costs low? I know you can buy user data, and I’m not sure how much it costs to buy a bunch of Google Docs from a data broker, but if you buy from hackers who just breached someone’s data or used some illegal crawler, you can probably cut that to prices a non-profit could afford.
It doesn’t have to be nefarious. The API changes at Twitter and Reddit were ostensibly about the fact that OpenAI et al. pretty much downloaded all their content for free.
Throw in the fact that you can ingest all of Wikipedia for free and you have a shitload of knowledge at your disposal.
I was under the impression that they only crawled web sources, but it seems like lots of copyrighted work was used.
I hadn’t heard of getting “illegal” data sets before so I looked into it and it sounds like they might have done that. Wow.
Link for the curious: https://www.theverge.com/2023/7/9/23788741/sarah-silverman-openai-meta-chatgpt-llama-copyright-infringement-chatbots-artificial-intelligence-ai
I’d think lobbyists would disagree with you on that point.
Very true, but they don’t always win, and besides, there are other lobbyists who are out there batting for Disney. If there is one hint of Mickey Mouse™ in their data set they might as well just dissolve the company now.
That would honestly be preferable.
They also launched Copilot. Could be the actual reason…
Copilot is GPT-4.
Yes. I meant they want them using it as Copilot.