9 Comments

In combination with that Verge piece about the death of robots.txt, this story suggests the ole' tragedy of the commons is reaching its middle.

oh wow, I had missed that article! Thanks, Mikey. I just read it--and yes, definitely, I see the parallels. These systems that have relied on good will and open sharing--the web and universities--are getting taken advantage of in this mad rush for AI training data.

"For many publishers and platforms, having their data crawled for training data felt less like trading and more like stealing."

https://www.theverge.com/24067997/robots-txt-ai-text-file-web-crawlers-spiders

Yikes. That's spooky stuff. Thank you for writing about it!

yeah, it's a little grim. :/

Indeed...

Wow! I was shocked when I first read this, but after having a little bit to sit with it, I guess it's not surprising if we consider that these tech companies are already and will continually be searching for any and all high-quality human writing they can find to feed into their models. It seems like they can't eat their own dog food anyway.

Your extending the oil metaphor into consideration of scarcity also seems to undermine the narrative that LLMs will eventually be handling most (or all?) of our literate activity on our behalf. Their continued effectiveness would seem to rely on the continued availability of human writing to fuel them.

Feeling down about the future of open access data but weirdly a little more optimistic about the value of human writing. Great read!

yeah, I do feel a bit depressed about my own conclusion here--closing open access data.

I think you're totally right to point out the irony that the AI relies on human writing to continue. Unless synthetic data gets good enough, I guess?

Thanks for reading, Michael!

Man, this cuts to the core of faculty anxiety. I raised the issue when they upgraded Blackboard Ultra with a built-in AI assistant. There’s no mention of what it pulls from your course when a user hits auto-generate. My guess is most of the written content is used. I have no idea what happens to the data.

The open movement in higher ed is strong. I’d hate to see data scraping destroy OER and other open practices. Great post!

oh wow, yes--I didn't even think about the course management systems and how they might be harvesting data from students. I would also guess that if they have the data, they're using it--unless there's an explicit statement in the T&C that they're not.

I'm also concerned about what this means for the culture of universities and open education resources. :/ Thanks for reading, Marc!
