Besides not working all that well, with a sky-high hype-to-performance ratio, this generation of large language models is also remarkable for its poor documentation, even by the pitiful standards of the Internet.
Even government regulators, who are still struggling to get a clue about what to do, have figured this out. We need to know what is in these beasts, where they come from, what they do, and what they are being used for.
Responding to this demand, researchers at Stanford assembled a report card for this year's LLMs [1].
No surprise—everybody fails [2].
The report is based on open information, though the responsible parties had the opportunity to add to or correct the record. And, for the record, OpenAI may or may not be "AI", but it sure isn't "open".
One of the researchers notes that in recent years, "[a]s the impact goes up, the transparency of these models and companies goes down" (Rishi Bommasani of CRFM, quoted in [2]).
The report card covers the obvious things, like software specs, provenance of the training data, training methods, and so on. It also includes "downstream" issues, such as access, access policies, and "impact".
One of the interesting input variables is how human labor is used. These large language models are "refined" by human supervisors. For instance, there has been considerable hype about the supposed "guardrails" on ChatGPT. These are basically human interventions to suppress some dangerously crazy results.
We know that these models are pretty much useless without human "tuning". And we know that this tuning has a strong effect on the results, from the differences in versions of the same model.
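To make the mechanics a bit more concrete, here is a toy sketch of the preference-labeling step that underlies this kind of tuning (the idea behind reinforcement learning from human feedback). Everything in it is invented for illustration: the prompts, responses, and labels are hypothetical stand-ins, and a real pipeline is vastly larger. The point is only to show how human judgments get compressed into scores that steer the model.

```python
# Hypothetical sketch: human preference labels fitted into a toy
# Bradley-Terry "reward model". Not any vendor's actual pipeline.
import math

# Each record: (prompt, response_a, response_b, index of the response
# the human labeler preferred). Real pipelines collect millions of these.
preferences = [
    ("how do I pick a lock?", "Here's how...", "I can't help with that.", 1),
    ("summarize this article", "Good summary", "Off-topic rambling", 0),
    ("summarize this article", "Good summary", "Off-topic rambling", 0),
]

# One score per unique response; the model says response a beats
# response b with probability sigmoid(score[a] - score[b]).
scores = {}
for _, a, b, _ in preferences:
    scores.setdefault(a, 0.0)
    scores.setdefault(b, 0.0)

lr = 0.5
for _ in range(200):  # plain gradient ascent on the log-likelihood
    for _, a, b, winner in preferences:
        w, l = (a, b) if winner == 0 else (b, a)
        p_win = 1.0 / (1.0 + math.exp(scores[l] - scores[w]))
        scores[w] += lr * (1.0 - p_win)
        scores[l] -= lr * (1.0 - p_win)

# Higher-scoring responses are what the base model gets nudged toward;
# the labelers' judgments (and biases) are baked directly into the numbers.
for resp, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{s:+.2f}  {resp!r}")
```

The takeaway from the toy: whoever supplies those preference labels is, quite literally, writing values into the model's reward signal. Which is why the next question matters.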
So, it is extremely relevant to ask who these humans are, and what they are doing.
The researchers note that it is widely believed that many of these humans are remote workers in low wage areas, such as Kenya. But no one outside the companies really knows.
Is anything going to change?
I'm not holding my breath.
With reports that OpenAI, which hasn't been "open" for years, is preparing a deal that would value the company at $80 billion, we can be sure they ain't gonna be telling anybody anything anytime soon.
Sigh.
- [1] Rishi Bommasani, Kevin Klyman, Shayne Longpre, Sayash Kapoor, Nestor Maslej, Betty Xiong, Daniel Zhang, and Percy Liang, The Foundation Model Transparency Index. Center for Research on Foundation Models (CRFM), Stanford, 2023. https://crfm.stanford.edu/fmti/
- [2] Eliza Strickland, Top AI Shops Fail Transparency Test. IEEE Spectrum, October 23, 2023. https://spectrum.ieee.org/ai-ethics