I Welcome Our Algorithmic Overlords
This Is Definitely about How A.I. Is Good and Necessary and Good
[Note: As I’ve learned more, I believe my short description of the way ChatGPT works is not particularly useful, and possibly misleading. For a clear and much better explanation of how generative language models work, see “How GPT Models Work” by Bea Stollnitz.]
Well, humanity had a good run but it’s time to pack it in. Our A.I. ruler has arrived and surely it is benevolent. If you haven’t heard, there’s a new chatbot in town, ChatGPT, and it’s making everyone lose their minds and predict the end of everything, from Google to the academic essay. And it is a genuinely impressive and disturbing bit of technology (although less impressive than it might seem, and disturbing in ways that aren’t immediately obvious). Give ChatGPT a prompt such as “Write me a 500-word statement of purpose for graduate school admission” and it will produce a cogent statement of purpose. It can help write and explain regular expressions—a powerful, vital, and complex pattern-matching system. Ask it a programming question and it will produce credible-seeming code.

Setting aside the implications for education, plagiarism and cheating, and the possibility that in the future we’ll delegate many tasks to machines that can produce good answers but not meaningful answers, let’s think more deeply about the way tools like ChatGPT are designed. Armed with massive amounts of data, researchers try to create mathematical models that describe the data, so that when presented with unfamiliar yet reasonably similar data the model will produce a correct answer. Think of ChatGPT as a very large number of very fancy y = mx + b functions: given an input x, it produces y based on a mathematical relationship learned between x and y.

All of this is undeniably very, very cool. But pause and consider this creation process more carefully. Artificial intelligence isn’t magic. The data is produced by people, processed by people, and fed into a computer model created by people, and the output is tested by people. What gets obscured by cool tech’s sparkle factor is that The Machine is built by and dependent on human labor, which means its creation, maintenance, and propagation have human costs. What are those costs, and who is paying them?
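To make the “fancy y = mx + b” framing concrete, here’s the smallest possible sketch of the fit-then-predict loop: learn m and b from a few (x, y) pairs by gradient descent, then ask the fitted function about an x it never saw. (This is only an illustration of the idea, of course; a model like ChatGPT stacks enormous numbers of these functions with nonlinearities in between.)

```python
# A toy version of "learn a relationship from data, then answer an
# unfamiliar but similar question." We fit one line, y = m*x + b,
# by gradient descent on mean squared error.

def fit_line(pairs, lr=0.01, steps=5000):
    """Learn slope m and intercept b from (x, y) example pairs."""
    m, b = 0.0, 0.0
    for _ in range(steps):
        # Gradients of mean squared error with respect to m and b.
        grad_m = sum(2 * (m * x + b - y) * x for x, y in pairs) / len(pairs)
        grad_b = sum(2 * (m * x + b - y) for x, y in pairs) / len(pairs)
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b

# "Training data": points that happen to lie on y = 3x + 1.
data = [(0, 1), (1, 4), (2, 7), (3, 10)]
m, b = fit_line(data)

# "Unfamiliar yet reasonably similar" input: x = 5 was never in the data.
prediction = m * 5 + b  # close to 16
```

Everything interesting about a large language model lives in scaling this loop up: billions of parameters instead of two, and text instead of points on a line, but the shape of the process is the same.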
ChatGPT was created by OpenAI, a nonprofit founded by Sam Altman and Elon Musk in 2015 “to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.” A worthy and lofty goal, to be sure, except that the part about not needing to generate financial return isn’t exactly true. Fewer than five years after its creation, OpenAI created OpenAI LP, a for-profit entity that now employs the majority of OpenAI staff. For first-round investors, profits are capped at 100 times the value of their investment, with any excess flowing to the controlling nonprofit. (One number that would be interesting to track: reimbursements paid to OpenAI by OpenAI LP for expenses. In 2019, the year of OpenAI LP’s founding, they came to nearly $23 million.) It’s certainly a more altruistic structure than a straightforward C corp, but it doesn’t take a cynic to question the incentives this structure sets up and to wonder whether OpenAI will be able to stay true to its ideals as the costs of developing advanced artificial intelligence pile up and opportunities for monetization appear.

I don’t find OpenAI’s short history particularly reassuring. OpenAI wasted no time capitalizing on its new for-profit structure, entering into an exclusive licensing agreement with Microsoft in 2019 after Microsoft invested $1 billion in the company, I mean, in the nonprofit. Additionally, in part given the potential for abuse (and in part to generate revenue), OpenAI has not released its A.I. models directly but has instead provided access via APIs, which let people run requests through the models and get output. As a measure to mitigate abuse, this seems reasonable, but let’s not ignore that it also has the benefit of keeping the models locked inside the vault of OpenAI’s corporate investor. Convenient!
But how well is OpenAI doing with its mission to create A.I. for the benefit of humanity? To its credit, as noted, it is at least trying to guard against abuses, although since human creativity still runs circles around artificial intelligence and enforcement largely consists of shaking a finger at abusers for violating Terms of Service, it’s not doing a great job. People have gotten ChatGPT to give instructions on how to hotwire cars, to tell them how to cook meth, to script a conversation that ended with “Let us continue to spread our message of hate and bigotry to anyone who will listen,” and to create a plan to turn the world into paperclips.
[Image: ChatGPT explains its plan to turn the world into paperclips.]
As for its usefulness as a tool: yes, it can do some amazing things (although things that are amazing and things that are genuinely useful are not synonymous), but Stack Overflow had to quickly ban the use of ChatGPT because it’s currently difficult to distinguish good code and explanations produced by GPT from merely credible bullshit produced by GPT. The same is true of many of GPT’s answers, which raises the possibility that the internet’s misinformation problem is about to get vastly worse.
But surely OpenAI, dedicated as it is to the benefit of humanity, is going about its work in a scrupulously ethical manner, right? Well, it depends on how comfortable you are with sourcing vast quantities of data from people who never consented to having their work used to develop artificial intelligence. ChatGPT is built on GPT-3.5, which is OpenAI’s language model GPT-3 with refinements from human feedback. 60% of the data used to train GPT-3 came from Common Crawl, a nonprofit that wholesale scrapes the web, attempting to hoover up anything that doesn’t explicitly ask to be left alone. 22% came from an OpenAI dataset called WebText2, generated by scraping websites linked from highly rated Reddit posts. The remaining data came from Wikipedia and corpora of books.

The use of such material for the development of A.I. is generally defended under principles of fair use, the legal doctrine that allows the unlicensed use of copyrighted works if the use meets certain requirements, but it’s not clear this is a good defense. Microsoft, GitHub, and OpenAI are currently being sued over a coding assistant tool called Copilot. Copilot was trained on vast amounts of code that had been made publicly accessible with certain restrictions; the problem is that Copilot sometimes reproduces exact copies of that code without attribution to the original developers. Another defense of A.I. development is that the A.I. is simply engaging in the same process as humans who learn from and adapt existing art to create new works, and therefore it should legally be treated in the same way. Aside from being a joyless robot person’s conception of art (“I like things that look like other things”), this defense, and the fair use defense, stop at what is permissible under current law and don’t consider issues of scale or scope.
It might take an artist years to reproduce the style of another artist and weeks or months to actually produce each piece, placing an inherent limit on the harm that might be caused by mimicry, but an A.I. can now learn how to mimic many artists far faster and then generate such artwork within seconds. Should the same legal principles really apply to both humans and machines? And with A.I. technology advancing with incredible speed, far outpacing legal decisions, should we really be giving any weight to arguments rooted in legality rather than morality?
There’s a last point I want to make on the topic of A.I. development (and if you’ve made it this far, god bless): the danger of creative lock. Assuming these tools improve to the point that they are widely useful, we might be building a future far more limited in creative possibilities. Instead of billions of individual humans thinking and solving problems, using a variety of tools to aid them, we might outsource the bulk of that work to a few sophisticated mathematical models. As human output becomes more and more influenced by A.I., training data that hasn’t itself been touched by an A.I. model will become harder to find, and the range of human thought will narrow, locked in by our reliance on the reasoning of machines. At what point does A.I. start eating itself?
If all this makes you a little freaked out by the power of A.I., well, that’s reasonable, but also take heart—can you do three-digit addition with greater than 80.2% accuracy? Can you achieve greater than 21.3% accuracy on operations like 4 * (2 + 5)? Congratulations! You can beat GPT-3! As for ChatGPT’s ability to write, let’s look at an example given in an article in The Atlantic, titled “The End of High School English.” The author asked ChatGPT to “Explain the Madhyamaka Buddhist idea that all phenomena lack inherent existence, using a funny, specific example.” This is what it produced:
The Madhyamaka Buddhist idea that all phenomena lack inherent existence can be illustrated with a funny example: let’s say you have a pet hamster. You might think that your hamster has its own inherent existence, but in reality, your hamster is made up of many different elements such as its fur, its bones, its muscles, and its brain. None of these elements exist independently or inherently on their own, so your hamster itself does not have any inherent existence either. The same goes for all phenomena—everything is made up of various components that don’t have inherent existence, so nothing has any inherent existence either.
Acceptably accurate and impressively clear for a machine. But is it really funny? I thought I’d try my own answer, and let you judge how close we are to being replaced by machines.
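(One footnote before the story, on those arithmetic figures: a claim like “80.2% accuracy on three-digit addition” just means sampling random problems and counting exact answers. Here’s a minimal sketch of such a harness, scored against a deliberately flawed digit-wise adder that forgets to carry. The flawed adder is purely hypothetical, not how GPT-3 actually computes; it just illustrates how a system can get many answers right while failing systematically.)

```python
import random

def add_without_carry(a, b):
    """A hypothetical, deliberately broken adder: sums each digit
    column independently and drops every carry."""
    digits = []
    for i in range(3):
        digits.append(((a // 10**i) % 10 + (b // 10**i) % 10) % 10)
    return sum(d * 10**i for i, d in enumerate(digits))

def accuracy(answer_fn, trials=10_000, seed=0):
    """Score answer_fn on random three-digit addition problems,
    counting only exact answers as correct."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        a, b = rng.randint(100, 999), rng.randint(100, 999)
        if answer_fn(a, b) == a + b:
            correct += 1
    return correct / trials

print(f"No-carry adder gets {accuracy(add_without_carry):.1%} of sums right")
```

A human with pencil and paper scores 100% on this benchmark, which is rather the point.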
The Empty Rabbit
There once lived a simple farmer. He planted a little wheat and he planted a little corn, but his pride and joy were his cabbages. Big, green, crisp cabbages. He planted the cabbages in the early spring and took them to market in the summer. One spring while out walking his fields, he came upon a young rabbit.
“Be gone with you!” he shouted. “This is my land!”
The rabbit stopped and fixed him with a steady gaze. “Your land?” the rabbit said. “How is this your land?”
“I’ve got a deed that says so,” the man replied.
“Bah,” the rabbit said. “A piece of paper. Why is it important that you own the land?”
“It grows my crops.”
“And do you make the sunshine your land needs to grow those crops?” the rabbit asked.
“Well, no,” said the farmer.
“And the water? Do you make the water?”
“No,” muttered the farmer, who did not like the direction of the conversation at all.
“And the soil? Where did that come from? What about the air?”
“Well I guess I didn’t make it,” said the farmer, too embarrassed to admit that he did not know.
“The light isn’t yours, the water isn’t yours, the soil and the air aren’t yours, so in what sense is this your land?”
“Oh fine,” replied the farmer, who was at this point getting quite impatient and red in the face. “You can stay. Just keep out of my crops!” And the farmer walked off quickly before the rabbit could open its mouth again.
A few weeks later, the farmer was in his wheat fields when he found the rabbit eating the new, tender green shoots of wheat.
“What are you doing?!” shouted the farmer. “You’re eating my wheat!”
“Wheat?” said the rabbit. “Wheat? Wheat you can turn into flour or malt, wheat has bran and germ. This is just a green shoot.”
“Well it will become wheat,” the farmer protested.
“So I am not eating your wheat,” the rabbit said, and carried on grazing.
“Just stay away from my cabbages,” the farmer said and stalked away.
Again, a few weeks later the farmer was out in his fields when he caught the rabbit in amongst the cabbages.
“My cabbages! This is intolerable! You’ve gone too far!” he cried.
“Cabbages? What is a cabbage? Is a cabbage the root? Is a cabbage the leaf? The flower? The stalk? Can a cabbage exist without soil or water or air? Is it the seed that you plant or the food that I eat? There is no cabbage, only a cabbage form that you perceive arising and ceasing, here.” And with that the rabbit took a big bite.
“And you, dear rabbit, are not a rabbit but dinner!” The farmer gave the rabbit a tremendous whack and the rabbit lay still.
Later, the farmer brooded over a steaming bowl of hasenpfeffer. The sauce smelled wrong, and the herbs tasted bitter, and he kept thinking of the rabbit. “Phooey,” the farmer sighed, pushing away his meal. “It’s just empty calories anyway.”