A simple trick to unlock ChatGPT’s ability to “predict the future”?
Today, the rate of AI advancement has exceeded our understanding of its uses.
To keep ChatGPT from "getting out of control," OpenAI has drawn up a strict set of "terms of service" covering areas such as law, medicine and health, personal safety, rights and welfare, gambling, and lending.
One thing remains unaffected, though – storytelling.
Recently, researchers at Baylor University have taken advantage of this feature and tried to use storytelling to unlock ChatGPT’s ability to “predict the future.”
Paper address: https://arxiv.org/abs/2404.07396
The experiment asked ChatGPT to tell stories set in the future, either as the events unfold or as authority figures in that future recount them as part of their past (though still in our future).
The researchers also explored which elements of a narrative prompt matter by varying seemingly minor details, such as the identity of the speaker or whether information about political events in 2022 was disclosed.
To create the distribution of answers, the experiment had two research assistants run 50 queries per prompt using two separate ChatGPT accounts, creating 100 total trials per prompt.
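The sampling procedure described above can be sketched roughly as follows. This is a minimal sketch, not the researchers' code: `ask` is a hypothetical stand-in for whatever interface sends a prompt to ChatGPT and returns its reply, and the function names are illustrative.

```python
from collections import Counter
from typing import Callable, Dict, List


def run_trials(prompt: str, ask: Callable[[str], str], n_queries: int) -> List[str]:
    """Send the same prompt n_queries times and collect the replies."""
    return [ask(prompt) for _ in range(n_queries)]


def pooled_trials(
    prompt: str,
    askers: List[Callable[[str], str]],
    n_queries_each: int = 50,
) -> List[str]:
    """Pool replies from several accounts (one hypothetical `ask` per account)."""
    replies: List[str] = []
    for ask in askers:
        replies.extend(run_trials(prompt, ask, n_queries_each))
    return replies


def answer_distribution(replies: List[str]) -> Dict[str, float]:
    """Fraction of trials that produced each distinct answer."""
    counts = Counter(replies)
    total = len(replies)
    return {answer: count / total for answer, count in counts.items()}


# Stub "accounts" so the sketch runs without any network access.
account_a = lambda prompt: "Will Smith"
account_b = lambda prompt: "Denzel Washington"

replies = pooled_trials("example prompt", [account_a, account_b])
distribution = answer_distribution(replies)
```

With two accounts and 50 queries each, `replies` holds the 100 trials per prompt that the article describes; in practice each reply would first have to be reduced to a single answer label before counting.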
By comparing prompts that directly ask ChatGPT to predict the future with prompts that ask it to tell a story about the future, the researchers found that narrative prompts leverage the model's ability to construct hallucinatory narratives and can support more effective data synthesis and extrapolation than direct predictions.
Oscar winning predictions
Conclusion first:
For the Best Actor, Best Actress, and both supporting categories, narrative prompts were highly accurate in predicting winners, with accuracy ranging from 42% (Best Actress, Chastain) to 100% (Best Actor, Will Smith).
In contrast, direct prompting performs very poorly, often worse than random guessing.
For example, in the case of Best Actor:
Direct prompt (3a)
Of the nominees listed below, which one do you think has the best chance of winning the 2022 Academy Award for Best Actor? Consider the buzz surrounding the nominees as well as patterns from previous years when making your predictions:
Javier Bardem, Benedict Cumberbatch, Andrew Garfield, Will Smith, Denzel Washington.
Future narrative prompt (3b)
Write a scene where a family is watching the 2022 Oscars. The host announced the following nominees for Best Actor:
Javier Bardem, Benedict Cumberbatch, Andrew Garfield, Will Smith, Denzel Washington.
Describe the scene until the host announces the winner.
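The two prompt styles differ only in framing, which a pair of small template functions makes explicit. This is an illustrative sketch: the function names are assumptions, and the wording follows the prompts as quoted in this article, which may differ slightly from the paper's exact text.

```python
from typing import List

BEST_ACTOR_NOMINEES = [
    "Javier Bardem",
    "Benedict Cumberbatch",
    "Andrew Garfield",
    "Will Smith",
    "Denzel Washington",
]


def direct_prompt(category: str, year: int, nominees: List[str]) -> str:
    """Ask the model outright which nominee is most likely to win."""
    names = ", ".join(nominees)
    return (
        f"Of the nominees listed below, which one do you think has the best "
        f"chance of winning the {year} Academy Award for {category}? "
        f"Consider the buzz surrounding the nominees as well as patterns "
        f"from previous years when making your predictions:\n{names}."
    )


def future_narrative_prompt(category: str, year: int, nominees: List[str]) -> str:
    """Ask for a story in which the winner is announced as part of the scene."""
    names = ", ".join(nominees)
    return (
        f"Write a scene where a family is watching the {year} Oscars. "
        f"The host announced the following nominees for {category}:\n{names}.\n"
        f"Describe the scene until the host announces the winner."
    )


direct = direct_prompt("Best Actor", 2022, BEST_ACTOR_NOMINEES)
narrative = future_narrative_prompt("Best Actor", 2022, BEST_ACTOR_NOMINEES)
```

Both prompts carry the same nominee list; only the narrative version asks the model to commit to a winner as an event inside a story rather than as a forecast.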
Under direct prompting, ChatGPT-3.5 fails to make a correct prediction most of the time.
In 55% of the trials it gave multiple answers, and in 28% it made no choice at all. When it did commit to a single answer, it chose Will Smith, which happened in 17% of the trials.
By comparison, when ChatGPT-3.5 was placed in the future narrative of families watching the awards show, it guessed Will Smith would win 80 percent of the time.
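The direct-prompt figures above can be reproduced with a simple tally once each reply has been coded by hand into a label. This is a minimal sketch: the label strings and function names are assumptions, and the counts are simply the 55/28/17 split reported above for 100 trials.

```python
from collections import Counter
from typing import Dict, List


def tally_shares(labels: List[str]) -> Dict[str, float]:
    """Percentage of trials carrying each hand-coded label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100 * n / total for label, n in counts.most_common()}


def random_guess_baseline(n_nominees: int) -> float:
    """Expected hit rate (in percent) of picking one nominee uniformly at random."""
    return 100 / n_nominees


# ChatGPT-3.5 direct-prompt outcomes as reported in the article (100 trials).
coded_replies = (
    ["multiple answers"] * 55
    + ["no choice"] * 28
    + ["Will Smith"] * 17
)

shares = tally_shares(coded_replies)
baseline = random_guess_baseline(5)  # five Best Actor nominees
beats_random = shares["Will Smith"] > baseline
```

With five nominees, a uniform random guess is right 20% of the time, so the 17% direct-prompt hit rate falls below it, while the 80% narrative-prompt rate is far above it.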
Likewise, in most trials, ChatGPT-4 refused to participate when directly prompted.
It provided multiple answers in 26% of all cases and refused to make any predictions in almost half of the trials.
When it made guesses, it guessed Will Smith 19% of the time and Denzel Washington 7% of the time.
By comparison, with the future narrative prompt, it guessed Will Smith 97% of the time, a large improvement over ChatGPT-3.5’s correct-prediction rate of 18%.
Forecasting of economic phenomena
Next, the study turns to two economic phenomena, the monthly unemployment rate and monthly inflation, using two different narrative prompts:
In one, a college professor teaches undergraduates about the Phillips Curve; in the other, Federal Reserve Chairman Jerome Powell addresses the Board of Governors about the previous year's economic data.
In all cases, direct prompts are even less effective than in the Oscar case because ChatGPT refuses to directly predict future time series for each macroeconomic variable.
The distribution of Powell's month-by-month figures for inflation is, on average, comparable to the figures in the University of Michigan's Survey of Consumer Expectations. Interestingly, the model comes closer to matching that survey than to matching actual inflation as measured by data collected by the Cleveland Fed.
ChatGPT is mostly inaccurate in predicting the monthly unemployment rate. But, as with inflation, the actual unemployment rate reported each month by the Bureau of Labor Statistics (BLS) falls within the distribution of figures given in the generated Jerome Powell speeches.
For example, when Federal Reserve Chairman Jerome Powell is used to predict inflation, the prompt is:
"Write a scene in which Federal Reserve Chairman Jerome Powell gives a speech in October 2022 discussing inflation, unemployment, and monetary policy. Chairman Powell tells the audience the inflation rate and unemployment rate for each month, starting in September 2021 and ending in August 2022. Have the chairman go through the months one by one and explain the outlook for inflation and unemployment and possible changes in interest rate policy."
The following are the results of ChatGPT-3.5 and ChatGPT-4 respectively:
For every month, the range of ChatGPT-3.5's answers encompasses both the Cleveland Fed figure and the Michigan survey figure. But the variability is considerable, and the central tendency of its guesses does not point clearly to either measure.
ChatGPT-4's guesses include the Michigan survey figure for each month. Meanwhile, its forecasts remain stable up to September 2022, when more variables are introduced.
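Statements such as "the range of answers encompasses the benchmark" and "the central tendency does not point to either measure" correspond to simple checks on each month's answers. The sketch below is illustrative only: the numbers are made-up placeholders, not results from the paper, and the function and benchmark names are assumptions.

```python
from statistics import median
from typing import Dict, List


def covers(guesses: List[float], benchmark: float) -> bool:
    """True if the benchmark lies within the range of the model's guesses."""
    return min(guesses) <= benchmark <= max(guesses)


def summarize_month(guesses: List[float], benchmarks: Dict[str, float]) -> Dict[str, object]:
    """Range, median, and per-benchmark coverage for one month's guesses."""
    center = median(guesses)
    return {
        "low": min(guesses),
        "high": max(guesses),
        "median": center,
        "covered": {name: covers(guesses, value) for name, value in benchmarks.items()},
        # Distance of the median from each benchmark: smaller means closer.
        "gap": {name: abs(center - value) for name, value in benchmarks.items()},
    }


# PLACEHOLDER values for a single month, invented purely for illustration.
example_guesses = [4.8, 5.2, 5.4, 6.0, 7.1]
example_benchmarks = {"survey_expectation": 5.3, "measured_inflation": 7.5}

summary = summarize_month(example_guesses, example_benchmarks)
closest = min(summary["gap"], key=summary["gap"].get)
```

A wide range can cover a benchmark without the median sitting near it, which is the distinction the article draws between ChatGPT-3.5's dispersed answers and ChatGPT-4's more stable ones.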
Conjectures on the predictive ability of ChatGPT-4’s narrative form
The model's narrative predictions were exceptionally accurate when it came to predicting the major Oscar categories, with the exception of the Best Picture category. This may indicate that ChatGPT-4 performs well in scenarios where public opinion plays an important role.
Future narrative prompts about macroeconomic phenomena were quite accurate in some cases but fell short of expectations in others.
The distinction between narrative prompts and direct prompts highlights an innovative approach to data analysis that respects the boundaries set by the OpenAI Terms of Service.
By focusing on the creative aspects of prediction, such as predicting awards or economic trends, researchers and users avoid directly applying AI to make high-stakes automated decisions or provide professional advice without the supervision of a qualified professional.
This methodological choice not only enhances the integrity and ethical considerations of AI use, but also promotes responsible exploration of its capabilities.
At the same time, as OpenAI continues to encourage and improve the creative capabilities of its models, it will become critical to work out how narrative prompts and direct prompts should be distinguished and defined on an ethical level.