Flat Circle - How Claude 3.7 makes better investment decisions
Plus: 1 new research paper, 3 new articles and 4 new hedge fund LLM jobs
Flat Circle measures the ability of language models to predict company earnings results. See our methodology for detail and disclaimers. If you haven’t already subscribed, join investors and engineers interested in LLMs+investment research here:
Claude 3.7 and 3.5 make different trading decisions given the same information
Yesterday, Anthropic released Claude 3.7 Sonnet - which shows superior reasoning scores to OpenAI o1 and DeepSeek R1.
Last night, we fed Claude 3.7 and Claude 3.5 the same information about 10 companies scheduled to report earnings this morning and asked both to make BUY/SELL calls for each stock.
Claude 3.7 Sonnet offers an “extended thinking mode” allowing the user to specify a budget for the number of reasoning tokens it should use while formulating a response. While we maxed out the reasoning budget, there is a limit to how high the budget can be set and the model often doesn’t use its entire budget. Each call used only about 4 cents worth of reasoning tokens, though you can imagine a future with investors spending 4 million dollars on a single investment decision.
Across the 10 names, both Claude 3.7 and 3.5 lost money because they incorrectly recommended BUY for Sempra (SRE), which missed on revenue and guidance and is currently down ~20%.
Overall, Claude 3.7 outperformed Claude 3.5 by about 10% thanks to 3.7 correctly calling BUY for Bank of Montreal (BMO) vs Claude 3.5 recommending SELL. BMO beat earnings and is currently up about 5%.
Both generations of the Sonnet model break their reasoning steps into similar concepts, but the newer generation appears to think harder and more deeply. We’ve contrasted the two models’ reasoning on BMO below:
Credit cycle observations
Claude 3.5
Credit Quality Deterioration:
- In Q4 2024, BMO reported significantly higher provisions for credit losses (PCL) of $1,523M vs $446M year-over-year
- Management noted credit performance deteriorated more than anticipated throughout 2024
- While Q4 2024 may represent a "high point" for provisions, they are expected to "remain elevated" through 2025
Claude 3.7 - extended thinking mode
Credit Loss Cycle Appears to Have Peaked
- In Q4 2024, management explicitly stated that Q4 represented a "high point" for provisions for credit losses (PCLs) that would "begin to moderate through 2025"
- The bank has already recognized substantial credit losses in 2024 (PCLs reached $1,523M in Q4 alone)
- Interest rate cuts have begun, which should gradually reduce pressure on borrowers
- The early recognition of credit issues and proactive provisioning positions BMO well for improving credit metrics
- Prior credit cycles show the stock typically recovers strongly once problems are fully recognized and addressed
Peer commentary
Claude 3.5
Challenging Operating Environment:
- Peer commentary (RBC, Scotiabank) indicates continued pressure on Canadian consumers from higher rates and rising unemployment
- Economic growth is expected to remain subdued in H1 2025
- Competitive pressure on deposits remains high, potentially squeezing margins
- Bank of Canada rate cuts may provide some relief but timing remains uncertain
Claude 3.7 - extended thinking mode
Beneficial Rate Environment Emerging
- Central banks have begun easing cycles, with more cuts expected in 2025
- BMO management specifically noted this should be positive for both economic activity and credit performance
- Peer commentary (RBC, BNS) supports expectations for mortgage market improvement and broader economic stabilization
Analyzing recent performance
Claude 3.5
Recent Performance vs Expectations:
- BMO has missed consensus estimates for three consecutive quarters
- Q4 2024 adjusted EPS declined to $1.90 from $2.93 year-over-year
- Return on Equity has fallen below target with management acknowledging it will take time to rebuild to 15% medium-term goal
Claude 3.7 - extended thinking mode
Capital Strength Enabling Shareholder Returns
- BMO has built exceptional capital strength with CET1 ratio reaching 13.6% in Q4 2024
- The bank announced both a dividend increase and a share repurchase program of up to 20 million shares (NCIB)
- January 2025 press releases confirm regulatory approvals have been received and the buyback is proceeding
- This capital return story provides support even if operating results are mixed
Positive Momentum in Core Businesses
- Q4 2024 delivered record pre-provision pre-tax earnings of $3.5 billion (up 8% YoY)
- Canadian P&C business has shown consistent strength with record revenues
- U.S. segment's contribution has significantly increased post-Bank of the West integration
- Management indicated "net bullish" outlook for U.S. growth prospects in 2025
Claude 3.7’s observations seem to be more forward looking. Apparently this is what better investment reasoning looks like.
Interesting articles
New paper details trading system based on LLMs + reinforcement learning. Authors incorporate an LLM monitoring for changes in market sentiment to overcome the structured data limitations of traditional RL based trading strategies.
The paper compares results to o1, GPT 4o and other open source models, and corroborates our conclusions that o1 outperforms other models. However, all models appear to be beaten by the RL-LLM hybrid system discussed in this paper (arXiv)
Is AI really thinking or just pretending to? This is really the key question, and the article lays out the arguments on both sides. One good quote:
The best use case is a situation where it’s hard for you to come up with a solution, but once you get a solution from the AI you can easily check to see if it’s correct. Writing code is a perfect example. Another example would be making a website: You can see what the AI produced and, if you don’t like it, just get the AI to redo it.
… another example is measuring how the models perform in the market (Vox)
Two articles from late last year about Balyasny’s internal LLM tool:
Balyasny’s AI outperforms OpenAI in financial applications (hedgeweek)
A day in the life of an applied AI engineer at Balyasny (efinancialcareers)
Interesting LLM hedge fund job descriptions
Citadel: Commodities - Machine Learning Engineer
“Commodities have undergone an information revolution. From ship tracking to oil storage levels and crop yields, more data on supply, demand, storage, and transport is available than ever before. Commodity markets are more globally connected: natural gas markets impact fertilizer production, while agricultural markets impact gasoline production…We combine specialist domain expertise with advanced modeling techniques to solve problems that others deem unsolvable.”
DE Shaw: Software Developer - Generative AI ($225K)
“Working on greenfield projects, which offer opportunities to shape the future of GAI at the firm and make a significant impact”
Millennium: Senior AI Engineer - Equities Technology ($213K)
“We are building the next generation of Large Language Modeling applications driven by Portfolio Manager's requirements that provide immediate value and scale as a core product.”
Point72 (Cubist Systematics): NLP Engineer
“Build start-of-the-art deep learning models to process large scale unstructured datasets.”
Follow how LLMs are beginning to make investment decisions
If you have feedback or would like to participate in this project, please reply to this email or reach out via X or LinkedIn.