MIT's Striking Evidence: Is a Large Language Model a World Model? LLMs Can Understand Space and Time

https://arxiv.org/abs/2310.02207
Llama-2-70B

Llama-2
tokenLlama-2

300019502010linear probe

LLMLlama-2

15140




LLM
2021, Emily M. Bender, "stochastic parrots"

MITLLM
LLM
LLM
LLM
LLM
LLM

1
2 3000
320 50
420102020
Llama 2 probe
plateauing
1
2
3
PCA

metadata
DBpedia Lehmann
5000

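(1) Names and occupations of historical figures who died between 1000 BC and 2000 AD
(2) Titles and creators of songs, movies, and books from 1950 to 2020, built from DBpedia
(3) New York Times news headlines from 2010 to 2020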
Llama 2, from 7 billion to 70 billion parameters
residual streamactivationtoken
n

LLM
Given a dataset of network activations A ∈ R^{n × d_model} and target labels Y, we fit linear ridge regression probes.
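For reference, the standard ridge regression objective and its closed-form solution can be written in the same notation, with A the activations and Y the target labels. The symbols W (probe weights), \lambda (regularization strength) and I (identity matrix) are introduced here for illustration.

```latex
\hat{W} = \arg\min_{W} \; \lVert Y - A W \rVert_2^2 + \lambda \lVert W \rVert_2^2
        = \left( A^{\top} A + \lambda I \right)^{-1} A^{\top} Y
```

Predictions on held-out activations are then A_test \hat{W}, which is what makes the probe "linear".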


efficient leave-one-out cross-validation
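A minimal sketch of a ridge regression probe with leave-one-out selection of the regularization strength, in plain Python. It is not the paper's code. The activations are synthetic random vectors, all function names are hypothetical, and the leave-one-out loop is the naive refit-n-times version rather than an efficient one.

```python
import random


def solve_linear_system(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_ridge_probe(A, y, lam):
    """Return w minimizing ||y - A w||^2 + lam * ||w||^2 (normal equations)."""
    d = len(A[0])
    gram = [[sum(row[i] * row[j] for row in A) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * t for row, t in zip(A, y)) for i in range(d)]
    return solve_linear_system(gram, rhs)


def predict(A, w):
    """Linear probe prediction: one dot product per activation vector."""
    return [sum(a * b for a, b in zip(row, w)) for row in A]


def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against true labels."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def loo_mse(A, y, lam):
    """Naive leave-one-out error: refit once per held-out example."""
    total = 0.0
    for i in range(len(A)):
        w = fit_ridge_probe(A[:i] + A[i + 1:], y[:i] + y[i + 1:], lam)
        total += (y[i] - predict([A[i]], w)[0]) ** 2
    return total / len(A)


# Synthetic stand-in for activations: the label is a noisy linear
# function of a 4-dimensional "activation" vector (an assumption).
rng = random.Random(0)
true_w = [2.0, -1.0, 0.5, 0.0]


def make_split(n):
    A = [[rng.gauss(0, 1) for _ in range(len(true_w))] for _ in range(n)]
    y = [sum(a * w for a, w in zip(row, true_w)) + rng.gauss(0, 0.1)
         for row in A]
    return A, y


A_train, y_train = make_split(80)
A_test, y_test = make_split(40)

candidate_lams = [0.01, 1.0, 100.0]
best_lam = min(candidate_lams, key=lambda lam: loo_mse(A_train, y_train, lam))
w_hat = fit_ridge_probe(A_train, y_train, best_lam)
test_r2 = r_squared(y_test, predict(A_test, w_hat))
print(f"selected lambda={best_lam}, held-out R^2={test_r2:.3f}")
```

In a real probing setup, the rows of A_train would be per-entity activations saved from one layer of the model, and y_train the corresponding year or coordinate.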
Llama 2-{7B, 13B, 70B}

MLP
R2LLM
Linear ridge regression probes versus more expressive nonlinear MLP probes
R

token
tokentoken
<><>
10 token
tokentoken

70B

token
token
token
LLM
X /

-


be a weighted sum of membership features


token
GPTLLM
MIT
GPT
2Othello-GPT

https://arxiv.org/pdf/2210.13382.pdf

https://www.deeplearning.ai/the-batch/does-ai-understand-the-world/

Othello
8*8
Othello
GPTOthello-GPT
Championship7605132921822000379.6

Synthetic: 20 million games for training and 3.796 million games for validation
Tokens: 60 = 8*8 - 4
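The count 60 = 8*8 - 4 can be checked with a short sketch: an 8x8 board has 64 tiles, and the four center tiles are already occupied at the start of an Othello game, so they can never be played as moves. The tile naming (column letter plus row number) and the variable names below are assumptions made for illustration.

```python
import string

BOARD_SIZE = 8
columns = string.ascii_uppercase[:BOARD_SIZE]  # "ABCDEFGH"

# Every tile on the board, e.g. "A1" ... "H8".
all_tiles = [f"{col}{row}" for col in columns for row in range(1, BOARD_SIZE + 1)]

# The four center tiles hold discs before the first move.
center_tiles = {"D4", "D5", "E4", "E5"}

# One token per tile that can ever be played.
move_tokens = [tile for tile in all_tiles if tile not in center_tiles]
token_to_id = {tile: idx for idx, tile in enumerate(move_tokens)}

print(len(all_tiles), len(move_tokens))
```

A game transcript then becomes a sequence of these token ids, which is the only input a sequence model trained on moves would see.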
An 8-layer GPT with 8 attention heads and a 512-dimensional hidden state
word embeddingC4B4
Othello-GPT error rate: 0.01% (synthetic dataset), 5.17% (championship dataset); untrained Othello-GPT: 93.29%
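Of the four possible opening moves (C5, D6, E3, F4), games starting with C5 were removed, cutting 1/4 of the game tree; the error rate was still 0.02%.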
Othello-GPT
probe
Othello-GPT
Othello-GPT

MLP

intervention
Othello-GPT


Othello-GPTstop1
MIT
GPT-4AGILLM
AGI
1931, Kurt Gödel
Gödel

"How We Learn", Stanislas Dehaene
2016, AlphaGo, 4-1
AGI
April 13, OpenAI, "Sparks of Artificial General Intelligence: Early experiments with GPT-4", GPT-4

https://arxiv.org/pdf/2303.12712
GPT-4
GPT-4GPT-4
GPT-4

MIT 2

https://arxiv.org/abs/2310.02207
https://twitter.com/wesg52/status/1709551516577902782