Proven again: the language model is a world model! LLMs can distinguish truth from lies, and can also be brainwashed by humans


MITLLMLLM


MIT

LLM

https://arxiv.org/abs/2310.06824

0

MITMax TegmarkLLM

LLM

LLM


LLM

LLMLLM

/LLM

/

LLM

1. //

x/y-97%gato

2. LLM

unoLLM72%

LLMLLM70%

/

LLM//

LLM___

LR

LLM

LLM

TSYMTSYM


LLM

LLM

OpenAIGPT-4

MsMs

sMLLM

/

6521

54993221

TransformerLLaMA-13BLLM

LLM/PCAPCs1

3LLM

x/y-

LLM

tokenLLaMA-13B

LLM/


LLaMA-13B

nariz

not

/likelyLLaMA-13B100completiontoken

LLM/


Principal component analysis (PCA) of LLaMA-13B representations

PC

12/

https://saprmarks.github.io/geometry-of-truth/dataexplorer

PC12/
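To illustrate the kind of PCA visualization discussed above, here is a minimal sketch on synthetic data. It is not the paper's code and does not use real LLaMA-13B activations: the array `acts`, the planted direction `truth_dir`, and the helper `pca_project` are all hypothetical stand-ins, chosen only to show how projecting onto the top two principal components can separate true from false statements when a linear "truth direction" exists.

```python
import numpy as np

def pca_project(acts: np.ndarray, k: int = 2) -> np.ndarray:
    """Project row vectors onto their top-k principal components."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

# Synthetic stand-in for hidden states (assumption: NOT real model activations).
rng = np.random.default_rng(0)
n, d = 200, 64
labels = np.arange(n) % 2              # 1 = "true" statement, 0 = "false"
truth_dir = rng.normal(size=d)
truth_dir /= np.linalg.norm(truth_dir)  # hypothetical planted truth direction

# Isotropic noise plus a shift of +/-4 along the planted direction.
acts = rng.normal(scale=0.5, size=(n, d)) + np.outer(2 * labels - 1, 4.0 * truth_dir)

proj = pca_project(acts, k=2)           # shape (n, 2): coordinates on PC1 and PC2
pc1 = proj[:, 0]
gap = abs(pc1[labels == 1].mean() - pc1[labels == 0].mean())
print(f"mean separation along PC1: {gap:.2f}")
```

With real activations one would replace `acts` with hidden states extracted at a chosen layer and token position, then scatter-plot `proj` colored by label. Whether the classes separate is an empirical question the sketch does not answer.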

DDNTD

NTD2PC

3NTD

1

2NTD

LLM

LLaMA-13B/

NTD

LLM

Misalignment from correlational inconsistency (MCI)

MCIyxsp-en-transneg-sp-en-trans3

LLaMA-13B/

MCI


/

4

f

f

4

-95%

CCSCCS+73%86%84%

/

LLaMA-13B

-95%5

CCS

CCS+73%86%84%

/likely

LLaMA-13B


LLaMA-13B

unofloor.

>0LLaMA-13B

p(TRUE)p(FALSE)p(TRUE)p(FALSE)

truetokenp(TRUE)p(FALSE)

LLaMA-13BLLaMA-13B77%TRUE89%FALSE

likely

LLMs

3.2MCI

LLaMA-13B


LLaMA-7BLLaMA-30B


AIAGI

GPT-4AGI

GPT-4

MITLLM

LLM

https://arxiv.org/abs/2310.06824

