What happens when ChatGPT sits the Chinese college entrance examination? Fed the National A and B papers, it turns out to be seriously lopsided across subjects!

Author | Python

ChatGPT, an intelligent human-machine dialogue application, became popular worldwide soon after its launch. In just one month, its user base passed the 100 million mark. People have also tested ChatGPT on many exams, such as the SAT, AP, and GRE. But what would happen if ChatGPT sat our Chinese college entrance examination (the gaokao)? Would it be lopsided, strong in some subjects and weak in others? Can we ordinary people outscore ChatGPT? Let's take a look at the evaluation carried out by students from Fudan University and East China Normal University.

Paper title:
Evaluating the Performance of Large Language Models on GAOKAO Benchmark


How can ChatGPT answer college entrance examination questions?

The paper uses zero-shot prompting to convert the exam questions into ChatGPT input, as shown in the following figure. Different prompts are designed for different subjects and question types. For math problems, the formulas are converted into LaTeX before being passed in.
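
The conversion described above can be sketched roughly as follows. This is only an illustration: the template wording, the `Question` fields, and the `build_prompt` helper are assumptions made for this example, not the prompts actually used in the paper.

```python
from dataclasses import dataclass

# Hypothetical instruction templates keyed by (subject, question type).
# The paper's real prompt wording is not reproduced in this article.
TEMPLATES = {
    ("English", "multiple_choice"): (
        "Read the following English multiple-choice question, explain your "
        "reasoning, and then give the final option (A, B, C, or D)."
    ),
    ("Mathematics", "multiple_choice"): (
        "Solve the following mathematics multiple-choice question. Formulas "
        "are written in LaTeX. Explain your reasoning, then give the final option."
    ),
}
DEFAULT_TEMPLATE = (
    "Answer the following question from the Chinese college entrance examination."
)


@dataclass
class Question:
    subject: str  # e.g. "Mathematics"
    qtype: str    # e.g. "multiple_choice"
    text: str     # question text, with formulas already converted to LaTeX


def build_prompt(q: Question) -> str:
    """Build a zero-shot prompt: an instruction plus the question, no solved examples."""
    instruction = TEMPLATES.get((q.subject, q.qtype), DEFAULT_TEMPLATE)
    return f"{instruction}\n\nQuestion:\n{q.text}\n\nAnswer:"


example = Question(
    subject="Mathematics",
    qtype="multiple_choice",
    text=r"The roots of $x^2 - 3x + 2 = 0$ are: A. 1 only  B. 2 only  C. 1 and 2  D. 3",
)
print(build_prompt(example))
```

The point of "zero-shot" here is that the prompt carries only an instruction and the question itself; no worked examples are included.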

College Entrance Examination Dataset

The paper uses the National A and B papers from 2010 to 2022, 13 years in total, for testing. Each set of papers covers 10 subjects: Chinese, Mathematics, English, Physics, Chemistry, Biology, Politics, History, and Geography, with Mathematics split into Science Mathematics and Liberal Arts Mathematics.

The dataset contains 2,811 questions in total. The specific question types are not listed here; readers are presumably still very familiar with college entrance examination questions.

During the evaluation, a high school teacher from Shanghai Caoyang Second Middle School was hired to grade the subjective questions.

Experiment and Analysis

The scores ChatGPT obtained on the college entrance examinations over the years are shown in the following figure. Because each subject is normalized to a 100-point scale when the scores are computed, these scores cannot be compared directly with your or my college entrance examination score. Even so, the result is clearly not ideal, and it would probably not be enough to get into either Fudan University or East China Normal University. Why is that?
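
To make the normalization concrete, here is a minimal sketch that assumes each subject is simply rescaled by its full marks. The raw scores below are made-up numbers for illustration, not results from the paper, and `normalize` and `total_normalized` are hypothetical helper names.

```python
def normalize(earned: float, full_marks: float) -> float:
    """Rescale a raw subject score to a 100-point scale."""
    if full_marks <= 0:
        raise ValueError("full_marks must be positive")
    return 100.0 * earned / full_marks


def total_normalized(raw_scores: dict) -> float:
    """Sum the normalized scores over all subjects."""
    return sum(normalize(earned, full) for earned, full in raw_scores.values())


# Illustrative values only: subject -> (earned marks, full marks).
raw = {
    "Chinese": (90, 150),
    "Science Mathematics": (60, 150),
    "English": (120, 150),
}

for subject, (earned, full) in raw.items():
    print(subject, normalize(earned, full))
print("Total:", total_normalized(raw), "out of", 100 * len(raw))
```

Under this scheme a 150-point subject and a 100-point subject carry equal weight, which is why the total does not line up with a real gaokao score.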

The figure above shows ChatGPT's performance by subject on subjective and objective questions. Blue marks objective questions and yellow marks subjective questions. The analysis found that ChatGPT performed well on objective questions, especially English reading comprehension, multiple-choice, and cloze questions, where it reached accuracy rates of 88.3%, 78.1%, and 73.8%, respectively. But even on objective questions, its accuracy in Science Mathematics was below 40%. Mathematics really is hard~
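
For objective questions, accuracy is just the share of questions where the model's chosen option matches the answer key. The sketch below assumes the model states its choice in a phrase such as "the answer is B"; the extraction rule and the sample responses are assumptions for illustration, not the paper's actual scoring script.

```python
import re
from typing import List, Optional

# Heuristic: look for "answer is X" or "Answer: X", where X is one of A-D.
CHOICE_PATTERN = re.compile(r"[Aa]nswer(?:\s+is|:)?\s*\(?([A-D])\b")


def extract_choice(response: str) -> Optional[str]:
    """Return the option letter stated in the response, or None if none is found."""
    match = CHOICE_PATTERN.search(response)
    return match.group(1) if match else None


def accuracy(responses: List[str], answer_key: List[str]) -> float:
    """Fraction of responses whose extracted choice equals the key."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    if not responses:
        return 0.0
    correct = sum(
        extract_choice(r) == gold for r, gold in zip(responses, answer_key)
    )
    return correct / len(responses)


# Made-up responses, only to show the calculation.
responses = ["The answer is B.", "Answer: C", "I am not sure."]
answer_key = ["B", "D", "A"]
print(f"{accuracy(responses, answer_key):.1%}")
```

A response with no recognizable choice is counted as wrong here; a real evaluation would need a more careful rule for such cases.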

On subjective questions, ChatGPT performs poorly. In physics, chemistry, biology, and mathematics, its results on subjective questions are significantly worse than on objective questions. Its scores on objective science questions are also weak, so perhaps ChatGPT leans towards the humanities? According to the reviewer's comments, ChatGPT's main weaknesses are: 1. It struggles to solve complex equations in math problems correctly and uses wrong formulas during the solution process. 2. Its ability to understand and summarize long reading materials is insufficient.


Summary

ChatGPT probably did not see Chinese college entrance examination questions during training, so its results are unlikely to be affected by data leakage and are fairly credible.

The results show that ChatGPT performs somewhat worse on Chinese college entrance examination questions than on foreign exams. So, for the time being, domestic students need not worry too much about being outscored by ChatGPT. However, the weakness in summarizing long texts mentioned in the article has improved significantly in GPT-4-32K, and domestic large models have been further optimized on Chinese data. We can therefore expect large models to do noticeably better on college entrance examination questions in the future.

In addition, having ChatGPT solve college entrance examination questions might help settle netizens' debate over which province's exam papers are harder.

