An investigation of the questions mathematics teachers use on exams

The extent to which the targeted outcomes in education are achieved can be determined through the educational assessment process. Although various alternative forms of assessment have arisen in recent decades, written examinations are still widely used by teachers. This study aims to determine the quality of the questions used by middle school mathematics teachers on exams. The Program for International Student Assessment (PISA) proficiency levels framework constitutes the theoretical framework of the study, and the document analysis method was adopted. A total of 1252 written questions were examined, and it was observed that, in terms of question types, teachers mostly preferred open-ended questions. The analysis of the questions in terms of proficiency level showed that teachers mostly preferred questions at Level 1 and Level 2, that is, at a low cognitive level. Level 5 and Level 6 questions were not encountered at any grade level. In light of the results, some suggestions are made for further research.


Introduction
The skills that societies expect from people change from decade to decade. Effective operational and calculation skills under time pressure, which were expected of people for a large part of the past century, are no longer sufficient today. Instead, what is essential now is to raise individuals who can use what they know to solve real-life problems, think critically, and be creative (Kivunja, 2014). These needs have affected the education policies of countries over time and have led to paradigm changes in education systems. Assessment and subsequent evaluation activities have gained importance in determining how effective these changes are in achieving the intended goals. Measurement activities are carried out through national and international tests conducted at certain intervals and through exams held in schools.
Assessment is carried out for various purposes. According to Newton (2007), grading students, monitoring the system, revealing the resources of regions, placing students, determining their success, giving feedback to students and guardians, and improving learning and teaching are among the main objectives of assessment. Some researchers underline the types of assessment activities and emphasize that the purpose of assessment differs according to these types. For instance, Suurtamm et al. (2016) point out large-scale assessment and classroom assessment as two common types and define their general purposes and effects on the education system. According to them, large-scale assessment helps evaluate the system and programs, and it is also conducted to place students. Large-scale assessment affects a wide audience, from teachers to schools, as it is important in determining the extent to which the general goals of education are achieved.
Compared to large-scale assessment, classroom assessment is mostly based on individual feedback (NCTM, 2014), and teachers usually prepare their own assessment tools. Classroom assessment is commonly carried out with oral or written questions as well as homework; among these, written exams are the most used type (Köğce & Baki, 2009). Unlike large-scale assessment, which has a more psychometric measurement tradition (Tatto et al., 2012), classroom assessment has the potential to demonstrate students' creativity and reasoning in greater depth (Suurtamm et al., 2016). Despite these differences, some researchers state that classroom assessment is under the influence of large-scale assessment. According to these researchers, since large-scale assessment scores are taken into account in evaluating the success of teachers' teaching (Wilson & Kenney, 2003), these evaluations shape teachers' teaching practices (Çabakçor et al., 2014).
Various taxonomies are used to classify the quality of questions, whether in large-scale assessment or classroom assessment. The most common is Bloom's taxonomy, which classifies the cognitive levels to which questions belong. This taxonomy, which has been used for more than 60 years, defines cognitive levels from simple to complex as remember, understand, apply, analyze, evaluate, and create (Anderson & Krathwohl, 2001). There are also some taxonomies, such as The Mathematical Assessment Task Hierarchy (MATH), which serve as alternatives to Bloom's taxonomy but are criticized for not being specific to the field. On the other hand, the literature shows a search for subject-specific classifications, since Bloom's taxonomy does not have a structure specific to a subject area (Aygün et al., 2016). In this context, one of the tools specific to mathematics is the set of proficiency levels used in the Program for International Student Assessment (PISA) exam. These levels are both a useful instrument for revealing the cognitive levels of questions through indicators and a tool that helps reveal the scale of the scores obtained by students. This study aims to examine the questions used by middle school mathematics teachers in written exams in the context of PISA's proficiency levels.

Theoretical Framework
PISA is the largest educational research study organized by the Organization for Economic Cooperation and Development (OECD), aimed at collecting information on the mathematics, science, and reading skills of 15-year-old students. The exam aims to predict the extent to which students will use the academic knowledge they have learned in school in the problem situations they will encounter in daily life (Wijaya et al., 2014). Accordingly, real-world problems that require quantitative reasoning, spatial reasoning, or problem-solving skills are used in PISA (OECD, 2003). One of the important outcomes of PISA is the reporting of proficiency scale levels in the mathematics, science, and reading domains. In mathematics, six proficiency scale levels are described to evaluate students' mathematical capabilities. Figure 1 shows the competency levels of PISA and the qualities needed to solve the problems at each level. Students at Level 1 can solve familiar questions by following directly given instructions. Competencies at Level 2 are related to both basic operations and direct reasoning. If direct inference is available, one can create a single-step solution to solve the questions at this level. Level 3 requires adapting basic knowledge to a new situation. Using different information sources, sequential decisions can be made. Level 4 requires handling a complex situation using concrete models and interpretation. Students at this level work with explicit models for complex concrete situations that call for making assumptions. At Level 5, students can work with models for complex situations. Students can also select, compare, and evaluate appropriate problem-solving strategies for handling complex problems related to these models.
Finally, students at Level 6 can form concepts related to complex problem situations based on their own research and modeling studies, can make generalizations and use them, and can make connections among different information sources and forms of representation and easily switch from one to the other.
The framework described above is used not only to describe the levels of students in the countries participating in PISA but also to classify the questions that measure the skills at each level. In other words, each question is classified by considering the cognitive activities required for its solution. For example, a problem that requires making sequential decisions, and is therefore defined as Level 3 by PISA, is presented in Figure 2.

The Aim
This paper aims to examine the types of questions used in written exams prepared by teachers as part of classroom assessment and to determine the competency levels required to solve these questions using the PISA framework.

Method
Since this paper aims to investigate teachers' examination papers, the document analysis research method was adopted. This method is a form of qualitative approach in which gathered documents are interpreted in order to uncover meaning, provide understanding, and discover perspectives related to the research problem (Bowen, 2009; Merriam, 1988).
The research sample was drawn from a district of a metropolitan area in the northeast of Turkey and comprises 10 different secondary schools. In the selection of schools, the schools' achievements in national tests were taken into consideration, and schools with average success were preferred. In this respect, purposeful sampling was used. During the data collection phase, meetings were held with the mathematics teachers in the schools, and the exam questions of the teachers who shared their mathematics exam papers were included in the study.

Data Collection
The data of the study were obtained through the analysis of the exam papers of 21 teachers working in 10 different schools in the 2015-2016 academic year. Grade levels 5-8 are included in the study since the context of the study is limited to middle schools. In keeping with ethical considerations, teachers who wanted to be included in the study were asked to share the exam papers they had used in the last two years; no such request was made of those who did not want to participate.
A total of 1252 questions are included in the analysis: 254 questions (20.3%) from the fifth grade, 315 questions (25.1%) from the sixth grade, 304 questions (24.3%) from the seventh grade, and 379 questions (30.3%) from the eighth grade. These exam questions consist of open-ended, multiple-choice, gap-filling, and true-false questions.
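The grade-level breakdown above can be re-derived from the reported counts. The following is a minimal sketch in Python, not part of the original study; the variable names are illustrative assumptions. It checks that the four counts add up to the reported total and computes each grade's share. Shares rounded to one decimal place may differ from the reported figures by 0.1 (for example, the sixth-grade share computes to about 25.2%).

```python
# Question counts per grade level, as reported in the Data Collection section.
counts = {"Grade 5": 254, "Grade 6": 315, "Grade 7": 304, "Grade 8": 379}

# Sum of all grade-level counts; this should match the reported total of 1252.
total = sum(counts.values())

# Share of each grade in the whole sample, as a percentage rounded to one decimal.
shares = {grade: round(100 * n / total, 1) for grade, n in counts.items()}

for grade, n in counts.items():
    print(f"{grade}: {n} questions ({shares[grade]}%)")
print(f"Total: {total}")
```

The same pattern could be applied to the per-level distributions reported for each grade, provided the underlying counts are available.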

Data Analysis
A descriptive approach was followed in the analysis of the data. Accordingly, each exam question was first analyzed according to question type. Subsequently, the questions at each grade level were analyzed by considering the PISA proficiency levels in Figure 1. Table 1 presents a sample coding of questions used by teachers in their examinations. In order to increase the reliability of the study, 100 randomly selected questions were re-coded after the first coding by a different researcher who had studied PISA proficiency levels; eight questions were coded differently. Based on this agreement rate (92 of 100 questions), it can be said that the reliability is at the desired level (Miles & Huberman, 1994).
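The agreement rate implied by the re-coding can be computed directly. The sketch below assumes the simple percent-agreement formula usually attributed to Miles and Huberman (1994), agreements divided by agreements plus disagreements; the function name is hypothetical and the study does not state the exact calculation it used.

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """Return agreements / (agreements + disagreements) as a percentage."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one coded item is required")
    return 100 * agreements / total


# Figures reported in the Data Analysis section: 100 questions were re-coded
# and 8 of them received a different code from the second researcher.
recoded = 100
disagreements = 8
agreements = recoded - disagreements

rate = percent_agreement(agreements, disagreements)
print(f"Inter-coder agreement: {rate:.1f}%")
```

Percent agreement does not correct for agreement expected by chance, so it is best read as a rough indicator of coding consistency.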

Table 1 Sample coding of exam questions used by teachers

Question type: Open-ended. Explanation: The question can be solved with a simple routine procedure by following direct instructions. The question is clearly defined and the solution is obvious. Level: Level 1.

Grade: 5. Question: A movie that started at 16.50 ended at 19.10. Given that a total of 30 minutes of commercials were shown during the screening, how long is the movie? A) 1 hour 50 minutes B) 1 hour 55 minutes C) 2 hours 10 minutes D) 2 hours 50 minutes. Question type: Multiple-choice. Explanation: The problem can be solved using direct reasoning and literal interpretation in a familiar context. Level: Level 2.

Grade: 8. Question: Which of the labeled points in the figure, when joined to the endpoints of segment KL, gives a triangle equal to ABC? A) M B) N C) P D) R. Question type: Multiple-choice. Explanation: This problem requires knowing equal triangles and using basic problem-solving strategies such as dragging. Students can also use, and reason from, various representations based on different sources. Level: Level 3.

Grade: 8. Question: Which element of the triangle is shown by the fold line obtained by folding and unfolding a paper in the form of an equilateral triangular region as above? A) Bisector B) Median C) Perpendicular bisector D) Height. Question type: Multiple-choice. Explanation: The problem contains a complex situation that students can deal with using concrete models.

Results
In order to determine what types of questions teachers commonly use in their exams, each question was examined in terms of types. Figure 3 shows the distribution of the questions according to the question types.

Figure 3 Question types used in the exams
When Figure 3 is examined, it is seen that two types make up a large part of all questions: open-ended and multiple-choice. Gap-filling and true-false questions were few, both falling below 10% of the total. Table 2 shows the distribution of these questions according to grade levels. Considering the distribution of questions in terms of grade level, it is seen that approximately two-thirds of the exam questions in the eighth grade consist of open-ended and multiple-choice questions. Gap-filling and true-false questions were mostly preferred for the seventh graders.
The following sections examine the distribution of the questions at each grade level according to PISA mathematics proficiency levels.

Results from the Analysis of Fifth-Grade Questions
A total of 254 exam questions were analyzed at this grade level. The distribution of these questions in terms of PISA proficiency levels is presented in Figure 4.
Analysis of the written questions asked by the teachers to fifth-grade students revealed that more than half of their questions were at Level 1 (for sample questions, see Table 3). Level 1 was followed by Level 2. While few questions were found at Level 3 (2.3%), surprisingly, no questions were encountered at higher levels.
Table 3 Sample questions of Level 1

Figure 4 Distribution of fifth-grade exam questions with regard to PISA proficiency levels
As can be seen in the sample questions presented in Table 3, questions that can be solved in a single step without making any inference were the ones most frequently encountered by fifth-grade students in their exams. An example of the third-level questions, which is at the highest level and is rarely encountered, is presented in Figure 5.
Figure 5 (sample third-level question): Place the numbers 4, 5, 6, 7, 8 and 9 in the circles indicated in the adjacent triangle such that the sum of the numbers on each side of the triangle is equal.

Results from the Analysis of Sixth-Grade Questions
A total of 315 questions used by the teachers for sixth-grade exams were analyzed. Figure 6 shows the distribution of these questions in terms of PISA proficiency levels.

Figure 6 Distribution of sixth-grade exam questions with regard to PISA proficiency levels
Analysis of the questions showed that the percentage of first-level questions (49.5%) decreased in the sixth grade compared to the fifth grade, and there was an increase in second-level (40.9%) and third-level (9.7%) questions. As in the fifth grade, no questions at Levels 4, 5, and 6 were encountered. Table 4 contains sample questions from Level 2 and Level 3, which increased in the sixth grade compared to the fifth grade.
Lale who went to the grocery store bought kg of strawberries, kg of bananas, and kg of apples from the kilogram prices given in the table above and gave 50 liras. How much is the change that Lale will receive? (Coded as Level 2)
If the four-digit 83ab number is fully divisible by both 6 and 10, which of the following are the numbers that can be written instead of a? (Coded as Level 3)
To solve the second-level question in Table 4, the student needs to convert fractions and then apply a basic algorithm relating the kilograms of the given fruits to their prices. On the other hand, the Level 3 question requires the student to know the divisibility rules for both 6 and 10. When faced with such a problem, he/she should be able to choose which information to use first and, accordingly, make consecutive decisions and perform the necessary actions. In this question, the student interprets representations based on different sources of information and can reason directly from these sources.
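The sequence of decisions behind the Level 3 question can be illustrated with a short enumeration. This is a sketch for illustration only, with a hypothetical function name; it is not the solution method expected of students, who would instead apply the divisibility rules (divisibility by 10 fixes the last digit, and divisibility by 6 then constrains the digit sum).

```python
def candidates_for_a() -> list[int]:
    """Digits a for which some digit b makes the number 83ab divisible by 6 and 10."""
    found = []
    for a in range(10):
        for b in range(10):
            n = 8300 + 10 * a + b  # the four-digit number 83ab
            if n % 10 == 0 and n % 6 == 0:
                found.append(a)
                break  # one valid b is enough for this value of a
    return found


print(candidates_for_a())
```

Brute force is used here only because the search space is tiny (100 combinations); it makes the two consecutive conditions explicit without relying on any rule the student is expected to recall.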

Results from the Analysis of Seventh-Grade Questions
A total of 304 seventh-grade questions were analyzed. The results obtained from the analysis of these questions according to PISA proficiency levels are presented in Figure 7.

Figure 7 Distribution of seventh-grade exam questions with regard to PISA proficiency levels
The analysis showed that the percentage of first-level questions (43%) continued to decrease in the seventh grade compared to the fifth and sixth grades, while Level 2 questions rose to more than half of all questions (52.4%). Interestingly, there was a decrease in Level 3 questions (4.6%), and no questions above this level were found at this grade level. Sample questions from the first and second levels are presented in Table 5.
Table 5 Sample questions from grade 7
In the figure above, quadrilateral ABCD and quadrilateral EFGH are equilateral. cm, cm, cm are given. What is the length of |HG|? (Coded as Level 1)
If the area of the shape consisting of equivalent squares is 16 cm², what is the perimeter of the shape? (Coded as Level 2)
As can be seen above, both questions are categorized as routine mathematical situations. While the answer to the first question is quite clear, the second question requires a few steps to be solved.

Results from the Analysis of Eighth-Grade Questions
Finally, 379 questions posed to eighth graders were analyzed. The distribution of these questions according to proficiency levels is presented in Figure 8.

Figure 8 Distribution of eighth-grade exam questions with regard to PISA proficiency levels
The analysis of the eighth-grade questions showed that Level 2 questions were predominant (53%), just as in the seventh grade. Level 4 questions (3%) were encountered for the first time in the eighth grade. As at the other grade levels, Level 5 and Level 6 questions were not encountered. Sample questions from the first and second levels are presented in Table 5. Examples of exam questions at Level 3 and Level 4 are presented in Table 6.
Table 6 Sample questions from grade 8
What is the probability that Emre, who enters a game maze like the one in the figure through door A and wanders randomly, will exit from door E or C?
When the Level 3 question is examined, it can be observed that it requires a strategy, rather than presenting a situation that the student can easily resolve in a few steps. The Level 4 question, on the other hand, includes a complex problem situation in which students can reach a result by using multiple features of different types of triangles. Additionally, students who have reached Level 4 can work effectively with explicit models as presented in Table 4 for complex concrete situations that may have limitations and that may require the determination of assumptions.

Discussion and Conclusion
This paper investigated the written exam questions used by mathematics teachers in terms of the proficiency levels defined in the PISA framework. The analysis of question types showed that teachers mostly preferred open-ended and multiple-choice questions.
Open-ended questions provide opportunities for students to reveal their mathematical thinking skills and creative thoughts while supporting their critical thinking and problem-solving processes (Aziza, 2018; Kwon et al., 2006). Birgili (2014), who examined students' cognitive strategies and self-control skills in the context of PISA and TIMSS questions, also found that open-ended questions were more effective on both factors than other question types. In terms of grade levels, it was observed that the percentage of multiple-choice questions was higher in some classes.
Multiple-choice questions are seen as disadvantageous because, by their nature, they do not provide an opportunity to analyze students' understanding in depth. However, it is also a fact that assessment instruments based on multiple-choice exams affect teachers' preferences (Çabakçor et al., 2014; Ulusoy & İncikabi, 2020).
Considering the characteristics of the question types, one can claim that open-ended questions allow students to apply their mathematical knowledge, analyze it, and synthesize different types of information. However, the results obtained from this study provide serious evidence that this potential is not used sufficiently. As a matter of fact, it was observed that none of the questions asked by teachers until the eighth grade were above Level 3. In other words, there were no questions at the last three levels to measure high-level cognitive skills; this is similar to some research results in the literature. For example, Bekdemir and Baş (2017), who examined 26 written papers of eighth-grade teachers, revealed that 72% of the questions used by teachers were directly aimed at measuring operational knowledge. Considering the proficiency levels in the PISA conceptual framework, the predominance of Level 1 and/or Level 2 questions at all grade levels indicates that questions measuring operational skills are also used in a different sample with similar characteristics.
Turkey is a country that selects students for high school and undergraduate education through national examinations, and the results of these examinations are often decisive for placement in an educational institution. Evaluating the effectiveness of teachers' teaching according to the results of central exams is a practice found not only in Turkey but also in other countries (Wilson & Kenney, 2003). It is frequently stated that teachers' teaching practices are affected by central exams (Danişman et al., 2020; Özer-Özkan & Acar-Güvendir, 2018). On the other hand, the fact that rather low-level skills are measured in national central exams, especially in the selection of students for high schools (Öztürk & Masal, 2020), may be a reason why teachers frequently prefer low-level questions in their classes.
The PISA exam, whose framework was adopted as the theoretical framework of the present study, has been held every three years since 2003 and was most recently carried out in 2018. According to the preliminary report on this administration, Turkey's scores increased over the last fifteen years, and its highest PISA mathematics performance was recorded in 2018. Despite this, although the gap with OECD countries narrowed, Turkey remained below the OECD average in mathematics performance (MoNE, 2019). Recent PISA results show that students' average mathematics achievement is stuck between Level 2 and Level 3. From this point of view, it can be said that students performed in the PISA exam as would be expected given the types of questions they had encountered. In other words, if the hypothesis is true that teachers' questions affect students' studying habits (Robbins et al., 2002) and, accordingly, their mathematics performance, the results from the PISA exam should not be attributed to the students alone, because the students' performance reflected what they had received.
Although the present study reveals important results in its context, some of its limitations need to be taken into account. The first limitation is that the research was carried out with teachers in one region of Turkey. For this reason, the findings are insufficient to reflect the situation throughout the country. In addition, the study is limited to the analysis of the questions collected in the 2015-2016 academic year. Hence, further studies are recommended to determine whether the situation has changed.