
CAI and Empirical Explorations
of Deductive Proof Construction*

Marvin J. Croy
Department of Philosophy
The University of North Carolina
at Charlotte

Published in
The Computers and Philosophy Newsletter
vol. 4, 1989, pp. 111-127.
 

*This work was supported in part by a Curriculum and Instructional Development Grant funded by the Foundation of The University of North Carolina at Charlotte and by the State of North Carolina.  I am also grateful to John Amidon for technical support of this project.  An earlier version of this material was presented at the Second National Conference on Philosophy and Computers held at Michigan State University, June, 1987.
 


Deductive proof checking programs are the most popular form of logic CAI.  Whatever the reason for their widespread use, the proliferation and continuous development of these programs is evident.  Contemporary proof checkers cover a wider variety of texts and rule sets, and offer more helpful editing, diagnostic, and remedial features than were once provided.  These programs appear to be prime candidates for developing in the direction of "intelligent" CAI (ICAI).  The primary thrust of ICAI is to build programs that make use of information about learner strengths and weaknesses, the content of the subject matter being taught, and techniques for teaching various kinds of subject matter.  This is a tall order by any standard, but there are signs that some initial progress is being made in the area of logic CAI.  In particular, the expert system approach for offering strategic advice during proof construction is being explored by some projects.

Since most programs for logic CAI provide highly interactive practice or problem solving sessions, the opportunities for collecting performance data, constructing student models, and offering expert advice are plentiful.  This article will summarize some data collected on deductive rule applications by one such program, a propositional proof checker (DEEP THOUGHT), and will draw some conclusions about the nature and direction of the development of logic CAI programs.

DATA SKETCH
 

The data presented here were collected from students learning propositional proof construction in two introductory deductive logic courses.  All proofs were constructed using Inference/Replacement style rules for propositional logic.  As students worked their proof problems, records were kept of various aspects of their performance.  In particular, rule misapplications were monitored closely.  Every application that either succeeded or failed due to faulty pattern-matching was recorded for analysis.  Other types of errors were not recorded.  For example, syntactical errors, whether in logical expressions or in command format, were ignored.  This procedure produced a total data set of 4,431 applications:  2,085 from a class of 38 undergraduate students observed during the Spring, 1987, semester and 2,346 from a class of 20 undergraduates in the Fall of 1987.

 

THE PROBLEM SETS
 

Spring, 1987:  Twenty proof problems were selected from the seventh edition of Copi's Introduction to Logic.  These problems were divided into four sets of five problems each.  Some of these problems were presented in Copi's text as justification exercises, but students during this semester worked them only as full proof problems.  The first two problems in each set require only the use of inference rules.  Students were divided into four groups, each of which worked the five problems shown as set A, B, C or D in Figure 1.

Fall, 1987:  The problem sets remained the same during this semester, but three types of justification exercises preceded full proof construction.  The justification exercises presented one inference at a time and required students to (1) name the rule being applied, (2) complete the inference by supplying the concluding expression, or (3) accept or reject a given inference on the basis of its accuracy.  (This third task required that some legitimate inferences be deformed in certain respects prior to presentation.)  Counting full proof problems, this produced four types of problems, and students were again divided into four groups.  Each group faced all twenty problems, but the problem set assigned to each problem type was rotated among the groups.  So, one student may have seen the problems in set A as full proof problems while another student saw them as some form of justification exercise.  Justification exercises produced 1,621 applications while 725 applications were made during full proof construction.  Justification exercises and proof problems requiring only inference rule applications preceded exercises and proofs requiring the use of both inference and replacement rules.  Figure 2 summarizes the application success rates for each problem set for full proof construction during the Spring and Fall semesters.  These and other results described below are a direct function of the problems worked, and it should be remembered that 'Correct' and 'Incorrect' refer to particular applications and not to problem completion.
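
The Fall rotation of problem sets across task types can be pictured as a Latin-square assignment.  The following Python sketch is only an illustration:  the cyclic rotation order, the group numbers, and the task labels are assumptions, since the text does not say which group received which set under which task.

```python
# Hypothetical labels: the article names sets A-D, but not the groups,
# the task identifiers, or the actual rotation order.
PROBLEM_SETS = ["A", "B", "C", "D"]
TASK_TYPES = ["name_rule", "complete_inference", "accept_reject", "full_proof"]


def rotate_assignments(n_groups=4):
    """Return {group: {task_type: problem_set}} using a cyclic rotation."""
    plan = {}
    for g in range(n_groups):
        plan[g + 1] = {
            task: PROBLEM_SETS[(g + t) % len(PROBLEM_SETS)]
            for t, task in enumerate(TASK_TYPES)
        }
    return plan


if __name__ == "__main__":
    for group, tasks in rotate_assignments().items():
        print(group, tasks)
```

Under such a scheme every group meets all four sets (all twenty problems), and every set appears once under each task type, which is what allows set difficulty and task type to be separated in later comparisons.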

 

RESULTS BY RULE TYPE
 

The most interesting results pertain to differences among rules and classes of rules.  Figure 3 shows the average success rate over all tasks for each of the rules of transition.  This information is presented graphically in Figure 4 where rules are grouped by type (inference versus replacement).  The average success rate for inference rules is 85.3% as opposed to 80.9% for replacement rules.  Figure 5 presents a different classification with "negative" rules separated from "positive" ones.  A negative rule is defined as any rule that contains at least one negation sign in its rule form.  The average success rate for the class of positive (non-negative) rules is 88.2% as opposed to 77.7% for negative rules.  It is apparent that there is a greater difference between positive and negative rules than there is between inference and replacement rules, and these differences will be explored below.
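
The positive/negative classification can be sketched in a few lines of Python.  This is a minimal illustration under stated assumptions:  the ASCII notation for rule forms, the four rules chosen, and the sample records are all invented for the example, and rates are pooled over applications, which may differ from the averaging used for the figures.

```python
from collections import defaultdict

# Rule forms for a few familiar rules, with '~' as the negation sign.
# The selection of rules and the ASCII notation are illustrative assumptions.
RULE_FORMS = {
    "MP": "p -> q, p / q",
    "MT": "p -> q, ~q / ~p",
    "DS": "p v q, ~p / q",
    "ADD": "p / p v q",
}


def is_negative(rule):
    """A rule is 'negative' if its rule form contains at least one negation sign."""
    return "~" in RULE_FORMS[rule]


def success_rates_by_class(applications):
    """applications: iterable of (rule_name, succeeded) pairs.

    Returns percentages keyed by 'negative' and 'positive'.
    """
    counts = defaultdict(lambda: [0, 0])  # class -> [correct, total]
    for rule, ok in applications:
        key = "negative" if is_negative(rule) else "positive"
        counts[key][0] += int(ok)
        counts[key][1] += 1
    return {k: 100.0 * c / n for k, (c, n) in counts.items()}


if __name__ == "__main__":
    # Made-up records purely for demonstration; not the study's data.
    log = [("MP", True), ("MP", True), ("ADD", False), ("ADD", True),
           ("MT", False), ("MT", True), ("DS", False), ("DS", True)]
    print(success_rates_by_class(log))
```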

Collecting performance data provides opportunities for using statistical approaches to answer various questions.  This project has only begun to make progress in this direction and much more sophistication is required.  Initially, one approach to determining the differences between various classes of rules is to focus on individual student performance.  For each student, an overall application success rate can be calculated as a percentage.  In addition, the degree to which that student applied rules of a certain type can also be calculated.  For example, a given student may have an overall success rate of 75%, and 60% of that student's applications may have involved negative rules.  When the correlation between success rate and degree of negative rule use is calculated for the 38 Spring students, the result is -.40, which is significant at the .05 level.  (This confirms a previous finding of -.58, P < .01, for 26 Fall, 1986, students using a different problem set (Croy, 1988).)  For the Fall class, the correlation was -.41, but (due to the lower number of students observed) this coefficient was not quite significant at the .05 level (P = .07).  What these correlations demonstrate is that, for these students, success rates decreased as the use of negative rules increased.  Furthermore, the attempt to correlate the extent of using other connective-defined classes of rules with success rate does not produce significant results.  So, the clearest pattern that emerges from these descriptive analyses is that, for whatever reasons, these students had greater difficulty with negative rules.
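
The relation between sample size and significance in these correlations can be made concrete with a short Python sketch.  It assumes a Pearson product-moment coefficient and a two-tailed t test with n - 2 degrees of freedom; the text does not name the test used, so this is one plausible reading rather than a record of the original analysis.  The function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


if __name__ == "__main__":
    # Coefficients and class sizes as reported in the text.
    for label, r, n in [("Spring", -0.40, 38), ("Fall", -0.41, 20)]:
        print(label, round(t_statistic(r, n), 2), "df =", n - 2)
```

On these assumptions the Spring coefficient gives t of about -2.62 on 36 degrees of freedom and the Fall coefficient about -1.91 on 18.  Standard two-tailed .05 critical values are roughly 2.03 and 2.10 respectively, so only the Spring value exceeds its threshold, which agrees with the pattern reported above:  nearly identical coefficients, but a smaller class.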

 

PSEUDO-RULES
 

A number of frequently repeated misapplications can be identified for particular rules.  These misapplications account for a relatively large segment of the errors made (43% of the total), and accordingly they are termed "pseudo-rules."  Figure 6, for example, shows common misapplication patterns for several rules.  (For easier reference, each pseudo-rule is assigned a number.  These numbers are arbitrary for present purposes but actually correspond to error numbers referenced by the program.)  The pseudo-rules vary greatly in generality and hence usefulness.  The two pseudo-rules given for MP, for example, are very general.  Pattern #53 merely shows that over 60% of the errors for MP resulted from a failure to match the second premise with the conditional's antecedent.  There may well be a wide variety of ways in which this occurs, and the types of expressions that are particularly troublesome here (perhaps '~(A & B)' as the antecedent versus '~A & B' in the second premise?) are as yet unexplored.  The patterns listed for MT are a bit more detailed, and, as with other negative rules, give some insight into the ways in which negation signs are troublesome.  (It should be remembered, however, that these pseudo-rules can be applied to instances that can vary greatly in complexity.)  Pattern #63 involves the dropping of a required negation sign, while #64 turns on an improperly negated consequent, and #62 may be due to a semantically intuitive but syntactically misleading concept of negation.  These three pseudo-rules account for approximately 30% of the MT pattern match failures.
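
The MP mismatch described for pattern #53 can be illustrated with a small Python sketch.  The tuple encoding of formulas, the function name, and the diagnostic strings are assumptions made for this example; they are not taken from DEEP THOUGHT.

```python
# Formulas as nested tuples: ("not", X), ("and", X, Y), ("if", X, Y).
# Atoms are plain strings.  This encoding is an illustrative assumption.

def check_modus_ponens(conditional, second_premise, conclusion):
    """Classify an attempted MP step: from (if p q) and p, infer q."""
    if not (isinstance(conditional, tuple) and conditional[0] == "if"):
        return "first premise is not a conditional"
    _, antecedent, consequent = conditional
    if second_premise != antecedent:
        # The kind of failure the text describes for pattern #53.
        return "second premise does not match antecedent"
    if conclusion != consequent:
        return "conclusion does not match consequent"
    return "ok"


if __name__ == "__main__":
    neg_conj = ("not", ("and", "A", "B"))   # ~(A & B)
    conj_neg = ("and", ("not", "A"), "B")   # ~A & B
    cond = ("if", neg_conj, "C")            # ~(A & B) -> C
    print(check_modus_ponens(cond, neg_conj, "C"))
    print(check_modus_ponens(cond, conj_neg, "C"))
```

Once parsed, '~(A & B)' is a negation whose operand is a conjunction, while '~A & B' is a conjunction whose left conjunct is a negation.  The two strings look alike but have different main connectives, so the second call fails the antecedent match.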

These factors also show up in other negative rule error patterns.  Students sometimes merely drop a needed negation sign, or use two negations where only one is required, or eliminate two negations as if applying DN simultaneously with another rule.  One error that runs through many of the negative rule applications occurs when a negation sign in a rule form must operate on a substituted subexpression whose main connective is also a negation.  For DS, pattern #70 fails for the same reason as does MT, #64, and the problem seen in MT, #62, recurs in DS, #72.  These factors also figure into pseudo-rule definitions for Transposition, Implication, and De Morgan's.  What's more, negation-related difficulties can even figure into error patterns for positive rules.  The SMP pseudo-rules #94 and #92 provide examples of this.  The search for pseudo-rule patterns is turning up other, often unexpected, ways in which students go astray in the pattern matching process.  Pattern #87 for ADD reveals an appropriate pattern applied in the wrong direction.  Sometimes, crucial connectives are confused, as in SMP, #97, where simplification is attempted on a conditional, or in ADD, #86, where addition erroneously forms a conjunction.

 

ADDITIONAL QUESTIONS AND OBJECTIVES
 

For the most part, these descriptions serve to suggest further inquiries.  The questions raised in many cases are empirical ones that will require research designs for assessing cause-effect relationships.  For example, Figure 7 shows that the average rule application success rate for proof construction during the Fall is higher than that observed for the same problems during the Spring.  The obvious difference is that the Fall students worked proof problems subsequent to completing justification exercises.  It would be helpful to know whether the intuitive confidence in these exercises seen in textbooks is actually borne out, and if so, to what degree justification exercises contribute to the mastery of proof construction.  This issue may well depend on the degree to which rule application (pattern matching) abilities are important in full proof construction, since presumably these are the abilities honed by justification exercises.  One important source of information in this respect may be measures of response latencies.  The time required to complete a proof problem, as compared to the time required to merely supply the justifications for the given steps of that proof, may be helpful in weighing the roles of rule application and strategy-related factors.  Time measures should also be important in providing an index of problem difficulty, and the application success rates reported herein may also contribute.  A measure of problem difficulty can support both teaching and research efforts.

The question of whether strategic thinking can, in certain circumstances, impair rule application is also raised here.  It is interesting to note that success rates for the Fall proof construction are lower than those for the preceding justification exercises, despite the fact that (except for variations introduced in the accept/reject task and the leeway provided by multiple solutions to proof problems) the same inferences are involved.  Moreover, our logic programs allow students to individually review their misapplications, occasionally in the presence of their instructor.  During these joint sessions, students sometimes offer explanations of what their plans or intentions were.  In the face of some misapplication, for example, some students say something to the effect that:  "I thought that it probably wouldn't work, but I could see that I needed that expression badly, so I decided to try it anyhow."  If this report is reliable, it may suggest a mechanism by which strategic thinking and rule application skills could interact with negative impact.  This interaction may well be bidirectional.  At first glance it may seem that the processes involved in deciding what rule(s) to apply are completely independent of those involved in applying an already selected rule or series of rules.  Strategies, however, may be formulated at several different levels, and at one level they consist of no more than well-aimed sequences of rule applications.  Hence, rule application difficulties may surreptitiously contribute to what normally appear as defects in strategic thinking: the inability to come up either with a plausible plan or with a useful step at a particular juncture.  At any rate, the investigation of the processes involved in applying rules of transition and formulating useful strategies should be open to possible interactions between the two.

Another question concerns the source(s) of the difficulties plaguing negative rules.  At least some of the trouble may be simply typographical and/or a result of "noisy" interference (fatigue, distraction, etc.) that is unrelated to basic conceptual problems.  The lower success rate for negative rules does, however, occur in this data even during the rule naming and accept/reject justification tasks, which do not require the typing of logical expressions.  More interesting perhaps are the conceptual difficulties hinted at, which point in the direction of the cognitive functions involved in proof construction.  The results presented here suggest that the difficulty with negative rules occurs in a variety of tasks.  In Figure 8, the average success rates for four overlapping classes of rules are compared.  Except for the rule naming justification task, where the differences observed are small, the same relative standing of these rule types is maintained.

 

THE VALUE OF DATA COLLECTION
 

Obviously, these efforts barely constitute a beginning, but it is a mainstay of this project that they are small steps in a direction that is both promising and too little explored.  The storage capacity of computers and the interactive nature of CAI programs make these systems ideal for data collection.  The opportunities for collecting and analyzing data, however, are not widely taken advantage of by CAI programs.  Many CAI programs record various aspects of student performance, but more can be done with these records than merely to display them for the user or to guide interactive practice.  The following discussion elaborates the different types of data that can be recorded and the uses to which that data can be put.  The emphasis will be upon the ways in which data-collecting CAI programs can lead to self-improvement and to more effective instruction in general.

As mentioned earlier, the development of ICAI programs has become established in some areas during the last decade or so.  One objective of these programs is to build a model of the student's knowledge concerning a certain task and to use that model to guide problem selection and sequences of instruction.  These systems must be able to generate problems that are appropriate for the given strengths and weaknesses of a particular student and which occasionally provide feedback for testing hypotheses about just what type of weaknesses the student has.  In addition, ICAI programs may contain an expert system based tutor which possesses problem solving capabilities of its own and which can use those capabilities along with the student model to provide effective advice, relevant error messages, and remediation sequences when needed.  Not surprisingly, systems such as these require much in the way of technical resources.  Such requirements are particularly taxing for microcomputer-based systems.  Our project is currently based on a Burroughs A9F mainframe, but we are in the process of moving back to a (Macintosh) microcomputer.  Any microcomputer will have the capacity for collecting some data, and the value of such data is significant.  Even when full scale models of a student's task-related knowledge are beyond reach, performance records can indicate student strengths and weaknesses in ways that are extremely helpful.

Periodic summaries of class performance, for example, can keep in-class activities focused on actual student needs.  Success rate profiles for different semesters always seem to reveal unusual weaknesses that are peculiar to particular classes.  These are "class" needs, however, and it can be helpful to review individual records when assessing the particular needs of a given student.  At UNC/Charlotte we do this in special student-teacher meetings.  During these sessions students sometimes reveal misunderstandings concerning the non-syntactical components of the transition rules.  They mistakenly believe, for example, that the order of premises in certain inference rules is essential, that inference rules can be applied inside of expressions, that inference rules are bidirectional or replacement rules unidirectional, or that rules like DN can be applied twice in a single step.  In fact, we have found at least one instance where a student produced the correct result via a rule's misapplication.  When Tautology is applied to 'L v L' to produce '(L v L) v (L v L)', the result is syntactically correct, yet it may conceal a misunderstanding if the rule is being applied twice to each of the simple components of the original disjunction.  These findings and others reinforce the view that beliefs about student strengths and weaknesses are essentially hypotheses which are never completely confirmed but which can be best supported by careful observation.  Data collection via computers can provide a variety of opportunities for becoming increasingly sensitive to both individual and class needs.

One of the most useful findings in the data presented above is that concerning pseudo-rules.  The identification of pseudo-rules has shown something about the complexity of applying transition rules.  It might be thought that a student whose misapplications of Implication often fit some particular pseudo-rule has simply mislearned the rule's definition.  This rarely appears to be the case, however.  More often, the student can state the rule correctly.  What the pseudo-rule reveals, then, is an error in applying the rule form to particular expressions.  "Knowing a rule" means at least (1) knowing what the rule is and (2) knowing how to apply that rule to a variety of more or less complicated instances.  If these findings stand up in future observations they should contribute to the design of more effective problem sets.  Students should be forced to confront and overcome these temptations before moving on to other tasks.  Otherwise, their ability to apply a rule to the easier instances may too quickly be taken as a sign of competence with a given rule, and the unexposed difficulties may only show up later.  The Copi text is widely heralded on the basis of its examples and problem sets, yet for propositional proof construction those problems do not do enough to force students to repeatedly face the more demanding applications.  Nor do other texts fare better in this respect.  Consequently, students may proceed from justification exercises on to full proof problems and even on to quantificational logic with their "bugs" still intact.  Pseudo-rules are serving to point to the more troublesome types of applications which in turn may shed light on the faulty cognitive processes that operate when students fail to match general rule patterns to particular expressions.  CAI programs can do much for supplying the observational basis needed for this effort.

Another important use of data collection has to do with the evaluation of CAI program design.  Much relevant information can be gained just by tracking the ways in which students make use of the program.  One recent finding, for example, is that certain error messages in our program are not very helpful.  After entering an erroneous application and reading the resulting error message, some students re-entered the exact same misapplication.  A few even repeated the misapplication for a third straight time.  This should be taken as a sign that the corresponding error message is not very helpful.  In addition, other measures can indicate whether particular program features are as useful as expected.  If students are given the opportunity to review their errors, it may be worthwhile to determine whether they actually make use of that opportunity.  The same can be done for a variety of other facilities which intuitively seem to be helpful to the student.  I do not think that it is too strong a claim to say that every CAI program should collect data on the usefulness of the capabilities it provides for its users.  Disputes over what facilities certain types of programs should provide could then be more reasonably settled.  Claims about what constitutes helpful or unhelpful features of a program, such as those often seen in software reviews, are practically baseless unless founded upon such systematic observation.  In particular, CAI programs offered for distribution should provide summaries of studies detailing the characteristics of student populations that have used the program, the types of errors most frequently made by those students, and how the program's problem sets, error messages, or remediation techniques attend to those errors.  Since CAI programs themselves can supply that information, it makes sense to use CAI not only as a vehicle for providing instruction but also as a means for discovering how to improve that instruction.
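
The error-message finding suggests a simple check that any logging program could run on its own records.  The Python sketch below is a hypothetical illustration:  the log layout, the entry strings, and the function name are assumptions, and the sample attempts are invented.

```python
from collections import Counter


def repeated_misapplications(attempts):
    """Count, per error number, immediate unchanged re-entries of a failed step.

    attempts: chronological list of (student, entry_text, error_number)
    tuples, where error_number is None for a successful application.
    """
    repeats = Counter()
    last = {}  # student -> (entry_text, error_number) of the previous attempt
    for student, entry, error in attempts:
        prev = last.get(student)
        if error is not None and prev == (entry, error):
            repeats[error] += 1
        last[student] = (entry, error)
    return repeats


if __name__ == "__main__":
    # Invented log: one student repeats the same failed entry twice more.
    log = [
        ("s1", "MT 1,2", 63),
        ("s1", "MT 1,2", 63),
        ("s1", "MT 1,2", 63),
        ("s2", "MP 1,3", 53),
        ("s2", "MP 3,1", None),
    ]
    print(repeated_misapplications(log))
```

Error numbers with high repeat counts would be natural candidates for rewording, since an immediate identical re-entry suggests the message did not convey what went wrong.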

Those interested in pursuing the possibilities of ICAI further may find a number of reports helpful.  Sleeman and Brown (1982) provide a wide ranging overview of ICAI projects, and Ford (1987) outlines the basic components of an ICAI program.  Guidelines for constructing an intelligent tutoring system are offered by Clancy (1984).  Different approaches to student modeling are portrayed by Ross, Jones, and Millington (1987).  Brown and Burton (1978) and Sleeman (1982), among others, have developed techniques for identifying student misconceptions in the realm of mathematics.  Suppes (1979), however, offers some suggestions concerning the weaknesses of ICAI research.

Much of what has been learned elsewhere about building intelligent programs for CAI may be useful to those developing programs for teaching logic.  Nevertheless, the field of logic CAI constitutes fertile ground for extending those efforts, and developers who have already produced useful programs are in an ideal position for making their own unique contributions.  Opportunities for discovering more about student needs and the process of learning logic, from truth tables to translation, should be taken advantage of wherever they arise.

References

Brown, J. & Burton, J. (1978). "Diagnostic Models for Procedural Bugs in Basic Mathematical Skills," Cognitive Science, 2, 155-192.

Clancy, W. (1984). "Methodology for Building an Intelligent Tutoring System," in W. Kintsch (ed.), Methods and Tactics in Cognitive Science (New York: Lawrence Erlbaum).

Croy, M. (1988). "Computer Assisted Instruction and Rule Applications for Deductive Proof Construction," Collegiate Microcomputer, 6, 51-56.

Ford, L. (1987). "Anatomy of an ICAI System," in R. Lewis and E. Tagg (eds.), Trends in Computer-Assisted Education, 22-31.

Lewis, R. and E. Tagg (eds.) (1987).  Trends in Computer-Assisted Education. Boston: Blackwell Scientific Publications.

Ross, P., J. Jones, & M. Millington (1987). "User Modelling in Intelligent Teaching and Tutoring," in R. Lewis & E. Tagg (eds.), Trends in Computer-Assisted Education.

Sleeman, D. (1982). "Assessing Aspects of Competence in Basic Algebra," in D. Sleeman and J. Brown (eds.), Intelligent Tutoring Systems, 185-199.

Sleeman, D. and J. Brown (eds.) (1982). Intelligent Tutoring Systems. New York: Academic Press.

Suppes, P. (1979). "Current Trends in Computer-Assisted Instruction," in M. Yovits (ed.), Advances in Computers, 18 (New York: Academic Press), pp. 173-229.