*This work was supported in part by a Curriculum and Instructional
Development Grant funded by the Foundation of The University of North
Carolina at Charlotte and by the State of North Carolina. I am
also grateful to John Amidon for technical support of this
project. An earlier version of this material was presented at
the Second National Conference on Philosophy and Computers held
at Michigan State University, June, 1987.
Deductive proof checking programs are the most popular form of logic CAI. Whatever the reason for their widespread use, the proliferation and continuous development of these programs is evident. Contemporary proof checkers cover a wider variety of texts and rule sets, and offer more helpful editing, diagnostic, and remedial features than were once provided. These programs appear to be prime candidates for developing in the direction of "intelligent" CAI (ICAI). The primary thrust of ICAI is to build programs that make use of information about learner strengths and weaknesses, the content of the subject matter being taught, and techniques for teaching various kinds of subject matter. This is a tall order by any standard, but there are signs that some initial progress is being made in the area of logic CAI. In particular, the expert system approach for offering strategic advice during proof construction is being explored by some projects.
Since most programs for logic CAI provide highly interactive
practice or problem solving sessions, the opportunities for
collecting performance data, constructing student models, and
offering expert advice are plentiful. This article will
summarize some data collected on deductive rule applications by one
such program, a propositional proof checker (DEEP THOUGHT), and will
draw some conclusions about the nature and direction of the
development of logic CAI programs.
DATA SKETCH
The data presented here was collected on students learning propositional proof construction in two introductory, deductive logic courses. All proofs were constructed using Inference/Replacement style rules for propositional logic. As students worked their proof problems, records were kept of various aspects of their performance. In particular, rule misapplications were monitored closely. Every application that either succeeded or that failed due to faulty pattern-matching was recorded for analysis. Other types of errors were not recorded. For example, syntactical errors, either regarding logical expressions or command format, were ignored. This procedure produced a total data set of 4,431 applications: 2,085 from a class of 38 undergraduate students observed during the Spring, 1987, semester and 2,346 from a class of 20 undergraduates in the Fall of 1987.
THE PROBLEM SETS
Spring, 1987: Twenty proof problems were selected from
Copi's seventh edition of Introduction to Logic. These problems
were divided into four sets of five problems each. Some of
these problems were presented in Copi's text as justification
exercises but students during this semester worked them only as full
proof problems. The first two problems in each set
require only the
use of inference rules. Students were divided into four groups,
each of which worked the five problems shown as set A, B, C or D in
Figure 1.
Fall, 1987: The problem sets remained the same during this
semester, but three types of justification exercises preceded full
proof construction. The justification exercises presented one
inference application at a time and required students to (1) name the
rule being applied, (2) complete the inference by supplying the
concluding expression, or (3) accept or reject a given inference on
the basis of its accuracy. (This third task required that some
legitimate inferences be deformed in certain respects prior to
presentation.) Counting full proof problems, this produced four
types of problems, and students were again divided into four
groups. Each group faced all twenty problems, but the problem
set assigned to each problem type was rotated among the groups.
So, one student may have seen the problems in set A as full proof
problems while another student saw them as some form of justification
exercise. Justification exercises produced 1,621 applications
while 725 applications were made during full proof
construction. Justification exercises and proof problems
requiring only inference rule applications preceded exercises and
proofs requiring the use of both inference and replacement
rules. Figure 2 summarizes the
application success rates for each problem set for full proof
construction during the Spring and Fall semesters. These and
other results
described below are a direct function of the problems worked, and it
should be remembered that 'Correct' and 'Incorrect' refer to
particular applications and not to problem completion.
RESULTS BY RULE TYPE
The most interesting results pertain to differences among rules and classes of rules. Figure 3 shows the average success rate over all tasks for each of the rules of transition. This information is presented graphically in Figure 4 where rules are grouped by type (inference versus replacement). The average success rate for inference rules is 85.3% as opposed to 80.9% for replacement rules. Figure 5 presents a different classification with "negative" rules separated from "positive" ones. A negative rule is defined as any rule that contains at least one negation sign in its rule form. The average success rate for the class of positive (non-negative) rules is 88.2% as opposed to 77.7% for negative rules. It is apparent that there is a greater difference between positive and negative rules than there is between inference and replacement rules, and these differences will be explored below.
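The positive/negative classification just defined can be sketched in a few lines of Python. This is only an illustration and not the code used by DEEP THOUGHT: the ASCII rule forms, the names `RULE_FORMS`, `is_negative`, and `success_rate_by_class`, and the record format of (rule, succeeded) pairs are all assumptions made for the example. Note also that the sketch pools applications within a class, which need not equal an average taken over per-rule rates.

```python
# Rule forms for a handful of Copi-style rules, written in ASCII:
# '>' conditional, 'v' disjunction, '&' conjunction, '~' negation.
# Only a subset of the rules is listed; the dictionary is illustrative.
RULE_FORMS = {
    "MP": "p > q, p / q",
    "MT": "p > q, ~q / ~p",
    "DS": "p v q, ~p / q",
    "SMP": "p & q / p",
    "ADD": "p / p v q",
    "DN": "p :: ~~p",
}


def is_negative(rule_form):
    """A rule is 'negative' if its form contains at least one negation sign."""
    return "~" in rule_form


def success_rate_by_class(records):
    """Pool (rule_name, succeeded) records into positive and negative classes.

    Returns the fraction of successful applications for each class,
    or None for a class with no recorded applications.
    """
    totals = {"negative": [0, 0], "positive": [0, 0]}  # [correct, attempted]
    for rule, succeeded in records:
        cls = "negative" if is_negative(RULE_FORMS[rule]) else "positive"
        totals[cls][0] += int(succeeded)
        totals[cls][1] += 1
    return {cls: (c / n if n else None) for cls, (c, n) in totals.items()}
```

Given a log of applications, `success_rate_by_class(log)` returns one pooled rate for each class, which is the kind of comparison summarized in Figure 5.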
Collecting performance data provides opportunities for using
statistical approaches for answering various questions. This
project has only begun to make progress in this direction and much
more sophistication is required. Initially, one approach to
determining the differences between various classes of rules is
to focus on individual student performance. For each student, a
percentage can be calculated for that student's overall application
success rate. In addition, the degree to which that student
applied rules of a certain type can also be calculated. For
example, a given student may have an overall success rate of 75%, and
60% of that student's applications may have involved negative
rules. When the correlation between success rate and degree of
negative rule use is calculated for the 38 Spring students the result
is -.40, which is significant at the .05 level. (This confirms
a previous finding of -.58, P < .01, for 26 Fall, 1986, students
using a different problem set (Croy, 1988).) For the Fall
class, the correlation was -.41, but (due to the lower number of
students observed) this coefficient was not quite significant at the
.05 level (P = .07). What these correlations demonstrate is
that, for these students, success rates decreased as the use of
negative rules increased. Furthermore, the attempt to correlate
the extent of using other connective defined classes of rules with
success rate does not produce significant results. So, the
clearest pattern that emerges from these descriptive analyses is
that, for whatever reasons, these students had greater difficulty
with negative rules.
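The per-student calculation described above can be sketched as follows. The sketch assumes a log that maps each student to a list of (rule, succeeded) pairs and a caller-supplied set of negative rule names; the function names and the log format are hypothetical, and no significance test is included. The coefficient computed is the ordinary Pearson product-moment correlation.

```python
import math


def student_points(log, negative_rules):
    """Return one (success_rate, negative_share) pair per student.

    log maps a student id to a list of (rule_name, succeeded) pairs.
    Students with no recorded applications are skipped.
    """
    points = []
    for applications in log.values():
        total = len(applications)
        if total == 0:
            continue
        correct = sum(1 for _, ok in applications if ok)
        negative = sum(1 for rule, _ in applications if rule in negative_rules)
        points.append((correct / total, negative / total))
    return points


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def negative_use_correlation(log, negative_rules):
    """Correlate overall success rate with the share of negative-rule use."""
    points = student_points(log, negative_rules)
    rates = [rate for rate, _ in points]
    shares = [share for _, share in points]
    return pearson(rates, shares)
```

A negative value returned by `negative_use_correlation` corresponds to the pattern reported here: success rates fall as the share of negative rule applications rises.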
PSEUDO-RULES
A number of frequently repeated misapplications can be identified
for particular rules. These misapplications account for a
relatively large segment of the errors made (43% of the total), and
accordingly they are termed "pseudo-rules." Figure 6, for
example, shows common misapplication patterns for several
rules. (For easier reference, each pseudo-rule is assigned a
number. These numbers are arbitrary for present purposes but
actually correspond to error numbers referenced by the
program.) The pseudo-rules vary greatly in generality and hence
usefulness. The two pseudo-rules given for MP, for example, are
very general. Pattern #53 merely shows that over 60% of the
errors for MP resulted from a failure to match the second premise
with the conditional's antecedent. There may well be a wide
variety of ways in which this occurs, and the types of expressions
that are particularly troublesome here (perhaps '~(A & B)' as the
antecedent versus '~A & B' in the second premise?) are as yet
unexplored. The patterns listed for MT are a bit more detailed,
and, as with other negative rules, give some insight into the ways in
which negation signs are troublesome. (It should be remembered,
however, that these pseudo-rules can be applied to instances that can
vary greatly in complexity.) Pattern #63 involves the dropping
of a required negation sign, while #64 turns on an improperly negated
consequent, and #62 may be due to a semantically intuitive but
syntactically misleading concept of
negation. These three pseudo-rules account for approximately
30% of the MT pattern match failures.
These factors also show up in other negative rule error patterns. Students sometimes merely drop a needed negation sign, or use two negations where only one is required, or eliminate two negations as if applying DN simultaneously with another rule. One error that runs through many of the negative rule applications occurs when a negation sign in a rule form must operate on a substituted subexpression whose main connective is also a negation. For DS, pattern #70 fails for the same reason as does MT, #64, and the problem seen in MT, #62, recurs in DS, #72. These factors also figure into pseudo-rule definitions for Transposition, Implication, and De Morgan's. What's more, negation-related difficulties can even figure into error patterns for positive rules. The SMP pseudo-rules #94 and #92 provide examples of this. The search for pseudo-rule patterns is turning up other often unexpected ways in which students go astray in the pattern matching process. Pattern #87 for ADD reveals an appropriate pattern applied in the wrong direction. Sometimes, crucial connectives are confused, as in SMP, #97, where simplification is attempted on a conditional or in ADD, #86, where addition erroneously forms a conjunction.
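To make the idea of a pseudo-rule concrete, the following Python sketch classifies attempted applications of MP and MT. It is an illustration only. The tuple representation of expressions, the function names, and the descriptive labels returned are all assumptions; the labels are not the numbered patterns of Figure 6, whose exact definitions are not reproduced here.

```python
# Expressions are nested tuples; atoms are plain strings.
# ("not", X), ("and", X, Y), ("or", X, Y), ("if", X, Y)


def neg(x):
    """Build the negation of an expression."""
    return ("not", x)


def is_neg(x):
    return isinstance(x, tuple) and x[0] == "not"


def is_conditional(x):
    return isinstance(x, tuple) and len(x) == 3 and x[0] == "if"


def check_mp(conditional, second, conclusion):
    """Classify an attempted Modus Ponens: p > q, p / q."""
    if not is_conditional(conditional):
        return "not-a-conditional"
    _, p, q = conditional
    if second != p:
        # The second premise fails to match the conditional's antecedent.
        return "antecedent-mismatch"
    return "ok" if conclusion == q else "wrong-conclusion"


def check_mt(conditional, second, conclusion):
    """Classify an attempted Modus Tollens: p > q, ~q / ~p."""
    if not is_conditional(conditional):
        return "not-a-conditional"
    _, p, q = conditional
    if second == neg(q):
        if conclusion == neg(p):
            return "ok"
        if conclusion == p:
            # The required negation sign on the conclusion was dropped.
            return "dropped-negation"
        return "wrong-conclusion"
    if is_neg(q) and second == q[1]:
        # The consequent is ~B and B was offered as its denial: semantically
        # intuitive, but the rule form calls for ~~B.
        return "cancelled-negation"
    return "consequent-mismatch"
```

For instance, with '~(A & B)' as the antecedent and '~A & B' offered as the second premise, `check_mp` reports an antecedent mismatch, which is the kind of near miss discussed above for MP.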
ADDITIONAL QUESTIONS AND OBJECTIVES
For the most part what these descriptions do is to suggest further
inquiries. The questions raised in many cases are empirical
ones that will require research designs for assessing cause-effect
relationships. For example, Figure 7
shows that the average rule application success rate for proof
construction during the Fall is higher than that observed for the
same problems during the Spring. The obvious difference is that
the Fall students worked proof problems subsequent to completing
justification exercises. It would be helpful to know whether
the intuitive confidence in these exercises seen in textbooks is
actually borne out, and if so, to what degree justification exercises
contribute to the mastery of proof construction. This issue may
well depend on the degree to which rule application (pattern
matching) abilities are important in full proof construction, since
presumably these are the abilities honed by justification
exercises. One important source of information in this respect
may be measures of response latencies. The time required to
complete a proof problem, as compared to the time required to merely
supply the justifications for the given steps of that proof, may be
helpful in weighing the roles of rule application and
strategy-related factors. Time measures should also be
important in providing an index of problem difficulty, and the
application success rates reported herein may also
contribute. A measure of problem difficulty can support both
teaching and research efforts.
The question of whether strategic thinking can, in certain
circumstances, impair rule application is also raised here. It
is interesting to note that success rates for the Fall proof
construction are lower than those for the preceding justification
exercises, despite the fact that (except for variations introduced in
the accept/reject task and the leeway provided by multiple solutions
to proof problems) the same inferences are involved. Moreover,
our logic programs allow students to individually review their
misapplications, occasionally in the presence of their
instructor. During these joint sessions, students sometimes
offer explanations of what their plans or intentions were. In
the face of some misapplication, for example, some students say
something to the effect that: "I thought that it probably
wouldn't work, but I could see that I needed that expression badly,
so I decided to try it anyhow." If this report is reliable, it
may suggest a mechanism by which strategic thinking and rule
application skills could interact with negative impact. This
interaction may well be bidirectional. At first glance it may
seem that the processes involved in deciding what rule(s) to apply
are completely independent of those involved in applying an already
selected rule or series of rules. Strategies, however, may be
formulated at several different levels, and at one level they consist
of no more than well-aimed sequences of rule applications. Hence,
rule application difficulties may surreptitiously contribute to what
normally appear as defects in strategic thinking: the inability to
come up either with a plausible plan or with a useful step at a
particular juncture. At any rate, the investigation of the
processes involved in applying rules of transition and formulating
useful strategies should be open to possible interactions between the
two.
Another question concerns the source(s) of the difficulties plaguing negative rules. At least some of the trouble may be simply typographical and/or a result of "noisy" interference (fatigue, distraction, etc.) that is unrelated to basic conceptual problems. The lower success rate for negative rules does, however, occur in this data even during the rule naming and accept/reject justification tasks, which do not require the typing of logical expressions. More interesting, perhaps, are the conceptual difficulties hinted at here, which point in the direction of the cognitive functions involved in proof construction. The results presented here suggest that the difficulty with negative rules occurs in a variety of tasks. In Figure 8, the average success rates for four overlapping classes of rules are compared. Except for the rule naming justification task, where the differences observed are small, the same relative standing of these rule types is maintained.
THE VALUE OF DATA COLLECTION
Obviously, these efforts barely constitute a beginning, but it is a mainstay of this project that they are small steps in a direction that is both promising and too little explored. The storage capacity of computers and the interactive nature of CAI programs make these systems ideal for data collection. The opportunities for collecting and analyzing data, however, are not widely taken advantage of by CAI programs. Many CAI programs record various aspects of student performance, but more can be done with these records than merely to display them for the user or to guide interactive practice. The following discussion will endeavor to elaborate different types of data that can be recorded and the uses to which that data can be put. The emphasis will be upon the ways in which data-collecting CAI programs can lead to self-improvement and to more effective instruction in general.
As mentioned earlier, the development of ICAI programs has become
established in some areas during the last decade or so. One
objective of these programs is to build a model of the student's
knowledge concerning a certain task and to use that model to guide
problem selection and sequences of instruction. These systems
must be able to generate problems that are appropriate for the given
strengths and weaknesses of a particular student and which
occasionally provide feedback for testing hypotheses
about just what type of weaknesses the student has. In
addition, ICAI programs may contain an expert system based tutor
which possesses problem solving capabilities of its own and which can
use those capabilities along with the student model to provide
effective advice, relevant error messages, and remediation sequences
when needed. Not surprisingly, systems such as these require
much in the way of technical resources. Such requirements are
particularly taxing for microcomputer-based systems. Our
project is currently based on a Burroughs A9F mainframe, but we are
in the process of moving back to a (Macintosh) microcomputer.
Any microcomputer will have the capacity for collecting some data,
and the value of such data is significant. Even when full scale
models of a student's task-related knowledge are beyond reach,
performance records can indicate student strengths and weaknesses in
ways that are extremely helpful.
Periodic summaries of class performance, for example, can keep
in-class activities focused on actual student needs. Success
rate profiles for different semesters always seem to reveal unusual
weaknesses that are peculiar to particular classes. These are
"class" needs, however, and it can be helpful to review individual
records when assessing the particular needs of a given student.
At UNC/Charlotte we do this in special student-teacher
meetings. During these sessions students sometimes reveal
misunderstandings concerning the non-syntactical components of
the transition rules. They mistakenly believe, for example,
that the order of premises in certain inference rules is essential,
that inference rules can be applied inside of expressions, that
inference rules are bidirectional or replacement rules
unidirectional, or that rules like DN can be applied twice in a
single step. In fact, we have found at least one instance where
a student produced the correct result via a rule's
misapplication. When Tautology is applied to 'L v L' to produce
'(L v L) v (L v L)', the result is syntactically correct, yet it may
conceal a misunderstanding if the rule is being applied twice to each
of the simple components of the original disjunction. These
findings and others reinforce the view that beliefs about student
strengths and weaknesses are essentially hypotheses which are never
completely confirmed but which can be best supported by careful
observation. Data collection via computers can provide a
variety of opportunities for becoming increasingly sensitive to both
individual and class needs.
One of the most useful findings in the data presented above is
that concerning pseudo-rules. The identification of pseudo-rules has
shown something about the complexity of applying transition
rules. It might be thought that a student whose misapplications
of Implication often fit some particular pseudo-rule has simply
mislearned the rule's definition. This rarely appears to be the
case, however. More often, the student can state the rule
correctly. What the pseudo-rule reveals, then, is an error
in
applying the rule form to particular expressions. "Knowing a
rule" means at least (1) knowing what the rule is and (2) knowing how
to apply that rule to a variety of more or less complicated
instances. If these findings stand up in future observations
they should contribute to the design of more effective problem
sets. Students should be forced to confront and overcome
these temptations before moving on to other tasks. Otherwise,
their ability to apply a rule to the easier instances may too quickly
be taken as a sign of competence with a given rule, and the unexposed
difficulties may only show up later. The Copi text is widely
heralded on the basis of its examples and problem sets, yet for
propositional proof construction those problems do not do enough to
force students to repeatedly face the more demanding
applications. Nor do other texts fare better in this respect.
Consequently, students may proceed from justification exercises on to
full proof problems and even on to quantificational logic with their
"bugs" still intact. Pseudo-rules are serving to point to the
more troublesome types of applications which in turn may shed light
on the faulty cognitive processes that operate when students fail to
match general rule patterns to particular expressions. CAI
programs can do much for supplying the observational basis needed for
this effort.
Another important use of data collection has to do with the
evaluation of CAI program design. Much relevant information can
be gained just by tracking the ways in which students make use of
the program. One recent finding, for example, is that certain
error messages in our program are not very helpful. After
entering an erroneous application and reading the resulting error
message, some students re-entered the exact same
misapplication. A few even repeated the misapplication for a
third straight time. This should be taken as a sign that the
corresponding error message is not very helpful. In addition,
other measures can indicate whether particular program features are
as useful as expected. If students are given the opportunity to
review their errors, it may be worthwhile to determine whether they
actually make use of that opportunity. The same can be done for
a variety of other facilities which intuitively seem to be helpful to
the student. I do not think that it is too strong a claim to
say that every CAI program should collect data on the usefulness of
the capabilities it provides for its users. Disputes over what
facilities certain types of programs should provide could then be
more reasonably settled. Claims about what constitutes helpful
or unhelpful features of a program, such as those often seen in
software reviews, are practically baseless unless founded upon such
systematic observation. In particular, CAI programs offered for
distribution should provide summaries of studies detailing the
characteristics of student populations that have used the program,
the types of errors most frequently made by those students, and how
the program's problem sets, error messages, or remediation techniques
attend to those errors. Since CAI programs
themselves can supply that information, it makes sense
to use CAI not only as a vehicle for providing instruction but also
as a means for discovering how to improve that instruction.
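The check on error message usefulness described above, counting how often a student re-enters the very same misapplication right after seeing a message, can be sketched as follows. The event format (student, attempt text, error number or None for a successful application) and the function name are assumptions made for illustration; they do not describe the logging actually performed by our program.

```python
from collections import Counter


def repeats_after_error(events):
    """Count immediate re-entries of an identical failed attempt.

    events is a chronological list of (student, attempt, error_code)
    triples, where error_code is None for a successful application.
    Returns a Counter mapping each error code to the number of times the
    same student's next entry repeated the attempt that triggered it.
    """
    last = {}  # student -> (attempt, error_code) of that student's last entry
    repeats = Counter()
    for student, attempt, error_code in events:
        previous = last.get(student)
        if previous is not None:
            prev_attempt, prev_code = previous
            if prev_code is not None and prev_attempt == attempt:
                repeats[prev_code] += 1
        last[student] = (attempt, error_code)
    return repeats
```

Error numbers with high counts relative to how often their messages are shown would be natural candidates for rewording.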
Those interested in pursuing the possibilities of ICAI further may find a number of reports helpful. Sleeman and Brown (1982) provide a wide ranging overview of ICAI projects, and Ford (1987) outlines the basic components of an ICAI program. Guidelines for constructing an intelligent tutoring system are offered by Clancy (1984). Different approaches to student modeling are portrayed by Ross, Jones, and Millington (1987). Burton and Brown (1978) and Sleeman (1982), among others, have developed techniques for identifying student misconceptions in the realm of mathematics. Suppes (1979), however, offers some suggestions concerning the weaknesses of ICAI research.
Much of what has been learned elsewhere about building intelligent
programs for CAI may be useful to those developing programs for
teaching logic. Nevertheless, the field of logic CAI
constitutes fertile ground for extending those efforts, and
developers who have already produced useful programs are in an ideal
position for making their own unique contributions.
Opportunities for discovering more about student needs and the
process of learning logic, from truth tables to translation, should
be taken advantage of wherever they arise.
References
Brown, J. & Burton, J. (1978). "Diagnostic Models for Procedural Bugs in Basic Mathematical Skills," Cognitive Science, 2, 155-192.
Clancy, W. (1984). "Methodology for Building an Intelligent Tutoring System," in W. Kintsch (ed.), Methods and Tactics in Cognitive Science (New York: Lawrence Erlbaum).
Croy, M. (1988). "Computer Assisted Instruction and Rule Applications for Deductive Proof Construction," Collegiate Microcomputer, 6, 51-56.
Ford, L. (1987). "Anatomy of an ICAI System," in R. Lewis and E. Tagg (eds.), Trends in Computer-Assisted Education, 22-31.
Lewis, R. and E. Tagg (eds.) (1987). Trends in Computer-Assisted Education. Boston: Blackwell Scientific Publications.
Ross, P., J. Jones, & M. Millington (1987). "User Modelling in Intelligent Teaching and Tutoring," in R. Lewis & E. Tagg (eds.), Trends in Computer-Assisted Education.
Sleeman, D. (1982). "Assessing Aspects of Competence in Basic Algebra," in D. Sleeman and J. Brown (eds.), Intelligent Tutoring Systems, 185-199.
Sleeman, D. and J. Brown (eds.) (1982). Intelligent Tutoring Systems. New York: Academic Press.
Suppes, P. (1979). "Current Trends in Computer-Assisted
Instruction," in M. Yovits (ed.), Advances in Computers, 18 (New
York: Academic Press), pp. 173-229.