Published in the Computerized Logic Teaching Bulletin, vol. 4,
no. 1, March, 1991, pp. 2-12.
The use of computers in education is often thought of as a
means of putting sound pedagogical principles and techniques into
practice. However, such use can also contribute to building the
empirical foundations for those techniques. This can occur in two
ways. First, CAI programs can collect data on student performance for
the purpose of identifying prominent weaknesses and for investigating
processes involved in mastering various tasks and learning particular
subject matters. In every discipline, from the sciences to the
humanities, there are important questions to be answered concerning
the difficulties students have in learning and applying various
concepts and techniques. CAI can aid in answering those questions.
Second, CAI programs can be used to document student performance for
the purpose of program evaluation. While providing instruction,
computers can serve as data collection devices that provide precise
behavioral measures, including response time, of student learning.
CAI programs can thus facilitate their own evaluation. In both of
these applications, computers function much as other observational
instruments do in the sciences.
The logic CAI project described here has made some attempts to
move in these directions. The results of these efforts are reported
in Croy (1989, 1988). Here, an overview of these activities is given
along with some encouragement for other projects to take up this
task.
It has previously been observed that, when using an
inference/replacement rule set, students applied "negative" rules
(defined as any rule containing a negation sign) less successfully
than "positive" rules (those containing no negation signs). This
difference, moreover, was greater than that observed for inference
versus replacement rules, and the degree to which students used these
negative rules was found to be inversely correlated with rule
application proficiency. That is, greater use of negative rules was
associated with lower application success rates. In addition,
particular patterns of rule misapplications (termed "pseudo-rules")
can be identified.
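The positive/negative classification described above can be sketched
in Python. This is a minimal illustration, not the project's actual
code: the rule schemas (written in ASCII with "~" for negation, in the
style of a Copi-type rule set), the RULES table, and the sample log
are all assumptions made for the example.

```python
# Illustrative rule schemas; "~" marks negation. The selection of rules
# and their notation are assumptions, not data from the project.
RULES = {
    "Modus Ponens": "p > q, p / q",
    "Modus Tollens": "p > q, ~q / ~p",
    "Disjunctive Syllogism": "p v q, ~p / q",
    "Simplification": "p & q / p",
    "De Morgan": "~(p & q) :: (~p v ~q)",
    "Commutation": "(p v q) :: (q v p)",
}


def is_negative(rule_name):
    """A rule is 'negative' if its schema contains a negation sign."""
    return "~" in RULES[rule_name]


def success_rates(log):
    """Return (positive_rate, negative_rate) from (rule, correct) pairs."""
    counts = {True: [0, 0], False: [0, 0]}  # is_negative -> [correct, total]
    for rule_name, correct in log:
        cell = counts[is_negative(rule_name)]
        cell[0] += int(correct)
        cell[1] += 1

    def rate(cell):
        # None when a class of rules was never applied
        return cell[0] / cell[1] if cell[1] else None

    return rate(counts[False]), rate(counts[True])


# Hypothetical log of rule applications: (rule used, applied correctly?)
sample_log = [
    ("Modus Ponens", True),
    ("Simplification", True),
    ("Commutation", False),
    ("Commutation", True),
    ("Modus Tollens", True),
    ("De Morgan", False),
    ("Disjunctive Syllogism", True),
    ("De Morgan", False),
]
positive_rate, negative_rate = success_rates(sample_log)
```

Classifying by the presence of a negation sign in the schema mirrors
the definition given in the text; a real rule set would need every
rule's schema entered in full.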
These observations can potentially be of immediate practical
value. If students experience difficulties with negative rules,
particular lessons can be devised to help remediate such difficulties
both within the classroom and CAI programs. But before these efforts
can begin, questions about the reliability and generalizability of
the observations should be faced. What about the next semester? Will
the next class of students also exhibit these weaknesses? Are the
findings limited to a given problem set? What exactly are the
conditions under which the difficulties occur? These questions are
not purely of scientific interest. Implementing a plan to address
observed weaknesses will require a variety of resources. The
generalizability of the observed weaknesses will dictate the scope
and usefulness of the remediation effort and the degree to which
those resources are effectively used. The relevance of observed
student difficulties and corresponding remediation to students in
similar courses at other universities is also an issue of
interest.
Resolving these issues is not a simple task. While statistical
techniques are normally employed to address such issues, educational
settings often work against their usefulness. Many statistical tests,
for example, assume that the subjects observed have been randomly
selected from some larger pool. In the case of a correlation (such as
that reported above, for example) a measure of statistical
significance would indicate the likelihood that observations of the
selected group would hold for members of the larger group as well.
Students enrolled in a particular section during a particular
semester, however, cannot be assumed to constitute a random sample of
all students taking the course during the year. Even less can they be
considered to be a useful sample for generalizing to all students
studying logic at large. This impairs the ability to make inferences
from one's own students to those attending universities elsewhere.
This problem is significant given the way in which CAI programs and
textbooks developed at one university are exported to a wide variety
of educational institutions.
On the other hand, it should be emphasized that the unending
procession of students through courses, coupled with data collecting
CAI programs, is an asset. Repetition of results affords some
confidence in their reliability. The observed difficulty with
negative rules, for example, has shown up each semester for several
consecutive sessions. Consequently, it seems more reasonable to
predict that the difficulties will continue to occur (given the same
problem sets and sequence of exercises) than that they will
not.
Nevertheless, questions can be raised concerning the
observation that negative rules seem to be particularly troublesome.
Would, for example, this observation hold up for all problem sets?
One reason that it might not is that certain instantiations of a
particular rule are more error-prone than others. Consequently, it
might be that observed difficulties are a function of the particular
problems (and application instances) involved rather than of any
general characteristic of the rules themselves.
One approach to answering this question is to explore a variety
of problem sets in various textbooks. The problems initially used in
our studies are found in Copi's Introduction to Logic (seventh
edition) and Butrick's Deduction and Analysis (revised
edition). Students worked these problems in an introductory course
which averages about 30 students each semester. During the Fall of
1988, an attempt was made to determine whether similar results would
also show up with advanced students using problems from Kahane's
Logic and Philosophy (fifth edition). Ten students in an upper
level logic course worked justification exercises prior to facing
full proof problems just as with the Copi problems. Observations of
these students' performance (involving a total of 1,578 rule
applications) confirmed much of what was found under the Copi problem
set. For example, the difference between success rates for positive
and negative rules (89.1% versus 83.7%) was greater than that between
inference and replacement rules (86.4% versus 86.2%). The inverse
correlation of negative rule use and overall rule application success
rate was also evident. In fact, this correlation (-.83) was the
highest ever observed. (This may be due to the fact that the Kahane
text does a good job of forcing students to face the more difficult
instances of negative rule applications.) By contrast, the
correlations of success rates with the use of replacement rules or
other connective-defined classes of rules do not begin to approach
the magnitude of that found with negative rules.
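The kind of correlation reported here can be computed along the
following lines. This is only a sketch: the per-student values are
invented for illustration, the variable names are assumptions, and the
Pearson coefficient is assumed to be the statistic intended (the text
does not name it).

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-student values, invented for illustration only:
# the share of a student's applications that used negative rules, and
# that student's overall application success rate.
negative_use = [0.20, 0.30, 0.40, 0.50, 0.60]
success_rate = [0.95, 0.90, 0.88, 0.80, 0.78]

r = pearson_r(negative_use, success_rate)
```

With data of this shape, a value of r near -1 indicates that heavier
use of negative rules goes with lower overall success, which is the
direction of the relationship described above.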
Very similar results were found in the Spring of 1990 using
problems selected from Klenk's Understanding Symbolic Logic
(second edition). A total of 1,016 applications made by nine advanced
students working eight proof problems each were analyzed. While the
differences are not as sharp, the same patterns described above
emerge. A negative correlation (-.41) holds between the extent of
negative rule use and overall application success rate. A similar,
though somewhat weaker, inverse relation exists in respect to the use
of replacement rules, but no other connective-defined class of rules
shows relationships nearly as strong. So, once again, negative rules
seem to contain an important source of difficulty for students
attempting to master the application of symbolic transformation
rules. Previous analyses have shown that these difficulties increase
when these rules are applied to expressions already containing
negations; for example, when De Morgan's rule is applied to '~(~A &
~B).' Moreover, these findings support the view that difficulties
with negative rules are not merely a function of particular problem
sets. Of additional interest is the fact that these advanced students
working on more demanding problems displayed some of the same
difficulties characteristic of introductory level students and
problems.
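The De Morgan's example mentioned above can be checked mechanically
with a truth table. The sketch below is illustrative only: the helper
name is an assumption, and the "misapplication" shown is a
hypothetical error pattern, not one taken from the study's data.

```python
from itertools import product


def equivalent(f, g, n_vars=2):
    """True if f and g agree on every truth-value assignment."""
    return all(f(*vals) == g(*vals)
               for vals in product([True, False], repeat=n_vars))


# The expression from the text: ~(~A & ~B)
original = lambda a, b: not ((not a) and (not b))

# De Morgan's applied correctly gives ~~A v ~~B, which Double
# Negation then reduces to A v B.
after_de_morgan = lambda a, b: (not (not a)) or (not (not b))
after_double_negation = lambda a, b: a or b

# A hypothetical misapplication (not from the study's data): the
# negations are dropped but the connective is left unchanged.
misapplied = lambda a, b: a and b

checks = {
    "De Morgan": equivalent(original, after_de_morgan),
    "then Double Negation": equivalent(original, after_double_negation),
    "misapplication": equivalent(original, misapplied),
}
```

Only the first two entries in checks come out as equivalences; the
third does not, which is the sort of slip that nested negations make
easy to commit.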
One thing that has become clear from these studies is the
importance of being able to apply transition rules correctly. Close
observation reveals many more types of difficulties than were
previously suspected. Although the inability to construct proofs is
certainly related to failures in strategic thinking, rule application
deficiencies account for many troubles and may even interfere with
some forms of strategic planning. These deficiencies, moreover, arise
not from the inability to define rules but from failures to apply a
given rule pattern to particular expressions correctly. Learning to
apply rules correctly under a variety of trying circumstances thus
constitutes a significant segment of mastering proof construction,
and CAI programs which promote this learning are doubly
valuable.
In light of this, we have spent considerable effort developing
a program for this purpose. This program, named JUSTIFIED THOUGHT
(hereafter, JT), provides exercises on applying rules of transition
to a variety of particular expressions. The form of these exercises
is similar to that found in many textbooks. Students must (1)
accurately name particular rule applications, (2) supply concluding
expressions when presented with premises, and (3) accept or reject
potential applications on the basis of their legitimacy. In the
Spring of 1989, we initiated what is hoped to be the first of several
evaluations of the JT program. Thirty introductory level students
practiced justification exercises prior to attempting proof
construction just as in previous semesters. About half of the class
(randomly selected and designated Group E) used the JT program while
the other half (Group C) worked the identical exercises on paper, to
be handed in and returned as regular homework assignments. Given that
both groups faced identical exercises, it was expected that any
subsequent performance differences observed would be small and would
indicate the effectiveness of computer presentation and capabilities
rather than of justification exercises themselves. These capabilities
included the ability to immediately compare misapplications with rule
forms and to receive more informative error messages for
pseudo-rules.
One measure of the effectiveness of the JT program can be
obtained by comparing the proof construction efforts of the two
groups. After completing justification exercises (via JT or on
paper), all students worked proofs using a data collecting proof
checking program (DEEP THOUGHT). Each student was assigned to one of
four problem sets (a, b, c, d) each of which contained five proof
problems. Since the use of negative rules appears to be a stumbling
block, we have been interested in how well the JT program can help
students overcome this weakness. Thus, student performance using
these particular rules was compared both between Group E and Group C
and across the four problem sets. The performance measure taken was the
total number of correct applications using negative rules minus the
number of incorrect applications using those same rules.
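As a rough sketch, the performance measure just described (correct
minus incorrect negative rule applications) could be tabulated per
group and problem set as follows. The records and field layout are
invented for illustration and are not the study's data.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration:
# (student, group, problem_set, correct_negative, incorrect_negative)
records = [
    ("s01", "E", "a", 14, 2),
    ("s02", "E", "a", 10, 4),
    ("s03", "C", "a", 9, 3),
    ("s04", "C", "a", 11, 5),
    ("s05", "E", "b", 15, 1),
    ("s06", "C", "b", 12, 6),
]


def cell_summary(rows):
    """Map (group, problem_set) -> (number of students, mean score)."""
    scores = defaultdict(list)
    for _student, group, problem_set, correct, incorrect in rows:
        # Performance measure: correct minus incorrect applications
        scores[(group, problem_set)].append(correct - incorrect)
    return {cell: (len(vals), sum(vals) / len(vals))
            for cell, vals in scores.items()}


summary = cell_summary(records)
```

Each entry of summary pairs a count of students with their average
score for one combination of group and problem set.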
Figure 1 shows the results for each
problem set and group. Each cell in Figure 1 contains two numbers,
one representing the number of students in that condition and one
representing average performance. For example, there were five
students assigned to problem set b in Group E, and the average number
of correct minus incorrect applications for these five students was
thirteen. With the exception of problem set c, the students who used
JT scored higher than those who did not, and the overall totals also
favor Group E. However, the statistical test, known as analysis of
variance (ANOVA), used to analyze this data shows that the
differences observed here are not quite large enough to be
statistically reliable. The differences between Group E and Group C
and among problem sets approach but do not achieve significance (P =
.10 and P = .14, respectively) using the .05 criterion.
In addition to rule application proficiency, we were also interested
in comparing the two groups with respect to the amount of time
required to solve proof problems. Microcomputers (as opposed to
time-sharing systems) provide a reliable mechanism for measuring this
variable, but a number of other factors currently restrict our
ability to interpret this data in a meaningful way. One of these
impediments involves the existence of unsolved problems. It is risky
to compare the average time spent working on a problem for different
groups of students when both sets of data contain unfinished
problems. In order to minimize these risks, two problems were
selected from the five-problem sequence assigned to each student. The
first problem in the sequence was excluded from consideration to
control for possible "warm-up" effects due to the newness of the CAI
environment for Group C. The second and third problems in each
problem set's sequence contained only four occurrences of unfinished
problems (two for each group) out of a total of 58 problems worked,
and the results for these two problems are summarized in Figure 2.
The average time (in minutes) spent working on these two problems is
given for each group and problem set. For example, the average time
required for the five students in Group E to solve the two problems
selected from set b was 7.7 minutes. (One student made no attempt at
either of these problems and was deleted from Group E, problem set
c.) The results of this analysis are very similar to those presented
above. Overall, Group E students solved their problems in less time,
and this finding holds for each problem set except set c. Again,
however, the differences observed here both between the groups and
among problem sets are not quite large enough to attain statistical
significance (P = .15 and P = .08, respectively).
These analyses should be taken as no more than initial,
quasi-experimental explorations which can and should undergo
continued refinement. There are several variables which still need to
be controlled, and our plan is to conduct such studies about once per
year, tightening them up progressively. The results will initially be
treated as formative rather than summative evaluations. That is, they
are used for guiding program development rather than for rating final
products. Of course, development should evolve to a point where clear
advantages of program use can be demonstrated.
In any event, the current results do not show any clear
advantage to using the JT program, either as a means of overcoming
the difficulties which attend the use of negative rules or as a means
of expediting proof construction. The JT program did not prove
to deliver the benefits of justification exercises any better than
standard homework exercises did. (This is assuming that there
are such benefits to be delivered, but without any control
group of subjects working proofs without having previously worked
justification exercises, that assumption is untested here.)
Nevertheless, these results provide an important source of formative
feedback, and the JT program is undergoing a number of modifications
which, upon completion, will also be subject to evaluation.
Carrying out these empirical studies cannot proceed without
walking an ethical tightrope in trying to balance diverse concerns
and interests. This challenge is best seen not as a conflict between
project needs and student needs but between short term and long term
student interests. Clearly, collecting and analyzing data and
publishing the results require steps that ensure privacy and
anonymity. Data should be coded in ways that protect student
identities. Particularly where students are divided into experimental
and control groups, guarantees must be provided that no student is
disadvantaged in terms of grades or learning opportunities.
Individual projects will have to anticipate, seek out, and address
particular ways in which students may be adversely affected by the
development and use of CAI.
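One way of coding data so that student identities are protected is to
replace each identifier with a keyed hash before analysis. The sketch
below is only one possible approach and is not described in the
studies reported here; the function name, key, and sample identifiers
are all hypothetical.

```python
import hashlib
import hmac


def pseudonym(student_id, secret_key, length=10):
    """Return a stable code for a student that hides the real identifier."""
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


# Hypothetical key and identifiers. In practice the key would be kept
# apart from the coded data so that codes cannot be traced back.
KEY = b"replace-with-a-private-random-key"
raw_log = [
    ("jsmith", "De Morgan", False),
    ("jsmith", "Modus Ponens", True),
    ("mlee", "Modus Tollens", True),
]

# Same student -> same code, so per-student analysis remains possible.
coded_log = [(pseudonym(sid, KEY), rule, ok) for sid, rule, ok in raw_log]
```

Because the same identifier always maps to the same code, records can
still be grouped by student without the analyst ever seeing a name.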
Recommendations relevant to this issue are discussed in a case
analysis given by Moor (1986) and Overall (1986). In fact, the
evaluation reported here was carried out in line with Moor's
suggestion that, should statistically significant differences in
grades exist between control and experimental groups, the
disadvantaged group should be compensated by the amount of that
difference. No such difference existed between the exam or homework
assignment grades of Group E versus Group C students, however. We
also offered students the opportunity of changing groups after random
assignment (but none chose to do so). Providing this opportunity
jeopardizes the experimental design but this may be acceptable in
early stages of development where evaluations play the formative role
of providing directions for further development and indications of
when well controlled studies are needed. Had the JT program appeared
to be highly effective in this preliminary study, development could
have been discontinued and more serious evaluation initiated.
If these suggestions were followed, CAI development projects
would simultaneously become educational research projects. Issues
worth investigating are plentiful. In every discipline, questions
about aspects of the subject matter that prove most troublesome are
worth pursuing. The same can be said for identifying effective
techniques of remediation. With the possible exception of
mathematics, these questions have not been systematically pursued.
Since CAI programs are often designed to either accompany or follow
particular textbooks, the pedagogical techniques employed in those
textbooks are also prime targets for evaluation.
It is not clear, however, how the burden of this research
should be distributed, and there is no doubt that it is in fact a
burden. Would the interests of developing effective CAI be best
served if each project incorporated a research component into its
overall structure? Or should particular projects focus on these
efforts while others emphasize other components of CAI development?
Whatever the answer, the need for widespread research is evident
given the diversity of settings in which CAI is developed and
adopted. (Suggestions related to these themes have also been made by
Millican (1988) and Twidale (1989).)
In order to be productive, these studies need to be both
long-term and coordinated with similar research efforts at other
universities. A long-run coordinated effort could increase the
generalizability and reliability of findings. For example, projects
using similar rule sets for proof construction could jointly
investigate the difficulties associated with those rules. Also,
comparisons could be made among students with different
characteristics at different universities working on identical
problem sets. It should be expected that one project, or even a few
working in isolation, will not be able to accomplish much. But
wherever students are using computers to learn, the opportunity
exists for observing and recording actual student strengths and
weaknesses. Processing and analyzing this data adds a new dimension
to the CAI effort. It seems clear that it is the responsibility of the
CAI movement to conduct this research, and that responsibility will
be most fully met by a widespread effort in which many projects
participate.
Butrick, R. (1981). Deduction and Analysis, revised
edition. Washington, D.C.: University Press of America.
Copi, I. (1986). Introduction to Logic, seventh edition.
New York: Macmillan.
Croy, M. (1988a). "Computer-Assisted Instruction and Rule
Applications for Deductive Proof Construction," Collegiate
Microcomputer, VI, 51-56.
_______. (1988b). "The Use of CAI to Enhance Human Interaction
in the Learning of Deductive Proof Construction," Computers and the
Humanities, 22, 277-284.
_______. (1989). "CAI and Empirical Explorations of Deductive
Proof Construction," Computers and Philosophy Newsletter, 4,
111-127.
Kahane, H. (1986). Logic and Philosophy, fifth edition.
Belmont, CA: Wadsworth Publishing Company.
Klenk, V. (1989). Understanding Symbolic Logic, second
edition. Englewood Cliffs: Prentice-Hall.
Millican, P. (1988). "Prospects and Problems for Computers in
Logic Teaching," Computerized Logic Teaching Bulletin, 1,
32-38.
Moor, J. (1986). "Computer-Assisted Instruction and the Guinea
Pig Dilemma," Teaching Philosophy, 9, 351-354.
Overall, C. (1986). "Innovation and Injustice: Commentary on
'I'm Not a Guinea Pig'," Teaching Philosophy, 9,
354-358.
Twidale, M. (1989). "Explicit Planning and Instantiation as a
Means of Facilitating Student Computer Dialog," Computerized Logic
Teaching Bulletin, 2, 2-12.