
Essentially Unbiased EAP Estimates in Computerized Adaptive Testing

Tianyou Wang

ACT, Inc.

Address correspondence to Tianyou Wang, ACT, P.O. Box 168, Iowa City, IA 52243; e-mail: wang@act.org


Abstract

In computerized adaptive testing (CAT), scoring is usually based on IRT ability (θ) estimates instead of number-correct scores because different examinees typically receive different sets of items. It is well known that maximum likelihood estimation (MLE) produces relatively unbiased estimates with relatively high standard error (SE) in CAT. Bayesian estimation methods, on the other hand, produce estimates with relatively small SE but with large bias if a standard normal prior is imposed.

The purpose of this paper was to propose a new expected a posteriori (EAP) estimation method with a flatter prior distribution than the standard normal distribution to reduce the bias of the Bayesian methods. The simulation results of the paper demonstrated that the EAP with a beta prior distribution can produce estimates with similar or even smaller bias than the MLE and yet does not sacrifice much of the smaller SE and root mean square error (RMSE) of the standard EAP estimation with a normal prior, and that the presence of practical constraints such as content balancing and item exposure rate control does not affect the relative unbiasedness of the new EAP method.

Key Words: Computerized adaptive testing, Bayesian estimation, expected a posteriori, prior distribution, bias.

In computerized adaptive testing (CAT), it is common for different examinees to receive different sets of items from a given item pool. Because those sets of items differ in difficulty, it is inconvenient to derive the reported scores from number-correct raw scores, as is often done in conventional paper-and-pencil testing. Therefore, IRT-based ability (θ) estimates are often used as the basis for deriving the reported scores. So far, four ability estimation methods have primarily been used in CAT: (1) maximum likelihood estimation (MLE) (Birnbaum, 1968), (2) Owen's Bayesian estimation (OWEN) (Owen, 1969, 1975), (3) expected a posteriori estimation (EAP) (Bock & Aitkin, 1981; Bock & Mislevy, 1982), and (4) maximum a posteriori estimation (MAP) (Samejima, 1969). A few studies (Bock & Mislevy, 1982; Weiss & McBride, 1984; de la Torre, 1991; Wang, 1995) have examined and compared these ability estimation methods under CAT settings. The general conclusions are that MLE is relatively unbiased with a well-designed item pool but has a relatively large standard error (SE), that the Bayesian methods are biased toward the prior mean, and that among the Bayesian methods EAP has relatively small bias and SE. Bias in this context is defined as the mean of the θ estimates for an examinee taking the same CAT many times without practice effects, minus his or her true θ.

EAP has the advantage of being computationally simpler than MLE and MAP. Wang (1995) also found that if an item pool lacks items of extreme difficulty levels, which is usually the case with real-world item pools, MLE could also be biased, but in the opposite direction of the Bayesian methods.
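The bias definition above, together with the SE and RMSE used throughout the paper, can be sketched as follows. This is a minimal illustration, assuming the replicated θ estimates for one examinee are already available as a list; the function name `bias_se_rmse` and the use of the population (divide-by-n) variance are choices made here, not taken from the paper.

```python
import math

def bias_se_rmse(estimates, true_theta):
    """Summarize replicated theta estimates for a single true theta.

    bias = mean estimate minus true theta
    se   = standard deviation of the estimates around their own mean
    rmse = root mean square deviation of the estimates from true theta
    """
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_theta
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / n)
    rmse = math.sqrt(sum((e - true_theta) ** 2 for e in estimates) / n)
    return bias, se, rmse
```

With the divide-by-n variance, the three quantities satisfy RMSE² = bias² + SE², which is why a method can trade a little bias for a smaller SE and still end up with a lower RMSE.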

In many standardized testing programs (e.g., the GRE; see Eignor & Schaeffer, 1995), Bayesian methods are not used, despite their small standard error, solely because they are seriously biased. Bias can be problematic when the estimates are used to make inferences relative to some absolute criterion. For instance, in computerized mastery testing, the estimates may be compared with cut scores to decide examinees' pass/fail status, and bias in the estimates can lead to seriously wrong decisions. For some testing programs, the CAT form will coexist with its conventional paper-and-pencil form for a period of time, and the score scale will remain the same as for the conventional form. In these situations, the θ estimates need to be transformed into equivalent number-correct scores on some base conventional form (e.g., Eignor & Schaeffer, 1995). Any bias in the θ estimates will necessarily carry over to the transformed reported scores. To solve this bias problem, some CAT developers have resorted to traditional equating methods to eliminate the effect of the bias. For example, Segall (1995) and Segall & Carter (1995) used a random groups design to remove the nonequivalence between the θ-based CAT scores and the conventional form test scores. The equating process is usually expensive and may introduce additional errors during data collection and analysis.
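To illustrate how bias in θ carries over to a number-correct scale, the sketch below uses the test characteristic curve, a common way of mapping θ to an expected number-correct score. This is an assumption made for illustration: the text does not say which transformation the cited programs use, and the two-parameter logistic model and the `base_form` item parameters are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_number_correct(theta, base_form):
    """Test characteristic curve: sum of item response probabilities."""
    return sum(p_correct(theta, a, b) for a, b in base_form)

# Hypothetical base form: (discrimination, difficulty) pairs.
base_form = [(1.0, -1.0), (1.2, -0.5), (0.9, 0.0), (1.1, 0.5), (1.3, 1.0)]

true_theta = 1.5
biased_theta = 1.2  # an estimate pulled toward a prior mean of 0
shift = (expected_number_correct(biased_theta, base_form)
         - expected_number_correct(true_theta, base_form))
print(f"number-correct shift caused by the theta bias: {shift:.3f}")
```

Because the curve is strictly increasing in θ, any downward bias in θ produces a downward shift in the transformed score, and the size of the shift depends on where on the scale the examinee sits.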

Conceptually, the Bayesian methods are intrinsically biased because they incorporate prior information into the estimation process. Like regression methods in prediction problems, which regress the predicted values toward the mean, the Bayesian methods regress estimates toward the prior mean. The Bayesian methods use both the data and the prior for estimation, whereas MLE uses only the data. The Bayesian methods can be thought of, in a loose sense, as a combination of MLE and the prior distribution, which is usually the standard normal distribution. (For convenience, Bayesian methods with a standard normal prior will be called standard Bayesian methods in the remainder of this paper.) The large bias of the standard Bayesian methods in CAT is caused by the steep shape of the standard normal prior. Because MLE also has a relatively small bias in the direction opposite to that of the Bayesian methods, it was hypothesized that if a flatter prior distribution is specified, the Bayesian estimates can also be relatively unbiased. The purposes of this paper are to study the effect of different specifications of the prior distribution on the bias of the EAP estimates in CAT, and to search for an optimal prior for a given item pool so that the EAP estimates are essentially unbiased over a relatively wide range of the θ scale. The relationship between the characteristics of the item pool and the shape of the optimal prior will be investigated. Another purpose of the study is to examine the possible effects of practical constraints, such as content balancing and item exposure rate control, on the bias of the new EAP method.





EAP was chosen among the Bayesian methods because of its relatively small error and its computational simplicity compared with other Bayesian methods, although a similar idea can be applied to MAP; that is, a flatter prior distribution can be applied to MAP to reduce its bias. Because OWEN was specifically designed to have a standard normal prior, however, this idea does not apply to the OWEN method.

–  –  –

ability estimation methods, in particular, of MLE and the standard EAP methods are described below.

Maximum likelihood estimation: MLE is widely used for parameter estimation in many statistical applications. In the context of item response theory (IRT) ability estimation, given a response vector u to a set of items with known parameters, the likelihood function is

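$$L(\mathbf{u} \mid \theta) = \prod_{i=1}^{n} P_i(\theta)^{u_i} \, [1 - P_i(\theta)]^{1 - u_i},$$

where n is the number of items, u_i is the scored response (1 or 0) to item i, and P_i(θ) is the probability of a correct response to item i at ability θ.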

Iterative numerical methods such as the Newton-Raphson method can be used to find the value of θ that maximizes the likelihood function. Asymptotically, the variance of the MLE estimates can be approximated by the inverse of the test information function.

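$$\operatorname{Var}(\hat{\theta} \mid \theta) \approx \frac{1}{I(\theta)},$$

where I(θ) is the test information function.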
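The Newton-Raphson maximization and the information-based standard error described above can be sketched as follows. This is a minimal illustration, assuming a two-parameter logistic (2PL) model, which is chosen here for brevity and is not stated in this section of the paper; the function names and item parameters are hypothetical. Under the 2PL the second derivative of the log-likelihood equals minus the test information, so the Newton-Raphson step is simply the score divided by the information.

```python
import math

def p_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def score_and_information(theta, items, responses):
    """First derivative of the log-likelihood and the test information."""
    grad = 0.0
    info = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        grad += a * (u - p)
        info += a * a * p * (1.0 - p)
    return grad, info

def mle_theta(items, responses, start=0.0, tol=1e-8, max_iter=100):
    """Newton-Raphson MLE of theta; returns (estimate, asymptotic SE).

    Requires a mixed response pattern: with all-correct or all-incorrect
    responses the MLE is infinite and the iteration will not converge.
    """
    theta = start
    for _ in range(max_iter):
        grad, info = score_and_information(theta, items, responses)
        # Limit the step size to keep early iterations stable.
        step = max(-1.0, min(1.0, grad / info))
        theta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(theta, items, responses)
    return theta, 1.0 / math.sqrt(info)

# Hypothetical (discrimination, difficulty) pairs and scored responses.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 0.5)]
responses = [1, 1, 0, 0]
theta_hat, se = mle_theta(items, responses)
print(f"theta_hat = {theta_hat:.3f}, SE = {se:.3f}")
```

With only four items the SE is large, which reflects the point made in the text that the asymptotic approximation is least trustworthy for short tests.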

In the context of CAT, the approximation may not be sufficiently accurate because the test length of a CAT test is supposed to be relatively short. Warm (1989) and Wang (1995) found with simulation

–  –  –

targeted at an examinee’s true ability level, the bias will be close to zero because the term in the parentheses will be close to zero. If the ability level is higher than the average item difficulty level, the bias will be positive; likewise, if the ability level is lower than the average item difficulty level, the bias will be negative.

The expected a posteriori estimation: In the context of ability estimation in IRT, we have

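$$\hat{\theta}_{EAP} = E(\theta \mid \mathbf{u}) \approx \frac{\sum_{k=1}^{q} X_k \, L(X_k) \, W(X_k)}{\sum_{k=1}^{q} L(X_k) \, W(X_k)},$$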

where X_k is one of q quadrature points, W(X_k) is the weight associated with that quadrature point, and L(X_k) is the likelihood function evaluated at that quadrature point. With this procedure, the EAP estimates become summations and do not require iterative processing. Unlike the OWEN method, the EAP method evaluates the actual posterior distribution directly. So, at least logically, the EAP method is superior to the OWEN method. Bock & Mislevy (1982) pointed out that among all possible estimators, EAP has the smallest mean square error over the population for which the distribution of ability is specified by the prior. The biases of the Bayesian methods all point toward the middle of the θ scale if a standard normal prior is used. The shape of the prior distribution affects the magnitude of the bias of the Bayesian estimates. In large-scale standardized testing, it is often not realistic to use actual prior information about an individual examinee to form the prior. Therefore, it is common practice to use the standard normal distribution as the prior for every examinee. The bias of the Bayesian estimates represents a regression effect toward the group mean, which is undesirable in most standardized testing settings.
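The quadrature form of EAP and the pull toward the prior mean can be sketched as follows. This is a minimal illustration, assuming a 2PL model and an equally spaced grid on [-4, 4]; neither is specified in this section, and the item parameters are hypothetical. The weights are the prior density evaluated at the grid points and normalized to sum to one.

```python
import math

def p_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def likelihood(theta, items, responses):
    """Likelihood of the response pattern at a given theta."""
    value = 1.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        value *= p if u == 1 else 1.0 - p
    return value

def normal_weights(points):
    """Standard normal prior weights, normalized over the grid."""
    raw = [math.exp(-0.5 * x * x) for x in points]
    total = sum(raw)
    return [w / total for w in raw]

def eap(items, responses, points, weights):
    """EAP estimate and posterior SD by direct summation over the grid."""
    post = [likelihood(x, items, responses) * w
            for x, w in zip(points, weights)]
    norm = sum(post)
    mean = sum(x * p for x, p in zip(points, post)) / norm
    var = sum((x - mean) ** 2 * p for x, p in zip(points, post)) / norm
    return mean, math.sqrt(var)

points = [-4.0 + 0.2 * k for k in range(41)]
flat = [1.0 / len(points)] * len(points)

# Hypothetical items, all answered correctly.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
responses = [1, 1, 1]
eap_normal, _ = eap(items, responses, points, normal_weights(points))
eap_flat, _ = eap(items, responses, points, flat)
print(f"normal prior: {eap_normal:.3f}, flat prior: {eap_flat:.3f}")
```

For the same all-correct pattern, the estimate under the standard normal prior sits much closer to zero than the estimate under the flat prior, which is the regression effect the paragraph above describes.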

–  –  –

prior distribution is used instead of the standard normal prior. The goal is to make the new EAP method as good as MLE in terms of bias while keeping an RMSE similar to that of the standard EAP. With this new EAP method, the prior distribution no longer aims to reflect any prior information about the examinees' ability; it serves only as a tool for achieving technical qualities such as smaller bias. For this reason, such priors can be referred to as uninformative priors.

In choosing such flatter priors, there are many options. One option is a normal distribution with variance greater than one. But because the magnitude of the bias of EAP and other Bayesian methods was found to be generally asymmetric around the middle of the θ scale (cf. Wang, 1995), the normal distribution is considered undesirable because of its symmetry. The family of beta distributions was considered the best option for this situation because of its flexibility in shape. Let this beta distribution be denoted as g(θ|α, β, l, u), where α, β, l, and u are the four parameters that characterize the distribution: the first two determine the shape, and the last two are the lower and upper bounds of the distribution. The probability density function of this distribution can be expressed as (Johnson & Kotz, 1970; Hanson, 1991)

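$$g(\theta \mid \alpha, \beta, l, u) = \frac{(\theta - l)^{\alpha - 1} \, (u - \theta)^{\beta - 1}}{B(\alpha, \beta) \, (u - l)^{\alpha + \beta - 1}}, \qquad l \le \theta \le u,$$

where B(α, β) is the beta function.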

The shape of this distribution is symmetric when α equals β and asymmetric otherwise. When α is greater than β, it is negatively skewed; when α is less than β, it is positively skewed. For α and β greater than 1, the smaller they are, the flatter the shape is, with α = β = 1 giving the uniform distribution. Hanson (1991) presented formulas for computing the mean, variance, skewness, and kurtosis of the distribution from the values of these four parameters. The main task of this study is to find a way of choosing the four parameters so that the resulting EAP estimates are essentially unbiased over a wide range of the θ scale.
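The four-parameter beta density and its use as a set of quadrature weights can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and the density is computed on the log scale with `math.lgamma` for numerical stability, which is an implementation choice made here.

```python
import math

def beta4_pdf(theta, alpha, beta, lower, upper):
    """Four-parameter beta density g(theta | alpha, beta, lower, upper).

    Returns 0 at and outside the bounds; the endpoints do not matter for
    quadrature as long as the grid points lie strictly inside the bounds.
    """
    if theta <= lower or theta >= upper:
        return 0.0
    log_beta_fn = (math.lgamma(alpha) + math.lgamma(beta)
                   - math.lgamma(alpha + beta))
    log_pdf = ((alpha - 1.0) * math.log(theta - lower)
               + (beta - 1.0) * math.log(upper - theta)
               - log_beta_fn
               - (alpha + beta - 1.0) * math.log(upper - lower))
    return math.exp(log_pdf)

def beta4_weights(points, alpha, beta, lower, upper):
    """Prior weights on a quadrature grid, normalized to sum to one."""
    dens = [beta4_pdf(x, alpha, beta, lower, upper) for x in points]
    total = sum(dens)
    return [d / total for d in dens]

# alpha = beta = 1 reproduces the uniform density 1 / (upper - lower).
print(beta4_pdf(0.0, 1.0, 1.0, -4.0, 4.0))

# alpha > beta shifts mass toward the upper bound (negative skew).
points = [-3.9 + 0.2 * k for k in range(40)]
weights = beta4_weights(points, 3.0, 2.0, -4.0, 4.0)
print(sum(x * w for x, w in zip(points, weights)))
```

These weights can replace the standard normal weights in a quadrature-based EAP computation without any other change to the estimator.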

In CAT, the bias of the Bayesian estimates is affected not only by the shape of the prior distribution but also by the characteristics of the item pool, such as the number of items and the discrimination values within different strata of difficulty levels (Wang, 1995). Therefore, a universally applicable prior distribution that produces the least biased estimates for all types of item pools cannot be found. The search for such a prior distribution is then specific to a particular item pool. Because many different aspects of an item pool are expected to influence the bias of the EAP estimates (Wang, 1995), it is not expected that the parameters of the beta prior can be determined quantitatively from indexes of the item pool characteristics. These aspects may include the pool size, the mean discrimination parameter value, the distribution of the difficulty parameters, and the number of items and the mean discrimination value within each stratum of the difficulty levels. For this reason, a trial-and-error approach with simulations will be used to find the parameter values of the beta prior that yield estimates with the smallest bias for a particular pool. The parameter values thus found, however, will be examined in relation to the characteristics of the item pool. This process will be repeated across several item pools with different characteristics, with the goal of finding general relationships between the parameters of the beta prior distribution and the characteristics of the item pools.
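The trial-and-error search described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the paper's actual procedure: it assumes a 2PL model, maximum-information item selection at the current EAP estimate, a fixed test length, a randomly generated hypothetical item pool, and mean absolute bias over a few true θ values as the criterion. All function names, bounds, and candidate parameter values are hypothetical.

```python
import math
import random

def p_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def beta4_weights(points, alpha, beta, lower, upper):
    """Normalized beta prior weights; grid must lie strictly inside bounds."""
    dens = [(x - lower) ** (alpha - 1.0) * (upper - x) ** (beta - 1.0)
            for x in points]
    total = sum(dens)
    return [d / total for d in dens]

def simulate_cat(true_theta, pool, points, prior, test_length, rng):
    """Run one fixed-length CAT and return the final EAP estimate."""
    post = list(prior)
    available = list(range(len(pool)))
    estimate = sum(x * w for x, w in zip(points, post)) / sum(post)
    for _ in range(test_length):
        # Pick the unused item with maximum information at the estimate.
        best = max(available,
                   key=lambda i: item_information(estimate, *pool[i]))
        available.remove(best)
        a, b = pool[best]
        u = 1 if rng.random() < p_correct(true_theta, a, b) else 0
        # Update the unnormalized posterior with the new response.
        for k, x in enumerate(points):
            p = p_correct(x, a, b)
            post[k] *= p if u == 1 else 1.0 - p
        estimate = sum(x * w for x, w in zip(points, post)) / sum(post)
    return estimate

def mean_abs_bias(prior_params, pool, thetas, reps, test_length, seed):
    """Mean absolute bias over the given true theta values."""
    rng = random.Random(seed)
    points = [-4.0 + 0.25 * k for k in range(33)]
    prior = beta4_weights(points, *prior_params)
    total = 0.0
    for t in thetas:
        ests = [simulate_cat(t, pool, points, prior, test_length, rng)
                for _ in range(reps)]
        total += abs(sum(ests) / reps - t)
    return total / len(thetas)

def search_prior(candidates, pool, thetas, reps=50, test_length=10, seed=1):
    """Return (smallest mean absolute bias, prior parameters)."""
    scored = [(mean_abs_bias(c, pool, thetas, reps, test_length, seed), c)
              for c in candidates]
    return min(scored)

def make_pool(size, seed=0):
    """Hypothetical pool of (discrimination, difficulty) pairs."""
    rng = random.Random(seed)
    return [(rng.uniform(0.8, 2.0), rng.uniform(-3.0, 3.0))
            for _ in range(size)]

if __name__ == "__main__":
    pool = make_pool(60)
    candidates = [(1.0, 1.0, -4.5, 4.5), (1.5, 1.5, -4.5, 4.5),
                  (3.0, 3.0, -4.5, 4.5)]
    print(search_prior(candidates, pool, [-2.0, -1.0, 0.0, 1.0, 2.0]))
```

A realistic search would use many more replications, a finer set of candidate parameters including asymmetric ones (α ≠ β), and the operational item pool with its content and exposure constraints.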

–  –  –


