Linear On-the-Fly Testing (LOFT): Fair and Secure Test Delivery

Linear On-the-Fly Testing (LOFT) is an advanced computer-based testing model that generates a unique but equivalent version of an exam for every candidate. LOFT assembles tests dynamically from a calibrated item bank according to predefined blueprints.

Computer-based testing can be delivered through several models, including fixed-form, adaptive, and linear-on-the-fly testing (LOFT). Like fixed-form testing, LOFT ensures that every test form is equally fair and balanced. Unlike fixed-form testing, however, LOFT gives each candidate a different version of the test, which strengthens security and reduces item exposure.

Because of this balance between fairness and security, LOFT has become one of the most effective models for large-scale online exams. It allows test administrators to deliver unique yet equivalent tests instantly.

Key takeaways

• LOFT ensures both fairness and security by generating unique yet equivalent test forms for every candidate.
• Each test is assembled dynamically from a calibrated item bank according to predefined blueprints that balance topic coverage and difficulty.
• Content and statistical equivalence guarantee that different test versions measure the same abilities with the same accuracy.
• Automated test assembly eliminates the need for manual form creation, saving time and improving operational efficiency.
• LOFT minimizes item exposure and cheating risks, making it ideal for large-scale and continuous testing programs.

What is linear-on-the-fly testing?

Linear-on-the-fly testing is a test assembly method in which each candidate receives a unique but equivalent version of a test, generated from an item bank just before the test begins. It combines the principles of classical test assembly and computerized delivery.

Instead of all candidates taking the same fixed test, LOFT creates a customized version for each candidate by selecting items from an item pool based on predefined blueprints (e.g., number of items per topic, difficulty range, item exposure limits).

Diagram showing LOFT test generation where questions are selected pseudo-randomly from item pools based on a blueprint

How linear-on-the-fly testing works

In linear-on-the-fly testing, a computer program automatically selects questions from a large question bank in a pseudo-random way. The questions are not chosen completely at random; instead, the system follows specific rules to make sure that:

• Every test has the same balance of topics and difficulty,
• Each candidate gets a different set of questions that are still equivalent in difficulty and quality.

Diagram showing pseudo-random question selection from a large item bank to create equivalent tests in linear-on-the-fly testing
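The rule-constrained selection described above can be sketched in a few lines of Python. This is a minimal illustration, not a real delivery engine: the item bank, field names, and slot sizes are all invented for the example.

```python
import random

# Hypothetical item bank: every item is tagged by topic and difficulty.
bank = [
    {"id": i, "topic": topic, "difficulty": diff}
    for i, (topic, diff) in enumerate(
        [(t, d) for t in ("grammar", "vocabulary") for d in ("easy", "hard")] * 20
    )
]

def draw(bank, topic, difficulty, count, rng):
    """Pseudo-random draw: random within the pool, but constrained to one
    topic/difficulty slot so every form keeps the same balance."""
    pool = [item for item in bank
            if item["topic"] == topic and item["difficulty"] == difficulty]
    return rng.sample(pool, count)  # sample without replacement

rng = random.Random()  # in practice, seeded independently per candidate
form = draw(bank, "grammar", "easy", 5, rng)
```

Two candidates running the same `draw` calls get different item sets, yet each set satisfies the same topic and difficulty rule.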

To make sure all versions of the test are truly comparable, the system can use Item Response Theory (IRT), a statistical method that measures how difficult each question is and how well it differentiates between ability levels.

Alternatively, it may rely on Classical Test Theory (CTT) or basic item metadata (such as topic tags and difficulty levels) to maintain fairness and consistency.
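As a concrete illustration of how IRT quantifies these properties, the two-parameter logistic (2PL) model gives the probability of a correct response from an item's difficulty and discrimination. The formula is standard; the parameter values below are made up for the example.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT model: probability that a candidate with ability `theta`
    answers correctly an item with discrimination `a` and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A candidate of average ability (theta = 0) facing an average-difficulty
# item (b = 0) has exactly a 50% chance of success.
print(p_correct(0.0, 1.2, 0.0))  # 0.5
```

Higher `a` values make the curve steeper, meaning the item separates nearby ability levels more sharply; higher `b` shifts the item toward stronger candidates.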

How LOFT testing ensures fair and equivalent tests

In linear-on-the-fly testing, fairness depends on two core principles: content equivalence and statistical equivalence.

Content equivalence:

• Ensures every test form covers the same topics, skills, and difficulty mix.
• Even if the questions differ, all candidates face the same balance: for example, equal numbers of grammar, vocabulary, and reading items.
• Achieved through a blueprint that guides which items can be selected.

Statistical equivalence:

• Ensures all test forms are equally difficult and measure ability with the same accuracy.
• Uses item data (such as difficulty and discrimination, often from IRT) to build comparable test versions.
• Guarantees that people with the same skill level get similar scores, even on different forms.

Together, these principles make LOFT tests different in content but equal in challenge, fairness, and quality.
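One simple way to check statistical equivalence in practice is to compare summary statistics of the item parameters across assembled forms. The sketch below is illustrative only: the item difficulties and the 0.05 tolerance are invented values, not a prescribed standard.

```python
def mean_difficulty(form):
    """Average IRT difficulty (b) of the items on a form."""
    return sum(item["b"] for item in form) / len(form)

# Two hypothetical forms built from different items.
form_a = [{"id": 42, "b": -0.5}, {"id": 7, "b": 0.0}, {"id": 19, "b": 0.5}]
form_b = [{"id": 3, "b": -0.4}, {"id": 88, "b": -0.1}, {"id": 51, "b": 0.5}]

# Forms count as equivalent when their summary statistics agree within a
# tolerance chosen by the testing program (0.05 is an assumed value here).
difference = abs(mean_difficulty(form_a) - mean_difficulty(form_b))
assert difference < 0.05
```

Real programs compare richer statistics (e.g., test information functions), but the principle is the same: different items, matching measurement properties.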

Steps in linear-on-the-fly testing

1. Item pool preparation: The test admin creates a large bank of items, each tagged with attributes such as topic, difficulty, and discrimination parameters.
2. Blueprint definition: A test specification (content balance, length, difficulty targets, etc.) defines what each test form should look like.
3. On-the-fly assembly: When a candidate starts the exam, the system automatically selects items from the bank that satisfy the blueprint and statistical constraints.
4. Delivery: The generated test is linear (same order and fixed sequence for that candidate), but different candidates get different item sets.

Illustration showing LOFT testing steps from item pool preparation to test delivery
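The four steps can be sketched end to end in Python. This is a simplified illustration under stated assumptions: the pool, blueprint, and slot counts are hypothetical, and real systems add exposure control and IRT constraints on top.

```python
import random

# Step 1 - item pool: items tagged by topic and difficulty (hypothetical data).
pool = [{"id": i, "topic": t, "difficulty": d}
        for i, (t, d) in enumerate(
            [(t, d) for t in ("grammar", "reading") for d in ("easy", "hard")] * 15)]

# Step 2 - blueprint: items required per (topic, difficulty) slot.
blueprint = {("grammar", "easy"): 3, ("grammar", "hard"): 2,
             ("reading", "easy"): 3, ("reading", "hard"): 2}

def assemble_form(pool, blueprint, rng):
    """Step 3 - on-the-fly assembly: fill every blueprint slot from the pool."""
    form = []
    for (topic, difficulty), count in blueprint.items():
        slot = [it for it in pool
                if it["topic"] == topic and it["difficulty"] == difficulty]
        form.extend(rng.sample(slot, count))
    rng.shuffle(form)  # Step 4 - a fixed linear order for this candidate
    return form

# Different candidates get different, equally sized, blueprint-conforming forms.
candidate_a = assemble_form(pool, blueprint, random.Random(1))
candidate_b = assemble_form(pool, blueprint, random.Random(2))
```

Once assembled, each form is delivered as an ordinary linear test; the randomness happens only at assembly time, never during the exam.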

Example LOFT testing scenario

An organization uses linear-on-the-fly testing to deliver secure, fair, and flexible online language assessments.

1. Item bank setup:

The test administrator maintains an item bank of around 1,000 questions covering grammar, vocabulary, reading, and listening.

Each question is tagged by topic, skill area, and difficulty level (easy, medium, hard).

The test blueprint defines that each assessment must include 40 questions:

• 10 grammar
• 10 vocabulary
• 10 reading
• 10 listening

with 30% easy, 50% medium, and 20% hard items.
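The difficulty split above translates into concrete slot counts. A quick check in Python that the percentages fill the 40-question form exactly:

```python
TOTAL = 40
split = {"easy": 0.30, "medium": 0.50, "hard": 0.20}

# 30% / 50% / 20% of 40 questions -> 12 easy, 20 medium, 8 hard.
counts = {level: round(TOTAL * share) for level, share in split.items()}
print(counts)  # {'easy': 12, 'medium': 20, 'hard': 8}
assert sum(counts.values()) == TOTAL
```

In a full blueprint the per-topic and per-difficulty quotas are combined, e.g., each 10-question topic section could carry 3 easy, 5 medium, and 2 hard items.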

2. When a candidate starts the test:

The LOFT system automatically assembles a test on the fly, selecting questions that match the blueprint and difficulty distribution.

Each candidate receives a unique combination of items. For example:

• Candidate A might get Grammar Q42 (medium), Reading Q311 (hard), Listening Q508 (easy).
• Candidate B might get Grammar Q27 (medium), Reading Q216 (hard), Listening Q745 (easy).

3. Fairness and consistency:

Even though candidates see different questions, all tests are equivalent in content coverage and difficulty. Every form aligns with the same blueprint, ensuring fair comparison across all examinees.

4. Practical benefits:

Tests are generated instantly and on demand.

Security is strengthened since no two test forms are identical.

Administrators can easily update or expand the item bank without recreating entire test versions.

Benefits of linear-on-the-fly testing

• Minimized cheating opportunities: Each candidate receives a unique test form, making item theft and answer sharing far more difficult.
• Uniqueness of assessments: Every test form is assembled on the fly, ensuring that no two candidates receive the same set of questions.
• Reduced predictability: It reduces predictability across sessions and prevents candidates from memorizing or sharing specific questions.
• Controlled item exposure: It reduces item exposure rates because questions are randomly drawn from a large bank.
• Fairness and equivalence: Every test form is generated according to the same blueprint and psychometric constraints, ensuring equivalent difficulty and content balance.
• Operational efficiency: No need to pre-assemble test forms manually; the system builds them automatically. This saves time and administrative effort, especially for large-scale testing programs. It also allows new items to be added to the pool at any time, supporting continuous test delivery.
• Scalability and flexibility: Ideal for large testing populations and continuous testing programs. Candidates can take the exam anytime, anywhere, without waiting for a fixed test window. It can easily adapt to content updates or changes in test design without rebuilding entire forms.
• Better item bank utilization: Ensures balanced item usage across the pool, preventing some items from being overused while others remain unused. This extends the life span of the item bank, delaying the need for costly content development cycles.
• Consistency with scoring: Although forms differ, all are linear tests, not adaptive, so scoring, scaling, and reporting remain familiar and straightforward. LOFT makes it easier to compare results across candidates and sessions.

LOFT vs. Adaptive Testing (CAT) vs. Linear Fixed-Form Delivery

| Feature | Linear Fixed-Form Delivery | Linear On-the-Fly Testing (LOFT) | Computerized Adaptive Testing (CAT) |
| --- | --- | --- | --- |
| Test form creation | Pre-assembled manually | Generated automatically for each candidate | Built dynamically during the test |
| Item selection timing | Before testing, same form for all | Before the test, unique form per candidate | During the test, item by item |
| Adaptivity | None | None (but randomized within constraints) | Fully adaptive based on performance |
| Fairness | High – everyone takes the same form | High – different forms but equivalent | High – precise for each individual, but forms differ |
| Security | Low – identical form for all | High – unique form per candidate | Moderate – certain items may be overused |
| Item exposure | Very high | Controlled and balanced | Requires advanced exposure control |
| Operational efficiency | Moderate – forms must be rebuilt manually | High – automatic generation saves time | High – fewer items needed per test |
| Scoring | Traditional scoring | Traditional scoring | Ability estimation (IRT-based) |
| Test length | Fixed | Fixed | Variable (usually shorter) |
| Scalability | Limited by manual form creation | Very high – ideal for large populations | High – limited by algorithm complexity |
| Best suited for | Small, controlled testing programs | Large-scale, secure, on-demand exams | Placement, adaptive learning, diagnostics |

LOFT testing vs. Standard Randomization

| Feature | Linear On-the-Fly Testing (LOFT) | Standard Randomization |
| --- | --- | --- |
| Test assembly | Generated instantly based on blueprint and constraints | Randomly selected or shuffled without rules |
| Content balance | Guaranteed through predefined blueprint | Not guaranteed |
| Difficulty control | Balanced by target difficulty levels | Uncontrolled, purely random |
| Equivalence (fairness) | High – each form is statistically and content-equivalent | Low – forms may vary in difficulty and coverage |
| Security & item exposure | High – each test form is unique | Medium – same items may appear often |
| Psychometric control | Based on IRT or CTT parameters | Usually absent |
| Scalability | High – ideal for large-scale and on-demand delivery | Moderate – limited by question pool size |
| Typical use case | Certification, recruitment, secure large-scale exams | Low-stakes quizzes or practice tests |

Challenges and limitations in LOFT testing

• Requires a large item bank: LOFT depends on having a large pool of well-calibrated questions to generate unique yet equivalent tests. If the item bank is too small, some items will appear too often, increasing item exposure and reducing security. Building and maintaining such an item bank requires continuous content development.
• High system and algorithm requirements: Real-time test generation demands robust software and fast algorithms that select items instantly while enforcing all constraints. System failures or slow item selection can disrupt large-scale exam delivery.
• Continuous monitoring and calibration: Over time, item difficulty or discrimination values may drift as the population changes. Maintaining psychometric quality requires ongoing statistical analysis to recalibrate items. Without maintenance, tests may become inconsistent or unreliable across candidates.
• Limited personalization: Unlike Computerized Adaptive Testing (CAT), LOFT does not adjust difficulty based on individual performance. Every candidate receives a test of equivalent design, not one optimized for their ability level.
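Exposure monitoring, mentioned above, usually starts from a simple statistic: the fraction of delivered forms on which each item appeared. A minimal sketch follows; the delivered forms and the 0.9 flagging threshold are invented for the example, as real programs set thresholds from their own security policy.

```python
from collections import Counter

def exposure_rates(delivered_forms):
    """Fraction of delivered forms on which each item appeared."""
    appearances = Counter(item_id for form in delivered_forms for item_id in form)
    n_forms = len(delivered_forms)
    return {item_id: count / n_forms for item_id, count in appearances.items()}

# Three hypothetical delivered forms (item ids only).
forms = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
rates = exposure_rates(forms)

# Item 3 appeared on every form - a flag for over-exposure.
overexposed = [item for item, rate in rates.items() if rate > 0.9]
print(overexposed)  # [3]
```

Items flagged this way can be retired temporarily or down-weighted in the selection algorithm until their exposure falls back into range.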

Why use linear-on-the-fly testing?

Linear-on-the-fly testing stands out among computer-based testing approaches for its focus on both security and fairness. Unlike fixed-form exams, where every participant answers the same set of questions or one of only a few versions, LOFT builds a unique test for each examinee by drawing from a shared item bank. This minimizes question exposure and protects the integrity of the assessment.

At the same time, LOFT is often considered more balanced and easier to accept than adaptive testing systems, since every candidate answers the same number of items and experiences a test of comparable content and difficulty.

When to use LOFT testing?

You should consider using LOFT when:

• You want every candidate to take a unique but equivalent test.
• You need to test large groups continuously or on demand.
• You want to reduce item exposure and prevent question sharing.
• You aim to ensure fairness and comparability across all versions.
• You plan to update or expand your item bank regularly without redesigning test forms.

Use cases of LOFT testing

• Certification exams: LOFT ensures that every candidate receives a unique but equivalent version of the test, protecting question integrity and preventing leaks.
• Recruitment and pre-employment testing: In hiring processes where large numbers of applicants are tested, LOFT enables fair comparison by generating balanced test forms for each candidate while maintaining test security and consistency.
• Educational placement and proficiency testing: Language or academic placement exams benefit from LOFT’s ability to create multiple equivalent forms, ensuring fairness even when tests are taken on different dates or sessions.
• Corporate training and skill assessments: Organizations that assess employees regularly can use LOFT for continuous, on-demand evaluations, avoiding repetition and keeping content secure over time.


Created on 2025/11/03 · Updated on 2025/11/03