Principles behind IOI 2004 Competition Tasks and Their Grading
========== ====== === ==== =========== ===== === ===== =======

Author: ISC
Date: 12 July 2004
Status: Approved by ISC

This document presents some of the principles that have guided the selection of
the IOI 2004 competition tasks and their grading criteria.

Please read the whole document carefully.  The introduction below explains
why the ISC formulated these principles.  The second section states the
principles.  The third section provides some motivation for these principles
and discusses their intention.  An example is presented in a separate section.
The final section addresses the future of these principles.

Introduction
------------
The GA has often expressed their concern about the relatively low IOI
scores and relatively few contestants that fully solve tasks.  This
concern can also be inferred from the responses to the survey about the IOI
2002 competition.  Statistics for the results of some recent IOIs
support this impression.  See the appendix for detailed statistics.

            IOI 1999 2000 2001 2002 2003
            --- ---- ---- ---- ---- ----
  lowest bronze  135  149  143  135  173.1  (score out of 600 max. for tasks)
   # hard tasks    ?    2    3    5    5    (<10% of contestants score >= 90%)

The IC has asked the ISC to address these concerns in the preparation of the
IOI 2004 competition.

Principles
----------
After extensive discussions during the ISC review meeting in Greece
at the end of June 2004, the ISC came up with the following measures
to address the issues mentioned above.

  1.  The competition tasks will be selected to cover a range of difficulty
      levels.

  2.  For each task, half of the test inputs used for grading the submitted
      programs will focus on "testing for correctness".  These inputs
      will be based on "small" cases only.  What is considered "small"
      will be stated explicitly in the task description.  These
      constraints will be referred to as 50%-constraints in this
      document, to distinguish them from the regular constraints (or
      100%-constraints) on inputs.
      
  3.  The other half of the test inputs will focus on "testing for
      efficiency". The "size" of these cases is chosen to distinguish
      the efficiency for a range of algorithm classes specific to that
      task.

Motivation
----------
Concerning the low overall scores, the ISC identified two main causes:

  1.  The set of competition tasks is too hard for most of the contestants.
  2.  The credit for correct-but-inefficient programs is too small.

The capabilities of the best contestants have increased considerably
over the years.  However, roughly 5% to 10% of the contestants seem to
lack the skills for scoring points, no matter how easy it is.  We do
not see ways to change the latter.  It is the responsibility of the
delegations to prepare their contestants for the IOI competition.

The first principle (range of difficulty levels) is intended to address
the first cause.  We explicitly avoid a selection of tasks that are
(almost) all (very) hard.  This follows the recommendation made in the
ISC Guidelines for IOI Competitions, but which was not enforced
sufficiently in the past.  

Some competition tasks should provide a challenge for the very best, so
that they can be recognized.  Other tasks are easier, so that less
capable contestants can accomplish them.  Tasks without algorithmic
challenge (with a "trivial" 100%-solution, involving a standard
text-book algorithm and some straightforward coding) will be avoided. 
This follows past practice.

The second principle (50% for correctness-only) is intended to address
the second cause.  Under this principle, a correct-but-inefficient
program can still obtain 50% of the full score.  In the past, such
programs would typically get no more than 10% to 20% of the full
score.  This principle acknowledges the effort that is required to
develop a correctly functioning program, even if its efficiency leaves
room for improvement.  This principle is the most prominent change
with respect to past IOIs.

The choice of 50% rather than some other percentage is a compromise.
Some have argued that it should be even higher (say 70%), some that it
should be lower (say 30%).  For simplicity in applying the principle, we
have set a uniform percentage at 50% for IOI 2004.

The third principle (50% for a __range__ of efficiencies) is intended to
distinguish algorithms with sufficiently different performance
characteristics.  These tests must not only be focused on establishing
that the program under test meets the 100%-constraints.  It allows
for less efficient programs to score additional points over 50% as
well.  This principle has been applied in the past, but we seek to
apply it more rigorously now.

The 50%-constraints will be stated explicitly in the task description.
There are several reasons for doing so:

  *  We do not want to have arguments over whether particular algorithms
     qualify as correct-but-inefficient.  Instead, this is
     now defined as the ability to solve test inputs within the
     50%-constraints.  In the end, all that matters is whether the
     program can solve certain test inputs.  The 50%-constraints
     characterize those test inputs.

  *  It enables contestants to make an informed trade-off.
  
  *  One can view the 50%-constraints as defining an easier version of
     the task.  A contestant has really accomplished something by
     writing a program that correctly solves all these cases.  This
     serves a pedagogical and psychological purpose.

No further details are provided about the characteristics of the test
inputs.  In particular, the distribution of the test cases for
distinguishing a range of efficiency levels is not explained in the task
description.  There are several reasons for this:

  *  It is often too complicated to describe these details.
  
  *  It might give too many hints on possible solutions to some
     contestants, and it might confuse other contestants.

The handouts with background information provided after the competition
will explain the choice of test data in more detail.

For each task, the 50%- and 100%-constraints, and the distribution of
test cases for efficiency have been carefully selected, by considering
various alternative approaches to solving the task.

Output-only tasks are treated separately when designing the input cases.
For one thing, no program is submitted, so there are no evaluation
runs.  Secondly, because the inputs are not secret, the contestants can
analyse the inputs during the contest and they know the exact
characteristics.

Example
-------
As a concrete example, consider the following hypothetical competition task.
The hypothetical competition machine can execute 10^9 instructions per second.

  TASK SORT
  
  An experiment generates a sequence of distinct integers.  For further
  analysis, this sequence needs to be sorted from low to high.
  
  Write a program that inputs a sequence and outputs the corresponding
  sorted sequence.

  INPUT FORMAT
  
  The first line on standard input contains integer N, the length of
  the sequence.  Each of the subsequent N input lines contains one integer
  of the sequence.

  OUTPUT FORMAT
  
  The sorted sequence is to be written to standard output.  The output must
  consist of N lines, where each line contains one integer.  The integers
  must appear in increasing order.

  EXAMPLE INPUT
  
    3
    1000000000
    0
    99
  
  EXAMPLE OUTPUT
  
    0
    99
    1000000000
  
  CONSTRAINTS

  All inputs will satisfy the following constraints:
  
    *  Length N of the sequence: 0 <= N <= 10^6
    *  Each integer i in the sequence: 0 <= i <= 10^9
    *  The sequence contains no duplicates.
  
  The time limit per run is 2 second.
  The limit on data memory per run is 5 MB.

  50% of the inputs will satisfy the following ADDITIONAL constraints:
  
    *  Length N of the sequence: N <= 10^3

  END OF TASK SORT

Note that the wording of this task is not what matters for this example.
Also note that sorting does not satisfy the novelty requirement for IOI
competition tasks.  We have chosen it precisely because it is a
well-known problem.

Since IOI'94, it has been the custom in IOI competitions that
quantitative constraints are imposed on the time and memory usage of the
algorithms to be designed and implemented.  The scientific committee
selects these constraints to provide interesting yet reasonable
algorithmic challenges, that can also be graded efficiently.

  ANALYSIS FOR TASK SORT
  
  Each input integer can be stored in 4 bytes.  Hence, the maximum-length
  sequence can be stored in 4 MB.  This still leaves 1 MB.
  
  The machine characteristics together with the task constraints imply
  that an O(N log N) in-situ sorting algorithm with low constant will do.
  It is not possible to store 2N integers (make a copy of the sequence)
  for large N.  It is also clear that an O(N^2) in-situ sorting algorithm
  is too slow.

  The 50%-constraints were chosen such that an O(N^2) sorting
  algorithm with low constant will suffice.  It could use more storage
  than needed for just the N input numbers (i.e., not in-situ).
  
  An O(N!) or worse algorithm (e.g. that enumerates all permutations of
  the input sequence and tests them for sortedness) will not pass all of
  the 50% tests for correctness.

  END OF ANALYSIS FOR TASK SORT

A weaker contestant, who is unable to come up with an O(N log N) in-situ
sorting algorithm, but who can design and implement an O(N^2) algorithm,
can still obtain 50% of the full score for this task.

However, the 50%-constraints are such that a O(N!) program will __not__
score 50%.  This is a deliberate choice in designing the task.  The
50%-constraints make explicit what still counts as a
correct-but-inefficient program.  If a program cannot solve the
50%-constrained inputs within the time and memory limits, then it is
judged to be an unacceptable solution for this task.

It was also a deliberate choice not to limit the range of integers
further in the 50%-constraints.  It is not the issue here whether this
was a good choice, but it illustrates that there are various trade-offs
to consider when selecting the 50%-constraints.  They are by no means
obvious.  That is why it is important to make the 50%-constraints
explicit in the task description.

Conclusion
----------
These principles were formulated to guide the preparations for the IOI
2004 competition only.  Their application to the preparation of future
IOIs after 2004 will depend on feedback from the IOI community, on
statistics and surveys for IOI 2004, and on further discussions.

In the future, the ISC Guidelines for IOI Competitions can be updated to
reflect the experience with these principles.


Appendix

STATISTICS
----------
For the past five IOIs, we list the lowest score to obtain a bronze,
silver, and gold medal (the bronze cutoff is approximately the median
score). The maximum score for all tasks together is 600 (this excludes
any non-task bonuses).

                         IOI 1999 2000 2001 2002 2003
                         --- ---- ---- ---- ---- ----
Lowest bronze medalist score  135  149  143  135  173.1 (approx. median score)
Lowest silver medalist score  234  310  266  226  257.7
Lowest gold   medalist score  340  450  378  296  350.9
Top score                     480  600  580  510  455.4

If available, we also list for each task the number and percentage of
contestants that scored 90% or more on that task.

   The threshold of 90% allows for a contestant to fail on 1 test case
   in 10 and still be counted as having "fully solved" the task, that is,  
   solved it __modulo a small mistake__.

   The classification as Easy, Medium, Hard is based on the following
   thresholds (which first appeared in the IOI 2002 competition survey):

   EASY  : > 40% of the contestants "fully solved" the task
   MEDIUM: >= 10% and <= 40% of the contestants "fully solved" the task
   HARD  : < 10% of the contestants "fully solved" the task

IOI 1999: Score per task per contestant no longer available.

IOI 2000 with 274 ranked contestants 

     |---palin---|--- car ---|--median---|---post ---|---walls---|---block---   
>=90%|  80 (30%) |  31 (11%) |  27 (10%) |  56 (21%) |  79 (29%) |  11  (4%)    
     |   MEDIUM  |   MEDIUM  |    HARD   |   MEDIUM  |   MEDIUM  |    HARD      

IOI 2001 with 267 ranked contestants

     |--twofive--|--mobiles--|--ioiwari--|---depot---|---score---|---double--   
>=90%|   1  (0%) |   3  (1%) |  79 (30%) |  18  (7%) |  39 (15%) |  54 (20%)    
     |   HARD    |   HARD    |   MEDIUM  |   HARD    |   MEDIUM  |   MEDIUM     

IOI 2002 with 275 ranked contestants
  Unable to obtain score per task per contestant; the following approximations
  are inferrred from diagrams in the IOI 2004 competition report (pp. 49, 81)

     |---frog ---|--utopia---|--- xor ---|---batch---|--- bus ---|---rods ---   
>=90%| 45 (16%)  |   8 (3%)  |   2 (1%)  |  12 (4%)  |   9 (3%)  |  20 (7%)     
     |   MEDIUM  |    HARD   |    HARD   |    HARD   |    HARD   |    HARD      

IOI 2003 with 265 ranked contestants

     |-maintain--|---code ---|--reverse--|---guess---|--robots---|-boundary--   
>=90%| 81 (31%)  |   4 (2%)  |   0 (0%)  |  14 (5%)  |  11 (4%)  |  12 (5%)     
     |   MEDIUM  |    HARD   |    HARD   |    HARD   |    HARD   |    HARD      

END OF STATISTICS

End of Document

