Changes during perfctr-2.5:
- New AoS counter state layout:
	struct perfctr_cpu_user_state {
		unsigned int cstatus;
		struct {
			unsigned int start;
			unsigned long long sum;
		} tsc;
		struct {
			unsigned int map;
			unsigned int start;
			unsigned long long sum;
		} pmc[18];
	};
  Suspend:
	Read map & start & sum, write sum. 16 bytes = 1/2 or 1/8 line.
	Can do TSC + one (P6) or seven (P4) PMCs in one cache line.
  Resume:
	Read map, write start; skip sum. 1/2 or 1/8 line touched.

  Old SoA layout:
  Suspend: read 4+4+8 bytes, write 8 bytes. 3 different lines touched.
  Resume: read 4, write 8. 2 different lines touched.
- New control data layout for syscall iface (only): variable-length
  array of <attribute,value> pairs, skipping zero/unused fields.
  This allows for binary compatibility even if driver and user-space
  use different control structs, and allows x86 binaries on x86_64.
- New vperfctr ioctl to retrieve control.
- Add kernel resource manager to avoid oprofile/perfctr conflicts.
- Add /dev/perfctr ATTACH ioctl for vperfctrs, and allow access
  both that way and via /proc/pid/perfctr.
- Investigate the feasibility of adding back inheritance support.
- Make sure sched_setaffinity() and the kernel's own set_cpus_allowed()
  calls don't break perfctr on HT P4s.
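
  Sketch for the AoS layout item above (an illustration, not driver code):
  the per-counter entry is 4+4+8 = 16 bytes, so it covers 1/2 of a 32-byte
  line or 1/8 of a 128-byte line. The 32- and 128-byte line sizes are
  inferred from those fractions; struct pmc_state and
  pmc_entries_per_line() are hypothetical names.

```c
#include <stddef.h>

/* One per-counter entry, copied from the pmc[] element of the AoS layout. */
struct pmc_state {
	unsigned int map;
	unsigned int start;
	unsigned long long sum;
};

/* Number of whole per-counter entries that fit in one cache line. */
static size_t pmc_entries_per_line(size_t line_size)
{
	return line_size / sizeof(struct pmc_state);
}
```

  With 32-byte lines this gives 2 entries per line, with 128-byte lines 8.
  The offsets of tsc and pmc[0] in the full struct depend on how the ABI
  aligns unsigned long long, so they are deliberately not computed here.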

Changes for perfctr-2.6:
- Kill /proc/<pid>/perfctr. It enlarges the interface between the
  kernel and the driver, and suffers from kernel version dependencies.
  It also makes a patch-less version of the driver impossible.
  Open vperfctrs via /dev/perfctr and create a private vperfctr_vfs for
  the vperfctr file descriptors. The code for this already exists in
  perfctr-3.1 (RIP) and perfctr-1.6.
- After /proc/<pid>/perfctr is gone, tidy up the vperfctr_vfs code by
  putting 2.2 code in separate files in a 2.2 subdirectory.
  (This is difficult to do right now with the ugly /proc/<pid>/perfctr code.)

Changes after perfctr-2.6:
- Implement a patch-less version of the driver. Insert a glue module
  that hooks into the kernel via code backpatching and symbol table
  information. Afterwards, the driver module proper can interface with
  the glue module for the kernel callbacks, IDT, and irq return path.
  This requires the /proc/<pid>/perfctr removal in perfctr-2.6.

Driver:
- When an overflowed perfctr is reset, we should take into account
  how many events past 0 or 1 it has already counted.
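
  A minimal sketch of the idea, assuming the counter is started at a
  negative value and overflows when it reaches the wrap point (0 or 1).
  The name perfctr_rearm_value() and its parameters are hypothetical,
  not part of the driver.

```c
/*
 * Hypothetical helper: compute the value to write back into an
 * overflowed counter so that events counted past the wrap point
 * are not lost.
 *
 *   ireset     - the (negative) value the counter is normally reset to
 *   current    - the value read from the counter after the overflow
 *   wrap_point - the value at which overflow is signalled (0 or 1)
 */
static long long perfctr_rearm_value(long long ireset, long long current,
				     long long wrap_point)
{
	long long overshoot = current - wrap_point;

	return ireset + overshoot;
}
```

  For example, with ireset = -1000 and a counter read back as 3, the
  counter would be rearmed to -997 for a wrap point of 0, or -998 for a
  wrap point of 1. An overshoot larger than the period is not handled.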

Library:
- Add vperfctr_mmap() to libperfctr.c: the goal is to perform all
  accesses via the library, even for examples/perfex/.
- Implement gethrvtime(). Don't ever STOP the counters. To stop PMC
  updates, call CONTROL with tsc_on == 1 and nractrs == nrictrs == 0.
  The driver will continue sampling the TSC. Then gethrvtime() reduces
  to scaling the virtualised TSC with cpu_khz.
- Describe derived events in event_set.c.
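
  The gethrvtime() scaling step could look roughly like this. It assumes
  cpu_khz is the TSC frequency in kHz and that the result should be in
  nanoseconds, as Solaris gethrvtime() returns. vtsc_to_ns() is a
  hypothetical name, not an existing libperfctr function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a virtualised TSC tick count to
 * nanoseconds, given the TSC frequency in kHz.
 *
 * ns = ticks * 1000000 / cpu_khz, split into quotient and remainder
 * so the multiplication cannot overflow 64 bits for realistic inputs.
 */
static uint64_t vtsc_to_ns(uint64_t ticks, uint32_t cpu_khz)
{
	uint64_t whole, rest;

	if (cpu_khz == 0)
		return 0;	/* frequency unknown; nothing sensible to return */

	whole = ticks / cpu_khz;	/* whole milliseconds */
	rest = ticks % cpu_khz;		/* leftover ticks, less than cpu_khz */

	return whole * 1000000ULL + rest * 1000000ULL / cpu_khz;
}
```

  At 2 GHz (cpu_khz == 2000000), 2000000 ticks come out as 1000000 ns,
  i.e. one millisecond.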

Documentation:
- Write it :-(

Possible Changes:
- The P6 and P4 sub-models don't matter for the driver. Should the driver
  just export the major model and the cpuid, and let user-space figure
  out sub-model details?
- Access control mechanism for global-mode perfctrs?
- Interrupt support for global-mode perfctrs?
- Multiplexing support? PAPI seems to do fine w/o it.
- A "kernel profiling" mode which uses global-mode perfctrs in
  interrupt mode to profile the kernel?
- Buffer interrupts and signal user-space when buffer is nearly full?
